[mdlug] Evil URLs at OpenSuSE

Jeremy Bowers jerf at jerf.org
Sat Feb 24 17:05:20 EST 2007


Raymond McLaughlin wrote:
> This is the url of a directory containing about 50 files, all of which I
> wanted to download. I gave this url to "wget -r" and then the fun began.
> Instead of just pulling the directory and its 50 or so files, wget went
> crazy downloading hundreds of files from a half dozen servers.
>   
I'm surprised nobody has said anything about this yet.

wget -r on its own almost never does what you want. It simply retrieves 
pages recursively, following whatever links it finds. It can end up on 
other servers, it'll go up and down directory structures, and on any 
decently linked site it will most likely try to download far more than 
you asked for. With six clicks or so from that page I got to 
en.opensuse.org, from where I could probably get to the net at large, so 
wget -r would take a very long time...

You need to tune the recursion parameters. Pop open "man wget" and 
search for "Recursive Accept/Reject Options" (or some appropriate 
substring).

Your use case is simple, and "-np" or "--no-parent" is what you want; 
that will prevent the recursion from following the "parent directory" 
link, which is almost certainly how you got in trouble. You'll still end 
up downloading a few extra copies of the index page, one for each of the 
sortable column headers at the top of the listing, but those aren't 
worth worrying about.
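
For what it's worth, here is a minimal sketch of the kind of invocation I 
mean. The function name fetch_dir and the example.org URL are made up for 
illustration (they are not from your message), and the "--level=1" is my 
own addition to keep the recursion to the listing and the files it links 
to; adjust to taste.

```shell
#!/bin/sh
# fetch_dir is a hypothetical helper name, used only for this example.
# It downloads one directory listing and the files linked from it.
fetch_dir() {
    # --recursive   follow links found in the retrieved pages (same as -r)
    # --no-parent   never ascend above the starting directory (same as -np)
    # --level=1     assumption: one level of recursion is enough here
    wget --recursive --no-parent --level=1 "$1"
}

# Usage (placeholder URL, substitute the real directory):
#   fetch_dir "http://example.org/pub/some/dir/"
```

Keep the trailing slash on the URL. As far as I know, without it wget 
treats the last path component as a file name, so "parent" ends up 
meaning one directory higher than you intended.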




