[mdlug] Evil URLs at OpenSuSE
Jeremy Bowers
jerf at jerf.org
Sat Feb 24 17:05:20 EST 2007
Raymond McLaughlin wrote:
> This is the url of a directory containing about 50 files, all of which I
> wanted to download. I gave this url to "wget -r" and then the fun began.
> Instead of just pulling the directory and its 50 or so files, wget went
> crazy downloading hundreds of files from a half dozen servers.
>
I'm surprised nobody has said anything about this yet.
wget -r almost never does what you want. It simply retrieves pages
recursively, by default following links up to five levels deep. It'll go
up and down directory structures, it can end up on other servers (it
normally stays on the starting host unless you pass -H, but redirects
can still take it elsewhere), and on any decently linked web page it
will try to download a large chunk of the web. With six clicks or so
from that page I got to en.opensuse.org, from where I could probably get
to the net at large, so wget -r would take a very long time...
You need to tune the recursion parameters. Pop open "man wget" and
search for "Recursive Accept/Reject Options" (or some appropriate
substring).
Your use case is simple, and "-np" or "--no-parent" is what you want;
that will prevent the recursion from following the "parent directory"
link, which is almost certainly how you got in trouble. You'll still end
up downloading the index pages sorted by the four columns at the top of
the page, but those aren't worth worrying about.
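Something along these lines is roughly what I have in mind. Treat it as
a sketch, not a tested recipe: the URL is a placeholder (substitute the
real directory), and "-l 1" and "-R" are optional extras beyond the
"-np" advice above. "-l 1" limits recursion to links found on the
directory page itself, and "-R 'index.html*'" should discard the sorted
index pages once wget has parsed them for links.

```shell
# Placeholder URL: substitute the directory you actually want.
url="http://example.org/pub/some/directory/"

# -r              recurse through links
# -np             never ascend to the parent directory
# -l 1            follow links only one level below the start page
# -R 'index.html*' reject the sorted index pages after parsing them
set -- wget -r -np -l 1 -R 'index.html*' "$url"

# Dry run: print the command so you can check it first.
echo "$@"

# Uncomment the next line to actually start the download.
# "$@"
```

If the dry run looks right, uncomment the last line, or just type the
printed command by hand, keeping the quotes around 'index.html*' so the
shell doesn't expand it.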