for the sake of argument
that you had a giant list
of partial URLs
(you know, like “www.example.com/blurf”)
and you needed to canonicalize them
and chase redirects
and remove duplicates
and dead sites
and further you were aware
that this is much harder than it might sound
not to mention
that many websites do not like urllib
you might be looking for this program
which was written by me
with a little help from serge broslavsky.
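
To give a feel for the offline half of the problem, here is a minimal sketch of canonicalizing and de-duplicating partial URLs. It is not the program mentioned above: the function names `canonicalize` and `dedupe`, the default `http` scheme, and the specific normalization rules (lowercase the host, drop default ports, fragments and any userinfo, use `/` for an empty path) are all assumptions made for illustration. Chasing redirects and weeding out dead sites need real network requests and are deliberately left out.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed defaults for this sketch; a real tool may need more schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(partial_url, default_scheme="http"):
    """Turn a partial URL like 'www.example.com/blurf' into a normalized absolute URL."""
    text = partial_url.strip()
    # Partial URLs often lack a scheme; without one urlsplit would
    # treat the host name as part of the path.
    if "://" not in text:
        text = f"{default_scheme}://{text}"
    parts = urlsplit(text)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # An empty path and "/" name the same resource.
    path = parts.path or "/"
    # Fragments never reach the server, so drop them.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def dedupe(partial_urls):
    """Canonicalize each entry and keep the first occurrence of each result."""
    seen = set()
    result = []
    for raw in partial_urls:
        url = canonicalize(raw)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Even this much only catches duplicates that differ in spelling. Two URLs that look different but redirect to the same place can only be merged after actually fetching them, which is where most of the difficulty lives.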