How to Download a Whole Collection from Internet Archive
A collection on archive.org is a curated group of items — books, recordings, software, videos — organized under a single page. The Prelinger Archives, the Grateful Dead collection, the NASA Images collection, and thousands of institutional sets are all structured this way. When you need everything in a collection rather than a single item, the approach changes. Clicking through hundreds of individual item pages is not realistic. You need a bulk method.
Understanding collections
Collections are hierarchical. The Internet Archive organizes content into top-level media types — texts, audio, movies, software, images, data, and web — and collections sit underneath them. A collection can contain subcollections, so you may need to decide whether you want the top-level collection only or its children too. Every collection has an identifier, which is the last segment of its URL. For example, the Prelinger Archives live at archive.org/details/prelinger, so the identifier is prelinger. You will need this identifier for every bulk method below.
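As a minimal sketch of the identifier rule described above, the helper below pulls the identifier out of a details URL. The function name collection_identifier is hypothetical, and the code assumes the URL follows the archive.org/details/<identifier> pattern shown in the example.

```python
from urllib.parse import urlparse


def collection_identifier(url: str) -> str:
    """Return the identifier from an archive.org details URL (hypothetical helper)."""
    # Accept URLs written without a scheme, e.g. "archive.org/details/prelinger".
    if "://" not in url:
        url = "https://" + url
    # Split the path into non-empty segments; the query string is ignored.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2 or segments[0] != "details":
        raise ValueError(f"not a details URL: {url!r}")
    # The identifier is the segment that follows "details".
    return segments[1]


if __name__ == "__main__":
    print(collection_identifier("archive.org/details/prelinger"))
```

Taking the segment right after details, rather than the literal last segment, keeps the helper working when the URL has a trailing slash.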
To find a collection's identifier, open its page on archive.org and read the last segment of the URL. You can also browse collections by media type from the Archive's homepage, or use the Advanced Search page at archive.org/advancedsearch.php and filter on the collection field, for example with the query collection:prelinger.
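The Advanced Search page can also be queried with a URL. The sketch below only builds such a URL and does not send a request. The function name advanced_search_url and the default page size of 100 are assumptions made for illustration.

```python
from urllib.parse import urlencode


def advanced_search_url(collection: str, rows: int = 100, page: int = 1) -> str:
    """Build an Advanced Search URL listing item identifiers in a collection."""
    params = [
        ("q", f"collection:{collection}"),  # restrict results to one collection
        ("fl[]", "identifier"),             # return only the identifier field
        ("rows", rows),                     # results per page (assumed default)
        ("page", page),                     # 1-based page number
        ("output", "json"),                 # ask for JSON instead of HTML
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


if __name__ == "__main__":
    print(advanced_search_url("prelinger"))
```

Opening the printed URL in a browser is a quick way to confirm that the identifier is right before starting a bulk download.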
Method 1: The ia command line tool
The most reliable way to download an entire collection is the ia command line tool, a Python utility built specifically for archive.org. Install it with pip install internetarchive, then run ia configure to store your archive.org credentials if the items you want require a login. To download every item in a collection, run: ia download --search 'collection:prelinger' (replacing prelinger with your collection's identifier). This creates one folder per item and downloads all files in each. For large collections, add --checksum so that files already on disk are compared by checksum and skipped when they match. That makes an interrupted run safe to resume: re-run the same command with --checksum and it will skip the files you already have. To keep a record of what was fetched, pass --log as a global option before the subcommand, as in ia --log download --search 'collection:prelinger' --checksum. For a full guide to the ia tool, see How to Use the IA Command Line Tool.
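The re-run-after-failure advice above can be automated. The following is a rough sketch, not part of the ia tool: it wraps the same ia download command in a retry loop. It assumes ia is on your PATH and exits with a nonzero status when a download fails. The function names, the three-attempt limit, and the 60-second wait are all illustrative choices.

```python
import subprocess
import time


def build_download_command(collection, checksum=True):
    """Return the argument list for `ia download --search 'collection:<id>'`."""
    cmd = ["ia", "download", "--search", f"collection:{collection}"]
    if checksum:
        cmd.append("--checksum")  # skip files that already match on disk
    return cmd


def download_with_retries(collection, attempts=3, wait_seconds=60,
                          run=subprocess.run, sleep=time.sleep):
    """Re-run the download until it exits with 0; return the attempt number used."""
    cmd = build_download_command(collection)
    for attempt in range(1, attempts + 1):
        # Assumption: a nonzero exit status means at least one file failed.
        if run(cmd).returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(wait_seconds)  # pause before retrying
    raise RuntimeError(f"download still failing after {attempts} attempts")
```

Calling download_with_retries("prelinger") would run the command up to three times. The run and sleep parameters exist only so the loop can be exercised without touching the network.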
To speed things up, you can pipe the item list into GNU Parallel: ia search 'collection:prelinger' --itemlist | parallel 'ia download {}'. This downloads multiple items concurrently. Be reasonable with parallelism. By default GNU Parallel runs one job per CPU core, so set the limit explicitly with -j, for example parallel -j 4 'ia download {}'. Four to eight simultaneous downloads is usually a good balance between speed and not overwhelming the servers.
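If GNU Parallel is not available, the same bounded concurrency can be approximated with Python's standard library. This is a sketch under assumptions: ia is installed and on your PATH, and the helper names ia_download and download_all are hypothetical. The default of four workers follows the guidance above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def ia_download(identifier):
    """Run `ia download <identifier>` and return its exit code."""
    return subprocess.run(["ia", "download", identifier]).returncode


def download_all(identifiers, workers=4, download_one=ia_download):
    """Download items with at most `workers` running at once.

    Returns a sorted list of identifiers whose download exited nonzero.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its identifier so failures can be reported.
        futures = {pool.submit(download_one, i): i for i in identifiers}
        for future in as_completed(futures):
            if future.result() != 0:
                failed.append(futures[future])
    return sorted(failed)
```

A typical call would read itemlist.txt into a list and pass it to download_all, then retry whatever comes back in the failed list.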
Method 2: wget
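If you prefer wget, start by generating an item list. Use the ia tool (ia search 'collection:prelinger' --itemlist > itemlist.txt) or query the Advanced Search API directly. Then run: wget -r -H -nc -np -nH --cut-dirs=1 -e robots=off -l1 -i ./itemlist.txt -B 'http://archive.org/download/'. This walks each item's download directory and pulls every file. Here -i reads the identifiers from itemlist.txt and -B resolves each one against the download base URL; -r with -l1 recurses one level into each item's file listing; -H lets wget follow links to the other archive.org hosts that serve the files; -np stops it from climbing into parent directories; -nH and --cut-dirs=1 drop the hostname and the leading download/ component from local paths; -e robots=off tells wget to ignore robots.txt; and -nc prevents re-downloading files you already have. For collections over roughly 10,000 items, wget can time out or become unwieldy. In those cases, the ia tool with parallel processing is more practical.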
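To see what wget will request before starting a long run, the sketch below expands an item list into the URLs that -i and -B produce together. The function name download_urls is hypothetical. The base URL is the one from the command above.

```python
from urllib.parse import urljoin

BASE = "http://archive.org/download/"


def download_urls(itemlist_text, base=BASE):
    """Resolve each identifier in an item list against the download base URL."""
    urls = []
    for line in itemlist_text.splitlines():
        identifier = line.strip()
        if identifier:  # skip blank lines
            urls.append(urljoin(base, identifier))
    return urls


if __name__ == "__main__":
    with open("itemlist.txt") as handle:
        for url in download_urls(handle.read()):
            print(url)
```

Spot-checking a few of the printed URLs in a browser confirms that the identifiers are valid and shows the file listing wget will recurse into.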
Limits and practical advice
Large collections can be enormous. Before you start, get a rough sense of the size: check the collection page for an item count, and spot-check a few items to gauge average file sizes. A collection of 5,000 scanned books at 200 MB each is a terabyte. Make sure you have the disk space and the patience — some collections take days to download even on a fast connection.
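The back-of-the-envelope estimate above can be written down as a short sketch. The function names, the 10 percent headroom, and the sustained 10 MB/s rate in the example are assumptions for illustration, not measured figures. Sizes use decimal units (1 MB = 1,000,000 bytes).

```python
import shutil


def estimated_bytes(item_count, avg_item_mb):
    """Rough collection size: item count times average item size in MB."""
    return item_count * avg_item_mb * 1_000_000


def fits_on_disk(item_count, avg_item_mb, path=".", headroom=0.10):
    """True if free space at `path` covers the estimate plus some headroom."""
    needed = estimated_bytes(item_count, avg_item_mb) * (1 + headroom)
    return shutil.disk_usage(path).free >= needed


def download_days(total_bytes, mb_per_second):
    """Days needed to transfer `total_bytes` at a sustained rate in MB/s."""
    seconds = total_bytes / (mb_per_second * 1_000_000)
    return seconds / 86_400


if __name__ == "__main__":
    size = estimated_bytes(5_000, 200)  # the 5,000-book example above
    print(size / 1e12, "TB")
    print(round(download_days(size, 10), 2), "days at an assumed 10 MB/s")
```

For the 5,000-book example this gives one terabyte. At the assumed 10 MB/s, the transfer alone takes a little over a day, before any retries or slowdowns.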
Internet Archive is a nonprofit running on limited infrastructure. Downloads are fastest during off-peak hours, roughly 2 to 6 AM US Pacific time. If your downloads are slow, see Why Is Internet Archive Download So Slow? for workarounds. For very popular collections, torrents may also help — see How to Use Internet Archive Torrent Downloads.
Arkibber helps on the discovery side: before committing to a full collection download, you can search and filter its contents to see whether the collection actually has what you need. This saves you from downloading thousands of items only to discover the material you wanted was in a different collection entirely.
If you only need the simplest download method for a single item, start with How to Download from Internet Archive.