How to Find the Original File on Internet Archive

May 9, 2026

When the Archive ingests a file, it generates several derivative copies for streaming and convenience. A scanned book becomes a PDF, an EPUB, a DjVu, plain text, and single-page JP2s. A FLAC concert recording becomes 64kbps and VBR MP3s, an OGG, and sometimes a spectrogram. The "original" is the file exactly as the uploader sent it.

Spotting originals on the item page

Append ?show=files to the item URL, or click SHOW ALL on the item page. You will get a full file table with a Source column that marks each file as original, derivative, or metadata. The original files are the ones that were actually uploaded; everything else was generated from them.

Pulling only originals from the CLI

Use ia download IDENTIFIER --source=original to skip all derived files and grab only what was uploaded.
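For more than a handful of items, the same command can be driven from a script. The following is a minimal sketch, assuming the ia tool is installed and on PATH. The helper names build_download_command and download_originals are hypothetical and exist only for illustration.

```python
import subprocess


def build_download_command(identifier: str) -> list[str]:
    """Return the argument list for downloading only an item's original files."""
    # Mirrors the command from the text: ia download IDENTIFIER --source=original
    return ["ia", "download", identifier, "--source=original"]


def download_originals(identifiers: list[str]) -> dict[str, int]:
    """Run the command once per identifier and return each exit code.

    Assumes the `ia` executable is installed and on PATH.
    """
    results = {}
    for identifier in identifiers:
        completed = subprocess.run(build_download_command(identifier))
        results[identifier] = completed.returncode
    return results
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when an identifier contains unusual characters.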

A programmatic view

For scripting, fetch the files XML at https://archive.org/download/IDENTIFIER/IDENTIFIER_files.xml. Each file block has a source attribute. Originals have source="original". The XML also includes file size, format, MD5/SHA1 hashes, and (for media) bitrate and duration — useful for picking the right derivative when you do not want the original's full size.
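As a rough illustration of the filtering step, the sketch below parses a files XML document with Python's standard library and keeps only the entries marked source="original". The sample XML is made up to resemble the layout described above, and the function names and the skip_metadata option are assumptions for this example. The option reflects the possibility that an item's own metadata XML is also tagged as original.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like an item's _files.xml; names and values are made up.
SAMPLE_FILES_XML = """<files>
  <file name="concert.flac" source="original">
    <format>Flac</format>
    <size>31457280</size>
  </file>
  <file name="concert.mp3" source="derivative">
    <format>VBR MP3</format>
    <size>5242880</size>
  </file>
  <file name="concert_meta.xml" source="original">
    <format>Metadata</format>
    <size>1024</size>
  </file>
</files>"""


def files_xml_url(identifier: str) -> str:
    """Build the files XML URL described in the text."""
    return f"https://archive.org/download/{identifier}/{identifier}_files.xml"


def fetch_files_xml(identifier: str) -> str:
    """Download the files XML for an item (needs network access)."""
    with urllib.request.urlopen(files_xml_url(identifier)) as response:
        return response.read().decode("utf-8")


def list_originals(files_xml: str, skip_metadata: bool = True) -> list[dict]:
    """Return name, format and size for every file with source="original"."""
    originals = []
    for file_el in ET.fromstring(files_xml).iter("file"):
        if file_el.get("source") != "original":
            continue
        file_format = file_el.findtext("format", default="")
        # Assumption: the item's own metadata XML can also be tagged as original.
        if skip_metadata and file_format == "Metadata":
            continue
        size_text = file_el.findtext("size")
        originals.append({
            "name": file_el.get("name"),
            "format": file_format,
            "size": int(size_text) if size_text else None,
        })
    return originals
```

Calling list_originals(SAMPLE_FILES_XML) works offline on the sample. For a real item, pass it the result of fetch_files_xml("IDENTIFIER") instead.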

A nuance worth knowing

Some items have no lossless original at all because the uploader sent only a format that normally appears as a derivative, such as an MP3 with no FLAC behind it. In that case the highest-quality file is whatever the uploader provided, and there is no lossless source to recover.
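When evaluating many items, a script can flag this case by checking whether any file tagged original is in a lossless format. This is a minimal sketch under stated assumptions: the file records are plain dictionaries with source and format keys, the sample data is invented, and the set of lossless format labels is illustrative and incomplete.

```python
# Assumption: a small, incomplete set of format labels treated as lossless audio.
LOSSLESS_AUDIO_FORMATS = {"flac", "wave"}


def has_lossless_original(files: list[dict]) -> bool:
    """Return True if at least one original file is in a lossless audio format."""
    return any(
        f.get("source") == "original"
        and f.get("format", "").lower() in LOSSLESS_AUDIO_FORMATS
        for f in files
    )


# Hypothetical items: one uploaded as FLAC, one uploaded as MP3 only.
flac_item = [
    {"name": "concert.flac", "source": "original", "format": "Flac"},
    {"name": "concert.mp3", "source": "derivative", "format": "VBR MP3"},
]
mp3_only_item = [
    {"name": "talk.mp3", "source": "original", "format": "VBR MP3"},
]
```

Here has_lossless_original(flac_item) is true and has_lossless_original(mp3_only_item) is false, so the second item would be flagged as having no lossless source to recover.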

Arkibber surfaces clean metadata for Internet Archive items, making it easier to understand what formats and sources are available before you navigate to the item page to download. This is especially helpful when you are evaluating many items and need to quickly assess which ones have high-quality originals.
