See linked posting. I’ve commented there with a link to a CLI tool in Python that allows downloading of IA collections. I’ve submitted a patch to enable specifying start and end points so that it’s easier to resume downloading a huge collection, or to allow multiple people to split up the work.

https://archive.org/details/georgeblood

https://archive.org/details/78rpm_bowling_green

F*ck the RIAA and absurdly long copyright.


EDIT: There is more than one collection of 78s on IA, so I updated the title.


The issue with these collections is that they’re absolutely HUGE. And yes, IA offers torrents for them, but as a separate torrent for every. single. album. And the torrents have all data in them – FLAC, fixed-rate MP3, VBR MP3, PDF liner notes, etc. There may be some extremely hardcore data-hoarders out there who want everything, but IMHO, as these are scratchy old 78 records, FLAC is overkill for just saving the audio in a listenable format. The George Blood collection, just the VBR MP3s, is looking to be about 6TB. With ALL data it might be over 40TB! I can’t afford that many hard drives :)


So, my approach at the moment is to save just the VBR MP3s (they seem to be done at up to 320kbps VBR) and the JPEG album cover. If I have a chance and any storage left afterwards, I can make a separate pass to get the album liner PDFs…


Tool used: https://github.com/jjjake/internetarchive


Patch to allow setting start and end item indices for downloads: https://github.com/jjjake/internetarchive/pull/605


Example usage to grab just the VBR MP3 and record label JPG for each (note the --start-idx and --end-idx arguments):

#ia download --start-idx=4001 --end-idx=8000 -a -i --format="VBR MP3" --format="JPEG" --search collection:georgeblood

I’m going to concentrate on the George Blood collection for now… I’m starting at item 1. It would be great if others started at index 50,000, 100,000, 150,000, … and others started at the end and worked backwards in similarly-sized chunks, so that it’s assured someone gets each of them.
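
For anyone who wants to claim a chunk, here is a rough sketch in Python that just prints one `ia download` command per chunk. It is only a sketch under some assumptions: the total item count and chunk size are placeholders (check the real count of the collection first), `chunk_ranges` and `ia_command` are made-up helper names, and the indices are assumed to be 1-based and inclusive, as the 4001 to 8000 example above suggests. The `--start-idx` and `--end-idx` flags come from the patch linked above, so they will only work with a build of the tool that includes it.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_items: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that cover items 1..total_items.

    Assumption: indices are 1-based and inclusive on both ends.
    """
    if total_items < 0 or chunk_size < 1:
        raise ValueError("total_items must be >= 0 and chunk_size >= 1")
    for start in range(1, total_items + 1, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size - 1, total_items)


def ia_command(start: int, end: int, collection: str = "georgeblood") -> str:
    """Build the command line from the post for one chunk (hypothetical helper)."""
    return (
        f"ia download --start-idx={start} --end-idx={end} -a -i "
        f'--format="VBR MP3" --format="JPEG" --search collection:{collection}'
    )


if __name__ == "__main__":
    TOTAL_ITEMS = 400_000  # placeholder, not a measured count
    CHUNK_SIZE = 50_000    # placeholder chunk size per volunteer
    for start, end in chunk_ranges(TOTAL_ITEMS, CHUNK_SIZE):
        print(ia_command(start, end))
```

Each printed line is one person's share. People working backwards from the end can simply take the last lines of the output first.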

  • ArghblargOP
    1 year ago

    I wish the IA would offer a single torrent of the overall collection, but instead it’s over 400k separate torrents, one for each album. And they contain FLACs, fixed- and VBR MP3s, PDF jacket notes, JPGs … it’s just too much for one person (I am OK with buying an 8TB drive or two, but not a dozen!)

    I’m trying to at least grab the VBR MP3s (these are old scratchy records after all… I don’t know how much FLAC will really preserve). Maybe if I can get most of those, I’ll do a second pass and get the album cover JPGs, then liner PDFs… depending on if/how long the collection stays up.