The MP3.com Rescue Barge Barge

So back in November, I grabbed 1.78 TB of media from both the Internet Archive’s mp3.com Rescue Barge and their Wayback Machine, in pursuit of the most thoroughly complete archive of MP3.com’s music I can muster.

The Rescue Barge is a set of collections on Archive.org’s own site, up for grabs at a cool 960.6 GB. But I had a massive text file of download.mp3.com links from Reddit sitting around from my previous Random MP3.com Playlist project. The Wayback Machine hadn’t archived all of them (lots of the URLs 404), but the list still pointed to more audio than the Archive’s own barge holds! Realizing this drove me to grab all I could of both.

With the Internet Archive’s legal status in question in the news at the time, following a series of lawsuits from book publishers and music labels (this is kind of old news, but this project was cooking a while), I felt a sense of responsibility to make sure all the MP3.com downloads I could find on their servers were safe somewhere, from the barge and beyond. Thus formed what I like to call the MP3.com Rescue Barge.. barge, and subsequently the metadata set I set out to create from it, with artist names, track names and other information that you’ll be able to sift through at home in a nifty CSV and spreadsheet.

(The resultant set uses Archive.org URLs throughout, so it’s not exactly resilient if the Internet Archive goes down. I do now have a copy of my own, though, so there’s a bit of redundancy should the need arise, woot!)

Grabbing the audio

I wasn’t sure how much storage I needed at first. I started on just the terabyte data drive in my previous desktop, but I soon ran into a wall while downloading a chunk of the Wayback Machine links. Fortunately, one of my friends in college let me use a 1.5 TB drive to finish the job, and that helped.

That wasn’t big enough for the whole thing, though, so for a while the Barge barge was split across an internal drive and an external one. Fortune struck again when I caught wind of a 3 TB drive over at Free Geek, with a light amount of uptime hours, to boot! I swapped the new drive into the tower I was working on, and it was just big enough for all of the data already on the terabyte drive plus both grabs.

With it all in one place, the obstacle for a while was, well, now what? The scripts I wrote to grab everything were, admittedly, a bit of a bodge. In hopes of preserving the source URLs of all the mp3s, I let wget make a folder for each one it grabbed, making searching for anything a somewhat… inefficient affair. This was fortunately remedied by voidtools’ Everything, a tool I use every day for quickly indexing and finding files on disk, but I hoped to make a database that included all the mp3 tags, and maybe even mp3guessenc results in the future, for important research… How could I do that easily?

Compiling the data

Enter WACUP! Well, just Winamp 5, technically. It’s a music player that, turns out, handles extremely large music libraries quite well. The first time I tried this, under the x86 version, it ran for I wanna say an hour, but maybe less, before crashing at 250k songs and corrupting the database while I tried to look up a song. Remember that this is while loading in and indexing all the tags!

I had to clear it under the same version before trying it again at 64-bit, but once all was said and done, I really envy the UI prowess the Nullsoft guys had, in terms of actually browsing the library. With this many songs, it still felt smooth as silk! I had just tried the same thing with a MAGIX MP3 Deluxe trial, and I felt bad offering the poor thing such a Sisyphean task. (I’ll admit that I’m assuming that the underlying Media Library plugin in WACUP is the same as originally from Nullsoft. I could be wrong, but I am very fond of what the project is doing either way to support the old player.)

Once that was all in WACUP, I realized you could pick and choose what fields you wanted it to display, including paths; and furthermore, that there was not only an m3u8 export option with just file paths, but a CSV one that included all the tags and information I wanted for my database! The resulting file was 137 MB, though NTFS compresses it down to 60.1 MB.

Look at the CSV format, and the player itself below:

A big motivator for this project was to index the tags of all of these loose mp3.com download URLs I had, and possibly combine them with already existing mirrors of the website as it was, for an easily browsable set. Integrating this with archives of the artist pages themselves is a possible future project.

And now I had all of them for 533,046 songs! (In files I was able to download, anyway; a few of them were corrupted or non-audio, but because they returned a 200 and weren’t 404ed entirely, I left them in.)
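For anyone who would rather flag those corrupted or non-audio stragglers than leave them in, one rough approach is to peek at the first few bytes of each file. This is only a sketch and not part of how the set was built: `looks_like_mp3` and `file_looks_like_mp3` are made-up helper names, and the check assumes a file counts as MP3 if it opens with an ID3v2 tag or an MPEG frame sync, so it will misjudge files that carry junk before the first frame.

```python
def looks_like_mp3(header: bytes) -> bool:
    """Guess from a file's first bytes whether it is MP3 audio."""
    # ID3v2 tags begin with the ASCII bytes "ID3".
    if header[:3] == b"ID3":
        return True
    # Otherwise expect an MPEG frame sync: eleven set bits at the start.
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0


def file_looks_like_mp3(path: str) -> bool:
    """Read only the first few bytes of a file and apply the check."""
    with open(path, "rb") as handle:
        return looks_like_mp3(handle.read(4))
```

A saved HTML error page would typically start with `<` or whitespace and fail this test, which is the case it is meant to catch.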

Cleanup

So that this is somewhat useful to people other than me, the next step after exporting the library in WACUP was to change the local paths to their appropriate source URLs. There are more complex ways I could have done this with scripting, but I just whipped out Excel after exporting the CSV.

In the process, I noticed that one of the MP3s slyly stuck a newline in the comment field, messing up the flow of the file a little (many programs that support CSV, including Excel, count a newline as the end of a record). Usually, this is just reserved for the username or URL to the author’s artist page, so I’m not too sure how that happened. Curious.
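To illustrate the newline problem, here is a minimal Python sketch. It assumes the exporter wraps the offending field in double quotes; if the field is unquoted, no parser can tell the stray newline from a real record break and it has to be fixed by hand. The sample data, the column names and the helper names are all invented for the example.

```python
import csv
import io

# Invented two-record sample; the first comment field contains a newline.
SAMPLE = (
    'Artist,Title,Comment\r\n'
    '"Some Artist","Some Song","http://example.invalid/artist\nextra line"\r\n'
    '"Other Artist","Other Song","ok"\r\n'
)


def count_records_naively(text: str) -> int:
    """Count records by splitting on every line break (header excluded)."""
    return len(text.splitlines()) - 1


def read_records(text: str) -> list:
    """Parse with the csv module, which keeps quoted newlines inside a field."""
    # newline="" stops the stream from translating line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def flatten_newlines(rows: list) -> list:
    """Replace line breaks inside every field with single spaces."""
    return [
        {key: " ".join(value.splitlines()) for key, value in row.items()}
        for row in rows
    ]
```

Splitting on line breaks sees one record too many here, while the `csv` module reads the two real records; flattening the fields before re-exporting keeps stricter tools from tripping over the same row.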

In any case, accounting for that and running a few find-and-replaces on the file path column, to convert each local path back to its source URL, completed the data set. (I realized I also had to be careful that Excel didn’t try to evaluate some of the artist names and song titles as math expressions.)
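The same two chores could be scripted instead of done in Excel. The sketch below is a guess at the shape of it rather than what was actually run: it assumes the download folder mirrors `host/path` directly under a root, and `D:\barge`, the `https` scheme and both function names are hypothetical. If the names on disk were already percent-encoded, the quoting step would need to go.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote

# Hypothetical download root; the real folder layout is not given in the post.
DOWNLOAD_ROOT = r"D:\barge"

# Leading characters that Excel commonly treats as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def local_path_to_url(path: str, root: str = DOWNLOAD_ROOT, scheme: str = "https") -> str:
    """Turn root\\host\\dir\\file.mp3 back into scheme://host/dir/file.mp3."""
    parts = PureWindowsPath(path).relative_to(PureWindowsPath(root)).parts
    # Percent-encode each segment so spaces and similar characters survive.
    return f"{scheme}://" + "/".join(quote(part) for part in parts)


def looks_like_formula(value: str) -> bool:
    """Flag artist names or titles that a spreadsheet may try to evaluate."""
    return value.lstrip().startswith(FORMULA_PREFIXES)
```

Rows flagged by `looks_like_formula` can then be checked by hand, or imported with their columns forced to text, so the titles stay as written.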

That all taken care of, this spreadsheet (and the CSV I re-exported) contains a lot of useful data: track names, artist names, and for a great deal of them, even the year and the mp3.com URL from which the mp3 was submitted. With some exceptions, mostly stemming from the Barge downloads (a few others are my error), most carry the modified date as it was on the server. There are duplicate rows as well, to cover both the Barge upload and the web.archive.org download.mp3.com mirror.

Try it!!

It is not the most perfectly machine-readable CSV (I think I could go further with UNIX times or something), and in the future I’d like to write a script that’ll make a new column for mp3 encoder identification with mp3guessenc, but for now I’m pleased with having it done and out there. For purposes of looking up an artist or title and seeing what’s out there, it should work pretty well.
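As a rough idea of what the UNIX time conversion might look like, here is a small sketch. The date layout in the actual CSV isn’t described here, so the `%Y-%m-%d %H:%M:%S` format is purely an assumption, as is treating the timestamps as UTC; `to_unix_time` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Assumed date layout; adjust to whatever the exported column really uses.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_unix_time(text: str, fmt: str = ASSUMED_FORMAT) -> int:
    """Convert a date string to whole seconds since 1970-01-01 UTC."""
    # Treat the parsed time as UTC (an assumption), not as local time.
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

Storing the result in an extra column would leave the human-readable date in place while giving scripts something unambiguous to sort on.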

By the time this post is published, I’ll have updated my site with it in CSV and Excel format on the MP3.com Preservation page, so download and enjoy. There’s an abbreviated history and a guide to using it there.

About dotcomboom

Old technology enthusiast and solo software developer who somehow reinvented Jekyll from first principles with AutoSite. Windows Forms enjoyer and language acquisition fanatic. Last seen watching lots of intermediate-level Spanish content and equally dutifully training to become a competitive Bejeweled 2 player.

7 Responses to The MP3.com Rescue Barge Barge

  1. mariteaux says:

    Doing God’s work here, lad. Hugely impressive! Honestly could not imagine how I’d tackle this project, but this is huge. Once I’m out from under all the CDs I’ve bought the past two years, I will be doing more MP3.com digging, including on more of those CD-ROMs…

    • dotcomboom says:

      Thank yous! Was a while coming so I’m really happy to have something dusted after having it all downloaded. Pretty fun coming back to it finally, really. One of those things. Godspeed man, always appreciate your excursions. Loong live MP3.com!

  2. Spenk says:

    Sadly not complete. My songs aren’t there nor are my friends’ songs. Great effort though! Good to see many others are there.

    • mariteaux says:

      No, the MP3.com grab wasn’t complete in the first place. Just the nature of things. Frankly, I’m impressed we ended up with any of it. For a long time, MP3.com seemed lost as a whole outside of the sampler CDs.

  3. aerozol says:

    In case you haven’t seen it yet, this is also a great resource in conjunction with the audio dump from IA: http://mp3-2003.computer-legacy.com/

    TLDR: This website contains a static copy of the MP3.com website as it existed during Thanksgiving November 2003.

  4. Paul says:

    I don’t know what would be the best choice now but back in the glorious days of WhatCD we folks with huge music library used to rely on FooBar2k for managing huge libraries, and Winamp would come second only. I remember managing a 2tb music library with it on a prehistoric computer.
    Maybe it’s worth a try for you too !

  5. aditya k says:

    25 or so years ago, I found an mp3.com CD on a used book shop in Indore, India, where I was living as a student. One artist stood out in particular and I listened to the three songs of that artist for a few years before I lost those tracks. I was hoping to find those songs here but I could not.

    However, searching through the CSV, made me think so hard of one of those songs and I suddenly remembered a few lines and I googled it. I could not find it still but then there was this one reddit thread where someone had been looking for that exact song and I found it there. That artist was Dana Mase and the song was “Dandelions”. There were two more songs of hers and I found them too, however, they are attributed to an album that was released in 2004, which was much before I had found them on the mp3 dot com CD (which I found in that used book store, in 2001).

    You’ve done some incredible work. This is what makes the internet beautiful.
