Digital Tinker

 
 
 
    Some days you get the data, some days the data gets you …
 
Data Cleanser Humble Origins
August 28th, 2008

Believe it or not, I started out in the sub-cellar of my old personal computer, cleaning up MP3 files that I had either downloaded or ripped from my CD collection. In the olden days, there was a clever website called CDDB.com (now gracenote.com) that kept a database of CDs keyed on the number of tracks and the length of each track. MusicMatch Jukebox, the software I was using then, would connect to CDDB and send it that information from a newly ripped CD. If CDDB found a match, it would return all of the data for the CD!

Once I learned the method that CDDB was using to store its information, I marveled at the elegance of the service.
First of all, this was very much like giving a CD its own fingerprint, since it is highly unlikely that any two CDs would have the exact same number of tracks, with the exact minutes and seconds for each track in the exact same order! Indeed, the only time CDDB returned multiple matches was when the CD was rereleased. (That was just my experience, though. I’m sure there were false positives.)

Secondly, the user community that updated the database was very altruistic. As far as I know, the only interaction was through MusicMatch Jukebox: when a CD match was not found, I would laboriously type in the song titles and submit them to the CDDB database.
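
To make the fingerprint idea concrete, here is a minimal Python sketch. It is an illustration only and not the real CDDB disc ID calculation: the function name `disc_fingerprint`, the use of SHA-1, and the track lengths are all assumptions made up for this example. It simply hashes the track count and the ordered track lengths, which is the property described above.

```python
import hashlib


def disc_fingerprint(track_seconds):
    """Return a short hex fingerprint built from a CD's ordered track lengths.

    Illustration only: this is NOT the actual CDDB algorithm.
    """
    # Combine the track count with each length, in order, e.g. "3|215|187|242"
    key = "|".join(str(n) for n in [len(track_seconds), *track_seconds])
    return hashlib.sha1(key.encode("ascii")).hexdigest()[:16]


album = [215, 187, 242]       # hypothetical track lengths, in seconds
same_album = [215, 187, 242]  # a second copy of the same CD
reordered = [187, 215, 242]   # same lengths, different order

print(disc_fingerprint(album) == disc_fingerprint(same_album))  # True
print(disc_fingerprint(album) == disc_fingerprint(reordered))   # False
```

Because the order of the tracks is part of the key, two discs match only when every track length lines up in the same position, which is why a rerelease of the same CD could plausibly come back as a second match.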


Read more about CDDB on Wikipedia.

During all of this activity, I learned about the extra data that gets stored along with the actual music in an MP3:

  • ID3v1 - the original MP3 tag stored a few items, such as title, artist, album, year and genre (the track number arrived with ID3v1.1)
  • ID3v2 - an enhanced version that allows more space for the ID3v1 items and adds more, such as lyrics and cover art
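
For the curious, an ID3v1 tag is a fixed 128-byte block at the very end of the file, starting with the letters "TAG". The sketch below reads one in Python; the function name `parse_id3v1` and the sample values are made up for illustration, and it deliberately ignores ID3v2, which is stored differently.

```python
def parse_id3v1(data):
    """Parse an ID3v1 tag from the last 128 bytes of MP3 data, or return None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    info = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre_id": tag[127],  # numeric index into the ID3v1 genre list
    }
    # ID3v1.1: a zero byte at offset 125 means byte 126 holds the track number
    if tag[125] == 0 and tag[126] != 0:
        info["track"] = tag[126]
    return info


# Hypothetical 128-byte tag built by hand for the example
sample = (
    b"TAG"
    + b"Song".ljust(30, b"\x00")
    + b"Band".ljust(30, b"\x00")
    + b"Album".ljust(30, b"\x00")
    + b"2008"
    + b"".ljust(28, b"\x00")   # empty comment
    + b"\x00" + bytes([7])      # ID3v1.1 track number
    + bytes([17])               # genre index
)
print(parse_id3v1(sample))
```

With a real file, the same function could be fed the bytes read from disk, for example the result of `open(path, "rb").read()`.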

As you might expect (or know from experience), downloaded music did not always have the correct tag information.
One of the benefits of using MusicMatch Jukebox to store my music was the ability to create a library and sort my music by ID3 tags.
Naturally, selecting a genre is subjective and I spent a lot of time changing ID3 tags.
MusicMatch Jukebox had a primitive ID3 tagging function that relied on the physical file name of the MP3. This became quite a chore, as I could never decide how best to name the files.

My first data cleansing project, therefore, was to buy a program called Dr.Tag. Since it was a dedicated application, it did a better job than the MusicMatch Jukebox software. After learning the nuances of renaming physical files based on ID3 tag information versus updating ID3 tags based on the file name, I was able to clean up my music library in a few hours.
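
The two directions mentioned above (file name from tags, tags from file name) can be sketched in a few lines of Python. This is not how Dr.Tag or MusicMatch Jukebox did it; the naming pattern "Artist - Album - 03 - Title.mp3" and both function names are assumptions chosen just for this example.

```python
import re

# Hypothetical naming pattern: "Artist - Album - 03 - Title.mp3"
PATTERN = re.compile(
    r"^(?P<artist>.+?) - (?P<album>.+?) - (?P<track>\d+) - (?P<title>.+)\.mp3$"
)


def tags_from_filename(filename):
    """Derive tag values from a file name, or return None if it does not fit."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    tags = match.groupdict()
    tags["track"] = int(tags["track"])
    return tags


def filename_from_tags(tags):
    """Build a file name from tag values, using the same pattern."""
    name = "{artist} - {album} - {track:02d} - {title}.mp3".format(**tags)
    # Replace characters that are not allowed in file names on common systems
    return re.sub(r'[\\/:*?"<>|]', "_", name)


tags = tags_from_filename("Some Band - Some Album - 03 - Some Song.mp3")
print(tags)
print(filename_from_tags(tags))
```

The catch, and the reason the choice of direction matters, is that whichever side you treat as the source of truth overwrites the other: a badly named file produces bad tags, and a bad tag produces a badly named file.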

Posted in Dirty Data

9 Responses to “Data Cleanser Humble Origins”

  1. organicsyes Says:

    Wow! A whole lotta cleanin’ going on! This is good news for me…now, to get the motivation to actually do it!

    Susan

  2. Teddy Towncrier Says:

    Mitch … I just couldn’t resist jumping in to congratulate you on your new blog.

    It looks absolutely fabulous.

    Best wishes for much success and I’m looking forward to many opportunities to quote you.

    Best regards.

    Teddy.

  3. digitaltinker Says:

    Hi Susan!
    It helps to play some music while you “clean”!
    Dr.Tag is all grown up now and I don’t know if that’s a good thing …
    Thanks for stopping by.

    Hi Teddy!
    Thank you very much!
    I loved this theme the moment I saw it.
    Thanks for “jumping” in!

    Cheers,

    Mitch

  4. Blake Raab Says:

    I like the look of your new blog. Simple and clean. Does this mean I should drop your other one linked on my blog?

    BTW, your “social media” buttons below the post appear to be in Italian.

  5. digitaltinker Says:

    Hi Blake,

    This is an additional blog, more focused on one of my business services.
    I’d like to keep my current blog on your social wall, as that reflects the more social side of things.

    I passed the button info along to the webmaster :)

    Cheers,

    Mitch

  6. Bobbi Jo Woods Says:

    Hi Mitch

    I like your new site and your blog! Welcome to the MicroWebblogs community!

    I’m in the network too! http://www.microwebblogs.com/bwoodsdesign

    I used to use MusicMatch Jukebox for a long time. It came free with one of my old mp3 players. I haven’t downloaded or listened to an mp3 in a long, long time. Just shows you how busy I am!

    Anyway… keep up the good work!

  7. digitaltinker Says:

    Hi Bobbi,

    I just came back from visiting your excellent blog and leaving my two cents :)
    Thank you for the warm welcome!

    All that work I did on my MP3s has paid huge dividends: I love to listen to music while designing programs. I haven’t downloaded music in ages, but I have over 3,000 tunes to keep me happy.

    Cheers,

    Mitch

  8. Bobbi Jo Woods Says:

    You mean to tell me that your first data cleansing project wasn’t to HIRE YOURSELF? LOL

    I heard good things about Dr. Tag too. Glad it helped you organize your tunes better. Me? If I had that many on my machine, it would slow to a crawl. I’m seriously due for an upgrade! I keep things to the minimum. Photoshop, FTP client, browser, files. That’s about it. Oh, and Dreamweaver, too, but I mostly use Notepad or Crimson Editor for coding when needed.

    Keep up the good blog… we want more!

  9. digitaltinker Says:

    Much obliged, Bobbi Jo!

    See the latest story about my own coding techniques.

    Cheers,

    Mitch