Data Grooming – Case Study
Data Grooming is a key element of delivering a high-quality digital music library that’s a joy to use. So after we rip each CD we apply our magic to the metadata: the key terms used to show artist, track name, album name, composer and so on.
One of the most critical data items is the composer name, all the more so if you have a large number of classical albums. This was the case with a recent client, as this screenshot shows (you might need to click on the image to see it clearly).
We ripped 212 albums, the majority (201) in the Classical genre. As you can see, there are 216 named composers. In reality there are typically far fewer composers than the “Composer” field suggests, because the usual music databases don’t standardise on one name format. You get every permutation: first name / last name; last / first; initials / last; last / initials. Then there are the helpful educators who add things like the composer’s dates or who, in doubt about the best name format, type in every permutation.
Our approach is to standardise on using just the composer’s surname. So W Mozart, W A Mozart, Mozart W, Mozart WA, Wolfgang Mozart and so on all become simply “Mozart”. This naming convention is applied across all CDs, whatever the source of the data for each album, and we use our own music database of principal composers. Once we’ve applied our Data Grooming process this is what we reduced the library to.
You’ll see the previous 216 composers have been reduced to just 111 entries. Each composer appears just once, so if you’re looking for Haydn, Bach, Beethoven or Brahms all that composer’s music will appear against that (single) name. We believe this goes a very long way towards ensuring your music library is simplified, straightforward and a joy to use.
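For readers who like to see the idea in code, here is a minimal Python sketch of the surname-only convention described above. It is an illustration, not our actual process: the function names `groom_composer` and `count_distinct` are made up for this example, and the small `KNOWN_SURNAMES` set is a hypothetical stand-in for a real database of principal composers.

```python
import re

# Hypothetical stand-in for a database of principal composers.
KNOWN_SURNAMES = {"Mozart", "Haydn", "Bach", "Beethoven", "Brahms"}

# Case-insensitive lookup: "mozart" -> "Mozart".
_LOOKUP = {surname.lower(): surname for surname in KNOWN_SURNAMES}


def groom_composer(raw: str) -> str:
    """Return the bare surname if the field contains a known composer.

    Unknown names are returned with whitespace tidied but otherwise
    unchanged, so nothing is silently lost.
    """
    # Drop anything in parentheses, such as the composer's dates.
    cleaned = re.sub(r"\([^)]*\)", " ", raw)
    # Split into runs of letters; initials, commas and full stops fall away.
    tokens = re.findall(r"[^\W\d_]+", cleaned)
    for token in tokens:
        surname = _LOOKUP.get(token.lower())
        if surname is not None:
            return surname
    return " ".join(raw.split())


def count_distinct(composers) -> int:
    """Count distinct composer entries after grooming."""
    return len({groom_composer(name) for name in composers})


variants = ["W Mozart", "W A Mozart", "Mozart W", "Mozart WA",
            "Wolfgang Mozart", "Mozart, Wolfgang Amadeus (1756-1791)"]
print(count_distinct(variants))
```

Every variant in the list collapses to the single entry “Mozart”. Note that a rule this simple also merges composers who share a surname, and it only returns the first known surname it finds in a field, so a real library would still need a manual check.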
Even after we’ve “removed” 105 composer entries there’s some more tidying to be done. As you’ll see from the before and after images above, we needed to tidy up the genres – Books & Spoken and Spoken & Audio need to be looked into.
Thanks to the kind input of an eagle-eyed reader the error in the original post has been corrected and the “before” and “after” images are in the right places.
Data Grooming – creating an elegant digital music library from ripped CDs.