Music Metadata: The Compilation Problem

The free metadata often has quality problems. That's annoying, but you can do a quick manual cleanup before saving. Sure, that's a pain, but it's a lot better than entering it all from scratch.

Even with cleaned up information, I encountered a couple of thorny metadata problems. I'll describe one of them—and explain how I solved it. Tomorrow I'll talk about a problem I haven't solved yet.

I have two requirements for my music metadata:

  • They must be consistent, and should be reasonably accurate.
  • They need to be contained completely in the music file.

On the first point, you may be surprised to hear that I care more about consistency than accuracy. That's because so long as things are consistent, they will be filed together, and that's what's important to me. Of course, the little anal organizer gerbil that runs around my brain will force me to fix any inaccuracies eventually, but I'd be happy to start from a place where everything is consistent.

The second point is a subtler one. I want all the metadata stored in MP3 tags or FLAC comments. That's the most obvious place to store metadata, but not the only place. Another common place is the filename itself. I don't like that—filenames are relatively fragile. For instance, the external hard drive that holds my MP3s has a directory called "R.E.M" without a trailing period, because the Microsoft VFAT filesystem won't allow that. I think that if a song filename becomes scrambled, I should be able to recreate the filename just by looking at the metainformation in the file.
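To make the fragility concrete, here is a minimal Python sketch. It assumes a filesystem that silently drops trailing periods and spaces from names, which matches the "R.E.M" directory described above; the function name vfat_like_name is made up for illustration and is not part of any real tool.

```python
def vfat_like_name(name):
    """Mimic a filesystem that drops trailing periods and spaces (an assumption)."""
    return name.rstrip(". ")

artist_tag = "R.E.M."                    # what the tag inside the file says
directory = vfat_like_name(artist_tag)   # what ends up on disk

# The directory name no longer matches the tag, so the artist name
# cannot be recovered reliably from the path alone.
print(directory)
print(directory == artist_tag)
```

Going from tag to filename loses information here, which is why the tags, not the path, have to be the authoritative copy.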

So, let's see how my filenames map to the metadata. Here is the naming scheme I want to use:

Artist/Year - Album_Name/Track_Num - Song_Name

As I noted in my previous entry, here is the metainformation I'm storing in each song file:

  • Album Name
  • Artist Name
  • Song Name
  • Album Year
  • Track Number

You may think that metainformation would be sufficient to generate a song filename, but it isn't.

The problem comes in with compilation albums, like this Warren Zevon tribute album. When I ripped the first song, Searching for a Heart performed by Don Henley, it got saved to a file:

Various Artists/2004 - Enjoy Every Sandwich/01 - Searching for a Heart.mp3

If, however, I were generating the filename strictly from the metadata, I would have saved it to:

Don Henley/2004 - Enjoy Every Sandwich/01 - Searching for a Heart.mp3
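A naive generator for that naming scheme might look like the following Python sketch. It is not the actual transcoding script; the Track class and the naive_path function are hypothetical names, and the fields are just the five pieces of metainformation listed earlier.

```python
from dataclasses import dataclass

@dataclass
class Track:
    # The five fields stored in each song file.
    artist: str
    album: str
    title: str
    year: int
    track_num: int

def naive_path(t, ext="mp3"):
    """Build Artist/Year - Album_Name/Track_Num - Song_Name from the tags alone."""
    return f"{t.artist}/{t.year} - {t.album}/{t.track_num:02d} - {t.title}.{ext}"

track = Track("Don Henley", "Enjoy Every Sandwich", "Searching for a Heart", 2004, 1)
print(naive_path(track))
# prints: Don Henley/2004 - Enjoy Every Sandwich/01 - Searching for a Heart.mp3
```

Nothing in those five fields says that the album is a compilation, so the function has no way to choose "Various Artists" as the top-level directory.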

I discovered this problem when I transcoded my music files from FLAC to MP3 format. The transcoding script named the files by the convention I just described. As a result, compilation albums ended up being scattered all over the disk, with each track filed under its artist.

You cannot look at the metainformation I listed above and determine that a track should be filed as a compilation instead of under the artist name. I researched both the FLAC documentation and the MP3 specification for a standards-based solution to the problem, but came up empty.

I did, however, find a non-standard solution to the problem. Apple iTunes has added a non-standard TCMP tag which is set to a "1" value for songs that are part of a compilation. When I was looking through the source code to Amarok, I noticed that it recognized a non-standard FLAC comment called COMPILATION. It appears that MusicBrainz uses these tags too.

So although there appears to be no standards-based solution to my problem, there is one that is supported by a number of leading platforms.
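With a compilation flag available, the filing decision becomes a one-line check. The Python sketch below assumes the tags have already been read into a plain dictionary; the lowercase key names and the library_path function are hypothetical, and only TCMP and its "1" value come from the description above.

```python
def library_path(tags, ext="mp3"):
    """Pick the top-level directory based on the TCMP compilation flag."""
    # A missing TCMP entry is treated as "not a compilation" (an assumption).
    is_compilation = tags.get("TCMP", "0") == "1"
    top = "Various Artists" if is_compilation else tags["artist"]
    return (f"{top}/{tags['year']} - {tags['album']}/"
            f"{int(tags['track']):02d} - {tags['title']}.{ext}")

tags = {
    "artist": "Don Henley",
    "album": "Enjoy Every Sandwich",
    "title": "Searching for a Heart",
    "year": "2004",
    "track": "1",
    "TCMP": "1",
}
print(library_path(tags))
# prints: Various Artists/2004 - Enjoy Every Sandwich/01 - Searching for a Heart.mp3
```

The performer stays in the artist field, so nothing is lost; the flag only changes where the file is filed.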

Setting these tags sounds easy, but it actually isn't. That's because the commonly used MP3 tagging tools (the id3lib library and the id3v2 program) have the list of known tags compiled in, and source code changes are required to add TCMP support. Fortunately, Andrew Barnert has produced a source code patch that does just this.

(I didn't need to do any code changes to add COMPILATION fields to my FLAC files. The Vorbis comments that FLAC uses are free-form.)
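The reason FLAC is easier is that a Vorbis comment block is just a list of NAME=value strings with case-insensitive field names, so a new field needs no special support. The Python sketch below models that list directly rather than touching a real file; the helper names are hypothetical, and using "1" as the COMPILATION value is an assumption that mirrors the TCMP convention.

```python
def set_comment(comments, name, value):
    """Return a new comment list with NAME=value replacing any existing NAME."""
    kept = [c for c in comments if c.partition("=")[0].upper() != name.upper()]
    return kept + [f"{name.upper()}={value}"]

def is_compilation(comments):
    """True if a COMPILATION field is present and set to "1" (assumed value)."""
    for c in comments:
        key, _, value = c.partition("=")
        if key.upper() == "COMPILATION":
            return value == "1"
    return False

comments = ["ARTIST=Don Henley", "ALBUM=Enjoy Every Sandwich", "TRACKNUMBER=1"]
marked = set_comment(comments, "compilation", "1")
print(marked[-1])              # prints: COMPILATION=1
print(is_compilation(marked))  # prints: True
```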

With this change applied, my problem was solved. All I had to do was go back and mark my compilation discs. It is unfortunate, however, that it required a source code change to make it work.

So, I'm marking this problem solved. That leaves just one remaining metadata problem. I'll write about that tomorrow.