Category: Music Taxonomy/Metadata attribution
Why is this of interest:
MusicBrainz is an example of a large community-based organization (a non-profit foundation) that depends on user submissions to develop a massive database. It provides an interesting juxtaposition in two directions: with the CDDB/Gracenote database, to whose commercialization it was a response, and with the more dynamic relational databases of ‘entertainment’ and ‘social’ music sites such as last.fm and Pandora (simply in terms of how each populates a database). It is also interesting in its use of ‘audio fingerprinting’ to match media with user-edited metadata. There is an interesting play between automated/autonomous submission and edited submission running through all these music taxonomy models that I think is worth pursuing in any attempt to build a dynamic and recursively populating system.
MusicBrainz was initiated in response to the commercialization of CDDB, the CD metadata database that was built rather haphazardly to meet the need to identify music CDs and supply information about their contents to CD-player applications. CDDB began with a single developer, Ti Kan, in 1996, who manually managed email submissions from contributors who assumed the information they contributed would remain freely available. This assumption was based on the database source code being licensed under the GNU General Public License. The developer of CDDB and his associates later sold the database to electronics manufacturer Escient. The database was commercialized and access sold under a commercial license (http://en.wikipedia.org/wiki/CDDB). CDDB became the ubiquitous Gracenote (http://www.gracenote.com/).
In the outcry that followed the commercialization of CDDB, a number of determinedly open-source and community-based database systems were developed. ‘FreeDB’ remains the DB of choice for commercial developers and is considered a ‘clone’ of CDDB in terms of functionality (http://www.freedb.org). FreeDB is also licensed under the GPL and is committed to remaining free, but this hardly seems assured given the CDDB/Gracenote commercialization, the exodus of the original FreeDB team from the project, and the subsequent sale to the commercial entity Magix. See forthcoming entries on CDDB/FreeDB for more information. Both FreeDB and Gracenote developed from user submissions. As Gracenote was commercialized, this aspect of the DB was curtailed. FreeDB submissions happen mainly through the front ends of the applications that users organize and annotate their own music libraries with.
Both projects use a ‘nearly’ unique identifier to identify CDs, and then tracks according to their published order. The identifier is automagically generated from information derived from the disc (song lengths and so on); this allows the disc to be identified and metadata to be associated with it. This is of interest because of the taxonomy for metadata attribution that develops out of this model. Songs are attributable only as parts of albums, according to the particular qualities of the Compact Disc on which they were distributed. No doubt both systems will need to evolve beyond this identifier.
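To make the ‘nearly’ unique part concrete, here is a minimal Python sketch of the disc ID calculation as I understand the classic CDDB/FreeDB scheme: a digit-sum checksum over the track start times, the total playing time in seconds, and the track count, packed into 32 bits. The function and variable names are my own, and the track offsets in the usage note are made up rather than read from a real disc, so treat this as an illustration and not a reference implementation.

```python
from typing import List

FRAMES_PER_SECOND = 75  # CD audio positions are counted in 1/75-second frames


def digit_sum(n: int) -> int:
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets: List[int], leadout_offset: int) -> str:
    """Return an 8-hex-digit disc ID.

    track_offsets  -- start of each track in frames (assumed to include
                      the usual 150-frame lead-in)
    leadout_offset -- start of the lead-out (end of the disc) in frames
    """
    # Checksum: digit sums of every track's start time in whole seconds.
    checksum = sum(digit_sum(off // FRAMES_PER_SECOND) for off in track_offsets)
    # Playing time: lead-out minus the start of the first track, in seconds.
    total_seconds = (leadout_offset // FRAMES_PER_SECOND
                     - track_offsets[0] // FRAMES_PER_SECOND)
    disc_id = ((checksum % 0xFF) << 24) | (total_seconds << 8) | len(track_offsets)
    return f"{disc_id:08x}"
```

With made-up offsets, cddb_disc_id([150, 15000, 30000], 45000) and cddb_disc_id([150, 8250, 30000], 45000) both come out as 08025603, even though the second track starts at a different point. Two different discs can share an ID, and nothing in the ID says anything about an individual song outside its disc.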
MusicBrainz was another DB that developed out of the reaction to the CDDB commercialization. It is now organized as a US not-for-profit foundation (the MetaBrainz Foundation). MusicBrainz is not concerned with mirroring the functionality of CDDB. It uses ‘audio fingerprinting’ technologies that, in theory, allow individual tracks to be identified and associated with appropriate metadata according to their audio characteristics. This approach means that the perceived limits of the previously discussed projects, CDDB and FreeDB, are avoided by removing any reference to the original CD architecture. Instead of identifying tracks according to their position within a defined playlist, fingerprinting technology allows (in theory) the track to be identified by audio analysis. MusicBrainz uses two such technologies: Relatable’s TRM (‘TRM Recognizes Music’, http://www.relatable.com/tech/trm.html) and the MusicDNS system. MusicDNS is a proprietary system owned and operated by MusicIP (http://musicip.com) that assigns a PUID (Portable Unique Identifier) to a track according to the associations the fingerprinting algorithm provides. The MusicIP system is interesting for its end-user application, the MusicIP Mixer, a playlist generator that operates a little like an automated Pandora for those of us who have obscenely large music collections. I will review that application in a dedicated entry.
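The difference from a disc-based identifier is easier to see with a toy example. The Python sketch below is emphatically not TRM or MusicDNS, whose algorithms I am not reproducing here; it is a deliberately crude stand-in that I made up, which records whether the signal energy rises or falls from one frame to the next. Every name in it (toy_fingerprint, synth_track, the example metadata) is hypothetical. The only point is that the key is computed from the audio itself, so a track can be looked up without knowing which disc it came from or where it sat in the running order.

```python
import math
from typing import Dict, List

FRAME_SIZE = 100  # samples per analysis frame (an arbitrary choice for this toy)


def toy_fingerprint(samples: List[float], frame_size: int = FRAME_SIZE) -> str:
    """Return a bit string: '1' where frame energy rises, '0' otherwise."""
    energies = [
        sum(s * s for s in samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return "".join(
        "1" if later > earlier else "0"
        for earlier, later in zip(energies, energies[1:])
    )


def synth_track(frame_amplitudes: List[float],
                frame_size: int = FRAME_SIZE) -> List[float]:
    """Hypothetical stand-in for decoded audio: one sine burst per frame."""
    return [
        amp * math.sin(2 * math.pi * k / 10)
        for amp in frame_amplitudes
        for k in range(frame_size)
    ]


# A 'track' and a quieter copy of it (say, a rip at a lower volume).
track = synth_track([1.0, 3.0, 2.0, 4.0, 1.0])
quiet_copy = [0.5 * s for s in track]

# Metadata is keyed by the audio-derived fingerprint, not by disc or position.
metadata_by_fingerprint: Dict[str, Dict[str, str]] = {
    toy_fingerprint(track): {"artist": "Example Artist", "title": "Example Track"},
}
match = metadata_by_fingerprint.get(toy_fingerprint(quiet_copy))
```

Here the quieter copy yields the same fingerprint as the original, so the lookup finds the same metadata. A real system has to survive far more than a volume change (different encoders, bitrates, trimmed silence), which is where the proprietary algorithms earn their keep.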
MusicBrainz uses this technology to drive its Picard software, which is designed to let users assign metadata to the tracks in their music library automatically, repairing files with lost metadata. The MusicBrainz DB is a community-centered project that actively calls for participants to enter metadata and assess its validity. It is increasingly moving toward taxonomic categories of a more social dimension. These include categories for related artists and projects; this should be seen as a move to a more relational DB model focussing on music discovery as well as metadata attribution. The non-commercial aspect of MusicBrainz means that development is relatively slow, and so projects like Last.fm appear to be developing their database back-ends at a much faster pace, unhindered as they are by an archaic infrastructure and mission. There is perhaps more of a reward function in Last.fm’s interface and submission process: the payoff in social interaction, profile development and playlist functionality means that ‘scrobbling’, which is effectively giving your data away for a return of functionality (the quintessential Web 2.0 model), is a very fast and effective means of building a ‘generative’ relational system. That said, MusicBrainz is stuck between its history as an open ‘meta-database’ for cataloguing and attributing user-submitted data and any future development as a useful relational database. Audio fingerprinting does offer a very promising means of developing a powerful relational database with an end-user submission process that is potentially more automated than even Last.fm’s ‘scrobbling’ model. It is unlikely that a community-based project of MusicBrainz’s history and pedigree will be the source of such an innovation, though; it is simply beyond their current means or the developers’ focus.