![]() |
Music Database Design | |
| Home « Music « Music Database Design | ||
During an unemployed few months in 1995 I spent several weeks of spare time performing data entry of the titles and tracks of most of the recorded music at Nancy Street into an Access database. Due to lack of time to create a beautifully designed database with a friendly GUI, the database was mainly 2 huge flat tables, one for volumes (albums) and one for tracks. There was a lot of duplicated data, but it was the quickest way at the time of performing bulk data entry.
That original rather flat database remained unchanged for many years, but the time finally arrived where it needed to be converted into a more elegant structure so that complex queries could be performed. In the old flat structure it was impossible to perform quite reasonable queries like these:
For years I puzzled in my spare time about how to reorganise my old database into a set of tables so that it was perfectly reduced to its elements in such as way that any sensible query could be performed upon it. In the database world, this reduction is called database normalisation, and it's such a fundamental skill that countless papers, books, magazine articles and web pages are devoted to the topic.
See Database Normalisation Basics for a very simple explanation of normalisation. Run a web search for other more sophisticated articles. Normalising a database is a mandatory skill for all software developers.
Because of its nature, a music collection is not a trivial thing to normalise. Consider the following sample design problems:
These sorts of problems resisted my casual analysis for many years, but finally in 2002 I decided it was time to stretch my brain as a serious self-training exercise and redesign the music collection database in an elegant normalised form. After several experiments and much scratching of pens on paper scraps late into the night the relationships evolved as shown in the following table (click to pop-up the full sized image).
![]() Click to Enlarge (876 x 635) |
This table relationship diagram shows how most of the design problems mentioned above are overcome. I invented a thing called a volume, which is a single item of music or a closely related set of musical items. For example, the vinyl LP of Abbey Road by the Beatles is a single volume, and so is their White Album a single volume even though it contains two discs. The number of discs in a volume isn't important, as any track can identified by its track number and disc sequence within the volume. The expression volume is only used internally and is not seen by users, who will see the more familiar words album and disc used instead.
The original database was created in Microsoft Access and later converted to SQL Server Express. The SQL Server diagram is too large to display, so I attached the important tables in Access and took a screen-shot of the Relationships window. The SQL Sever database actually contains a dozen more tables related to books, authors and pictures, but they are unrealted to this discussion of music and have been omitted.
The many-to-many relationships between albums and genres, artist names and covers is solved via join tables. You can see how the tables GenresJoin, NamesJoin and CoversJoin allow one volume to belong to multiple genres, to have multiple names (people) associated with it and to have multiple cover images. Each name associated with a volume also has an entry in the Roles table, this allows a volume to have the same name joined multiple times with different roles (Composer and Performer for example).
The VolumeInstance table allows us to have multiple copies of a volume. Each row in the VolumesInstance table represents a physical copy of a volume, and tells us who owns it, its cost and date of purchase. For example, if you look at the LP Library page and scroll down to Dire Straits you will see that the album Communiqué has 3 different owners, so we have 3 copies of the album, each one is an instance of the volume (album).
The Media, Tracks and Files tables are simple one-to-many links that tell us the physical media type and the tracks and files on a volume respectively. A "track" refers to an audio or video track, a "file" refers to a data file on a data CD or DVD..
Conversion
Converting the old flat style tables to the new structure required a mixture of programming and manual keyboard work. It was most entertaining to write my first Visual Basic code for a couple of years to manipulate the old Access tables using ADODB and delicately tear them apart and reconstruct them in the new form. No GUI or middle-tier existed over the new database at that time, so I spent several hours spread out over a few nights working through the sorted tables in Access 2002 inventing and assigning genres, names and cover images to volumes.
Hungarian Naming
One night I opened up Access and discovered that most of my queries were generating Recordset open errors with code 0x80004005. This was a surprise, as the night before everything was fine. After running hundreds of tests to eliminate theories about nulls, locked files, TEMP work spaces, permissions and other things mentioned in web sites I found the problem was possibly related the name of one of my table columns. In the Volumes table I had a field called Catalog, experimentally changing this to Catalogx fixed the problem. Therefore, you should never risk using simple words as table or field names in a database.
To avoid this problem in the future, I adopted the hungarian naming convention for table fields and renamed the whole database contents. You can see that Volume table fields are named V_xxx, Cover table fields are named COV_xxx, etc. For years I thought this was ugly, but I was finally converted after I saw it in action in a vary large database project. It does look unusual at first and it generates slightly strange looking code, but I feel the benefits of being able to instantly recognise any field outweigh any other perceived problems. You should indicate join fields by naming them like AA_BB_ID where AA is the owning table and BB is the foreign key table.
A
Compromise
Theoretically, every track on a volume can potentially have multiple genres and names associated with it. Imagine a record with some tracks being classical, some comedy and some jazz, and each track having different sets of performers and composers. In this case every track would need to link through joins to the other tables. I decided for practical reasons to push these potential relationships back up a level from the track to the volume that owns it. In practice, this means for example that you can't specifically list every track that Bing Crosby performs on, but you can list every volume (album) that he performs on, even though you're not explicitly told which tracks are his. This compromise was made for purely practical reasons, as the data entry of links from every track would have been incredibly time consuming, even with some automation.
.NET Notes
I have recently been working on a commercial product that has about 200 SQL Server tables with a complicated and deep structure of relationships. Manually writing database code for so many tables would be a fragile and incredibly taxing task, so I should mention to newcomers of .NET programming that the CodeSmith and netTiers tools are invaluable help in such a situation.
CodeSmith is a tool that generates code from XML templates. CodeSmith inspects the structure of a SQL Server database and generates classes based upon the names, keys, indexes and stored procedures within the database. Many templates are available to generate database layer code with various levels of sophistication. CodeSmith 2.6 is still free and buried somewhere inside the web site as a download. You have to pay for the latest versions, and they're piracy protected via an activation system.
netTiers is a complex set of CodeSmith templates that generate a comprehensive data tier project over a database. Reading the list of Core Features on the netTiers web site will give you an idea of how sophisticated it is, but beware that novice programmers may be bewildered at first by the complexity of the generated code, its dependency upon the Microsoft Application Blocks and the large number of generated components. However, if you're dealing with a very complex database, then the results will be worth the effort of overcoming the learning curve.
Since mid-2005 when I converted all of my databases from Access to SQL Server, I have refactored my projects to use netTiers and CodeSmith 2.6. Countless thousands of lines of old manually written code have been deleted and replaced by generated classes which are reliable and powerful. One of the most impressive methods of the table-wrapper classes is DeepLoad, which can walk down the tree of joins and load all child objects to any depth required. Many competitive products to CodeSmith are becoming available, so my comments here might be a bit out of date and I recommend you do some research before chosing the tool that might suit you.
Every few weeks, a utility program extracts a snapshot of the database and generates the following static web pages:
Downloads
| opus_sql2005_backup.zip (4.8MB) - A zip of a SQL Server 2005 backup of the Opus music database. The unzipped size is about 25MB. |