The music classifying nightmare

98 points by ux over 12 years ago

19 comments

redbad over 12 years ago
You're making the problem a lot harder than it needs, or ought, to be. The most telling example, IMO, is the Aphex Twin one...

> Oh, and his real name is Richard David James. What are you
> supposed to use for the file system directories and files
> name? His name? The most common nickname? Both? One file
> system solution is to have symbolic links (do you link
> Richard David James to Aphex Twin, or vice versa?). For
> tags, if you don't want to lose information, this is
> another story...

You should, obviously, not attempt to "link" AFX to Aphex Twin to GAK to Blue Calx and so on. The artist behind all of the monikers has made a deliberate decision to release work under different names. Organize accordingly.

Many of your nightmare scenarios appear to be a result of the same kind of over-thinking, or invention of nonsensical requirements. How are you supposed to deal with Japanese artist names? It literally doesn't matter: pick a scheme you can understand, and be consistent. How are you supposed to deal with multiple artists? List them, separated by commas, in the artist field. If they appear on an album released by a different group or person, use the "album artist" ID3 field. And since (if?) you use the ID3 fields to store your metadata, and presumably navigate your collection through an interface over that metadata, all of your questions regarding how to store files physically on disk are totally irrelevant, as long as you pick some scheme which doesn't generate conflicts. The default iTunes structure (Artist/Album/01 - Song Title.mp3) seems to work fine, for example.
saucerful over 12 years ago
Have you tried Quodlibet? https://code.google.com/p/quodlibet/

It comes with a MusicBrainz plugin which allows you to select an album in your library and search for it on MusicBrainz (e.g. by artist name and year or number of tracks, so you really don't need to have much info). Then you can select a match (there are usually many releases of a given album and it will actually differentiate between them) and tag your album with the MusicBrainz tags.

In fact you don't even need to use Quodlibet to get this feature. It has a separate tagging component called "ex falso" which you can run standalone and then use the player of your choice.

But I would strongly recommend Quodlibet for its organizational capabilities as well. For example it uses special internal tags (not ID3 but stored in a separate db) that allow you to associate "people" and "performers" with a track so that the track will appear when you search for any of those people.

Also there are sort tags that allow you to customize where stuff shows up, e.g. I can have a track with Artist tag "London Symphony Orchestra", composer tag "Ludwig van Beethoven" (the proper ID3 tags) BUT I give it the artistsort tag "Beethoven" so it shows up under B. Perfect!

See https://code.google.com/p/quodlibet/wiki/AudioTags for more info.

Lastly it has regex search! And you can make "saved searches", i.e. playlists. And it's lightweight and has an uncluttered (but highly customizable) interface. And it's very easy to write plugins for it.
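A minimal sketch of the sort-tag idea described in the comment above: order tracks by an `artistsort` value when one is present and fall back to the plain artist tag otherwise. This is not Quodlibet's code; the track dictionaries, the `sort_key` helper, and the sample titles are assumptions made up for illustration.

```python
# Hypothetical in-memory tag data; a real player would read these from files.
tracks = [
    {"title": "Boris the Spider", "artist": "The Who", "artistsort": "Who, The"},
    {"title": "Symphony No. 5", "artist": "London Symphony Orchestra",
     "artistsort": "Beethoven"},
    {"title": "Xtal", "artist": "Aphex Twin"},  # no sort tag: falls back to artist
]


def sort_key(track):
    """Prefer the artistsort tag, fall back to the artist tag, ignore case."""
    return track.get("artistsort", track["artist"]).casefold()


ordered = sorted(tracks, key=sort_key)
for track in ordered:
    print(sort_key(track), "|", track["artist"], "-", track["title"])
```

The displayed artist stays "London Symphony Orchestra" while the track files under B, which is the behaviour the comment describes.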
cletus over 12 years ago
I only go so far as sanitizing and standardizing my music collection through Tag&Rename (and I haven't found a good OS X equivalent to this yet, sadly). It gets the data from Amazon in 98% of cases, adds the album art (which I like having on my player), etc. Then I store the files in:

Artist/Album/Track# - Song name [- Artist name]

The last part is only there for soundtracks and other "Various Artists" type collections.

This is Good Enough [tm] for me. I can sync this across hard drives (backup), minimize duplicates (although I end up with these through compilations of various sorts), etc.

Unfortunately the ID3 tag system is All Wrong [tm] for this in many of the ways stated (in this post and elsewhere). For example:

- Albums don't really have an artist; songs do;

- Programs for automating this that get info from Amazon and elsewhere tend to use the year the particular CD was released rather than when it was originally released, which is far more useful and relevant (e.g. if you want the Beatles' White Album you don't care that the CD was released in 1998; it should come up under 1968);

- Albums don't really have years either. Or at least they have publication years. The songs have years. Normal studio albums have a common year. Compilations and soundtracks do not;

- Genres are coarse-grained, arbitrary and (IMHO) mostly useless;

- What I like is greatly influenced by the circumstances around the song, sometimes more than the song itself. I might like a song because it reminds me of a particular person, place or event. Or even mood. Sometimes it's the lyrics. Sometimes it's the sound. No recommendation engine is going to capture this sort of angle.

This goes beyond music: people just aren't interested in classifying, well, pretty much anything. Playlists seem to be about as far as most people are willing to go. Playlists are a fairly convenient way of coming up with event-specific music, e.g. for working out, for relaxing, for dancing, for a party, etc.

Efforts at far stronger and more accurate metadata, classification and organization speak more about one's fastidiousness, even anal-retentiveness, than any real need or better outcome (IMHO). It's just rabbit-holing really.
nwienert over 12 years ago
Somewhat related, but as I've been building a Rails project I've been meaning to open source the song parser I've been building alongside it. It scans an mp3 and pulls out the artists along with the type of role they played on the song. Here's a quick gist I pulled from my model:

https://gist.github.com/3680949

Some examples:

Drake - The Motto (Jon Bellion Cover)
=> [["Jon Bellion", :cover], ["Drake", :original]]

David Byrne and Brian Eno - Strange Overtones
=> [["David Byrne", :original], ["Brian Eno", :original]]

Cheri Coke, MELO-X - Free
=> [["Cheri Coke", :original], ["MELO-X", :original]]

Avicii - Street Dancer (Whelan & Discala Remix)
=> [["Whelan", :remixer], ["Discala", :remixer], ["Avicii", :original]]

RAC - Hollywood featuring Penguin Prison (The Magician Remix)
=> [["Penguin Prison", :featured], ["The Magician", :remixer], ["RAC", :original]]

And a ridiculous example:

Eight, Nine & Ten (Eleven cover - Song name feat. One, Two & Three (Prod. by Four) (Five & Six remix) (Seven cover)
=> [["One", :featured], ["Three", :featured], ["Two", :featured], ["Four", :producer], ["Five", :remixer], ["Six", :remixer], ["Seven", :cover], ["Eight", :original], ["Nine", :original], ["Ten", :original], ["Eleven", :cover]]

If there's any interest, I'd love to turn it into a proper GitHub repo and accept some pull requests. It's far from perfect (both code-wise and generally) but works well for most cases.
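For readers who want to experiment with the idea, here is a rough Python sketch of this kind of role extraction. It is not the code from the linked gist: the function names, regular expressions, and role strings are assumptions, and it only handles simple "Artist - Title (credit)" strings like the first five examples above. It does not attempt the deliberately ridiculous nested example, and it would wrongly split band names that contain "and" or "&".

```python
import re

# Hypothetical patterns for credits found inside parentheses.
PAREN_ROLES = [
    (re.compile(r"^(?P<names>.+?)\s+remix$", re.I), "remixer"),
    (re.compile(r"^(?P<names>.+?)\s+cover$", re.I), "cover"),
    (re.compile(r"^prod(?:uced)?\.?\s+by\s+(?P<names>.+)$", re.I), "producer"),
]
# Separators between several names: comma, ampersand, or the word "and".
SPLIT_NAMES = re.compile(r"\s*(?:,|&|\band\b)\s*", re.I)
# "feat.", "ft." or "featuring" followed by names at the end of the title.
FEATURED = re.compile(r"\s+(?:feat\.?|ft\.?|featuring)\s+(?P<names>.+)$", re.I)


def split_names(text):
    """Split 'A, B & C' style text into a list of individual names."""
    return [name.strip() for name in SPLIT_NAMES.split(text) if name.strip()]


def parse_credits(track):
    """Return (name, role) pairs for an 'Artist - Title (credit)' string."""
    artist_part, _, title_part = track.partition(" - ")
    credits = []

    # Featured artists are looked up in the title with parentheses removed.
    bare_title = re.sub(r"\s*\([^()]*\)", "", title_part)
    featured = FEATURED.search(bare_title)
    if featured:
        credits += [(n, "featured") for n in split_names(featured.group("names"))]

    # Each parenthesised group is matched against the known role patterns.
    for group in re.findall(r"\(([^()]*)\)", title_part):
        for pattern, role in PAREN_ROLES:
            match = pattern.match(group.strip())
            if match:
                credits += [(n, role) for n in split_names(match.group("names"))]
                break

    # Everything before " - " is treated as the original artist(s).
    credits += [(n, "original") for n in split_names(artist_part)]
    return credits


print(parse_credits("Avicii - Street Dancer (Whelan & Discala Remix)"))
```

A real parser would need a list of known artist names (or a metadata lookup) to avoid splitting names on "and", which is presumably part of why the comment calls its own version far from perfect.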
dj2stein9 over 12 years ago
Use hierarchical directories. They work. I sort my MP3s into two distinct music collections:

    /Albums/
    /Singles/

Then I sort by genre:

    /Albums/%GENRE
    /Singles/%GENRE

In Albums, I then sort by artist, then by album:

    /Albums/%GENRE/%ARTIST/%ALBUM/# - %SONG.mp3

Whereas in Singles, I organize by:

    /Singles/%GENRE/%ARTIST - %SONG.mp3

This system scales very well. I have a 100GB collection and can nail down any song or album in my collection in a few clicks.
przemoc over 12 years ago
What I lack (and I doubt I'm the only one) is a well-thought-out tag system and more advanced players.

All textual entries (title, album, author, ...) should be stored in the original language using the original alphabet. The player could transliterate them if the user doesn't know the alphabet (e.g. romanizing hiragana, katakana and kanji using the Hepburn system for Japanese music). Such an entry should also be able to store translated text, usually at least an English version for non-English material.

That would also solve another problem the blog post author mentions: the first name and last name ordering issue. Quoting Wikipedia:

"In Hungary, along with China, Korea, Japan and in many other East Asian countries, the family name is placed before a person's given name."

Thus in the original language the order would be the original one, but in the English version it would be Western-style, i.e. placing the last name after the first name (and of course already transliterated).

It would be up to the user to choose what the player should show her or him: original names, transliterated names, or translated names.

But AFAIK ID3v2 and Vorbis don't support such stuff (well, you can try going with custom keys, but non-standard means mess) and I haven't heard of a player that would do any transliteration either.

---

As for filenames, I think the best choice is the Latin alphabet, with the most widely used romanizations of non-Latin alphabets and simplifications of extended Latin alphabets (like removing diacritical marks etc.). Clean visible ASCII!

I know that Asian people would mostly disagree with such a file naming rule. :)
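The "simplification of extended Latin alphabets" part of that filename rule can be sketched with Python's standard `unicodedata` module. This is only an illustration under stated assumptions: the helper name `ascii_filename` and the choice of `_` as a placeholder are made up here, and the sketch does no romanization at all, so non-Latin scripts simply become placeholders. Real Hepburn-style transliteration would need a dedicated library or dictionary.

```python
import unicodedata


def ascii_filename(name):
    """Strip diacritical marks and replace any remaining non-ASCII character."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. kanji) becomes an underscore.
    return "".join(ch if ord(ch) < 128 else "_" for ch in stripped)


print(ascii_filename("Björk"))      # accented Latin letters lose their marks
print(ascii_filename("Sigur Rós"))
print(ascii_filename("坂本龍一"))    # no romanization here, only placeholders
```

The last call shows the limit of the approach: without a romanization step the name is unrecoverable from the filename, so the original text still has to live in the tags.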
andrewcooke over 12 years ago
doesn't the musicbrainz schema cover most/some of this? http://musicbrainz.org/doc/MusicBrainz_Database/Schema

also, given the ubiquity of UTF-8, why the need for ASCII?
webjunkie over 12 years ago
The single most annoying thing I encountered was the inability of ID3 to handle multiple albums. Every artist sooner or later releases the exact same piece of music on another album. WHY DIDN'T THEY THINK OF THAT?
buro9 over 12 years ago
There seems to be confusion between the filesystem and the metadata.

The filesystem is for storage. It's only important to be able to group tracks together in small batches (releases: albums/singles/EPs) to be able to manage the files.

The metadata is for searching, grouping and locating in your music player.

With that in mind, a lot of the problems he's cited vanish.

I have ~84,000 tracks from over 6,000 albums. The result of a 5 year ripping spree after a decade working in the music industry.

Every track, with no exception, has been scanned by MusicBrainz Picard, had its PUID generated and metadata normalised.

I've allowed my definition of genre to be influenced by general opinion. I simply learned how the masses tag things.

The result is that I can find everything in my interface (Squeezebox) within seconds.

The file storage I only need to care about for management of the files, and copying to my portable player or deleting old releases (which I do rarely, but it does happen).

The file storage starts with file type:

MP3|FLAC|FLAC_9624

Below those are folders for high level type of content:

Artists|Classical|Spoken Word|Compilations|Soundtracks

Within Artists I use:

Artist/Release Name[ - Catalogue Number]/[Volume/][Media/]Track Number - Track Name

I only fill in the catalogue number if I have two versions of the same titled release, e.g. I have a couple of versions of Quadrophenia by The Who.

I only fill out Volume if this is a multi-volume release.

I only fill out Media if this is a release that spans multiple CDs or DVDs.

So the short version of that might be:

FLAC/Artists/The Who/A Quick One/02 - Boris the Spider.flac

And a long version might look like:

FLAC/Artists/The Who/Quadrophenia - Polydor 2777840/CD2/01 - 5.15.flac

I have no problem at all storing and finding tracks, and I've no problem at all searching for tracks.

One of the good things about Squeezebox is that when you search it searches all metadata and the full file path. So a search for "quadro poly" would turn up the Polydor version of Quadrophenia.
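The path scheme in this comment, with its optional catalogue number, volume, and media parts, can be sketched as a small function. This is an illustration, not buro9's tooling: the function name, its parameters, the two-digit track padding, and the rule that the file extension is the lowercased text before any underscore in the file type (so "FLAC_9624" gives ".flac") are all assumptions.

```python
from pathlib import PurePosixPath


def release_path(file_type, artist, release, track_no, track_name,
                 catalogue=None, volume=None, media=None, category="Artists"):
    """Build FileType/Category/Artist/Release[ - Catalogue]/[Volume/][Media/]NN - Name."""
    # The catalogue number is appended only when given (to tell versions apart).
    release_dir = release if catalogue is None else f"{release} - {catalogue}"
    parts = [file_type, category, artist, release_dir]
    # Volume and media folders appear only for multi-volume or multi-disc releases.
    parts += [part for part in (volume, media) if part is not None]
    # Assumed rule: "FLAC_9624" -> "flac", "MP3" -> "mp3".
    extension = file_type.split("_")[0].lower()
    parts.append(f"{track_no:02d} - {track_name}.{extension}")
    return PurePosixPath(*parts)


print(release_path("FLAC", "The Who", "A Quick One", 2, "Boris the Spider"))
print(release_path("FLAC", "The Who", "Quadrophenia", 1, "5.15",
                   catalogue="Polydor 2777840", media="CD2"))
```

The two calls reproduce the short and long example paths given in the comment. Names that contain a slash would need escaping first, which this sketch leaves out.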
mjw over 12 years ago
Having worked with it for a few years, modelling music metadata is indeed an absolute nightmare.

There are some efforts to standardise this stuff though, see http://www.ddex.net which a lot of the digital supply chain is starting to adopt. It's something of a set of scary great kitchen-sink XML schemas (schemata?) but might be of interest to those who get massively nerdy about this sort of thing.
Figs over 12 years ago
This sounds like exactly the kind of problem that relational databases were designed to solve. You can organize it with an entity-relationship model fairly easily. Once you have stored your information in a database, the filename doesn't really matter as long as each mp3, ogg, etc. gets a unique name; you can look up the file by querying the database for files that have the properties you care about.
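A small sketch of what such an entity-relationship model could look like, using Python's built-in `sqlite3` module. The table and column names, the sample rows, and the `paths_for` helper are assumptions chosen for illustration (the sample track borrows one of the examples from elsewhere in this thread); a real schema such as MusicBrainz's is far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE track  (id INTEGER PRIMARY KEY,
                     title TEXT NOT NULL,
                     path  TEXT NOT NULL UNIQUE);  -- filename only has to be unique
CREATE TABLE credit (                              -- many-to-many link with a role
    track_id  INTEGER NOT NULL REFERENCES track(id),
    artist_id INTEGER NOT NULL REFERENCES artist(id),
    role      TEXT NOT NULL,
    PRIMARY KEY (track_id, artist_id, role)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO artist VALUES (?, ?)",
                 [(1, "Avicii"), (2, "Whelan"), (3, "Discala")])
conn.execute("INSERT INTO track VALUES (?, ?, ?)",
             (1, "Street Dancer (Whelan & Discala Remix)", "files/0001.mp3"))
conn.executemany("INSERT INTO credit VALUES (?, ?, ?)",
                 [(1, 1, "original"), (1, 2, "remixer"), (1, 3, "remixer")])


def paths_for(artist_name):
    """Return the file paths of every track the named artist is credited on."""
    rows = conn.execute(
        """SELECT DISTINCT track.path
           FROM track
           JOIN credit ON credit.track_id = track.id
           JOIN artist ON artist.id = credit.artist_id
           WHERE artist.name = ?
           ORDER BY track.path""",
        (artist_name,),
    )
    return [path for (path,) in rows]


print(paths_for("Whelan"))
```

Because the credit table carries the role, the same track is found whether you search for the original artist or a remixer, and the file itself can keep a meaningless but unique name.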
detox over 12 years ago
Oh god, I tried renaming files on my own (and I'm kind of a beginner programmer). Then I found out about Unicode problems, and then came two problems.

Problem #1 was that I apparently set the encoding wrong while rewriting the ID3 tags for the music files, so the foreign languages turned into question marks. I thought, scratch that, time to use someone else's tool. foobar2000 solved everything I had a problem with except problem #2.

For problem #2, I have no idea if it'll ever be solved. The question marks or blank boxes ending up on my computer have to do with the encoding for the OS itself (or something). And even with multi-language support, Windows doesn't let me fix that. There's obviously something wrong when my blank box issue disappears when I restart my computer from time to time...
teyc over 12 years ago
Rob Pike said it best when he quoted his friend:

    My late friend Alain Fournier once told me that he considered the
    lowest form of academic work to be taxonomy. And you know what? Type
    hierarchies are just taxonomy. You need to decide what piece goes in
    what box, every type's parent, whether A inherits from B or B from A.
    Is a sortable array an array that sorts or a sorter represented by an
    array? If you believe that types address all design issues you must
    make that decision.

Reference: http://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html
samps over 12 years ago
It looks like the author has come to an understanding with his/her disorganized music collection, but beets makes it pretty easy to bring sanity back to a messy music library: http://beets.radbox.org
niklaslogren over 12 years ago
I feel your pain. I have also stopped caring about music classifying, for much the same reasons you listed. Nowadays I use only Spotify, and I trust in their tagging abilities, and in Last.FM's auto-correcting ability.

My main concern when I used foobar was how to handle multiple artists on the same album, which always resulted in the album being split up when displayed in a list. I was unbelievably happy when I discovered the "Album artist" tag, which unites all songs in an album under the same banner, while still preserving (and scrobbling) the original artist name.
ANTSANTS over 12 years ago
I don't think there is a perfect way to organize a music collection in a hierarchical manner, so I don't even bother. Good tag metadata and foobar's search do all the work for me.

To me, the purpose of a filesystem is not to implement fine-grained categorization, but to provide basic grouping of related files so that I can easily operate on them all at once. To this end, my music collection mostly consists of one folder per album in a root music directory. Folders are usually named "Artist/Group Name - Album Title". That naming scheme doesn't always fit (albums featuring various artists, soundtracks in which I'm more likely to care about the title of the work rather than the artist that composed it, etc.), but I don't try to separate soundtracks from regular albums or anything like that; I just throw them in the same root directory. With this scheme, it's easy to delete/share/transcode an album when it's contained within a single directory, it's convenient for people I share with, and I don't waste any time obsessing over something that I rarely need to see.

Some people have advocated a more database or metadata-oriented approach where you strip all metadata from the filename and folder hierarchy and stuff all your files in one directory. It's an interesting idea, for sure, more closely resembling the way web services like YouTube store their content. It makes one begin to imagine a desktop operating system that featured a metadata database as the primary filesystem organization scheme in place of the traditional hierarchical filesystem.

With our currently available tools, however, having some kind of useful metadata in the filename and/or filesystem hierarchy, even if it is redundant, is incredibly useful when performing manual file manipulation, especially the aforementioned sharing of files. You'd need ubiquitous categorization metadata in files (that is, not just ID3 and company for music files) and ubiquitous support for parsing this metadata in everyday applications before we could ever entirely transition from having meaningful filenames to having meaningless hashes, timestamps, or garbage as the primary identifiers of files. That is to say, when beginning a download of a song or a document, your web browser would show you the relevant metadata and hide the filename, if it exists; when opening a file, one would have to be greeted with a search box instead of a traditional hierarchy dialog.
michaelhoffman over 12 years ago
Some of these problems (like creators with the same name) were solved by librarians years ago.
rishonik over 12 years ago
I have two classifications for my digital collection: Old Music and New Music. Old Music is everything recorded before I was born. New Music is everything recorded after I was born. It cuts down on putting too much time into it all.
Zakharov over 12 years ago
Another annoyance the author didn't mention is that unzippers frequently mangle the filename and/or metadata if it uses Unicode. Archive Manager is the worst at this.