
Topic: Directory structure and file naming (Read 8485 times)

Directory structure and file naming
--------------------------------------------------
Background
--------------------------------------------------


Directory structures and file naming conventions have been discussed at least 3 times on these forums:
   http://www.hydrogenaudio.org/forums/lofive...php/t21847.html
   http://www.hydrogenaudio.org/forums/index....showtopic=20699
   http://www.hydrogenaudio.org/forums/index....showtopic=17258

I learned much from reading the posts above, but I have my own ideas too, so I would love to receive your feedback before ripping my CD collection.

(I will be doing this using EAC along with the latest LAME--3.97beta at the time of this writing--with these settings
   -V 0 --vbr-new --id3v2-only --pad-id3v2 --ta "%a" --tt "%t" --tl "%g" --ty "%y" --tn "%n" %s %d
which should yield archival quality mp3s.  Yeah, yeah, I know: probably could get away with -V 2 or -V 3, but this unassailable sound test
   http://www.hydrogenaudio.org/forums/index....showtopic=36465
found that -V 2 and -V 3 still suffer slightly, and I have a lot of classical music, and I only want to do the ripping once.  By the way, guruboolez, if you are reading this posting, how well does -V 0 sound for your classical music?  Any artifacts left at that point?  I was very impressed with your listening procedure.)

I have decided that I will put most of the metadata (artist, album, year, etc.) into the filename, even though some of it is redundant with the directory structure that I will store stuff in, as well as with the ID3v2 tags that I will use.  One reason is that if I copy the mp3s to a CD (e.g. to play on a portable player), I may need to put the mp3 files into a flat file structure, because the portable player may not understand how to drill down into directories and play the files.  (Does anyone know for sure how common a problem this is?  I recall having this problem a year ago with a player when I last tried burning an mp3 CD, but have no idea how common it is.)

Along with many others in the above postings, I think that a top-down hierarchical organization is the only way to go for information like this (e.g. so that the file system puts the files in usually correct sort order, at least for how I want things to appear).




--------------------------------------------------
Directory structure\file name proposals
--------------------------------------------------


The essential cases are:


1a) single artist album (most common case):

   artist\yyyy ~ album\artist ~ yyyy ~ album ~ nn ~ trackTitle.mp3


2a) multiple artist (i.e. compilation) album:

   _variousArtists\yyyy ~ album\yyyy ~ album ~ nn ~ artist ~ trackTitle.mp3


Here:
   --items before each backslash char ('\') are directory names (i.e. windows is assumed) and what comes after the last '\' is the file name

   --yyyy is the 4-digit year that the album came out

   --nn is just the track number for single disk albums.  However, if there are multidisks, then handle by inserting
      Disk dd
   before
      nn
   where dd is the disk number.  For example, case 1a) when there are multidisks becomes
      artist\yyyy ~ album\artist ~ yyyy ~ album ~ Disk dd ~ nn ~ trackTitle.mp3
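A minimal sketch of how cases 1a) and 2a) could be assembled in Python. The function names, the two-digit zero padding of nn and dd, and the sample values are assumptions for illustration only; nothing here is produced by EAC itself.

```python
from pathlib import PureWindowsPath

SEP = " ~ "  # the tilde separator proposed in this post


def track_number(nn, disk=None):
    """Return 'nn', or 'Disk dd ~ nn' for multidisk albums (padding is assumed)."""
    number = f"{nn:02d}"
    if disk is None:
        return number
    return f"Disk {disk:02d}{SEP}{number}"


def single_artist_path(artist, year, album, nn, title, disk=None):
    """Case 1a: artist\\yyyy ~ album\\artist ~ yyyy ~ album ~ nn ~ trackTitle.mp3"""
    album_dir = SEP.join([str(year), album])
    name = SEP.join([artist, str(year), album, track_number(nn, disk), title])
    return PureWindowsPath(artist, album_dir, name + ".mp3")


def various_artists_path(year, album, nn, artist, title, disk=None):
    """Case 2a: _variousArtists\\yyyy ~ album\\yyyy ~ album ~ nn ~ artist ~ trackTitle.mp3"""
    album_dir = SEP.join([str(year), album])
    name = SEP.join([str(year), album, track_number(nn, disk), artist, title])
    return PureWindowsPath("_variousArtists", album_dir, name + ".mp3")


if __name__ == "__main__":
    # Hypothetical metadata, only to show the resulting shapes.
    print(single_artist_path("Some Artist", 2005, "Some Album", 1, "Some Title"))
    print(various_artists_path(2005, "Some Album", 3, "Some Artist", "Some Title", disk=2))
```

The text fields are assumed to be free of characters that are illegal in Windows file names; that problem is separate from the layout shown here.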


Complications arise when you realize that "artist" can have multiple meanings.  Consider, for example, classical music in which composers and performers are usually distinct and both can be considered as the "artist".  To handle these subcases, I propose that "artist" be understood as the compound "composer ~ performer".  To make this explicit:


1b) single composer and single performer album (e.g. typical classical album):

   composer\yyyy ~ album\composer ~ performer ~ yyyy ~ album ~ nn ~ trackTitle.mp3


2b) multiple composers and/or multiple performers album (the nightmare case):

   composerPrimary\yyyy ~ album\yyyy ~ album ~ nn ~ composer ~ performer ~ trackTitle.mp3
or   
   _variousArtists\yyyy ~ album\yyyy ~ album ~ nn ~ composer ~ performer ~ trackTitle.mp3




--------------------------------------------------
EAC codes
--------------------------------------------------


The EAC filename codes for the above cases are:


1) single artist:

   %A ~ %Y ~ %C ~ %N ~ %T


2) multiple artist:

   %Y ~ %C ~ %N ~ %A ~ %T


where artist may need to be adjusted (e.g. in freedb) to be the compound "composer ~ performer" (see comments below).


Note: apparently, you need to manually type in the disk number for multidisk albums, as EAC appears to have no code for this; correct me if I am wrong.

Also, does anyone know how EAC's %A tag (which it describes as "CD or track artist") differs from the %D tag ("CD artist")?




--------------------------------------------------
Comments
--------------------------------------------------


1) every metadata item, without exception, is separated by a special char.  This is both for visual clarity and human readability, and it also makes it trivial to write a computer program to parse the info.

I choose the tilde ('~') as my separator char because:

   a) I have never seen it as part of normal text (e.g. part of a song title); this is absolutely critical in order to eliminate ambiguity (e.g. for the computer program mentioned above, as well as for a human looking at it); if you do not choose a char like this, then you need some sort of character escape mechanism to distinguish meta characters from text characters
   
   b) it is a valid file name char in all operating systems that I am aware of (please enlighten me otherwise)
   
   c) it visually looks good as a separator char

Note that the usual choice that people make for a separator char is the hyphen ('-').  This is something that people have just not thought thru: it violates condition a), which means that its use as a separator can be totally confused with its use inside text.  This gets even worse when you consider that a lot of text like a work's name contains colons (':') in it, but colons are illegal file name chars, so you need to replace them with something else, and a natural replacement char is a hyphen.  (See also comment 6a below.)
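To illustrate how simple the parsing becomes with a dedicated separator, here is a rough sketch that splits a case 1a) file name back into its fields. The function name and the returned dictionary keys are made up for this example, and it assumes that no field contains the substring " ~ ".

```python
SEP = " ~ "  # separator proposed in the post


def parse_single_artist_name(filename):
    """Split a case 1a) file name into its metadata fields."""
    stem = filename[:-4] if filename.lower().endswith(".mp3") else filename
    fields = stem.split(SEP)
    if len(fields) == 5:
        artist, year, album, nn, title = fields
        disk = None
    elif len(fields) == 6:
        # Multidisk form: the extra field is the "Disk dd" item.
        artist, year, album, disk, nn, title = fields
    else:
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {filename!r}")
    return {
        "artist": artist,
        "year": year,
        "album": album,
        "disk": disk,
        "track": nn,
        "title": title,
    }


if __name__ == "__main__":
    print(parse_single_artist_name("Some Artist ~ 2005 ~ Some Album ~ 03 ~ Some Title.mp3"))
```

By contrast, splitting a hyphen-separated name such as "Some-Artist - 2005 - Some Album" on a bare '-' yields four pieces instead of three, which is the ambiguity described above.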


2) I put a leading underscore char ('_') in front of the _variousArtists words since that will cause all such directories to be listed first, and then come the normal single artist files.  Why?  Because I think that it is strange to have all of the compilation albums in the middle of the V section with normal albums listed both before and after.  To my mind, they should all consistently either be listed first or all listed last.  If you have a suggestion for a better looking or more appropriate non-alphanumeric character which achieves this, please suggest it.
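One caveat worth checking is that the position of '_' depends on how the names are sorted. The sketch below uses hypothetical directory names and plain Python string comparison; it is not a model of how Windows Explorer orders entries, which may differ.

```python
# Hypothetical directory names, only to compare two sort rules.
names = ["Zappa", "_variousArtists", "abba", "Beatles"]

# Plain code-point order: '_' (95) falls after 'A'-'Z' (65-90)
# and before 'a'-'z' (97-122), so it does not come first here.
plain = sorted(names)

# Case-insensitive order: every letter is compared in lower case,
# so '_' now sorts ahead of all of them.
folded = sorted(names, key=str.casefold)

if __name__ == "__main__":
    print(plain)   # ['Beatles', 'Zappa', '_variousArtists', 'abba']
    print(folded)  # ['_variousArtists', 'abba', 'Beatles', 'Zappa']
```

So the underscore trick works with a case-insensitive listing, but a tool that sorts by raw character codes would put _variousArtists after any capitalized artist names.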


3) yes, in the future, I will probably have multiple genre directories to further sort things, each of which will contain subdirectories with the above form.


4) I am aware that modern versions of windows are limited to 256 chars in the names of directories and files, and that you can only have 256 levels of directories.  Intelligent operating systems, such as various flavors of unix, may have much higher limits.

But the real problem comes when you try to burn files to optical disks: the utter piece of shite ISO 9660 format (esp level 1, which imposes old dos 8.3 style names!) is hopeless; you have to use UDF (ISO 13346)
   http://en.wikipedia.org/wiki/Universal_Disk_Format
which apparently gives you 255 char long filenames and total path names up to 1023 chars
   http://msdn.microsoft.com/library/default....leio/fs/udf.asp
I think that the above format should almost always fit into UDF.

Does anyone have comments on the filename constraints of typical portable players like iPods and iRiver?
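A quick way to test whether a given path would fit is sketched below, using the two UDF figures quoted above (255 for a single name, 1023 for the whole path). It counts characters with len(); whether the real limits are measured in characters or bytes is not checked here, and the function name is hypothetical.

```python
from pathlib import PureWindowsPath

UDF_MAX_NAME = 255   # per-name limit as cited in the post
UDF_MAX_PATH = 1023  # total path limit as cited in the post


def udf_problems(path):
    """Return a list of human-readable problems; an empty list means it fits."""
    p = PureWindowsPath(path)
    problems = [
        f"name has {len(part)} chars (limit {UDF_MAX_NAME}): {part[:30]}..."
        for part in p.parts
        if len(part) > UDF_MAX_NAME
    ]
    total = len(str(p))
    if total > UDF_MAX_PATH:
        problems.append(f"path has {total} chars (limit {UDF_MAX_PATH})")
    return problems


if __name__ == "__main__":
    print(udf_problems("artist\\2005 ~ album\\artist ~ 2005 ~ album ~ 01 ~ title.mp3"))
```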


5) I have been toying with the idea of eliminating the distinction between cases 1 and 2 by always using the format
   yyyy ~ album ~ nn ~ artist ~ trackTitle.mp3
The problem is that I really want stuff to be sorted by artist for those (the majority) of cases where there is a single artist.


6) freedb, which is used by EAC for disk information, has a lot of issues:

   a) the text often contains characters that are illegal in filenames, especially colons (':') and slashes ('/').  I did a test just now and found that EAC replaces '/' with comma (',') and '\' & ':' with hyphens ('-').  Should I just accept the transforms that EAC will do, or should I change the text manually to a file name safe form and resubmit to freedb?
   
   b) a lot of the people who have submitted info to freedb seem not to have thought thru all the issues that I raised above, particularly in terms of how to handle compound artists (i.e. I have seen the artist field contain sometimes just the composer or sometimes just the performer).
   
   The closest to a discussion on the freedb site that I found concerning this is the freedb rules for compilation albums
      http://www.freedb.org/modules.php?name=Sec...le&artid=26#2-2
   which recommend using
      "artist / track-title"
   But no mention is made in that url on how to handle compound artists, not to mention their problematic recommendation to use the '/' char which cannot appear in file names.

   If this forum can come to a consensus as how to handle some of these issues, should we lobby freedb to include them in their guidelines?
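For anyone who would rather clean the text before it reaches EAC, here is a small sketch that mimics the three replacements observed in the test described in 6a) ('/' to ',' and both '\' and ':' to '-'). It is not EAC's actual code. How EAC treats the other characters that Windows reserves is not covered by that test, so the second helper only reports them rather than guessing a replacement.

```python
# The three substitutions observed in the test described in comment 6a).
EAC_OBSERVED = str.maketrans({"/": ",", "\\": "-", ":": "-"})

# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def sanitize(text):
    """Apply only the replacements that were actually observed."""
    return text.translate(EAC_OBSERVED)


def leftover_illegal(text):
    """List reserved characters that survive sanitize() and need manual attention."""
    return sorted(set(sanitize(text)) & WINDOWS_RESERVED)


if __name__ == "__main__":
    print(sanitize("Some Work: Part 1 / Part 2"))
    print(leftover_illegal("What? Why*"))
```

Note that a colon turned into a hyphen is exactly the kind of in-text hyphen that makes '-' unsafe as a field separator.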


7) this is a bit off topic, but why is the official lame site
   http://lame.sourceforge.net/
so out of date?  They list "Latest LAME release : v3.96.1 July 2004" when hydrogenaudio has proclaimed that 3.97beta is the latest recommended release.

Speaking of lame, if lame appreciation month
   http://www.hydrogenaudio.org/forums/index....showtopic=38190
is ever reopened and I am aware of it, then I would gladly donate some money (I failed to notice the last one in time).  To any lame developer reading this: thanks much, your efforts are very appreciated.

  • sTisTi
Reply #1
What you write about your tagging and naming structure makes sense to me.
However, i would strongly suggest to you to rip to a lossless format in addition to or instead of MP3. Ripping all your CDs will be a tedious task (I'm currently ripping my ~400 CD collection myself), and you will never, ever want to do it again, especially if you put a lot of energy and thought into file names, tagging etc. I can assure you that you will regret it in a few years time if you don't have those cool FLAC or WAVPACK files on your hard disk that you can transcode to whatever codec or bitrate that you (or your portable player) like by simply pressing a button, and your clean, well organized tags will be copied as well to these new files. Think about it!
Proverb for Paranoids: "If they can get you asking the wrong questions, they don't have to worry about answers."
-T. Pynchon (Gravity's Rainbow)

Reply #2
Quote
However, i would strongly suggest to you to rip to a lossless format in addition to or instead of MP3...


I have gone back and forth on that, and I definitely see your point.

Right now I am leaning to just ripping to an extremely high quality mp3 since if it is truly transparent (no audible difference compared to the wav data) then that is good enough for me.  I am choosing mp3 because it is a universal and open (in practice, if not legally) format available right now and I bet will be still supported in the future, with very good open source support like lame.  But I am definitely ripping at -V 0 for this reason since I really want to achieve transparency.  I can always then convert it back to a wav and then encode it to a different format if need be.

But feel free to defend your viewpoint further--I would rather change now than later!


Quote
those cool FLAC or WAVPACK files


Which do you prefer: FLAC or WAVPACK?  What are the best discussions here on HA regarding this?

  • sTisTi
Reply #3
Quote
Which do you prefer: FLAC or WAVPACK?  What are the best discussions here on HA regarding this?

This is a great comparison:
http://wiki.hydrogenaudio.org/index.php?title=Lossless_comparison
I personally use FLAC because I found it easier to use (comes with an easy to use frontend) and it has better software & hardware support. On the other hand, Wavpack is more flexible (hybrid and lossy mode) and compresses slightly better.

Quote
Quote
However, i would strongly suggest to you to rip to a lossless format in addition to or instead of MP3...


I have gone back and forth on that, and I definitely see your point.

Right now I am leaning to just ripping to an extremely high quality mp3 since if it is truly transparent (no audible difference compared to the wav data) then that is good enough for me.  I am choosing mp3 because it is a universal and open (in practice, if not legally) format available right now and I bet will be still supported in the future, with very good open source support like lame.  But I am definitely ripping at -V 0 for this reason since I really want to achieve transparency.  I can always then convert it back to a wav and then encode it to a different format if need be.

But feel free to defend your viewpoint further--I would rather change now than later!

But if you convert it back to wav and recompress to another lossy format, you will have a much lower quality file than transcoding directly from a lossless source - and MP3 is considered to be the worst source for transcoding, even at high bitrates! There have been several transcoding listening tests here at HA.
To convince you further:
MP3 has certain problem samples that are not even transparent at -V0. MP3 is an immensely popular but dated format which will never be transparent on all samples, although the LAME encoder is really great. But who knows what audio formats will be dominant in 5 or 10 years?
If you get a portable player which can play AAC or Ogg Vorbis, you might want to use these formats because they are more efficient for low bitrate encodings - remember, your -V0 files will be HUGE! For e.g. Ogg Vorbis, quality is really quite acceptable for portable devices already at 96 kbps. So if you have a lossless archive, just press a few buttons, and a few hours later your computer will have encoded your whole music archive e.g. to Ogg Vorbis, AAC or simply lower bitrate MP3 from a pristine lossless source, with all your tags and filenames intact!  No matter what audio formats will be dominant in 5 or 10 years, you never ever will have to rip your CDs again and can enjoy all the efficiency and other benefits of the latest and greatest audio formats and encoders by just pressing a button to transcode your FLAC files.

If you use EAC, it is pretty easy to rip directly to FLAC. It's also faster than MP3 because the FLAC encoder is a lot faster than Lame. There are also tools to rip directly to FLAC and MP3 at the same time. There are also threads about these tools here at HA; I've never used them, but I think they are called 'FLACattack', 'REACT', maybe there are others too.
  • Last Edit: 02 November, 2005, 02:54:07 PM by sTisTi

Reply #4
I have an amendment to make to my original posting: for the compound artist case (e.g. composer and performer), I now think that it is not a good idea to separate the various entities with tildes (i.e. "composer ~ performer" is bad).

I say this because I think that it is important to reserve tildes as separator chars for the major categories (e.g. distinguishing between artist and album) and not as a separator for minor categories (e.g. the different artists in the artist section).  Keeping this strict use of tildes makes parsing easier.

To separate minor categories, I am hesitant to use hyphens (i.e. do "composer - performer") because this again has the problem that some entity like a performer has an actual hyphen in their name (e.g. a woman who got married and appends her husband's name to her old one).

Anyone have any good suggestions here?  I want the symbol to be relatively common (e.g. US ASCII and reachable by typing on an ordinary keyboard).

Perhaps asterisks ("composer * performer") or vertical pipes ("composer | performer") might work.

  • sTisTi
Reply #5
Quote
Perhaps asterisks ("composer * performer") or vertical pipes ("composer | performer") might work.

I don't think * can be part of a filename, can it? Or are you just talking about the tags?

Reply #6
Quote
Quote
Perhaps asterisks ("composer * performer") or vertical pipes ("composer | performer") might work.

I don't think * can be part of a filename, can it? Or are you just talking about the tags?


How right you are: neither asterisks nor vertical pipes can be in file names (on windows at least).

Maybe I will just use hyphens (e.g. "composer - performer") and hope that cases where a composer or performer has a hyphen in their name are rare.  (Actually, all that I need to worry about is whether they have a space char both before and after a hyphen, that is, the substring " - " in their name; this, I bet, is rare, so maybe it will be OK.)

  • AnEnigma66
Reply #7
Might be stupid.... but what about --  double hyphens?

Reply #8
Quote
Might be stupid.... but what about --  double hyphens?


That's a decent suggestion.  Sometimes, in plain text files, people use "--" to stand for an em dash (see http://en.wikipedia.org/wiki/Dash#Em_dash), but it is highly unlikely to be part of an artist's name.

Reply #9
One slight correction: it appears that for optical media, the conventional spelling is Disc and not Disk as in my original posting.

Thus, I will do "Disc dd" instead of "Disk dd" for multidisc albums.

Reply #10
OK, here is the entire discussion summarised and is what I will be using:

1) single artist album:

   artist\yyyy ~ album\artist ~ yyyy ~ album ~ nn ~ trackTitle.mp3


2) multiple artist album:

   _variousArtists\yyyy ~ album\yyyy ~ album ~ nn ~ artistForThisTrack ~ trackTitle.mp3
or
   <thePrimaryArtist>\yyyy ~ album\yyyy ~ album ~ nn ~ artistForThisTrack ~ trackTitle.mp3


Here:
   --items before each backslash char ('\') are directory names (i.e. windows is assumed) and what comes after the last '\' is the file name

   --artist or artistForThisTrack is normally just a single person.  However, if there are multiple ones, use " - " to separate the items.  Use a consistent listing order: the composer first, then any conductor, then any performer(s); for example:
      Michael Praetorius - Paul McCreesh - Gabrieli Consort & Players
   When using freedb with EAC, will probably need to resubmit this full info, since many of the original submissions are incomplete.  Also, if <thePrimaryArtist> is used for the topmost directory name, have it be the composer. 
   
   --yyyy is the 4-digit year that the album came out

   --nn is normally just the track number for single disk albums.  However, if there are multidisks, then it becomes
      Disc dd, Track tt

EAC codes:
   case 1): %A ~ %Y ~ %C ~ %N ~ %T
   case 2): %Y ~ %C ~ %N ~ %A ~ %T
where you will need to come back and hand edit %N for the case of multi-disks since apparently EAC has no code for the disk number.

  • stuntman
Reply #11
Quote
Right now I am leaning to just ripping to an extremely high quality mp3 since if it is truly transparent (no audible difference compared to the wav data) then that is good enough for me.  I am choosing mp3 because it is a universal and open (in practice, if not legally) format available right now and I bet will be still supported in the future, with very good open source support like lame.  But I am definitely ripping at -V 0 for this reason since I really want to achieve transparency.  I can always then convert it back to a wav and then encode it to a different format if need be.

If the amount/cost of storage is even a slight factor in your decision to use a lossy format (even at a 'transparent' quality setting) for archival purposes, at least consider how quickly the price of storage will fall in a relatively short space of time. In a few years, inexpensive storage will be measured in terabytes, and filesize won't be an issue.
  • Last Edit: 09 November, 2005, 06:45:09 PM by stuntman

  • kwanbis
  • Developer (Donating)
Reply #12
instead of using "VA", i:

1) Use the CD name as the artist, and the year as the CD name, as in:
Lounge For Lovers - 2005 - tracks

or

2) Use the music type as the artist, and the CD name as the CD name, as in:
Reggaeton - Best Of Reggaeton - etc

i really don't think V.A. or VA is very useful.

Reply #13
After using the naming scheme advocated above for ripping several CDs, I have decided that I need to abandon it for a simpler format.

The sole reason is the deeply problematic win32 file system constraints.  In particular, in my original post above, my claim

>4) I am aware that modern versions of windows are limited to 256 chars in the names of directories and files,
>and that you can only have 256 levels of directories.

is technically correct, but incomplete: it turns out that there is the additional constraint that the total path length cannot exceed 260 chars (the famous MAX_PATH var in windows; see http://blogs.msdn.com/brian_dewey/archive/...1/19/60263.aspx for more details).

I was finding that when ripping to mp3 files using EAC/Lame that EAC would sometimes (especially with classical CDs, with their complicated multiple artist and track title problems) either fail to create files due to windows objecting to the path, or even worse the file might get created but then windows would not allow me to move the file into its proper subdirectory.  This confused me for a while.
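A pre-flight check along these lines would have flagged the offending paths before ripping. This is only a sketch: it assumes the classic 260-character MAX_PATH, which includes the drive prefix and a terminating NUL, so the longest usable path is taken to be 259 characters. The root directory and function names are hypothetical.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Win32 limit; counts the terminating NUL


def path_length(root, *parts):
    """Length in characters of the full path, drive prefix included."""
    return len(str(PureWindowsPath(root, *parts)))


def fits_max_path(root, *parts):
    """True if the full path leaves room for the terminating NUL."""
    return path_length(root, *parts) < MAX_PATH


if __name__ == "__main__":
    root = "C:\\music"  # hypothetical rip destination
    album_dir = "2005 ~ Some Album"
    name = "Some Artist ~ 2005 ~ Some Album ~ 01 ~ Some Title.mp3"
    print(path_length(root, "Some Artist", album_dir, name))
    print(fits_max_path(root, "Some Artist", album_dir, name))
```

Because the album and artist text appears in both the directory names and the file name, every extra character in those fields counts twice against the limit.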

I decided that the least bad way to cope with microsoft's poor excuse of an OS is simply to use shorter filenames, essentially having just the track number, possibly the track artist (if different on different tracks), and the track title.  It's not a huge loss, since the id3 tags contain all the metadata anyway.


My simplified scheme summarized:

1) single artist album:

   genre\artist\yyyy ~ album\nn ~ trackTitle.mp3


2) multiple artist album:

   genre\_variousArtists\yyyy ~ album\nn ~ artistForThisTrack ~ trackTitle.mp3
or
   genre\<thePrimaryArtist>\...
   genre\<theLabel>\...


Here:
   --items before each backslash char ('\') are directory names (i.e. windows is assumed) and what comes after the last '\' is the file name

   --artist or artistForThisTrack is normally just a single person.  However, if there are multiple ones, use " - " to separate the artists.  Use a consistent listing order: the composer first, then any conductor, then any performer(s); for example:
      Michael Praetorius - Paul McCreesh - Gabrieli Consort & Players
   When using freedb with EAC, will probably need to resubmit this full info, since many of the original submissions are incomplete.  Also, if <thePrimaryArtist> is used for the topmost directory name, have it be the composer. 
   
   --yyyy is the 4-digit year that the album came out

   --nn is normally just the track number for single disk albums.  However, if there are multidisks, then it becomes
      Disc dd, Track tt
      
   --trackTitle is normally a simple name.  However, there can be compound names (e.g. in many classical albums there is often the overall work's name as well as the movement's name), so for cases like this, use " - " to separate the names.  Use a consistent top-down naming order; for example, for the Antonin Dvorak album "Aus der neuen Welt":
      Stabat Mater, op. 58 - Stabat Mater dolorosa (Andante con moto)
   When using freedb with EAC, will probably need to resubmit this full info, since many of the original submissions are incomplete.

The EAC code, REGARDLESS OF CASE ABOVE, will always be
   %N ~ %T
In particular, note that case 2) does NOT get the code %N ~ %A ~ %T because the artist info has to be entered as part of the individual track name (i.e. %T) text.  Furthermore, you will often need to edit %N for the case of multi-disks, since apparently EAC has no code for the disk number.

Reply #14
[Edit: I completely changed around my naming scheme so I rewrote this post]

I finally figured out how to handle all of my albums the same way (without needing to flag Various Artist albums, for example). I did it by abandoning %Artist% for two separate tags, %Sort Artist% and %Track Artist%, on all albums. Here's the format:

%Sort Artist%\%Date% %Album%\CD%Disc% %Discname%\%Tracknumber% -%Track Artist%- %Title%

It's a little confusing at first, but so far it has worked great.

Tool\1996 Ænima\CD0\14 -~- (·) Ions.flac

The tilde (~) is shorthand for "%Sort Artist%," so here the %Track Artist% and %Sort Artist% tags would both be "Tool"

If a track has a featured artist it looks like:

(hed) pe\2000 Broke\CD0\03 -~, Serj Tankian, Morgan Lander- Feel Good.flac

Here the %Sort Artist% would be "(hed) pe" and the %Track Artist% ("~, Serj Tankian, Morgan Lander") gets tagged as "(hed) pe, Serj Tankian, Morgan Lander"
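The tilde shorthand can be expanded mechanically. The sketch below is one possible reading of the rule, assuming that only a leading '~' in the file name's artist field stands for %Sort Artist%; the function name is made up for illustration.

```python
def expand_track_artist(sort_artist, shorthand):
    """Turn the file-name shorthand into the full %Track Artist% tag value."""
    if shorthand.startswith("~"):
        # A leading tilde stands in for the sort artist.
        return sort_artist + shorthand[1:]
    # No tilde: the track artist is written out in full.
    return shorthand


if __name__ == "__main__":
    print(expand_track_artist("Tool", "~"))
    print(expand_track_artist("(hed) pe", "~, Serj Tankian, Morgan Lander"))
```

Restricting the expansion to a leading tilde keeps an artist name that happens to contain a '~' elsewhere from being rewritten.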

Here's how it works with classical CDs:

Frédéric Chopin\1997 Chopin_ Preludes & Nocturnes\CD0\12 -Tzimon Barto- Prelude, Op. 28_ No. 12 In G Sharp Major (Presto).flac

Here the %Sort Artist% is "Frédéric Chopin" so it is kind of equivalent to the %Composer% tag. The %track artist% is "Tzimon Barto" making it equivalent to a %Performer% tag. Oh, and the underscore (_) in the %Title% just means there is a character that would be illegal in Windows filenames (here it was a : ).

This could also work for, say, a tribute album. You could set the %Sort Artist% to the band whose songs are being played (say, "Pink Floyd") and the %Track Artist% could be the band performing the songs ("Easy Star All Stars").

It also works for albums where only one song is by another artist. The first 7 songs on this CD are by Denim and Diamonds, but the last track is a cover/remix by Magic Recording Eye:

Denim And Diamonds\2004 Street Medics Unite!\CD0\08 -Magic Recording Eye- Street Medics Exit.flac

Here's a Various Artist album:

Various Artists\2001 Support Your Local Musician\CD0\20 -Badgerlore- Beetle Kill Wood Lord.flac

For all single disc albums I use "CD0" but here is a double album (with a discname...if there weren't a discname it would just be "CD2")

Cradle Of Filth\1998 Cruelty And The Beast\CD2 Bonus Disc\03 -~- Hallowed Be Thy Name.flac

Like I say, so far this works for every CD I own, but lemme know if you can think of any problems with it.
  • Last Edit: 30 November, 2005, 04:11:23 PM by digidistortions