
[howto] My current Linux CD ripping process

Ripping CDs in Linux (Tested in Fedora 20)

I probably should start a blog or something. I'm just hoping someone finds this useful. Sorry it is so long.

This is long and pedantic, but the script makes it easy for me to be pedantic.

There is an equivalent of EAC for Linux, written in Python, I believe by one of the GStreamer developers.

I haven't used it in several years because it is fracking slow.
I use cdparanoia because it is faster and still usually gets a bit-for-bit good copy.

The only CD I've ever had a problem with using cdparanoia was a Shakira CD, and EAC on Windows didn't do any better. It had intentionally flawed bits as a copy protection measure, so I ended up pirating it (I owned the CD so...)

This script uses the following CLI utilities that should be readily available in any modern Linux distribution:


EXECUTABLE (PACKAGE VERSION)
cd-info (libcdio 0.90)
bc (bc 1.06.95)
cdparanoia (cdparanoia 10.2)
sox (sox 14.4.1)
flac (flac 1.3.0)
shnsplit (shntool 3.0.10)

It rips and flac encodes in /tmp

I do this because I use tmpfs for /tmp and thus it is all done in memory and no disk writing is needed.
But don't use tmpfs for /tmp if you don't have a decent amount of memory, or you'll run out of space. tmpfs usually is configured to only use up to half of available system memory, and there are other things using tmpfs as well.

If you have 4GB or less and are using tmpfs for /tmp then change the line that says

TMP=`mktemp -d --tmpdir=/tmp paranoid.XXXXX`

to use something on a hard drive (like maybe --tmpdir=/home/username)
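A minimal sketch of how that choice could be made automatically instead of by editing the line. The helper name choose_tmp_base and the 3 GB budget are assumptions for illustration (the original script does no such check); the budget should be adjusted to what a full disc plus the re-sampled and split copies actually needs on your system.

```shell
#!/bin/bash
# Assumed space budget for one rip (as-ripped WAV, re-sampled WAV, split
# tracks and FLAC files); this figure is a guess, not a measurement.
NEEDED_KB=$((3 * 1024 * 1024))

# choose_tmp_base FREE_KB NEEDED_KB FALLBACK_DIR  (hypothetical helper)
# Prints /tmp when enough space is free there, otherwise the fallback.
choose_tmp_base() {
  local free_kb=$1 needed_kb=$2 fallback=$3
  if [ "${free_kb}" -ge "${needed_kb}" ]; then
    echo "/tmp"
  else
    echo "${fallback}"
  fi
}

# Free space on /tmp in KB: 4th column of the 2nd line of POSIX df output
FREE_KB=$(df -Pk /tmp | awk 'NR==2 {print $4}')
FREE_KB=${FREE_KB:-0}

BASE=$(choose_tmp_base "${FREE_KB}" "${NEEDED_KB}" "${HOME:-/var/tmp}")
# Only shows the command that would be used; it does not create anything
echo "would run: mktemp -d --tmpdir=${BASE} paranoid.XXXXX"
```

The result could then replace the hard-coded /tmp in the TMP= line.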

The script initially rips to a single file. I then do something controversial and re-sample to 48kHz.

Some here have expressed that this is somewhat silly, and perhaps they are right, but I'm trying to keep everything at 48kHz. I won't argue my reasons, but I do flac encode the 44.1kHz audio as ripped for archival purposes.

The script splits the re-sampled WAV using the cue information from the CDROM.

Since cdparanoia starts ripping from the beginning of the first track which often is not the beginning of the disc, the cue sheet has to be changed to split it.

I subtract the value of the first entry from all the entries, and the math I use for that works by frame (1/75 of a second).
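The same subtraction can be sketched more compactly by converting each MM:SS:FF entry to a total frame count, subtracting, and converting back. This is an alternative illustration using bash arithmetic, not the approach the script below takes (it borrows across fields with bc); the function names are made up for the example.

```shell
#!/bin/bash
# msf_to_frames MM:SS:FF -> total number of frames (75 frames per second)
msf_to_frames() {
  local mm ss ff
  IFS=: read -r mm ss ff <<< "$1"
  # 10# forces base 10 so "08" and "09" are not rejected as bad octal
  echo $(( (10#${mm} * 60 + 10#${ss}) * 75 + 10#${ff} ))
}

# frames_to_msf FRAMES -> zero-padded MM:SS:FF
frames_to_msf() {
  local f=$1
  printf '%02d:%02d:%02d\n' $(( f / 4500 )) $(( (f / 75) % 60 )) $(( f % 75 ))
}

# rebase_msf ENTRY FIRST -> ENTRY shifted so that FIRST becomes 00:00:00
rebase_msf() {
  frames_to_msf $(( $(msf_to_frames "$1") - $(msf_to_frames "$2") ))
}

rebase_msf "03:45:12" "00:02:00"   # prints 03:43:12
```

Working in whole frames avoids the separate borrow handling for frames and seconds, and printf takes care of the zero padding.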

Then, since shnsplit won't split a 48 kHz WAV using frames, even though 48000 divides evenly by 75, the frames are converted to milliseconds. I do this using bc, dividing the number of frames by 75 with three decimal places. It always rounds down, but we are talking about less than 1 millisecond.

I hope shntool is patched to allow MM:SS:NN, but until it is, there will usually be a fraction of a frame difference in where a track is split, because we are converting from base 75 to decimal. That is less than a thousandth of a second, and I doubt it ever makes a difference that can be audibly noticed.
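As a worked illustration of that conversion, here is a small sketch that turns a MM:SS:FF entry into MM:SS.sss with integer arithmetic, truncating the same way bc does with scale=3. The function name is hypothetical and the script below uses bc instead.

```shell
#!/bin/bash
# msf_to_ms_stamp MM:SS:FF -> MM:SS.sss
# The fraction is frames * 1000 / 75, truncated (never rounded up).
msf_to_ms_stamp() {
  local mm ss ff
  IFS=: read -r mm ss ff <<< "$1"
  # 10# forces base 10 so values like "08" are not treated as octal
  printf '%02d:%02d.%03d\n' \
    $(( 10#${mm} )) $(( 10#${ss} )) $(( 10#${ff} * 1000 / 75 ))
}

msf_to_ms_stamp "03:43:12"   # prints 03:43.160
msf_to_ms_stamp "00:00:01"   # prints 00:00.013
```

Because the truncation drops less than 1 ms, the split point moves by fewer than 48 samples at 48 kHz.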

After it splits the tracks, it attempts to make stubs for tagging.
The script does not do any tagging other than the tags added by flac on encode (replay-gain).

One stub it creates is called metaMaster.txt.

The tags it puts in that file are almost all empty:

Code: [Select]
ALBUMARTIST=
ALBUM=
GENRE=
DATE=
RECORDED=
REMASTER=
CDUNIVERSE=


I personally use the original release date for DATE because when I play Caress of Steel, I want 1975 displayed, not the 1997 remaster date. REMASTER is where I put that date; I only need it in the event I want to check what version of an album a song is from.

The CDUNIVERSE tag is also a personal tag, I plan to use it with an HTML5 icecast client to make it easy for a listener who wants the album to have a link where they can get it.

It also adds a RIPPED tag including the date the media was ripped. Again I use this for personal information.

If the number of created tracks matches the number of ISRC codes, the script then creates stubs for individual files and adds a TOTALTRACKS tag filled in to the metaMaster.txt file. My understanding is that when ISRC info is not there, cd-info will give a value filled with 0s so in theory the script *should* always create the track specific stubs but I am not positive.

The stubs for individual files contain an empty ARTIST and TITLE tag, and the filled in ISRC and TRACKNUMBER tag.
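A minimal sketch of writing one such stub, looking the ISRC up per track so that a missing or all-zero code simply leaves ISRC= empty. The function names are hypothetical, and the "TRACK  5 ISRC: ..." line shape is an assumption inferred from the grep and cut patterns in the script, not checked against every cd-info version.

```shell
#!/bin/bash
# isrc_for_track TRACKNUM ISRC_FILE
# Prints the ISRC for that track, or nothing if it is missing or all zeros.
# Assumes lines shaped like "TRACK  5 ISRC: USMR17500048".
isrc_for_track() {
  awk -v n="$1" \
    '$1 == "TRACK" && $2 == n && $3 == "ISRC:" && $4 !~ /^0+$/ {print $4}' "$2"
}

# write_stub TRACKNUM ISRC_FILE OUT_DIR -> creates OUT_DIR/metaNN.txt
write_stub() {
  local num=$1 nn
  nn=$(printf '%02d' "${num}")   # zero-pad the file name only
  {
    echo "ARTIST="
    echo "TITLE="
    echo "TRACKNUMBER=${num}"
    echo "ISRC=$(isrc_for_track "${num}" "$2")"
  } > "$3/meta${nn}.txt"
}

# Example call (needs an ISRC.txt produced from cd-info output):
#   write_stub 5 ISRC.txt .
```

With a per-track lookup the stubs no longer depend on the ISRC count matching the track count.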

Finally, the script creates a shell script that makes it easy to add tags once you complete the stubs and are ready to tag the flac files.

That generated script is not run automatically because initially the tag information isn't there.

It uses metaflac to load the track data first. That way if, say, the GENRE for the album is Rock but one of the songs is Classical Guitar, you can put GENRE=Classical Guitar into the track specific file and that will show in media players because it is the first GENRE tag.

After the generated metadata script adds the track specific metadata, it adds the tags from the metaMaster.txt file that are common tags for all the tracks.
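To make the ordering concrete, here is a dry-run sketch that only prints the metaflac calls in the order described: track specific tags first, then the shared tags. The function name is hypothetical, and unlike the generated loadMetadata.sh it covers only the per-track files, not the whole-disc FLAC files.

```shell
#!/bin/bash
# print_tag_commands DISK TRACKS
# Dry run: prints the commands, it does not run metaflac.
print_tag_commands() {
  local disk=$1 tracks=$2 i nn
  # Pass 1: track specific stubs, so their tags are written first
  for (( i = 1; i <= tracks; i++ )); do
    nn=$(printf '%02d' "${i}")
    echo "metaflac --import-tags-from=meta${nn}.txt ${disk}-track${nn}.flac"
  done
  # Pass 2: tags common to every track
  for (( i = 1; i <= tracks; i++ )); do
    nn=$(printf '%02d' "${i}")
    echo "metaflac --import-tags-from=metaMaster.txt ${disk}-track${nn}.flac"
  done
}

print_tag_commands "Rush_-_Caress_of_Steel" 2
```

Piping the output through sh would actually apply the tags, once the stubs are filled in.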

Here is an example of a track specific stub I filled in:

Code: [Select]
ARTIST=Rush
TITLE=The Fountain Of Lamneth
TRACKNUMBER=5
ISRC=USMR17500048
CHAPTER001=00:00:00.000
CHAPTER001NAME=I. In The Valley
CHAPTER002=00:04:16.500
CHAPTER002NAME=II. Didacts And Narpets
CHAPTER003=00:05:17.500
CHAPTER003NAME=III. No One At The Bridge
CHAPTER004=00:09:37.000
CHAPTER004NAME=IV. Panacea
CHAPTER005=00:12:53.000
CHAPTER005NAME=V. Plateau
CHAPTER006=00:16:08.000
CHAPTER006NAME=VI. The Fountain


I filled in ARTIST and TITLE; the script had already filled in TRACKNUMBER and ISRC for me. I also added the CHAPTER* tags, which are not currently supported in any player I use but hopefully will be at some point.

When the ripping script is finished ripping and creating the various files, it puts them all in a tarball in the directory the script was called from and removes the temporary directory it ripped in.

I put it in a tarball because when I'm ready to tag the files, if I frack things up too badly I still have the tarball to start over from.

I call the script cdrip.sh, put it in ~/bin/ and make it executable.

To then use it, put a CD in the tray and run

cdrip.sh "Rush - Caress of Steel"

While it can handle spaces (but you must then use quotes as above), I would suggest the argument be sane characters that don't have special meaning in shell. Avoid things like $ and !. Stick to alpha-numeric, -, _ and space. You can put special characters in the tags after ripping, but the argument dictates the file name and you don't want those in file names anyway. Perhaps I should update it to change anything that is not a sane character to an underscore, but I wrote the script for me and I know to use sane characters.

The script will do its thing and, in the above example, create the result as Rush_-_Caress_of_Steel.tar
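A minimal sketch of that underscore substitution, assuming the allowed set is exactly letters, digits, - and _ (spaces become underscores, as the script already does). The function name sanitize_name is made up for the example and is not part of the script.

```shell
#!/bin/bash
# sanitize_name STRING
# Replaces every byte outside A-Z a-z 0-9 _ - with an underscore.
# LC_ALL=C makes tr work byte by byte, so a multi-byte character
# turns into several underscores.
sanitize_name() {
  printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9_-' '_'
}

sanitize_name 'Rush - Caress of Steel'; echo   # prints Rush_-_Caress_of_Steel
```

In the script this could stand in for the sed call that sets DISK, for example DISK=$(sanitize_name "$1").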

-=-=-=-=-

You will need to make some modifications.

If your CDROM drive is not /dev/sr0 then you need to change the line that says

DEVICE="/dev/sr0"

You also need to set the offset for your drive model.

DRIVE_OFFSET="667"

is where you set that.

You can find out your drive model using the cd-info command, e.g. cd-info /dev/sr0.
Then find your offset at http://www.accuraterip.com/driveoffsets.htm

You may want to make other modifications.

If you prefer the generated tagging script not encode to opus or you want different opus options, obviously modify that part.

If you prefer not to re-sample to 48000 then obviously change the line that says

Code: [Select]
sox --norm cdda.wav -b 16 ${DISK}.wav rate 48000 dither -s


to

Code: [Select]
mv cdda.wav ${DISK}.wav


and remove the

Code: [Select]
flac --best --replay-gain cdda.wav -o ${DISK}-as_ripped.flac


command and finally change the line that reads

Code: [Select]
cat split.txt |shnsplit ${DISK}.wav


to

Code: [Select]
cat fixed.NN |shnsplit ${DISK}.wav


That will result in splitting based on the frame information, which shnsplit can do since the file is 44.1 kHz.

One final note - when you run the command, as it gets towards the end of the CDROM during the rip you may get a bunch of warnings about invalid SCSI requests. According to the cdparanoia man page, this is normal when using the -O switch.

-=-

Here is the script:

Code: [Select]
#!/bin/bash
#~/bin/cdrip.sh

#The offset is specific to model of drive
DEVICE="/dev/sr0"
DRIVE_OFFSET="667"
#Good chance the drive will bitch at you about SCSI errors
# this is mentioned in cdparanoia man page and is correct

DISK=`echo "$1" |sed -e s?" "?"_"?g`

CWD="`pwd`"
TMP=`mktemp -d --tmpdir=/tmp paranoid.XXXXX`

pushd ${TMP}

cd-info ${DEVICE} > cdinfo.txt

grep "ISRC:" cdinfo.txt > ISRC.txt

N=`grep -n "^Media Catalog" cdinfo.txt |head -1 |cut -d":" -f1`
N=`echo "${N} -1" |bc`
head -${N} cdinfo.txt > tmp.txt
M=`grep -n "^CD-ROM" tmp.txt |tail -1 |cut -d":" -f1`
T=`echo "${N} - ${M}" |bc`
tail -${T} tmp.txt \
|grep " audio " \
|sed -e s?"^[ \t]*"?""? \
|cut -d" " -f2 > cue.NN

rNN=`head -1 cue.NN |cut -d":" -f3`
rSS=`head -1 cue.NN |cut -d":" -f2`
rMM=`head -1 cue.NN |cut -d":" -f1`
#make them integers
rNN=`echo "${rNN} + 0" |bc`
rSS=`echo "${rSS} + 0" |bc`
rMM=`echo "${rMM} + 0" |bc`

cat cue.NN |while read line; do
  NN=`echo "${line}" |cut -d":" -f3`
  SS=`echo "${line}" |cut -d":" -f2`
  MM=`echo "${line}" |cut -d":" -f1`
  NN=`echo "${NN} + 0" |bc`
  SS=`echo "${SS} + 0" |bc`
  MM=`echo "${MM} + 0" |bc`
  if [ ${rNN} -le ${NN} ]; then
    NN=`echo "${NN} - ${rNN}" |bc`
  else
    NN=`echo "75 + ${NN}" |bc`
    NN=`echo "${NN} - ${rNN}" |bc`
    if [ ${SS} -gt 0 ]; then
      SS=`echo "${SS} - 1" |bc`
    else
      SS=59
      MM=`echo "${MM} - 1" |bc`
    fi
  fi
  if [ ${rSS} -le ${SS} ]; then
    SS=`echo "${SS} - ${rSS}" |bc`
  else
    SS=`echo "60 + ${SS}" |bc`
    SS=`echo "${SS} - ${rSS}" |bc`
    MM=`echo "${MM} - 1" |bc`
  fi
  MM=`echo "${MM} - ${rMM}" |bc`
  #fix format
  if [ ${NN} -lt 10 ]; then
    if [ ${NN} -eq 0 ]; then
      NN="00"
    else
      NN="0${NN}"
    fi
  fi
  if [ ${SS} -lt 10 ]; then
    if [ ${SS} -eq 0 ]; then
      SS="00"
    else
      SS="0${SS}"
    fi
  fi
  if [ ${MM} -lt 10 ]; then
    if [ ${MM} -eq 0 ]; then
      MM="00"
    else
      MM="0${MM}"
    fi
  fi
  echo "${MM}:${SS}:${NN}" >> fixed.NN
done

SKIP=0
cat fixed.NN |while read line; do
  MM=`echo "${line}" |cut -d":" -f1`
  SS=`echo "${line}" |cut -d":" -f2`
  NN=`echo "${line}" |cut -d":" -f3`
  sss=`echo "scale=3;${NN} / 75" |bc`
  if [ ${sss} == "0" ]; then
    sss=".000"
  fi
  if [ ${SKIP} -gt 0 ]; then
    echo "${MM}:${SS}${sss}" >> split.txt
  else
    SKIP=`echo "${SKIP} + 1" |bc`
  fi
done

cdparanoia -d ${DEVICE} -O ${DRIVE_OFFSET} -w 1-

if [ $? != 0 ]; then
  exit 1
fi

sox --norm cdda.wav -b 16 ${DISK}.wav rate 48000 dither -s

flac --best --replay-gain cdda.wav -o ${DISK}-as_ripped.flac
flac --best --replay-gain ${DISK}.wav

cat split.txt |shnsplit ${DISK}.wav

flac --best --replay-gain split-track*

rm -f *.wav

#start the meta-data tracks if possible
cat <<EOF > metaMaster.txt
ALBUMARTIST=
ALBUM=
GENRE=
DATE=
RECORDED=
REMASTER=
CDUNIVERSE=
EOF

RIPPED="`date +%Y-%m-%d`"
echo "RIPPED=${RIPPED}" >> metaMaster.txt

TRACKS=`ls split-track* |wc -l`
ISRCN=`cat ISRC.txt |wc -l`

if [ ${TRACKS} -eq ${ISRCN} ]; then
echo "TOTALTRACKS=${TRACKS}" >> metaMaster.txt
cat ISRC.txt |while read line; do
  ISRC=`echo "${line}" |cut -d ":" -f2 |sed -e s?" "?""?`
  NUM=`echo "${line}" |cut -d":" -f1 |sed -e s?"  "?":"? |sed -e s?" "?":"?g |cut -d ":" -f2`
  MNUM=${NUM}
  if [ ${MNUM} -lt 10 ]; then
    MNUM="0${NUM}"
  fi
  echo "ARTIST=" > meta${MNUM}.txt
  echo "TITLE=" >> meta${MNUM}.txt
  echo "TRACKNUMBER=${NUM}" >> meta${MNUM}.txt
  echo "ISRC=${ISRC}" >> meta${MNUM}.txt
done
fi

cat <<EOF > loadMetadata.sh
#!/bin/bash
# CAUTION - Fill in the track stubs and metaMaster data first

N=\`ls ${DISK}-track* |wc -l\`
COUNT=0
while [ \${COUNT} -lt \${N} ]; do
  NN=\`echo "\${COUNT} + 1" |bc\`
  if [ \${NN} -lt 10 ]; then
    NN="0\${NN}"
  fi
  metaflac --import-tags-from=meta\${NN}.txt ${DISK}-track\${NN}.flac
  COUNT=\`echo "\${COUNT} + 1" |bc\`
done

ls *.flac |while read flac; do
  metaflac --import-tags-from=metaMaster.txt \${flac}
done

#encode to opus
ls ${DISK}-track* |while read flac; do
  opus=\`echo \${flac} |sed -e s?"\.flac$"?".opus"?\`
  opusenc --bitrate 64 \${flac} \${opus}
done
EOF


ls split-track* |while read line; do
  N="`echo ${line} |sed -e s?"^split-track"?""? |sed -e s?"\.flac$"?""?`"
  mv split-track${N}.flac ${DISK}-track${N}.flac
done
mkdir ${DISK}
mv *.flac ${DISK}/
mv ISRC.txt ${DISK}/
mv split.txt ${DISK}/
mv cdinfo.txt ${DISK}/
mv meta* ${DISK}/
mv loadMetadata.sh ${DISK}/
tar -cf ${DISK}.tar ${DISK}
mv ${DISK}.tar ${CWD}/
popd
sync && sync
rm -rf ${TMP}

Reply #1
change

Code: [Select]
grep "ISRC:" cdinfo.txt  > ISRC.txt


to

Code: [Select]
grep "ISRC:" cdinfo.txt |grep -v "000000000000" > ISRC.txt


The issue is some CDs that aren't all music will have an extra ISRC reported by cd-info containing all 0s, resulting in a mis-match between number of ISRC codes and number of audio tracks, and tagging stubs thus won't be created.

Reply #2
Great script! Very useful and some neat tricks. I think that this modification though won't work properly in the case where there are no valid ISRC found and cdinfo sets them all to 000000000000. This results in an empty ISRC.txt file.


Reply #4
Yes, I've redone the ISRC thing altogether.
I'll update it shortly.

Reply #5
Any reason why not use morituri?
https://github.com/thomasvs/morituri


Yes, it's painfully slow, and when I tried it, I never got a different result from cdparanoia when I looked at the md5sum of the resulting wav files.

I know it can happen, but it is rare enough that I won't worry about it unless I can hear something wrong with a file during playback.

Reply #6
Okay, looks like I can't edit the original post, so this is a tweaked version of the script with better ISRC handling:

Code: [Select]
#!/bin/bash
#~/bin/cdrip.sh

#The offset is specific to model of drive
DEVICE="/dev/sr0"
DRIVE_OFFSET="667"
#Good chance the drive will bitch at you about SCSI errors
# this is mentioned in cdparanoia man page and is correct

DISK=`echo "$1" |sed -e s?" "?"_"?g`

CWD="`pwd`"
TMP=`mktemp -d --tmpdir=/tmp paranoid.XXXXX`

pushd ${TMP}

cd-info ${DEVICE} > cdinfo.txt

grep "ISRC:" cdinfo.txt > ISRC.txt

N=`grep -n "^Media Catalog" cdinfo.txt |head -1 |cut -d":" -f1`
N=`echo "${N} -1" |bc`
head -${N} cdinfo.txt > tmp.txt
M=`grep -n "^CD-ROM" tmp.txt |tail -1 |cut -d":" -f1`
T=`echo "${N} - ${M}" |bc`
tail -${T} tmp.txt \
|grep " audio " \
|sed -e s?"^[ \t]*"?""? \
|cut -d" " -f2 > cue.NN

rNN=`head -1 cue.NN |cut -d":" -f3`
rSS=`head -1 cue.NN |cut -d":" -f2`
rMM=`head -1 cue.NN |cut -d":" -f1`
#make them integers
rNN=`echo "${rNN} + 0" |bc`
rSS=`echo "${rSS} + 0" |bc`
rMM=`echo "${rMM} + 0" |bc`

cat cue.NN |while read line; do
  NN=`echo "${line}" |cut -d":" -f3`
  SS=`echo "${line}" |cut -d":" -f2`
  MM=`echo "${line}" |cut -d":" -f1`
  NN=`echo "${NN} + 0" |bc`
  SS=`echo "${SS} + 0" |bc`
  MM=`echo "${MM} + 0" |bc`
  if [ ${rNN} -le ${NN} ]; then
    NN=`echo "${NN} - ${rNN}" |bc`
  else
    NN=`echo "75 + ${NN}" |bc`
    NN=`echo "${NN} - ${rNN}" |bc`
    if [ ${SS} -gt 0 ]; then
      SS=`echo "${SS} - 1" |bc`
    else
      SS=59
      MM=`echo "${MM} - 1" |bc`
    fi
  fi
  if [ ${rSS} -le ${SS} ]; then
    SS=`echo "${SS} - ${rSS}" |bc`
  else
    SS=`echo "60 + ${SS}" |bc`
    SS=`echo "${SS} - ${rSS}" |bc`
    MM=`echo "${MM} - 1" |bc`
  fi
  MM=`echo "${MM} - ${rMM}" |bc`
  #fix format
  if [ ${NN} -lt 10 ]; then
    if [ ${NN} -eq 0 ]; then
      NN="00"
    else
      NN="0${NN}"
    fi
  fi
  if [ ${SS} -lt 10 ]; then
    if [ ${SS} -eq 0 ]; then
      SS="00"
    else
      SS="0${SS}"
    fi
  fi
  if [ ${MM} -lt 10 ]; then
    if [ ${MM} -eq 0 ]; then
      MM="00"
    else
      MM="0${MM}"
    fi
  fi
  echo "${MM}:${SS}:${NN}" >> fixed.NN
done

cat fixed.NN |while read line; do
  MM=`echo "${line}" |cut -d":" -f1`
  SS=`echo "${line}" |cut -d":" -f2`
  NN=`echo "${line}" |cut -d":" -f3`
  sss=`echo "scale=3;${NN} / 75" |bc`
  if [ ${sss} == "0" ]; then
    sss=".000"
  fi
  echo "${MM}:${SS}${sss}" >> split.txt
done

cdparanoia -d ${DEVICE} -O ${DRIVE_OFFSET} -w 1-

sox --norm cdda.wav -b 16 ${DISK}.wav rate 48000 dither -s

flac --best --replay-gain cdda.wav -o ${DISK}-as_ripped.flac
flac --best --replay-gain ${DISK}.wav

tail -n +2 split.txt |shnsplit ${DISK}.wav

flac --best --replay-gain split-track*

rm -f *.wav

#start the meta-data tracks if possible
cat <<EOF > metaMaster.txt
ALBUMARTIST=
ALBUM=
GENRE=
DATE=
RECORDED=
REMASTER=
CDUNIVERSE=
RECORD_LABEL=
CATALOG=
EOF

RIPPED="`date +%Y-%m-%d`"
echo "RIPPED=${RIPPED}" >> metaMaster.txt

TRACKS=`ls split-track* |wc -l`
echo "TOTALTRACKS=${TRACKS}" >> metaMaster.txt

COUNT=0
while [ ${COUNT} -lt ${TRACKS} ]; do
  NUM=`echo "${COUNT} + 1" |bc`
  MNUM=${NUM}
  if [ ${MNUM} -lt 10 ]; then
    MNUM="0${NUM}"
  fi
  echo "ARTIST=" > meta${MNUM}.txt
  echo "TITLE=" >> meta${MNUM}.txt
  echo "TRACKNUMBER=${NUM}" >> meta${MNUM}.txt
  ISRC="`grep " ${NUM} ISRC:" ISRC.txt |cut -d":" -f2 |sed -e s?"^ "?""?`"
  echo "ISRC=${ISRC}" >> meta${MNUM}.txt
  COUNT=${NUM}
done

cat <<EOF > loadMetadata.sh
#!/bin/bash
# CAUTION - Fill in the track stubs and metaMaster data first

N=\`ls ${DISK}-track* |wc -l\`
COUNT=0
while [ \${COUNT} -lt \${N} ]; do
  NN=\`echo "\${COUNT} + 1" |bc\`
  if [ \${NN} -lt 10 ]; then
    NN="0\${NN}"
  fi
  metaflac --import-tags-from=meta\${NN}.txt ${DISK}-track\${NN}.flac
  COUNT=\`echo "\${COUNT} + 1" |bc\`
done

ls *.flac |while read flac; do
  metaflac --import-tags-from=metaMaster.txt \${flac}
done

#encode to opus
ls ${DISK}-track* |while read flac; do
  opus=\`echo \${flac} |sed -e s?"\.flac$"?".opus"?\`
  opusenc --bitrate 64 \${flac} \${opus}
done
EOF


ls split-track* |while read line; do
  N="`echo ${line} |sed -e s?"^split-track"?""? |sed -e s?"\.flac$"?""?`"
  mv split-track${N}.flac ${DISK}-track${N}.flac
done
mkdir ${DISK}
mv *.flac ${DISK}/
mv ISRC.txt ${DISK}/
mv split.txt ${DISK}/
mv cdinfo.txt ${DISK}/
mv meta* ${DISK}/
mv loadMetadata.sh ${DISK}/
tar -cf ${DISK}.tar ${DISK}
mv ${DISK}.tar ${CWD}/
popd
sync && sync
rm -rf ${TMP}



Reply #7
It seems to me that you've recreated ABCDE (A Better CD Encoder), which is also a script frontend to CDParanoia, FLAC, LAME, ID3Tag etc.

http://code.google.com/p/abcde/

 

Reply #8
One thing I just found out I need to do (maybe abcde does it?) is detect pre-emphasis so the ripped wav can be adjusted if needed.

Hopefully the cd-info command will have something in its output I can use to detect that.

With respect to abcde: it does some CDDB lookup stuff I don't want to do, as I prefer to tag manually, and I don't think it resamples to 48000, which I prefer to do though some might find that to be silly.