BASIC PROGRAM UTILITY TOOL
TO INSPECT BLOCKS OF DIFFERENCES
IN MULTIPLE SETS OF ALMOST IDENTICAL .WAV FILES
The following BASIC program analyzes a set of nearly identical .wav files, finds blocks (chunks) of differing samples, and displays a grid of numbers for interactive inspection. It also analyzes each block and collects summary statistics into a results file.
I am experimenting with 7 copies of a track from some audio CDs that have become unplayable (rotted). I was only able to extract data from 2 discs: 5 copies from one disc and 2 from another.
Just eyeballing the grid of numbers, with the differing values highlighted, the first impression is that this data recovery project looks hopeful. In many cases the errors are isolated and should be easy to weed out. The blocks of errors are mostly not long runs. Usually there are not more than two values appearing for a given sample.
This 50-second stereo track contains somewhat over 2,000,000 L-R 16-bit sample pairs. Running the program on all 7 copies produced this (highly edited) output:
E1.wav E2.wav E3.wav E4.wav E5.wav N1.wav N2.wav
2200888 BLOCK 8363: BLKLINESBAD 1938[1853,3677,2247,222,220,70,17,12,10]35 OVERS, MAX 1938
, VALSMAX 2[0,8216,144,2,1,0,0] // TOTMINMAX 3[3085,4054,1176,46,2,0,0]
Interpretation:
2200888 samples processed
8363 blocks (chunks) of errors
1853 blocks contained 1 line of differing samples
3677 blocks contained 2 lines of differing samples
2247 blocks contained 3 lines of differing samples...
35 blocks contained over 9 lines of differing samples
the longest block contained 1938 lines of differing samples
Each set of samples is analyzed to count how many different values appear, and count the total of "minority" values (how many values are different from the one that is most common). Each block has a max result for those two measures; these results are tallied for all the blocks.
8216 blocks contained a max of 2 different values for any one sample
144 blocks contained a max of 3 different values for any one sample
2 blocks contained a max of 4 different values for any one sample
1 block contained a max of 5 different values for any one sample
Usually there were only 2 different values, and there were no cases of 6 or 7 different values.
3085 blocks contained a max of 1 minority value for any one sample
4054 blocks contained a max of 2 minority values for any one sample
1176 blocks contained a max of 3 minority values for any one sample...
There were only 48 cases where the most common value was not in the majority, and no cases where all the values were different -- there were always at least two of the same value.
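The two per-sample measures described above can be sketched as follows. This is Python rather than the program's SmallBASIC, and the sample values are made up for illustration:

```python
from collections import Counter

def sample_stats(values):
    """For one sample position across all copies, count how many
    distinct values appear (VALS) and how many copies disagree
    with the most common value (TOTMIN)."""
    counts = Counter(values)
    vals = len(counts)                            # distinct values
    totmin = len(values) - max(counts.values())   # "minority" votes
    return vals, totmin

# Five copies of one sample, one copy corrupted:
print(sample_stats([100, 100, 104, 100, 100]))   # (2, 1)
# A "majority-minority" case: the most common value has only 2 of 5 votes:
print(sample_stats([7, 8, 8, 9, 9]))             # (3, 3)
```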
E4 and E5 were made by fre:ac without jitter correction. The other copies were made by fre:ac with jitter correction. It was not feasible to get any data with cdparanoia enabled in fre:ac. EAC was also unable to make any progress (it is very difficult to get any data at all from these rotted CD-Rs). Excluding E4 and E5 produces this output:
E1.wav E2.wav E3.wav N1.wav N2.wav
2201472 BLOCK 7524: BLKLINESBAD 588[1716,3405,1914,194,187,49,13,10,9]27 OVERS, MAX 588
, VALSMAX 0[0,7412,110,2,0,0,0] // TOTMINMAX 0[2688,4801,35,0,0,0,0]
OVERS:,10,11,10,10,23,10,10,11,14,10,12,13,10,31,13,10,10,11,10,425,11,10,11,10,10,11,588
The number of bad blocks of data has decreased by 8%. The number of blocks that had over 9 bad lines has decreased from 35 to 27. Only 35 blocks contained a case of 3 "minority" votes (out of 5), so the count of "majority-minority" blocks has been reduced from 48 to 35. The new "OVERS" output shows that almost all the blocks contained no more than 14 bad lines; the exceptions were 23, 31, and one big run of 425. (The 588 seems to be exactly one CD sector of glitch in the junk at the end of the track.)
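Why 588 lines looks like exactly one CD sector: a Red Book audio sector carries 2352 bytes of audio data, and each stereo 16-bit sample pair takes 4 bytes, so one sector is 588 sample pairs. A one-line check:

```python
# CD-DA (Red Book) geometry: standard constants.
SECTOR_BYTES = 2352          # audio bytes in one CD sector
BYTES_PER_SAMPLE_PAIR = 4    # 16-bit left + 16-bit right
pairs_per_sector = SECTOR_BYTES // BYTES_PER_SAMPLE_PAIR
print(pairs_per_sector)      # 588
```

This is also why the program skips 588 extra sample pairs for the N*.wav files: those rips are offset by exactly one sector.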
The program takes about an hour to run on these sets of 50-second tracks, on an underpowered WinXP computer. Not fast, but tolerable considering this is a free BASIC interpreter that makes it easy to test ad hoc changes and bend this utility tool to whatever you need...
I've spared you the details of the grids of numbers that are output, which are really the simplest and most useful feature of this tool. It only takes the program about a minute to start producing such output. Staring at the numbers will give you a good idea whether this approach is likely to be fruitful for your data recovery project.
It seems like just auto-picking the median value would be a very effective approach to reconciling these test copies. The next step is to find a way to evaluate that idea. Access to the numerical realm is fundamental, but having access to the associated waveform graphics as well would be much better.
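The median-picking idea can be sketched like this (Python, with hypothetical sample columns; the real data would come from the parallel .wav reads the program below performs). With an odd number of copies, the median at each position equals the majority value whenever a strict majority agrees:

```python
from statistics import median_low

def reconcile(samples_per_copy):
    """Given parallel lists of sample values (one list per copy),
    pick the low median at each sample position as the
    reconciled value."""
    return [median_low(column) for column in zip(*samples_per_copy)]

copies = [
    [10, 20, 30, 40],   # e.g. E1
    [10, 20, 99, 40],   # e.g. E2, with one glitched sample
    [10, 20, 30, 40],   # e.g. N1
]
print(reconcile(copies))   # [10, 20, 30, 40]
```

`median_low` is used rather than `median` so the result is always one of the actual observed sample values, never an interpolated average of two of them.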
' CDrot.bas (SmallBASIC FLTK 0.10.6 Windows XP)
' CD disc rot processing
' kd 27 jan 2012
' FIND DIFFERENCES IN .WAV FILE AND DUMP DATA BLOCK
' DUMP3 - BEFORE AND AFTER DATA BLOCK VERSION, WITH BLOCK STATISTICS AND OUTPUT FILE
PRINT DATE$;" ";TIME$;" START of CDrot-DUMP3-27JAN12.bas"
?
OUT=FREEFILE
NEWLINE$=CHR(13)+CHR(10)
NEWLINE$=CHR(13) ' TRY TO FIND A WAY IN WINDOWS FOR TEXT FILES TO WORK IN BOTH NOTEPAD AND WORDPAD
OPEN "DUMP3-OUT.TXT" FOR APPEND AS #OUT
PRINT #OUT,;NEWLINE$
PRINT #OUT, DATE$;" ";TIME$;" START of CDrot-DUMP3-27JAN12.bas";NEWLINE$
DIM PAST$(9) ' STORE PAST DATA LINES
SAMECNT=999 ' COUNT SUCCESSIVE LINES OF ALL-SAME DATA
BLOCK=0 ' COUNT DATA BLOCKS DUMPED OUT
BLKBAD=0 : MAXBLKBAD=0 ' COUNT BAD LINES IN A DATA BLOCK
VALSMAX=0 ' MAXIMUM NUMBER OF DIFFERENT VALUES IN ONE CHANNEL SET
TOTMINMAX=0 ' MAXIMUM NUMBER OF MINORITY VALUES IN ONE CHANNEL SET
DIM VMAX(1 TO 7),TMAX(1 TO 7) ' ARRAYS OF VALSMAX AND TOTMINMAX BLOCK RESULT TALLIES
BLKBADOVR=0 : BBOVR$="" : DIM BLKBADA(1 TO 9) ' TALLY BLKBAD RESULTS
N=0 ' COUNT THE INPUT/OUTPUT DATA SAMPLES
REM OPEN INPUT FILES
DIM INFILE$(9),F(9)
INFILE$(1)="E1.wav"
INFILE$(2)="E2.wav"
INFILE$(3)="E3.wav"
' INFILE$(4)="E4.wav"
' INFILE$(5)="E5.wav"
INFILE$(6)="N1.wav"
INFILE$(7)="N2.wav"
INFILES=0
' N=15000 : PRINT "SKIPPING SAMPLES: ";N
FOR I = 1 TO 9
IF INFILE$(I)<>""
PRINT INFILE$(I);" ";
PRINT #OUT, INFILE$(I);" ";
F(I)=FREEFILE
OPEN INFILE$(I) FOR INPUT AS #F(I)
INFILES += 1
REM SKIP OVER FILE HEADERS
JJ=11 : IF LEFT(INFILE$(I),1)="N" THEN JJ=JJ+588 ' 588 SAMPLES ARE ONE SECTOR SHIFT
JJ=JJ+N ' SPEED UP START
FOR J = 1 TO 4*JJ
DISCARD=BGETC(F(I))
NEXT J
ENDIF
NEXT I
PRINT : PRINT INFILES;" INPUT FILES" : PRINT
PRINT #OUT,;NEWLINE$ : PRINT #OUT, INFILES;" INPUT FILES";NEWLINE$
REM READ AND PROCESS ONE SET OF SAMPLES FROM EACH INPUT FILE
K215=2^15 : K216=2^16 : KBIG=K216+K215
DIM OUT$(9)
DIFF=0
100
L$="" : R$=""
LD=0 : RD=0
LOLD=KBIG ' INITIALIZE WITH INVALID SAMPLE VALUES
ROLD=KBIG
DIM LA(9),RA(9) ' ARRAYS OF LEFT AND RIGHT VALUES
FOR I = 0 TO 9
LA(I)=KBIG : RA(I)=KBIG
NEXT I
FOR I = 1 TO 9
IF INFILE$(I)<>""
L1=BGETC(F(I))
L2=BGETC(F(I))
R1=BGETC(F(I))
R2=BGETC(F(I))
LL=L2*256+L1
RR=R2*256+R1
IF LL>=K215 THEN LL=LL-K216
IF RR>=K215 THEN RR=RR-K216
LA(I)=LL : RA(I)=RR
IF LL<>LOLD THEN LD=LD+1
IF RR<>ROLD THEN RD=RD+1
LLL$=" " : IF LL<>LOLD THEN LLL$=" *"
RRR$=" " : IF RR<>ROLD THEN RRR$=" *"
LOLD=LL : ROLD=RR
L$=L$+LLL$+FORMAT("#####0",LL)
R$=R$+RRR$+FORMAT("#####0",RR)
ENDIF
NEXT I
N=N+1 : ' IF N>15 THEN 900
LLL$=LEFT(" *********",LD)
RRR$=LEFT(" *********",RD)
'IF (LD>1 OR RD>1) THEN PRINT N,L$,R$
INSERT PAST$,0,(FORMAT("##,###,000",N)+L$+" /"+R$)
DELETE PAST$,10
SAME = NOT (LD>1 OR RD>1)
IF NOT SAME
SORT LA ' GROUP SAME VALUES IN ORDER
V=LA(0) : LA(0)=1 : I=1 ' REPLACE THE VALUES WITH A COUNT OF SAME ONES
WHILE I<=UBOUND(LA)
IF LA(I)=V
DELETE LA,I
LA(I-1) = LA(I-1)+1
ELSE
V=LA(I) : LA(I)=1 : I += 1
ENDIF
WEND
DELETE LA,UBOUND(LA) ' DELETE LAST VALUE WHICH IS COUNT OF INVALID KBIGS
SORT LA ' SORT THE VALID COUNTS OF SAME VALUES
VALS=UBOUND(LA)+1 : IF VALS>VALSMAX THEN VALSMAX=VALS
TOTMIN=SUM(LA)-LA(UBOUND(LA)) : IF TOTMIN>TOTMINMAX THEN TOTMINMAX=TOTMIN
SORT RA ' GROUP SAME VALUES IN ORDER
V=RA(0) : RA(0)=1 : I=1 ' REPLACE THE VALUES WITH A COUNT OF SAME ONES
WHILE I<=UBOUND(RA)
IF RA(I)=V
DELETE RA,I
RA(I-1) = RA(I-1)+1
ELSE
V=RA(I) : RA(I)=1 : I += 1
ENDIF
WEND
DELETE RA,UBOUND(RA) ' DELETE LAST VALUE WHICH IS COUNT OF INVALID KBIGS
SORT RA ' SORT THE VALID COUNTS OF SAME VALUES
VALS=UBOUND(RA)+1 : IF VALS>VALSMAX THEN VALSMAX=VALS
TOTMIN=SUM(RA)-RA(UBOUND(RA)) : IF TOTMIN>TOTMINMAX THEN TOTMINMAX=TOTMIN
ENDIF
SAMEMAX=3
IF SAMECNT=SAMEMAX ' POSSIBLE END OF DATA DUMP BLOCK
IF SAME
SAMECNT += 1 ' YES- END OF BLOCK
IF BLKBAD>9
BLKBADOVR += 1
BBOVR$=BBOVR$+","+STR(BLKBAD)
ELSE
BLKBADA(BLKBAD) = BLKBADA(BLKBAD)+1 ' TALLY RESULTS
ENDIF
IF BLKBAD>MAXBLKBAD THEN MAXBLKBAD=BLKBAD
VMAX(VALSMAX)=VMAX(VALSMAX)+1 : TMAX(TOTMINMAX)=TMAX(TOTMINMAX)+1 ' TALLY RESULTS
PRINT "BLOCK ";BLOCK;": BLKLINESBAD ";BLKBAD;BLKBADA;BLKBADOVR;" OVERS, MAX ";MAXBLKBAD;
PRINT ", VALSMAX ";VALSMAX;VMAX;" // TOTMINMAX ";TOTMINMAX;TMAX
PRINT "OVERS:";BBOVR$
PRINT #OUT,N;" BLOCK ";BLOCK;": BLKLINESBAD ";BLKBAD;BLKBADA;BLKBADOVR;" OVERS, MAX ";MAXBLKBAD;NEWLINE$
PRINT #OUT, ", VALSMAX ";VALSMAX;VMAX;" // TOTMINMAX ";TOTMINMAX;TMAX;NEWLINE$
PRINT #OUT, "OVERS:";BBOVR$;NEWLINE$
JUNK$="" : ' IF VALSMAX>2 THEN INPUT "WAITING ";JUNK$
IF JUNK$="STOP" THEN 900
VALSMAX=0 : TOTMINMAX=0 ' RESET FOR NEXT BLOCK
ELSE
SAMECNT = 0
PRINT PAST$(0) ' NOT END OF BLOCK - KEEP DUMPING
BLKBAD += 1
ENDIF
ELSEIF SAMECNT>SAMEMAX ' NOT CURRENTLY OUTPUTTING
IF SAME
SAMECNT += 1 ' SAME BLANKNESS CONTINUES
ELSE
SAMECNT = 0 ' START DUMPING A BLOCK OUT
BLOCK += 1
PRINT "BLOCK ";BLOCK
PRINT PAST$(3)
PRINT PAST$(2)
PRINT PAST$(1)
PRINT PAST$(0)
BLKBAD=1 ' START COUNTING BAD LINES IN THIS BLOCK
ENDIF
ELSE
IF SAME
SAMECNT += 1
ELSE
SAMECNT = 0
BLKBAD += 1
ENDIF
PRINT PAST$(0) ' CONTINUE OUTPUTTING
ENDIF
' IF LD>4 THEN 900
' IF NOT EOF(F(1)) THEN 100
FOR I = 1 TO 9
IF INFILE$(I)<>""
IF EOF(F(I)) THEN 800
ENDIF
NEXT I
GOTO 100 ' LOOP BACK UP TO ANOTHER SET OF SAMPLES IF NO EOF
800
PRINT "EOF"
900
PRINT
PRINT N;" BLOCK ";BLOCK;": BLKLINESBAD ";BLKBAD;BLKBADA;BLKBADOVR;" OVERS, MAX ";MAXBLKBAD
PRINT ", VALSMAX ";VALSMAX;VMAX;" // TOTMINMAX ";TOTMINMAX;TMAX
PRINT "OVERS:";BBOVR$
PRINT #OUT,N;" BLOCK ";BLOCK;": BLKLINESBAD ";BLKBAD;BLKBADA;BLKBADOVR;" OVERS, MAX ";MAXBLKBAD;NEWLINE$
PRINT #OUT, ", VALSMAX ";VALSMAX;VMAX;" // TOTMINMAX ";TOTMINMAX;TMAX;NEWLINE$
PRINT #OUT, "OVERS:";BBOVR$;NEWLINE$
FOR I = 1 TO 9
IF INFILE$(I)<>""
CLOSE #F(I)
ENDIF
NEXT I
PRINT #OUT, DATE$;" ";TIME$;" END OF CD rot DUMP3 27JAN12";NEWLINE$
PRINT #OUT,;NEWLINE$ : PRINT #OUT,;NEWLINE$
CLOSE #OUT
PRINT DATE$;" ";TIME$;" END OF CD rot DUMP3 27JAN12"
STOP