Topic: Have a working 'expander' based on DolbyA (not same design) -- works well.

Commentary about doing an encoder and/or future work

Reply #125
I have gotten questions about the DolbyA decoder, asking why I didn't do a DolbyA encoder as well.   Well -- there are at least two reasons. The first and most important is that most of the problem is copying music from old archives so that it can be used (processed/mixed/finalized/produced) with current technology.  The second reason is that DolbyA is so much weaker as NR than more recent technologies that I don't suggest using it for encoding.  Perhaps I could do an encoder that produces much less distortion than anything else (just as the decoder has less distortion than anything else), but why encode into DolbyA?   If/when I do a DolbySR decoder, then an encoder using that technique might be more useful.  The big problem with SR is that it is incredibly more complex than DolbyA.

Please note that this DolbyA compatible decoder truly sounds similar to a real DolbyA (sans the fuzz, distortion, and lack of clarity), with a very similar frequency response balance.  (Even if someone uses the Cat22/360/361 as a design basis in digital form, the result will not likely sound similar, because the filters don't emulate well.  I found that my decoder would sound similar to (but cleaner than) another known DolbyA decoder when I used the DolbyA design as a reference.)  I rejected those filters -- and I was worried about whether I could come up with something compatible that sounded more accurate, but I was lucky to find a better solution.

Doing a REALLY compatible/similar-sounding DolbyA encoder would be a project similar in scope to the decoder project, but with even fewer users and less usefulness.   Frankly, even if I had a reel-to-reel deck, I would NOT bother encoding anything into DolbyA form.  DolbyA MIGHT be more useful than SR for very long-term archival purposes (because of the simpler DolbyA decoder design), but still -- I'd try to find something BETTER QUALITY than DolbyA.   Knowing what I know now, I am somewhat suspicious of the quality of any of the dynamic gain schemes that cannot mathematically be reversed, and DolbyA (while close to being reversible) is not reversible enough.

My criteria for a long-term analog-compatible NR (and possibly dynamic range extension) system would be a constant-compression-ratio, multi-band system with mathematically designed, analog & digital compatible filter characteristics.  At least, if properly executed and run on a deck with a fairly flat response, it would be totally reversible (with the distortion products cancelling even better.)
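Just to illustrate why a constant compression ratio is what makes a system mathematically reversible, here is a toy sketch in C (NOT a proposed implementation -- just the arithmetic): a 2:1 dB-linear compander, where the decode is an exact inverse of the encode.

    /* Toy 2:1 dB-linear compander: the encoder halves the level in dB,
       the decoder doubles it back -- an exact mathematical inverse.
       Level-dependent schemes like DolbyA cannot make this guarantee. */
    float encode_level_db(float in_db)  { return in_db * 0.5f; }
    float decode_level_db(float enc_db) { return enc_db * 2.0f; }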

So, if someone wants to use a nearly analog-compatible NR system, I suggest something closer to the HI-COM (AFAIR -- not sure) type of design.   The multi-band approach is good, but the DolbyA filters are kind of finicky, and I'd rather see the system designed from a specification rather than from a HW design.   If there is interest, and it would really be used if it works, I could do a rough specification and an implementation of a HW- and SW-compatible design that has the best features of DolbyA and DBX.   One good thing about a compatible SW design is that it can be prototyped using software modules that act very similarly to real hardware.  For example, I'd base the design on dB-linear technology (like the THATCORP stuff), and use standard filter design techniques that can be implemented in both HW & SW, e.g. well-constrained IIR filters that emulate well in HW.  FIR filters can be more ideal, but are also not easy to emulate in HW.   After doing a rough design and a SW implementation (probably 2X easier than my DolbyA effort), a real HW design could be started.  Before that, I'd do as much of a SPICE simulation as possible.

The end result of such an effort would be at least 25dB of NR, almost no level sensitivity, almost no modulation-type noise, very good transient response, and much better distortion than almost any other system.  Also, encoding/decoding could be done on a computer or in hardware, and the encoded result could be designed to be listenable.   So, it would have all of the advantages of DBX, DolbyA and DolbySR, and almost none of the disadvantages of any of them.

But doing new DolbyA encoding operations is only useful in museums where there are demos of ancient technologies :-).


Usage hint for the DolbyA compatible decoder

Reply #126
I have a usage hint -- and I'll let you know how I normally use the decoder.  Except for producing my own listening archives or demos, I use the decoder in realtime most of the time.  When I use it, I don't use the default quality settings, but usually the highest quality setting available.  There is also a 'heroic effort' setting for when there might be a lot of high frequency intermod (e.g. lots of kids' voices will do it.)

The normal setting runs a bit more quickly than the higher quality settings, and an upcoming version runs about 20% faster in general (at least that much better -- I just started on optimizations.)  However -- we are talking about listening quality here...

There is a very aggressive intermod removal mode, which doesn't cause much of a change in frequency response or anything like that, but helps to keep voices and instruments clearly separated (intermod makes things mush together.)  The magical setting is: "--ai=high", which means set the anti-intermod to high.   There is another, super-aggressive setting, which uses the improvements of the --ai=high mode but also narrows the frequency bands that lend themselves to distortion production.  It doesn't really decrease the ultimate frequency response, but it can make the music have a little less clarity.  That aggressive setting is "--ai=max".

The default mode is "--ai=med", but you should never need to specify it because it is currently the default.  The biggest disadvantage of the --ai=high mode is that it runs about 20% slower than the --ai=med mode.  Once I speed up the code to run perhaps 2-3X faster (yes, I have some ideas), I might move the default to be --ai=high at that point.

For casual listening, I always use "--ai=high", because my computer is fast enough.  The perhaps overly aggressive "--ai=max" doesn't run any slower than "--ai=high"; it is just that some of the parameters are slightly different.
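For reference, a full decode with the high quality setting looks something like this (the file names are just placeholders, and the sox portions follow my example further down the thread -- check the sox docs for your own material):

    sox input.flac --type=wav --encoding=floating-point --bits=32 - rate -v 96k | da-avx --ai=high | sox - output.flac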


Answer to possibly encountering 1/2 encoded DolbyA

Reply #128
Thanks for that pointer!!!  I have been hoping to find more concrete evidence that the 1/2-encode technique was used, and below are my comments on my own possible encounters with that technique.   DolbyA just about maxes out the fastest compression that can be done in HW without splatting intermod all over the place -- so I can understand the use for compression only.

I do agree that some examples of my DolbyA demos just might be playback of 'half encoded' DolbyA HF-compressed music (but unlikely in the present cases.)  My earlier decoder releases had a terrible bug that made me think that there was more HF-only use of the two HF channels than there really was.  I had even advertised that some of my Carpenters examples failed because the specific Carpenters music was enhanced by a partially disabled DolbyA.  Practically every major step of development of my DolbyA decoder has been done under public scrutiny, and versions until about 3-4 weeks ago had a bug where some kinds of music wouldn't work right -- almost sounding as if the decoder was trying to decode 1/2-encoded music -- but that was wrong.  The bug was in the decoder.  Some of my decoding examples DO show that there might still be some latent HF DolbyA encoding, but in those cases I believe there just might have been HF-only use of the DolbyA encoder -- especially on vocal channels only.

However, my guess that I was trying to decode 1/2-encoded DolbyA was WRONG IN THE SPECIFIC CASES.   I had made a rather frustrating error in the feedback-to-feedforward conversion.  (My expander is a compatible feedforward design -- it works MUCH better than feedback, with much better control of intermodulation.)  So -- after some embarrassment, I had to admit that the first versions of my DolbyA decoder were broken for some kinds of long-duration changes in music patterns.  Frankly, it was just broken, and I had to rework the decay times, just as the attack times had to be done differently in the conversion between feedback and feedforward.  A simple use of 1msec/30msec and 2msec/60msec attack/decay times for the HF and LF/MF channels would have been terribly broken for a feedforward design -- even though tests showed that it ALMOST sounded like it would work.

When doing ad-hoc comparisons of the decoding results, a simple use of the straightforward 1/30 and 2/60 attack/decay pairs in feedforward ALMOST sounds correct -- and is correct enough in perhaps 1/2 of the decoding attempts to sound okay.  But that is simply wrong.  I now have an almost 100% capable design where even the most evil and syncopated timing will cause the decoder to respond correctly.  A few weeks ago, I got a rather frustrating complaint that the decoder didn't work correctly on 'Band On The Run'.   After finally believing that complaint to be true (getting past my ego), I revisited the basics of my design, realized that the bug was certainly my own, and corrected the decay code so that EVERYTHING so far that seems to be completely DolbyA encoded is properly decoded.
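For anyone curious, the 'straightforward' feedforward form I keep referring to is the textbook one-pole level tracker sketched below in C.  This is NOT my decoder code (the real attack/decay shaping runs hundreds of lines -- which is exactly the point); it just shows where the naive 1msec/30msec pairs plug in.

    #include <math.h>
    /* Textbook feedforward level tracker with separate attack/decay
       time constants (e.g. 1 msec / 30 msec for an HF band). */
    float track_level(float in, float env, float fs)
    {
        float a_att = expf(-1.0f / (0.001f * fs));  /* ~1 msec attack  */
        float a_dec = expf(-1.0f / (0.030f * fs));  /* ~30 msec decay  */
        float x = fabsf(in);
        float a = (x > env) ? a_att : a_dec;        /* rising vs. falling */
        return a * env + (1.0f - a) * x;            /* one-pole smoother  */
    }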

I am not going to disclose the technique -- because I have gotten feedback that this decoder (my design is decoder-only) is much more compatible than ANY OTHER software technique available, and also has much less distortion in difficult cases.  I have a proprietary copy of music, provably DolbyA encoded, where a true DolbyA decode produces a chorus of children's voices that sounds like a blob of voices, the Satin sounds like it has a messed-up HF balance, and mine sounds nearly like a real DolbyA except that the children's voices are clearly distinct -- while staying closer to the real DolbyA HF/MF balance.  This is partially due to the vastly superior intermod handling, where the amount of intermod is much closer to that which is theoretically necessary (especially in --ai=high mode.)  Also, a direct conversion of the commonly available DolbyA schematics to SW will definitely result in sound that is more similar to the Satin.  That technique will produce something that kind of works, but it won't sound like a real DolbyA.

I am still thinking about possible improvements, and there are even some stealth capabilities in the decoder which I am not disclosing -- some errors in the DolbyA encoding process are compensated for.  I think that I have found the reason for that quality-loss syndrome, and some of the correction capability can be disabled by a switch.  There is almost NO reason to disable this feature, as I have heard no music which is damaged by the default setting.  There is a difference in audible character, but the disabled version only discloses more distortion from the ENCODING process.  The default setting uses a rather tricky technique to hide the distortion from the decoder.  I think that I even know the theoretical basis for that distortion.

Truly, if you want to hear music that is closer to the pre-DolbyA-encoded version than has previously been possible (including anything decoded by a real DolbyA or the Satin SW) -- my decoder will enable that.  Actually, '--ai=max' MIGHT be more accurate yet, but I believe that '--ai=high' is the best/safest bet.

I admit that my decoder is decoder-only, and it will be staying that way.  I don't think that DolbyA encoding is of high enough quality in this perfect digital world to be of any long-term future use.   There are too many flaws in the assumptions made, but it was a genius design for the middle 1960s.   Ray Dolby was a REAL genius.

I do have a better (easier to replicate in HW & SW) design concept that is closer to a Hi-Com (AFAIR) technique -- almost a mix between the DBX & DolbyA techniques, but with almost none of the disadvantages.   The amazing thing is that in the deep, deep future, it would even be easier to replicate in HW-only form if needed.  (There are some aspects of semiconductor physics which are more consistent than those the newer DolbyA design depends upon -- which is the reason for so damned many tweaks.)  For long-term (Library of Congress) type applications, I think that I have an idea for a better compander system which is very stable and depends mostly just on physics.  It only requires matching -- which is easy to do on chips; tweaking FETs is NOT so easy, because FETs do not replicate all that well on chips.  The original DolbyA design, which does have some advantages over the 360, also has some disadvantages, but it comes closer to what is needed for long-term archive recovery.

But, again, thanks for that pointer!!!


Possible (and likely crazy) attempt to remove the extra add-on DolbyA sheen

Reply #129
Apparently, in the olden days, some record companies used a partially disabled DolbyA unit to add an extra high-frequency sheen (or exciter-type sound) to their recordings.  (Apparently, CBS Records had that habit.)   My decoder can remove that sheen, but only if the entire recording was processed to create it, not just the voices.  Any settings using the decoder for that purpose would be very experimental, but I can give you some hints.

Firstly, you would use the 'sheen removal' only after the full-on DolbyA decode (either from my compatible decoder or using an unmodified true DolbyA unit.)   Then, the way to invoke my decoder would be something like this:

sox infiles.type --type=wav --encoding=floating-point --bits=32 - rate -v 96k | da-avx --lfoff --mfoff --ai=med --thresh=-15.75 | sox - outfiles.type  <also add needed EQ because of sheen removal>

Note that the full details of using sox need to be figured out from the sox documentation -- I am just showing the general gist of the 'sheen removal'.  Note the specific switches for the decoder program...  --lfoff and --mfoff turn off the dynamic gain on the 20-80Hz and 80Hz-3kHz channels.  By turning off the MF channel, some special decoding features are also disabled -- so the decoder acts more focused towards sheen removal.  Also, the threshold (--thresh=-15.75) will likely have to be quite different from the normal -15.00 to -16.00 values.

The additional EQ will probably look something like this:  "treble 3 3k 0.707q treble 3 9k 0.707q", but this suggestion just might be a little too strong or too weak -- not sure.

Another note -- do you see where I convert the input to floating point and 32 bits?  That is the optimum format for the decoder.  Also, the decoder has maximum performance between 88.2kHz and 124kHz; the performance falls off a little at 192kHz.  After the rework, I suggest there is never a real benefit to running at 192kHz unless there is a specific need to support 192kHz on the output.

Nowadays, the processing at 44.1kHz is pretty good, but it is almost always beneficial to bump the rate up to 48k before the decoder.  This is because the internal DSP operations work at exactly the sample rate, and there is little room for distortion removal and filtering when running at 44.1kHz.  44.1kHz is more of a distribution rate than an ideal production rate anyway.   The biggest benefit of running at 48k instead of 96k is that the decoder runs much faster; the quality is only barely noticeably better in the 88.2kHz through 124kHz range.
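So, for 44.1kHz material, the front of the pipeline would look something like this (placeholder file names again):

    sox input-44k1.wav --type=wav --encoding=floating-point --bits=32 - rate -v 48k | da-avx --ai=med | sox - output.wav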

There will be an upcoming new release (probably end of day today -- 14May2018 -- or early tomorrow); it runs about 10-20% faster and has a few very minor quality improvements.


Slightly faster (10-20%), slightly better decoder.

Reply #130
The new version is available here and on the repository site below.  I haven't updated all of the demos yet, but there are no major changes in sound quality.  This new version does some calculations in single-precision float rather than double precision.  (The reason for previously using double precision was an arcane behavior of the compiler that I am using.)  I reworked the calculation, which both allowed using single precision -- which is more than enough -- and also lets twice as many operations be done, because the attack/decay can now be calculated for both (L+R) channels at once rather than one at a time.

The slight improvement in quality results from the quicker calculation: better time precision can be provided by doing the longer decay calculations more time-accurately.  Also, the longer-term decay can be calculated further into diminishing returns, providing a better decay match in possible situations that I have not yet encountered.   Also, --ai=high and --ai=max are more different than before.   --ai=max removes more of the encoder 'hash' at the slight expense of certain dynamics at HF.  --ai=high continues to be the best, safest aggressive quality setting, while --ai=med allows slightly faster decoding and has a very slightly simpler decoding calculation (marginally closer to normal decoder behavior.)  Using both --ai=none and --raw=8192 gives a barebones decode which will seem brighter (the extra brightness is, believe it or not, distortion), but it is essentially the result of doing a traditional decode in software.  I only use those modes for development comparisons, but I thought that it might be interesting to show what all of the fast gain control can sound like; it shows one reason why SW-based compressor/expanders sometimes sound different than a HW equivalent.   Naturally, a barebones software design will produce much more distortion than HW because of the effects of sampling.  This is one case where higher-than-44.1k sample rates really DO make a difference, but it is purely because of the nonlinearity (producing signal components beyond 20kHz) and sampling.  The decoder is designed to fully mitigate those uglies.
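For the record, that barebones comparison mode is invoked with just those two switches together -- something like this (development comparisons only, placeholder file names):

    da-avx --ai=none --raw=8192 < input.wav > output.wav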

More detail:  because of how the compiler generated code for the newer and older CPUs, I did the math for attack/decay with a math vector of size 4 and a data type of double.  This fit perfectly the capability of the CPU and the number of audio bands on each of the L and R channels.  This also matched well when compiling for less capable CPUs, which supported a vector size of 4, but only in single precision.  So, the code was designed to use the different data type for the older CPUs -- but with absolutely no difference in quality.
I decided to do both the L and R calculations at the same time because newer CPUs can do 8 single-precision operations at once, so now both L and R attack/decay are calculated at the same time.   The effect on the older CPUs should also be either a slight improvement or break-even, because the locality of reference is a bit better.
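Roughly, the layout change looks like the sketch below (GCC vector-extension syntax, purely for illustration -- the real code is organized differently):

    /* Old: one channel at a time -- 4 bands as doubles fill one 256-bit register. */
    typedef double v4df __attribute__((vector_size(32)));
    /* New: 8 floats fill the same register, so the 4 bands of BOTH the
       L and R channels get their attack/decay smoothing in one operation. */
    typedef float v8sf __attribute__((vector_size(32)));

    static v8sf smooth(v8sf env, v8sf level, v8sf coeff)
    {
        /* same one-pole attack/decay update, now 8 lanes wide */
        return coeff * env + (1.0f - coeff) * level;
    }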

Anyway -- probably more details than you really wanted to know.




Found an attack time error -- fixed

Reply #131
Someone on another forum turned me on to Howard Jones, and was curious whether the specific recording was DolbyA encoded.  Well, I found that it was, but there was something wrong with the decoding.  After listening to a few more pieces, I found the problem: it was related to the anti-intermod parts of the code (but not the anti-intermod code per se.)  So, I pulled back and reverted the code to an earlier version, and the attack time problem (however small) is now gone.   The newer code was just a little too aggressive in trying to avoid unnecessary intermod; the code now enabled is actually better all around.

There are so many variables in the code -- especially since this decoder goes FAR BEYOND a basic decoder in trying to extract every bit of quality out of the music.  Anyway -- here is a new one -- and this release IS justified.

New version -- not worth downloading if you got the previous version

Reply #132
This version has some clean-ups -- per some commentary from another forum's leader.  It makes some documentation a little more accurate/clear.  Also, the code rejects mono files now (before, it just went nuts.)  Two new switches have been added (for Windows convenience): you can specify --input=filename or --output=filename instead of using '<' and/or '>'.   This is meant for Windows sensibilities, not really any underlying change.
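So, on Windows, the two invocations below should behave identically (placeholder file names):

    da-avx --ai=high --input=input.wav --output=output.wav
    da-avx --ai=high < input.wav > output.wav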

For the first time, we ran a harmonic distortion analysis -- and the distortion is unmeasurable at the normal settings of the Audacity spectrum analyzer.  A real DolbyA does show some distortion.

The major new thing learned is that the program drops about 3k samples during the conversion.   This is true for every version, and hasn't been fixed yet.  Just letting you know, and it will be fixed soon!!!

Also, MAKE SURE YOU USE THE --outgain=-3 switch on EVERY version.  Because of precise emulation of a real DolbyA in certain regards, it will clip if you supply 0dB input, and --outgain=-3 keeps that from happening.  (Actually, it has 0.33dB of excess gain, so you can fix it in whatever way you want.)

Again, don't bother downloading if you have the 16may version (unless you like to always keep up with the latest-latest :-).)


Really 'cool' demo song.

Reply #133
Minor off-topic, not actually technical -- but related to the current subject...

It sometimes isn't easy to find a really good demo of how the decoder maintains the stereo image (and unflattens the depth) and actually sounds REALLY natural.  This demo has had no 'love' put on it -- that is, no EQ.  This is a raw decode from my original DolbyA encoded copy.  REALLY NEAT!!!  It doesn't have much artificiality, so you can really hear if something bogus is happening.   The only thing that I did was normalize it to -0.25dB.   I don't have space for the original, but I will provide it if really asked.

The file:  Nat-LOVE-DAdecode.mp3



Decoder update -- moderately helpful

Reply #134
This update gives one audio quality improvement, one audio quality fix, and a usage feature.

The audio quality improvement is for --ai=max mode only, where the intermod is controlled even better (smoother sound without loss of detail.)  There is an additional slowdown, but the extra calculation is worth it, if needed or desired.

The audio quality fix is for a calculation error in the 3k-9k and 9k-20k ranges.  You might notice slightly crisper, more detailed HF sound without an increase in intermod.  In fact, the intermod might be slightly decreased, with better clarity.  I made a math error in one of the 'careful' attack time calculations (the attack/decay time calculations run for a total of about seven hundred lines in two separate sections -- not simple -- VERY careful shaping.)  The error was the truncation of a variable name and the mistaken use of a zero-initialized variable instead of a specially filtered signal level.  Because it was used in a 'max' calculation and was part of shaping the attack time, it wasn't fatal, just not good.  It was an obvious transcription error from a previous version of the calculation.

The usage feature is the ability to display the gains of the individual channels instead of the average gains.  So, there is one extra line per 1-second display if you specify '--info=10'.  It isn't fully documented yet, but this is a heads-up that the feature exists and is intended to be permanent.
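For example (placeholder file names), the per-channel gain display is enabled like this:

    da-avx --ai=high --outgain=-3 --info=10 --input=input.wav --output=output.wav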
(I don't intend on working on this today -- so this will be the last version released today -- unless I get a complaint about a major failing.)

Helpful usage note...

Reply #135
I have been chasing down some of the last of the edginess-like distortion, and I found some filter issues which I am addressing.  However, there is a quick RIGHT-NOW workaround for those who have copies of the decoder that they are using, and it is fairly easy.

Through studying the filters and the skirts of the filters, I found that the ideal sample rate for quality is between 56k and 64k samples per second.  Those rates have the advantage of being high enough above the minimum necessary sample rate to help avoid any noticeable aliasing with practical filters (if something leaks beyond 20kHz, there is enough room to filter it out.)  Additionally, as the sample rate goes up, fixed-size filters become less effective at lower frequencies.  So -- running at 96k does work, but it produces lower quality than running at, say, 64k.

The filters in the processor currently only partially support 'wide' mode, where the 21.5kHz limitation is not enforced, so running at a very high sample rate has no benefit at all given the structure of the decoder (the filters being used, etc.)  Given that out-of-band filtering is really necessary for the best audio quality (internal out-of-band filtering -- there is a lot of nonlinear stuff going on), there is actually a real disadvantage to allowing more than a flat 21.5kHz bandwidth.

From now on -- to infinity -- I am going to be running/testing predominantly at 56k-64k (still haven't decided), but that does NOT limit the sample rates being used for the data source and data sink (I ALWAYS use sox for the conversions.)  The decoder is planned to always work at 44.1k through 192k; it just might not be as good as when running at, for example, 64k.  Actually, there is a mathematical reason for choosing around 64k as the ideal rate -- it happens to be approximately ideal given a lot of arcane DSP factors.
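Concretely, with the source and sink at some other rate, the sox conversions would look something like this (placeholder names; pick anything in the 56k-64k range):

    sox input-96k.flac --type=wav --encoding=floating-point --bits=32 - rate -v 64k | da-avx --ai=high --outgain=-3 | sox - output.flac rate -v 96k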


New version -- just quality improvements

Reply #136
New version of the DolbyA compatible decoder.  The best yet.

You can check the location above -- sometimes I forget to post about updates.  More and more people are realizing that they have DolbyA encoded material and didn't know it!!
