HydrogenAudio

CD-R and Audio Hardware => Audio Hardware => Topic started by: danielm on 2012-07-07 18:13:28

Title: Bell Speakers
Post by: danielm on 2012-07-07 18:13:28
Okay, so I hope this is the right place to post this, but I had an idea (and very little actual know-how) and wanted to know what people in the know thought of it. Okay, so here goes... stop me if any of my presumptions are faulty, but...


(1) recorded music is just a complex waveform over time (different frequencies superimposed on one another)
(2) "regular" speakers recreate this with an electromagnet and a diaphragm...

so here is the idea... could a recording of, say, a person singing, be discernibly (not perfectly by any means) reproduced using a rig as described below?

(1) x number of precisely tuned bells, such that, given the available frequencies, most frequencies within the human hearing range can be more or less recreated by some combination (i know there is an embedded math problem, but my uneducated instinct says some application of the birthday paradox (http://en.wikipedia.org/wiki/Birthday_paradox)) of the given bells
(2) each bell fitted with an individual ringing mechanism (servo motor?), possibly something resembling a bicycle spoke with multiple strikers on a wheel that can be spun quickly to emulate some sustain.
and
(3) a computer and program to translate the given audio file into something the bell rig can play.




Do you think this is possible? Are there any huge holes in my logic or understanding? Something I am not thinking of? Bear in mind you are speaking to someone who studied russian lit and only has a passing familiarity with this stuff. Thanks in advance for your time or comments.
Title: Bell Speakers
Post by: pdq on 2012-07-07 18:47:45
First off, you would need to be able to stop each bell ringing as quickly as you start it.

Second, a bell (any kind that I know of) doesn't emit a single frequency.

More importantly, you are not just talking about a bunch of independent sine wave sources. Each frequency needs to be in exactly the right phase with respect to its associated frequencies.

All of this makes what you describe pretty close to impossible.
Title: Bell Speakers
Post by: danielm on 2012-07-07 18:54:10
how disappointing, thanks pdq
Title: Bell Speakers
Post by: markanini on 2012-07-07 19:00:53
Sounds fun but it couldn't possibly be hi-fi. For starters, the partials in voices, instruments, etc. don't always correspond to the partials of bells, so the result would be many residual tones. It would be an awesome musical instrument though.
Title: Bell Speakers
Post by: danielm on 2012-07-07 19:04:04
markanini, yes, hi-fi is not the goal at all. God i want to hear what it would sound like
Title: Bell Speakers
Post by: Kees de Visser on 2012-07-07 19:41:32
It's an interesting idea for sure. A similar attempt has been made to make a piano "speak".
Unfortunately the video is in German, but is subtitled and the "spoken" words are in English, so you'll get the idea.
http://www.youtube.com/watch?v=muCPjK4nGY4 (http://www.youtube.com/watch?v=muCPjK4nGY4)
Title: Bell Speakers
Post by: danielm on 2012-07-07 21:10:03
Quote
It's an interesting idea for sure. A similar attempt has been made to make a piano "speak".
Unfortunately the video is in German, but is subtitled and the "spoken" words are in English, so you'll get the idea.
http://www.youtube.com/watch?v=muCPjK4nGY4 (http://www.youtube.com/watch?v=muCPjK4nGY4)

this is incredible... yes, just like this but with bells!
Title: Bell Speakers
Post by: dhromed on 2012-07-08 20:57:51
Next step is the reverse: are there people trained in making any sound they like, as though they were speakers?

I imagine it'll sound like a telephone. 
Title: Bell Speakers
Post by: mixminus1 on 2012-07-08 22:31:29
Michael Winslow - who played Larvell Jones in the Police Academy movies - comes to mind.
Title: Bell Speakers
Post by: mzil on 2012-07-09 00:16:24
I suspect the piano speaking video is fake. [Although I don't speak German, so perhaps they explain more than I am aware of] It is a trick. We are NOT hearing just an unmodified piano. We are hearing either a gimmicked piano, or a secondary "enhancement" soundtrack has been mixed in. The giveaway is the clarity of some of the voiceless fricatives (such as the "/s/" in "responsible")
[Examples of voiceless fricatives may be heard in the demonstration videos of a face to the far right, here. Click "Fricative" and then try the five shown: http://www.uiowa.edu/~acadtech/phonetics/e...mp;scrollbar=no (http://www.uiowa.edu/~acadtech/phonetics/english/frameset.html?resizable=yes&width=700&height=450&menubar=no&titlebar=no&statusbar=no&scrollbar=no)
should you need a better understanding of what they are]
which a piano would have a hard time emulating.

Here's another video and notice how much harder the words are to make out [in fact nearly impossible if you close your eyes and stop reading the text accompanying the sound.] Try it and see how few words you make out!

http://www.youtube.com/watch?v=bFsRe6YSWws (http://www.youtube.com/watch?v=bFsRe6YSWws)

I suspect on this second version I found, they have scaled back the "enhancement track".
Title: Bell Speakers
Post by: Dynamic on 2012-07-09 09:09:04
Bizarre as it may seem I think it's probably real. I hadn't seen it on youtube before but heard this many months ago on a podcast - probably Scopes Monkey Choir. I think pareidolia (sp? - ie. expectation bias) helps us hear what we expect to hear (hence the words help). I think fricatives are plausible given the impulsive nature of the transient at the start of a piano note. The sustain and decay parts sound pretty tonal, which would be the case. I seem to recall the podcast talking about an academic paper written about this piece, and how, once the timing was tight enough with the electronic actuators, the choice of notes to be played was an attempt at a best fit for the spectrum of a superposition of piano notes to the time-varying spectrum of the human speech it was trying to reproduce. They, I think, played the original human speech, which I think was a child making a declaration about human rights or something before the European Parliament or something like that. It does help that piano notes can be curtailed rapidly by releasing the key and letting the damper mute the sound. I think the singing tone of the speech helps greatly to make it amenable to this form of reproduction.
Title: Bell Speakers
Post by: DVDdoug on 2012-07-09 18:45:19
Quote
(1) x number of precisely tuned bells, such that, given the available frequencies, most frequencies within the human hearing range can be more or less recreated by some combination (i know there is an embedded math problem,
danielm,

It almost sounds like you are describing the Fourier Transform (http://en.wikipedia.org/wiki/Fourier_transform). So we know bells would not work, but if you had some other sort of sound-generating device for "every frequency"*, you could accurately reproduce the human voice (or any other sound). There's nothing wrong with your thinking/reasoning... You just don't understand the physics/acoustics of bells. It's nearly impossible (maybe entirely impossible) to make a mechanical device that vibrates at a single pure tone without ringing. You can do it with electronics, but building such a device is, of course, not practical.

The Fourier Transform (or FFT = Fast Fourier Transform, or DFT = digital Fourier Transform)  is used "everyday" in DSP (digital signal processing), including audio processing.    But, the data is converted-back into the "normal" time-domain before being converted to analog and sent to a loudspeaker.





* There is no such thing as "every frequency", since frequency is not an integer value...  It's a continuous real value (http://en.wikipedia.org/wiki/Real_number) (like distance), and there are an infinite number of frequencies within the audio range.  But, our ears/brains don't have infinite resolution.
Title: Bell Speakers
Post by: benski on 2012-07-09 19:36:54
But an FFT has both a magnitude component and a phase component (more accurately, a sine and cosine response from which magnitude and phase can be computed).  Getting the frequency-magnitude aspect correct is the easy part.  In fact, some digital additive synthesizers such as the Kawai K5 had a "resynthesis" feature that would create the required harmonic envelopes to roughly match a sampled waveform.  But the precise control over phase was not present. 
For a purely mechanical device, the largest limitation would be the length of sound that could be recreated.  In order to reproduce a longer sound, you would need more sound generating devices (just as the needed size of an FFT scales linearly with the number of time-domain samples).
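
To make the magnitude/phase point concrete, here is a minimal sketch in plain Python (not anyone's actual implementation). It runs a naive DFT on a made-up test tone and reads the amplitude and the phase back out of one bin. The block size, tone frequency, amplitude and phase offset are all assumptions picked for illustration.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, O(N^2); fine for a small demo."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

N = 64  # block length (assumed)
# Hypothetical test tone: 5 cycles per block, amplitude 0.8, phase offset 0.6 rad.
signal = [0.8 * math.cos(2 * math.pi * 5 * t / N + 0.6) for t in range(N)]

bins = dft(signal)
# Each bin is a complex number: its real part is the cosine response and its
# imaginary part the (negated) sine response.
magnitude = abs(bins[5]) * 2 / N   # rescale the bin to the tone's amplitude
phase = cmath.phase(bins[5])       # angle of the bin, in radians

print(round(magnitude, 6), round(phase, 6))
```

A bell rig that only gets `magnitude` right is doing the "easy part" described above; `phase` is the piece a struck bell gives you no handle on.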
Title: Bell Speakers
Post by: Porcus on 2012-07-10 05:13:37
Quote
It's nearly impossible (maybe entirely impossible) to make a mechanical device that vibrates at a single-pure-tone without ringing.    You can do it with electronics, but, building such a device is, of course, not practical.


Following this train of thought:

A loudspeaker working this way, would have a very large number of speaker elements, each doing 'only one frequency'.
That is, essentially an N-way loudspeaker, for very large N (finite, by the limitations of human hearing) and very steep crossovers – DSP'ed, I'd guess.

But the single-pure-tones (i.e. sines) constitute but one basis for the vector space. Who says we should use that one? We could replace it with your favourite basis (wavelet, whatever ... it need not even be orthogonal!), and feed each of the N loudspeaker elements one basis vector.

Now who says an array of bells cannot form such a basis? 'Practicalities' would be this 'who' of course, but in principle? Indeed, the talking piano is a projection down to a subspace of dimension eighty-something, with piano strings for the loudspeaker elements.

And reducing dimensionality – that is, reducing N – is really a kind of lossy compression, decoded 'at the loudspeaker level' – or if you like, in the air in front of the elements. And by choosing a different basis, you might optimize to reduce artifacts (of which the piano had a few ... oops, TOS#8).
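
A rough sketch of the "projection onto a non-orthogonal basis" idea, in plain Python. The two basis vectors stand in for the sounds of two hypothetical loudspeaker elements (or bells), and the target is a made-up three-sample signal; none of these numbers come from a real instrument. The weights are found by solving the normal equations, which is one common way to do a least-squares fit, not necessarily what anyone in the piano project did.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def fit_two(target, b1, b2):
    """Least-squares weights (w1, w2) minimising |target - w1*b1 - w2*b2|^2.

    Solves the 2x2 normal equations G w = r by Cramer's rule, where G is the
    Gram matrix of the basis. The basis does not have to be orthogonal.
    """
    g11, g12, g22 = dot(b1, b1), dot(b1, b2), dot(b2, b2)
    r1, r2 = dot(b1, target), dot(b2, target)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    return (r1 * g22 - r2 * g12) / det, (g11 * r2 - g12 * r1) / det

# Toy, non-orthogonal basis (assumed values).
b1 = [1.0, 0.0, 0.0]
b2 = [1.0, 1.0, 0.0]
target = [2.0, 3.0, 4.0]

w1, w2 = fit_two(target, b1, b2)
approx = [w1 * x + w2 * y for x, y in zip(b1, b2)]
residual = [t - a for t, a in zip(target, approx)]

print(w1, w2, approx, residual)
```

The residual is whatever part of the target lies outside the span of the chosen elements, which is the "lossy" part of the compression analogy.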



So ... here is a research project:
- build such a loudspeaker. Heck, some manufacturer of computer-grade speakers should sponsor this; I doubt we will need the high-end.
- Pick various basis choices. Play. Listen. Pick more bases. Tune. ABX with different music (what statisticians call out of sample).
- and whatever you do, don't forget to youtube it!
Title: Bell Speakers
Post by: dhromed on 2012-07-10 09:02:12
Quote
DFT = digital Fourier Transform


*discrete fourier transform.
Title: Bell Speakers
Post by: 2Bdecided on 2012-07-10 10:16:39
Of course it's possible. The question is only how bad it would sound (i.e. how close an approximation to the original can you create).

With bells you'd get a horrible racket because of all the harmonics, though you could try to take account of that in the analysis and design bells with purer/nicer harmonics (that's been done).

If you used something closer to a sine wave with easily controlled start+stop (e.g. blowing air down a tuned tube), it would probably be easier.


Regarding phase: if you FFT something, reset all the phase information, and then reconstruct the waveform with this (zeroed) phase information, the result sounds horrible, but not unrecognisable. The block length of the FFT imposes its signature strongly on the output in this simple experiment, but that effect could be reduced. You don't have to use a fixed block length FFT. You can even use the data from different block lengths at different frequencies.

Cheers
David.
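
The zeroed-phase experiment described above can be sketched in a few lines of plain Python. This is only an illustration on one short block: the block length and the two test tones are made up, and the naive DFT is used for clarity rather than speed.

```python
import cmath
import math

def dft(samples):
    """Naive forward DFT."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

def idft(bins):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(bins)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(bins)) / n).real
        for t in range(n)
    ]

N = 32  # block length (assumed)
# Hypothetical input: two sines with different phases.
block = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N + 1.0)
    for t in range(N)
]

spectrum = dft(block)
zeroed = [complex(abs(c), 0.0) for c in spectrum]  # keep magnitude, reset phase to 0
rebuilt = idft(zeroed)

# Same magnitude spectrum, different waveform.
mag_diff = max(abs(abs(a) - abs(b)) for a, b in zip(spectrum, dft(rebuilt)))
wave_diff = max(abs(a - b) for a, b in zip(block, rebuilt))
print(mag_diff, wave_diff)
```

With every phase set to zero, all components peak together at the start of each block, which is one way to picture the block-length "signature" mentioned above.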
Title: Bell Speakers
Post by: danielm on 2012-07-17 00:13:16
Okay, I get that bells ring at a number of frequencies... but isn't that over time? If you were to cut off the sound almost immediately after striking, isn't the beginning of the sound (the attack, I believe you audiophiles call it) a fairly uniform frequency? Like, after ringing the bell it starts somewhere and gradually shifts frequencies as it loses energy? Or is that a complete misunderstanding, and they all always are ringing at multiple harmonics?
Title: Bell Speakers
Post by: 2Bdecided on 2012-07-17 10:32:12
They are all always ringing at multiple harmonics.

The brief initial attack itself is percussive, and has even less well defined tonal qualities. It's more like a click.

Cheers,
David.
Title: Bell Speakers
Post by: mzil on 2012-08-15 00:04:28
Quote
I suspect the piano speaking video is fake. [Although I don't speak German, so perhaps they explain more than I am aware of] It is a trick. We are NOT hearing just an unmodified piano. We are hearing either a gimmicked piano, or a secondary "enhancement" soundtrack has been mixed in. The giveaway is the clarity of some of the voiceless fricatives (such as the "/s/" in "responsible")
[Examples of voiceless fricatives may be heard in the demonstration videos of a face to the far right, here. Click "Fricative" and then try the five shown: http://www.uiowa.edu/~acadtech/phonetics/e...mp;scrollbar=no (http://www.uiowa.edu/~acadtech/phonetics/english/frameset.html?resizable=yes&width=700&height=450&menubar=no&titlebar=no&statusbar=no&scrollbar=no)
should you need a better understanding of what they are]
which a piano would have a hard time emulating.

Here's another video and notice how much harder the words are to make out [in fact nearly impossible if you close your eyes and stop reading the text accompanying the sound.] Try it and see how few words you make out!

http://www.youtube.com/watch?v=bFsRe6YSWws (http://www.youtube.com/watch?v=bFsRe6YSWws)

I suspect on this second version I found, they have scaled back the "enhancement track".


Peter Kirn, who knows much more about music synthesis than I do, concurs with me (upon his second examination). He notes in his edited text that there is an enhancement track mixed in, which he believes to be simply the original speech. We are NOT hearing just a piano, all by itself, in the YouTube video:

"Edit: Listening again, the short answer to how you can hear so much of the voice through the piano seems to be, you can’t; the original is almost certainly mixed in. It’s nonetheless an interesting effect, and I’d like to hear the piano on its own."

http://createdigitalmusic.com/2009/10/the-...-audio-to-midi/ (http://createdigitalmusic.com/2009/10/the-speaking-piano-and-transforming-audio-to-midi/)
Title: Bell Speakers
Post by: knutinh on 2012-08-15 08:12:18
Actually, the Hammond organ predates the additive synthesizers by several decades (and some lesser known instruments before it).

It had a "free-running" "sine-generator" of 91 pitches, and each key could mix 8 harmonically related pitches using a global set of mixing parameters ("drawbars").

Phase was free-running (fixed phase relationship between the 91 sines; pressing a key basically just triggered an envelope). Interestingly, when playing polyphonically, any sine that was used in several places would (necessarily) have the same phase everywhere. Also, mechanical limitations meant that pitches had to be rounded, not necessarily well-tempered.


Whenever a classical composer makes a score for 100 musicians, isn't that sort of the same thing? Synthesizing some complex waveform using a large(ish) set of other complex waveforms. Now, composers might not think of the instruments as vectors in a large space, and musicians might not be so strict about following the score as a computer.

-k
Title: Bell Speakers
Post by: Porcus on 2012-08-15 08:37:40
Quote
Whenever a classical composer makes a score for 100 musicians. Isn't that sort of the same thing? Synthesizing some complex waveform using a large(ish) set of other complex waveforms. Now, composers might not think of the instruments as vectors in a large space


Those 100 are certainly not  playing e.g. voice. Probably not only because the vector space is too small, but also because it may fail to be closed under vector operations



@ mzil: Thanks for the update, you killjoy
Title: Bell Speakers
Post by: knutinh on 2012-08-15 09:54:55
Quote
Whenever a classical composer makes a score for 100 musicians. Isn't that sort of the same thing? Synthesizing some complex waveform using a large(ish) set of other complex waveforms. Now, composers might not think of the instruments as vectors in a large space


Those 100 are certainly not  playing e.g. voice. Probably not only because the vector space is too small, but also because it may fail to be closed under vector operations

Sure, but some classical composers seem more interested in recreating certain "timbres" rather than following established notions of tonality and rhythm. The art of composing for orchestra might consist of having a mental model of how an orchestra reacts to "stimulus" (score), an idea of how one wants the final waveform to sound, and then doing an inverse lookup to figure out how the score must be. Or I might be totally wrong and composers might just make a "pretty polyphonic song" on their piano, delegating each voice to an instrument.

As musicians are not machines, I doubt that it is possible to control an orchestra with the precision/predictability that is needed in order to make a convincing voice simulation. After all, the score is not a MIDI-file, but a coarse suggestion that is further reinterpreted by conductor and musicians. I would love to be proven wrong, though.

One might claim that church organs are crude "non-sinoid" additive synthesis instruments.
"A typical and distinctive sound of the organ is the cornet, composed of a flute and ranks making up its first four overtones, sounding 8', 4', 2⅔', 2', and 1⅗'."
http://en.wikipedia.org/wiki/Organ_stop (http://en.wikipedia.org/wiki/Organ_stop)
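
As a small worked example of why the cornet counts as additive synthesis: nominal pipe length is inversely proportional to pitch, so dividing the 8' reference by each rank's length gives the harmonic number it sounds. The sketch below assumes the standard cornet ranks of 8', 4', 2 2/3', 2' and 1 3/5'.

```python
from fractions import Fraction

REFERENCE_FEET = Fraction(8)  # the 8' rank sounds the fundamental

# Nominal rank lengths in feet (standard cornet composition, assumed here).
ranks = {
    "8'": Fraction(8),
    "4'": Fraction(4),
    "2 2/3'": Fraction(8, 3),
    "2'": Fraction(2),
    "1 3/5'": Fraction(8, 5),
}

# Halving the length doubles the pitch, so harmonic number = 8 / length.
harmonics = {label: REFERENCE_FEET / length for label, length in ranks.items()}

for label, h in harmonics.items():
    print(f"{label:>7} -> harmonic {h}")
```

So the five ranks line up with harmonics 1 to 5 of the played note: the fundamental plus its first four overtones, each supplied by a pipe that has overtones of its own.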

I think that this topic is interesting, and extends beyond purely additive synthesis. Say that you have got a synthesizer at your disposal. It offers a large set of parameters that generally inter-op in complex, non-linear ways. One parameter may choose between a large set of sampled waveforms. Another may set the cutoff frequency of a filter. A third may allow mixing different simple/complex oscillators. How do you resynthesize any waveform onto this synthesis engine (minimizing e.g. squared error) except for the obvious brute-force way?

-k
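
For reference, the "obvious brute-force way" can be sketched as a grid search. Everything here is hypothetical: `synth` is a toy two-parameter engine (an overall level and the amount of third harmonic), not any real synthesizer, and the target is generated by the same toy engine so that an exact match exists on the grid.

```python
import itertools
import math

N = 64  # samples per period (assumed)

def synth(level, third):
    """Toy synthesis engine: a sine plus an adjustable third harmonic."""
    return [
        level * (math.sin(2 * math.pi * t / N) + third * math.sin(6 * math.pi * t / N))
        for t in range(N)
    ]

def squared_error(a, b):
    """Sum of squared sample differences between two waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical target waveform, made with known settings.
target = synth(0.7, 0.3)

# Try every combination of settings on an 11 x 11 grid, keep the closest match.
grid = [i / 10 for i in range(11)]
best = min(
    itertools.product(grid, grid),
    key=lambda params: squared_error(synth(*params), target),
)
print(best)
```

The cost grows exponentially with the number of parameters, which is why this only works for toy engines; a real engine with dozens of interacting parameters needs something smarter than an exhaustive grid.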
Title: Bell Speakers
Post by: zima on 2012-08-18 06:15:26
Somehow this thread reminded me about http://en.wikipedia.org/wiki/Katzenklavier (http://en.wikipedia.org/wiki/Katzenklavier)

...any guesses as to how many "notes" might be needed in this case, to give an impression of human speech?
Title: Bell Speakers
Post by: Porcus on 2012-08-20 13:29:57
Quote
One might claim that church organs are crude "non-sinoid" additive synthesis instruments.


The number of stops could also be fairly impressive. Most likely you could get an organ to do much better talking than an orchestra could. (Why does System of a Down's “Cigaro” keep popping up in my brain?)
Title: Bell Speakers
Post by: DonP on 2012-08-20 16:19:37
This is a hipshot speculation to spur discussion.

Doesn't the ear, like our eyes, have a number of discrete wavelength receptors (hair cells)?  Unlike the eye, there are many more, and I don't know any reason to think that the wavelengths are the same for different people since the response would be based on size rather than chemistry.

Anyway, it would seem that with the correct frequencies (or "primary pitches") you could give "full spectrum" sound in the same way 3 primary colors can represent the whole visible spectrum
Title: Bell Speakers
Post by: Porcus on 2012-08-20 16:58:26
Quote
Doesn't the ear, like our eyes, have a number of discrete wavelength receptors (hair cells)?  Unlike the eye, there are many more, and I don't know any reason to think that the wavelengths are the same for different people since the response would be based on size rather than chemistry.

Anyway, it would seem that with the correct frequencies (or "primary pitches") you could give "full spectrum" sound in the same way 3 primary colors can represent the whole visible spectrum


The eye basically has four receptors, and if we can disregard the rod cells, then you have a three-dimensional colourspace. The analogy would be that you had a chord of three tones. However, the eye can also tell the colour of two nearby spots from each other, so each eye's perception of a still picture is a 'red-volume', a 'green-volume' and a 'blue-volume' for each pixel (down to the eye's resolution) on a two-dimensional surface. Each ear's perception of a sound is a volume for each 'frequency pixel' (that is, down to the ear's resolution). Here it looks like the eye catches much more (2 dimensions into 3 rather than 1 into 1), but it does of course boil down to resolution.

But to get a sound analogue -- with noise cancelling -- I guess you would have to stick to the light-through-a-slit experiment?

You need a time dimension, of course, as the problem about the bell speakers is that they ring out in time in a less-than-perfectly-controlled way. If we suppose that the 'bell speaker' cannot dampen the bells, only cancel the noise by a phase-inverted noise, then I guess a picture analogy could be as follows:
- a 'good synthesizer' (no bells yet) would be akin to a set of lasers pointing at the screen behind the slit.
- the 'bells' are at first less precise than the synth, and it has overtones, so: exit lasers, enter some lightbulbs.
- the bells would also reverberate out; if we assume that the bells cannot be dampened, only their sound cancelled by ringing another bell, that would correspond to the screen behind the slit being phosphorescent?


Hm.  Even the analogy is tricky.
Title: Bell Speakers
Post by: dhromed on 2012-08-20 17:20:17
Quote
Anyway, it would seem that with the correct frequencies (or "primary pitches") you could give "full spectrum" sound in the same way 3 primary colors can represent the whole visible spectrum


Ears are spectral analyzers with some limited bandwidth, where each input frequency stimulates a single group of cilia, whose output is mapped to a mental scale.

The eye is a completely different device. It responds to only three distinct frequencies, though with fairly gradual rolloff between them. The logical combination of these three inputs is what creates colour perception.

You cannot compare them so lightly.

That was not a pun.
Title: Bell Speakers
Post by: DonP on 2012-08-21 12:54:00
Quote
Anyway, it would seem that with the correct frequencies (or "primary pitches") you could give "full spectrum" sound in the same way 3 primary colors can represent the whole visible spectrum


Ears are spectral analyzers with some limited bandwidth, where each input frequency stimulates a single group of cilia, whose output is mapped to a mental scale.

The eye is a completely different device. It responds to only three distinct frequencies, though with fairly gradual rolloff between them. The logical combination of these three inputs is what creates colour perception.



Don't the cilia work analogously to the cones in having overlapping bandwidths, so you can detect and place tones between the center frequencies of adjacent cilia by the relative response?

Title: Bell Speakers
Post by: dhromed on 2012-08-21 13:43:34
Some googling tells me that the overlap is significant and the response rolloff rather shallow (http://neuroscience.uth.tmc.edu/s2/chapter12.html) (see bottom image, fig 12.8).

In any case, the ear is like a 1-pixel eye that responds to many frequencies, while the eye has many pixels and only three frequencies.

So, back to your original point, you can get proper sound by combining sine waves (after all, this is essentially what lossy audio compression is based on), but I still would not call it "in the same way as primary colors", since audio is a plain mapping of frequency onto response and vision is about combining relative intensities of three values.

Lossy image compression of the JPEG variety also exploits decomposition into frequencies, but this happens in the spatial domain, per-channel and after a transform of RGB to YCbCr, so again, I don't want to compare audio and vision so easily and prefer to keep similes like that out of the discussion.