科技回声 (Tech Echo)

A tech news platform built with Next.js, serving global tech news and discussion.

© 2025 科技回声. All rights reserved.

Granular Audio Synthesis

153 points · by unsatchmo · about 7 years ago

9 comments

jedimastert · about 7 years ago
I've seen a lot of people commenting about the artifacts you hear when the samples are stretched. These happen because of phasing issues, where frequencies in each of the grains interfere with one another.

I'm surprised I don't see it mentioned here, but there's a rather interesting extension to this technique by Paul Nasca[0], which mitigates these artifacts by (1) carefully choosing the size and placement of grains and (2) randomly changing the phase of each grain before recombining. You can see the algorithm here[1].

The results are absolutely incredible. You can end up slowing a sample down by 800% or more with no artifacts. For example, here[2] is the Windows 95 startup sound extended to a little over 6 minutes long. The reverb you hear isn't added; that's just what it sounds like.

Also, if you didn't notice from the page, it's one of the default plug-ins in Audacity.

[0]: http://www.paulnasca.com/
[1]: http://www.paulnasca.com/algorithms-created-by-me#TOC-PaulStretch-extreme-sound-stretching-algorithm
[2]: https://www.youtube.com/watch?v=FsJdplLB1Bs
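The core of the PaulStretch trick this comment describes — keep each window's magnitude spectrum but randomize its phases — can be sketched in plain Python. This is a hypothetical illustration only: it uses a naive O(n²) DFT to stay dependency-free (a real implementation would use an FFT), and it omits the windowing and overlap-add that the full algorithm also performs. All function names are my own.

```python
import cmath
import math
import random

def dft(x):
    # Naive discrete Fourier transform (O(n^2)), for self-containment only.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    # Inverse DFT; returns the real part (input spectrum is conjugate-symmetric).
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def randomize_phases(x):
    """Keep every bin's magnitude but give it a random phase.
    This is the core step that decorrelates overlapping grains."""
    X = dft(x)
    n = len(X)
    Y = [0j] * n
    Y[0] = complex(abs(X[0]))                 # DC bin stays real
    for k in range(1, (n + 1) // 2):
        phi = 2 * math.pi * random.random()   # random phase for bin k
        Y[k] = abs(X[k]) * cmath.exp(1j * phi)
        Y[n - k] = Y[k].conjugate()           # mirror bin keeps the output real
    if n % 2 == 0:
        Y[n // 2] = complex(abs(X[n // 2]))   # Nyquist bin stays real
    return idft(Y)
```

The output sounds like the input "smeared" in time: same spectral content, scrambled phase relationships, which is why long overlapped windows of it stretch so gracefully.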
mgeorgoulo · about 7 years ago
Very good results, and embarrassingly easy to implement!

The very stretched waveform did contain some audible artifacts, but I think other methods like FFT would introduce some as well.

This kind of trick works because our hearing is frequency-based. So the crucial thing is to preserve the frequencies, and then it is going to sound essentially the same.

Spatial mapping of frequencies in the human ear here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2394499/ (see fig. 5)

Trying this with an image, for example, wouldn't work, because our vision is sample-based. Imagine splitting an image into tiny fragments and repeating/interpolating them on top of one another.
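For readers wondering how "embarrassingly easy" the basic technique really is, here is a hypothetical minimal sketch of a granular time-stretch in Python: Hann-windowed grains at a constant 50% overlap on the output, with a variable hop on the input. The function names and parameters are my own, not the article's.

```python
import math

def hann(n):
    # Periodic Hann window: overlapping copies at 50% hop sum to a constant.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def granular_stretch(x, stretch, grain=256):
    """Naive granular time-stretch.
    stretch > 1 slows the sound down; stretch < 1 speeds it up.
    Output grains always overlap 50%; the input read position moves
    slower (or faster) than the output write position."""
    hop_out = grain // 2
    hop_in = max(1, int(hop_out / stretch))
    win = hann(grain)
    n_grains = (len(x) - grain) // hop_in + 1
    out = [0.0] * (hop_out * (n_grains - 1) + grain)
    for g in range(n_grains):
        src, dst = g * hop_in, g * hop_out
        for i in range(grain):
            out[dst + i] += x[src + i] * win[i]  # overlap-add the windowed grain
    return out
```

Because neighboring grains are near-copies of each other read from slightly shifted positions, this is exactly where the phasing artifacts discussed above come from.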
jancsika · about 7 years ago
> What this does is make it so you can put any grain next to any other grain, and they should fit together pretty decently. This gives you C0 continuity by the way, but higher order discontinuities still affect the quality of the result. So, while this method is fast, it isn't the highest quality. I didn't try it personally, so am unsure how it affects the quality in practice.

It's not just about continuity. It also removes an entire set of concerns from the process.

For example, suppose someone analyzes an audio recording, splits it into grains, then does some fancy re-organization based on the timbral content of the recording/grains.

Now suppose they are subjectively unhappy with the result. Perhaps it sounds "wimpy," "fluttery," or some other such vague complaint. Is that sound due to (a) their process of re-organizing the grains, (b) the quality of the original recording, (c) the envelopes they used, or (d) something else entirely?

If instead one uses grains which begin and end at zero, the answer can't be (c), because the envelope doesn't exist. I can say that the quality sounds fine in the few examples I've heard that use this technique.

I'd imagine the reason the latter isn't used as often is that it's simply more difficult to program if each grain can be an arbitrary size (or at least not quantized).
aidenn0 · about 7 years ago
I think the speed-up sounds *much* better than the slow-down. With the slow-down there are very noticeable artifacts; I'm not sure if it's because of the envelope they chose or just because repeating a grain adds harmonics.
vladimirralev · about 7 years ago
As far as I can see, this is basically naive TDHS (Time Domain Harmonic Scaling). It's a great starter project as an intro to audio-effect coding, since you can visually observe where you go wrong and where the noise comes from at the edges. Just great for beginners learning how audio works. It's very rare to have an audio-effects algorithm so cool and so easy to observe without special analysis tools.

Some more famous algorithms that work this way and are similarly easy to implement are TDHS and PSOLA. They all work in the time domain but find different ways to smooth out the discontinuities and to make more extreme shifts sound better.
amelius · about 7 years ago
Perhaps a better way of looking at it is this. Basically, a sound triggers hair cells in the ear. A single harmonic tone triggers a single group of hair cells. Through modeling, you can compute which hair cells are triggered at what moment for a given signal. Your task is then to compute a new signal for which the same hair cells are triggered, but faster.
jeffreyrogers · about 7 years ago
This is pretty neat. One frustrating thing I found while doing some audio programming recently is how hard it was to work with different audio formats. Most of the libraries I found for doing so were GPL or required a commercial license.
recentdarkness · about 7 years ago
Title should be "Granular Audio Synthesis".

Couldn't find anything about C++ in that article on a quick scan; feel free to correct me.
luk32 · about 7 years ago
How does granular analysis differ from PCM representation, Fourier transformation, and sampling? Or is it a different name for the same thing? I think it's natural to whoever has worked with sound on a PC.

It's probably debatable, but I don't agree with the statement that shortening the "sound" changes pitch. It depends on your representation of the sound. If you represent it as a function of amplitude vs. time, then scaling the time axis does change pitch.

This makes a sensational tone about a fallacy. No instrument plays a sound faster or slower to make it shorter or longer... It just stops playing it, or doesn't. If one thinks about the phenomenon this way, it becomes natural why you cannot compress time to play shorter sounds.