Using Deep Learning to Reconstruct High-Resolution Audio

114 points by yurisagalov almost 8 years ago

7 comments

eggoa almost 8 years ago
I hesitate to even post this, but I listened to the audio examples and it seems like this project was not yet a success. I'm not trying to be a jerk or snarky, but the reconstructed audio sounded terrible.
volkuleshov almost 8 years ago
I'm one of the authors of the paper that proposes the deep learning model implemented in the blog post, and I would recommend training on a different dataset, such as VCTK (freely available, and what we used in our paper).

Super-resolution methods are very sensitive to the choice of training data. They will overfit seemingly insignificant properties of the training set, such as the type of low-pass filter you are using, or the acoustic conditions under which the recordings were made (e.g. distance to the microphone when recording a speaker).

To capture all the variations present in the TED talks dataset, you would need a very large model and probably train it for >10 epochs. The VCTK dataset is better in this regard.

For comparison, here are our samples: kuleshov.github.io/audio-super-res/

I'm going to try to release the code over the weekend.
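To make the filter sensitivity described above concrete, here is a minimal sketch of how (low-resolution, high-resolution) training pairs are typically generated for audio super-resolution. The file path and parameter values are hypothetical, and the anti-aliasing filter is one arbitrary choice among many; the point is that swapping, say, the Butterworth design for a Chebyshev one changes the band edge the model silently learns.

    # Sketch: building a (low-res, high-res) training pair from one recording.
    # Assumes numpy, scipy, and soundfile; the VCTK path below is hypothetical.
    import numpy as np
    import soundfile as sf
    from scipy import signal

    def make_training_pair(path, factor=4, filter_order=8):
        """Return (low_res_upsampled, high_res) waveforms on a shared time axis."""
        hr, sr = sf.read(path)            # high-resolution reference
        if hr.ndim > 1:
            hr = hr.mean(axis=1)          # mix down to mono
        # The anti-aliasing filter is exactly the kind of "insignificant"
        # preprocessing detail a super-resolution model can overfit to.
        sos = signal.butter(filter_order, 1.0 / factor, btype='low', output='sos')
        lr = signal.sosfiltfilt(sos, hr)[::factor]   # low-pass, then decimate
        # Bring the low-res signal back to the original length with linear
        # interpolation so input and target line up sample-for-sample.
        lr_up = np.interp(np.arange(len(hr)), np.arange(len(lr)) * factor, lr)
        return lr_up.astype(np.float32), hr.astype(np.float32)

    lr, hr = make_training_pair('VCTK-Corpus/wav48/p225/p225_001.wav')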
hackpert almost 8 years ago
I'm interested in seeing how computationally efficient this method turns out to be and how well it generalizes to other audio data and perhaps other signals as well. Going on a hunch by the model, I think there are some more efficient methods to do bandwidth extension on audio samples with better quality results, but it is great to see more deep learning people take an interest in this domain. I do believe that deep learning can have tremendous impact in DSP and compression.

(Disclaimer: I developed a somewhat similar method earlier this year applied in audio compression, yet to be published)
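For a sense of what a non-learned baseline for bandwidth extension looks like, here is a rough sketch in the spirit of spectral band replication: the missing upper band is filled by shifting a scaled copy of the low band up an octave. This is purely illustrative, not the blog post's method or the parent commenter's unpublished one, and a real codec would also transmit envelope information; it mainly shows that cheap alternatives run in a few FFTs per frame.

    # Illustrative-only bandwidth extension via crude spectral band replication.
    # All parameter values are arbitrary choices, not tuned.
    import numpy as np

    def replicate_band(x, n_fft=1024, hop=256, gain=0.3):
        """Fill the upper half of each frame's spectrum with a scaled copy
        of the lower half, then resynthesize by windowed overlap-add."""
        window = np.hanning(n_fft)
        out = np.zeros(len(x))
        norm = np.zeros(len(x))
        for start in range(0, len(x) - n_fft, hop):
            frame = x[start:start + n_fft] * window
            spec = np.fft.rfft(frame)
            half = len(spec) // 2
            spec[half:2 * half] = gain * spec[:half]  # copy low band up an octave
            y = np.fft.irfft(spec, n_fft) * window
            out[start:start + n_fft] += y
            norm[start:start + n_fft] += window ** 2
        return out / np.maximum(norm, 1e-8)   # normalize the overlap-add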
crazygringo almost 8 years ago
While something like this is bound to fail for most music of any complexity (e.g. a singing voice), I've often wondered if this would be highly successful on, say, old solo piano recordings, where the possibilities of the instrument are extremely well-defined and limited.
starchild3001 almost 8 years ago
Thanks for sharing. The possibilities for this kind of technology are endless. Maybe one day we'll start having crystal clear conversations over telephone :)
bob1029 almost 8 years ago
I am a little curious as to how this factors into fundamental information theory.

In my mind, you are simply taking a 0-2 kHz signal and combining it with an entirely different 0-8 kHz signal that is generated (arbitrarily, IMO) based on the band-limited original data. I can see the argument for having a library of samples as additional, common information (think many compressor algorithms), but it is still going to be an approximation (lossy).

"The loss function used was the mean-squared error between the output waveform and the original, high-resolution waveform." - This confuses me as a performance metric when dealing with audio waveforms.

I think a good question might be: "What would be better criteria for evaluating the Q (quality) of this system?"

THD between original and output averaged over the duration of the waveforms? Subjective evaluations (w/ man in the middle training)? etc...
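On the question of better quality criteria: two metrics commonly reported for audio super-resolution are waveform signal-to-noise ratio and log-spectral distance (LSD), the latter tracking spectral errors that MSE on raw samples can hide. A minimal sketch of both, assuming time-aligned, equal-length mono signals:

    # Sketch of two evaluation metrics beyond raw MSE on the waveform.
    import numpy as np

    def snr_db(ref, est):
        """Waveform signal-to-noise ratio in dB (higher is better)."""
        noise = ref - est
        return 10 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2))

    def lsd(ref, est, n_fft=2048, hop=512):
        """Log-spectral distance (lower is better): RMS difference of
        log power spectra per frame, averaged over frames."""
        def log_power(x):
            frames = np.stack([x[i:i + n_fft] * np.hanning(n_fft)
                               for i in range(0, len(x) - n_fft, hop)])
            return np.log10(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)
        lp_ref, lp_est = log_power(ref), log_power(est)
        return np.mean(np.sqrt(np.mean((lp_ref - lp_est) ** 2, axis=1)))

Neither metric captures perception perfectly; subjective listening tests (e.g. MUSHRA) remain the gold standard, which matches the suspicion above about MSE as the sole criterion.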
cnxhk almost 8 years ago
The title is good. The performance is limited, and the number of examples is not enough to draw any useful conclusion.