Shazam: not magic after all

77 points by mariorz, over 15 years ago

13 comments

tptacek, over 15 years ago
Don't I feel vindicated:

http://news.slashdot.org/comments.pl?sid=7310&cid=823710

(from 2000).

I remember this because I got in a big argument with someone about whether this could possibly work. Of course, I never got off my ass and implemented it, which I guess makes me a huge loser.
allenbrunson, over 15 years ago
Quote from the article:

"Unfortunately, there's no indication in the paper of what software was used to develop the process (although the scatterplots in the paper do look decidedly R-like)."

I used to work with Avery Wang, the guy who devised the algorithm. He used Matlab.
transmit101, over 15 years ago
The article and comments seem to suggest that the use of Matlab or R is a prerequisite for performing calculations such as this. However, MIR (Music Information Retrieval) libraries exist for a number of languages, including Java[1] and ANSI C[2], amongst others. A good dynamic language for experimenting with this sort of thing is SuperCollider[3].

By the way, the psychoacoustic spectral measurements referred to in the article are called MFCCs[4]: basically an FFT reading weighted according to the sensitivity of our ears. They are often used in both music and (especially) speech recognition because they tend to accurately sum up the timbre we perceive in a given sound. Timbre is much easier to extract from a digital audio file than pitch or vocal information, which is why it tends to be successful in applications such as this.

Shazam is still pretty cool too.

[1] http://jmir.sourceforge.net/

[2] http://libxtract.sourceforge.net/

[3] http://supercollider.sourceforge.net/

[4] http://en.wikipedia.org/wiki/Mel-frequency_cepstrum
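The "weighted according to the sensitivity of our ears" part of the comment above can be illustrated with the mel scale. This is only a rough sketch: it assumes the commonly used formula m = 2595 * log10(1 + f / 700), and it stops at the band layout. A full MFCC computation would go on to apply a filter bank, a logarithm, and a cosine transform, none of which is shown. The function names are made up for illustration and do not come from any of the libraries linked above.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz: float, high_hz: float, n_bands: int) -> list:
    """Edges (in Hz) of n_bands bands that are equally wide on the mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / n_bands
    return [mel_to_hz(low_mel + i * step) for i in range(n_bands + 1)]


# Hypothetical layout: 10 bands between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
```

The bands come out narrow at low frequencies and progressively wider at high ones, which is roughly how our hearing resolves pitch.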
mhansen, over 15 years ago
This is exactly how you identify chemical compounds using X-ray crystallography. You shine X-rays of different frequencies onto a compound and measure the magnitude of the reflections, noting down the 3 highest peaks.

Then you look up those peaks in a book, which has compounds ordered by the wavelength of the highest peak.

It takes minutes to do it by hand, so I'm not surprised computers can do it better.
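The book lookup described above could be sketched as follows. This is an illustration under stated assumptions, not a description of any real crystallography database or of Shazam's implementation: the compound names, peak positions, and tolerance are invented, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def top_peaks(spectrum, n=3):
    """Return the positions of the n strongest local maxima, strongest first.

    spectrum is a list of (position, magnitude) pairs ordered by position.
    """
    peaks = [
        (mag, pos)
        for i, (pos, mag) in enumerate(spectrum)
        if 0 < i < len(spectrum) - 1
        and mag > spectrum[i - 1][1]
        and mag >= spectrum[i + 1][1]
    ]
    peaks.sort(reverse=True)
    return [pos for _, pos in peaks[:n]]


def build_index(reference):
    """Order reference entries by their strongest peak, like the printed book."""
    return sorted((peaks[0], name, peaks) for name, peaks in reference.items())


def identify(index, measured, tol=0.05):
    """Binary-search on the strongest peak, then compare the remaining peaks."""
    keys = [entry[0] for entry in index]
    lo = bisect_left(keys, measured[0] - tol)
    hi = bisect_right(keys, measured[0] + tol)
    return [
        name
        for _, name, ref in index[lo:hi]
        if len(ref) == len(measured)
        and all(abs(a - b) <= tol for a, b in zip(ref, measured))
    ]


# Invented reference data: three strongest peak positions per compound.
reference = {
    "compound_a": [3.34, 4.26, 1.82],
    "compound_b": [2.82, 1.99, 1.63],
    "compound_c": [3.35, 2.10, 1.50],
}
# Invented measurement: (position, magnitude) pairs.
spectrum = [
    (1.50, 2), (1.82, 17), (2.10, 3), (2.80, 1),
    (3.34, 100), (3.80, 4), (4.26, 35), (4.60, 2),
]
measured = top_peaks(spectrum)
matches = identify(build_index(reference), measured)
```

Sorting by the strongest peak means only a handful of candidates need a full comparison, which is the same trick that makes the by-hand lookup fast.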
joezydeco, over 15 years ago
Aww man, when I read the headline I was expecting a SpinVox-like scandal. Like a room full of idiot-savants in Bangalore that knew every pop hit for the last 50 years or something.
jyothi, over 15 years ago
An interesting app.

Airtel, a leading telecom provider in India, had a SongCatcher service long back (3 yrs ago): http://www.techtree.com/India/News/Catch_a_Catchy_Song_with_Airtel/551-77435-663.html I never tried it - maybe this one worked for a predefined set of songs.
eric_t, over 15 years ago
It would be much better if you could hum or whistle a tune, and it would recognize it. I saw a PhD thesis once about this, with an actual implementation that worked pretty well. The only problem was that the database of songs was very small. It's probably hard to scale this type of search.
thomasswift, over 15 years ago
I was working on a little side startup that used crowdsourcing to help ID songs. I was just getting into researching how programs like Shazam and midomi worked when I killed the project. His paper and the way it works are quite nice, but it's not perfect for rare music or for songs without elements that really stand out (frequencies or otherwise, like house music). Thanks for the link!
bockris, over 15 years ago
Reminds me a little bit of

http://astrometry.net/

Here is an overview:

http://cosmo.nyu.edu/hogg/research/2006/09/28/astrometry_google.pdf

It's fun to read about but way over my head mathematically.
headShrinker, over 15 years ago
I have often sat in coffee shops wondering what method of data extrapolation Shazam used to parse audio to be able to search its music db. I would think about how I would do it. I use Shazam all the time, so it's nice to finally know the basic idea.
Anon84, over 15 years ago
http://news.ycombinator.com/item?id=908201
JoeAltmaier, over 15 years ago
Now recognize people! Or cars, or engine problems, or birds... Rats! I missed the yc deadline by 1 day!
anonjon, over 15 years ago
I disagree, any type of programming is magic.

I'm tired of these software 'engineering' types who insist that computers are run by using 'maths' and 'numbers' (whatever those are).

Clearly, computers are run by aphasic tonally-separated spinning disks. These disks fire puffs of air out the sides of the computer, creating little tiny tornados, which summon air spirits to call the fire spirits, which causes the screen to light up and the keys to make tappy-tap noises.

anyway. to be clear: not statistics. not math. not regulated pulses of electrons. MAGIC!