
Microsoft Supercharges Bing Search With Programmable Chips

122 points · by l31g · almost 11 years ago

10 comments

chollida1 · almost 11 years ago

This has been going on in the HFT space for a number of years. FPGAs are used to parse data feeds as the sheer volume of quotes overwhelms most systems.

In fact, after moving the networking stack into user land and using InfiniBand networking gear, it's probably the third most common optimization I've seen/heard of for HFT systems.

Here's a quick, but surprisingly accurate, description of a common HFT setup: http://www.forbes.com/sites/quora/2014/01/07/what-is-the-technology-stack-like-behind-a-high-frequency-trading-platform/

Someone had asked about the number of quotes that need to be parsed. From Forbes...

> Mr. Hunsader: The new world is now a war between machines. For some perspective, in 1999 at the height of the tech craze, there were about 1,000 quotes per second crossing the tape. Fast forward to 2013 and that number has risen exponentially to 2,000,000 per second.

Keep in mind that the "tape" is the slow SIP line that exchanges use to keep prices in sync and show customers that don't use the exchanges' direct feeds, i.e. it aggregates all the quotes from all venues and throws away a lot, as they can't be parsed in time or didn't change the top-level quote.

With 40+ venues from which an HFT fund can get feeds, 2,000,000 per second is a fraction of what a cutting-edge HFT would have to parse to keep up with all venues.

The typical setup is that you'll run strategies across multiple machines, so you have a gateway machine that directs each quote to the appropriate machine. The biggest problem is the speed at which the quotes arrive.

Unlike a web request, which you can take 300 milliseconds to parse and return, if you don't parse and respond to the quote in under 10-20 microseconds you've already lost.

So the FPGA transition is to make sure there is never a backlog of quotes or any pauses in the handling of bursty quotes. This can't be overstated. Margins are squeezed so tightly now that your algo will appear to be working fine until a big burst of quotes happens and your machines can't keep up, and when the dust settles in 20 seconds, you'll find you lost $5,000, which might be your entire day's profit from that one symbol/algo pair.
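The figures in this comment imply a brutal per-quote time budget. A back-of-envelope sketch (the numbers come from the comment itself; the code is just arithmetic):

```python
# Back-of-envelope: per-quote time budget for an HFT feed handler,
# using the figures quoted in the comment above.

SIP_QUOTES_PER_SEC = 2_000_000   # aggregated SIP tape rate, circa 2013
RESPONSE_BUDGET_US = 20          # must react within 10-20 microseconds

# Mean inter-arrival time between quotes on the aggregated tape.
mean_gap_us = 1_000_000 / SIP_QUOTES_PER_SEC   # microseconds per quote

print(f"mean gap between quotes: {mean_gap_us} us")
print(f"quotes arriving within one {RESPONSE_BUDGET_US} us response window: "
      f"{int(RESPONSE_BUDGET_US / mean_gap_us)}")
```

At 2,000,000 quotes/second the mean gap is 0.5 µs, so roughly 40 new quotes land while you are still inside a single 20 µs response window. This is why a handler that pauses even briefly falls permanently behind, and why the comment frames FPGAs as eliminating backlog rather than merely shaving average latency.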
dmmalam · almost 11 years ago

Is there any breakthrough in programming these things? From a quick glance at the paper it seems like the kernels are still hand-written in Verilog, though there seems to be some significant software infrastructure for integrating the FPGAs into cluster management systems.

I think easily and uniformly programming disparate compute devices (CPUs, SIMD, GPUs, FPGAs, ISPs, DSPs, and eventually quantum) is the next BIG problem in programming languages. Several Haskell projects seem promising, but these still tend to be nice DSLs that generate Verilog or shaders.

On most mobile SoCs the CPU usually takes an increasingly smaller part of the die; there are 10GbE network cards with FPGAs on them; and we've got Parallella. The hardware exists; we sorely need the next breakthrough programming environment.
l31g · almost 11 years ago

http://research.microsoft.com/apps/pubs/default.aspx?id=212001
jacquesm · almost 11 years ago

I'd rather have *better* results than *faster* results. Faster is only important once you have the quality problem worked out; "first make it good, then make it fast" has been a long-time mantra. The reason is that it is usually very expensive to make something really fast, because optimizing code is hard and expensive (case in point: they use custom hardware here).

The upside is that they're doing something innovative, but if Bing really wants to steal market share from Google they have to improve on their quality, not on their speed. I'd rather see them take 10 seconds and deliver an absolutely perfect answer than 0.001 seconds and deliver something not on par with Google but 10 times faster.

Impressive to see them backing an exotic solution like this, though, and if and when they *do* get it to be better than Google it may pay off.

Are there any developments like this underway at Google?
azakai · almost 11 years ago

Actually, I already find Bing quite fast. Compared to Google search, Bing results tend to load a little faster but to be a little lower in quality.

FPGAs may make Bing twice as fast as it already is, but I don't feel like it needs to be faster. Although, I guess if it's faster they can trade that off for more work done and so better results, perhaps.
valarauca1 · almost 11 years ago

Sounds like there is a market niche starting to develop for FPGAs in server applications. I'm not saying rush out and make FPGAs that are powered by and communicate over PCIe; I'm just saying there may be a market developing for it.

Especially with good open-source dev tools.
th0ma5 · almost 11 years ago

Seems like I read about Google doing this almost 10 years ago. I know that IBM has the Netezza product, which also uses FPGAs for accelerating queries.
zackmorris · almost 11 years ago

I've been ranting about the inadequacies of mainstream processors for almost twenty years. I remember even back in the late 90s seeing processors that were 3/4 cache memory, with barely any transistors used for logic. It's surely worse than that now, with the vast majority of logic gates on chips just sitting around idle. To put it in perspective, a typical chip today has close to a billion transistors (the Intel Core i7 has 731 million):

https://en.wikipedia.org/wiki/Transistor_count

A bare-minimum CPU that can do at least one operation per clock cycle probably has between 100,000 (SPARC) and 1 million (the PowerPC 602) transistors and runs at 1 watt. So chips today have 1,000 to 10,000 times that number of transistors, but do they run that much faster? No, of course not.

And we can even take that a step further, because those chips suffered from the same inefficiencies that hinder processors today. A full adder takes 28 (yes, twenty-eight) transistors. Could we build an ALU that did one simple operation per clock cycle with 1,000 transistors? 10,000? How many of those could we fit on a billion-transistor chip?

Modern CPUs are so many orders of magnitude slower than they could be with a parallel architecture that I'm amazed data centers even use them. GPUs are sort of going the FPGA route with 512 cores or more, but they are still a couple of orders of magnitude less powerful than they could be. And their proprietary/closed nature will someday relegate them to history, even with OpenCL/CUDA, because it frankly sucks to do any real programming when all you have at your disposal is DSP concepts.

I really want an open-source billion-transistor FPGA running at 1 GHz that doesn't hold my hand with a bunch of proprietary middleware, so that I can program it in a parallel language like Go or MATLAB (Octave). There would be some difficulties with things like interconnect, but that's what things like MapReduce are for: to do computation in place rather than transferring data needlessly. Also, with diffs or other hash-based algorithms, only portions of the data would need to be sent. And it's time to let go of VHDL/Verilog, because it's one level too low. We really need a language above them that lets us wire up basic logic without fear of the chip burning up.

And don't forget the most important part of all: since the chip is reprogrammable, cores can be multi-purpose, so they store their configuration as code instead of hardwired gates. A few hundred gates can reconfigure themselves on the fly to be ALUs, FPUs, anything really. So instead of wasting vast swaths of the chip on something stupid like cache, it can go to storage for logic layouts.

What would I use a chip like this for? Oh, I don't know: AI, physics simulations, formula discovery, protein folding, basically all of the problems that current single-threaded architectures can't touch in a cost-effective manner. The right architecture would bring computing power we don't expect to see for 50 years to right now. I have a dream of someday being able to run genetic algorithms that take hours to complete in a millisecond, and being able to guide the computer rather than program it directly. That was sort of the promise with quantum computing, but I think FPGAs are more feasible.
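The comment's back-of-envelope question (how many tiny ALUs fit in a billion transistors?) works out roughly as follows. The transistor counts come from the comment itself; the 32-bit ALU width and the 10x overhead factor are illustrative assumptions:

```python
# Rough capacity estimate for the comment's hypothetical sea-of-ALUs chip.
# Figures from the comment: 28 transistors per full adder, ~1e9 transistors
# per die. ALU width and overhead multiplier are assumptions for illustration.

TRANSISTORS_PER_FULL_ADDER = 28
ALU_WIDTH_BITS = 32
DIE_TRANSISTORS = 1_000_000_000

# A 32-bit ripple-carry adder chains 32 full adders.
adder_transistors = TRANSISTORS_PER_FULL_ADDER * ALU_WIDTH_BITS  # 896

# Pad generously for control logic, muxes, and routing overhead.
alu_budget = adder_transistors * 10  # ~9,000 transistors per simple ALU

print(f"32-bit adder core: {adder_transistors} transistors")
print(f"ALUs per die (with 10x overhead): {DIE_TRANSISTORS // alu_budget:,}")
```

Even with a generous 10x overhead per ALU, the estimate lands above a hundred thousand independent execution units on one die, which is exactly the gap the comment is pointing at between the raw transistor budget and the handful of cores a conventional CPU actually exposes.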
l31g · almost 11 years ago

http://www.theregister.co.uk/2014/06/16/microsoft_catapult_fpgas/
samfisher83 · almost 11 years ago

This is cool and all, but instead of spending money on this project, why not try to improve their search engine, or just not spend this money, since Bing loses so much money? I don't mind waiting half a second extra for my search results. It seems more like their thinking is: we've got a lot of engineers we're paying a bunch of money, let's do some project.