Fitting to Noise or Nothing at All: Machine Learning in Markets

295 points, by bilifuduo, almost 8 years ago

7 comments

dkural, almost 8 years ago
Most academic CS literature is complete BS. The vast majority of papers fit into a simple formula of "We apply method X to problem Y, and outperform other approaches using approaches similar to method X".

Meanwhile, no one uses method X or any of its cousins, because in the real world the problem is solved very differently with a combination of both principled algorithms and heuristics derived from real-world datasets.

The paper also fails to give any theoretical reason or mathematical insight as to why their version of X is better.

Thus, it doesn't actually solve a real world problem OR advance scientific understanding.
lettergram, almost 8 years ago
I can't upvote this enough.

I've spent an unreasonable amount of time reading research related to finance for my web app Piglet:

https://projectpiglet.com/

Long story short, all "research" is pretty much B.S.

I used to assume this is because people want to make money, so they keep the good analysis secret. However, after working in the industry for a few years, I think it's mostly because they just don't know how to apply the algorithms, or whether it's even possible.

I think my favorite example is the seminal paper on using Twitter sentiment to predict stock movement [1]. They don't use a large enough data set, and more importantly they use Granger causality to identify "causality" [2] between sentiment and stock value. They then claim they found a specific range which has a p-value indicating they are correlated... Of course you'll find a correlation when you look at two normalized signals and try to match them up.

Now, if they had not used the DJIA (Dow Jones Industrial Average) and had instead used 500 individual stocks, and found that sentiment on Twitter correlated with stock value(s) 90% of the time at offsets between 5 and 10 days, I'd argue they probably have something.

However, because their method is literally a BFS on only two signals in an attempt to find a correlation, they must correct the p-value for the number of offsets they tried, i.e. "look and you shall find" [3].

This is just one of the hundred issues I've found, but it really sheds light on how bad that industry is.

[1] https://scholar.google.com/scholar?hl=en&q=twitter+stock+sentiment+analysis&btnG=&as_sdt=1%2C47&as_sdtp=&oq=twitter

[2] Granger causality shifts one signal in time relative to the other and looks for a correlation between them at that offset. In other words, it finds "causality" by finding correlation at an offset in time.

[3] https://stats.stackexchange.com/questions/5750/look-and-you-shall-find-a-correlation
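A minimal sketch of the "look and you shall find" problem raised in lettergram's comment; it is not code from the paper or the comment. It assumes two independent Gaussian noise series, scans a hypothetical range of 20 lags for the strongest correlation, and applies a Bonferroni correction (a deliberately conservative choice made here for illustration) to the smallest p-value. The helper names `pearson`, `p_value`, and `best_lag` are made up for this example, and the p-value uses the Fisher z approximation rather than an exact test.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def p_value(r, n):
    """Two-sided p-value for correlation r on n points (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def best_lag(x, y, max_lag):
    """Scan lags 1..max_lag; return (lag, raw_p, bonferroni_p) for the smallest raw p."""
    results = []
    for lag in range(1, max_lag + 1):
        xs, ys = x[:-lag], y[lag:]  # compare x at time t with y at time t + lag
        results.append((p_value(pearson(xs, ys), len(xs)), lag))
    raw_p, lag = min(results)
    # Bonferroni: multiply by the number of lags that were tried.
    return lag, raw_p, min(1.0, raw_p * max_lag)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(250)]
y = [random.gauss(0, 1) for _ in range(250)]  # independent of x by construction
lag, raw_p, adj_p = best_lag(x, y, max_lag=20)
print("best lag:", lag, "raw p:", round(raw_p, 3), "corrected p:", round(adj_p, 3))
```

Because the two series are unrelated by construction, any lag that looks significant before the correction is a product of the search itself.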
murbard2, almost 8 years ago
I got into quant finance 12 years ago with the mistaken idea that I was going to successfully use all these cool machine learning techniques (genetic programming! SVMs! neural networks!) to run great statistical arbitrage books.

Most machine learning techniques focus on problems where the signal is very strong, but the structure is very complex. For instance, take the problem of recognizing whether a picture is a picture of a bird. A human will do well on this task, which shows that there is very little intrinsic noise. However, the correlation of any given pixel with the class of the image is essentially 0. The "noise" is in discovering the unknown relationship between pixels and class, not in the actual output.

Noise dominates everything you will find in statistical arbitrage. An R^2 of 1% *is* something to write home about. With this amount of noise, it's generally hard to do much better than a linear regression. Any model complexity has to come from integrating over latent parameters or manual feature engineering; the rest will overfit.

I think Geoffrey Hinton said that statistics and machine learning are really the same thing, but since we have two different names for it, we might as well call machine learning everything that focuses on dealing with problems with a complex structure and low noise, and statistics everything that focuses on dealing with problems with a large amount of noise. I like this distinction, and I did end up picking up a lot of statistics working in this field.

I'll regularly get emails from friends who tried some machine learning technique on some dataset and found promising results. As the article points out, these generally don't hold up. Accounting for every source of bias in a backtest is an art. The most common mistake is to assume that you can observe the relative price of two stocks at the close, and trade at that price. Many pairs trading strategies appear to work if you make this assumption (which tends to be the case if all you have are daily bars), but they really do not. Other mistakes include: assuming transaction costs will be the same on average (they won't; your strategy likely detects opportunities at times when the spread is very large and prices are bad), assuming index memberships don't change (they do, and that creates selection bias), assuming you can short anything (stocks can be hard to short or have high borrowing costs), etc.

In general, statistical arbitrage isn't machine-learning bound (1), and it is not a data mining endeavor. It works by understanding the latent market dynamics you are trying to capitalize on, finding new data feeds that provide valuable information, carefully building out a model to test your hypothesis, and deriving a sound trading strategy from that model.

(1) This isn't always true. For instance, analyzing news with NLP, or using computer vision to estimate crop outputs from satellite imagery, can make use of machine learning techniques to yield useful, tradeable signals. My comment mostly focuses on machine learning applied to price information.
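To illustrate the overfitting point in murbard2's comment, here is a rough sketch under assumed conditions: synthetic data in which a single feature explains about 1% of the variance (slope 0.1, unit noise), comparing a one-variable linear regression with a 1-nearest-neighbor model standing in for "a flexible model". None of this comes from the comment; the data, the function names, and the choice of 1-NN are assumptions made for illustration.

```python
import random

def make_data(n, rng, beta=0.1):
    """Synthetic (x, y) pairs where x explains roughly 1% of the variance of y."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    return [(x, beta * x + rng.gauss(0, 1)) for x in xs]

def fit_linear(train):
    """Ordinary least squares for y = a + b * x; returns a predictor function."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_1nn(train):
    """1-nearest-neighbor: predict the y of the training point with the closest x."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def r_squared(predict, data):
    """1 - SSE/SST on the given data; negative means worse than predicting the mean."""
    my = sum(y for _, y in data) / len(data)
    sse = sum((y - predict(x)) ** 2 for x, y in data)
    sst = sum((y - my) ** 2 for _, y in data)
    return 1.0 - sse / sst

rng = random.Random(42)
train, test = make_data(1000, rng), make_data(1000, rng)
for name, fit in (("linear", fit_linear), ("1-NN", fit_1nn)):
    model = fit(train)
    print(name,
          "in-sample R^2:", round(r_squared(model, train), 3),
          "out-of-sample R^2:", round(r_squared(model, test), 3))
```

The 1-NN model reproduces its training data exactly, so its in-sample fit says nothing about how it behaves on fresh data. That gap is what a careful backtest has to expose.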
zacharydavid, almost 8 years ago
Special thanks to Nickolas Younker (at LiquidWeb) for saving my behind and getting this all set up.
zacharydavid, almost 8 years ago
Sorry guys. Traffic killed the site. Booting up a new server.
chvid, almost 8 years ago
Does anyone know of any paper that describes a reproducible method of generating above-normal returns in the mature Western financial markets? Nope. Me neither.
dogruck, almost 8 years ago
Would be nice to see a standard academic platform for backtesting. Then, the paper could say "we submitted our implementation of this strategy to Backtest (which includes transaction costs and slippage)."
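A hedged sketch of what the core of such a shared backtest might look like. No such standard platform is described in the thread, and the function name `backtest`, the one-period execution lag, and the flat cost in basis points per unit of turnover are all assumptions. Costs are charged whenever the position changes, and each position is applied only to the following period's return, so the strategy cannot trade at a price it has already observed.

```python
def backtest(signals, returns, cost_bps=5.0):
    """Net per-period returns for positions decided one period earlier.

    signals[t] is the target position (for example -1, 0, or 1) chosen with
    data up to the end of period t. It is held during period t + 1 only,
    which avoids trading at a price that was already observed.
    cost_bps is an assumed flat cost (fees plus slippage) in basis points
    per unit of position change.
    """
    cost = cost_bps / 10_000.0
    net = []
    prev_pos = 0.0
    for t in range(1, len(returns)):
        pos = signals[t - 1]            # decided last period, held this period
        turnover = abs(pos - prev_pos)  # size of the trade needed to get there
        net.append(pos * returns[t] - cost * turnover)
        prev_pos = pos
    return net

# Toy usage with made-up numbers: long, stay long, then flip short.
signals = [1, 1, -1, 0]
returns = [0.0, 0.01, -0.02, 0.03]
print(backtest(signals, returns, cost_bps=10.0))
```

Flipping from long to short counts as two units of turnover, so strategies that trade often pay for it explicitly instead of having the cost averaged away.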