
Dumb statistical models, always making people look bad

118 points | by hackandthink | 24 days ago

12 comments

gwern | 21 days ago
> There are a few ways to look at this from the standpoint of information that is available to the decision-maker. One is that human knowledge is valuable for guiding developing the model, but once you have a statistical model, it's a better aggregator of the information. This is echoed by research on judgmental bootstrapping (https://gwern.net/doc/statistics/decision/1974-dawes.pdf), where a statistical model trained on a human expert's past judgments will tend to outperform that expert.

By the way, note that this applies to LLMs too. One of the biggest pons asinorums that people get hung up on is the idea that "it just imitates the data, therefore, it can never be better than the average datapoint (or at least, best datapoint); how could it possibly be *better*?"

Well, we know from a long history that this is not that hard: humans make random errors all the time, and even a linear model with a few parameters or a little flowchart can outperform them. So it shouldn't be surprising or a mystery if some much more complicated AI system could too.
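The judgmental-bootstrapping effect gwern cites is easy to reproduce in simulation. A minimal sketch, with entirely hypothetical data: an expert scores cases as a noisy linear function of observable cues, and an ordinary least-squares model fit only to the expert's own past judgments ends up tracking the true outcome better than the expert does, simply because it strips the case-by-case noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 4
cues = rng.normal(size=(n, k))
true_weights = np.array([0.5, 0.3, -0.2, 0.1])
outcome = cues @ true_weights

# The expert weighs the cues roughly correctly but adds per-case noise.
expert = outcome + rng.normal(scale=0.5, size=n)

# "Bootstrap" the expert: fit OLS to the expert's judgments, not the outcome.
w, *_ = np.linalg.lstsq(cues, expert, rcond=None)
model = cues @ w

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("expert vs reality:", corr(expert, outcome))
print("model of expert vs reality:", corr(model, outcome))
```

The model never sees the true outcome, only the expert's noisy judgments, yet its correlation with reality is higher. That is the whole trick: averaging out random inconsistency.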
vintermann | 21 days ago
> Minimizing loss over aggregates is what a statistical model is designed to do, so if you evaluate human judgment against statistical predictions in aggregate on data similar to what the model was trained on, then you should expect statistical prediction to win

This reminds me of the many years machine translation was evaluated on BLEU towards reference translations, because they didn't know any better ways. Turns out that if you measure translation quality by n-gram precision towards a reference translation, then methods based on n-gram precision (such as the old pre-NMT Google translate) were really hard to beat.
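The metric-gaming vintermann describes can be shown with a toy version of BLEU's modified n-gram precision (heavily simplified here: unigrams and bigrams only, no brevity penalty or geometric mean; the sentences are made up). A garbled output that merely reuses reference n-grams outscores a fluent paraphrase:

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    """Modified n-gram precision: candidate n-grams clipped by reference counts."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

ref = "the cat sat on the mat".split()
fluent = "a cat was sitting on a mat".split()   # fine translation, different wording
stitched = "the cat the mat sat on".split()     # garbled, but reuses reference n-grams

for cand in (fluent, stitched):
    print(ngram_precision(cand, ref, 1), ngram_precision(cand, ref, 2))
```

The stitched word salad achieves perfect unigram precision and beats the fluent paraphrase on bigrams too, which is exactly why systems optimized for n-gram overlap were "really hard to beat" on this metric.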
nitwit005 | 22 days ago
You don't even need a statistical model. We make checklists because we know we'll fail to remember to check things.

Humans are tool users. If you make a statistical table to consult for some medical issue, you're using a tool.
rawgabbit | 22 days ago
OTOH. The blog mentions that humans excel at novel situations, such as when there is little training data, when envisioning alternate outcomes, or when recognizing the data is wrong.

The most recent example I can think of is "Frank". In 2021, JPMorgan Chase acquired Frank, a startup founded by Charlie Javice, for $175 million. Frank claimed to simplify the FAFSA process for students. Javice asserted the platform had over 4 million users, but in reality, it had fewer than 300,000. To support her claim, she allegedly hired a data science professor to generate synthetic data, creating fake user profiles. JPMorgan later discovered the discrepancy when a marketing campaign revealed a high rate of undeliverable emails. In March 2025, Javice was convicted of defrauding JPMorgan.

IMO a data expert could have recognized the fake user profiles, because he has seen, e.g., how messy real data is, knows the demographics of would-be users of a service like Frank (wealthy, time-stressed families), and knows the tell-tale signs of fake data (clusters of data that follow obvious "first principles").
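The way the fraud surfaced (an unexpectedly high rate of undeliverable emails) is itself a small statistics exercise. A sketch with purely illustrative figures, not the actual case numbers: even a moderate sample of delivery outcomes pins down the true bounce rate with a tight binomial confidence interval, so a claimed user base can be checked without contacting everyone.

```python
import math

def bounce_rate_ci(bounced, sampled, z=1.96):
    """95% normal-approximation confidence interval for the true bounce rate."""
    p = bounced / sampled
    half = z * math.sqrt(p * (1 - p) / sampled)
    return p - half, p + half

# Illustrative: 400,000 emails sent, 280,000 bounced.
lo, hi = bounce_rate_ci(bounced=280_000, sampled=400_000)
print(f"bounce rate is in [{lo:.4f}, {hi:.4f}] with 95% confidence")
```

With a sample this size the interval is a fraction of a percentage point wide; a bounce rate near 70% is flatly incompatible with a list of millions of genuine signups.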
dominicq | 22 days ago
As a matter of practicality, it seems that, professionally, you now want to be firmly in the tails of the data distribution for your field, e.g. an expert in those things that happen rarely.

Or maybe even be in a domain which, for whatever reason, is poorly represented by a statistical model, something where data points are hard to get.
mwkaufma | 21 days ago
User "Anoneuoid" from the source's own comment thread:

    There is another aspect here where those averaged outcomes are also the
    output of statistical models. So it is kind of like asking whether
    statistical models are better at agreeing with other statistical models
    than humans.
delichon | 22 days ago
> why it's often hard to demonstrate the value of human knowledge once you have a decent statistical model.

This seems to be a near restatement of the bitter lesson. It's not just that large enough statistical models outperform algorithms built from human expertise, they also outperform human expertise directly.
3abiton | 22 days ago
It's unfortunate how underappreciated statistics is. In nearly all the positions I've occupied (save academic ones), mostly in the technical domain interacting with non-technical stakeholders, anecdotal evidence always takes priority over statistically backed data for decision-making. It's absurd sometimes.
reedf1 | 21 days ago
If there is not a human-explainable reason a model has made a prediction, and it's just a statistical blob in multi-dimensional feature space (which we cannot introspect), then perceived improvement over humans is simply overfitting. It will be *extremely* good at finding the median issue, or following a decision tree in a more exacting way than a human. What a human can do is expand the degrees of freedom of their internal model at will, integrate out-of-sample data, and have a natural human bias toward the individual at the expense of the median. I'd rather have that...
kreyenborgi | 21 days ago
Versus https://predictive-optimization.cs.princeton.edu/
whatever1 | 21 days ago
At least when humans are wrong we own it. Statistical models can be wrong 100% of the times you used them, and the claim is "oh, this is how statistics works, you did not query the model infinite times".

My point is that on many occasions being right on average is less important than being right on the tail.
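whatever1's average-versus-tail point can be made concrete with synthetic data (all scales illustrative): a predictor that is sharper in the bulk of the distribution but blind in the tail wins on aggregate error while being several times worse exactly on the rare cases.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(100_000)  # true values; |y| >= 2 is the "tail"

# Predictor A: moderate error everywhere, tail included.
a = y + rng.normal(scale=0.5, size=y.size)
# Predictor B: near-perfect in the bulk, but predicts 0 whenever |y| >= 2.
b = np.where(np.abs(y) < 2, y + rng.normal(scale=0.3, size=y.size), 0.0)

tail = np.abs(y) >= 2
for name, pred in (("A", a), ("B", b)):
    err = np.abs(pred - y)
    print(name, "aggregate MAE:", err.mean(), "tail MAE:", err[tail].mean())
```

On aggregate mean absolute error B beats A, so B "wins" any evaluation that averages over the whole distribution, even though its tail error is several times larger. Which predictor you want depends entirely on whether you live in the bulk or the tail.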
bicepjai | 21 days ago
Someone had to say this. All models are dumb, but some are useful.