
Dumb statistical models, always making people look bad

118 points | by hackandthink | 24 days ago

12 comments

gwern | 21 days ago
> There are a few ways to look at this from the standpoint of information that is available to the decision-maker. One is that human knowledge is valuable for guiding developing the model, but once you have a statistical model, it's a better aggregator of the information. This is echoed by research on judgmental bootstrapping (https://gwern.net/doc/statistics/decision/1974-dawes.pdf), where a statistical model trained on a human expert's past judgments will tend to outperform that expert.

By the way, note that this applies to LLMs too. One of the biggest pons asinorums that people get hung up on is the idea that "it just imitates the data, therefore, it can never be better than the average datapoint (or at least, best datapoint); how could it possibly be *better*?"

Well, we know from a long history that this is not that hard: humans make random errors all the time, and even a linear model with a few parameters or a little flowchart can outperform them. So it shouldn't be surprising or a mystery if some much more complicated AI system could too.
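The judgmental-bootstrapping effect gwern cites is easy to reproduce in simulation. A minimal sketch, with entirely hypothetical data: an expert scores cases as a noisy linear function of observable cues, and an ordinary least-squares model fit only to the expert's own past judgments ends up tracking the true outcome better than the expert does, simply because it strips the case-by-case noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 4
cues = rng.normal(size=(n, k))
true_weights = np.array([0.5, 0.3, -0.2, 0.1])
outcome = cues @ true_weights

# The expert weighs the cues roughly correctly but adds per-case noise.
expert = outcome + rng.normal(scale=0.5, size=n)

# "Bootstrap" the expert: fit OLS to the expert's judgments, not the outcome.
w, *_ = np.linalg.lstsq(cues, expert, rcond=None)
model = cues @ w

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("expert vs reality:", corr(expert, outcome))
print("model of expert vs reality:", corr(model, outcome))
```

The model never sees the true outcome, only the expert's noisy judgments, yet its correlation with reality is higher. That is the whole trick: averaging out random inconsistency.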
vintermann | 21 days ago
> Minimizing loss over aggregates is what a statistical model is designed to do, so if you evaluate human judgment against statistical predictions in aggregate on data similar to what the model was trained on, then you should expect statistical prediction to win

This reminds me of the many years machine translation was evaluated on BLEU towards reference translations, because they didn't know any better ways. Turns out that if you measure translation quality by n-gram precision towards a reference translation, then methods based on n-gram precision (such as the old pre-NMT Google translate) were really hard to beat.
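The metric-gaming vintermann describes can be shown with a toy version of BLEU's modified n-gram precision (heavily simplified here: unigrams and bigrams only, no brevity penalty or geometric mean; the sentences are made up). A garbled output that merely reuses reference n-grams outscores a fluent paraphrase:

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    """Modified n-gram precision: candidate n-grams clipped by reference counts."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

ref = "the cat sat on the mat".split()
fluent = "a cat was sitting on a mat".split()   # fine translation, different wording
stitched = "the cat the mat sat on".split()     # garbled, but reuses reference n-grams

for cand in (fluent, stitched):
    print(ngram_precision(cand, ref, 1), ngram_precision(cand, ref, 2))
```

The stitched word salad achieves perfect unigram precision and beats the fluent paraphrase on bigrams too, which is exactly why systems optimized for n-gram overlap were "really hard to beat" on this metric.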
nitwit005 | 22 days ago
You don't even need a statistical model. We make checklists because we know we'll fail to remember to check things.

Humans are tool users. If you make a statistical table to consult for some medical issue, you're using a tool.
rawgabbit | 22 days ago
OTOH. The blog mentions that humans excel at novel situations, such as when there is little training data, when envisioning alternate outcomes, or when recognizing the data is wrong.

The most recent example I can think of is "Frank". In 2021, JPMorgan Chase acquired Frank, a startup founded by Charlie Javice, for $175 million. Frank claimed to simplify the FAFSA process for students. Javice asserted the platform had over 4 million users, but in reality, it had fewer than 300,000. To support her claim, she allegedly hired a data science professor to generate synthetic data, creating fake user profiles. JPMorgan later discovered the discrepancy when a marketing campaign revealed a high rate of undeliverable emails. In March 2025, Javice was convicted of defrauding JPMorgan.

IMO a data expert could have recognized the fake user profiles, because he has seen, e.g., how messy real data is, knows the demographics of would-be users of a service like Frank (wealthy, time-stressed families), and knows the tell-tale signs of fake data (clusters of data that follow obvious "first principles").
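The way the fraud surfaced (an unexpectedly high rate of undeliverable emails) is itself a small statistics exercise. A sketch with purely illustrative figures, not the actual case numbers: even a moderate sample of delivery outcomes pins down the true bounce rate with a tight binomial confidence interval, so a claimed user base can be checked without contacting everyone.

```python
import math

def bounce_rate_ci(bounced, sampled, z=1.96):
    """95% normal-approximation confidence interval for the true bounce rate."""
    p = bounced / sampled
    half = z * math.sqrt(p * (1 - p) / sampled)
    return p - half, p + half

# Illustrative: 400,000 emails sent, 280,000 bounced.
lo, hi = bounce_rate_ci(bounced=280_000, sampled=400_000)
print(f"bounce rate is in [{lo:.4f}, {hi:.4f}] with 95% confidence")
```

With a sample this size the interval is a fraction of a percentage point wide; a bounce rate near 70% is flatly incompatible with a list of millions of genuine signups.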
dominicq | 22 days ago
As a matter of practicality, it seems that, professionally, you now want to be firmly in the tails of the data distribution for your field, e.g. an expert in those things that happen rarely.

Or maybe even be in a domain which, for whatever reason, is poorly represented by a statistical model, something where data points are hard to get.
mwkaufma | 21 days ago
User "Anoneuoid" from the source's own comment thread:

    There is another aspect here where those averaged outcomes are also the
    output of statistical models. So it is kind of like asking whether
    statistical models are better at agreeing with other statistical models
    than humans.
delichon | 22 days ago
> why it's often hard to demonstrate the value of human knowledge once you have a decent statistical model.

This seems to be a near restatement of the bitter lesson. It's not just that large enough statistical models outperform algorithms built from human expertise, they also outperform human expertise directly.
3abiton | 22 days ago
It's unfortunate how underappreciated statistics is. In nearly all the positions I've occupied (save academic ones), mostly in the technical domain interacting with non-technical stakeholders, anecdotal evidence always takes priority over statistically backed data for decision-making. It's absurd sometimes.
reedf1 | 21 days ago
If there is not a human-explainable reason a model has made a prediction, and it's just a statistical blob in multi-dimensional feature space (which we cannot introspect), then perceived improvement over humans is simply overfitting. It will be *extremely* good at finding the median issue, or following a decision tree in a more exacting way than a human. What a human can do is expand the degrees of freedom of their internal model at will, integrate out-of-sample data, and have a natural human bias toward the individual at the expense of the median. I'd rather have that...
kreyenborgi | 21 days ago
Versus https://predictive-optimization.cs.princeton.edu/
whatever1 | 21 days ago
At least when humans are wrong we own it. Statistical models can be wrong 100% of the times you used them, and the claim is "oh, this is how statistics works, you did not query the model infinite times".

My point is that on many occasions being right on average is less important than being right on the tail.
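whatever1's average-versus-tail point can be made concrete with synthetic data (all scales illustrative): a predictor that is sharper in the bulk of the distribution but blind in the tail wins on aggregate error while being several times worse exactly on the rare cases.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(100_000)  # true values; |y| >= 2 is the "tail"

# Predictor A: moderate error everywhere, tail included.
a = y + rng.normal(scale=0.5, size=y.size)
# Predictor B: near-perfect in the bulk, but predicts 0 whenever |y| >= 2.
b = np.where(np.abs(y) < 2, y + rng.normal(scale=0.3, size=y.size), 0.0)

tail = np.abs(y) >= 2
for name, pred in (("A", a), ("B", b)):
    err = np.abs(pred - y)
    print(name, "aggregate MAE:", err.mean(), "tail MAE:", err[tail].mean())
```

On aggregate mean absolute error B beats A, so B "wins" any evaluation that averages over the whole distribution, even though its tail error is several times larger. Which predictor you want depends entirely on whether you live in the bulk or the tail.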
bicepjai | 21 days ago
Someone had to say this. All models are dumb, but some are useful.