
GPT-4 LLM simulates people well enough to replicate social science experiments

223 points | by thoughtpeddler | 9 months ago

35 comments

dongobread · 9 months ago
I'm very skeptical of this; the paper they linked is not convincing. It says that GPT-4 predicts the direction of an experiment's outcome correctly 69% of the time, versus 66% for human forecasters. But this is a silly benchmark, because nobody trusts human forecasters in the first place; that's the whole reason the experiment gets run. Knowing that GPT-4 is slightly better at predicting experiments than a human guessing doesn't make it a useful substitute for the actual experiment.
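A back-of-the-envelope check of why that 3-point gap is weak evidence (a sketch only: the 70-experiment count comes from the working paper linked elsewhere in the thread, everything else is assumed, and it treats the two accuracies as independent proportions, which a paired design would not strictly satisfy):

```python
import math

def two_proportion_z(p1: float, p2: float, n: int) -> float:
    """z-statistic for comparing two accuracies, each measured on n items,
    treating them as independent proportions (a simplification)."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (p1 - p2) / se

# GPT-4 at 69% vs human forecasters at 66%, ~70 experiments each:
z = two_proportion_z(0.69, 0.66, 70)
print(f"z = {z:.2f}")  # ~0.38, far below the ~1.96 threshold for p < 0.05
```

At that sample size, the reported difference is statistically indistinguishable from a tie.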
anileated · 9 months ago
If GPT emulations of social experiments are not correct, policy decisions based on them will make them so.

"GPT said people would hate buses, so we halved their number and slashed the transportation budget… Wow, our people really do hate buses with a passion!"

"A year ago GPT said people would not be worried about climate change, so we stopped giving it coverage and removed related social adverts and initiatives. It turns out people really don't give a flying duck about climate change; GPT was so right!"

This is an oversimplification, of course. To say it with more nuance: anything socio- and psycho- is a minefield of self-fulfilling prophecies that ML seems to be nicely positioned to wreak havoc in. (But the small "this is not a replacement for a human experiment" notice is going to be heeded by all, right?)

As someone once wrote, all you need for machine dictatorship is an LLM and a critical number of human accomplices. No need for superintelligence or robots.
42lux · 9 months ago
Everyone and their mom in advertising sold "GPT Persona" tools to brands for target-group simulation; they're basically just an API call. Think "chat with your target group" kind of stuff.

Hint: they like it because it's biased toward what they want... like real marketing studies.
visarga · 9 months ago
Reminds me of:

> Out of One, Many: Using Language Models to Simulate Human Samples

> We propose and explore the possibility that language models can be studied as effective proxies for specific human subpopulations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property "algorithmic fidelity" and explore its extent in GPT-3. We create "silicon samples" by conditioning the model on thousands of sociodemographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.

https://arxiv.org/abs/2209.06899
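A minimal sketch of the "silicon sample" procedure the abstract describes, assuming the current OpenAI Python SDK; the backstories and survey item below are invented for illustration, not taken from the paper:

```python
# Condition the model on a sociodemographic backstory, ask the survey
# item, and collect responses across many backstories ("silicon samples").
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

backstories = [  # illustrative stand-ins for real participant backstories
    "I am a 34-year-old nurse from Ohio who attends church weekly.",
    "I am a 22-year-old software engineering student living in Berlin.",
]
question = ("On a scale of 1 to 5, how worried are you about climate "
            "change? Answer with a single digit.")

def silicon_sample(backstory: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=1.0,  # sample the distribution rather than the mode
        messages=[
            {"role": "system", "content": backstory},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()

answers = [silicon_sample(b) for b in backstories]
print(answers)  # compare this distribution against real survey responses
```

The paper's "algorithmic fidelity" claim is then a statement about how closely such response distributions track the human ones.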
vekntksijdhric · 9 months ago
Same energy as https://mastodon.radio/@wa7iut/112923475679116690
tylerrobinson · 9 months ago
A YC company called Roundtable tried to do this.[1]

The comments were not terribly supportive. They've since pivoted to a product that does survey data cleaning.

[1] https://news.ycombinator.com/item?id=36865625
xp84 · 9 months ago
Can someone translate, for the non-social-scientists in the audience, what this means? "3. Treatment. Write a message or vignette exactly as it would appear in a survey experiment."

It would probably be sufficient to give a couple of examples of what might constitute one of these.

Sorry, I know this is probably basic to someone in that field.
nullc · 9 months ago
But do the experiments replicate better in LLMs than in actual humans? :D

We should expect LLMs to be pretty good at repeating back to us the stories we tell about ourselves.
TeaBrain · 9 months ago
I don't think "replicate" is the appropriate word here.
AdieuToLogic · 9 months ago
So did ELIZA[0] about sixty (60) years ago.

0 - https://en.wikipedia.org/wiki/ELIZA
benterix · 9 months ago
So, we finally found the cure for the replication crisis in the social sciences: just run the experiments on LLMs.
klyrs · 9 months ago
Why stop at social science? I say we make a questionnaire, give it to the GPT over a broad range of sampling temperatures, and collect the resulting score:temperature data. From that dataset, we can take people's temperatures over the phone with a short panel of questions!

(This is parody.)
thoughtpeddler · 9 months ago
Accompanying working paper, which reports 85% accuracy for GPT-4 in replicating 70 social science experiment results: https://docsend.com/view/qeeccuggec56k9hd
pedalpete · 9 months ago
I wonder if this could be used for testing marketing or UX actions?
croes · 9 months ago
Is that the solution to social science's replication problem?
dartos · 9 months ago
Were those experiments in the training set?

If so, how close was the model's output to the record it was trained on?

Some interesting insights there, I think.
hollowpython · 9 months ago
Did they test whether GPT-4 replicated already-existing social science experiments? If so, this might have happened because the experiments were in the training data.
NicoJuicy · 9 months ago
That's only for known situations.

E.g., try LLMs on finding availability hours when you have the start and end times of each day.

LLMs don't really understand that you need to use day 1's end hour and then the start hour of the next day.
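A minimal sketch of the computation the comment is pointing at, under the assumption that "availability" means the overnight gap between one day's end time and the next day's start time; the schedule data is made up:

```python
from datetime import datetime

# Illustrative working hours as (date, start, end).
schedule = [
    ("2024-05-06", "09:00", "17:30"),
    ("2024-05-07", "10:00", "18:00"),
    ("2024-05-08", "08:30", "16:00"),
]

def overnight_gaps(days):
    """Pair day N's end with day N+1's start -- the step LLMs tend to fumble."""
    gaps = []
    for (d1, _, end), (d2, start, _) in zip(days, days[1:]):
        t_end = datetime.fromisoformat(f"{d1} {end}")
        t_start = datetime.fromisoformat(f"{d2} {start}")
        gaps.append((t_end, t_start, t_start - t_end))
    return gaps

for t_end, t_start, gap in overnight_gaps(schedule):
    print(f"free from {t_end} until {t_start} ({gap})")
```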
Log_out_ · 9 months ago
What happened to running virtual experiments, a.k.a. running filters on NSA metadata from all conversations of all mankind?
undergod · 9 months ago
Nature's technology vs. human technology.

Emergent vs. reductionist.

LLMs are emergent.

Therefore, LLMs break the mold of human technology and enter the realm of nature's technology.

We should talk about AI Beings, not apps, if we wish to stay with this analogy.

We can always say that's BS and that the analogy does not reach that deep.

Or we can take it for what it is and admit that LLMs are not similar in their behavior to any tool humans have created to date, all the way from Homo habilis to Homo sapiens sapiens. Tools are predictable. Intelligences are not.
gitfan86 · 9 months ago
The good news is that they should be able to replicate real-world events to validate whether this is true.

Tesla FSD is a good example of this in real life. You can measure how closely the car acts like a human based on interventions and crashes caused by un-human-like behavior; likewise, in the first round of the robotaxi fleet, which will have a safety driver, you can measure how many people complain that the driver was bad.
freeone3000 · 9 months ago
I think it is far, far more likely that it replicates social science experiments well enough to simulate people
daghamm · 9 months ago
Not so sure about that, but I can absolutely see people getting addicted to them.
Piskvorrr · 9 months ago
Ooooor maybe it's testing whether the experiments are similar to what was in the corpus.
jtc331 · 9 months ago
But does it replicate _better_ than really running the experiment again?

Joking… but not joking.
pftburger · 9 months ago
This is gonna end well…
1oooqooq · 9 months ago
This says more about how social science data is manipulated than about the usefulness of LLMs.
nsonha · 9 months ago
Please don't. Need I remind you of the joke that social science is not real science?
jrflowers · 9 months ago
I love that anyone can just write whatever they want and post it online.

GPT-4 can stand in for humans. Charlie Brown is mentioned in the Upanishads. The bubonic plague was spread via telegram. Easter falls on 9/11 once every other decade.

You can just write shit and hit post, and boom, by nature of it being online, someone will entertain it as true, even if only briefly. Wild stuff!
padjo · 9 months ago
Well that’s one way to solve the replication crisis
boesboes · 9 months ago
And yet it can't replicate a human support agent. Or even a basic search function, for that matter ;)
scudsworth · 9 months ago
garbage in, eh?
itkovian_ · 9 months ago
Psychohistory
lccerina · 9 months ago
Source: trust us. This is some bullshit science.
uptownfunk · 9 months ago
Is it possible to train an LLM that is minimally biased and that could assume various personas for the purpose of the experiments? Then I imagine it's just some prompt engineering, no?