TechEcho
Submissions by zone411
1. Public Goods Game Benchmark: Contribute and Punish, a Multi-Agent Benchmark
   7 points by zone411, about 2 months ago, no comments
2. Elimination Game: Multi-Agent LLM Social Reasoning, Strategy, and Deception
   5 points by zone411, 3 months ago, no comments
3. SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork
   111 points by zone411, 3 months ago, 74 comments
4. LLM Hallucination Benchmark: R1, o1, o3-mini, Gemini 2.0 Flash Think Exp 01-21
   17 points by zone411, 3 months ago, 3 comments
5. Multi-Agent Step Race Benchmark: LLM Collaboration and Deception Under Pressure
   7 points by zone411, 4 months ago, 1 comment
6. Show HN: LLM Thematic Generalization Benchmark
   6 points by zone411, 4 months ago, no comments
7. Show HN: LLM Creative Story-Writing Benchmark
   5 points by zone411, 4 months ago, no comments
8. Show HN: LLM Divergent Thinking Creativity Benchmark
   8 points by zone411, 5 months ago, no comments