2 points by panqueca about 1 year ago

1 comment

panqueca about 1 year ago

HumanEval Benchmark: 95.1 @ GPT-3.5

I wonder if it can be combined with projects like SWE-Agent to build powerful yet open-source coding agents.

- https://paperswithcode.com/sota/code-generation-on-humaneval

- https://github.com/princeton-nlp/SWE-agent

LDB: Large Language Model Debugger via Verifying Runtime Execution Step by Step
