
Llemma: An Open Language Model for Mathematics

267 points by AlphaWeaver over 1 year ago

13 comments

aSanchezStern over 1 year ago
Note that this still doesn't seem as good at solving proofs as some of the specialized prover models at formal theorem proving that aren't LLM-based. In particular, they show a 3% increase in proofs solved over COPRA on the MiniF2F Lean dataset, but in COPRA's own paper they show that they prove about 18.5% fewer theorems than Proverbot9001 on the CompCert dataset (their pitch is that they are faster, but not better at proving, than other tools). Assuming COPRA isn't just way better at Lean than Coq (there should be some amount of difference, but given that it's the same algorithm run on a uniform interface over Coq and Lean, it's unlikely to be large), Llemma should be proving 10-15% fewer proofs than Proverbot9001's algorithm. There's a bit of trickiness to trying to do this kind of cross-benchmark comparison, and COPRA picks a *bit* of a weird dataset to compare to Proverbot9001 on to save costs, but they also use a slightly older version which has worse results. So it probably still washes out to the best LLM-based provers being about 10-15% worse at proving theorems than the best specialized models.

EDIT: To be clear, that's 10-15% of the *total* theorems in a test set, not a relative 10-15% improvement. Given that solve rates for tools are in the 10-30% range, that's much more than a relative 10-15% improvement.
YetAnotherNick over 1 year ago
WizardMath[1] clearly has better benchmark results, and they conveniently ignored it. They knew about the model because they have used its dataset.

This has 51.5% on GSM8K compared to 81.6% for WizardMath.

[1]: https://huggingface.co/WizardLM/WizardMath-70B-V1.0
alhirzel over 1 year ago
I love the name, it's intuitive today and does not have any overloaded meanings or other confusers.
Syntonicles over 1 year ago
I wonder if I could use this to explore mathematics. I have been attempting to use existing LLMs, but results vary widely. I could imagine getting intuitions and insights about an equation, where the responses don't have to be watertight algebraic manipulations, but rather just deeply informed descriptions from several perspectives.
pk-protect-ai over 1 year ago
>> Before training LLEMMA 7B, we contract the RoPE (Su et al., 2022) base period of the Code Llama 7B initialization from θ = 1,000,000 to θ = 10,000. This is so that the long context finetuning procedure described in Peng et al. (2023) and Rozière et al. (2023) can be repeated on the trained LLEMMA 7B (we leave actually doing so to future work).

This does not explain what the reason for the RoPE contraction was in the first place. Can anyone elaborate on what the reason could be?
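For context on what the base period θ actually controls (this is not the paper's code, just a minimal sketch of the standard RoPE frequency formula): each pair of embedding dimensions rotates at frequency θ^(-2i/d), so a larger base stretches the slowest positional wavelengths, which is what long-context variants exploit. Contracting θ back to 10,000 restores the standard Llama 2 geometry. The head dimension of 128 below is the usual value for the 7B Llama models, assumed here for illustration.

```python
def rope_frequencies(dim: int, base: float) -> list[float]:
    """Per-pair rotation frequencies for rotary position embeddings:
    freq_i = base^(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

head_dim = 128  # typical head dimension for 7B Llama-family models

# Code Llama 7B ships with base 1,000,000; the quoted passage
# contracts it back to the standard Llama 2 value of 10,000.
long_base = rope_frequencies(head_dim, 1_000_000.0)
short_base = rope_frequencies(head_dim, 10_000.0)

# The highest frequency (i = 0) is identical in both cases...
assert long_base[0] == short_base[0] == 1.0
# ...but the slowest frequency is nearly 100x slower with the larger
# base, i.e. the large base stretches positional wavelengths so the
# model can distinguish positions over a much longer context.
print(short_base[-1] / long_base[-1])
```

So the contraction undoes Code Llama's long-context stretching, presumably so that the long-context finetuning recipe can later be applied to Llemma from the same starting point as a vanilla Llama 2 model.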
classified over 1 year ago
So we get hallucinations in math, too. This was inevitable.
leedrake5 over 1 year ago
Downloading this now to test it out - if anyone has prompts or tests they'd like to see, let me know. I plan to run it with llama.cpp
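For anyone else trying the same thing, the usual llama.cpp recipe at the time looked roughly like this (filenames, paths, and the quantization level are illustrative, not from this thread):

```shell
# Convert the downloaded Hugging Face checkpoint to llama.cpp's format,
# quantize it to fit in RAM, then run an interactive prompt.
python convert.py models/llemma-7b --outfile llemma-7b.gguf
./quantize llemma-7b.gguf llemma-7b.Q4_K_M.gguf Q4_K_M
./main -m llemma-7b.Q4_K_M.gguf -c 4096 -n 256 \
  -p "Problem: Prove that the sum of two even integers is even."
```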
pumanoir over 1 year ago
I skimmed (ctrl-f) the paper and didn't see a comparison against GPT-4. Anybody know how they'd compare?
Zuiii over 1 year ago
This seems to be about as "open" as proprietary game engines: no mention of a model license is present in the paper, code, or model card. I wonder whether their incorrect and misleading use of "open" will be caught during peer review.

Maybe the authors also believe that Windows is open. Its source code is also publicly available, after all.
akomtu over 1 year ago
A missed opportunity: TheLemma.
an_aparallel over 1 year ago
So now you don't need to know maths to work on DSP-heavy/math-reliant programming? :)
lanstin over 1 year ago
Can we run this somewhere?
behnamoh over 1 year ago
Can we stop using catchy marketing names for things that are *supposed to be* scientific results?

Choosing "llemma" (similar to Meta's "Llama") is just clout seeking.