TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Who is working on forward and backward compatibility for LLMs?

96 points by nijfranck almost 2 years ago

8 comments

HarHarVeryFunny almost 2 years ago

OpenAI has some degree of versioning for the models used by its APIs, but it seems they are still updating (fine-tuning) models without changing the model name/version. For ChatGPT itself (not the APIs), many people have reported recent regressions in capability, so the model appears to be changing there too.

As people start to use these APIs in production, there needs to be stricter version control, especially given how complex it is (impossible, unless you are only using a fixed set of prompts) for anyone to test for backwards compatibility. Maybe something like Ubuntu's stable long-term releases vs. bleeding-edge ones would work: have some models that are guaranteed not to change for a specified amount of time, and others that are periodically updated for people who want cutting-edge behavior and care less about backwards compatibility.
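The Ubuntu-style release-channel idea above can be sketched as a small registry. This is a minimal illustration, not any vendor's real API; the model names, channel labels, and freeze dates are all hypothetical.

```python
from datetime import date

# Hypothetical registry mapping release channels to pinned model versions,
# in the spirit of Ubuntu's LTS vs. bleeding-edge releases.
CHANNELS = {
    # "stable": frozen until the stated end-of-support date
    "stable": {"model": "acme-llm-2023-03-01", "frozen_until": date(2024, 3, 1)},
    # "latest": may be silently fine-tuned at any time
    "latest": {"model": "acme-llm-latest", "frozen_until": None},
}

def resolve_model(channel: str) -> str:
    """Return the concrete model identifier for a release channel."""
    try:
        return CHANNELS[channel]["model"]
    except KeyError:
        raise ValueError(f"unknown channel: {channel!r}")
```

A production caller would pin `"stable"` and only move channels deliberately, after re-running its prompt evaluations against the new version.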
beepbooptheory almost 2 years ago

Like 4 months ago people were saying the Singularity had pretty much already happened and everything was going to change / the world was over, but here we are now dealing with hard and very boring problems around versioning and hardening already somewhat counter-intuitive and highly engineered prompts, in order to hopefully eke out a single piece of consistent functionality, maybe.
nijfranck almost 2 years ago

When a newer LLM comes along (e.g. GPT-3.5 to GPT-4), your old prompts become obsolete. How are you solving this problem in your company? Are there companies working on solving this problem?
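One common mitigation for the problem the question raises is to key prompts by model, so an upgrade never silently reuses a prompt tuned for the old model. A minimal sketch, with hypothetical task names and prompt text:

```python
# Per-model prompt registry: one evaluated prompt variant per model family,
# so migrating from GPT-3.5 to GPT-4 is an explicit, reviewable change.
PROMPTS = {
    ("summarize", "gpt-3.5"): "Summarize the text below in 3 bullet points:\n{text}",
    ("summarize", "gpt-4"): "Summarize the following in exactly 3 bullets:\n{text}",
}

def get_prompt(task: str, model: str) -> str:
    """Look up the prompt evaluated for this (task, model) pair."""
    key = (task, model)
    if key not in PROMPTS:
        # Fail loudly instead of falling back to a prompt that was
        # never evaluated against this model.
        raise KeyError(f"no evaluated prompt for task={task!r}, model={model!r}")
    return PROMPTS[key]
```

The deliberate design choice is the loud failure: a missing entry means "nobody has evaluated this prompt on this model yet," which is exactly the compatibility gap the thread is about.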
brucethemoose2 almost 2 years ago

I dunno...

This sounds like making diffusion backwards compatible with ESRGAN. *Technically* they are both upscaling denoisers (with finetunes for specific tasks), and you can set up objective tests compatible with both, but the actual way they are used is so different that it's not even a good performance measurement.

The same thing applies to recent LLMs, and the structural changes are only going to get more drastic and fundamental. For instance, what about LLMs with separate instruction and data contexts? Or multimodal LLMs with multiple inputs/outputs? Or LLMs that fine-tune themselves during inference? That is just scratching the surface.
netruk44 almost 2 years ago

> If you expect the models you use to change at all, it's important to unit-test all your prompts using evaluation examples.

It's mentioned earlier in the article, but I'd like to emphasize that if you go down this route, you should either do *multiple* evaluations per prompt and come up with some kind of averaged result, or set the temperature to 0.

FTA:

> LLMs are stochastic – there's no guarantee that an LLM will give you the same output for the same input every time.

> You can force an LLM to give the same response by setting temperature = 0, which is, in general, a good practice.
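The averaged-evaluation approach described above can be sketched as a tiny harness. The model call here is a stub (a stand-in function, not a real API), used only to show the shape of the test:

```python
import random

def fake_llm(prompt: str, temperature: float) -> str:
    # Stand-in for a real model call: deterministic at temperature 0,
    # noisy otherwise.
    if temperature == 0:
        return "4"
    return random.choice(["4", "four", "4."])

def eval_prompt(prompt: str, expected: str, n: int = 5,
                temperature: float = 0.0) -> float:
    """Run the prompt n times and return the fraction of exact matches."""
    hits = sum(fake_llm(prompt, temperature) == expected for _ in range(n))
    return hits / n

# At temperature 0 a single run suffices; at higher temperatures,
# the averaged score over n runs is what you track across model versions.
score = eval_prompt("What is 2 + 2? Answer with a digit.", expected="4")
```

Running this suite against both the old and the new model version, and comparing the scores, is effectively the backwards-compatibility test the thread is asking for.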
ITB almost 2 years ago

I suggest this is the wrong way to think about this. Alexa tried for a very long time to agree on an "Alexa Ontology" and it just doesn't work for large enough surface areas. Testing that new versions of LLMs work is better than trying to make everything backward compatible. Also, the "structured" component of the response (e.g., "send your answer in JSON format") should be something not super brittle. In fact, if the structure takes a lot of prompting to work, you are probably setting yourself up for failure.
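One way to keep the structured component non-brittle, as suggested above, is to parse model output leniently rather than demanding a byte-exact format. A sketch (the tolerance rules here are illustrative, not exhaustive):

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Leniently pull the first JSON object out of a model response,
    tolerating markdown code fences and surrounding chatter."""
    # Strip ```json ... ``` fences if the model added them.
    raw = re.sub(r"```(?:json)?", "", raw)
    # Grab the first {...} span and parse it.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```

A parser like this survives a model version that starts wrapping answers in fences or adding a "Sure, here you go:" preamble, without any extra prompting effort.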
lachlan_gray almost 2 years ago

LMQL helps a lot with this kind of thing. It makes it really easy to swap prompts and models out, and in general it lets you maintain your prompt workflows in whatever way you maintain the rest of your Python code.

I'm expecting there will be more examples soon, but you can check out my tree-of-thoughts implementation below to see what I mean:

https://github.com/LachlanGray/lmql-tree-of-thoughts
aldousd666 almost 2 years ago
Meta is getting it done for free by releasing their models open source. Now everyone is building things that work with their models.