
Fine-tune Google's Gemma 3

226 points · by tomdekan · about 2 months ago

9 comments

smokel · about 2 months ago
I'm interested to know whether anyone is using fine-tuning to train a model on proprietary or in-house codebases and documentation.

RAG solutions seem to have their limitations, and fine-tuning might be a more effective approach.

How much effort is required to turn code into something one can use for fine-tuning?
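
One common starting point for that last question is to flatten a repository into instruction-style JSONL pairs. Below is a minimal sketch that pairs each documented function's docstring with its implementation; the pairing heuristic, the ./my_repo path, and the output format are illustrative assumptions, not a method from the article:

    import ast
    import json
    from pathlib import Path

    def extract_pairs(repo_root: str):
        """Walk a Python repo, pairing each documented function's
        docstring with its source as a prompt/completion example."""
        for path in Path(repo_root).rglob("*.py"):
            source = path.read_text(encoding="utf-8", errors="ignore")
            try:
                tree = ast.parse(source)
            except SyntaxError:
                continue  # skip files that do not parse
            for node in ast.walk(tree):
                if isinstance(node, ast.FunctionDef) and ast.get_docstring(node):
                    yield {
                        "prompt": "Implement the following:\n" + ast.get_docstring(node),
                        "completion": ast.get_source_segment(source, node),
                    }

    # Write one JSON object per line, the format most trainers accept.
    with open("train.jsonl", "w") as f:
        for pair in extract_pairs("./my_repo"):  # hypothetical repo path
            f.write(json.dumps(pair) + "\n")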
zk · about 2 months ago
Is there a version of Gemma 3 that has tool calling? Google's blog claimed it supports tools, but it doesn't seem like it actually does.
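
When a model lacks a native tool-calling API, a common workaround is prompt-based function calling: describe the tools in the prompt, ask for a JSON-only reply, and parse it yourself. A minimal sketch of the parsing side; the prompt wording and the get_weather tool are made-up examples:

    import json
    import re

    # Hypothetical tool description injected ahead of the user's question.
    TOOL_PROMPT = """You have access to this tool:
    get_weather(city: str) -> str
    If you need the tool, reply ONLY with JSON like:
    {"tool": "get_weather", "arguments": {"city": "..."}}
    Otherwise answer the question directly.
    """

    def parse_tool_call(model_output: str):
        """Extract a JSON tool call from the model's reply, if any."""
        match = re.search(r"\{.*\}", model_output, re.DOTALL)
        if not match:
            return None  # the model answered in plain text
        try:
            call = json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
        return call if call.get("tool") == "get_weather" else None

    # Example with a canned reply standing in for real model output:
    print(parse_tool_call('{"tool": "get_weather", "arguments": {"city": "Oslo"}}'))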
bryan0 · about 2 months ago
Are people fine-tuning LLMs on their local machines with a single GPU? What are people using to scale their training to multiple nodes / GPUs? I've been playing around with Hugging Face Estimators in sagemaker.huggingface, but I'm not sure whether there are better options for this.
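
For reference, a multi-node launch with the Hugging Face Estimator looks roughly like the sketch below. The role ARN, S3 path, instance choice, and hyperparameters are placeholders, and the version strings need to be checked against the DLC images actually available:

    from sagemaker.huggingface import HuggingFace

    estimator = HuggingFace(
        entry_point="train.py",           # your training script
        source_dir="./scripts",
        instance_type="ml.p4d.24xlarge",  # 8x A100 per node
        instance_count=2,                 # two nodes -> 16 GPUs total
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
        transformers_version="4.36",      # check available DLC versions
        pytorch_version="2.1",
        py_version="py310",
        # Launches the script under torchrun across all nodes and GPUs.
        distribution={"torch_distributed": {"enabled": True}},
        hyperparameters={"model_id": "google/gemma-3-4b-it", "epochs": 1},
    )
    estimator.fit({"train": "s3://my-bucket/train"})  # placeholder S3 path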
rockwotj · about 2 months ago
Is anyone outside of the research labs fine-tuning models for production use cases? I've been seeing more people just using foundation models off the shelf, especially in light of a new advancement that seems to come every few months.
yieldcrv · about 2 months ago
Instead of versions, these things should be labeled by their release date, since this kind of training starts from a dataset snapshot in time, colloquially called the knowledge-cutoff date, which isn't really accurate.

We are optimizing these on different dimensions at once, with multiple branches of evolution from each model, so a successor version name doesn't really convey that.
huqedato · about 2 months ago
Great article, but I didn't see anything about the costs.

I'm particularly interested in this aspect because we're considering fine-tuning Gemma 3, but our budget is tight. We're looking into (real-world) cost estimates for this approach.
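
A rough way to bound the cost is GPU-hours times hourly price, with GPU-hours estimated from dataset size and throughput. Every number in the sketch below is an assumption to replace with your own price quotes and measured benchmarks:

    # Back-of-envelope fine-tuning cost. All figures are assumptions.
    gpu_hourly_usd = 2.50            # assumed on-demand price, one A100 80GB
    num_gpus = 1                     # a LoRA run on a small Gemma often fits on one
    train_tokens = 20_000_000        # assumed dataset size in training tokens
    tokens_per_gpu_hour = 5_000_000  # assumed throughput; benchmark your setup

    gpu_hours = train_tokens / tokens_per_gpu_hour * num_gpus
    print(f"~{gpu_hours:.1f} GPU-hours, ~${gpu_hours * gpu_hourly_usd:.0f}")
    # -> ~4.0 GPU-hours, ~$10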
siliconc0w · about 2 months ago
It likely makes sense to use more expensive frontier models as teachers or architects for smaller fine-tuned ones that generate the majority of tokens (though possibly against the ToS).
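
The usual shape of that pipeline is: collect prompts from the target task, have the frontier model write the answers, then fine-tune the small model on the resulting pairs. A minimal sketch of the data-generation half, where call_teacher is a hypothetical stand-in for whatever frontier-model client you use:

    import json

    def call_teacher(prompt: str) -> str:
        """Hypothetical wrapper around your frontier-model API of choice."""
        raise NotImplementedError("plug in your API client here")

    # Prompts sampled from the task the small model will serve.
    prompts = [
        "Summarize this error log: ...",
        "Write a SQL query that ...",
    ]

    # The expensive model writes the targets; the cheap model is later
    # fine-tuned to imitate them.
    with open("distill.jsonl", "w") as f:
        for p in prompts:
            f.write(json.dumps({"prompt": p, "completion": call_teacher(p)}) + "\n")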
admiralrohan · about 2 months ago
Has anyone used those small models in any production environment?

If yes, what are they good and bad at?
dhooper · about 2 months ago
Please try to enjoy each Gemma tuning equally, and not show preference for any over the others