As someone who is new to running language models, I am struggling to understand the infrastructure needed to run them effectively, and I would greatly appreciate any advice. Could you please help me with the following questions?

1. What hardware specifications would you recommend for running language models?

2. What building options are available for language models, and which is the easiest to set up?

3. Is it better to rent or buy hardware for running language models?

4. What are some cost-saving strategies that have worked for you when running language models?
The best of the best right now is probably Vicuna 13B. The 30B and 65B LLaMA models score better on benchmarks, but there isn't a compelling instruct fine-tuned version of those yet, so they require a lot of prompt engineering.

If you want to run Vicuna without quantization you need about 25GB of VRAM, which exceeds pretty much all consumer GPUs. Vicuna in 4-bit GPTQ is decent, though I personally notice a quality difference when comparing it to 16-bit.

CPU is also an option: you can run pretty much any model that fits in your RAM, although performance will obviously suffer. llama.cpp has gotten very popular for this.
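For concreteness, here is a minimal sketch of CPU inference using llama-cpp-python (the Python bindings for llama.cpp). The model path and prompt template are placeholders, not a specific recommended checkpoint; substitute whatever 4-bit quantized Vicuna file you actually have:

```python
# Minimal CPU-inference sketch with llama-cpp-python.
# Rough memory math behind the numbers above: 13B params * 2 bytes (fp16)
# ~= 26 GB, hence the ~25GB VRAM figure for unquantized Vicuna 13B.
# At 4 bits, 13B * 0.5 bytes ~= 6.5 GB plus overhead, which fits in
# ordinary system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/vicuna-13b-q4_0.bin",  # hypothetical local path
    n_ctx=2048,    # context window
    n_threads=8,   # tune to your CPU core count
)

output = llm(
    "### Human: Explain quantization in one sentence.\n### Assistant:",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```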
On a slight tangent, I feel like this is one major ace up Apple’s sleeve, if they can zoom into it. With the awesome performance of Apple Silicon and how lots of the big leaps are seen with video rendering, they just need to focus some more on the ML capabilities. They’ve been skirting around ML with some optimizing on popular ML libraries, but it’s mostly focused on inference, but hopefully with recent pytorch 2.0 optimizations and co, they can meet open source libraries halfway, and do more.<p>I think the current proliferation of AI and general awareness of LLM can be a major selling point if they make sure their neural engine is well optimized for it. Will put them right at the center of the conversation, especially since one of the current concern is the cost of training these models.
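For what it's worth, PyTorch already exposes Apple's GPU through the MPS backend (PyTorch >= 1.12). A minimal sketch; the Linear layer is just a stand-in for whatever model you'd actually run:

```python
# Sketch of running inference on Apple Silicon via PyTorch's
# MPS (Metal Performance Shaders) backend, falling back to CPU.
import torch

device = (
    torch.device("mps")
    if torch.backends.mps.is_available()
    else torch.device("cpu")
)

# Stand-in model: any nn.Module moves to the device the same way.
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)

with torch.no_grad():
    y = model(x)
print(y.shape, y.device)
```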
<a href="https://stateofart.ai/" rel="nofollow">https://stateofart.ai/</a><p>Disclosure: I am the author of the website, and it's extremely light on content currently.
1. The best hardware you can afford.

2. Take a look at huggingface.co.

3. Rent for short periods, buy if you need it for a long time. You can do the maths (rough break-even sketch below).

4. Smaller models, quantization, and running on CPU when the speed and increased energy usage aren't a problem.
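On point 3, the break-even arithmetic is simple. All prices here are made-up placeholders; substitute real quotes for your hardware and cloud provider:

```python
# Hypothetical rent-vs-buy break-even sketch; every figure is a placeholder.
BUY_PRICE = 1600.0           # e.g. a used 24GB GPU, USD (placeholder)
RENT_PER_HOUR = 0.60         # cloud GPU hourly rate, USD (placeholder)
POWER_COST_PER_HOUR = 0.05   # electricity for the local box (placeholder)

# Renting wins until cumulative rental cost exceeds the purchase price
# plus what you'd spend powering your own machine for the same hours.
break_even_hours = BUY_PRICE / (RENT_PER_HOUR - POWER_COST_PER_HOUR)
print(f"Break-even after ~{break_even_hours:,.0f} GPU-hours")
# With these placeholder numbers: ~2,909 hours, i.e. roughly four months
# of 24/7 use. Occasional experimentation favours renting; sustained
# daily use favours buying.
```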