One thing worth mentioning about llama.cpp wrappers like ollama, LM Studio, and Faraday is that they don't yet support[1] sliding window attention, and instead fall back to the vanilla causal attention used by Llama 2. As noted in the Mistral 7B paper[2], SWA gives a longer effective attention span than regular causal attention, because information propagates across stacked layers beyond the window size.

Disclaimer: I have a competing universal macOS/iOS app[3] that does support SWA with Mistral models (using mlc-llm).

[1]: https://github.com/ggerganov/llama.cpp/issues/3377

[2]: https://arxiv.org/abs/2310.06825

[3]: https://apps.apple.com/us/app/private-llm/id6448106860
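For anyone curious what the difference looks like in practice, here's a rough NumPy sketch (my own illustration with a toy window size, not code from any of these projects; Mistral's actual window is 4096) comparing a plain causal mask to a sliding window mask:

    import numpy as np

    def causal_mask(seq_len: int) -> np.ndarray:
        # Standard causal mask: token i may attend to every token j <= i.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))

    def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
        # Sliding window mask: token i may attend only to tokens j with
        # i - window < j <= i, so per-layer attention (and the KV cache)
        # stays bounded by `window`.
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        return (j <= i) & (j > i - window)

    if __name__ == "__main__":
        n, w = 8, 3
        print("causal:\n", causal_mask(n).astype(int))
        print("sliding window (w=3):\n", sliding_window_mask(n, w).astype(int))
        # With L layers stacked, information can still flow across roughly
        # w * L positions, since each layer's window applies to the previous
        # layer's outputs rather than the raw inputs.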