
Run Mistral 7B on M1 Mac

111 points by byyoung3 · over 1 year ago

14 comments

woadwarrior01 · over 1 year ago
One thing that's worth mentioning about llama.cpp wrappers like ollama, LM Studio, and Faraday is that they don't yet support [1] sliding window attention, and instead use vanilla causal attention from Llama 2. As noted in the Mistral 7B paper [2], SWA has some benefits in terms of attention span over regular causal attention.

Disclaimer: I have a competing universal macOS/iOS app [3] that does support SWA with Mistral models (using mlc-llm).

[1]: https://github.com/ggerganov/llama.cpp/issues/3377

[2]: https://arxiv.org/abs/2310.06825

[3]: https://apps.apple.com/us/app/private-llm/id6448106860
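For readers unfamiliar with the distinction: under vanilla causal attention each token can attend to every earlier token, while sliding window attention restricts it to the most recent W tokens (W = 4096 in the Mistral 7B paper), with longer-range information flowing indirectly through stacked layers. A minimal illustrative sketch of the two masks; the sequence length and window size below are toy values, not anything taken from llama.cpp or mlc-llm:

    import numpy as np

    def causal_mask(seq_len):
        # Token i may attend to tokens 0..i (standard causal attention).
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        return j <= i

    def sliding_window_mask(seq_len, window):
        # Token i may attend only to the last `window` tokens, i.e. j in (i - window, i]
        # (Mistral-style sliding window attention).
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        return (j <= i) & (j > i - window)

    print(causal_mask(6).astype(int))
    print(sliding_window_mask(6, window=3).astype(int))

The practical upside is that the attention cost and KV-cache size per token stop growing with the full context length and are bounded by the window instead.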
FergusArgyll · over 1 year ago
For anyone who only has 8 GB RAM: I can run orca-mini:3b. It's dumber than just flipping a coin, but it still feels cool to have an LLM running on your own computer.
vaillant · over 1 year ago
Trivial to run Mistral 7B on an M1 MacBook Air using LM Studio. Just make sure you use a quantized version.
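A rough back-of-the-envelope sketch of why the quantized version matters on an 8-16 GB machine: the weights alone scale linearly with bits per parameter, so a 7B-parameter model drops from roughly 13 GiB at fp16 to a few GiB at 4-bit (this ignores the KV cache, activations, and runtime overhead, so treat the numbers as estimates only):

    # Approximate memory for the weights of a 7B-parameter model at different precisions.
    PARAMS = 7_000_000_000

    for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gib = PARAMS * bits / 8 / 1024**3
        print(f"{name}: ~{gib:.1f} GiB")  # fp16 ≈ 13.0 GiB, 4-bit ≈ 3.3 GiB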
jasonjmcghee · over 1 year ago
If you prefer a web UI: https://github.com/ollama-webui/ollama-webui
Const-me · over 1 year ago
Windows equivalent: https://github.com/Const-me/Cgml/tree/master/Mistral/MistralChat

Runs on GPUs, uses about 5 GB of VRAM. On integrated GPUs it generates 1-2 tokens/second; on discrete ones, often over 20 tokens/second.
yawnxyz · over 1 year ago
Running Mixtral on a 32 GB M3 Max is surprisingly hard in LM Studio... turning on "Apple Metal" mode just crashes it. Otherwise it takes about 10 minutes to load into memory, and inference is super slow.

Anyone got suggestions? Would love for it to run summarization over GBs of NCBI data.
heyoni · over 1 year ago
Strange. I remember trying to get this to work on a 16 GB machine, and all of the comments on a GitHub issue mentioning it were saying it needs at least 32 GB or more.

Edit: this was with llama.cpp, though, not ollama.
georgel · over 1 year ago
What is the performance? I didn't see any benchmarks listed.
chestertn · over 1 year ago
I'm sure this question has been asked plenty of times, but what is the best place to start with LLMs (no theory, just installing them locally, fine-tuning them, etc.)?
nextaccountic · over 1 year ago
Does this run on the CPU or the GPU?

Would it be advantageous to divide the load and run on both?
rattray · over 1 year ago
Is there a llamafile?
dangero · over 1 year ago
Aside from LM Studio, there's also Faraday: https://faraday.dev/
dankle · over 1 year ago
LM Studio is even easier.
dixie_land · over 1 year ago
I find it hard to trust a tutorial that tells me forking curl is an exemplary way to interact with a local RESTful API in Python.
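The pattern being criticized is shelling out to curl from a Python script. A hedged sketch of the more idiomatic alternative, using the requests library against Ollama's local HTTP endpoint (the port, path, and JSON fields follow Ollama's documented /api/generate API; adjust them if your local setup differs):

    import requests

    # Ollama serves a local REST API on port 11434 by default.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": "Explain sliding window attention in one sentence.",
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])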