
Small offline large language model – TinyChatEngine from MIT

117 points by physicsgraph over 1 year ago

6 comments

antirez over 1 year ago
Use llama.cpp for quantized model inference. It is simpler (no Docker nor Python required), faster (works well on CPUs), and supports many models.

Also there are better models than the one suggested: Mistral for 7B parameters, Yi if you want to go larger and happen to have 32GB of memory. Mixtral MoE is the best but requires too much memory right now for most users.
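[For anyone who wants to try that route, here is a minimal sketch of quantized CPU inference through the llama-cpp-python bindings, one common way to drive llama.cpp from code. The GGUF model path is a placeholder: you would first download a quantized file yourself, e.g. a Mistral 7B quant.]

    # Minimal sketch: quantized CPU inference via the llama-cpp-python
    # bindings. The model path below is a placeholder; point it at a
    # quantized GGUF file you have downloaded (e.g. a Mistral 7B quant).
    from llama_cpp import Llama

    llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf")

    # Stop sequences keep the completion from running past the answer.
    output = llm(
        "Q: What is a quantized language model? A:",
        max_tokens=64,
        stop=["Q:"],
    )
    print(output["choices"][0]["text"])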
upon_drumhead over 1 year ago
I'm a tad confused.

> TinyChatEngine provides an off-line open-source large language model (LLM) that has been reduced in size.

But then they download the models from huggingface. I don't understand how these are smaller? Or do they modify them locally?
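[On the size question: the reduction typically comes from post-training quantization, i.e. re-encoding the full-precision weights in fewer bits (locally, or by fetching pre-quantized files), not from a smaller architecture. Below is a rough NumPy sketch of the general idea only, not TinyChatEngine's actual pipeline.]

    # Illustrative symmetric int8 weight quantization -- the kind of
    # post-training step that shrinks a downloaded checkpoint.
    import numpy as np

    def quantize_int8(w):
        """Map float32 weights to int8 plus one per-tensor scale."""
        scale = np.abs(w).max() / 127.0          # largest magnitude -> 127
        q = np.round(w / scale).astype(np.int8)  # 1 byte/weight instead of 4
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4096, 4096).astype(np.float32)
    q, scale = quantize_int8(w)
    print(f"fp32: {w.nbytes / 1e6:.0f} MB, int8: {q.nbytes / 1e6:.0f} MB")  # ~4x smaller
    print(f"max abs error: {np.abs(dequantize(q, scale) - w).max():.4f}")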
rodnim over 1 year ago
"Small large" ... so, medium? :)
aravindgp over 1 year ago
I have used them and I can say it's pretty decent overall. I personally plan to use TinyEngine on IoT devices; it targets even smaller microcontroller-class hardware.
collyw over 1 year ago
Where is a good place to understand the high-level topics in AI? Like how an offline language model compares to a presumably online model?
dkjaudyeqooe over 1 year ago
I tried this and installation was easy on macOS 10.14.6 (once I updated Clang correctly).

Performance on my relatively old i5-8600 CPU running 6 cores at 3.10GHz with 32GB of memory gives me about 150-250 ms per token on the default model, which is perfectly usable.
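[For a sense of scale, that reported latency works out to roughly 4-7 tokens per second:]

    # Converting the reported per-token latency into throughput.
    for ms_per_token in (150, 250):
        print(f"{ms_per_token} ms/token -> {1000 / ms_per_token:.1f} tokens/s")
    # 150 ms/token -> 6.7 tokens/s
    # 250 ms/token -> 4.0 tokens/s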