
MK-1

288 points by ejz almost 2 years ago

15 comments

lolinder almost 2 years ago
It's weird that not once do they mention or compare their results to the already-available quantization methods. I normally try to give benefit of the doubt, but there's really no way they're not aware that there are already widely used techniques for accomplishing this same thing, so the comparison benchmarks *really* should be there.

To fill in the gap, here's llama.cpp's comparison chart[0] for the different quantizations available for Llama 1. We can't compare directly with their Llama 2 metrics, but just comparing the percent change in speed and perplexity, MK-1 looks very similar to Q5_1. There's a small but not insignificant hit to perplexity, and a just over 2x speedup.

If these numbers are accurate, you can download pre-quantized Llama 2 models from Hugging Face that will perform essentially the same as what MK-1 is offering, with the Q5 files here: https://huggingface.co/TheBloke/Llama-2-13B-GGML/tree/main

[0] https://github.com/ggerganov/llama.cpp#quantization
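For instance, here's what loading one of those pre-quantized Q5_1 files looks like with llama-cpp-python; a minimal sketch, assuming the file from TheBloke's repo above has been downloaded locally (the filename follows his naming convention and may differ):

```python
# Minimal sketch: run a pre-quantized Q5_1 GGML Llama 2 locally.
# Assumes `pip install llama-cpp-python` and that the .bin file was
# downloaded from the Hugging Face repo linked above.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-13b.ggmlv3.q5_1.bin")  # assumed local filename
out = llm("Q: What does Q5_1 quantization trade away? A:", max_tokens=64)
print(out["choices"][0]["text"])
```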
xianshou almost 2 years ago
Not a single mention of existing quantization techniques? Ten bucks says this is just a wrapper around bitsandbytes or ggml.
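For reference, the existing few-line path this comment alludes to looks like the following; a minimal sketch using bitsandbytes through Hugging Face transformers (the model ID is illustrative, and the Llama 2 repo is gated):

```python
# Minimal sketch: 8-bit inference via bitsandbytes, the kind of
# already-available quantization commenters are comparing against.
# Assumes `pip install transformers accelerate bitsandbytes` and access
# to the (gated) meta-llama repo; swap in any causal LM you have.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # illustrative model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_8bit=True,  # bitsandbytes LLM.int8() weight quantization
)
inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```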
lyapunova almost 2 years ago
I don't think I can use this if it's not open source... sorry.

The field moves too fast and the convenience is just not there otherwise.

edit: also the branding makes me think of MK-ultra which is probably something to avoid
Scene_Cast2 almost 2 years ago
I've worked on ML model quantization. The open source 4-bit or 8-bit quantization isn't as good as one can get - there are much fancier techniques to keep predictive performance while squeezing size.

Some techniques (like quantization-aware training) involve changes to training.
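To make that distinction concrete, here is a toy PyTorch sketch of the fake-quantization trick at the heart of quantization-aware training; real QAT tooling (e.g. torch.ao.quantization) also handles activations and learned scales:

```python
# Toy sketch of quantization-aware training: the forward pass sees
# int8-rounded weights, while the straight-through estimator lets
# gradients update the underlying full-precision weights.
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Linear):
    def forward(self, x):
        scale = self.weight.abs().max() / 127          # symmetric int8 scale
        q = torch.clamp(torch.round(self.weight / scale), -127, 127) * scale
        w = self.weight + (q - self.weight).detach()   # straight-through estimator
        return nn.functional.linear(x, w, self.bias)

layer = FakeQuantLinear(16, 16)
layer(torch.randn(4, 16)).sum().backward()  # grads reach layer.weight despite rounding
```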
rvz almost 2 years ago
Another AI startup grift, using GGML and closing it up to beg for VC cash.

Yet another AI wrapper company doing the same thing and jumping on the LLM hype train before it dries up.

If it is not open source and it is closed, it is immediately dead in the water.
Philpax almost 2 years ago
...isn't this just quantization?
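For readers wondering what "just quantization" means here, a minimal NumPy sketch of round-to-nearest 4-bit weight quantization and the dequantization used at inference; real schemes (GGML's Q4/Q5 formats, GPTQ, etc.) add per-block scales and smarter rounding:

```python
# Toy sketch: symmetric round-to-nearest 4-bit quantization of a weight block.
import numpy as np

w = np.random.randn(256).astype(np.float32)  # toy weight block
scale = np.abs(w).max() / 7                  # map into the int4 range [-8, 7]
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
w_hat = q.astype(np.float32) * scale         # dequantize for inference
print("max abs error:", float(np.abs(w - w_hat).max()))
```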
metadat almost 2 years ago
Too bad it's not an open source effort.

I'm not a fan of proprietary dependencies in my stack, full stop.
modeless almost 2 years ago
How does this compare to mlc-llm with 4 bit quantization? It runs llama2 13B incredibly fast on my 4090. Multiples of the speed of llama.cpp even on GPU with the same 4 bit quantization.
radicaldreamer almost 2 years ago
You can do this stuff on a MacBook Pro these days... not sure why you'd want to be locked into another vendor here. Either use the best (OpenAI, Anthropic) or just roll your own.
hardwaresofton almost 2 years ago
Is this the true effect of Ultra Instinct^H^H Llama2?

Facebook is effectively supercharging the ecosystems and tool builders and *smaller* inference services.

This company had access to a credible, popular model (with an actual OSS license), and the relevant weights so they could optimize on it and sell the optimization without worrying about the license/restrictions on the weights themselves.
ipsum2 almost 2 years ago
Aren't FasterTransformer (NVidia, OSS) and text-generation-inference (Huggingface, not OSS) faster than this?
pestatije almost 2 years ago
> Today, we're announcing our first product, MKML. MKML is a software package that can reduce LLM inference costs on GPUs by 2x with just a few lines of Python code. And it is plug and play with popular ecosystems like Hugging Face and PyTorch
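Since MKML is closed source and the thread shows no real API, the following is a purely hypothetical sketch of what such a "few lines of Python" plug-in might look like; the mkml module and compress function are invented names, not the actual interface:

```python
# Hypothetical sketch only: `mkml` and `mkml.compress` are invented
# placeholders for whatever the closed-source package actually exposes.
from transformers import AutoModelForCausalLM
import mkml  # hypothetical package name

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
model = mkml.compress(model)  # hypothetical one-line compression call
```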
ushakov almost 2 years ago
This seems more like a VC pitch deck than a technical paper explaining why their approach is better
drtournier almost 2 years ago
MKML == abstractions and wrappers for GGML?
mugivarra69 almost 2 years ago
we have no moat, and neither does OpenAI; open source is getting better over time.