It's weird that not once do they mention or compare their results to the already-available quantization methods. I normally try to give the benefit of the doubt, but there's really no way they're unaware that widely used techniques already exist for accomplishing this same thing, so the comparison benchmarks *really* should be there.

To fill in the gap, here's llama.cpp's comparison chart [0] for the different quantizations available for Llama 1. We can't compare directly with their Llama 2 metrics, but just comparing the percent change in speed and perplexity, MK-1 looks very similar to Q5_1: a small but not insignificant hit to perplexity, and just over a 2x speedup.

If these numbers are accurate, you can download pre-quantized Llama 2 models from Hugging Face that will perform essentially the same as what MK-1 is offering, using the Q5 files here: https://huggingface.co/TheBloke/Llama-2-13B-GGML/tree/main (a sketch of one way to load them follows below).

[0] https://github.com/ggerganov/llama.cpp#quantization
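For anyone who wants to actually try this, here's a rough sketch of pulling one of those Q5_1 files and running it with llama-cpp-python (the Python bindings for llama.cpp). The repo id is from the link above, but the exact filename and generation parameters are my assumptions; also note these are GGML-era files, so newer versions of the bindings (which expect GGUF) may not load them. Check the repo's file listing and model card before running.

```python
# Sketch: download a pre-quantized Q5_1 GGML file from Hugging Face and
# run it locally. The filename below is a guess based on TheBloke's usual
# naming scheme -- verify it against the repo before running.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-GGML",
    filename="llama-2-13b.ggmlv3.q5_1.bin",  # hypothetical filename; check the repo
)

# Load the quantized model; n_ctx is the context window size.
llm = Llama(model_path=model_path, n_ctx=2048)

out = llm("Q: What does 5-bit quantization trade off? A:", max_tokens=64)
print(out["choices"][0]["text"])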