科技回声

11 comments

naillo, about 2 years ago

This is really cool but the output is such garbage at that weight size that you might as well be running a markov chain.

superkuh, about 2 years ago

Until it thermally throttles 40 seconds later. But yeah, it's really cool how many platforms the vanilla code in llama.cpp can be easily compiled on. And somehow I doubt they did the quantization step on the Pixel itself. My favorite was the person who did it on the rpi4. I know a guy working on getting it going on rpi3, but the ARM7/8 mixing, NEON support, and 64-bit ARM intrinsics are apparently non-trivial to convert.

beiller, about 2 years ago

Here is a thread on tweaking the parameters, which the model seems very sensitive to: https://github.com/ggerganov/llama.cpp/issues/129

OscarCunningham, about 2 years ago

This would be useful for predictive text. That's exactly what LLMs are actually built for.

__mharrison__, about 2 years ago

I'm waiting until it runs on my C64...

tosh, about 2 years ago

Did anyone get this to run on an iPhone or in a browser yet?

syntaxing, about 2 years ago

Does this mean that, in theory, it should be relatively easy to port to a Coral TPU?

nshm, about 2 years ago

It is not really LLaMA; it is LLaMA quantized to 4 bits. It does not even have the quality of the original 7B. I could also quantize it to 1 bit and claim it runs on my RPi 3.

Havoc, about 2 years ago

Any more details? I'm guessing they're leveraging the NPU in the Pixel?

snapplebobapple, about 2 years ago

So is this finally peak hipster coder, and from this point on Rust will diminish because all the cool kids start switching to zag?

a-dub, about 2 years ago

Would be even cooler if it employed the accelerator! (Unless this ggml library is doing that under the hood.) I assume it has unified memory, but maybe not little numbers...

LLaMa running at 5 tokens/second on a Pixel 6
