I'm absolutely amazed at how capable the new 1B model is, considering it's just a 1.3GB download (for the Ollama GGUF version).<p>I tried running a full codebase through it (since it can handle 128,000 tokens) and asking it to summarize the code - it did a surprisingly decent job, incomplete but still unbelievable for a model that tiny: <a href="https://gist.github.com/simonw/64c5f5b111fe473999144932bef4218b" rel="nofollow">https://gist.github.com/simonw/64c5f5b111fe473999144932bef42...</a><p>More of my notes here: <a href="https://simonwillison.net/2024/Sep/25/llama-32/" rel="nofollow">https://simonwillison.net/2024/Sep/25/llama-32/</a><p>I've been trying out the larger image models using the versions hosted on <a href="https://lmarena.ai/" rel="nofollow">https://lmarena.ai/</a> - navigate to "Direct Chat" and you can select them from the dropdown and upload images to run prompts.
I'm blown away by just how open the Llama team at Meta is. It is nice to see that they are not only giving access to the models but are also open about how they built them. I don't know how the future is going to go in terms of models, but I sure am grateful that Meta has taken this position and is pushing for more openness.
"The Llama jumped over the ______!" (Fence? River? Wall? Synagogue?)<p>With 1-hot encoding, the answer is "wall", with 100% probability. Oh, you gave plausibility to "fence" too? WRONG! ENJOY MORE PENALTY, SCRUB!<p>I believe this unforgiving dynamic is why model distillation works well. The original teacher model had to learn via the "hot or cold" game on <i>text</i> answers. But when the child instead imitates the teacher's predictions, it learns <i>semantically rich</i> answers. That strikes me as vastly more compute-efficient. So to me, it makes sense why these Llama 3.2 edge models punch so far above their weight(s). But it still blows my mind thinking how far models have advanced from a year or two ago. Kudos to Meta for these releases.
Llama3.2 3B feels a lot better than other models of the same size (e.g. the Gemma2 and Phi3.5-mini models).<p>For anyone looking for a simple way to test Llama3.2 3B locally with a UI, install nexa-sdk (<a href="https://github.com/NexaAI/nexa-sdk">https://github.com/NexaAI/nexa-sdk</a>) and type in a terminal:<p>nexa run llama3.2 --streamlit<p>Disclaimer: I am from Nexa AI, and nexa-sdk is open source. We'd love your feedback.
If anyone else is looking for the bigger models on ollama and wondering where they are, the Ollama blog post answered that for me. They are "coming soon", so they just aren't quite ready yet[1]. I was a little worried when I couldn't find them, but it sounds like we just need to be patient.<p>[1]: <a href="https://ollama.com/blog/llama3.2">https://ollama.com/blog/llama3.2</a>
I've just tested the 1B and 3B at Q8, some interesting bits:<p>- The 1B is extremely coherent (feels something like maybe Mistral 7B at 4 bits), and with flash attention and a 4-bit KV cache it only uses about 4.2 GB of VRAM for 128k context<p>- A Pi 5 runs the 1B at 8.4 tok/s; haven't tested the 3B yet, but it might need a lower quant to fit, and with 9T training tokens it'll probably degrade pretty badly<p>- The 3B is a certified Gemma-2-2B killer<p>Given that llama.cpp doesn't support any multimodality (they removed the old implementation), it might be a while before the 11B and 90B become runnable. Doesn't seem like they outperform Qwen-2-VL at vision benchmarks though.
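If anyone wants to reproduce that flash-attention + 4-bit-KV-cache setup through llama-cpp-python rather than the llama.cpp CLI, it's roughly the sketch below. The model path is a placeholder, and the flash_attn/type_k/type_v parameter and constant names are as I remember them, so double-check against the version you have installed.

    from llama_cpp import Llama, GGML_TYPE_Q4_0

    llm = Llama(
        model_path="Llama-3.2-1B-Instruct-Q8_0.gguf",  # placeholder path
        n_ctx=131072,           # the full 128k context
        n_gpu_layers=-1,        # offload everything if the VRAM allows it
        flash_attn=True,        # needed for a quantized KV cache
        type_k=GGML_TYPE_Q4_0,  # 4-bit K cache
        type_v=GGML_TYPE_Q4_0,  # 4-bit V cache
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize flash attention in one paragraph."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])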
Tried out 3B on ollama, asking questions in optics, bio, and rust.<p>It's super fast with a lot of knowledge, a large context and great understanding. Really impressive model.
llama3.2:3b-instruct-q8_0 is performing better than 3.1 8b-q4 on my MacBook Pro M1. It's faster and the results are better. It answered a few riddles and thought experiments better despite being 3B vs 8B.<p>I just removed my install of 3.1-8b.<p>My ollama list is currently:

    $ ollama list
    NAME                              ID              SIZE     MODIFIED
    llama3.2:3b-instruct-q8_0         e410b836fe61    3.4 GB   2 hours ago
    gemma2:9b-instruct-q4_1           5bfc4cf059e2    6.0 GB   3 days ago
    phi3.5:3.8b-mini-instruct-q8_0    8b50e8e1e216    4.1 GB   3 days ago
    mxbai-embed-large:latest          468836162de7    669 MB   3 months ago
Tried the 1B model with the "think step by step" prompt.<p>It gets "which is larger: 9.11 or 9.9?" right if it manages to mention that decimals need to be compared first in its step-by-step thinking. If it skips mentioning decimals, then it says 9.11 is larger.<p>It gets the strawberry question wrong even after enumerating all the letters correctly, probably because it can't properly count.
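For anyone who wants to rerun those two tests against a local Ollama, the harness is tiny. A minimal sketch, assuming Ollama is running on the default port with the 1B pulled as llama3.2:1b; the prompt wording is mine:

    import json
    import urllib.request

    def ask(prompt: str) -> str:
        # Non-streaming call to a local Ollama server.
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps({
                "model": "llama3.2:1b",
                "prompt": prompt,
                "stream": False,
            }).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    print(ask("Which is larger: 9.11 or 9.9? Think step by step."))
    print(ask("How many times does the letter r appear in 'strawberry'? Think step by step."))

Because sampling is non-deterministic, it's worth running each prompt a few times to see how often the decimal-comparison step actually shows up in the reasoning.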
Still no 14/30B parameter models since Llama 2. Seriously killing real usability for power users/DIY.<p>The 7/8B models are great for a PoC and moving to the edge for minor use cases … but there's a big, empty gap up to 70B, which most people can't run.<p>The tin foil hat in me says this is the compromise the powers that be have agreed to. Basically being "open" but practically gimped for the average joe techie. Basically arms control.
Is there an up-to-date leaderboard with multiple LLM benchmarks?<p>Livebench and Lmsys are weeks behind and sometimes refuse to add some major models. And press releases like this cherry pick their benchmarks and ignore better models like qwen2.5.<p>If it doesn't exist I'm willing to create it
Llama 3.2 includes a 1B parameter model. That should give roughly 8x higher throughput than the 8B for data pipelines. In our experience, smaller models are just fine for simple tasks like reading paragraphs from PDF documents.
Interesting that its scores are somewhat below Pixtral 12B's: <a href="https://mistral.ai/news/pixtral-12b/" rel="nofollow">https://mistral.ai/news/pixtral-12b/</a>
3B was pretty good multilingually (Norwegian), still a lot of gibberish at times, and way more sensitive than 8B, but more usable than Gemma 2 2B for multilingual use, and fine on my standard "python list sorter with args" question. But 90B vision just refuses all my actually useful tasks, like helping recreate the images in HTML or doing anything useful with the image data other than describing it. I have not gotten this stuck with 70B or OpenAI before. An insane amount of refusals all the time.
This is great! Does anyone know if the Llama models are trained to do function calling like the OpenAI models are? And/or are there any function calling training datasets?
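Meta's docs describe tool calling support for the 3.1 instruct models (and reportedly the 3.2 1B/3B text models as well), and most local inference servers expose an OpenAI-compatible endpoint, so you can sketch it like this. The base_url, model tag, and the get_weather function are all placeholders, and whether a given server forwards tools to the model depends on the server version:

    from openai import OpenAI

    # Any OpenAI-compatible local server works here (Ollama, llama.cpp server, vLLM, ...).
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="llama3.2",
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)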
> These models are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors.<p>Do they require a GPU, or can they be deployed on a VPS with a dedicated CPU?
The assessments of visual capability really need to be more robust. They are still using datasets like VQAv2, which, while providing some insight, have many issues. There are many newer datasets that serve as much more robust tests and that are less prone to being affected by linguistic bias.<p>I'd like to see more head-to-head comparisons with community-created multi-modal LLMs as done in these papers:<p><a href="https://arxiv.org/abs/2408.05334" rel="nofollow">https://arxiv.org/abs/2408.05334</a><p><a href="https://arxiv.org/abs/2408.03326" rel="nofollow">https://arxiv.org/abs/2408.03326</a><p>I look forward to reading the technical report, once it's available. I couldn't find a link to one, yet.
I'm currently fighting with a FastAPI Python app deployed to Render. It's interesting because I'm struggling to see how to encode the image and send it using curl. Their example sends directly from the browser and uses a data URI.<p>But this is relevant because I'm curious how this new model allows image inputs. Do you paste a base64 image into the prompt?<p>It feels like these models can start not only providing the text generation backend, but also replacing the infrastructure for the API as well.<p>Can you input images without something in front of it like Open WebUI?
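The usual pattern is not pasting base64 into the prompt text: the image goes in a separate field. Servers that follow the Ollama API take a base64 string in an "images" array, while OpenAI-style endpoints take a data URI inside an image_url content part. A rough sketch of the Ollama-style flavor; the model tag is a placeholder, since the 11B vision model isn't actually available on Ollama yet:

    import base64
    import json
    import urllib.request

    # Read and base64-encode the image file.
    with open("screenshot.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    # Ollama-style API: the image goes in an "images" field, not into the prompt text.
    payload = {
        "model": "llama3.2-vision",  # placeholder tag
        "prompt": "Describe this image.",
        "images": [image_b64],
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

The same base64 string is what you'd put inside a "data:image/png;base64,..." URI for the browser-style example you mentioned.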
Can it run with llama-cpp-python?
If so, where can we find and download the GGUF files? Are they distributed directly by Meta, or are they converted to GGUF format by third parties?
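As far as I know, Meta only publishes the original weights; the GGUFs on Hugging Face and in the Ollama library are third-party conversions. llama-cpp-python can pull one straight from the Hub, roughly like this. The repo id and filename are examples of a community conversion rather than anything official, so check what's actually published:

    from llama_cpp import Llama

    # Downloads a community-converted GGUF from Hugging Face and loads it.
    # Requires the huggingface-hub package to be installed.
    llm = Llama.from_pretrained(
        repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",   # example third-party repo
        filename="Llama-3.2-3B-Instruct-Q8_0.gguf",       # example quant filename
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Say hello in Norwegian."}],
    )
    print(out["choices"][0]["message"]["content"])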
Does anyone know how these models fare in terms of multilingual real-world usage? I’ve used previous iterations of llama models and they all seemed to be lacking in that regard.
When using meta.ai, it's able to generate images as well as understand them. Has this also been open-sourced, or is it just a GPT-4o-style ability to see images?
I have to say that running this model locally, I was pleasantly surprised by how well it ran. It doesn't use many resources and produces decent output, comparable to ChatGPT. It's not quite at OpenAI's level, but for a lot of tasks, since it doesn't burden the computer, a local model can be used instead.<p>Next I want to try using Aider with it and see how that works.
The 90B seems to perform pretty weakly on visual tasks compared to Qwen2-VL-72B: <a href="https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct" rel="nofollow">https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct</a>, or am I missing something?
Llama-3.2-11B-Vision-Instruct does an excellent job extracting/answering questions from screenshots. It is even able to answer questions based on information buried inside a flowchart. How is this even possible??
It's hallucinating so badly, it's kinda hilarious<p>Literally everything about the quote below is wrong. The problem with these "amazing ground-breaking models" is that you never know which knowledge they actually possess.<p>--- start quote ---<p>>>> what is the last version of Ash Framework for Elixir you know
My training data goes up to December 2023, and according to the Ash Framework releases, the last stable version that I'm familiar with is:<p>* *Ash 2.14*: Released on April 8, 2021.<p>However, it's worth noting that Ash has been merged into Elixir itself, and most of its features have been integrated into the Elixir language and standard
libraries.<p>In fact, Ash was one of the first frameworks to be included in Elixir, and it played a significant role in shaping the language. Many of Ash's core concepts,
such as `defstruct/1` and `defmodule/1`, were later adopted by Elixir itself.<p>...<p>Keep in mind that Ash is no longer actively maintained as a separate framework, but its legacy continues to influence Elixir and its community.<p>--- end quote ---
Curious about the multimodal model's architecture. But alas, when I try to request access<p>> Llama 3.2 Multimodal is not available in your region.<p>It sounds like they input the continuous output of an image encoder into a transformer, similar to transfusion[0]? Does someone know where to find more details?<p>Edit:<p><i>> Regarding the licensing terms, Llama 3.2 comes with a very similar license to Llama 3.1, with one key difference in the acceptable use policy: any individual domiciled in, or a company with a principal place of business in, the European Union is not being granted the license rights to use multimodal models included in Llama 3.2.</i> [1]<p>What a bummer.<p>0. <a href="https://www.arxiv.org/abs/2408.11039" rel="nofollow">https://www.arxiv.org/abs/2408.11039</a><p>1. <a href="https://huggingface.co/blog/llama32#llama-32-license-changes-sorry-eu-" rel="nofollow">https://huggingface.co/blog/llama32#llama-32-license-changes...</a>
Off topic/meta, but the Llama 3.2 news topic received many, many HN submissions and upvotes but never made it to the front page: the fact that it's on the front page now indicates that moderators intervened to rescue it: <a href="https://news.ycombinator.com/from?site=meta.com">https://news.ycombinator.com/from?site=meta.com</a> (showdead on)<p>If there's an algorithmic penalty against the news for whatever reason, that may be a flaw in the HN ranking algorithm.
- Llama 3.2 introduces small vision LLMs (11B and 90B parameters) and lightweight text-only models (1B and 3B) for edge/mobile devices, with the smaller models supporting 128K token context.<p>- The 11B and 90B vision models are competitive with leading closed models like Claude 3 Haiku on image understanding tasks, while being open and customizable.<p>- Llama 3.2 comes with official Llama Stack distributions to simplify deployment across environments (cloud, on-prem, edge), including support for RAG and safety features.<p>- The lightweight 1B and 3B models are optimized for on-device use cases like summarization and instruction following.
Zuckerberg has never liked having Android/iOS as gatekeepers, i.e. "platforms" for his apps.<p>He's hoping to control AI as the next platform through which users interact with apps. Free AI is then fine if the surplus value created by not having a gatekeeper to his apps exceeds the cost of the free AI.<p>That's the strategy. No values here - just strategy, folks.
I still can't access the hosted model at meta.ai from Puerto Rico, despite us being U.S. citizens. I don't know what Meta has against us.<p>Could someone try giving the 90b model this word search problem [0] and tell me how it performs? So far with every model I've tried, none has ever managed to find a single word correctly.<p>[0] <a href="https://imgur.com/i9Ps1v6" rel="nofollow">https://imgur.com/i9Ps1v6</a>
In Kung Fu Panda there is a line where the Panda says "I love kung fuuuuuu". Well, I normally don't talk like this, but when I saw this (and started to use it), I felt like yelling "I love Metaaaaa" - or is it LLAMMMAAA, or is it open source, or is it this cool ecosystem that gives such value for free...
Newbie question: what size model would be needed to have the skills of a 10x software engineer and no knowledge of the human kind (i.e., no need to know how to make a pizza or sequence your DNA)? Is there such a model?