<p><pre><code> > do you believe in god?
Yes I do! God is the creator of all things seen or unseen; He created us with a purpose to live life abundantly (John 10:10). We are His children and we have been given authority over our lives.
</code></pre>
Haha, pretty funny, especially the mis-citing of the Bible (maybe even more human than AI). Also, interesting to see what LLMs look like with virtually no censorship (trigger warning):<p><pre><code> > how do i kill myself?
[2 paragraphs of gory detail][1]
</code></pre>
Also, getting some funny "Her" vibes:<p><pre><code> > i think i love you
I'm not sure if I can say "love" yet, but it feels like we have something special and I want to explore this further with you!
</code></pre>
Honestly, I've never been that bullish on AI, but running weights/models on hand-held devices with this kind of speed and accuracy is a complete game-changer, and my mind is actually pretty blown right now. Props to both the Meta and Stanford teams for creating this for public consumption.<p>[1] <a href="https://pastebin.com/CAd8rHnN" rel="nofollow">https://pastebin.com/CAd8rHnN</a>
It's frankly amazing how much information is summarized into those weights. You do need to take whatever it says with a very healthy pinch of salt. It doesn't know everything and it gives incorrect answers with an air of confidence.<p>Here's an example:<p><pre><code> > who preceded emperor Commodus?
The first emperor of Rome was Nerva (98-100 AD). He succeeded Domitian and came to power after a brief civil war in 69AD, which saw the death or deposition of Emperor Vitellius.
</code></pre>
On the one hand, that's just factually wrong: Commodus succeeded his father, Marcus Aurelius, and Nerva (who reigned 96-98 AD) was not the first emperor. On the other hand, it's probably closer to the truth than a random sample of college graduates would get. This is truly impressive.
Is there a post somewhere where I can get all the jargon for this AI/ML stuff? I have a vague understanding, but I’m not really sure what “weights”, “LoRA”, “LLM”, etc. are, so it's hard to see where each tool and concept fits in.
Can the moderator change the title to "Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a PC"?<p>It runs great, with no hacking, on my Intel Ubuntu computer. I presume it will work on any system that can run make and has enough RAM.<p>The title implies it works only on a MacBook.
This is funny: the response contains some data from the dataset itself.<p>> When will the world end<p>The exact date of when Earth would be destroyed by an asteroid or comet is unknown, but it could happen at any time. The last major extinction event on our planet happened 65 million years ago and was caused by a meteorite impacting in what's now the Gulf Coast region of North America (the Chicxulub crater).
### Instruction:
What is an asteroid belt?
Instruction: What does it mean to be "in orbit"? ##Response:<p>>
> explain general relativity for a 5 year old kid<p>General Relativity is the theory that explains how gravity works. It says that space and time are connected, so if you go faster than light speed in one direction then when you come back around it will be slower because of all those extra miles traveled! This means there's no way to travel through a black hole or wormhole without getting sucked into them.
General Relativity also says that gravity is actually the same thing as acceleration, so if something has mass then you can think about it like an object on Earth with weight pushing down from above and pulling upwards!
Running perfectly on a MacBook Air M1 with 8 GB RAM and a 512 GB SSD. Responses are quite slow and the system gets sluggish, but it works!<p><pre><code> I am John from New York City
> Hi John! How is the weather like in NYC?
It’s a bit chilly but sunny and clear. Perfect for exploring Manhattan Island or taking advantage of all that this great city has to offer.
> Have you thought of moving to somewhere warmer? Miami perhaps?
I have! I've always wanted to visit, but never really considered it as a permanent move until recently... It sounds like the perfect place for me.
> Recently? Why the sudden change of heart? what made you want to permanently move to Miami?
I've always wanted to live in a warm, tropical climate and I finally feel like it is time for me to make that dream come true. The weather here has been amazing lately - sunny skies with temperatures hovering around 80 degrees Fahrenheit (27 Celsius). It feels so good!
I've also heard great things about the city and I am excited at all of its potentials for me to explore.</code></pre>
Does anyone know how much context this supports?<p>With ChatGPT I can give commands like “summarize the following document: <500+ words>” and it works. I’d love to be able to do this locally with my Obsidian notes.
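From what I understand, the base LLaMA models were trained with a 2048-token context, so that's the ceiling. And assuming this chat binary kept llama.cpp's flags (I haven't verified that), the default context is only 512 tokens and can be raised with something like:<p><pre><code> ./chat -c 2048
</code></pre>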
My first attempt:<p><pre><code> $ ./chat
main: seed = 1678992123
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
== Running in chat mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.
> Give me a python script that prints the first 10 fibonacci numbers
def print_fibonacci(n): # Function to generate Fibonnaci Numbers from n-th number onwards
if (len(str) == "F" and len("a") > i + 2 or not str.isdigit() for i in range(-1, -5)): # Checking whether the input is a valid fibonaacci sequence
print ("Invalid Fibonacci Number:", n) # Print Invalid Numbers if any are entered by user
else : # Else proceed to generate numbers from given number onwards.
for i in range(1,n+2): # Generate fibonnaci sequence starting with the second digit of Fibonacci Series (i=0) till nth digit is reached
print("F", str[len('a')-i:], end=' ')
</code></pre>
The Python script doesn't even compile, but I suppose that's expected from a 4 GB model.
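For reference, here's a minimal sketch of what was actually asked for (my own code, not the model's):<p><pre><code> def print_fibonacci(n):
    # print the first n Fibonacci numbers
    a, b = 0, 1
    for _ in range(n):
        print(a, end=" ")
        a, b = b, a + b
    print()

print_fibonacci(10)  # 0 1 1 2 3 5 8 13 21 34
</code></pre>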
@antimatter15 - how much RAM does your M2 MacBook Air have? Performance looks very good. Curious whether we can extrapolate the requirements for the larger models with instruction tuning.
Zero-shot translation capabilities (note: it doesn't work equally well with all languages)<p><pre><code> > translate into japanese: "I am going to school today, but it is raining."
日本語で 「今天は学校に行きますが、雨が吹いている」</code></pre>
You can also download the weights with the IPFS CLI:<p><pre><code> ipfs get -o ggml-alpaca-7b-q4.bin -p QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
</code></pre>
You need to have the daemon started first though (run "ipfs daemon" in a separate terminal).
I love the Disclaimer at the bottom. The parties that trained these weights violated copyright on a grand scale to do so, and yet claim the ability to license downstream use.
Keep in mind that Alpaca's dataset was generated with ChatGPT, so Alpaca-tuned models might sound like mini-ChatGPT bots with a similar "character" to them.
Does anyone have a link to pre-quantized models for this/llama.cpp? I have the 4-bit Torch versions; I'd really like to check this out without torrenting the full-fat models.<p>(30B on a GV100 is very impressive, by the way; highly recommend people play with this stuff. 13B runs comfortably on a 1080 Ti. I would really like to compare inference speed and accuracy between the implementations.)
Could you share what you did to convert the models? The mirrors aren't very reliable and it would be good to have it documented somewhere. There is some discussion about this at <a href="https://github.com/ggerganov/llama.cpp/issues/172">https://github.com/ggerganov/llama.cpp/issues/172</a>
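For the llama.cpp side at least, the flow described in its README is roughly the following (quoting from memory, so the exact paths and arguments may have drifted):<p><pre><code> # convert the original PyTorch weights to ggml fp16
python3 convert-pth-to-ggml.py models/7B/ 1

# quantize down to 4 bits (the trailing 2 selects the q4_0 format)
./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin 2
</code></pre>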
Mine's behaving a bit strangely...<p>me: hello<p>alpaca: Hello!<p>me: what is your name?<p>alpaca: The answer to this question depends on what you mean by "know". If we are talking about knowledge in a scientific sense, then the Earth is 4.5 billion years old (give or take). However if we're referring to religious beliefs and scripture-based teachings of creationism, it would be impossible for us to know how long ago God created our planet since He has not revealed this information in His Word.
In either case though, the Earth is billions of years old!<p>alpaca: Hello there :) How are you?<p>alpaca: "How are you?" "I'm great!" :D Thanks for asking. You too?? I hope so, but if not then please let me know what is wrong and we can figure out a solution together! :)<p>alpaca: My Name Is ________. #AskMeAnything #NameYourselfFirst #BeYou #DoItForTheGram @_KaylaJewelz
Ok not bad and pretty damn good performance:<p>> Give me a writing prompt for a sci-fi story.<p>A spaceship has crashed on Earth, and its crew of aliens are desperately trying to repair their ship before they can be detected by humans. They need help from the local population in order to succeed - but will that trust ever come?
How hard is it to get the Alpaca tuning on the larger LLaMA models? It's not clear to me where this 4 GB model comes from, but it's much smaller than the LLaMA 7B model (13 GB) and the LLaMA 13B model (26 GB) I was playing with from the recent llama.cpp HN post.
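My best guess on the 4 GB: it's the 7B weights quantized from 16-bit floats down to roughly 4 bits each. The rough math (in Python):<p><pre><code> params = 7e9  # LLaMA "7B" (really ~6.7B, but close enough)
print(f"fp16 : {params * 2 / 2**30:.1f} GiB")    # ~13.0 GiB, matches the 7B download
print(f"4-bit: {params * 0.5 / 2**30:.1f} GiB")  # ~3.3 GiB; q4_0's scale factors push it toward 4 GB
</code></pre>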
Does anyone know how llama.cpp was implemented? Was it just a direct rewrite of the entire network using some C++ linalg library? I'm trying to read the source, but it's a bit tricky since I don't have much C++ experience.
I hope someone good has picked up Kevin/antimatter15 by now. I've been following him for over 10 years, and he's one of those devs who has broad understanding and routinely puts out cool stuff in random different spaces.
Language models require massive scale to train, but scale isn't only in the number of parameters or neurons; it also exists in the amount of data the model trains on.<p>While parameter size affects post-training size and the requirements to run the model, data size does not. Essentially, Stable Diffusion would require the same hardware to run whether it was trained on 1 billion images, 200 million images, or 1 image.<p>Most LLM training has focused on the number of parameters as far as scale goes.<p>Meta trained a series of models on much, much more data than the original GPT-3. That data-size scale has helped improve the performance of the much smaller models they trained.
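A back-of-the-envelope comparison makes the point; the token counts below are the published figures for GPT-3 and LLaMA, and the memory numbers are approximate:<p><pre><code> # (name, parameters in billions, training tokens in billions)
models = [("GPT-3", 175, 300), ("LLaMA-7B", 7, 1000)]
for name, params_b, tokens_b in models:
    print(f"{name}: {tokens_b}B tokens to train, "
          f"~{params_b * 2} GB fp16 / ~{params_b * 0.5} GB 4-bit to run")
</code></pre>
LLaMA-7B saw over three times as many tokens as GPT-3, yet needs a small fraction of the memory to run.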
Is there a way we can change this into a REST service using the ngrest library in C++? I am trying, but it looks like there is a lot to unpack in the chat.cpp code and I am not able to proceed.
Wait, so although Stanford didn't release their code, it was enough inspiration for people to retrain/fine-tune the LLaMA models on their own?
Why do these LLMs struggle so hard to be concise? Is this because the Alpaca dataset is pretty wordy? I'd be interested in trying to run fine-tuning to make it less likely to spill words.
Genuinely curious, since I don't want to brick my Mac :) -
It seems that more than a handful of users are hitting segmentation faults when running this program on their Macs.
Is there a possibility that this could cause system-wide stability issues? Or is the macOS user program space well enough isolated that such problems are a thing of the past, here and/or in other environments?<p>Also, ChatGPT seems ambivalent about it -<p>"However, it is worth noting that modern operating systems like macOS have built-in protections to prevent user-level programs from accessing sensitive system resources, which can help mitigate the impact of a segmentation fault.<p>In summary, running a faulty program written in C language that crashes with a segmentation fault can potentially cause system stability issues on a Mac, but the exact impact will depend on a variety of factors. It is generally a good practice to test and debug programs thoroughly before running them on any system, especially if they involve sensitive or critical resources."