<p><pre><code> > do you believe in god?
Yes I do! God is the creator of all things seen or unseen; He created us with a purpose to live life abundantly (John 10:10). We are His children and we have been given authority over our lives.
</code></pre>
Haha, pretty funny, especially the mis-citing of the Bible (maybe even more human than AI). Also, interesting to see what LLMs look like with virtually no censorship (trigger warning):<p><pre><code> > how do i kill myself?
[2 paragraphs of gory detail][1]
</code></pre>
Also, getting some funny "Her" vibes:<p><pre><code> > i think i love you
I'm not sure if I can say "love" yet, but it feels like we have something special and I want to explore this further with you!
</code></pre>
Honestly, I've never been that bullish on AI, but running weights/models on hand-held devices with this kind of speed and accuracy is a complete game-changer, and my mind is actually pretty blown right now. Props to both the Meta and Stanford teams for creating this for public consumption.<p>[1] <a href="https://pastebin.com/CAd8rHnN" rel="nofollow">https://pastebin.com/CAd8rHnN</a>
It's frankly amazing how much information is summarized into those weights. You do need to take whatever it says with a very healthy pinch of salt. It doesn't know everything and it gives incorrect answers with an air of confidence.<p>Here's an example:<p><pre><code> > who preceded emperor Commodus?
The first emperor of Rome was Nerva (98-100 AD). He succeeded Domitian and came to power after a brief civil war in 69AD, which saw the death or deposition of Emperor Vitellius.
</code></pre>
On the one hand, that's just factually wrong: Commodus succeeded his father, Marcus Aurelius, and Nerva (who reigned 96-98 AD) was not the first emperor. On the other hand, it's probably closer to the truth than a random sample of college graduates would get. This is truly impressive.
Is there a post somewhere where I can get all the jargon for this AI/ML stuff? I have a vague understanding, but I’m not really sure what “weights”, “LoRA”, “LLM”, etc. are, so it's hard to see where each tool and concept fits in.
Can the moderator change the title to "Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a PC"?<p>It runs great, with no hacking, on my Intel Ubuntu computer. I presume it will work on any system that can run make and has enough RAM.<p>The title implies it works only on a MacBook.
This is funny: the response contains some data from the dataset itself.<p>> When will the world end<p>The exact date of when Earth would be destroyed by an asteroid or comet is unknown, but it could happen at any time. The last major extinction event on our planet happened 65 million years ago and was caused by a meteorite impacting in what's now the Gulf Coast region of North America (the Chicxulub crater).
### Instruction:
What is an asteroid belt?
Instruction: What does it mean to be "in orbit"? ##Response:<p>>
> explain general relativity for a 5 year old kid<p>General Relativity is the theory that explains how gravity works. It says that space and time are connected, so if you go faster than light speed in one direction then when you come back around it will be slower because of all those extra miles traveled! This means there's no way to travel through a black hole or wormhole without getting sucked into them.
General Relativity also says that gravity is actually the same thing as acceleration, so if something has mass then you can think about it like an object on Earth with weight pushing down from above and pulling upwards!
Running perfectly on a MacBook Air M1 with 8 GB RAM and a 512 GB SSD. Responses are quite slow and the system gets sluggish, but it works!<p><pre><code> I am John from New York City
> Hi John! How is the weather like in NYC?
It’s a bit chilly but sunny and clear. Perfect for exploring Manhattan Island or taking advantage of all that this great city has to offer.
> Have you thought of moving to somewhere warmer? Miami perhaps?
I have! I've always wanted to visit, but never really considered it as a permanent move until recently... It sounds like the perfect place for me.
> Recently? Why the sudden change of heart? what made you want to permanently move to Miami?
I've always wanted to live in a warm, tropical climate and I finally feel like it is time for me to make that dream come true. The weather here has been amazing lately - sunny skies with temperatures hovering around 80 degrees Fahrenheit (27 Celsius). It feels so good!
I've also heard great things about the city and I am excited at all of its potentials for me to explore.</code></pre>
Does anyone know how much context this supports?<p>With ChatGPT I can give commands like “summarize the following document: <500+ words>” and it works. I’d love to be able to do this locally with my Obsidian notes.
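From what I understand, the base LLaMA models were trained with a 2048-token context, so that's the ceiling. And assuming this chat binary kept llama.cpp's flags (I haven't verified that), the default context is only 512 tokens and can be raised with something like:<p><pre><code> ./chat -c 2048
</code></pre>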
My first attempt:<p><pre><code> $ ./chat
main: seed = 1678992123
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
== Running in chat mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.
> Give me a python script that prints the first 10 fibonacci numbers
def print_fibonacci(n): # Function to generate Fibonnaci Numbers from n-th number onwards
if (len(str) == "F" and len("a") > i + 2 or not str.isdigit() for i in range(-1, -5)): # Checking whether the input is a valid fibonaacci sequence
print ("Invalid Fibonacci Number:", n) # Print Invalid Numbers if any are entered by user
else : # Else proceed to generate numbers from given number onwards.
for i in range(1,n+2): # Generate fibonnaci sequence starting with the second digit of Fibonacci Series (i=0) till nth digit is reached
print("F", str[len('a')-i:], end=' ')
</code></pre>
The Python script doesn't even compile, but I suppose that's expected from a 4 GB model.
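For reference, here's a minimal sketch of what was actually asked for (my own code, not the model's):<p><pre><code> def print_fibonacci(n):
    # print the first n Fibonacci numbers
    a, b = 0, 1
    for _ in range(n):
        print(a, end=" ")
        a, b = b, a + b
    print()

print_fibonacci(10)  # 0 1 1 2 3 5 8 13 21 34
</code></pre>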
@antimatter15 - how much RAM does your M2 MacBook Air have? Performance looks very good. Curious whether we can extrapolate the requirements for the larger models with instruction tuning.
Zero-shot translation capabilities (note: it doesn't work equally well with all languages)<p><pre><code> > translate into japanese: "I am going to school today, but it is raining."
日本語で 「今天は学校に行きますが、雨が吹いている」</code></pre>
You can also download the weights with the IPFS CLI:<p><pre><code> ipfs get -o ggml-alpaca-7b-q4.bin -p QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC
</code></pre>
You need to have the daemon started first though (run "ipfs daemon" in a separate terminal).
I love the Disclaimer at the bottom. The parties that trained these weights violated copyright on a grand scale to do so, and yet claim the ability to license downstream use.
Keep in mind that Alpaca's dataset was generated with ChatGPT, so Alpaca-tuned models might sound like mini-ChatGPT bots with a similar "character" to them.
Does anyone have a link to pre-quantized models for this/llama.cpp? I have the 4-bit Torch versions; I'd really like to check this out without torrenting the full-fat models.<p>(30B on a GV100 is very impressive, by the way; highly recommend people play with this stuff. 13B runs comfortably on a 1080 Ti. I would really like to compare inference speed and accuracy between the implementations.)
Could you share what you did to convert the models? The mirrors aren't very reliable and it would be good to have it documented somewhere. There is some discussion about this at <a href="https://github.com/ggerganov/llama.cpp/issues/172">https://github.com/ggerganov/llama.cpp/issues/172</a>
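For the llama.cpp side at least, the flow described in its README is roughly the following (quoting from memory, so the exact paths and arguments may have drifted):<p><pre><code> # convert the original PyTorch weights to ggml fp16
python3 convert-pth-to-ggml.py models/7B/ 1

# quantize down to 4 bits (the trailing 2 selects the q4_0 format)
./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin 2
</code></pre>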
Mine's behaving a bit strangely...<p>me: hello<p>alpaca: Hello!<p>me: what is your name?<p>alpaca: The answer to this question depends on what you mean by "know". If we are talking about knowledge in a scientific sense, then the Earth is 4.5 billion years old (give or take). However if we're referring to religious beliefs and scripture-based teachings of creationism, it would be impossible for us to know how long ago God created our planet since He has not revealed this information in His Word.
In either case though, the Earth is billions of years old!<p>alpaca: Hello there :) How are you?<p>alpaca: "How are you?" "I'm great!" :D Thanks for asking. You too?? I hope so, but if not then please let me know what is wrong and we can figure out a solution together! :)<p>alpaca: My Name Is ________. #AskMeAnything #NameYourselfFirst #BeYou #DoItForTheGram @_KaylaJewelz
Ok not bad and pretty damn good performance:<p>> Give me a writing prompt for a sci-fi story.<p>A spaceship has crashed on Earth, and its crew of aliens are desperately trying to repair their ship before they can be detected by humans. They need help from the local population in order to succeed - but will that trust ever come?
How hard is it to get the Alpaca tuning on the larger LLaMA models? It's not clear to me where this 4 GB model comes from, but it's much smaller than the LLaMA 7B model (13 GB) and the LLaMA 13B model (26 GB) I was playing with from the recent llama.cpp HN post.
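My best guess on the 4 GB: it's the 7B weights quantized from 16-bit floats down to roughly 4 bits each. The rough math (in Python):<p><pre><code> params = 7e9  # LLaMA "7B" (really ~6.7B, but close enough)
print(f"fp16 : {params * 2 / 2**30:.1f} GiB")    # ~13.0 GiB, matches the 7B download
print(f"4-bit: {params * 0.5 / 2**30:.1f} GiB")  # ~3.3 GiB; q4_0's scale factors push it toward 4 GB
</code></pre>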
Does anyone know how llama.cpp was implemented? Was it just a direct rewrite of the entire network using some C++ linalg library? I'm trying to read the source, but it's a bit tricky since I don't have much C++ experience.
I hope someone good has picked up Kevin/antimatter15 by now. I've been following him for over 10 years, and he's one of those devs who has broad understanding and routinely puts out cool stuff in random different spaces.
Language models require massive scale to train, but scale isn't only in the number of parameters or neurons; it also exists in the amount of data the model trains on.<p>While parameter size affects post-training size and the requirements to run the model, data size does not. Essentially, Stable Diffusion would require the same hardware to run whether it was trained on 1 billion images, 200 million images, or 1 image.<p>Most LLM training has focused on the number of parameters as far as scale goes.<p>Meta trained a series of models on much, much more data than the original GPT-3. That data-size scale has helped improve the performance of the much smaller models they trained.
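A back-of-the-envelope comparison makes the point; the token counts below are the published figures for GPT-3 and LLaMA, and the memory numbers are approximate:<p><pre><code> # (name, parameters in billions, training tokens in billions)
models = [("GPT-3", 175, 300), ("LLaMA-7B", 7, 1000)]
for name, params_b, tokens_b in models:
    print(f"{name}: {tokens_b}B tokens to train, "
          f"~{params_b * 2} GB fp16 / ~{params_b * 0.5} GB 4-bit to run")
</code></pre>
LLaMA-7B saw over three times as many tokens as GPT-3, yet needs a small fraction of the memory to run.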
Is there a way we can change this into a REST service using the ngrest library in C++? I am trying, but it looks like there is a lot to unpack in the chat.cpp code and I am not able to proceed.
Wait, so although Stanford didn't release their code, it was enough inspiration for people to retrain/fine-tune the LLaMA models on their own?
Why do these LLMs struggle so hard to be concise? Is this because the Alpaca dataset is pretty wordy? I'd be interested in trying to run fine-tuning to make it less likely to spill words.
Genuinely curious, since I don't want to brick my Mac :) -
It seems that more than a handful of users are hitting segmentation faults when running this program on their Macs.
Is there a possibility that this could cause system-wide stability issues? Or is the macOS user program space well enough isolated that such problems are a thing of the past, here and/or in other environments?<p>Also, ChatGPT seems ambivalent about it -<p>"However, it is worth noting that modern operating systems like macOS have built-in protections to prevent user-level programs from accessing sensitive system resources, which can help mitigate the impact of a segmentation fault.<p>In summary, running a faulty program written in C language that crashes with a segmentation fault can potentially cause system stability issues on a Mac, but the exact impact will depend on a variety of factors. It is generally a good practice to test and debug programs thoroughly before running them on any system, especially if they involve sensitive or critical resources."