Slightly off-topic, but as the parent of a toddler, I got a bit of a chuckle out of the name. It's based on the children's book series "Llama Llama Red Pajama".
There was a lot of detail and data in here, but it's not very useful to me because all of the comparisons are to things I have no experience with.<p>There's really only one thing I care about: How does this compare to GPT-4?<p>I have no use for models that aren't at that level. Even though this almost definitely isn't at that level, it's hard to know how close or far it is from the data presented.
This is beyond exciting. Welcome to the new reality!<p>On one hand, the resources required to run these models continue falling dramatically, thanks to the techniques discovered by researchers: GPTQ quantizing down to 4, 3, 2, even 1 bit! model pruning! hybrid VRAM offloading! better, more efficient architectures! 1-click finetuning on consumer hardware! Of course, the free lunches won't last forever, and this will level off, but it's still incredible.<p>And on the other side of the coin, the power of <i>all</i> computing devices continues its ever-upward exponential growth.<p>So you have a continuous <i>lowering</i> of requirements, combined with a continuous <i>increase</i> in available power... surely these two trends will collide, and I can only imagine what this stuff will be like at that intersection.
I have been really impressed with the uncensored WizardLM I was playing with. Having a truly open, uncensored model to work with is a really important research tool. Censoring the training data and results in such a heavy-handed way is not really possible without lowering the quality of all output.<p>As the resources required to train and fine-tune these models become consumer-hardware friendly, I think we'll see a shift towards a bunch of smaller models. Open models like these also mean the results of security and capability research are publicly available. Models like this one and the Replit code model will become the new base all open source models are built on. I am really looking forward to the gptj 4-bit, CUDA-optimized 7B models; the others I have tested run fast on a 2070 Max-Q with 16GB RAM, where I was getting ~7 tokens/second. LoRA can work directly with 4-bit quantized models. While ggml CPU models are very strong, I don't believe we'll move away from GPU-accelerated training and fine-tuning anytime soon.
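In case it's useful to anyone else, here's a minimal sketch of what LoRA on top of a 4-bit quantized checkpoint can look like with transformers + bitsandbytes + peft. The repo name and target_modules are assumptions on my part, and the exact flags depend on your library versions, so treat it as a starting point rather than a recipe.

    # Hedged sketch: LoRA adapter on a 4-bit quantized causal LM (names are assumptions).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1"  # assumed repo name

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # quantize weights to 4-bit at load time
        bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",                     # spread layers across GPU/CPU as needed
    )

    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["query_key_value"],    # assumed module name for GPT-NeoX-style blocks
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()         # only the small LoRA matrices get gradients

The quantized base weights stay frozen and only the low-rank adapter matrices are trained, which is what makes this feasible on a single consumer GPU.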
That's very interesting for performing basic tasks at reasonable speeds or for running on smaller systems. Unfortunately it's one of the many based on Python and transformers, so all the resources gained from the compact model are wasted on the heavy engine and ecosystem, and even a 4GB machine with 4GB of swap goes OOM because the loaded data gets duplicated in memory via read() and malloc() :-(<p>Let's wait for someone to port it to a cheaper and more powerful C-based engine like llama.cpp.
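To make the duplication complaint concrete, here's a rough Python illustration of read()-into-a-buffer versus mmap; the file path is just a placeholder and the exact behaviour depends on your OS.

    import mmap

    path = "model-weights.bin"  # placeholder, not a real checkpoint

    # read(): the file contents end up both in the kernel page cache and in a
    # private buffer owned by the process, so you briefly pay for two copies.
    with open(path, "rb") as f:
        blob = f.read()
    print(f"private copy: {len(blob)} bytes held in the process heap")

    # mmap: pages stay backed by the file and are shared with the page cache;
    # only pages you actually touch become resident, and they can be evicted.
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:64]  # touching only the first page faults in only that page
    print(f"mapped view: {len(mm)} bytes addressable without a private copy")

This is roughly the trick llama.cpp uses for its weights, which is why it can start up on machines where a read()/malloc() loader would hit OOM.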
idea: linked parameters / models tree<p>build a model that can change the number of parameters in the vicinity of some meaning, effectively increasing the local resolution around that meaning<p>so parameter space becomes linked-parameter space, between models<p>links could be pruned based on activation frequency<p>another way of seeing the concept is a tree of models/llms<p>and one additional model/llm that all it does is manage the tree (ie. build it as it goes, use it to infer, prune it, etc)<p>Or is it too dumb what I’m saying?
So I tried RedPajama-INCITE-Instruct-7B-v0.1 and the AutoModelForCausalLM.from_pretrained(...) call takes two minutes every time. My GPU is big enough. I don't know why it's so slow. I feel like it's somehow precomputing stuff that can be used across queries, and I had hoped that this stuff would have already been precomputed on the disk and I could just load it up.
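If the bottleneck is rebuilding the full-precision weights on every run, a hedged workaround (exact flags depend on your transformers version) is to load once in fp16 with the low-memory path and re-save locally, so later from_pretrained calls read the converted copy straight from disk:

    # Hypothetical speed-up for repeated loads; the local path is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1"
    local_dir = "./redpajama-7b-instruct-fp16"

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # keep weights in half precision
        low_cpu_mem_usage=True,      # stream weights instead of materializing fp32 first
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    model.save_pretrained(local_dir)      # later runs: from_pretrained(local_dir)
    tokenizer.save_pretrained(local_dir)

    model = model.to("cuda")

No guarantee this gets it down to seconds, but it at least avoids redoing the dtype conversion on every launch.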
I also wonder how powerful the 3B model will be. Could it act as a prompt router, making an API call to ChatGPT or another specified model for the actual processing? It's probably possible to do this with LangChain, but I have not tried it yet.
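For what it's worth, here's a rough sketch of the routing idea without LangChain: let the 3B model classify the request and only escalate to a bigger model over the API. The repo name, routing prompt, and call_remote_model stub are all assumptions, not a tested recipe.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "togethercomputer/RedPajama-INCITE-Instruct-3B-v1"  # assumed 3B instruct repo
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

    def local_generate(prompt: str, max_new_tokens: int = 128) -> str:
        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    def call_remote_model(prompt: str) -> str:
        # Placeholder: call ChatGPT (or any larger model) through its API here.
        raise NotImplementedError

    ROUTER_TEMPLATE = (
        "Classify the following request as SIMPLE (short factual or formatting task) "
        "or COMPLEX (long reasoning, code, or open-ended writing). Answer with one word.\n"
        "Request: {prompt}\nAnswer:"
    )

    def answer(prompt: str) -> str:
        verdict = local_generate(ROUTER_TEMPLATE.format(prompt=prompt), max_new_tokens=4)
        if "COMPLEX" in verdict.upper():
            return call_remote_model(prompt)  # escalate to the big model
        return local_generate(prompt)         # handle locally with the 3B model

Whether a 3B model classifies reliably enough for this is exactly the open question, but the plumbing itself is simple.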
I am really interested in knowing what people are using these smaller models for. I have seen a lot of projects on top of GPT-3.5 / GPT-4, but I have yet to see any using these smaller models.
I've been following the RedPajama project closely and I must say, it's quite an impressive undertaking. The fact that it's all open-source, and the collaboration between various institutions, is nothing short of amazing. This shows the power of the open-source community in action, with a bunch of smart people coming together to build something truly remarkable.<p>The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.<p>As for the 7B model, it's great to see that it's already outperforming the Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kinda improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.<p>One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.