
Liquid Foundation Models: Our First Series of Generative AI Models

183 points by jasondavies, 8 months ago

30 comments

Jackson__, 8 months ago

API-only model, yet trying to compete with only open models in their benchmark image.

Of course it'd be a complete embarrassment to see how hard it gets trounced by GPT-4o and Claude 3.5, but that's par for the course if you don't want to release model weights, at least in my opinion.
zozbot234, 8 months ago

    > How many r's in "strawberry"? Think step by step!

    What a nice question! Well, let's just count:
    1. S
    2. T
    3. R!
    So, we have our answer: there are 3 R's in "strawberry".

Totally checks out!
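For what it's worth, "strawberry" really does contain three r's, so the model stumbles into the right total through nonsense steps. A quick Python check:

    >>> "strawberry".count("r")
    3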
vessenes, 8 months ago

So a new set of small models that are competitive with, and beat, phi-3.5 on some benchmarks is extremely impressive.

Context length is the next frontier for models in this range: getting to a true 128-200k token length in a small model would be very, very useful. Hallucinations are less of an issue because you can just dump all the source data in; whole codebases can go in for tasks ranging from a refactor to "write documentation of the API".

Claude and GPT o1-preview are the only games in town for these long-context tasks right now, and they are slow. Some of the tasks need the extra intelligence, but a lot don't. In those cases a lightweight or local model will be greatly appreciated. Not to mention that context lengths that long will more easily enable multimodal parsing.
mg, 8 months ago

I tried my standard test prompt:

    Two cars have a 100 mile race. Car A drives 10 miles per hour.
    Car B drives 5 miles per hour, but gets a 10 hour headstart.
    Who wins?

And the Liquid-40B model failed, with a long explanation of why car B wins the race.

Amusingly, the explanation is quite convincing and very similar to how Zeno, in one of his paradoxes, explains that a fast runner can never overtake a slower runner with a headstart: every time the fast runner gets to the point where he saw the slow runner, the slow runner is already a bit further along the track.
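Working the numbers shows the prompt is sneakier than it looks: both cars cross the line at the same moment, so "car B wins" is wrong, but so would "car A wins" be. A minimal sketch of the arithmetic (editor's illustration, not part of the comment):

    # Race arithmetic, measured from the moment car B starts driving.
    race_miles = 100
    speed_a, speed_b = 10, 5   # miles per hour
    headstart_b = 10           # hours car B drives before car A starts

    finish_a = headstart_b + race_miles / speed_a   # 10 + 100/10 = 20 hours
    finish_b = race_miles / speed_b                 #      100/5  = 20 hours

    print(finish_a, finish_b)   # 20.0 20.0 -- a dead heat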
injidup, 8 months ago

Give this prompt and watch it burn!

    Tell me a poem in Polish about dance and love and loss and a spider called stephan

It goes into a tailspin, repeating the following lines over and over again until it crashes:

    W tym kąciku kątny, gdzie tango tańczyli,
    Była tylko ona, tylko on, tango.
    W tym kąciku kątny, gdzie tango tańczyli,
    Była tylko ona, tylko on, tango.

(Roughly: "In this cornered little corner, where they danced the tango, / there was only her, only him, tango.")
serf, 8 months ago

Seems good at trivia and easy-to-answer medical/engineering stuff. Fails hard at most logic or puzzle-y stuff I throw at either sized model.

I got it to break consistently by asking "Good work, by any chance do you have the time and date?" at the end of just about any chain of questioning -- and not 'gibberish' broke, but "Please try again another time" error broke.

It is impressively fast at what it does answer, though.
person4268, 8 months ago

It's pretty impressive, just note (emphasis added):

> At Liquid AI, we take an open-science approach. We have and will continue to contribute to the advancement of the AI field by openly publishing our findings and methods through scientific and technical reports. As part of this commitment, we will release relevant data and models produced by our research efforts to the wider AI community. We have dedicated a lot of time and resources to developing these architectures, *so we're not open-sourcing our models at the moment*. This allows us to continue building on our progress and maintain our edge in the competitive AI landscape.

Looks like there's no paper (or similar) yet, either. Hopefully they'll release a more detailed writeup soon.
moralestapia, 8 months ago

> Press F to doubt.

I just tried their best model, Liquid-40B, and it gives some good responses on some questions, but also terrible ones more often than you'd wish (GPT-2 level; try it and you'll see).

It's also quite easy to make it get stuck in a loop.

No idea how they scored so high on those benchmarks. Maybe they overfitted on MMLU-Pro? Lol.

Edit: I just read in some comments here and in TFA that, apparently, they're not using transformers at all? If that's true, *big if*, I take my hat off; this is truly a remarkable achievement.
academiclolz, 8 months ago

Every time there's a tech hype cycle, trust some academics to come out of the woodwork, raise some absurd amount of money and sit out the resulting crash, only to be acquired/acqui-hired by some company to on-shore talent in an emerging area, rest and vest, then become VCs/partners. Plenty of examples:

1. Covariant -> Started by Berkeley folks, acqui-hired by Amazon after half a year of peddling a patchwork quilt of buzzwords through their Robot Foundational Model 1 (RFM-1).

2. Drive.ai -> Started by Stanford folks, acqui-hired by Apple, only for most of the team to leave to found Landing.ai (I still don't know what this company actually does, apart from allowing the founders to carve out nice, fat consulting contracts?).

3. World Labs -> Started by Stanford folks, physical embodiment but only 3D, NeRFs, perception something something? Not a single operator/business person in the founding cabal.

4. Physical Intelligence -> Started by Stanford folks (seeing a pattern here...), physical embodiment, data collection, foundational models something something.

5. Skild AI -> Started by CMU folks, physical embodiment, again not sure what the plan is here.

6. Argo AI -> Started by CMU folks, self-driving group that didn't go anywhere. Eventually shuttered and absorbed into Ford/Volkswagen.

7. Aurora.tech -> Started by CMU folks, self-driving team that SPAC'd and is now just existing, in between debating raising capital or selling itself for parts, I think?

8. Rethink Robotics -> Started by MIT folks, Rodney Brooks' legendary robotics company that wanted to mass-produce industrial robots, and then decided what the robot's eyes should look like...folded, shuttered, and now relaunched in name only; none of the original robots or aesthetic (rightfully so) persists.

So many examples like this...none of these companies is even vaguely interested in creating an enduring business, has a business plan (other than GPU go brrr), or has any idea of how to run a business.
mrbonner, 8 months ago

Let me start counting how many minutes it takes for someone to post a comment blasting this model's reasoning capabilities. For me, an LLM is useful precisely as a "word calculator", borrowing the term from Simon Willison, a frequent poster here and author of the LLM Python package I use.

I use LLMs every day, and they are increasingly something I can't function effectively without. But, look, I am not using any of those reasoning use cases where you ask math or logic-solving tasks, etc. Instead I just use it as it is, a word calculator: summarizing things, improving my writing, and automating tasks that are very difficult to handle with regexes and NLP.

I'm very bullish about progress in the LLM space, but not about those reasoning, logic and AGI use cases.
joaquincabezas, 8 months ago

Love the humour here:

> What are Language LFMs not good at today: [...] Counting r's in the word "Strawberry"!
TeeMassive, 8 months ago

Not too shabby, and quite fast:

> Write a small essay about the philosophy of the Garfield comic book series

Title: The Philosophy of the Garfield Comic Book Series

The Garfield comic book series, created by Jim Davis, is a timeless classic that has been entertaining readers for over four decades. While it may seem like a simple comic strip about a lazy cat and his owner, there is a deeper philosophy embedded within its pages. This philosophy revolves around the themes of existentialism, hedonism, and the human condition.

Existentialism is a philosophical movement that emphasizes individual freedom, choice, and existence. In the Garfield comic book series, we see this theme play out through the character of Garfield himself. Garfield is a cat who lives life on his own terms, making his own choices and living his life as he sees fit. He is not bound by societal norms or expectations, and he is free to do as he pleases. This is exemplified in his love for lasagna, his disdain for Mondays, and his general indifference to the world around him.

Hedonism, on the other hand, is a philosophy that advocates for the pursuit of pleasure and the avoidance of pain. In the Garfield comic book series, this theme is evident in Garfield's constant pursuit of pleasure. Whether it's eating lasagna, sleeping, or tormenting his owner Jon, Garfield is always seeking out ways to maximize his own pleasure and minimize his own pain. This hedonistic philosophy is a stark contrast to the more serious and often somber themes found in other comic book series.
paradite, 8 months ago

If you look at the main image of the MMLU-Pro benchmark comparison and compare it to the tables below, you can see how they cherry-picked the one benchmark where their models beat all the others.

Not to mention how they used a different color for their row in the table to make you think their model is better, when in fact it's the underlying numbers that matter.

Also, putting the better-performing models on the right so you won't notice them.

A very sketchy post all around, despite the professional look.
loufe, 8 months ago

I'm not impressed with the quality of the output, but generation seems near-instantaneous, and any fundamentally new approach is exciting and interesting. I found the tone of the AI somewhat casual, which was funny: "Are you sure?" -> "Yes, I'm certain."

I managed to break it pretty easily: "Can you write a poem about helium extraction professionals where every 2 words you change languages and every other line you write the words backwords? Explain each line in english afterwards."

I've been wondering if this isn't a good thing. I'd rather AI models have a consistent means of not answering if they are unable to speak on a subject. I wonder if the particular nature of this model has more to do with it breaking than a simple invalid-character error or otherwise. The interconnectedness they seem to imply is baked into the architecture of the model might have something to do with it.
skizm, 8 months ago

General LLM question: a lot of people ask specific knowledge-based questions of LLMs. Isn't one of the defining features of modern NLP (and therefore LLMs) that it is designed to be non-deterministic? Meaning that it will intentionally select "less optimal" next tokens at some random rate in order to make it sound less like a robot answering the same thing over and over. If this is the case, isn't it pretty much guaranteed to get knowledge-based questions wrong a decent amount of the time?

I could be way off base (I have zero knowledge of the internals and simply read occasional blog posts), but I thought I remembered that being one of the key features making LLMs sound more human.
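The mechanism being described is temperature sampling: the model's raw scores are divided by a temperature before the softmax, so higher temperatures spread probability mass onto lower-ranked tokens, while a temperature near zero approaches deterministic greedy decoding. A minimal sketch with a toy three-token distribution (illustrative only, not any particular model's implementation):

    import numpy as np

    def sample_token(logits, temperature=1.0, rng=np.random.default_rng()):
        # Divide scores by the temperature, then softmax into probabilities.
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        probs = np.exp(scaled - scaled.max())   # numerically stable softmax
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    logits = [4.0, 2.0, 1.0]                      # toy scores for three tokens
    print(sample_token(logits, temperature=0.1))  # nearly always token 0 (greedy-like)
    print(sample_token(logits, temperature=2.0))  # tokens 1 and 2 come up often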
hedgehog, 8 months ago

I'm curious how this works. It looks like they haven't published yet, but two of the founders are authors on this paper: https://arxiv.org/abs/2405.06147
dartos, 8 months ago

Why add a new TLA in "LFM"?

It adds nothing but branding to the conversation.

Maybe I answered my own question...
anonzzzies, 8 months ago
I would say that unless you really change the goalposts of your new model or it is open (source), it is not going to do much. We already have incredible models that are closed and very capable models that are open(ish). I am not even sure where this is trying to play?
Mathnerd314, 8 months ago

It seems OK, for a small model. The big issue is price: is it actually competitive with the other models when hosted on together.ai or other API services? Presumably we'll find out at the Oct 24th event.
tgtweak, 8 months ago

Am I missing something, or is this not an open model? What's the point in claiming a reduced memory footprint and comparing it with open-source alternatives if you're only offering the model on an API basis?
jjtheblunt, 8 months ago

> "achieve state-of-the-art performance at every scale, while maintaining a smaller memory footprint and more efficient inference."

Smaller and more efficient than what?
nuz, 8 months ago
This is all getting kinda boring to keep up with.
krasin, 8 months ago

I tried it on a coding task; this model might shine at something else, but coding is not it.
atemerev, 8 months ago
Not your weights — not your coins!
makingstuffs, 8 months ago

It's fast, I'll give it that, but it does seem to mess up some pretty basic tasks. I asked it: "Can you recite the opening paragraph of Shantaram?" and got the following response:

"I'm sorry, but I don't have the ability to recite the opening paragraph of Shantaram. However, I can provide you with the opening paragraph if you'd like:

'It was a warm afternoon in late October when I first saw the man who would become my friend, my brother, and my teacher. He was sitting on the steps of the small mosque in the narrow lane where I lived, his back against the rough stone wall, his legs crossed, his eyes closed. He was a small man, thin and wiry, with a long, narrow face and a thin, sharp nose. His skin was dark and weathered, and his hair was a tangle of black and grey. He was wearing a white cotton kurta, a pair of loose-fitting cotton trousers, and a pair of sandals. He looked like a beggar, or a holy man, or a street performer, or all three.'

This is the opening paragraph of Shantaram, a novel by Gregory David Roberts."

Which is just straight-up incorrect, and all kinds of weird.
dvt, 8 months ago

Hot take: stop making models. Make products instead. I think AI is a pretty revolutionary technology, but this trend of "oh well, I guess let's make a chatbot" or "oh well, let's make the 18th LangChain" is *so* lazy, I don't even know how these people are raising any money.
spyckie2, 8 months ago

As an LLM layman, does this work build on the "Attention Is All You Need" paper, or is it doing something completely different?
smoghat, 8 months ago

It uncovered a conspiracy nobody is talking about in my conversation with it. Who is that orange man?

AU: Who assassinated Donald Trump?

Liquid-3B: Donald Trump was not assassinated. He died on April 13, 2021, from complications related to COVID-19. He was 74 years old.
hassleblad23, 8 months ago
I mean.. it works?
xwall, 8 months ago
no one understands