Large Concept Models: Language modeling in a sentence representation space

167 points by batata_frita 5 months ago | 11 comments

nutanc 5 months ago

This maps a little onto research we are doing on what we call the "shape of stories" [1].

We can see clearly, even in 2D space, how different "concepts" are explored. Using the shape of stories for semantic chunking, we can see across multiple articles how text can be chunked by "concept" [2].

Now we are trying to see whether we can use those chunks to train a next-"chunk" predictor instead of a next-word predictor.

In the paper, they take a sentence to be a concept. We believe a "semantic chunk" is better suited to represent a concept than a sentence.

[1] https://gpt3experiments.substack.com/p/the-shape-of-stories-or-how-ai-sees

[2] https://gpt3experiments.substack.com/p/a-new-chunking-approach-to-rag
stravant 5 months ago

This feels like a failure to learn the bitter lesson: you're taking the translation to concepts that the LLM is certainly already doing internally and trying to force it to be explicit.
mdp2021 5 months ago

> Current best practice for large scale language modeling is to operate at the token level, i.e. to learn to predict the next tokens given a sequence of preceding tokens. There is a large body of research on improvements of LLMs, but most works concentrate on incremental changes and do not question the main underlying architecture. In this paper, we have proposed a new architecture,

For some, 2024 may have ended badly, but reading the lines above shines a great light of hope for the new year.
steenreem 4 months ago

I skimmed the paper, but I couldn't figure out what they're doing to make concepts fundamentally different from tokens.

I would think that the purpose of concepts is to capture information at a higher density than tokens, so you can remember a longer conversation or better produce long-form output.

Given that, I would have expected that during the training phase, the concept model is evaluated on how few concepts it emits before it emits a stop.
vimgrinder 5 months ago

I like the idea of a "concept": you can represent a concept with language, visuals, etc., but it isn't any of those. Those are symbols used to communicate a concept or give it representation; at the core, concepts are just connections between other concepts. The closest thing I can think of is categories in category theory.
rxm 5 months ago

What used to be feature engineering a decade or more ago now seems to have shifted to developing distributed representations. LLMs use word tokens (for words, or for the entities in images), but there are many more examples: the 3D fields (or whatever they have evolved into) developed by Fei-Fei Li's group represent visual information in a way better suited to geometric tasks; so do Wav2Vec, the convolutional features for YOLO and friends, and now these sentence representations. I would love to read a review of this circle of ideas.
inshard 5 months ago

This is interesting. I wonder if such a project could dive into lower-level concepts, akin to prime numbers: the atoms from which all other concepts are built.
lern_too_spel 5 months ago

This is like going back to CNNs. Attention is all you need.
benreesman 5 months ago

Between this and learned patches and ModernBERT and DeepSeek? I think it's time to read up.
upghost 5 months ago

Aside from using the word "concept" instead of "language", I don't see how this is different from an LLM. It's still doing next-token prediction. This is like D&D, where you have two swords with wildly different flavor text but ultimately they both do 1d6+1 damage.

What am I missing, aside from the marketing? Is there something architecturally different? It looks like a regular autoregressive sequence transformer to me.
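If I read the paper right, one concrete difference is the training objective: the base variant does not compute a softmax over a token vocabulary with cross-entropy, it regresses the next sentence embedding directly with an MSE loss (diffusion-based variants build on that). A toy sketch of that objective, using a linear map as a deliberately simplistic stand-in for the transformer predictor, and random vectors as stand-ins for real sentence embeddings:

```python
import random

random.seed(0)
DIM, SEQ_LEN = 8, 5   # toy embedding size, sequence length in sentences

# Stand-in sentence embeddings (a real LCM would use a sentence
# encoder such as SONAR to produce these).
seq = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(SEQ_LEN)]

# Toy linear "concept predictor"; the paper uses a transformer here.
W = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)]

def predict(x):
    """Predict the next sentence embedding from the current one."""
    return [sum(x[i] * W[i][j] for i in range(DIM)) for j in range(DIM)]

# Token LM loss: cross-entropy over a discrete vocabulary.
# Base-LCM loss: mean squared error in continuous embedding space.
mse = sum(
    (p - t) ** 2
    for cur, nxt in zip(seq, seq[1:])
    for p, t in zip(predict(cur), nxt)
) / ((SEQ_LEN - 1) * DIM)
print(round(mse, 4))
```

So the autoregressive loop looks the same, but the prediction target is a continuous vector rather than a distribution over tokens, which is arguably more than flavor text.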
YeGoblynQueenne 5 months ago

From the paper:

>> In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a "concept".

I wonder if the many authors of the paper know that what they call a "concept" is what all of machine learning and AI has also called a "concept" for many decades, and not a new thing they have just named from scratch.

For instance, classes of "concepts" are the target of learning in Leslie Valiant's "A Theory of the Learnable", the paper that introduced Probably Approximately Correct learning (PAC-learning). Quoting from its abstract:

> ABSTRACT: Humans appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense. In this paper we regard learning as the phenomenon of knowledge acquisition in the absence of explicit programming. We give a precise methodology for studying this phenomenon from a computational viewpoint. It consists of choosing an appropriate information gathering mechanism, the learning protocol, and exploring the class of concepts that can be learned using it in a reasonable (polynomial) number of steps. Although inherent algorithmic complexity appears to set serious limits to the range of concepts that can be learned, we show that there are some important nontrivial classes of propositional concepts that can be learned in a realistic sense.

From: https://web.mit.edu/6.435/www/Valiant84.pdf

Or take the introduction to Chapter 2 of Tom Mitchell's "Machine Learning" (the original ML textbook, published 1997):

> This chapter considers concept learning: acquiring the definition of a general category given a sample of positive and negative training examples of the category.

From: https://www.cs.cmu.edu/~tom/mlbook.html (click the link in "the book").

I really wonder sometimes what is going on here. There have been decades of research in AI and machine learning, but recent papers read as if their authors have landed in an undiscovered country and must invent everything from scratch. That's not good. There are pitfalls that all the previous generations explored thoroughly by falling into them time and again. Those who don't remember those lessons will have to find them out the hard way.