
Copy is all you need

137 points by mottiden almost 2 years ago

18 comments

xg15 almost 2 years ago
I think the "... is all you need" title here is particularly misleading as the paper does in fact use a BERT model for generating the vectors.

So if the implication was that no language model was needed at all and you can just do nearest neighbour on string similarity and patch results together, that implication was clearly wrong.

I think what the paper does show though is that there are methods that can make language models topic-specific without fine-tuning and that yield competitive results even with older models.
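A minimal sketch of that distinction, with illustrative names (the encoder and corpus here are placeholders, not the paper's actual setup): plain string similarity needs no model at all, while vector-based nearest neighbour still relies on a BERT-style encoder to produce the vectors.

```python
# Sketch: string-similarity lookup vs. embedding-based lookup.
# Model name and corpus are placeholders, not the paper's actual index.
from difflib import SequenceMatcher

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

corpus = [
    "transformers rely on attention",
    "the cat sat on the mat",
    "copy mechanisms reuse existing text",
]
query = "attention in transformer models"

# 1) Pure string similarity: no language model involved.
by_string = max(corpus, key=lambda s: SequenceMatcher(None, query, s).ratio())

# 2) Vector similarity: a BERT-style encoder is still needed for the vectors.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
doc_vecs = encoder.encode(corpus, normalize_embeddings=True)
query_vec = encoder.encode([query], normalize_embeddings=True)[0]
by_vector = corpus[int(np.argmax(doc_vecs @ query_vec))]

print("string match:", by_string)
print("vector match:", by_vector)
```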
VHRanger almost 2 years ago
This resonates with the current AI skeptic view that language models are a supercharged search engine on the pile of text they're trained on.

Also the fact that evaluating language models is difficult, and we tend to end up with models that game the evaluation benchmarks.
MAXPOOL almost 2 years ago
What about LLM reasoning ability?

Faith and Fate: Limits of Transformers on Compositionality
https://arxiv.org/abs/2305.18654

Transformers solve compositional reasoning tasks by reducing multi-step compositional reasoning into linearized subgraph matching without problem-solving skills. They can solve problems when they have reasoning graphs in the memory.
naillo almost 2 years ago
Seems like a common pattern: state-of-the-art models being effectively replaced by an information retrieval layer (top 10 results) fed into a much lighter model that does something with those results plus the original input. Cool result!
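A rough sketch of that pattern, assuming hypothetical `retrieve` and `small_model` callables (neither is from the paper): the retrieval layer supplies the top 10 passages, and the lighter model only has to read them alongside the original input.

```python
# Sketch of a retrieve-then-read pipeline; retriever, model and prompt
# format are illustrative placeholders, not the paper's actual setup.
from typing import Callable, List

def retrieve_then_read(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # e.g. BM25 or a dense index
    small_model: Callable[[str], str],          # a lightweight generator
    k: int = 10,
) -> str:
    passages = retrieve(question, k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using only the passages below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return small_model(prompt)
```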
Animats almost 2 years ago
This approach can probably handle most of the queries search engines and Siri-type chatbots handle. The big GPT-type engines can be reserved for the hard problems. Something along those lines is needed to keep the cost of search down. There's an estimate that using a large language model for search is 10x more expensive than existing search engines. Yet few queries really need that much heavy machinery.
woeirua almost 2 years ago
The big advantage here would be the ability to attribute entire blocks of text back to a specific source, and to cross domains just by building a database of embeddings. The downside is that these networks are probably not as creative, as they're limited to only the data that's available. It might work best to use something like this as an expert system for a GPT-like agent to refer to when needed.
msoad almost 2 years ago
The obvious immediate question is: is it as creative? There is a lot of creativity left behind when you increase the token size (let's be real, it's just that). As an example, creating a new word like "dickstracted"[1] would not ever happen in this model.

[1] https://www.urbandictionary.com/define.php?term=Dickstracted
collinc777 almost 2 years ago
Slight tangent:

I once worked with a programmer who, the vast majority of the time, would only input text into a text editor via copy and paste.

Think anti-vim. His fingers were locked on mouse and ctrl+c/v. It was incredible to watch, and his programming speed was very impressive.
Der_Einzige almost 2 years ago
This has deep connections with my attempt to implement an effective, queryable, word-level, grammatically correct extractive text summarizer (AKA the way most people actually summarize documents) - https://github.com/Hellisotherpeople/CX_DB8

I will try to implement this with the necessary changes to actually make it work properly, where instead of generating a new answer, it simply highlights the most likely text spans.
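A much-simplified sketch of that extractive idea (CX_DB8 itself works at the word level; the sentence-level selection and encoder name here are stand-ins): score candidate spans against the query and return the best ones verbatim instead of generating new text.

```python
# Sketch of query-conditioned extractive selection: return the highest-
# scoring source spans verbatim rather than generating an answer.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

def extract_spans(query: str, sentences: list[str], top_k: int = 3) -> list[str]:
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder
    sent_vecs = encoder.encode(sentences, normalize_embeddings=True)
    query_vec = encoder.encode([query], normalize_embeddings=True)[0]
    scores = sent_vecs @ query_vec               # cosine similarity (unit vectors)
    best = np.argsort(-scores)[:top_k]
    return [sentences[i] for i in sorted(best)]  # keep original document order
```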
rapatel0 almost 2 years ago
Surprised no one has mentioned the obvious issue: plagiarism

(Not sure if the authors have indicated any method for attribution of the original data)
soliton4 almost 2 years ago
This made me think of a fun activity: ask ChatGPT to come up with a new word and then Google that word. Sometimes the word exists in the context of a sci-fi show or a plant. Sometimes GPT just added an "se" or "us" to existing words. Sometimes it changed a Z to a C, but it never actually came up with a new word.
xianshou almost 2 years ago
Behold, the *true* stochastic parrot.
amluto almost 2 years ago
This is interesting coming on the heels of the gzip-based inference paper. gzip is based on LZ77, and the LZ family of compressors generate and store (and cleverly encode) instructions to copy blocks of text they have seen before to their output.
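For reference, a minimal reconstruction of that gzip-based idea (my own sketch, not the paper's code): classify a text by normalized compression distance to labelled examples, letting the LZ copy mechanism reward shared substrings.

```python
# Sketch of gzip/NCD text classification: texts that share long copyable
# substrings compress better together, so their distance is smaller.
import gzip
from collections import Counter
from typing import List, Tuple

def _clen(s: str) -> int:
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance between two strings.
    ca, cb, cab = _clen(a), _clen(b), _clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, train: List[Tuple[str, str]], k: int = 3) -> str:
    # train is a list of (text, label) pairs; majority vote over k nearest.
    neighbours = sorted(train, key=lambda tl: ncd(query, tl[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```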
js8 almost 2 years ago
I remember that around 2004, before convnets became popular, there was a paper on image texture style transfer using approximate nearest neighbors based on some neighborhood of each point. This technique seems similar but for text.
thanatropism almost 2 years ago
> COG

https://wiki.opencog.org/w/The_Open_Cognition_Project
sfmike almost 2 years ago
Thought this was about how you just need good copywriting skills
opnac almost 2 years ago
I wish we could stop with the “X is all you need” papers! The first one was unintuitive and so are the rest.
awestroke almost 2 years ago
First, hate the title.

Second, this approach seems equivalent to using larger tokens, which means the problems with using tokens instead of letters are just exacerbated.