ChatGPT is not all you need. A SOTA Review of large Generative AI models

157 points by georgehill over 2 years ago

7 comments

haldujai over 2 years ago
Is it just me or does anyone else cringe when they read "is/is not all you need" in the title of an AI-related paper?

Also, what does "SOTA" mean for a review? There isn't exactly a benchmark to compare against...

In terms of comprehensiveness, they don't mention PaLM and its variants, which probably should be mentioned since it is currently the largest LLM, with SOTA results on several benchmarks (e.g. MedQA-USMLE).

In terms of correctness, I admittedly skipped to the sections I'm familiar with (LLMs), but I don't understand why they distinguish 'text-science' from 'text-text'. They're both text-to-text, and there is no reason why you can't, for example, adapt GPT-3.5 to a scientific domain (some people even argue this is a better approach). A lot of powerful language models in the biomedical domain were initialized from general language models and use out-of-domain tokenizers/vocabularies (e.g. BioBERT).

The authors also make this statement regarding Galactica:

"The main advantage of [Galactica] is the ability to train on it for multiple epochs without overfitting"

This is not a unique feature of Galactica and has been done before. You're allowed to train LLMs for more than one epoch, and in fact it can be very beneficial (see BioBERT as an example of increasing training length).

People GENERALLY don't do this because the corpus used during self-supervised training is filled with garbage/noise, so the model starts to fit to that instead of what you actually want. There is nothing special about Galactica's architecture that specifically allows or encourages longer training cycles; rather, the authors curated the dataset to minimize garbage. As another example, my research involves radiology NLP, and when doing domain-adaptive pretraining on a highly curated dataset we have been going up to 8 epochs without overfitting.
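The domain-adaptive pretraining described in this comment (start from a general checkpoint, then continue self-supervised training on a curated in-domain corpus for several epochs, as with BioBERT) might look roughly like the following minimal sketch using Hugging Face Transformers. The base model, corpus file, and hyperparameters are illustrative assumptions, not the setup from the comment or the paper.

```python
# Minimal sketch of domain-adaptive pretraining: continue masked-language-model
# training of a general-domain checkpoint on a curated in-domain corpus for
# multiple epochs, in the spirit of BioBERT. All names and hyperparameters
# below are illustrative assumptions.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "bert-base-uncased"  # general-domain starting point
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical curated in-domain corpus, one document per line.
corpus = load_dataset("text", data_files={"train": "curated_domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="domain-adapted-bert",
    num_train_epochs=8,  # multiple epochs can be fine on a curated corpus
    per_device_train_batch_size=16,
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
```

As the comment notes, the thing that makes several epochs safe here is the curation of the corpus, not anything architectural; the same loop on a noisy web-scale dump would start fitting the noise.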
soperj over 2 years ago
I really haven't found ChatGPT to be very good. Ask it for the lyrics to a song and it'll come up with some of the lyrics but not all of them. For example, this is what I got for "In Bloom" by Nirvana:

Verse 1:
Sell the kids for food
Weather changes moods
Spring is here again
Reproductive glands

Verse 2:
A country battle song
Multiply, exciting people
Come on, join the party
Come on, everybody

Chorus:
In bloom
In bloom
In bloom
In bloom

Verse 3:
Subhumanity is fun
You have one, you have none
A soap impression of his wife
Which he ate and donated to the National Trust

Verse 4:
I'm not like them
But I can pretend
The sun is gone
But I have a light

Chorus:
In bloom
In bloom
In bloom
In bloom
amelius over 2 years ago
> This work consists on an attempt to describe in a concise way the main models are sectors that are affected by generative AI and to provide a taxonomy of the main generative models published recently.

I had trouble parsing this sentence.

And why doesn't the abstract provide at least some basic explanation of the title?
coolspot over 2 years ago
No BigScience's BLOOM (open source, 175B params), no Google's T5 (open source, 11B).

The AI/ML space moves so fast that this review is already outdated.
la64710 over 2 years ago
I think the simple chat interface is why ChatGPT became popular. If other people build relatively simple interfaces that leverage other models, they might be successful as well.
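A chat interface of the kind this comment describes can be only a few lines. Here is a minimal sketch using Gradio, where generate_reply is a hypothetical placeholder for whatever model is being wrapped.

```python
# Minimal chat UI around an arbitrary model. Gradio is one convenient option;
# generate_reply is a hypothetical stand-in for a real model call
# (local checkpoint, hosted API, etc.).
import gradio as gr

def generate_reply(message: str, history) -> str:
    # Swap this echo stub for an actual model invocation.
    return f"Echo: {message}"

gr.ChatInterface(fn=generate_reply, title="Minimal chat UI").launch()
```

The point is less the specific library than the design choice: the model behind the box can change while the interface stays a plain text-in, text-out loop.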
Xeoncross over 2 years ago
A correct anticipation of the future is a big part of all successful projects.

However, that is mostly due to the effort required to change course. If we reach a point where it's easy enough to regenerate everything from scratch, will it still be so important to correctly plan ahead?
simonw over 2 years ago
The diagrams in this paper did not fill me with confidence: there doesn't appear to be any reason for the layout of the boxes and lines in them other than to fill some space.