Microsoft Kosmos-1: A Multimodal Large Language Model

228 points | by solarist | about 2 years ago

12 comments

josalhor · about 2 years ago
The examples in the paper are pretty impressive. There is an example of a Windows 11 dialog image: the model can figure out which button to press given the user's desired outcome. If one were to take this model and scale it, I can see an advanced bot in under 5 years navigating the web and doing work purely by visual means, based on a human's text input. Interesting times.
tomp · about 2 years ago
Is there a better page to link to? I cannot even see "Kosmos" on this page!

Edit: Ah, looks like this is the link to the paper: https://arxiv.org/abs/2302.14045

It was discussed yesterday: https://news.ycombinator.com/item?id=34965326
ducktective · about 2 years ago
It can even solve IQ tests... I mean, how much further are we moving the goalposts?

Is there a model that can solve differential equations symbolically and numerically? Most of modern engineering just boils down to differential equations, whether ordinary or partial. They're our current best method to reason about systems and control them.
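As a point of reference for what "symbolically and numerically" looks like in ordinary tooling (my own illustration, not something from the thread or the paper), here is a minimal sketch using SymPy and SciPy on the toy ODE y' = -2y with y(0) = 1:

```python
# Minimal sketch: solve y' = -2*y, y(0) = 1 both symbolically and numerically.
# The ODE is a toy example chosen purely for illustration.
import numpy as np
import sympy as sp
from scipy.integrate import solve_ivp

# Symbolic solution with SymPy: dsolve returns y(t) = exp(-2*t).
t = sp.symbols("t")
y = sp.Function("y")
symbolic = sp.dsolve(sp.Eq(y(t).diff(t), -2 * y(t)), y(t), ics={y(0): 1})
print(symbolic)  # Eq(y(t), exp(-2*t))

# Numerical solution with SciPy over t in [0, 5].
numeric = solve_ivp(lambda t, y: -2 * y, (0.0, 5.0), [1.0],
                    t_eval=np.linspace(0.0, 5.0, 6))
print(numeric.y[0])  # tracks exp(-2*t) at the sampled points
```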
solarist · about 2 years ago
Paper: https://arXiv.org/abs/2302.14045

Examples: https://twitter.com/alphasignalai/status/1630651280019292161
PaulHoule · about 2 years ago
I like this feature they are working on:

https://arxiv.org/abs/2212.10554

I'd say the most obvious limitation of today's transformers is the limited attention window. If you want ChatGPT to do a good job of summarizing a topic based on the literature, the obvious thing is to feed a bunch of articles into it and ask it to summarize (how can you cite a paper you didn't read?), and that requires looking at maybe 400,000 - 4,000,000 tokens.

Similarly, there is a place for a word embedding, a sentence embedding, a paragraph embedding, a chapter embedding, a book embedding, etc., but these have to be scalable. The book embedding is obviously bigger, but I ought to be able to turn a query into a sentence embedding and somehow match it against larger document embeddings.
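A minimal sketch of that last idea (my own illustration, not PaulHoule's code): encode a query as a sentence embedding and match it against precomputed document-level embeddings by cosine similarity. The model name and the toy corpus are placeholder assumptions.

```python
# Minimal sketch: match a query embedding against document-level embeddings.
# Model name and corpus are placeholders; any sentence encoder would do.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf encoder

documents = [
    "A chapter-length text about transformer attention windows ...",
    "A chapter-length text about protein structure prediction ...",
]
doc_emb = model.encode(documents, normalize_embeddings=True)        # (n_docs, dim)
query_emb = model.encode(["How do long-context transformers scale?"],
                         normalize_embeddings=True)                 # (1, dim)

# With normalized vectors, cosine similarity reduces to a dot product.
scores = doc_emb @ query_emb.T
print(documents[int(np.argmax(scores))])
```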
RcouF1uZ4gsC · about 2 years ago
I don’t trust any report of model performance from papers, unless there is a publicly accessible demo. It is way too easy to test things the model has trained on and for the model to then completely fall flat when used by people in the real world.
naasking · about 2 years ago
Another one that looks even more compelling:

Multimodal Chain-of-Thought Reasoning in Language Models, https://arxiv.org/abs/2302.00923

By building in chain-of-thought and multimodal learning, this 1B parameter model beats GPT-3.5's 175B parameter model.
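For readers skimming: that paper describes a two-stage framework, rationale generation followed by answer inference, both conditioned on text plus vision features. Below is a rough sketch of that flow; the generate() helper is a stub standing in for the fused text+vision model, and all names here are placeholders rather than the paper's API.

```python
# Rough sketch of the two-stage Multimodal-CoT flow (arXiv:2302.00923).
# generate() is a stub; in the paper it is a fused text+vision seq2seq model.
from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    question: str
    image_features: List[float]  # output of a vision encoder (placeholder)

def generate(prompt: str, image_features: List[float]) -> str:
    """Stand-in for the multimodal language model's decoding step."""
    return "<model output for: " + prompt.splitlines()[-1] + ">"

def multimodal_cot(ex: Example) -> str:
    # Stage 1: rationale generation, conditioned on the question and image.
    rationale = generate(f"Question: {ex.question}\nRationale:", ex.image_features)
    # Stage 2: answer inference, conditioned on question, rationale, and image.
    answer = generate(
        f"Question: {ex.question}\nRationale: {rationale}\nAnswer:",
        ex.image_features,
    )
    return answer

print(multimodal_cot(Example("Which object is magnetic?", [0.0] * 512)))
```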
nl · about 2 years ago
It's worth noting that this is a comparatively small model (1.6B params, from memory).

It'll be interesting to see what capabilities emerge as they grow the model capacity.
aegistudio · about 2 years ago
Hmm... LLMs / MLLMs might truly be a unified input/output interface for a would-be AGI, I think.
drKarl · about 2 years ago
At Microsoft:

"Hey, why don't we call our new LLM Cosmos?"

"That's taken by the Azure Cosmos DB guys."

"Damn it... how about Kosmos-1?"
Karellen · about 2 years ago
Did anyone else initially read that as `Kosmos~1`, and wonder what the full name of the project was?
xfalcox · about 2 years ago
Anyone know if this will be an openly available model?