
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.



Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context [pdf]

1 point by lawrenceyan over 1 year ago

1 comment

lawrenceyan over 1 year ago
> Our extensive evaluations with diagnostic and realistic multi-modal long-context benchmarks show that 1.5 Pro is able to maintain near-perfect recall on multi-modal versions of needle-in-a-haystack (see Section 4.2.1.2) and is able to effectively use its context to retrieve and reason over large amounts of data. This enables the model to perform realistic long-context tasks such as long-document QA from 700k-word material and long-video QA from 40 to 105 minutes long videos. Finally, 1.5 Pro has the ability to use in-context learning to translate from English to Kalamang, an extremely low resource language with fewer than 200 speakers (Visser, 2020b). This capability is achieved solely by providing a grammar manual in its context at inference time, which demonstrates the Gemini 1.5 Pro's remarkable ability to in-context learn from information it has never seen before at training time.

Very impressive. Almost total recall within a 1 million token context.
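For readers unfamiliar with the needle-in-a-haystack benchmark the quote refers to: the idea is to bury a unique "needle" sentence at a chosen depth inside a long stretch of filler text, then ask the model to retrieve it. A minimal sketch of that harness, with a stub standing in for the actual model call (the helper names and the stub are assumptions, not anything from the Gemini report):

```python
def build_haystack_prompt(needle: str, filler: str, total_words: int, depth: float) -> str:
    """Embed a 'needle' sentence at a relative depth inside repeated filler text.

    depth=0.0 places the needle near the start of the context, depth=1.0 near the end.
    """
    base = filler.split()
    words = (base * (total_words // len(base) + 1))[:total_words]
    words.insert(int(depth * len(words)), needle)
    return " ".join(words)

def recall_score(model_answer: str, secret: str) -> float:
    """1.0 if the secret token from the needle appears in the answer, else 0.0."""
    return 1.0 if secret in model_answer else 0.0

# Small haystack for illustration; a real run would sweep depths and use ~1M tokens.
needle = "The secret passphrase is MAGENTA-42."
prompt = build_haystack_prompt(
    needle, "The quick brown fox jumps over the lazy dog.", total_words=200, depth=0.5
)

# Stub in place of a long-context model API call (hypothetical, for a runnable example).
def fake_model(text: str) -> str:
    for sentence in text.split("."):
        if "passphrase" in sentence:
            return sentence.strip() + "."
    return "I don't know."

answer = fake_model(prompt + " What is the secret passphrase?")
print(recall_score(answer, "MAGENTA-42"))
```

An actual evaluation repeats this over a grid of context lengths and needle depths and reports recall at each cell, which is what "near-perfect recall" summarizes.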