I Spent a Week with Gemini Pro 1.5–It's Fantastic

300 points by dshipper, about 1 year ago

34 comments

criddell, about 1 year ago
I kind of love the idea of feeding the text of entire books to an AI. More often than I'd like to admit, I'll be reading a novel and find myself not remembering who some character is. I'd love to be able to highlight a name in my ereader and have it see that I'm 85 pages into Neuromancer and give me an answer based on that (i.e. no spoilers).

Or have a textbook where I can get help and hints while working through problems when I get stuck, like you might with a good study partner.
Eliezer, about 1 year ago
This is a slightly strange article to read if you happen to *be* Eliezer Yudkowsky. Just saying.
jeffbee, about 1 year ago
How do people get comfortable assuming that these chat bots have not hallucinated? I do not have access to the most advanced Gemini model, but using the one I do have access to, I fed it a 110-page PDF of a campaign finance report and asked it to identify the 5 largest donors to the candidate committee... basically a task I probably could have done with a normal machine vision/OCR approach, but I wanted to have a little fun. Gemini produced a nice little table with names on the left and aggregate sums on the right, where it had simply invented all of the cells. None of the names were anywhere in the PDF; all the numbers were made up. So what signals do people look for to indicate that any level of success has been achieved? How does anyone take a large result at face value if they can't individually verify every aspect of it?
rkangel, about 1 year ago
This is exactly the sort of article I want to read about this sort of topic:

* Written with concrete examples of their points
* Provides balance and caveats
* Declares their own interest (e.g. "LlamaIndex (where I'm an investor)")
kromem, about 1 year ago
I'm most excited about what this is going to look like not by abandoning RAG but by pairing it with these massive context windows.

If you can parse an entire book to identify relevant chunks using RAG and can fit an entire book into a context window, that means you can fit relevant chunks from an entire reference library into the context window too.

And *that* is very promising.
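A minimal sketch of the pairing kromem describes: use retrieval to rank chunks from a whole library, then greedily pack as many as a very large context window will hold. The retrieve_top_k helper, the token budget, and the chars-per-token estimate are illustrative assumptions, not anything from the article or the thread.

    from dataclasses import dataclass

    @dataclass
    class Chunk:
        doc_id: str
        text: str
        score: float  # similarity to the query, from whatever embedding model is used

    def pack_context(chunks: list[Chunk], budget_tokens: int = 900_000) -> str:
        """Greedily fill a long-context prompt with the best-scoring chunks."""
        ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
        parts, used = [], 0
        for c in ranked:
            cost = max(1, len(c.text) // 4)  # rough chars-to-tokens estimate
            if used + cost > budget_tokens:
                continue
            parts.append(f"[{c.doc_id}]\n{c.text}")
            used += cost
        return "\n\n".join(parts)

    # Usage with a hypothetical retriever:
    # prompt = pack_context(retrieve_top_k(query, k=500)) + "\n\nQuestion: " + query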
dunefox, about 1 year ago
Can I be sure that Gemini doesn't alter any facts contained in a book I pass it due to Google's identity politics? What if I pass it a "problematic" book? Does it adapt the content? For me, it's completely useless because of this.
og_kalu, about 1 year ago
Yeah. A few people on X have had access for a couple of days now. The conclusion is that it's a genuine context window advance: not just length, but utilization. It genuinely utilizes long context much better than other models. Shame they didn't share what led to that.
wkat4242, about 1 year ago
Wouldn't that cost a fortune? If I feed the maximum into GPT-4, it already costs $1.28 per interaction! Or is Gemini that much cheaper too?
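For reference, the $1.28 figure is consistent with GPT-4 Turbo's list price at the time of roughly $10 per million input tokens for a full 128k-token prompt. A back-of-the-envelope sketch under that assumed rate (input tokens only; completion tokens would be billed on top):

    def prompt_cost_usd(tokens: int, usd_per_million_input_tokens: float) -> float:
        """Input-token cost only, at an assumed per-million-token rate."""
        return tokens / 1_000_000 * usd_per_million_input_tokens

    print(prompt_cost_usd(128_000, 10.0))    # ~1.28 for a maxed-out 128k prompt
    print(prompt_cost_usd(1_000_000, 10.0))  # ~10.00 for a 1M-token prompt at the same rate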
4bpp, about 1 year ago
I imagine the folks over at NSA must be rubbing their hands over the possibilities this will open up for querying the data they have been diligently storing over the years.
tr3ntg, about 1 year ago
> These models often perform differently (read: worse) when they are released publicly, and we don't know how Gemini will perform when it's tasked with operating at Google scale.

I seriously hope Google learns from ChatGPT's ever-degrading reputation and finds a way to prioritize keeping the model operating at peak performance. Whether it's limiting access, raising the price, or both, I really want to have this high a quality of experience with the model when it's released publicly.
emporas, about 1 year ago
> "While Gemini Pro 1.5 is comfortably consuming entire works of rationalist doomer fanfiction, GPT-4 Turbo can only accept 128,000 tokens."

A.I. Doomers will soon witness their arguments fed into the machine, generating counter-arguments automatically for 1,000 books at a time. They will need to incorporate a more and more powerful A.I. into their workflow to catch up.
simpaticoder, about 1 year ago
> *It read a whole codebase and suggested a place to insert a new feature—with sample code.*

I'm hopeful that this is going to be more like the invention of the drum machine (which *did not* eliminate drummers) and less like the invention of the car (which *did* eliminate carriages).
Aeolun, about 1 year ago
I think it's a bit disturbing that the author gets an answer from the model that is entirely made up, and even goes so far as to publish it in the article, but still says it's all so great.
platelminto, about 1 year ago
GPT-4 Turbo has a context window of 128k tokens, not 32k as the article says.
jgalt212, about 1 year ago
> (This is not the same as the publicly available version of Gemini that made headlines for refusing to create pictures of white people. That will be forgotten in a week;

Maybe so, but I'm not convinced the guardrails problem will ever be sufficiently solved.
croes, about 1 year ago
I'm a bit worried about the resource consumption of all these AIs. Could it be that the mass of AIs now being created is driving climate change, and in return we are mainly getting more text summaries and cat pictures?
aantix, about 1 year ago
Does the model feel performant because it’s not under any serious production load?
jiggawatts, about 1 year ago
These huge context sizes will need new API designs. What I'd like to see is a "dockerfile"-style setup where I can layer things on top of a large base context without having to resubmit (and recompute!) anything.

E.g.: have a cached state with a bunch of requirements documents, then a layer with the stable files in the codebase, then a layer with the current file, and then finally a layer asking specific questions.

I can imagine something like this being the future; otherwise we'll have to build a Dyson sphere to power the AIs…
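A rough sketch of what such a layered, cache-aware prompt API could look like. The ContextLayer class, its methods, and the cache-key scheme are invented for illustration and do not correspond to any real provider SDK; the point is only that each stable prefix gets a reusable identity so the provider could cache its computed state.

    import hashlib

    class ContextLayer:
        """A prompt prefix layer; equal cache keys mean a provider could reuse cached work."""

        def __init__(self, text: str, parent: "ContextLayer | None" = None):
            self.text = text
            self.parent = parent
            base = (parent.cache_key if parent else "") + text
            self.cache_key = hashlib.sha256(base.encode()).hexdigest()

        def extend(self, text: str) -> "ContextLayer":
            return ContextLayer(text, parent=self)

        def materialize(self) -> str:
            parts, node = [], self
            while node is not None:
                parts.append(node.text)
                node = node.parent
            return "\n\n".join(reversed(parts))

    # requirements -> codebase -> current file -> question; early layers stay stable and cacheable
    base = ContextLayer("## Requirements\n(requirements documents here)")
    code = base.extend("## Codebase\n(stable source files here)")
    question = code.extend("Where should the new rate-limiter feature hook in?")
    print(question.cache_key[:12], len(question.materialize()))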
dynamite-ready, about 1 year ago
Being able to feasibly feed a whole project codebase in one 'prompt' could now make this new generation of code-completion tools worthwhile. I've found them to be of limited value so far, because they're never aware of the context of proposed changes.

With Gemini though, the idea of feeding the current file, class, package, project, and perhaps even dependencies into a query can potentially lead to some enlightening outputs.
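A minimal sketch of "feed the whole project in one prompt": walk the repository, skip obvious junk directories, and concatenate files with path headers. The extension whitelist, skip list, and size cap are arbitrary assumptions for illustration.

    from pathlib import Path

    SKIP_DIRS = {".git", "node_modules", "__pycache__", "venv"}
    KEEP_EXTS = {".py", ".ts", ".md", ".toml", ".json"}

    def flatten_repo(root: str, max_chars: int = 2_000_000) -> str:
        """Concatenate project files into one prompt string, each prefixed by its path."""
        parts, total = [], 0
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file() or path.suffix not in KEEP_EXTS:
                continue
            if any(p in SKIP_DIRS for p in path.parts):
                continue
            block = f"### {path.relative_to(root)}\n{path.read_text(errors='ignore')}\n"
            if total + len(block) > max_chars:
                break
            parts.append(block)
            total += len(block)
        return "\n".join(parts)

    # prompt = flatten_repo(".") + "\n\nTask: add a retry wrapper around the HTTP client."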
eesmith, about 1 year ago
> I wanted an anecdote to open the essay with, so I asked Gemini to find one in my reading highlights. It came up with something perfect:

Can someone verify that anecdote is true? Here is what the image contains:

> From *The Publisher*: In the early days of Time magazine, co-founder Henry Luce was responsible for both the editorial and business sides of the operation. He was a brilliant editor, but he had little experience or interest in business. As a result, he often found himself overwhelmed with work. One day, his colleague Briton Hadden said to him, "Harry, you're trying to do everything yourself. You need to delegate more." Luce replied, "But I can do it all myself, and I can do it better than anyone else." Hadden shook his head and said, "That's not the point. The point is to build an organization that can do things without you. You're not going to be able to run this magazine forever."

That citation appears to be "The Publisher: Henry Luce and His American Century".

The book is available at archive.org as searchable text returning snippets, at https://archive.org/details/publisherhenrylu0000brin_o9p4/

Search is unable to find the word "delegate" in the book. The six matches for "forever" are not relevant. The matches for "overwhelmed" are not relevant.

A search for Hadden finds no anecdote like the above. The closest are on page 104, https://archive.org/details/publisherhenrylu0000brin_o9p4/page/104/mode/2up?q=Hadden :

"""For Harry the last weeks of 1922 were doubly stressful. Not only was he working with Hadden to shape the content of the magazine, he was also working more or less alone to ensure that Time would be able to function as a business. This was an area of the enterprise in which Hadden took almost no interest and for which he had little talent. Luce, however, proved to be a very good businessman, somewhat to his dismay—since, like Brit, his original interest in "the paper" had been primarily editorial. ("Now the Bratch is really the editor of TIME," he wrote, "and I, alas, alas, alas, am business manager. ... Of course no one but Brit and I know this!") He negotiated contracts with paper suppliers and printers. He contracted out the advertising. He supervised the budget. He set salaries and terms for employees. He supervised the setting up of the office. And whenever he could, he sat with Brit and marked up copy or discussed plans for the next issue."""

That sounds like delegation to me *and* decent at business *and* not doing much work as an editor.

There's also the anecdote on page 141 at https://archive.org/details/publisherhenrylu0000brin_o9p4/page/140/mode/2up?q=Hadden :

"""In the meantime Luce threw himself into the editing of Time. He was a more efficient and organized editor than Hadden. He created a schedule for writers and editors, held regular meetings, had an organized staff critique of each issue every week. ("Don't hesitate to flay a fellow-worker's work. Occasionally submit an idea," he wrote.) He was also calmer and less erratic. Despite the intense loyalty Hadden inspired among members of his staff, some editors and writers apparently preferred Luce to his explosive partner; others missed the energy and inspiration that Hadden had brought to the newsroom. In any case the magazine itself—whose staff was so firmly molded by Hadden's style and tastes—was not noticeably different under Luce's editorship than it had been under Hadden's. And just as Hadden, the publisher, moonlighted as an editor, so Luce, now the editor, found himself moonlighting as publisher, both because he was so invested in the business operations of the company that he could not easily give them up, and also because he felt it necessary to compensate for Hadden's inattention."""

Again, it doesn't seem to match the summary from Gemini.

Does someone here have better luck than I did in verifying the accuracy of the anecdote? Because so far it does not seem valid.
p1dda, about 1 year ago
"I got access to Gemini Pro 1.5 this week, a new private beta LLM from Google that is significantly better than previous models the company has released. (This is not the same as the publicly available version of Gemini that made headlines for refusing to create pictures of white people. That will be forgotten in a week; this will be relevant for months and years to come.)"

Wow, I already hate Gemini after reading this first paragraph.
next_xibalba, about 1 year ago
It is hard to imagine Gemini Pro being useful given the truly bizarre biases and neutering introduced by the Google team in the free version of Gemini.
neolefty, about 1 year ago
How does it scale to such a large context window — is it publicly known, or is there some high-quality speculation out there that you recommend?
hersko, about 1 year ago
> This is not the same as the publicly available version of Gemini that made headlines for refusing to create pictures of white people. That will be forgotten in a week; this will be relevant for months and years to come.

I cannot disagree with this more strongly. The image issue is just indicative of the much larger issue where Google's far-left DEI policies are infusing their products. This is blatantly obvious with the ridiculous image issues, but the problem is that their search is probably similarly compromised and is much less obvious, with far more dire consequences.
Solvency, about 1 year ago
How can Google so thoroughly embarrass themselves on the image front and then do well on text?
Sakos, about 1 year ago
I love the potential of having such a big context window, but I'm concerned about who will get access to it (or rather who won't get access to it) and what it will cost or who will pay for it.
karmasimida, about 1 year ago
I think retrieval is still going to be important.

What is not important is RAG. You can retrieve a lot of documents at full length; there is no need to do all this chunking/splitting, etc.
hackerlight, about 1 year ago
> Second, Gemini is pretty slow. Many requests took a minute or more to return, so it's not a drop-in replacement for every LLM use case.
kderbyma, about 1 year ago
Google made it?... Nah... I'll wait. They can't even do search anymore... unless I'm looking for ads... haha
coldtea, about 1 year ago
> *That will be forgotten in a week; this will be relevant for months and years to come.*

Or, you know, until next month or so, when OpenAI bumps their offer.
pickledish, about 1 year ago
> This is about enough to accept Peter Singer's comparatively slim 354-page volume Animal Liberation, one of the founding texts of the effective altruism movement.

What? I might be confused; is this a joke I don't get, or is there some connection between this book and EA that I haven't heard of?
lukasb, about 1 year ago
Is anyone else disappointed with Gemini Ultra for coding? It just makes basic mistakes too often.
animanoir, about 1 year ago
I tested it too—no, it sucks.
gnarlouse, about 1 year ago
Is anybody else getting seriously depressed at the rate of advancement of AI? Why do we believe for a second that we’re actually going to be on the receiving end of any of this innovation?