Inflection-2: the next step up

78 points by jeffdn over 1 year ago

13 comments

josefresco over 1 year ago
Just got done chatting with Pi for the first time.

I asked some very softball questions about its "creators", aka Inflection AI, and it would not provide *any* information. It told me it was prevented from discussing sensitive company information. I then used Google/Wikipedia to learn the information I requested, and pasted the results into Pi, after which Pi made limited comments on the founders.

I then moved into questioning why this was blocked and why these blocks were not publicly disclosed, given that Inflection AI is a "public benefit corporation".

I didn't learn much, and generally speaking Pi "agreed" with me, but still I could not get it to budge.

I understand a block on "sensitive" information, but these "hard coded" limits should be publicly disclosed (correct me if I'm wrong); otherwise I won't trust the tool.
ChildOfChaos over 1 year ago
I quite liked Pi, to be honest. It was good to use as a journaling tool and to talk through life and issues when I used it. I just feel it's sometimes slightly generic with its answers, but then all the LLMs have been when I have used them for this purpose.

It felt like a more personal AI. Rather than using it to try to code or solve problems, it worked well as kinda a personal guide to talk through life with.

Maybe ChatGPT could be good at this with the right prompt and creating a GPT for it, but I am currently on the waitlist for ChatGPT Plus and Pi is free, so a bump in performance might still be welcome.
simonhughes22 over 1 year ago
The model hallucinates badly despite their claims. See the first prompt I tried here: https://twitter.com/hughes_meister/status/1727400689738162589
Philpax over 1 year ago
Is it me or do they not mention the size of the model at all? Pretty hard to compare it with other models when we don't know what weight class it's in...
chandureddyvari over 1 year ago
I had a lot of fun chatting with Pi. After some poking around to get it to give up its "system prompt" (a while back, when prompt injection was a cool thing), it said it was using some conversational frameworks like Grice's principle etc. I tried to recreate one in the GPT store. I call it Tara. Try it here: https://chat.openai.com/g/g-mI1QatRrc-tara
maxrmk over 1 year ago
From the press release: "Before Inflection-2 is released on Pi, it will undergo a series of alignment steps to become a helpful and safe personal AI."

I wonder how the post-alignment model will perform compared to Claude-2 (which is presumably post-alignment), since those processes tend to cause a bit of a performance hit. We'll have to see if it retains that coveted 2nd place spot.

If they didn't account for this, it seems like an unfair comparison.
ianbicking over 1 year ago
I realize part of why Sam Altman feels so important in OpenAI is because he invited everyone to come along. You can build something pretty close to chat.openai.com on the APIs, and their APIs continue to grow as their product grows. I think that's closely related to his leadership.

This Inflection announcement feels kind of like your neighbor showing you his cool new Jaguar. Can I even take it for a test drive? Well, no... but he'll take me out for a drive in a while, you betcha. "Yeah, cool model you have there," but I'm eyeing the exit.

This is the other side of the "commercial vs mission" argument. Doing commercial activity is the only way to be inclusive. Except open source... but even there it's not a clear call. And writing papers touting your achievements is... kind of narcissistic?
intellectronica over 1 year ago
Inflection: the no-drama AI company :D

Pi is great for a "personal" chat. I can't wait to use it with the new model.
simonhughes22 over 1 year ago
This is just typical of so much work in the field. They pick and choose which models to compare against and on which benchmarks. If this model were truly great, they would be comparing against Claude 2 and GPT-4 across a bunch of different benchmarks. Instead they compare against PaLM 2, which in a lot of tests is a weak model (https://venturebeat.com/ai/google-bard-fails-to-deliver-on-its-promise-even-after-latest-updates/#:~:text=The%20crux%20of%20the%20problem,content%20it%20has%20been%20fed) and prone to hallucination (https://github.com/vectara/hallucination-leaderboard).
tikkun over 1 year ago
Regular reminder that most open source LLM benchmarks are not very useful (in the sense that they don't represent day to day AI chatbot usage and what users care about). If you haven't looked through the datasets to see what they actually contain, I'd encourage you to do so. [1] I think we're just in a strange suboptimal Schelling point of sorts, where people report their scores on those benchmarks because they think other people care about those sorts of benchmarks, and therefore those benchmarks are the ones that people expect and care about.

And to recap their statement about it being second most powerful, it's based on MMLU scores, which IMO is a non-useful comparison. (Also, it doesn't test against GPT-4-Turbo or Claude-long-2.1.)

What they're saying is that Inflection-2 ranks #2 relative to other models including GPT-4, Claude-2, PaLM 2, Grok-1, and Llama 2 70b, specifically on MMLU scores.

This model could be great, but that'll be determined by "do day to day users, both free and paying, prefer it over Claude 2 and GPT-4-Turbo" - not MMLU scores.

[1]: https://huggingface.co/datasets/lukaemon/mmlu/viewer/abstract_algebra/test
blovescoffee over 1 year ago
Their byline is "the second most capable LLM in the world today." OK, thanks for the heads up, I'll go use the first most capable LLM while Inflection catches up... Their press release shows this model well behind GPT-4. There's currently no non-beta API. I'm just not sure who this is for.
xianshou over 1 year ago
First reaction: okay, should we be impressed?

June: flashy MLPerf demo with CoreWeave using 22k H100s

November: actually trained model using 5k H100s

June: we claim Inflection-1 is the best model in its compute class and are preparing a frontier model

November: we beat PaLM 2, which everyone else forgot about long ago anyway

Inflection got a ton of hype with its $1.3B raise (likely not cash but principally GPU compute credits) earlier this year, but now it is starting to look like the next victim of inflated expectations.
dreamcompiler over 1 year ago
"Our mission at Inflection is to create a personal AI for everyone."

"By messaging Pi, you are agreeing to our Terms of Service and Privacy Policy."

Yeah, no. Any AI that operates in *your* cloud where I have to agree to *your* Terms of Service and Privacy Policy is not "personal AI," no matter how much you want me to believe otherwise.

Modern LLMs can do inferencing on my own personal computer. Some of them can even do it on a Raspberry Pi (no pun intended). That's "personal AI."

So I have to wonder why you insist that I use this in your cloud rather than just downloading an app that works completely offline. Especially if you're gonna call it "personal AI."