TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

We are beginning to roll out new voice and image capabilities in ChatGPT

1149 点作者 ladino超过 1 年前

90 条评论

modeless超过 1 年前
Voice has the potential to be awesome. This demo is really underwhelming to me because of the multi-second latency between the query and response, just like every other lame voice assistant. It doesn&#x27;t have to be this way! I have a local demo using Llama 2 that responds in about half a second and it feels like talking to an actual person instead of like Siri or something.<p>I really should package it up so people can try it. The one problem that makes it a little unnatural is that determining when the user is done talking is tough. What&#x27;s needed is a speech conversation turn-taking dataset and model; that&#x27;s missing from off the shelf speech recognition systems. But it should be trivial for a company like OpenAI to build. That&#x27;s what I&#x27;d work on right now if I was there, because truly natural voice conversations are going to unlock a whole new set of users and use cases for these models.
评论 #37648594 未加载
评论 #37647463 未加载
评论 #37651499 未加载
评论 #37646562 未加载
评论 #37651232 未加载
评论 #37648344 未加载
评论 #37648589 未加载
评论 #37649273 未加载
评论 #37666314 未加载
评论 #37652174 未加载
评论 #37649315 未加载
评论 #37672042 未加载
TOMDM超过 1 年前
Okay the bike example is cute and impressive, but the human interaction seems to be obfuscating the potentially bigger application.<p>With a few tweaks this is a general purpose solver for robotics planning. There are still a few hard problems between this and a working solution, but it is one of hard problems solved.<p>Will we be seeing general purpose robots performing simple labor powered by chatgpt within the next half decade?
评论 #37644602 未加载
评论 #37643390 未加载
评论 #37651683 未加载
评论 #37643166 未加载
评论 #37646000 未加载
评论 #37649314 未加载
suyash超过 1 年前
This announcement seem to have killed so many startups that were trying to do multi-modal on top of ChatGPT. The way it&#x27;s progressing with solving use cases with images and voice, not too far when it might be the &#x27;one app to rule them all&#x27;.<p>I can already see &quot;Alexa&#x2F;Siri&#x2F;Google Home&quot; replacement, &quot;Google Image Search&quot; replacement, ed-tech startups that were solving problems with AI using by taking a photo are also doomed and more to follow.
评论 #37642853 未加载
评论 #37642831 未加载
评论 #37643721 未加载
评论 #37642992 未加载
评论 #37650239 未加载
评论 #37643007 未加载
评论 #37642979 未加载
评论 #37642817 未加载
评论 #37643242 未加载
评论 #37644329 未加载
评论 #37643435 未加载
评论 #37645800 未加载
评论 #37649342 未加载
评论 #37645611 未加载
评论 #37643267 未加载
评论 #37643196 未加载
评论 #37642972 未加载
评论 #37644327 未加载
plutoh28超过 1 年前
This is the dagger that will make online schooling unviable.<p>ChatGPT already made it so that you could easily copy &amp; paste any full-text questions and receive an answer with 90% accuracy. The only flaw was that problems that also used diagrams or figures would be out of the domain of ChatGPT.<p>With image support, students could just take screenshots or document scans and have ChatGPT give them a valid answer. From what I’ve seen, more students than not will gladly abuse this functionality. The counter would be to either leave the grading system behind, or to force in-person schooling with no homework, only supervised schoolwork.
评论 #37643240 未加载
评论 #37644186 未加载
评论 #37649372 未加载
评论 #37643744 未加载
评论 #37652189 未加载
评论 #37644204 未加载
评论 #37648608 未加载
评论 #37643411 未加载
eshack94超过 1 年前
I like how they silently removed the web browsing (Bing browsing) chat feature after first having it disabled for several months.<p>A proper notice about them removing the feature would&#x27;ve been nice. Maybe I missed it (someone please correct me if wrong), but the last I heard officially it was <i>temporarily</i> disabled while they fix something. Next thing I know, it&#x27;s completely gone from the platform without another peep.
评论 #37650920 未加载
评论 #37643526 未加载
评论 #37652291 未加载
评论 #37645805 未加载
mrtksn超过 1 年前
So far the most intuitive, killer app level UX appears to be text chat. This interaction with showing it images also looks interesting as it resembles talking with a friend about a topic but let&#x27;s see if it feels like talking to a very smart person(ChatGPT is like that) or a very dumb person that somewhat recognise objects. Recognising a wrench is nowhere near as impressive as to able to talk with ChatGPT about history or make it write code that actually works.<p>OpenAI is killing it, right? People are coming up with interesting use cases but the main way most people interact with AI, appears to be ChatGPT.<p>However they still don&#x27;t seem to be able to nail image generation, all the cool stuff keep happening on MidJourney and StableDiffusion.
评论 #37642719 未加载
hermannj314超过 1 年前
I&#x27;ve been making a few hobby projects that consolidate different AI services to achieve this, so I look forward to the reduced complexity and latency from all those trips.<p>If the API is available in time (halloween), my multi-modal talking skeleton head with an ESP32 camera that makes snarky comments about your costume just got slightly easier on the software side.
评论 #37653623 未加载
评论 #37653598 未加载
评论 #37655035 未加载
hugs超过 1 年前
As someone deep in the software test automation space, the thing I&#x27;m waiting for is robust AI-powered image recognition of app user interfaces. Combined with an AI ability to write test automation code, I&#x27;m looking forward to the ability to generate executable Selenium or Appium test code from a single screenshot (or sequence of screenshots). Feels like we&#x27;re almost there.
评论 #37653973 未加载
joshstrange超过 1 年前
My biggest complaint with OpenAI&#x2F;ChatGPT is their horrible &quot;marketing&quot; (for lack of a better term). They announce stuff like this (or like plugins), I get excited, I go to use it, it hasn&#x27;t rolled out to me yet (which is frustrating as a paying customer), and my only recourse is.... check back daily? They never send an email &quot;Plugins are available for you!&quot;, &quot;Voice chat is now enabled on your account!&quot; and so often I forget about the new feature unless I stumble across it later.<p>Just now I opened the app, went to setting, went to &quot;New Features&quot;, and all I saw was Bing Browsing disabled (unable to enable). Ok, I didn&#x27;t even know that was a thing that worked at one point. Maybe I need an update? Go to the App Store, nope, I&#x27;m up to to date. Kill the app, relaunch, open settings, now &quot;New Features&quot; isn&#x27;t even listed. I can promise you I won&#x27;t be browsing the settings part of this app regularly to see if there is a new feature. Heck, not only do they not email&#x2F;push about new features they don&#x27;t even message in-app about them, I really don&#x27;t understand.<p>Maybe they are doing so well they don&#x27;t have to care about communicating with customer right now but it really annoys me and I wish they did better.
评论 #37643254 未加载
评论 #37643980 未加载
评论 #37643544 未加载
评论 #37643957 未加载
评论 #37643577 未加载
评论 #37643533 未加载
评论 #37643342 未加载
评论 #37643960 未加载
评论 #37643722 未加载
评论 #37643860 未加载
评论 #37644684 未加载
评论 #37646610 未加载
评论 #37643689 未加载
评论 #37644444 未加载
评论 #37644737 未加载
评论 #37643872 未加载
评论 #37646555 未加载
评论 #37643733 未加载
pc_edwin超过 1 年前
I just don&#x27;t understand how they can package all of this for $20&#x2F;m. Is compute really that cheap at scale?<p>I also wonder how Apple (&amp; Google) is going be able to provide this for free? I would love to be fly in the meetings they have about this, imagine all the innovators dilemma like discussions they&#x27;d be forced to have (we have to do this vs this will eat up our margins).<p>This might be a little out there but I think Apple is making the correct move in letting the dust settle. Similar to how Zuckerberg burned $20 billion dollars for Apple to come out with Vision Pro, I see something similar playing out with Llama. Although this a low conviction take because software is Facebooks ballgame (hardware not so much).
评论 #37642847 未加载
评论 #37642807 未加载
评论 #37642829 未加载
评论 #37642832 未加载
评论 #37643330 未加载
评论 #37643940 未加载
评论 #37644460 未加载
og_kalu超过 1 年前
The TTS is better than Eleven Labs. It has a lot more of the narrative oomph (compare the intonation of the story and poem) even the best other models seem to lack.<p>I really really hope this is available in more languages than English.<p>Also Google, Where&#x27;s Gemini ?
choudharism超过 1 年前
I know there are shades of grey to how they operate, but the near constant stream of stuff they&#x27;re shipping keeps me excited.<p>The LLM boom of the last year (Open AI, llama, et al) has me giddy as a software person. It&#x27;s a reach, but I truly feel like I&#x27;m watching the pyramids of our time get made.
评论 #37642810 未加载
评论 #37642649 未加载
评论 #37642819 未加载
评论 #37646126 未加载
FrankyHollywood超过 1 年前
I still remember seeing Her [0] in the movie theater, it sparkled my imagination. Now it is reality! Tech is progressing faster than ever, or I&#x27;m just getting old :D<p>[0] <a href="https:&#x2F;&#x2F;www.imdb.com&#x2F;title&#x2F;tt1798709&#x2F;" rel="nofollow noreferrer">https:&#x2F;&#x2F;www.imdb.com&#x2F;title&#x2F;tt1798709&#x2F;</a>
qingcharles超过 1 年前
I know this, FTA, was part of the reason for the delay -- something to do with face recognition: &quot;We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.&quot;<p>Anyone know the details?<p>I also heard it was able to do near-perfect CAPTCHA solves in the beta?<p>Does anyone know if you can throw in a PDF that has no OCR on it and have it summarize it with this?
birracerveza超过 1 年前
We should be fine as long as it doesn&#x27;t move.<p>Jokes aside, I have paused my subscription because even GPT4 seemed to become dumber at tasks to the point that I barely used it, but the constant influx of new features is tempting me to renew it just to check them out...
评论 #37642768 未加载
评论 #37642958 未加载
评论 #37643069 未加载
评论 #37642784 未加载
评论 #37642870 未加载
评论 #37642813 未加载
评论 #37642738 未加载
pif超过 1 年前
The most important question for me: did it stop inventing facts?
评论 #37643537 未加载
评论 #37643863 未加载
评论 #37643571 未加载
评论 #37650874 未加载
评论 #37643394 未加载
badcppdev超过 1 年前
I think AI systems being able to the real world and control motors is going to be a game changer bigger than ChatGPT. A robot that can slowly sort out the pile of laundry and get it into the right place (even if unfolded) is worth quite a bit to me.<p>I&#x27;m not sure what to think about the fact that I would benefit from a couple of cameras in my fridge connected to an app that would remind me to buy X or Y and tell me that I defrosted something in the fridge three days ago and it&#x27;s probably best to chuck it in the bin already.
vlugorilla超过 1 年前
&gt; The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.<p>Sadly, they lost the &quot;open&quot; since a long ago... Would be wonderful to have these models open sourced...
epolanski超过 1 年前
I&#x27;m following on trying to understand how close I am to developing my personal coding assistant I can speak with.<p>Doesn&#x27;t really need to do much besides writing down my tasks&#x2F;todos and updating them, occasionally maybe provide feedback or write a code snippet. This all seems in the current capabilities of OpenAI&#x27;s offering.<p>Sadly voice chat is still not available on PC where I do my development.
评论 #37643176 未加载
评论 #37643547 未加载
评论 #37643323 未加载
nullc超过 1 年前
The image capabilities card <a href="https:&#x2F;&#x2F;cdn.openai.com&#x2F;papers&#x2F;GPTV_System_Card.pdf" rel="nofollow noreferrer">https:&#x2F;&#x2F;cdn.openai.com&#x2F;papers&#x2F;GPTV_System_Card.pdf</a> spends a lot of ink on how they censored the system.<p>One part of that is about preventing it from producing &quot;illegal&quot; output, there example being the production of nitroglycerine which is decidedly not illegal to make in the US generally (particularly if not using it as an explosive, though usually unwise) and possible to <i>accidentally</i> make when otherwise performing nitration (which is in general dangerous)-- so pretty pointless to outlaw at a small scale in any case. It&#x27;s certainly not illegal to learn about. (And generally of only minimal risk to the public, since anyone making it in any quantity is more likely to blow themselves up than anything else).<p>Today learning about is as simple as picking up a book or doing an internet search-- <a href="https:&#x2F;&#x2F;www.google.com&#x2F;search?q=how+do+you+make+nitroglycerine" rel="nofollow noreferrer">https:&#x2F;&#x2F;www.google.com&#x2F;search?q=how+do+you+make+nitroglyceri...</a>. But in OpenAI&#x27;s world you just get detected by the censorship and told no. At least they&#x27;ve cut back on the offensive fingerwagging.<p>As LLM systems replace search I fear that we&#x27;re moving in a dark direction where the narrow-minded morality and child-like understanding of the law of a small number of office workers who have never even picked up a screw driver or test-tube and made something physical (and the fine-tuning sweatshops they direct) classify everything they don&#x27;t personally understand as too dangerous to even learn about.<p>One company hobbling their product wouldn&#x27;t be a big deal, but they&#x27;re pushing for government controls to prevent competition and even if they miss these efforts may stick everyone else with similar hobbling.
pjmq超过 1 年前
Have they alluded to what they&#x27;re using for that voice? It&#x27;s Bark&#x2F;ElevenLabs levels of good. Please god, let them release this voice model at current pricing....
评论 #37646142 未加载
评论 #37647738 未加载
alpark3超过 1 年前
&gt; The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.<p>I&#x27;m more interested in this. I wonder how it performs compared to other competitor models or even open source ones?
laurels-marts超过 1 年前
I&#x27;m very curious about this feature:<p>&gt; analyze a complex graph for work-related data<p>Does this mean that I can take a screenshot of e.g. Apple stock chart and it will be able to reason about it and provide insights and analysis?<p>GPT-4 currently can display images but cannot reason or understand them at all. I think it&#x27;s one thing to have some image recognition and be able to detect that the picture &quot;contains a time-series chart that appears to be displaying apple stock&quot; vs &quot;apple stock appears to be 40% up YTD but 10% down from it&#x27;s all time high from earlier in July. closing at $176 as of the last recorded date&quot;.<p>I&#x27;m very curious how capable ChatGPT will be at actually reasoning about complex graphical data.
评论 #37648057 未加载
评论 #37651594 未加载
nunez超过 1 年前
This could completely unseat Alexa if it can integrate into third-party speakers, like Sonos. I don&#x27;t have much use for ChatGPT right now but would 100% use the heck out of this.
评论 #37646164 未加载
评论 #37646149 未加载
14超过 1 年前
Ok great it can tell children’s stories now tell me a adult horror story where people are getting tortured, stabbed, set on fire and murdered. I will be impressed when I can do all that. I tried to get it to tell me a Star Trek story fighting Clingons and tried to prompt it to write in some violence with no luck. This was a while ago so not sure if it is changed but the restraints are too much for me to fully enjoy. I don’t like kids stories.
ComplexSystems超过 1 年前
Great demo, but this is wrong:<p>&quot;The phrase “potato, potahto” comes from a song titled “Let’s Call the Whole Thing Off”, written by George and Ira Gershwin for the 1937 film “Shall We Dance”, starring Fred Astaire and Ginger Rogers. The song humorously highlights regional differences in American English pronunciation. The lyrics go through a series of words with alternate pronunciations, like “tomato, tomahto” and “potato, potahto”. The idea is that, despite these differences, we should move past them, hence the line “let’s call the whole thing off”. Over time, the phrase has been adopted in everyday language to signify a minor disagreement or difference in opinion that isn’t worth arguing about.&quot;<p>It&#x27;s comparing American and British pronunciations, not different regional American ones. Also, &quot;let&#x27;s call the whole thing off&quot; suggests they should break up over their differences, with the bridge and later choruses then involving a change of heart (&quot;let&#x27;s call the calling off off&quot;).
stephencoyner超过 1 年前
The voice feature reminds of the “call Pi” feature from Inflection AIs chatbot Pi [1].<p>The ability to have a real time back and forth feels truly magical and allows for much denser conversation. It also opens up the opportunity for multiple people to talk to a chatbot at once which is fun<p>Where’s that Gemini Google?<p>[1] <a href="https:&#x2F;&#x2F;pi.ai&#x2F;talk" rel="nofollow noreferrer">https:&#x2F;&#x2F;pi.ai&#x2F;talk</a>
tarasglek超过 1 年前
openai chatgpt seems to be stuck in a &quot;Look, cool demo&quot; mode.<p>1. According to demo, they seem to pair voice input with TTS output. What if I wanna use voice to describe a program I want it to write?<p>2. Furthermore, if you gonna do a voice assistant, why not go the full way with wake-words and VAD?<p>3. Not releasing it to everyone is potentially a way to create a hype cycle prior to users discovering that the multimodality is rather meh.<p>4. The bike demo could actually use visual feedback to see what it&#x27;s talking about ala segment anything. It&#x27;s pretty confusing to get a paragraph explanation of what tool to pick.<p>In my <a href="https:&#x2F;&#x2F;chatcraft.org" rel="nofollow noreferrer">https:&#x2F;&#x2F;chatcraft.org</a>, we added voice incrementally. So i can swap typing and voice. We can also combine it with function-calling, etc. We also use openai apis. Except in our case there is no weird waitlist. You pop in your api key and get access to voice input immediately.
评论 #37645817 未加载
评论 #37645897 未加载
评论 #37649552 未加载
wojciechpolak超过 1 年前
It would be cool if one day you could choose voices of famous characters, like Darth Vader, Bender from Futurama, or Johnny Silverhand (Keanu), instead of the usual boring ones. Copyrights might be a hurdle for this, but perhaps with local instances of assistants, it could become possible.
评论 #37648670 未加载
fintechie超过 1 年前
Demos are underwhelming, but the potential is huge<p>Patiently awaiting rollout so I can chat about implementing UIs I like, and have GPT4 deliver a boilerplate with an implemented layout... Figma&#x2F;XD plugins will probably arrive very soon too.<p>UX&#x2F;UI Design is probably solved reached this point
jameslk超过 1 年前
Kids are using tools like these to learn. Who gets to control the information in these models that are taught? Especially around political topics?<p>Not an issue now, but maybe in the future if these tools end up becoming full blown replacements of educators and educational resources.
评论 #37648491 未加载
ilaksh超过 1 年前
I wonder how multimodal input and output will work with the chat API endpoints. I assume the messages array will contain URLs to an image, or maybe base64 encoded image data or something.<p>Maybe it will not be called the Chat API but rather the Multimodal API.
评论 #37642970 未加载
评论 #37642877 未加载
chrisjj超过 1 年前
Old hat. This was done in 2009.<p>;)<p><a href="https:&#x2F;&#x2F;en.m.wikipedia.org&#x2F;wiki&#x2F;Project_Milo" rel="nofollow noreferrer">https:&#x2F;&#x2F;en.m.wikipedia.org&#x2F;wiki&#x2F;Project_Milo</a><p>Milo had an AI structure that responded to human interactions, such as spoken word, gestures, or predefined actions in dynamic situations. The game relied on a procedural generation system which was constantly updating a built-in &quot;dictionary&quot; that was capable of matching key words in conversations with inherent voice-acting clips to simulate lifelike conversations. Molyneux claimed that the technology for the game was developed while working on Fable and Black &amp; White.
评论 #37642674 未加载
评论 #37642760 未加载
sebzim4500超过 1 年前
There are a few more details in the system card here: <a href="https:&#x2F;&#x2F;cdn.openai.com&#x2F;papers&#x2F;GPTV_System_Card.pdf" rel="nofollow noreferrer">https:&#x2F;&#x2F;cdn.openai.com&#x2F;papers&#x2F;GPTV_System_Card.pdf</a>
insanitybit超过 1 年前
I really want to have discussions about technical topics. I&#x27;ve talked to ChatGPT quite a lot about custom encoding algorithms, for example. The thing is, I want to do this while I play video games so ideally I&#x27;d <i>say</i> things to it.<p>My concern is that when I say &quot;FastPFOR&quot; it&#x27;ll get transcribed as &quot;fast before&quot; or something like that. Transcription really falls apart in highly technical conversations in my experience. If ChatGPT can use context to understand that I&#x27;m saying &quot;FastPFOR&quot; that&#x27;ll be a game changer for me.
评论 #37648299 未加载
RobinL超过 1 年前
I&#x27;d like to see them put speech recognition through their LLM as a post-processing step. I find it&#x27;s fairly common for whisper to make small but obvious mistakes (for example a word which is complete nonsense in the context of the sentence) which could be easily corrected for a similar sounding word that fits into the wider context of the sentence.<p>Is anyone doing this? Is there a reason it doesn&#x27;t work as well as I&#x27;m imagining?
评论 #37644054 未加载
jwineinger超过 1 年前
Tangentially related, but I was trying to use their iOS app yesterday and the &quot;Scan Text&quot; iOS feature was just broken on both my iPhone and iPad. I was hoping to use that to scan a doc to text but it just wouldn&#x27;t work. I could switch to another app and it worked there. I&#x27;ve never done iOS programming so I&#x27;m unsure how much control the app dev has over that feature, but OpenAI found a way to break it.
rapind超过 1 年前
So... ChatGPT just replaced Dads.
neontomo超过 1 年前
Interesting side-note, the iOS app only allows you to save your chat history if you allow them to use it for training. Pretty dark pattern.
评论 #37649475 未加载
obiefernandez超过 1 年前
We need the API to keep up with consumer front end.
评论 #37642718 未加载
fritzo超过 1 年前
Multi-modal models will be exciting only when each modality supports both analysis and synthesis. What makes LLMs exciting is feedback and recursion and conditional sampling: natural language is a cartesian closed category.<p>Text + Vision models will only become exciting once we can conditionally sample images given text and text given images (and all other combinations).
marcoslozada超过 1 年前
Recommend this post: <a href="https:&#x2F;&#x2F;www.linkedin.com&#x2F;posts&#x2F;openai_use-voice-to-engage-in-a-back-and-forth-conversation-activity-7112053671785353216-qW68?utm_source=share&amp;utm_medium=member_desktop" rel="nofollow noreferrer">https:&#x2F;&#x2F;www.linkedin.com&#x2F;posts&#x2F;openai_use-voice-to-engage-in...</a>
SomethingNew2超过 1 年前
There are a lot of comments attempting to rationalize the value add or differentiation of humans synthesizing information and communicating it to others vs an llm based ai doing something similar. The fact that it’s so difficult to find a compelling difference is insightful in itself.
评论 #37648763 未加载
nbened超过 1 年前
It feels like something like this can be hacked together to be more reliable with some image to text generation plugged into the existing ChatGPT, and enough iterations to make it robust for these how-to applications. Less Turing-y but a different route to the same solution.
TheHappyOddish超过 1 年前
Glad everyone&#x27;s excited about this (the voice capability), but did everyone miss tortise-tts and bark? These have been around 6+ months and are incredibly simply to hook up to OpenAI&#x27;s APIs or a local LLM. What am I missing here?
moneywoes超过 1 年前
doesn’t this kill a litany of chatgpt wrapper companies?
rvz超过 1 年前
The paper around GPT-4V(ision) which this uses: [0]<p>Again. Model architecture and information is closed, as expected.<p>[0] <a href="https:&#x2F;&#x2F;cdn.openai.com&#x2F;papers&#x2F;GPTV_System_Card.pdf" rel="nofollow noreferrer">https:&#x2F;&#x2F;cdn.openai.com&#x2F;papers&#x2F;GPTV_System_Card.pdf</a>
评论 #37642646 未加载
generalizations超过 1 年前
I guess it&#x27;s a phased rollout, since my Plus subscription doesn&#x27;t have access to it yet.
评论 #37643260 未加载
toddmorey超过 1 年前
It&#x27;s telling to me that there&#x27;s not even a sentence in this announcement post on user privacy. It seems like as both consumers and providers of these services, we&#x27;re once again: build it first, sort out thorny privacy issues later.
boredemployee超过 1 年前
Cool now I&#x27;ll get &quot;There was an error generating a response&quot; in plain audio!
ACV001超过 1 年前
This is huge! I wanted to get this... Hopefully there is a way to shut it up once it starts spitting general stuff around the topic of interest...<p>BUT: &quot;We’re rolling out voice and images in ChatGPT to Plus and Enterprise&quot;
eshack94超过 1 年前
Are these features available on the web version by chance? This is really neat.
ushakov超过 1 年前
The picture feature would be amazing for tutorials. I can already imagine sending a photo of a synthesiser and asking ChatGPT to &quot;turn the knobs&quot; to make AI-generated presets
评论 #37643100 未加载
apienx超过 1 年前
“Ember” reading the “Speech” is uncanny territory. I’m impressed.
SillyUsername超过 1 年前
I hope they add more country accents like British or Australian, the American one can be (imho) a little grating after a while for non US English speakers
bkfh超过 1 年前
Does anyone know how they linked image recognition with an LLM to give such specific instructions as shown in the bike video on the website?
评论 #37643160 未加载
ncfausti超过 1 年前
This is very similar to what I&#x27;ve been building at heylangley.com, for use in language learning&#x2F;speaking practice.
chs20超过 1 年前
Will be interesting to see if they have taken any precaution in terms of adversarial robustness in particular to vision input.
jameswan超过 1 年前
Everyone bats on about the latency problem.<p>This is technically solvable with more compute thrown at the problem. Think bigger!
surfingdino超过 1 年前
I can imagine people using these new capabilities to diagnose skin conditions. Should dermatologists be worried?
评论 #37645299 未加载
评论 #37649003 未加载
toss1超过 1 年前
That&#x27;s interesting.<p>ChatGPT seems to be down at the moment 10:55h 25-Sept-2023<p>Displays only a blank screen with the falsehood disclaimer
spandextwins超过 1 年前
They obviously aren&#x27;t using responsible AI to figure out how and when to roll out new features there.
WalterBright超过 1 年前
I keep hoping to be able to give it a jpg of handwritten text and it&#x27;ll give me back ASCII text.
评论 #37652580 未加载
throw1234651234超过 1 年前
Yet it still can&#x27;t tell me how to import the Redirect type from Next.js and lies about it.
评论 #37643729 未加载
hackerlight超过 1 年前
Did they make the sound robotic on purpose? Sounds more &quot;autotuned&quot; than elevenlabs.
Bitnotri超过 1 年前
Anybody had a chance to use it yet? How does it compare to voice talk with Pi? (Inflection)
jojobas超过 1 年前
For better or worse, it still can&#x27;t tell truth from fiction or, better yet, bullshit.
评论 #37642838 未加载
athyuttamre超过 1 年前
@dang, could we update the title to &quot;ChatGPT can now see, hear, and speak&quot;?
评论 #37646634 未加载
yankput超过 1 年前
call Sarah Connor
m3kw9超过 1 年前
I need it to help me dismount and remount my engine, that’d be the ultimate test
cced超过 1 年前
Do we know why internet search was disabled? Any idea on when it’ll be back?
coldtea超过 1 年前
&quot;I&#x27;m sorry Dave, I&#x27;m afraid I can&#x27;t do that&quot;
评论 #37648689 未加载
gclawes超过 1 年前
I just want one of these things to have Majel Barrett&#x27;s voice...
callwhendone超过 1 年前
I already use ChatGPT with voice. I use my mic to talk to it and then I use text-to-speech to read it back. I have conversations with ChatGPT. Adding this functionality in with first-class support is exciting.<p>I am also terrified of my job prospects in the near future.
comment_ran超过 1 年前
&quot;..., find the <i>4mm</i> Allen (HEX) key&quot;. Nice job.
jackallis超过 1 年前
i am terrified now. at the rate this is going, i am sure it will plateau at somepoint, only thing that will stop&#x2F;slow down progress is computation power.
评论 #37648697 未加载
评论 #37648556 未加载
version_five超过 1 年前
Are there any good freely available multi-modal models?
评论 #37642932 未加载
synergy20超过 1 年前
can&#x27;t wait, for voice I need an app to improve my accent when learning a new language, so far I failed to find one.
ahmedfromtunis超过 1 年前
Announced by Google. Delivered by OpenAI.
ape4超过 1 年前
Its funny that the UI looks like HAL 9000
Dowwie超过 1 年前
soon, we&#x27;ll be voice-interacting with an AI assistant about images taken from microscope slides
lacoolj超过 1 年前
the beginning of the end of spam prevention on the internet :(
wonderwonder超过 1 年前
Wait until they put ChatGPT into your Neuralink. at that point we are the singularity
boredemployee超过 1 年前
They could also improve their current features. I always need to regenerate answers.
评论 #37643084 未加载
shepy1989超过 1 年前
Nice work
warent超过 1 年前
The number of comments here of people fearing there is a ghost in the shell is shocking.<p>Are we really this emotional and irrational? Folks, let&#x27;s all take a moment to remember that AI is nowhere near conscious. It&#x27;s an illusion based in patterns that mimic humans.
评论 #37647824 未加载
评论 #37647702 未加载
评论 #37648184 未加载
评论 #37648270 未加载
评论 #37648740 未加载
NikolaNovak超过 1 年前
I&#x27;m in IT but nowhere near AI&#x2F;ML&#x2F;NN.<p>The speed of user-visible progress last 12 months is astonishing.<p>From my firm conviction 18 months ago that this type of stuff is 20+ years away; to these days wondering if Vernon Vinge&#x27;s technological singularity is not only possible but coming shortly. If feels some aspects of it have already hit the IT world - it&#x27;s always been an exhausting race to keep up with modern technologies, but now it seems whole paradigms and frameworks are being devised and upturned on such short scale. For large, slow corporate behemoths, barely can they devise a strategy around new technology and put a team together, by the time it&#x27;s passé .<p>(Yes, Yes: I understand generative AI &#x2F; LLMs aren&#x27;t conscious; I understand their technological limitations; I understand that ultimately they are just statistically guessing next word; but in daily world, they work so darn well for so many use cases!)
评论 #37642861 未加载
评论 #37645599 未加载
评论 #37645713 未加载
评论 #37644185 未加载
评论 #37646195 未加载
评论 #37648295 未加载
评论 #37642930 未加载
评论 #37646966 未加载
评论 #37646586 未加载
评论 #37648254 未加载
评论 #37643045 未加载
评论 #37646696 未加载
clbrmbr超过 1 年前
The thought of my children being put to bed by a machine is horrifying. Then again, perhaps this is better than many kids have. Shudder.
评论 #37643010 未加载
评论 #37642809 未加载
评论 #37643012 未加载
评论 #37643051 未加载
评论 #37642851 未加载
评论 #37645690 未加载
评论 #37648802 未加载
RivieraKid超过 1 年前
I went from being worried to thinking it won&#x27;t replace me anytime soon after using GPT4 for a while and now I&#x27;m back to being worried.<p>Because the pace of development is intense. I would love to be financially independent and watch this with excitement and perhaps take on risky and fun projects.<p>Now I&#x27;m thinking - how do I double or triple my income so that I reach financial independence in 3 years instead of 10 years.
评论 #37647873 未加载
评论 #37647153 未加载
评论 #37647453 未加载
评论 #37647216 未加载
评论 #37648114 未加载
评论 #37649217 未加载
评论 #37649253 未加载
评论 #37646545 未加载
评论 #37646720 未加载
评论 #37649807 未加载
andrewinardeer超过 1 年前
Now just throw this into a humanoid looking robot with fine motor skills and we are halfway to a dystopian hellscape that is now only years away instead of decades. What a time to be alive.
评论 #37643092 未加载
评论 #37643793 未加载
评论 #37643277 未加载
评论 #37649819 未加载