
Show HN: I made the slowest, most expensive GPT

74 points by wluk, 5 months ago
This is another one of my automate-my-life projects - I'm constantly asking the same question to different AIs since there's always the hope of getting a better answer somewhere else. Maybe ChatGPT's answer is too short, so I ask Perplexity. But I realize that's hallucinated, so I try Gemini. That answer sounds right, but I cross-reference with Claude just to make sure.

This doesn't really apply to math/coding (where o1 or Gemini can probably one-shot an excellent response), but more to online search, where information is more fluid and there's no "right" search engine + text restructuring + model combination every time. Even o1 doesn't have online search, so it's obviously a hard problem to solve.

An example is something like "best ski resorts in the US", which will get a different response from every GPT, but most of their rankings won't reflect actual skiers' consensus - say, on Reddit https://www.reddit.com/r/skiing/comments/sew297/updated_us_ski_areas_tier_list_v3_128_please - because there's so many opinions floating around, a one-shot RAG search + LLM isn't going to have enough context to find how everyone thinks. And obviously, offline GPTs like o1 and Sonnet/Haiku aren't going to have the latest updates if a resort closes, for example.

So I've spent the last few months experimenting with a new project that's basically the most expensive GPT I'll ever run. It runs search queries through ChatGPT, Claude, Grok, Perplexity, Gemini, etc., then aggregates the responses. For added financial tragedy, in between it also uses multiple embedding models and performs iterative RAG searches through different search engines. This all functions as sort of like one giant AI brain. So I pay for every search, then every embedding, then every intermediary LLM input/output, then the final LLM input/output. On average it costs about 10 to 30 cents per search. It's also extremely slow.

https://ithy.com

I know that sounds absurdly overkill, but that's kind of the point. The goal is to get the most accurate and comprehensive answer possible, because it's been vetted by a bunch of different AIs, each sourcing from different buckets of websites. Context limits today are just large enough that this type of search and cross-model iteration is possible, where we can determine the "overlap" between a diverse set of text to determine some sort of consensus. The idea is to get online answers that aren't attainable from any single AI. If you end up trying this out, I'd recommend comparing Ithy's output against the other GPTs to see the difference.

It's going to cost me a fortune to run this project (I'll probably keep it online for a month or two), but I see it as an exploration of what's possible with today's model APIs, rather than something that's immediately practical. Think of it as an online o1 (without the $200/month price tag, though I'm offering a $29/month Pro plan to help subsidize). If nothing else, it's a fun (and pricey) thought experiment.
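A minimal sketch of the fan-out-and-aggregate loop described above (hypothetical: `call_model` is a stub rather than any vendor's real API, the choice of aggregator model is an assumption, and the real pipeline also interleaves embedding models and iterative RAG searches):

```python
# Hypothetical sketch of the multi-model aggregation described in the post --
# not Ithy's actual code. call_model() is a stub standing in for each vendor's API.
from concurrent.futures import ThreadPoolExecutor

MODELS = ["chatgpt", "claude", "grok", "perplexity", "gemini"]

def call_model(model: str, prompt: str) -> str:
    return f"[{model}] draft answer to: {prompt}"  # stub so the sketch runs

def aggregate(question: str) -> str:
    # Fan the same question out to every model in parallel.
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        drafts = list(pool.map(lambda m: call_model(m, question), MODELS))
    # One model then looks for overlap/consensus across all the drafts.
    merge_prompt = (
        "Several assistants answered the same question. Identify where they agree, "
        "flag contradictions, and produce one consolidated answer.\n\n"
        + "\n\n---\n\n".join(drafts)
        + f"\n\nQuestion: {question}"
    )
    return call_model("aggregator", merge_prompt)  # which model aggregates is an assumption

if __name__ == "__main__":
    print(aggregate("best ski resorts in the US"))
```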

26 comments

wluk, 5 months ago
Update 3:00 PM ET: I've finished scaling up from 2 VPCs to 5 VPCs. Limits have been increased back up to 3 anonymous / 10 signed-in.

Update 2:30 PM ET: Back up (for now). Still waiting for Anthropic and Gemini quota increase requests, so those have been migrated to GPT-4o for now. Running on 2 VPCs, in the process of launching 2 more. Confident that I can increase the daily limits by EOD once everything's more stable.

Update 1:30 PM ET: HN blew up Ithy and it's 99% down right now, congrats ._.

1. I've exceeded my weekly Anthropic API limits; I've gotten in touch with their sales team and I've temporarily disabled the Anthropic model.

2. Blew past my Google API limits as well. I was using Gemini for prompting and aggregation, and I'm waiting for their quota increase response. In the meantime, I've switched to GPT-4o for the prompting/aggregation.

3. My VPC is at 100% CPU load. Launching more right now with some load balancing.

4. Limits were previously 5 anon / 20 per logged-in user. Reduced this to 1 anon / 3 logged in while I deal with the load issues. Planning to bring these back up as soon as everything's working again.

Hope to get this all back online within an hour or two. Sorry for the crappy launch. To think I was an SRE in a past life...
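Points 1 and 2 above amount to a provider-fallback pattern; a minimal hypothetical sketch (the exception class and the call functions are placeholders, not real SDK names):

```python
# Hypothetical sketch of the quota fallback described above (Anthropic/Gemini -> GPT-4o
# while quota increases are pending). QuotaExceeded and the call_* functions are
# placeholders, not real SDK names.
class QuotaExceeded(Exception):
    pass

def call_primary(model: str, prompt: str) -> str:
    # Stub: pretend the provider's weekly limit has been hit.
    raise QuotaExceeded(f"{model} quota exhausted")

def call_gpt4o(prompt: str) -> str:
    return f"[gpt-4o fallback] {prompt}"  # stub for the temporary fallback path

def prompt_with_fallback(model: str, prompt: str) -> str:
    try:
        return call_primary(model, prompt)
    except QuotaExceeded:
        return call_gpt4o(prompt)
```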
af3d, 5 months ago
This is pretty impressive. Given the following scenario: "A bliirg is any non-wooden item, which is the opposite of a glaarg. In addition, there are neergs (non-existent things) and eeergs (things which actually exist). Now a glaarg which is also a neerg is called a bipk, whereas a glaarg which is an eeerg is known as a vokp. Also, a bliirg which is an eeerg is referred to as a jokp, otherwise it is known as a fhup. So the question is, which of those could be used to make an actual fire: a jokp, bipk, fhup, or vokp? Explain your reasoning." The results were absolutely spot on: "(...) A vokp, being a real, existing wooden object, can serve as a fuel source. A jokp, being a real, existing non-wooden object, *might* serve as a fuel source if it is combustible. However, a bipk and a fhup, being non-existent things, cannot be used to make a fire. The ability to actually start a fire also depends on the presence of an ignition source, oxygen, and potentially tinder, which are not addressed by the definitions of these terms." Any plan to make the project open source?
SCUSKU, 5 months ago
"For added financial tragedy" really got me. Pretty interesting project; luckily, if this whole software engineering thing doesn't work out, you can fall back on your comedy career.
maalber, 5 months ago
This is great, and I love your presentation of it - hilarious! "On average it costs about 10 to 30 cents per search. It's also extremely slow."
sergiotapia, 5 months ago
`how to build a birdhouse to attract a bluebird`

https://ithy.com/article/4b116d2032e54c03862db84e71bcfc8f

https://big-agi.com/ has this "BEAM" concept as well, where you can put your message through as many models as you have configured, then run fuse/guided/compare/custom to merge them all together into one comprehensive, more expensive response.

https://files.catbox.moe/tr82vs.png

https://files.catbox.moe/beuyfx.png

However, Ithy does produce something much, much better! This is really cool! I wonder if you could cache questions and answers, and start creating your own "reddit" knowledgebase to RAG from and avoid having to dive deep again $$$.
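The caching idea floated at the end of this comment could work as a semantic cache keyed on query embeddings; a minimal hypothetical sketch (the toy `embed` and the `run_full_pipeline` stub stand in for a real embedding model and the full multi-model pipeline):

```python
# Hypothetical sketch of the semantic-cache suggestion above -- not Ithy's actual code.
import hashlib
import math

def embed(text: str) -> list[float]:
    # Toy deterministic "embedding" so the sketch runs; swap in a real embedding model.
    digest = hashlib.sha256(text.lower().encode()).digest()
    return [b / 255.0 for b in digest[:16]]

def run_full_pipeline(query: str) -> str:
    return f"[aggregated answer for] {query}"  # stub for the expensive multi-model search

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

CACHE: list[tuple[list[float], str]] = []   # (query embedding, stored answer)
SIMILARITY_THRESHOLD = 0.95                  # assumed cutoff for "same question"

def answer(query: str) -> str:
    q_vec = embed(query)
    for vec, cached in CACHE:
        if cosine(vec, q_vec) >= SIMILARITY_THRESHOLD:
            return cached                    # reuse a stored answer instead of paying again
    result = run_full_pipeline(query)
    CACHE.append((q_vec, result))
    return result
```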
zamadatix, 5 months ago
Interesting idea, cool concept. I tried asking "What is the best SNES game most people haven't played". The top answer (Terranigma) was unfortunately the same as I got just asking any of Claude/ChatGPT/Llama/Qwen (maybe too easy a question), but the rest of the list did seem a bit more balanced. Thanks for the free try without a login!

Thought: there is a marquee of example queries, but it doesn't seem like there is a way to see what an answer looks like without individually consuming a search as a user. Maybe if these were clickable to a cached version, it'd be easier to get an idea of the outputs without costing so much?
duxup, 5 months ago
Very cool.

I've found that very generic queries like "best ski resorts in the US" seem woefully polluted by top-10 spam sites. LLMs do not want to give any useful info about that no matter how much prompting I seem to give.

I was looking for an app that does X, Y, Z recently and no amount of prompting for open source would get me anything but a handful of stock answers I would get from a random spam site.
geor9e, 5 months ago
I feel bad using this. I just cost you 10-30 cents to search "farts" and then read the output essay "Understanding Flatulence: Biological Processes and Influences". Thank you for this gift, kind stranger.
vinni2, 5 months ago
It was weird. I made a simple claim, "Keto diet cures cancer", and after a couple of minutes it analyzed the history of cheeseburgers, saying "Sorry, I can't respond to that. I will now analyze the history of cheeseburgers instead."
wluk, 5 months ago
If anyone tries this out and hits the limit, just let me know and I'll increase it for you for free :)
wluk, 5 months ago
Update 2:30 PM ET: Back up (for now). Still waiting for Anthropic and Gemini quota increase requests, so those have been migrated to GPT-4o for now. Running on 2 VPCs, in the process of launching 2 more.

Confident that I can increase the daily limits by EOD once everything's more stable.
bee_rider, 5 months ago
I wonder, is it a given that asking multiple models will give a better output? Can you ask one model slightly different prompts, take the outputs, and ask it to summarize them? Or even argue amongst itself?
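The single-model variant suggested here is essentially prompt-varied self-ensembling; a minimal hypothetical sketch (`call_model` is a stub for whichever chat API is actually used, and the framings are illustrative):

```python
# Hypothetical sketch of the single-model ensemble suggested above: one model, a few
# rephrasings of the same question, then the same model reconciles its own drafts.
def call_model(prompt: str) -> str:
    return f"[model draft] {prompt[:60]}..."  # stub so the sketch runs

def self_ensemble(question: str) -> str:
    framings = [
        question,
        f"Answer as a skeptical domain expert: {question}",
        f"List the most common wrong answers to this, then answer it: {question}",
    ]
    drafts = [call_model(p) for p in framings]
    merge_prompt = (
        "You wrote the following drafts answering the same question. "
        "Note where they disagree, then give one reconciled answer.\n\n"
        + "\n\n---\n\n".join(drafts)
        + f"\n\nQuestion: {question}"
    )
    return call_model(merge_prompt)
```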
bioxept, 5 months ago
I signed up, but I just get an alert that I will get 10 free requests when I sign up. It quits the search when I press OK. Running it on Safari on iOS.
wluk, 5 months ago
Update 3:00 PM ET: I've finished scaling up from 2 VPCs to 5 VPCs. Limits have been increased back up to 3 anonymous / 10 signed-in.
downloadram, 5 months ago
nice work

IF you want free tokens and IF your visitors agree for all their input and output to be used by all the AIs involved for research and training or whatever, you might be able to strike up the same kind of dealio that LMSYS has with lmarena -> github.com/lm-sys/FastChat -> lmarena.ai

To use the site, you'd have a big, obvious popup explaining the data use quickly and efficiently.
ricktdotorg, 5 months ago
i like this, a lot.

i asked a subjective history question about England and Ithy's analysis was great, and did indeed add to other GPTs!

i did find the UI a bit confusing at first, that's my only nitpick. i signed in (nice easy flow) and will definitely continue to use!

looks like anthropic is slowing down ithy analysis/responses, at least today during my tests just now anyway.

great app! well done o7
roberdam, 5 months ago
Fantastic idea and implementation. Thank you for putting your effort and money into bringing such an interesting idea to life!
Atotalnoob, 5 months ago
I tried to search, but it immediately prompted me to log in with Google. I didn't see an option for non-Google…

I degoogled a while ago…
LorenDB, 5 months ago
You could at least cheapen some of your queries by moving to something like Groq.
Mr_Bees69, 5 months ago
This is really cool, quite a shame it costs 20 billion dollars per query lol.
jk1111, 5 months ago
Why not add another textbox with the best output or best solution, so the system can determine its best response on its own?
msdundarss, 5 months ago
Interesting idea!
abdibrokhim, 5 months ago
lmao https://ithy.com/article/b4465910ef4b447ea6dc9060815735e9
jk1111, 5 months ago
This is a great idea.
ProfessorZoom, 5 months ago
so now you summarize everyone’s hallucinations
OsbEss, 5 months ago
This is hilarious, I love the "it sounds absurdly overkill, but that's kind of the point" post. Can't wait for this to be back online, I need to ask it about the comprehensive history of cheeseburgers.