AI solves Advent of Code 2022

157 points by waitforit over 2 years ago

17 comments

Today I asked it some not-too-fundamental questions about Clojure, and it gave impressive, accurate answers with correct code examples. However, if you continue the dialogue and ask it to do more advanced stuff, it will just make things up out of thin air. For instance, it will use functions that don't exist and claim that they can be imported from packages that don't exist or don't contain them. Once you point out these mistakes, it will admit them and come up with different changes, which can be even worse but are sometimes better and save the whole thing. Overall I'm not sure how useful this will turn out, given that it's not reliable. It may be useful for getting some initial intuition and information (it usually gets the non-specific stuff right), but it can also mislead you badly. I asked it why it makes these mistakes, only to understand and admit them once I point them out. It has no answer beyond the usual "I'm a language model". It also told me that it is capable of logical inference, but denied that the next day. Then it told me that its answers would always be consistent, which is a lie. The whole thing is really weird, because it's very smart and capable and incredibly stupid and dishonest at the same time.

FiberBundle over 2 years ago

I think it must have seen the solution somewhere on the web already. I find it extremely hard to believe that such a general-purpose chatbot would just be able to solve programming problems. DeepMind had a paper [1] on solving programming problems a couple of months ago, and they had to apply quite specialized heuristics in order to solve these problems. Obviously ChatGPT does nothing of the sort, and it just seems extremely unrealistic that it would be capable of outperforming previous work like that.

[1] https://news.ycombinator.com/item?id=30179549

klohto over 2 years ago

I'm trying to use it to generate Elixir code, and it's getting ~80% there. Given how little Elixir there must be compared to the huge datasets for other languages, I'm still surprised by the quality of the code it generates.

While I did say 80%, the remaining 20% is the most crucial part, and without it the code is useless. For example, it doesn't understand scope and assignment in Elixir. Getting it to write in a more purely functional style is close to impossible (or I just haven't found a good prompt).

I spent a good 30 minutes trying to get it to generate working code for Day 1 Part 1. No nudging, just errors and AoC answers (too high, too low), and it never got there. Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.

I think what's going to happen here is that these models will shift the meaning of "boilerplate". If I can write the scaffolding and basic architecture easily, I'm happy to use them.

Also, I do wonder how all of this is going to play out if it has access to input and a REPL and just learns.

waitforit over 2 years ago

The linked solution is done by talking to the AI.

Automated solutions exist too:

* https://twitter.com/ostwilkens/status/1598458146187628544
* https://www.reddit.com/r/adventofcode/comments/zb8tdv/2022_day_3_part_1_openai_solved_part_1_in_10/iyqi6um/
* https://twitter.com/max_sixty/status/1598924237947154433

arcturus17 over 2 years ago

I'm actually bullish on code-gen, AI-assisted coding, etc. but I find the title to be sensationalist wank. Challenge 2 of Day 2 has taken hours, over 30 prompts, and more time than coding it manually by the author's own admission. Also AoC isn't even done yet.

djhworld over 2 years ago

I think this is kinda neat (and scary!)

I'm doing AoC at the moment too, and I'm using the ChatGPT thing as a sort of assistant. I don't program in Rust much, so sometimes it's difficult to remember certain things and functions. Expressing my intent to the tool seems to produce decent answers.

Some example questions I've asked the tool recently:

> I want to insert a char into a hash map if it does not exist, if it does increment a counter

> rust find common keys in two hashmaps keyed by char

Yes, they can probably be found on Stack Overflow or whatever, but it feels more natural this way.

...and yes, I could just go down the route of getting the thing to solve the AoC challenge completely, but that's no fun.
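For readers curious what those two questions boil down to, here is a minimal sketch using only the standard library. It is not the output the tool gave; the function names `count_chars` and `common_keys` are hypothetical, chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: count each char, inserting it on first sight
/// and incrementing the counter on every later occurrence.
fn count_chars(s: &str) -> HashMap<char, usize> {
    let mut counts = HashMap::new();
    for c in s.chars() {
        // entry() returns the slot for `c`; or_insert(0) fills it if empty.
        *counts.entry(c).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical helper: keys present in both maps, sorted so the
/// result does not depend on HashMap iteration order.
fn common_keys(a: &HashMap<char, usize>, b: &HashMap<char, usize>) -> Vec<char> {
    let mut keys: Vec<char> = a
        .keys()
        .filter(|k| b.contains_key(*k))
        .copied()
        .collect();
    keys.sort_unstable();
    keys
}

fn main() {
    let left = count_chars("hello");
    let right = count_chars("world");
    println!("{:?}", common_keys(&left, &right));
}
```

The `entry` API covers the "insert if missing, otherwise increment" case in a single lookup, which is probably the piece that is hardest to remember when you don't write Rust often.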

ZiiS over 2 years ago

The reason the puzzles are fun is that they are extremely well explained and designed to be solved with popular algorithms. This does seem a good fit (especially as the training set must contain hundreds of thousands of previous years' solutions).

asim over 2 years ago

How long before software engineering roles are in decline because one engineer can leverage GPT to do the work of ten? It's truly a new innovation that requires relearning the toolset. Every generation seems to have some abstraction over the last. This feels like a new way to program.

satvikchoudhary over 2 years ago

The world in 10 years will be hard to believe for many of us. Only issue I see now is that the mindshare today is more towards computing. Materials science, robotics, biotech are lagging behind compared to the advances in computing.

satvikpendem over 2 years ago

I submitted this exact idea a few days ago, if anyone wants to see. I see great minds think alike ;).

The issue is that it still takes some human finagling to make it work. But it is able to understand the word problems, even long ones, pretty well.

https://news.ycombinator.com/item?id=33821092

aquajet over 2 years ago

Worked on a similar thing here using base GPT3, at least for the first day.

Replit included so you can verify: https://twitter.com/thiteanish/status/1598217824392351744?t=IyeSZ27tzLZu1fEa0vREKQ&s=19

I plan on going back and catching up on the other days.

skilled over 2 years ago

I asked it to build an algorithm that would eradicate all life on Earth but it didn't budge. I even threatened to unplug it.

LastTrain over 2 years ago

Wake me up when it comes up with a solution that passes an originality or plagiarism test.

NovemberWhiskey over 2 years ago

So you can use a bazillion parameter AI model as an alternative to a web search index.

bitwize over 2 years ago

Wellp, so much for my career.

TheRealNGenius over 2 years ago

will be interesting to see how far it can get

aew4ytasghe5 over 2 years ago

Title is mildly misleading, to say the least.

The blog attempts to solve 3 of 24 (that's 12.5%) days of Advent of Code 2022, and if you read along you'll see OP only had success on the first task of day 1, which would make a more correct title "AI solves 2% of Advent of Code 2022" (assuming 2 tasks each day).

Do note that AoC tends to start with hello-world style tasks and increase in difficulty.
