>> Only 2 of the 9 LLMs solved the "list all ways" prompt, but 7 out of 9 solved the "write a program" prompt. The language that a problem-solver uses matters! Sometimes a natural language such as English is a good choice, sometimes you need the language of mathematical equations, or chemical equations, or musical notation, and sometimes a programming language is best. Written language is an amazing invention that has enabled human culture to build over the centuries (and also enabled LLMs to work). But human ingenuity has devised other notations that are more specialized but very effective in limited domains.

If I understand correctly, Peter Norvig's argument is about the relative expressivity and precision of Python and natural language with respect to a particular kind of problem. He's saying that Python is a more appropriate language than natural language for expressing factorisation problems and their solutions.

Respectfully (very respectfully), I disagree. The much simpler explanation is that the training sets of most LLMs contain many more examples of factorisation problems and their solutions in Python (and other programming languages) than in natural language. Examples in Python etc. are also likely to share more common structure, even down to function and variable names [1], so there are more statistical regularities for a language model to overfit to during training.

We know LLMs do this. We even know how they do it, to an extent. We've known since the time of BERT. For example:

"Probing Neural Network Comprehension of Natural Language Arguments"
https://aclanthology.org/P19-1459/

"Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference"
https://aclanthology.org/P19-1334/

Given these and other prior results, Peter Norvig's single experiment is neither sufficient nor strong enough evidence to support his alternative hypothesis. Ideally, we would test an LLM by asking it to solve a factorisation problem in a language for which we can ensure the training data contains very few example solutions, but that is unfortunately very hard to do.

______________

[1] Notice, for instance, how Llama 3.1 immediately names the problem "find_factors", even though no such instruction appears in either prompt. That's because it has seen that kind of code in the context of that kind of question during training. The other LLMs seem to take their names from the prompts instead.
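
To make [1] concrete, the stereotyped shape I have in mind looks roughly like the sketch below. This is my own minimal Python, not the output of any of the nine models; the name find_factors merely echoes the one Llama 3.1 chose.

    # Hypothetical sketch of the kind of solution that recurs in training data;
    # not reproduced from any model's actual output.
    def find_factors(n):
        """Return all pairs (a, b) with a <= b and a * b == n."""
        pairs = []
        for a in range(1, int(n ** 0.5) + 1):
            if n % a == 0:
                pairs.append((a, n // a))
        return pairs

    print(find_factors(36))  # [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]

Once code converges on a template like this, a model only has to reproduce the template and fill in the blanks, which is exactly the kind of statistical regularity that is easy to overfit to.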