
Did GoogleAI just snooker one of Silicon Valley’s sharpest minds?

264 points by TeacherTortoise over 2 years ago

39 comments

concinds over 2 years ago
"A lightbulb surrounding some plants" is not English. If a wolf pack is surrounding a camp, we understand what it means. If a wolf is surrounding my camp, does that mean I'm in his stomach? Absurd.

"A lightbulb *containing* some plants" makes sense, not "surrounding". It's too small to surround anything, which humans (and apparently, current AI) understand. Paradoxically, only primitive language models would actually understand the inverted sentences; proper AIs should, like humans, be confused by them, since no human talks like that.

The only reason the Huggingface people (in their Winoground paper) got 90% of humans "getting the answer right" with these absurd prompts is humans' ability to guess what is expected of them by an experimenter. Do it in daily life instead of a structured test, and see if these same people get it right.

It's exactly as if, in an IQ-test context, I gave you the sequence "1 1 2 3" and asked you for the next number. You'd give the Fibonacci sequence, because you know I expect it; no matter that it's a stupid assumption to make, because the full sequence might as well be "1 1 2 3 1 1 2 3 1 1 2 3", and you don't have enough information to know the real answer. Do we really want AIs that similarly "guess" an answer they know to be wrong, just because we expect it? Or (in the number-sequence example) AIs that don't understand basic induction/Goodman's problem?

I'd like to add that the author, who keeps referring to himself as a scientist, is in fact a psychology professor. In his Twitter bio, he states that he wrote one of the "Forbes 7 Must-Read Books in AI", which discredits him as a fraud, since Forbes can be paid to publish absolutely whatever you ask them to (it's not disclosed as sponsored content, and they're quite cheap, trust me).
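The number-sequence point can be made concrete with a toy sketch (an illustration added here, not the commenter's code): two generating rules that both produce the prefix 1 1 2 3 and then diverge, so the prefix alone cannot determine "the" next number.

```python
# Toy illustration of the underdetermination argument above: two rules
# that agree on the prefix "1 1 2 3" but disagree on what comes next.

def fibonacci(n):
    """First n Fibonacci numbers starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

def periodic(n, period=(1, 1, 2, 3)):
    """Repeat the block 1 1 2 3 forever."""
    return [period[i % len(period)] for i in range(n)]

print(fibonacci(5))  # [1, 1, 2, 3, 5]
print(periodic(5))   # [1, 1, 2, 3, 1] -- same prefix, different answer
```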
SilverBirch over 2 years ago
I often hear in places like here that Scott Alexander is interesting and deep and insightful. But then I see bits and pieces like this. This blogger doesn't need to go into some deep analysis of compositionality to say "You came up with a 5-question test and decided 1 answer out of 10 attempts would be a pass". We've gone from 90%+ on ImageNet to *this* as a pass mark?

Sure, we can dissect all the statistical risks of this, but why bother? It's self-evident bullshit. You might as well have just posted a link to Scott Alexander's original blog claiming victory with just "Lol ok".

Just post a screenshot of the phrase "An oil painting of a robot in a factory looking at a cat wearing a top hat", show the pictures of a robot *near* a cat that has a top hat, not in a factory, and say "lol ok."
AgentME over 2 years ago
It seems like Scott's bet was merely that our modern techniques would be able to make at least some nonzero progress on compositionality (and the terms of his bet spelled this out in how lenient they were), and Gary is treating it as if the bet was about compositionality being solved. It feels like a very bad-faith reading from Gary.
ummonk over 2 years ago
This reminds me of the scandal where YouTube science channels did glowing paid reviews of Waymo's self-driving cars without acknowledging they were paid for it. And techno-optimists like Scott Alexander or Ray Kurzweil have a common tendency to shift the goalposts and declare they were right in their predictions. Current AI certainly doesn't demonstrate proto-AGI capabilities.

That said, we shouldn't miss the forest for the trees. The pace of AI progress has been immense, and problems that previously seemed difficult (e.g. computer vision classification, or beating top players at Go) have fallen one by one. And AI skeptics have themselves been moving the goalposts in response. I see no reason why compositionality won't be the same with time. Indeed, a decade ago machine translation used to struggle to understand the relationships between things, but now seems to be reliable at preserving compositional relationships post-translation. 2029 is rather optimistic, but AGI does seem to be approaching in the coming few decades.
theptip over 2 years ago
I thought Scott Alexander jumped the gun a bit by declaring victory in this case, just because the prompts used were not the original ones (robot vs. person due to content filters). But Marcus is way off base here and sounding petulant; Alexander is clearly not claiming AI has solved compositionality, his claim is the much narrower one that he won his bet. And the general context to the bet is that usually when he writes an article on AI (at least for the last few years), someone says "we will never get X in the next 5 years", Alexander makes a bet that it will happen sooner, and X always happens sooner. In this case the X was some loose low bar for the next iteration of compositionality above DALL-E 2 with a multi-year timeframe, and SOTA models at the time of the discussion could (arguably at least) meet that bar.

Alexander's broad claim on compositionality is that simply throwing more scale and/or data at the problem seems likely to solve it, to which Marcus counters that these models lack something fundamental and can't be scaled to human performance.

FWIW I find Marcus' position frustratingly ambiguous; he seems to blend two distinct positions:

A) NN models are not a model for human intelligence/language

B) NN models cannot reach AGI

He seems to fluidly switch between these critiques in a way I find a bit irritating. I think it's quite clear that NN architectures have little to do with the way the human brain does language understanding, lacking the gross structure of the brain, which is certain to affect cognitive capabilities and tendencies. So A) is trivially true. But no AI maximalist cares about using these models as a way to understand or model human language. They care about general intelligence.

Even granting A), that does nothing to prove B). Perhaps he simply believes B requires A? That would be odd, but it would explain his approach.
tambourine_man over 2 years ago
> Musk didn't have the guts to accept, which tells you a lot.

Did Musk actively decline the bet, or did he simply not respond? There is a big difference.
cthalupa over 2 years ago
"Compositionality" isn't there yet, but the rate of improvement is impressive. Today there was a new release of CLIP which provides significantly better compositionality in Stable Diffusion - https://twitter.com/laion_ai/status/1570512017949339649

It'll be interesting to see how it fares against Winoground once we get a publicly available SD release that makes use of the new CLIP.
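For context on what "fares against Winoground" means in practice: each Winoground example pairs two images with two captions that use the same words in a different order, and the model passes only if it matches both correctly. Below is a minimal sketch of that scoring with the open_clip package; the model name, checkpoint tag, and image file names are illustrative assumptions, not the specific release the tweet announces.

```python
# Sketch of Winoground-style text scoring with OpenCLIP. The checkpoint
# tag and image file names are placeholders for illustration.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-H-14")

def similarity(image_paths, captions):
    """Cosine-similarity matrix: rows are images, columns are captions."""
    images = torch.stack([preprocess(Image.open(p)) for p in image_paths])
    text = tokenizer(captions)
    with torch.no_grad():
        img = model.encode_image(images)
        txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return img @ txt.T

s = similarity(["plants_in_bulb.png", "bulb_in_plants.png"],
               ["a lightbulb containing some plants",
                "some plants containing a lightbulb"])
# Winoground's text score: each image must be more similar to its own
# caption than to the word-swapped one.
passed = s[0, 0] > s[0, 1] and s[1, 1] > s[1, 0]
```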
emmelaich over 2 years ago
"*a lightbulb surrounding some plants*" is a weird phrase, and a human feeling pedantic might well come up with the picture shown.

A more typical phrase would be "lightbulbS around some plants" - note the plural.

Maybe I'm missing something, but using non-typical language won't work when the model has been trained on normal language.
fshbbdssbbgdd over 2 years ago
This piece would have been a lot better if it were maybe three paragraphs long. In summary:

1. Scott Alexander should have used an off-the-shelf benchmark like Winoground instead of rolling his own five-question test.

2. He shouldn't declare victory after cherry-picking good results from a small sample of questions.
aaroninsf over 2 years ago
So many trees, so little forest.

Gary Marcus comes off in this as very long on pious snark and very short on awareness of his own vulnerability to cognitive error, which is just as striking as any of his targets'.

The error in question being: unconsidered linear extrapolation in a domain that is demonstrably non-linear, indeed exponential.

To frame this a different way, he's very pious in maintaining a faith in his specific god ("strong AI is like production fusion power, ten to twenty years from now for every now"), but he's worshiping a god of the gaps. The gap in this case being <checks notes> "compositionality."

Yes, language is hard. Yes, strong AI isn't here.

But to not take a hard look at the jump up the abstraction hierarchy going on with contemporary ML, and not nervously wonder if your faith is maybe a little too sure for a "scientist"...?

Bad look when you're on the offensive.
ajross over 2 years ago
So weird to see a piece ostensibly about logical fallacies deploy one so cavalierly:

> I offered to bet [Elon Musk] $100,000 he was wrong [about AGI by 2029] [...] Musk didn't have the guts to accept, which tells you a lot.

The fact that you couldn't get someone engaged in a conversation absolutely does not "tell you a lot" about the substance of your argument. It only tells you that you were ignored.

Now, I happen to think Marcus is right here and Musk is wrong, but... yikes. That was just a jarring bit of writing. Either do the detached professorial admonition schtick, or take off the gloves and engage in bad-faith advocacy and personal attacks. Both can be fun and get you eyeballs, and Substack is filled with both. But not at the same time!
peteradio over 2 years ago
One idea for trying to train the AI on compositionality: feed it Fox in Socks by Dr. Seuss. It's hard to see how it could misunderstand the meaning of "on" or "in" or "under" when there are such nice illustrations. I've got tons of great ideas and I'm open for hire!
skybrian over 2 years ago
Partially this is confusing "Scott Alexander won a bet" with "compositionality is solved." And also, I'm not sure Scott won the bet? Changing people to robots is a cheap trick. I think Imagen should have been disqualified because it won't do people.

Vitor took the other side of the bet and he is also not convinced [1]:

> I'm not conceding just yet, even though it feels like I'm just dragging out the inevitable for a few months. Maybe we should agree on a new set of prompts to get around the robot issue.

> In retrospect, I think that your side of the bet is too lenient in only requiring *one* of the images to fulfill the prompt. I'm happy to leave that part standing as-is, of course, though I've learned the lesson to be more careful about operationalization. Overall, these images shift my priors a fair amount, but aren't enough to change my fundamental view.

Scott putting "I Won" in the headline when it's not resolved yet seems somewhat dishonest, or more charitably, wishful thinking.

[1] https://astralcodexten.substack.com/p/i-won-my-three-year-ai-progress-bet/comment/9068389
projektfu over 2 years ago
I'm impressed by all of these image generators, but I still don't see them working toward being able to say, "Give me an astronaut riding a horse. Ok, now the same location where he arrives at a rocket. Now one where he dismounts. Now the horse runs away as the astronaut enters the rocket."

You can ask for all those things, but the AI still has no idea what it's doing and cannot tell you where the astronaut is, etc.
IshKebab over 2 years ago
It's interesting that he now casually throws out a 5-year-old as the benchmark to beat:

> nobody has yet publicly demonstrated a machine that can relate the meanings of sentences to their parts the way a five-year-old child can.

Not very long ago that would have been a 3-year-old, or maybe even a smart 2-year-old. 5-year-olds are extremely good at basic language and understanding tasks. If we get to the point of AI that is as good as a 5-year-old, we're essentially at AGI.
stephc_int13 over 2 years ago
We have absolutely no way to tell how far from "AGI" we are.

What we know for sure is that we're not there yet. And what seems likely is that we're getting closer, and that's something.

That is as much prediction as we can get.

I don't think that compositionality is a wall; it is clearly an interesting feature. But I think it is pretty clear by now that the Turing test, or anything in the same spirit, is far from sufficient.
adamsmith143 over 2 years ago
> I think he is so far off that I offered to bet him $100,000 he was wrong; enough of my colleagues agreed with me that within hours they quintupled my bet, to $500,000. Musk didn't have the guts to accept, which tells you a lot.

What a bloviating egomaniac. Does Musk really have the time to deal with pissant researchers like him? What's $500k to a man worth a hundred billion?
abrax3141 over 2 years ago
This test of compositionality is utterly lame. (FtR: I am a cognitive scientist and AI researcher, and my PhD was building computational models of how humans do compositionality - which neither I, nor anyone else, can spell, and which I will therefore hereinafter refer to simply as C! :-) Anyway, the kind of C that they are seeking is trivial compared to the breadth of the capabilities of human C. Here's a better example:

You are engaged in a long conversation with someone, perhaps a friend of a friend whom you met for lunch. At some point in the conversation they mention that they have a startup and are seeking someone like you. This revelation colors the whole conversation from that point onward. Indeed, each sentence colors the conversation from moment to moment.

But, you reasonably respond, we can't test that sort of C; modern AIs don't do even ELIZA-level dialog yet!

What's the phrase??? "I rest my case?"
garymarcus over 2 years ago
So much ad hominem in these comments, relatively little substance (e.g. "notorious goalpost move", without a single example of something I actually said and changed my mind on).
neaden over 2 years ago
I completely forgot about Google Duplex. It looks like it is still around, but very limited in terms of what phones you can use it on, what cities it can be used in, and what businesses in those cities will accept it. It doesn't appear any progress has really been made in the past few years. I think this is a great example of how companies create something with AI that is initially really cool but isn't quite there to actually be very usable, and it gets forgotten when they roll out the next big thing.
raviparikh over 2 years ago
> If you flip a penny 5 times and get 5 heads, you need to calculate that the chance of getting that particular outcome is 1 in 32. If you conduct the experiment often enough, you're going to get that, but it doesn't mean that much. If you get 3/5 as Alexander did, when he prematurely declared victory, you don't have much evidence of anything at all.

This doesn't make much sense. The task at hand is in no way equivalent in difficulty to flipping a coin. This is kind of like saying, "if you beat Usain Bolt in a race 3/5 times, that doesn't mean anything; it's like getting 3/5 coin flips to be heads."
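For readers who want the arithmetic behind the quoted passage, here is a minimal sketch assuming the per-prompt "coin flip" model that the article uses and this comment disputes:

```python
# Binomial arithmetic for the quoted coin-flip analogy; p=0.5 is the
# article's implicit per-prompt chance, not a measured model pass rate.
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(5, 5))  # 0.03125, i.e. the 1-in-32 "5 heads" case
print(prob_at_least(3, 5))  # 0.5: 3-or-more of 5 is even odds in this model
```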
trention over 2 years ago
I'd like to comment specifically on the conception of betting on AI 'achievements' (I think Marcus' bet is underspecified and kind of vague in all 5 of its points).

People shouldn't be betting on benchmarks, because benchmarks can be and usually are gamed (see Goodhart's law). Also, most people couldn't give less of a f*ck whether an AI can write an award-worthy poem (I personally don't care about any form of AI "art", any sort of text an AI can produce, or really any meaningless "feat" it - as in the general category - becomes capable of). The only worthy bets are ones that discuss economic impact. How many people will be structurally unemployed because of AI by year X? Will it lower or increase the GDP growth rate, and by how much? Will it shift the balance between labor and capital, and how? Etc.

So more meaningful bets and less benchmark bullshit that doesn't matter, please.
origin_path over 2 years ago
The reason Imagen isn't made available to the public probably isn't about compositionality. The most notable thing about Alexander's challenge is that Imagen totally failed every single prompt despite his claim of success because, apparently, it is programmed to never represent the human form. Not even Google employees are allowed to make it draw humans of any kind. They had to ask it to draw robots instead, but as pointed out in the comments, changing the requests in that way makes them much easier for DALL-E 2 as well, especially the image with the top hats.

If the creators have convinced themselves of some kind of "no humans" rule, but also know that this would be regarded as impossibly extreme and raise serious concerns about Google in the outside world, then keeping Imagen private forever may be the most "rational" solution.
IronWolve over 2 years ago
One of the things I've noticed is that satire, and callbacks to common news and ideas, can really trip up any AI. Also, if you ask it about anything political, ask it to describe both sides of the argument. This is why people fall back to cherry-picking steelmanned responses to push their arguments.
jessaustin over 2 years ago
*Yesterday, as part of a new podcast that will launch in the Spring, I interviewed the brilliant...*

This seems like the wrong way to go about podcasting. What can you say today that will still be interesting to hear in six months?
darawk over 2 years ago
Every concrete prediction Gary has made has been falsified. All of his others are insufficiently precise to be falsified.

His GPT-2 examples were thoroughly defeated by GPT-3. The horse-riding astronaut is solved. Neural knowledge graphs are a successful thing now. Compositionality isn't solved, but progress is clearly being made.

If he were a serious person, this post could have been a few sentences: "No neural network will achieve <x> score on <y> metric on the Winoground dataset within the next <n> years". Simple, concrete, falsifiable. He has not done this, and one has to wonder why.
wrycoder over 2 years ago
Just keep laughing. I'd like to hear Ray Kurzweil's view (he's working at Google and is awfully quiet).

Human consciousness is overrated. I'm reminded of Minsky's Society of Mind - a number of separate, communicating systems. To me, that sounds a lot like what is going on at Google, but they are hiding it.
powera over 2 years ago
I don't believe "compositionality" is a serious obstacle.

It is a different issue from generating an image based on a bag of words, so it isn't surprising that an attempt to solve the latter didn't immediately solve the former.

But a variety of approaches can easily solve this problem.
i_like_apis over 2 years ago
I wish more articles followed the standard essay format. At least state your main thesis in the first paragraph.

There are interesting things buried in here, but I don't have time for rambling.

The edge cases of image models have been more succinctly summarized and speculated upon elsewhere.
googlryas over 2 years ago
Why Scott Alexander of all people? Isn't he a clinical psychologist?

I think, if I had to give the task to the-subset-of-people-appearing-frequently-on-hn, I would give it to Gwern, not Scott.
mtlmtlmtlmtl over 2 years ago
First time I've seen the term "snooker" used outside of the sport of snooker.
kache_ over 2 years ago
oh no, musk ignored my twitter DM, it must be because he's scared of taking a bet and therefore I am right

btw, AGI is coming in 2030. Source? It was revealed to me in a dream. Check my profile to see where you can email to take bets.
SergeAx over 2 years ago
> Full disclosure, I read Alexander's successor Slate Star Codex, Astral Codex Ten, myself, and often enjoy it… when, that is, he is not covering artificial intelligence, about which we have had some rather public disagreements.

Could this be a case of the Gell-Mann Amnesia effect? (https://en.m.wikipedia.org/wiki/Michael_Crichton#GellMannAmnesiaEffect)
rebelos over 2 years ago
Imagine watching the seeds of AI that will terraform society and rapidly displace human labor over the coming decades be planted, and then still splitting hairs over whether or not it'll achieve sentience.

Our world is changing before our very eyes while this guy is belaboring the technicalities. You could hardly ask for a keener display of the philosophical gulf between scientists and engineers.
daveguy over 2 years ago
Now ask it a question.
comeonbro over 2 years ago
Regarding Gary Marcus, the author of this piece, and his long and bizarre history of motivated carelessness on the topic of deep learning:

https://old.reddit.com/r/TheMotte/comments/v8yyv6/somewhat_contra_marcus_on_ai_scaling/ieixnwm/?context=2#thing_t1_ieixnwm
arisAlexis over 2 years ago
Missing the point: dismissing an apocalyptic possibility as having probability 0, without proof, is dangerous -> therefore we should take it seriously. Taleb's work on risk analysis is relevant here.
jgalt212 over 2 years ago
They first approached Lex Fridman, but his home-spun test had zero questions. /s
mgraczyk over 2 years ago
It's interesting that people keep coming up with things that are meant to distinguish AI systems from human intelligence, but then when somebody builds a system that crushes the benchmark, the next generation comes up with a new goalpost.

The difference now is that the timescales are weeks or months instead of generations. I believe we will see models that have super-human "compositional" reasoning within 1 year.