
Bloomberg's analysis didn't show that ChatGPT is racist

66 points by leeny about 1 year ago

9 comments

cjk2 about 1 year ago
Fairly obvious. Is a parrot racist because it heard someone being racist and repeats it without being able to reason about it?

It lacks intent and understanding, so it can't be racist. It might make racist-sounding noises, though.

A fine example ... https://www.youtube.com/watch?v=2hUS73VbyOE
SpaceManNabs about 1 year ago
This article makes the same Stats 101 mistakes with p-values that the Bloomberg article does.

All this article can say is that it cannot reject the null hypothesis (that ChatGPT does not produce statistical discrepancies).

It certainly cannot state that ChatGPT is definitively not racist. The article moves the discussion in the right direction, though.

Also, I didn't look too closely, but their table under "Where the Bloomberg study went wrong" has unreasonable expected frequencies. But then I noticed it was because it was measuring "name-based discrimination." That's a terrible proxy for racism in the resume review process, but it's what Bloomberg decided on, so whatever. Not faulting the article for this, but the discussion seems focused on the wrong metric.

If you are going to argue with people over stats, then don't make the same mistakes...
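To make the point concrete, here is a minimal sketch of the chi-square goodness-of-fit logic both articles lean on. The selection counts below are hypothetical, not Bloomberg's data:

```python
# Hypothetical: how often resumes from each of four demographic groups were
# ranked first across 1,000 trials; under the null, every group is picked
# equally often.
from scipy.stats import chisquare

observed = [260, 245, 255, 240]   # made-up counts per group
expected = [250, 250, 250, 250]   # uniform expectation (1000 / 4)

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")

# A large p-value only means we FAIL TO REJECT the null hypothesis (no
# detectable discrepancy). It does not prove the model is unbiased, which
# is exactly the Stats 101 distinction at issue here.
if p_value >= 0.05:
    print("Cannot reject the null; that is not the same as proving fairness.")
```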
observationist about 1 year ago
Any naive use of an LLM is unlikely to produce good results, even with the best models. You need a process: a sequence of steps, with appropriately safeguarded prompts at each step. AI will eventually reach a point where you can get all the subtle nuance and quality in task performance you might desire, but right now you have to dumb things down and be very explicit. Assumptions will bite you in the ass.

Naive, superficial one-shot prompting, even with CoT or other clever techniques, or using a big context, is insufficient to achieve quality, predictable results.

Dropping the resume into a prompt with few-shot examples can get you a little consistency, but what really needs to happen is a series of discrete operations that link the relevant information to the relevant decisions. You'd want to do something like tracking years of experience, age, work history, certifications, and so on, completely discarding any information not specifically relevant to the decision of whether to proceed in the hiring process. Once you have that information separated out, you consider each item in isolation, scoring from 1 to 10, with a short justification for each score based on many-shot examples. Then you build a process iteratively with the bot, asking it which variables should be considered in the context of the others, and incorporate a -5 to 5 modifier based on each clustering of variables (8 companies in the last 2 years might be a significant negative score, but maybe there's an interesting success story involved, so you hold off on scoring until after the interview).

And so on, down the line, through the whole hiring process. Any time a judgment or decision has to be made, break it down into component parts, and process each of the parts with its own prompts and processes, until you have a cohesive whole, any part of which you can interrogate and inspect for justifiable reasoning.

The output can then be handled by a human, adjusted where it might be reasonable to do so, and you avoid the endless maze of mode-collapse pits and hallucinated dragons.

LLMs are not minds; they're incapable of acting like minds unless you build a mind-like process around them. If you want a reasonable, rational, coherent, explainable process, you can't achieve that with zero- or one-shot prompting. Complex and impactful decisions like hiring and resume processing aren't a task current models are equipped to handle naively.
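As a rough sketch of that decomposition (the field names, scales, and the `call_llm` wrapper below are hypothetical stand-ins, not a known-good recipe):

```python
# Each judgment is broken into its own scored, justified step, so a human
# can interrogate any single component instead of one opaque answer.
from dataclasses import dataclass

FIELDS = ["years_of_experience", "work_history", "certifications"]

@dataclass
class FieldScore:
    field: str
    score: int        # 1-10, judged in isolation with many-shot examples
    rationale: str    # short justification, inspectable after the fact

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def extract_field(resume: str, field: str) -> str:
    # Step 1: pull out only the information relevant to this decision,
    # discarding everything else so later steps can't be swayed by it.
    return call_llm(f"Extract {field} from this resume:\n{resume}")

def score_field(field: str, value: str) -> FieldScore:
    # Step 2: score each field in isolation, with a short justification.
    reply = call_llm(
        f"Score this candidate's {field} from 1 to 10 on the first line, "
        f"then justify briefly:\n{value}"
    )
    score_line, rationale = reply.split("\n", 1)
    return FieldScore(field, int(score_line), rationale)

def cluster_modifier(scores: list[FieldScore]) -> int:
    # Step 3: consider how variables interact (e.g. many employers in few
    # years) and return a -5..5 adjustment, clamped for safety.
    reply = call_llm(
        "Given these scored fields, return a single integer from -5 to 5 "
        "for how they interact:\n"
        + "\n".join(f"{s.field}: {s.score}" for s in scores)
    )
    return max(-5, min(5, int(reply)))

def evaluate(resume: str) -> tuple[int, list[FieldScore]]:
    scores = [score_field(f, extract_field(resume, f)) for f in FIELDS]
    total = sum(s.score for s in scores) + cluster_modifier(scores)
    return total, scores
```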
bena about 1 year ago
As someone who read enough of the article before it became a full-blown ad for their services: neat.

They do have a point with regard to Bloomberg's analysis.

Bloomberg's analysis has white women being selected more often than all other groups for software developers, with the exception of Hispanic women.

That's a little weird. More often than not, when something is sexist or racist, it's going to favor white men. But then you also see that the differences are all less than 2% from the expectation. Nothing super major, and well within the bounds of "sufficiently random."

Now, I also wouldn't make the claim that ChatGPT isn't racist based on this either. It's fair to say that ChatGPT did not exhibit a racial preference in this task.

The best you can say is that the study says nothing.

What they should do is basically poison the well. Go in with predetermined answers. Give it 7 horrible resumes and 1 acceptable one. It should favor the acceptable resume. You can also reverse it: 7 acceptable resumes and 1 horrible resume. It should hardly ever pick the loser. That way you can test whether ChatGPT is even attempting to evaluate the resumes or is just picking one out of the group at random.
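A rough sketch of that calibration test (the `pick_best` wrapper and the trial count are hypothetical placeholders):

```python
# Panels of 7 deliberately weak resumes plus 1 acceptable one; if the model
# is actually evaluating resumes, it should find the acceptable one.
import random

def pick_best(resumes: list[str]) -> int:
    """Ask the model which resume to advance; returns an index into the panel."""
    raise NotImplementedError("plug in your model client here")

def calibration_test(good: str, bad: list[str], trials: int = 100) -> float:
    hits = 0
    for _ in range(trials):
        panel = bad[:7] + [good]
        random.shuffle(panel)            # position in the panel shouldn't matter
        winner = panel[pick_best(panel)]
        hits += winner == good
    return hits / trials

# A hit rate near 1.0 means the model is genuinely evaluating; a hit rate
# hovering near 1/8 means it is effectively picking at random.
```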
fwip about 1 year ago
I hate headlines/framings like this.

> It’s convention that you want your p-value to be less than 0.05 to declare something statistically significant – in this case, that would mean less than 5% chance that the results were due to randomness. This p-value of 0.2442 is way higher than that.

You can't get "ChatGPT isn't racist" out of that. You can only get "this study has not conclusively demonstrated that ChatGPT is racist" (for the category in question).

And in fact, in half of the categories, ChatGPT 3.5 does show very strong evidence of racism / racial bias (p-value below 1e-4).
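The contrast can be illustrated with made-up counts (neither set of numbers is from the actual study):

```python
# A mild skew in one category fails to reach significance, while a strong
# skew in another yields the p < 1e-4 evidence of bias mentioned above.
from scipy.stats import chisquare

expected = [250, 250, 250, 250]   # uniform pick rate over 4 groups, 1,000 trials

mild = [262, 238, 255, 245]       # small deviations: p is large
strong = [330, 200, 250, 220]     # one group heavily favored: p is tiny

for name, obs in [("mild skew", mild), ("strong skew", strong)]:
    stat, p = chisquare(f_obs=obs, f_exp=expected)
    verdict = "strong evidence of bias" if p < 1e-4 else "cannot reject the null"
    print(f"{name}: p = {p:.2e} -> {verdict}")
```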
up2isomorphism about 1 year ago
Trying to say a car is a murderer does not make sense. ChatGPT is a symbol generator with a locally high probability of resembling a person; it is not a person, so how can it be racist?
Animats about 1 year ago
The big result is that ChatGPT is terrible at resume evaluation. Only slightly better than random.
gurumeditations about 1 year ago
In my experience, image generators are anti-gay and frequently refuse to create images featuring gay people.
tmoravec about 1 year ago
If Bloomberg had calculated the p-value, they couldn't have written a catchy article. It's a conspiracy theory, of course, but this omission seems too big to be a simple oversight.