
A Way to Detect Bias

282 points · by bakztfuture · over 9 years ago

41 comments

yummyfajitas · over 9 years ago

People, most of whom clearly are not that good at math, are being really harsh on Paul Graham.

Graham is mostly right, but slightly incorrect. In particular, suppose group A has the distribution f(x) and B has the distribution g(x).

If f(x) and g(x) are shaped significantly differently *past the cutoff*, then mean(H(x-C)f(x)) and mean(H(x-C)g(x)) might not agree even though there is no bias by construction. (Here H(x) is a step function and C the cutoff.)

However, there is an easy fix: compute the minimum of the support of the truncated distribution rather than the *mean*: min(H(x-C)f(x)) = min(H(x-C)g(x)) = C.

In practice, measure the *weakest* male and the *weakest* female to be accepted in your sample set, or some similar approximation.

I'm pretty sure this is a valid frequentist hypothesis test. I've got half a proof worked out on paper already. It depends very weakly (and non-parametrically) on f(x) and g(x), but it works in basically the exact way Graham wants it to. Every counterexample I can think of is really pathological. My next blog post will probably be a proof of this.

All this negativity is really an overreaction. I know it's fun to totally debunk someone on details, but these are mostly fixable details.
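A quick simulation makes this point concrete. The sketch below is mine, with the exponential/normal shapes assumed purely so the two tails differ: the mean above the cutoff separates the groups even though the cutoff is unbiased, while the minimum above the cutoff sits at C for both.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 1.0  # the cutoff, applied identically to both groups

f = rng.exponential(scale=1.0, size=100_000)      # group A: heavy right tail
g = rng.normal(loc=1.0, scale=0.5, size=100_000)  # group B: thin right tail

f_sel, g_sel = f[f >= C], g[g >= C]

# Means above the cutoff disagree even though selection is unbiased:
print(f_sel.mean(), g_sel.mean())   # ~2.0 vs ~1.4

# Minima above the cutoff both sit at (approximately) C, as proposed:
print(f_sel.min(), g_sel.min())     # both ~1.0
```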
nkurz · over 9 years ago

This approach relies on the unspoken "positivity" assumption that the pool of applicants is large enough that there exist individuals in the biased-against category who were not selected, and moreover that these denied applicants are "exchangeable" with the successful ones.

For example, assume that we find that founders who won a MacArthur "genius" grant outperformed the others. Further assume that there are only a limited number of such founders, and that all available were selected. Certainly one wouldn't want to conclude in this case that there is a bias against MacArthur fellows.

That seems obvious, but it gets trickier once you have lots of factors involved. What if the group you find to outperform consists of female founders with a PhD, substantial industry experience, and red hair [1]? Can you conclude that the process is biased against females? Males with PhDs? Anyone with red hair? Generally no, unless you are willing to assume that all of your factors are causal.

Worse, you can't even assume that it's biased against people with all of the measured factors unless you also assume that all unmeasured factors are randomly distributed. If "a positive mental attitude" turns out to be an unmeasured but defining characteristic of success, and the interviewers rejected applicants who had less of it while you failed to include it in your category, you would be wrong to conclude that there is an unfair bias.

[1] http://www.nature.com/nature/journal/v453/n7194/full/453562a.html
in3d · over 9 years ago

Graham's statement about the possible bias of First Round is unfounded. This was not any sort of real study, as Graham seems to think, and First Round clearly notes that. When the returns are as skewed as they are in venture capital (http://www.sethlevine.com/archives/2014/08/venture-outcomes-are-even-more-skewed-than-you-think.html), a small sample size and a simple analysis won't do. First Round even excluded their investment in Uber because it would skew the results too much.
compbio · over 9 years ago

PG:

Assumption: There is no fundamental difference between a female and a male founder for achieving start-up success (average rates and variance/distribution of rates are the same).

Observation: VC-funded start-ups with female founders are (on average) 60% more successful than start-ups with male founders.

Hypothesis: VC funding is biased against female founders. The ones that do receive funding are better vetted, less risky, and have higher individual qualities.

Experiment: Start funding more female founders.

If we then observe that the numbers start to even out, there is no fundamental difference; VC funding bias may have been the cause of the difference in success rate.

If we then observe that the numbers stay the same, there is a fundamental difference and our assumption is flawed.

Rational choice: Start funding more female founders. This either removes a bias (levels the playing field) or increases your profit (funding more potentially successful founders).

PG should of course not use a hypothesis to prove an assumption (an experiment, or probing, is needed for verification). But also: the possibility of an uneven distribution should not invalidate such an experiment (or PG's line of reasoning); it would merely bring it to light (the numbers would stay the same, and thus we would have shown that the difference is fundamental and not caused by a sampling bias).
nartz · over 9 years ago

Interesting thoughts. However, this argument is biased because it assumes that the performance of the applicants WHO WERE ACCEPTED is not biased by the selection process itself, and that the performance characteristics of the selected sample are representative of the performance characteristics of the total, which could be a weak assumption.

An attempt at translating to mathematics (feel free to correct me!):

X = event that a person belongs to group x
Y = event that a person belongs to group y
S = event that a person is selected
W = event that a person will perform like a 'winner'

For simplicity, P(X) + P(Y) = 1.

Naturally, 'unbiased' in this case is simply P(S|X) = P(S1) and P(S|Y) = P(S2), i.e. the selection process is independent of the variable X or Y.

PG says we can measure the performance of these selected applicant winners for each class, i.e. P(X|S,W).

I believe PG assumes that P(X|W) / P(Y|W) should equal P(X|S,W) / P(Y|S,W). We can see that these are different distributions, since the second is already conditioned on the selection process.

Simplified, PG assumes that P(X|S,W) = P(X|W), i.e. that conditioning on the selection process does not bias the winning results.

It's left as an exercise for the reader to determine the 'pathological' cases where this selection variable's distribution makes PG's assumption correct or incorrect. However, this is simply theoretical: the actual distribution may or may not be 'pathological', and the assumptions made by PG could very well be good.
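A toy numerical check of that last assumption (all numbers hypothetical): if selection among eventual winners depends on group membership, P(X|S,W) drifts away from P(X|W).

```python
# Toy numbers showing P(X|S,W) != P(X|W) when selection depends on
# group membership among eventual winners.
winners_X = 100          # eventual winners in group X
winners_Y = 100          # eventual winners in group Y -> P(X|W) = 0.5

p_S_given_XW = 0.9       # assumed: selection favours X among winners
p_S_given_YW = 0.5

sel_X = winners_X * p_S_given_XW   # 90 selected winners from X
sel_Y = winners_Y * p_S_given_YW   # 50 selected winners from Y

p_X_given_SW = sel_X / (sel_X + sel_Y)
print(p_X_given_SW)      # 0.643 != 0.5: conditioning on S shifted the ratio
```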
eridius · over 9 years ago

There are a few trivial ways to be biased that would not be detected this way.

The first is if you have multiple people accepting applicants and some of them are biased to the point of not accepting applicants of particular types. That means all the applicants who are discriminated against but did make it were simply selected by the people who weren't biased, and therefore won't outperform anyone.

The second is if the actual selection process is somewhat random instead of being based on pure performance. The ones who make it through that process won't necessarily perform any better; they'll just be luckier.

The third is if the application process accepts everyone equally, and then randomly prunes out people according to a bias. This is similar to the second except that the acceptance criteria are still performance-based; but because it randomly throws out people (instead of throwing out low performers), the remaining people are still going to perform the same as those who were not pruned.

The first footnote on the page also points out that if the selection criteria are different for the different groups then this process won't work, which seems like a pretty important caveat that I wish were in the article proper. One really common form of bias (especially in tech) is being biased against women, and that's also a situation where it's very common to (unconsciously or otherwise) use appearance in judging female applicants but ignore appearance for male applicants.
jameshart · over 9 years ago

There's a flaw right in the assumptions here: "(c) the groups of applicants you're looking at have roughly equal distribution of ability."

Oh. See, the problem is that if an application process *is* biased, and applicants *perceive* that bias, then those against whom it is biased will be *dissuaded from applying* unless they far exceed the required standards, whereas those towards whom the process is biased will be *more* likely to apply, even if they are only marginally qualified, because they expect to benefit from the bias.

So that means that if you do have a biased process, there's a good chance it doesn't meet criterion (c): applicants in the different groups between which its bias discriminates are *not* equal in ability. So your test might verify a lack of bias when there is in fact bias present.

You *can't* verify a lack of bias just by looking at the outcomes of successful applicants; you need to look at the outcomes for unsuccessful applicants too, to determine whether your applicant pools really do meet criterion (c). Or you could look at the outcomes for non-applicants, but that's clearly a much harder problem.
TomGullen · over 9 years ago

> A couple months ago, one VC firm (almost certainly unintentionally) published a study showing bias of this type. First Round Capital found that among its portfolio companies, startups with female founders outperformed those without by 63%.

Well, they also said in their study:

> And we are not claiming that our data is representative of the industry... or even statistically significant.

Also, the wording is "startups with a female founder", not exclusively female founders. I think this is a detail that shouldn't be ignored.

And the study doesn't show how many companies out of the 300 had female founders! Maybe it was just 1! They also say "Solo Founders Do Much Worse Than Teams", which becomes an important detail if their firm never backed a solo female team. Etc., etc.; the list goes on. Not exactly strong evidence to support the point PG is making, that bias would be easy to detect.

Measuring performance purely in terms of "how much money I make" is one way of doing it, but not the only way. And it won't cover the majority of jobs on the planet (how do you measure the performance of someone who stacks shelves in a supermarket?).
MichaelGG · over 9 years ago

I don't understand his point about First Round Capital showing their female founders did better than companies without female founders. What does that show? How do we know that female founders aren't simply better? Or maybe women are scared of applying, so out of women, only the best apply? In that case, the mere idea that there is a bias can cause "pre-selection" bias.

I lack the mathematics to prove this, but it seems that on the face of it, pg is simply wrong. Or I'm misreading terribly.

Tangentially, speaking of bias: why doesn't YC publish information on their companies' tech choices? PG racked up a lot of inferred cachet (positive) by stating that use of Lisp gave them a huge advantage. Now that YC has data, they should be able to show how choice of technology correlates with performance.
wycats · over 9 years ago

The implication of this analysis of http://10years.firstround.com/ is that First Round is biased against founding teams with experience at Amazon, Facebook, Apple, Google, Microsoft or Twitter.

Can this be true?
bsder · over 9 years ago

Um, the data set pg *cites* actually shows this to be fallacious.

They excluded Uber from the results, which, if included, makes the male-run companies look "oversuccessful". What would happen if I excluded the top female-run business? I'd bet the differences between the two groups would get much smaller.

Given both the small sample size and the outsized influence of outliers, drawing conclusions from this population is going to be fraught with issues.
earljwagner · over 9 years ago

The phenomenon of "stereotype threat" complicates this conclusion, however: https://en.wikipedia.org/wiki/Stereotype_threat

When a member of a group is primed with a stereotype that their group underperforms at a task, they are more likely to underperform. So there could be a selection process biased against a group, and a selected member could otherwise be an above-average performer but, because of the work environment, be underperforming.

Some universities work to remedy this through support groups or other practices aimed at under-represented minorities, and these appear to help students be more successful academically. On the other hand, there's the Hawthorne effect... https://en.wikipedia.org/wiki/Hawthorne_effect
Phemist · over 9 years ago

Footnote one should be put more generally: biases in the performance metrics (either appearance for women vs. ability for men, or how many words of the US national anthem someone knows for US citizens vs. the rest of the world) will cause this method to fail. Unfortunately, unbiased performance metrics are quite hard to find in a single dimension, let alone when moving to multi-dimensional metrics.
owens99 · over 9 years ago
Many of these comments remind me of one of the most potent biases known to man: confirmation bias. If you are smart and want to believe something, you will surely be able to come up with a mathematical (albeit flawed) way to convince yourself you are right.
j2kun · over 9 years ago

There is an emerging subfield of computer science that studies what it means for data (or algorithms, or decision-making rules) to be biased, and how to remove certain forms of bias.

See http://fatml.org
felipeerias · over 9 years ago

In which a crowd of overwhelmingly white male American SW engineers tries to find a mathematical explanation for bias...

"For Bourdieu, cultural capital is the status culture of a society's elite insofar as that group has embedded it in social institutions, so that it is widely and stably understood to be prestigious. Schools take it as a sign of native academic ability but do not themselves impart it, performing acts of social alchemy that transform class privilege into individual merit."
kenko · over 9 years ago

"What it means for a selection process to be biased against applicants of type x is that it's harder for them to make it through. Which means applicants of type x have to be better to get selected than applicants not of type x. [1] Which means applicants of type x who do make it through the selection process will outperform other successful applicants."

There are many, many reasons that both sentences beginning "which means" are false, reasons that someone as smart as we're told Graham is should be able to come up with quite easily. It's astonishing that he made this tripe public. Here's a gimme for each.

Say I'm selecting people to receive a prize; there are ten recipients and they're putatively chosen by [whatever]. But I don't like people with green eyes, so green-eyed candidates had better be pretty pleasing to me. But they can please me in *any* way, not necessarily in ways relevant to the metric for which the prize is awarded; maybe I also like tall people, so a really tall green-eyed person averages out in terms of my predilections. They aren't relevantly better.

For the second, again, the question is "better" at what? Better at getting whatever is involved in getting selected? That doesn't necessarily correlate with outperforming anyone subsequently, especially in startupland. (Remember that New Yorker profile of Marc Andreessen, where Sam Altman basically admitted that he didn't know what he was doing in terms of selecting what to invest in? The flipside of that is being selected by Altman for an investment.)
d0m · over 9 years ago

I don't get it :-/ Why is there a bias?

Even if the VCs are totally unbiased, why couldn't the startups with women have outperformed the others? It could happen for a variety of reasons. Just hypothetically speaking, maybe startups with women have different networking connections or insight that male-only startups don't have?
gizmo · over 9 years ago

A related observation (which I've been making for a long time) is that the absence of mediocre women in positions of power is strong evidence of bias. Men can succeed when they're mediocre, but women have to be exceptional. Likewise for minorities.
tracker1 · over 9 years ago

I just assume there is bias... I mean, bias is what you are *trying* to work in favor of: that bias being factors of success. Chasing your tail against random statistics won't really show much, and a person is more complex than a few statistical groups. As far as investing goes, there's also the product, and how that leader/founder matches the product category itself. A founder who succeeds in one category won't definitively succeed in another. Many founders fail their first few times and later succeed. Others fail after some success(es).

I think as long as reasonable steps are taken to avoid certain obvious biases, the rest is mostly chance.
eatkinson · over 9 years ago

This isn't really sound reasoning, for reasons mentioned elsewhere and because of the following: you need to know that the probability of acceptance is conditionally independent of the "type" of the applicant given the *success* of the applicant.

For example, consider the following hypothesis for the First Round data: women are more honest than men. A woman presenting a bad idea to a VC will be rejected, whereas a man may be able to weasel his way into getting funding. This will give men a lower success rate, and correspondingly women will have a higher success rate.

However, this isn't really the same thing as having an across-the-board hidden bias against women.
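Putting hypothetical numbers on that story shows the effect can be large without any selection bias against women anywhere:

```python
# Assumed: good ideas succeed 30% of the time, bad ideas 2%. Funded women
# pitched only good ideas; 30% of funded men weaseled a bad idea through.
p_good, p_bad = 0.30, 0.02

success_women = 1.0 * p_good                # 0.300
success_men = 0.7 * p_good + 0.3 * p_bad    # 0.216

print(success_women / success_men - 1)      # ~0.39: women 'outperform' by ~39%
```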
Mz · over 9 years ago

An actual real-world example (and application of an antidote) of this basic idea: https://en.wikipedia.org/wiki/Rooney_Rule
dthal · over 9 years ago

In the post, PG states that First Round's study is evidence of gender bias in VC financing. But footnote [2] is important: Uber was excluded as an outlier. Now, excluding Uber is reasonable (it *is* sort of an outlier), but so is not excluding it (it was a company that First Round invested in). When the conclusion of a data analysis depends on which way you go on something like this -- which of two reasonable alternatives you pick -- the results are fragile and don't really support either conclusion very well.
jblow · over 9 years ago

All other arguments aside... this idea also fails if the judging party's idea of quality is mostly uncorrelated with actual quality. Which, Graham says in other essays, is usually the case (it's what you mean when you say it's almost impossible to predict which companies will be successful).

Graham says the subjects of bias "have to be better to get selected", but what is really going on is that they have to be better *according to the metrics of the judge*, which are essentially arbitrary.
proveanegative · over 9 years ago

If candidates from group A perform more strongly on average than those from group B, there are possible causes other than bias in the selection process itself. For instance, members of group A may only apply above a higher level of self-assessed likelihood of success than those in group B. The reason for this could be an opportunity cost not present for group B, overconfidence or a lack of underconfidence in group B, or underconfidence or a lack of overconfidence in group A.
anecon2 · over 9 years ago

For a formalized and empirical version of this argument applied to the entirety of the US economy, check out "The Allocation of Talent and U.S. Economic Growth" by Hsieh et al. (http://klenow.com/HHJK.pdf). It quantifies the gains from the decrease in misallocation of women and African Americans as racial discrimination in employment declined over the past 50 years.
danieltillett · over 9 years ago

While we can have lots of fun arguing about the mathematics of this approach, the basic problem is that the underlying data is too small and too poor to draw any valid conclusion from.
mrwilliamchang · over 9 years ago

I think I have a simpler counterexample to disprove pg's hypothesis than any other I've read in the comments. Suppose our goal is to admit the top 5 applicants with the following performances:

```
A - 30,000
A - 10,000
A -  9,000
B -  7,000
B -  5,000
# Cutoff point below this line
A - 4
B - 3
B - 2
```

Even though admitting the top 5 by score is perfectly fair, the applicants from group A perform better.
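Running the numbers on that table confirms the point: the unbiased top-5 cutoff still leaves group A's admits far ahead on average.

```python
# Averages of the admitted applicants in the table above.
admitted = {"A": [30_000, 10_000, 9_000], "B": [7_000, 5_000]}
for group, scores in admitted.items():
    print(group, sum(scores) / len(scores))   # A: 16333.3, B: 6000.0
```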
pbnjay · over 9 years ago

Fittingly, another type of bias observed in the linked report is bias against solo founders. The report states that solo founders do worse *when measured against the same yardstick as multiple founders*. Maybe from a VC perspective this is intended (big raise => bigger exit?), but I'd argue that you don't need to raise as much when you have a solo founder, because dilution is less of a concern.
urs2102 · over 9 years ago

I think the first footnote in this is extremely valid. It all depends on how the performance metric the selection process is targeting compares to the performance metric you use to determine success.

I would suspect the larger issue is that people are probably much worse at identifying which selection metrics actually convert into the corresponding metrics for success.
somberi · over 9 years ago

This is how I understand it: look back at decisions you have made under various lenses, and learn what biases they carry, so that you can avoid them (or amplify them, if positive) in future.
logicallee · over 9 years ago

Could someone who read it more attentively tell me: by this methodology,

-> If, in retrospect, YC finds any factor that its selected founders who turn into unicorns ($1B, $10B, etc.) have in common (more than its non-unicorn, also-accepted founders),

-> then by this method, could it conclude retroactively that it had been "biased" against that factor (since it is present more often than in its non-unicorns, whom it had also admitted; i.e., those with the factor are more performant than "would be expected" without the bias against it)?

Or have I misunderstood?
174676 · over 9 years ago
Even assuming equal distribution of ability, there is still the problem of whether you can measure performance without bias.
fanzhang · over 9 years ago

The test pg suggests was also proposed by the economist Gary Becker [1]. As many people here have noticed, the catch is that the test only works if you compare *marginal* performance, not *average* performance. Economists call this the inframarginality problem [2]. There are a number of ways to solve this problem and restore pg's result:

- As pg himself says, if we assume certain statistical distributions of ability and selection rules, the inframarginality problem goes away.

- We'd also solve the inframarginality problem if we could tell roughly who the marginal applicants were. If pg could ask the VC firm, see who *almost* got rejected, and compare those two groups, he'd be set. pg is well positioned to test this on the YC dataset. Likewise, he could solve the problem if he could observe another variable that reveals who the marginal applicants likely were (for example, the startups that had the fewest co-investors).

- There's also an entire literature out there that tries to solve the problem in other ways. For example, if a system satisfies the "KPT" sufficient conditions, then the inframarginality problem also goes away.

*[1] One prominent approach ... is the "outcome test," which originated in Gary S. Becker (1957). In the context of motor vehicle searches, the outcome test is based on the following intuitive notion: if troopers are profiling minority motorists due to racial prejudice, they will search minorities even when the returns from searching them, i.e., the probabilities of successful searches against minorities, are smaller than those from searching whites. More precisely, if racial prejudice is the reason for racial profiling, then the success rate against the marginal minority motorist (i.e., the last minority motorist deemed suspicious enough to be searched) will be lower than the success rate against the marginal white motorist. (From [3])*

*[2] While this idea has been well understood, it is problematic in empirical applications because researchers will never be able to directly observe search success rates against marginal motorists. This is due to the fact that we cannot identify the marginal motorist, since accomplishing this would require having complete information on all of the variables that troopers use in determining the suspicion level of motorists. Because of this omitted-variables problem, we can observe only the average success rate of searches against white and minority motorists, and not the marginal success rate. Since the equality of marginal search success rates does not imply, and is not implied by, the equality of the average search success rates, we cannot determine the relationship between the marginal search success rates of white and minority motorists by looking at average success rates. In past literature, this has been referred to as the "infra-marginality" problem. (From [3])*

*[3] Anwar, Shamena, and Hanming Fang, "An Alternative Test of Racial Prejudice in Motor Vehicle Searches: Theory and Evidence." American Economic Review (2006). http://economics.sas.upenn.edu/~hfang/publication/racial-profiling/aer_final.pdf*
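A sketch of the "compare the marginal admits" fix described above (the normal ability distributions and the 0.5 penalty are my assumptions): since the single marginal applicant is unobservable, compare the weakest few percent of admits in each group instead of the group averages.

```python
import numpy as np

rng = np.random.default_rng(1)
C, penalty = 0.0, 0.5            # assumed: group B faces a higher bar

a = rng.normal(0.0, 1.0, 200_000)
b = rng.normal(0.0, 1.0, 200_000)
adm_a = np.sort(a[a > C])
adm_b = np.sort(b[b > C + penalty])   # the biased selection rule

# Compare the weakest ~5% of admits in each group:
k_a, k_b = int(0.05 * adm_a.size), int(0.05 * adm_b.size)
print(adm_a[:k_a].mean())        # ~C: weakest admits hug the unbiased bar
print(adm_b[:k_b].mean())        # ~C + penalty: the margin exposes the bias
```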
graycat · over 9 years ago

Okay, PG has a *hypothesis* test. There's a large literature for that, e.g.:

E. L. Lehmann, *Testing Statistical Hypotheses*.

E. L. Lehmann, *Nonparametrics: Statistical Methods Based on Ranks*.

Sidney Siegel, *Nonparametric Statistics for the Behavioral Sciences*.

In this case, PG will be more interested in the *non-parametric* case, i.e., *distribution-free*, where we make no assumptions about probability distributions.

We start a hypothesis test with a *hypothesis*, commonly called the *null hypothesis*, which is an assumption that there is no *effect* or, in PG's case, *no bias*. Then with that assumption, we are able to do some probability calculations.

Then we look at the real data and calculate the probability of, say, the evidence of bias being as large as we observed. If that probability is small, say, less than 1%, then we *reject* the *null hypothesis*, that is, reject the assumption of no *bias*, and conclude that the null hypothesis is false and that there is bias. The role of the assumption about the sample is so that we know that the *problem* is bias and not something about the sample.

In hypothesis testing, about all that matters are just two numbers -- the probability of Type I error and that of Type II error. We want both probabilities to be as low as possible.

Type I error: we reject the null hypothesis when it is true, e.g., we conclude bias when there is none.

Type II error: we fail to reject (i.e., we accept) the null hypothesis when it is false.

When looking for bias, a Type I error can be called a *false alarm* of bias, and a Type II error can be called a *missed detection* of bias.

In PG's case, suppose we have 100 startups and five of those have women founders. Suppose for each of the startups we have the data from "their subsequent performance is measured".

Our null hypothesis is that the expected performance of the women is the same as that of the men.

So, let's find those two averages and take the difference, say, the average of the women less the average of the men.

PG says if this difference is positive, then there was bias, but PG has not given us any estimate of the probability of Type I error, that is, of the probability (or *rate*) of a false alarm.

I mean, we don't want to get First Round Capital in trouble with Betty Friedan, Gloria Steinem, Marissa Mayer, Sheryl Sandberg, Hillary Clinton, Ivanka Trump, or Lady Gaga unjustly! :-)

Let's call this difference our *test statistic*. So, let's find the probability of a false alarm.

So, let's put all 100 measurements in a pot, stir the pot vigorously (we can use a computer for this), pull out five numbers and average, pull out the other 95 numbers and average, and take the difference in the two averages, that of the five less that of the 95 -- and do this, say, 1000 times. Ah, computers are cheap; let's be generous and do it 10,000 times.

For a random number, how about starting with a 32-bit integer, multiplying (with appropriately long precision arithmetic) by 5^15, adding 1, taking the result modulo 2^47, and scaling as we want?

So, we get an empirical distribution of these differences, from the five less the 95. Looking at the distribution, we see what the probability is of getting a difference as high as or higher than our test statistic. If that probability is low, say, 1% or less, then we reject the null hypothesis of no bias and conclude bias, with our estimate of the probability of Type I error 1% or less.

If with the 1% we reject, then it looks like First Round has done a transgression, will get retribution from Betty, *et al.*, and needs to seek redemption, and Betty, *et al.*, are happy to have their suspicions confirmed. Else First Round looks like the good guys, are "certified statistically fair to women", may get more deal flow from women, and Betty, *et al.*, can be happy that First Round is so nice!

Notice that either way Betty, *et al.*, are "happy". That's called "happy women, happy life"! Or: heads, the women win; tails, they lose; and in no event is there a huge crowd of angry women in front of First Round's offices with a bonfire of lingerie screaming "bias"!

When we reject the null hypothesis, we want to know that the reason was men versus women and not something else, e.g., a *biased* sample. So here is where we use our assumption of independence with the same mean. Now we have a *handle* on Type I error.

Here we have done a *non-parametric* statistical hypothesis test, i.e., we have made no assumptions, except about the means, about the distributions of the male/female CEO performance measurements. And we can select our desired false alarm rate in advance and get that rate almost exactly.

For Type II error, that is more difficult. Bottom line, what we really want is, for whatever rate of false alarms we are willing to tolerate, the lowest rate of missed detections we can get.

Can we do that? With enough more data, yup. There is a classic result due to J. Neyman (long at Berkeley) and K. Pearson (early in statistics) that shows how.

How? Regard false alarm rate as money and think of investing in SF real estate: we put our money down on the opportunities with the highest expected ROI until we have spent all our money. Done. For details, an unusually general proof can follow from the Hahn decomposition via the Radon-Nikodym theorem in measure theory, e.g., Rudin, *Real and Complex Analysis*. Right, in the discrete case, we have a knapsack problem, known to be NP-complete.

What we have done with our pot stirring is called *resampling*; for more, look for B. Efron and P. Diaconis, both long at Stanford.

Tom, with a reputation as a hacker, likes to work late, say, till 2 AM. So, we look at the intrusion alerts each minute between 2 AM and 3 AM (something like the performance of the women), compare with those of the other minutes of the 24 hours (like the performance of the men) much as above, and ask if Tom is trying to hack the servers.

Or, we have a server farm and/or a network, and we want to detect problems never seen before, e.g., *zero-day* problems. So, we have no data at all on the problems we are trying to detect, because we have never seen any of those before.

So, to do a good job, let's pick some system we want to monitor and, for that system, get data on, say, each of 10 variables at, say, 20 times a second. Now what?

Our work with bias in women's venture applications used just one number for our measurement and test statistic, so we were *uni-dimensional*. Here we have 10 numbers and need to be *multi-dimensional*.

Well, in principle we should be able to do much better (on the pair of Type I and Type II error rates) with 10 numbers than with just one. The usual ways will require us to have, under our null hypothesis, the probability distribution of the 10 numbers, but we could only get something like that from smoking funny stuff -- not even *big data* is that big.

So, we want to need no assumptions about the distribution, that is, be *distribution-free*. So, we want a statistical hypothesis test that is both multi-dimensional and distribution-free.

Can we do that? Yup.

"You mean you can select the false alarm rate in advance and get that rate essentially exactly, as in PG's bias example?" Yup.

"Could that be used in a real server farm or network to detect zero-day problems -- security, performance, hard/software failures, system management errors?" Yup -- just what it was invented for.

"Attempted credit card fraud?" Ah, once a guy in an audience thought so!

How? Ah, sadly there is no more room in this post!

What else might we do with hypothesis tests? Well, look around at, right, *big data* or just *small data*. Do we have a case of *big data analytics* or *artificial intelligence* (AI)?

Ah, I've given a sweetheart outline of statistical hypothesis testing, and now you are suggesting some things really low grade? Where did I go wrong to deserve such an insult?
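The resampling recipe described above is short enough to write out. A minimal sketch with stand-in data (using numpy's generator rather than the hand-rolled congruential one; the lognormal scores are placeholders for the real measurements):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in data: 100 performance scores, the first 5 from female-founded
# startups (hypothetical; substitute the real measurements).
perf = rng.lognormal(mean=0.0, sigma=1.0, size=100)
n_f = 5

observed = perf[:n_f].mean() - perf[n_f:].mean()

# Null hypothesis: no bias, so the labels are exchangeable -- stir the pot.
diffs = np.empty(10_000)
for i in range(diffs.size):
    rng.shuffle(perf)
    diffs[i] = perf[:n_f].mean() - perf[n_f:].mean()

# One-sided p-value: chance of a difference at least as large as observed.
p = (diffs >= observed).mean()
print(p)    # reject 'no bias' at the 1% false-alarm rate if p < 0.01
```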
WildUtah · over 9 years ago

On a simple mathematical basis, this is false.

Consider two groups of candidates for a scholarship, A and B. We want to select all candidates who have an 80% or better chance of graduation. Group A comes from a population where the chance of graduation is distributed uniformly from 0% to 100%, and group B from one where the chance is distributed uniformly from 10% to 90% -- the same average, but less variation in group B.

Now suppose that we select, without bias or inaccuracy, all the applicants who have an 80% or better chance of graduation. That means we select a subset of A with a range of 80% to 100% and a subset of B with a range of 80% to 90%. The average graduation rate of scholarship winners from group A will be 90%, and that from group B will be 85%.

But we haven't been biased against A. We've selected according to the exact same perfect evaluation process and criterion for both groups. It was just their prior distributions that were different.

The actual applicant groups for jobs or financing in the real world, when divided by demographic factors like age, sex, race, and educational level, will almost always show different variances in success levels even when the averages are the same. That makes this test useless and mathematically illiterate.

And when we use a normal distribution, as we should always expect given the central limit theorem, the mathematical problems get even more intense.

This short comment is not up to pg's usual high standards for his essays.
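A quick simulation of the uniform distributions stated above reproduces the 90% vs. 85% figures:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.uniform(0.0, 1.0, 1_000_000)   # group A: graduation chance 0%..100%
b = rng.uniform(0.1, 0.9, 1_000_000)   # group B: 10%..90%, same mean

cut = 0.8
print(a[a >= cut].mean())   # ~0.90
print(b[b >= cut].mean())   # ~0.85: same unbiased cutoff, different averages
```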
mirimir · over 9 years ago

correlation <> causation
devalier · over 9 years ago

*Fortunately there's a way to measure bias that's much more reliable, when it can be used... A couple months ago, one VC firm (almost certainly unintentionally) published a study showing bias of this type. First Round Capital found that among its portfolio companies, startups with female founders outperformed those without by 63%.*

Except if you want to use statistics to measure bias, you need a statistically significant sample. And actually, if you are studying complex human affairs, with a hundred different variables, you need more than statistical significance: you need a sensitivity analysis. It is similar to nutrition studies. There are so many variables at play that something can always be found to increase or decrease your risk of cancer by 50%. You really only need to pay attention when statistics show an order-of-magnitude correlation, as with the link between smoking and lung cancer.

With the First Round Capital data, they excluded Uber from their calculations because it would skew everything. If a single data point can flip your findings to the opposite conclusion, then you just have to admit that you do not have enough data to make a determination one way or another. In science it is sometimes OK to exclude an outlier, since it often indicates a measurement error. But in venture capital, you make most of your money off the Uber-like outliers, so if you are trying to study the data to be the best venture capitalist possible, throwing out outliers is not valid.

Also, the initial premise is incorrect: you cannot measure bias by comparing average results, *because the average is not the marginal*. Consider PG's footnote: "Although I used female founders as an example because that is a kind of bias people often talk about, the most striking thing was the degree to which First Round undervalue founders who went to elite colleges." Does he honestly believe that First Round is biased against founders from elite colleges?

At my last company, my sense was that the MIT grads were better than the average programmer. So were we biased against MIT grads? Should we have hired more MIT grads until the average performance of MIT grads overall equaled the average performance of an employee overall? Should we have done more outreach to MIT? Should the industry as a whole have hired more MIT grads?

If a talent distribution has a small elite and then a steep drop-off filled with "pretenders", you can get this type of effect without being biased. When we got an elite MIT grad, we hired them. When we got a "pretender", someone who was trading on the name but had not put in the work, we rejected them. And yes, I personally saw MIT grads do terribly on simple coding exercises.

So even though the average MIT grad we hired was better than the average programmer at our company, there was no way to alter our hiring process to get more MIT grads. If we had hired the marginal MIT grad that we rejected, we would have been worse off. Now, we could do more outreach to MIT, and we did, but that is a highly competitive process with diminishing marginal returns.

The statistical illiteracy of PG's post is simply stunning. Imagine a YC company gets a 100% ROI from PPC ads and a 50% ROI from banner ads. Are they biased against PPC ads? Should they buy more PPC ads? Such an analysis is ridiculous. You look at what you are spending on the marginal PPC ad, and you stop spending when the ROI on the marginal ad is zero, regardless of what the average is. That one advertising channel has a higher ROI on average does not mean that the company is biased against that channel.
jsprogrammer · over 9 years ago
All selection processes are biased. ■
bruu_ · over 9 years ago

What is the significance level? What is the model? This is freshman-dorm-room-level analysis.
SeriousM · over 9 years ago

Isn't every selection process based on experience, knowledge, and/or mood, and therefore biased?