Making Sense of Standard Deviation

32 points by motxilo over 14 years ago

7 comments

jasonlotito over 14 years ago
A story of my last use of standard deviation.

My day-to-day job deals with credit card processing. A lot of it involves ensuring that transactions occur securely and reliably. It's a tedious job, it's not exciting, and it involves a lot of testing, but I love the sense of knowing that actual money is flowing through a system I built.

Anyway, one of the things I wanted to do was build an automated alert system that would notify me of problems with processing transactions. Looked at from high up, the system is fairly stable, and while it would be easy to notice if transactions suddenly stopped on the entire system, this rarely, if ever, actually happens (and hasn't happened except for planned < 1 minute outages).

However, considering that the system runs many small individual sites through it, looking at all transactions together is fairly useless. Instead, I wanted a warning to notify me when any specific account was suffering. Each account is different and handles a variable number of transactions each day. Some accounts do more, some less. There are other variables: some accounts do well at different times of the day because of where they are promoted in the world. Weekends generally see an uptick, but this again is variable.

So, I developed a system (using standard deviation) that essentially looks at an account's history over the past X time period for a certain window of time during the day. Some accounts are inspected by looking at the numbers in the past hour (accounts with steady transactions), others are looked at over the last few hours, and others are looked at over the day.

Obviously, we don't alert ourselves to certain cases that fall outside standard deviation, and we've adjusted the numbers to look at other areas, but the result of using standard deviation in this way suddenly opened up a new way of looking at our numbers and evaluating the current status of our system, as well as the accounts using it. Even if a problem doesn't exist on our end, we can alert the people who handle these accounts that a problem might exist, allowing them to take the necessary action.

Understanding standard deviation, and understanding how it can be used (along with other associated tools of math), makes for some really interesting things that you can do to improve your system as a whole.
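
A minimal sketch of the kind of per-account check described above, not the commenter's actual system. The function name `is_anomalous`, the threshold of 3 standard deviations, and the hourly counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev

def is_anomalous(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` sample standard
    deviations away from the mean of `history` (hypothetical helper)."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation (divides by N - 1)
    if sigma == 0:
        return current != mu  # perfectly flat history: any change stands out
    return abs(current - mu) / sigma > threshold

# Made-up hourly transaction counts for one account
history = [102, 98, 110, 95, 105, 99, 101, 104]

print(is_anomalous(history, 100))  # a typical hour
print(is_anomalous(history, 20))   # a sudden drop
```

Choosing a different `history` window per account (last hour, last few hours, last day) would correspond to the per-account tuning the comment mentions.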
tel over 14 years ago
A small but vital correction is that the "biased" estimator will approach a population's std dev as the sample size increases. This is clear since the adjustment is N/(N-1), which tends to 1 with large N.

It's not often a huge deal, but I'm unconvinced that biased estimators deserve such a derogatory name. Using an unbiased estimate does not actually mean you've got a better estimator. I think this is part of why there seems to be no intuitive description of why the N/(N-1) factor removes bias. You instead need to invoke the powerful, abstract ideas of sufficiency and completeness.
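
A small sketch of the N/(N-1) point, using Python's standard `statistics` module and a made-up data set. On the variance scale the two estimators differ by exactly N/(N-1); on the standard deviation scale the factor is its square root.

```python
from statistics import pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
n = len(data)

biased = pvariance(data)    # divides the sum of squared deviations by N
unbiased = variance(data)   # divides by N - 1
ratio = unbiased / biased   # equals N / (N - 1)

print(ratio, n / (n - 1))

def correction(size):
    """The N / (N - 1) adjustment for a sample of the given size."""
    return size / (size - 1)

# The adjustment shrinks toward 1 as the sample grows
for size in (2, 10, 100, 10_000):
    print(size, correction(size))
```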
klenwell over 14 years ago
I don't have a lot of practical day-to-day use for standard deviation and possess a vague understanding of it. Like the author, I resort to Wikipedia periodically and go away with no stronger an intuitive sense than when I began. I think the author offers a great explanation for the advantages of a more intuitive understanding. This article and its simple graphics definitely provided a more intuitive sense of the concept.

I'll wait a while for the statistics experts to weigh in here before I write it to my cognitive hard disk. :)
crikli over 14 years ago
I explain it like this:

It's 3rd and 4. You have two running backs. Running back A averages 8 yards a carry with a standard deviation of 6. Running back B averages 5 yards a carry with a standard deviation of half a yard.

Which guy do you choose? If you just look at averages, you probably would choose A. But if you think of the standard deviation as an indication of consistency you choose B, because his 5 yard gains are very consistent, whereas A might bust one for big yards, but he's also likely to get dropped for a loss.
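
To put rough numbers on this, here is a sketch that assumes each back's yards per carry follow a normal distribution. That assumption is not in the comment and real rushing data is skewed, so treat the figures as illustrative only.

```python
from statistics import NormalDist

# Assumed normal models built from the means and standard deviations above
back_a = NormalDist(mu=8, sigma=6)
back_b = NormalDist(mu=5, sigma=0.5)

yards_needed = 4

# Probability of gaining at least the 4 yards needed for the first down
p_a = 1 - back_a.cdf(yards_needed)
p_b = 1 - back_b.cdf(yards_needed)

# Probability that back A is stopped for a loss (negative yards)
p_a_loss = back_a.cdf(0)

print(f"A converts: {p_a:.3f}, B converts: {p_b:.3f}, A loses yards: {p_a_loss:.3f}")
```

Under this model B converts noticeably more often than A despite the lower average, which is the consistency argument in numeric form.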
amalcon over 14 years ago
The main reason to use standard deviation instead of mean absolute deviation is that

    (f(x)-g(x))^2

is differentiable, while

    |f(x)-g(x)|

is not. Though exaggerating the larger differences is often a desirable property, the ability to do calculus is nearly always a desirable property.
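
A quick numerical check of the differentiability point. It compares the squared difference that variance is built from with the absolute difference, writing d for f(x)-g(x). The helper `one_sided_slopes` and the step size are assumptions for illustration.

```python
def one_sided_slopes(fn, x0=0.0, h=1e-6):
    """Finite-difference slopes of `fn` just left and just right of `x0`."""
    left = (fn(x0) - fn(x0 - h)) / h
    right = (fn(x0 + h) - fn(x0)) / h
    return left, right

# |d| has a kink at d = 0: the slope jumps from about -1 to about +1
abs_left, abs_right = one_sided_slopes(abs)

# d^2 is smooth at d = 0: both one-sided slopes are close to 0
sq_left, sq_right = one_sided_slopes(lambda d: d * d)

print(abs_left, abs_right)
print(sq_left, sq_right)
```

The mismatch between the left and right slopes of |d| is what breaks ordinary calculus at zero; the squared version has no such jump.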
yummyfajitas over 14 years ago
The author makes a minor mistake.

You can make an unbiased estimate of the variance using the formula s^2 = (sum of squared deviations) / (N-1). However, taking the square root of this gives you a biased estimate of the standard deviation due to the concavity of sqrt.

http://en.wikipedia.org/wiki/Bessels_correction
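
A tiny exact example of this effect, using a made-up two-value population and enumerating every equally likely sample of size 2 drawn with replacement. No simulation is involved, so the averages below are exact expectations for this toy case.

```python
from itertools import product
from math import sqrt
from statistics import mean, pvariance, variance

population = [0.0, 1.0]  # made-up population: two equally likely values

# All four equally likely samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))

true_var = pvariance(population)  # population variance
true_std = sqrt(true_var)         # population standard deviation

# Average each estimator over every possible sample
avg_s2 = mean(variance(s) for s in samples)       # the N - 1 variance estimate
avg_s = mean(sqrt(variance(s)) for s in samples)  # its square root

print(true_var, avg_s2)  # the variance estimate is right on average
print(true_std, avg_s)   # the std dev estimate is too low on average
```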
NY_Entrepreneur over 14 years ago
You can think of standard deviation much like a distance.

So, if we have random variables X and Y with finite means and standard deviations, denote standard deviation by Std(X) and variance by Var(X), and set Z = X + Y, then Var(X) = Std(X)^2 and we get the Pythagorean theorem

Var(Z) = Var(X + Y) = Var(X) + Var(Y)

if we can assume that X and Y are uncorrelated, which is true if X and Y are independent. So, uncorrelated is analogous to perpendicular in geometry.
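
A small exact check of the Pythagorean picture, using two made-up independent discrete variables (a fair die and a fair coin paying 0 or 10). Independence is modelled by treating every (x, y) pair as equally likely.

```python
from itertools import product
from math import hypot, sqrt
from statistics import pvariance

xs = [1, 2, 3, 4, 5, 6]  # hypothetical X: a fair die
ys = [0, 10]             # hypothetical Y: a fair coin paying 0 or 10

# Independence: every (x, y) combination is equally likely
pairs = list(product(xs, ys))

var_x = pvariance(xs)
var_y = pvariance(ys)
var_z = pvariance([x + y for x, y in pairs])  # Z = X + Y

print(var_z, var_x + var_y)                        # variances add
print(sqrt(var_z), hypot(sqrt(var_x), sqrt(var_y)))  # Std(Z) is the "hypotenuse"

# Counter-example: Y = X is perfectly correlated, so the identity fails
var_2x = pvariance([x + x for x in xs])
print(var_2x, 2 * var_x)  # Var(2X) = 4 Var(X), not 2 Var(X)
```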