One of the great joys of my bachelor's degree in psychology was being invited to take a graduate-level course on Item Response Theory (with Professor Jack Vevea, now at UC Merced). I wouldn't have fallen in love with programming and become a software developer if I hadn't taken it. 1<p>The Rasch model is a deliberately simplified case of item response theory, but I'd argue that it may not be the best one for Stitch Fix. That's not to say it can't be useful, but rather that the simplifications and assumptions of the Rasch model may yield estimates that reflect the customer's measurements less faithfully than a more sophisticated model could. Of course it may well be good enough, but it's worth exploring what those simplifications leave out.<p>The Rasch model is an attempt to separate two associated sets of information: the latent trait of the test taker/question answerer (in this case, their measurements) and the difficulty of the question (in this case, whether the item is too big, too small, or just right). Basically, the Rasch model treats the probability of a given answer as a function of the difference between an individual's latent trait level and the difficulty of the question.<p>But the model purposely ignores the discrimination of the question, that is, how good the question is at differentiating between people whose latent traits differ, and just assumes that the discrimination (the slope of the curve modeling the question's difficulty) is not relevant. Other models treat it as relevant.<p>For example, if Stitch Fix offers a belt with a number of holes, some people may feel the belt is too small if they are forced to use the last hole, others the second-to-last hole. A question about such a belt that just asked whether it was too large, too small, or just right might have low discrimination in terms of identifying an individual's underlying size.
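To make that concrete, here's a sketch (in Python, purely illustrative, not anything Stitch Fix actually uses) of the Rasch (1PL) response curve next to the 2PL curve that adds a discrimination parameter:

```python
import math

def rasch_probability(theta, difficulty):
    """Rasch (1PL): probability of a given response as a function of
    the gap between trait level theta and item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def two_pl_probability(theta, difficulty, discrimination):
    """2PL: adds a per-item discrimination (slope) parameter.
    High discrimination -> the response flips sharply as theta crosses
    the difficulty; low discrimination -> a flat, uninformative curve."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# A low-discrimination item (the belt) barely separates two people whose
# trait levels differ by two units, while a high-discrimination item
# (slim-fit pants, say) separates them sharply.
belt = two_pl_probability(1.0, 0.0, 0.3) - two_pl_probability(-1.0, 0.0, 0.3)
pants = two_pl_probability(1.0, 0.0, 2.0) - two_pl_probability(-1.0, 0.0, 2.0)
```

With discrimination fixed at 1 the 2PL reduces to the Rasch model; with a small slope, as in the belt example, moving a full unit in trait level barely changes the predicted answer, which is exactly why the Rasch assumption can cost you information.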
Likewise, someone with bigger thighs but a relatively slim torso might answer differently about a pair of slim-fit pants of size X that are too small for their thighs than about a belt of size X. Thus questions about pants may have higher discrimination than questions about a belt.<p>Item Response Theory beyond the Rasch model also has a third factor to consider, on a per-question and per-individual basis: the propensity to guess. Basically, how likely is someone to think carefully about the question as opposed to just putting down a random answer, and likewise, are some questions more likely than others to be answered blithely instead of earnestly?<p>The other thing to consider is that in most IRT tests, the latent trait is assessed at a single time across multiple questions. But weight/fit/measurements are here being assessed item by item, as items are tried, and the underlying fit may be changing if a person is gaining weight or bulk, retaining water, or recovering from Thanksgiving dinner. While it's unlikely that someone's weight or size would change radically in a brief period, a model that weighted more recently tried items more heavily might better reflect the individual's measurements.<p>Of course it's been years and years since I took the class, so any screwup in this comment should reflect on me, and not my professor.<p>1 I was writing a function in R to speed up fitting a curve for an IRT model, in a way that let me do it in seconds instead of hours. (It's been a while, but I think it was identifying the point on the curve where the slope is maximized.) In any case, it was a time-consuming computation if you checked every possibility linearly to six decimal places for hundreds of test takers. But I figured there weren't local maxima, so I optimized with something like a binary search (but by decimal place), before I had ever heard of binary searches, and getting that sort of efficiency jump was deeply satisfying.
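The digit-by-digit search in that footnote can be sketched roughly like this (Python rather than the original R, with a made-up unimodal objective standing in for the real slope calculation, which I'm recalling from memory):

```python
def refine_maximum(f, lo, hi, decimals=6):
    """Maximize a unimodal function f on [lo, hi], one decimal place
    at a time.

    A linear scan at 1e-6 resolution over a span of 10 would cost ten
    million evaluations per curve; scanning coarsely and then zooming
    in one decimal place around the best point so far costs a couple
    hundred. Like a binary search, this relies on there being no
    local maxima.
    """
    # Coarse pass at whole-unit resolution.
    best = max((lo + i for i in range(int(hi - lo) + 1)), key=f)
    for d in range(1, decimals + 1):
        step = 10.0 ** -d
        # Scan ten of the new, finer steps on either side of the
        # current best, i.e. one full step at the previous resolution.
        candidates = [best + k * step for k in range(-10, 11)]
        best = max((x for x in candidates if lo <= x <= hi), key=f)
    return best

# Example: the peak of an inverted parabola, recovered to ~6 decimals.
peak = refine_maximum(lambda x: -(x - 2.345678) ** 2, 0.0, 10.0)
```

Each round of zooming costs about 21 evaluations, so the total work grows linearly in the number of decimal places rather than exponentially, which is where the seconds-instead-of-hours jump comes from.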