I appreciate a good ol' logistic regression model. I know deep learning is hot shit right now, but this right here is probably the best way to solve most real-world ML problems. Just good data, insightful features, and a simple classifier.
You can't write a blog post on how to build a statistical model <i>without stating how good the model actually is</i>, along with validating the other regression details such as independence of features, train/test split, etc. (I am coincidentally working on a blog post on a similar San Francisco dataset which specifically addresses these concerns, so it's on my mind.)<p>Logistic regression in particular offers many diagnostics that reveal feature importance <i>or lack thereof</i>, and many metrics to confirm model quality, so it is disappointing to see this post only give a high-level overview. Yes, it may be a Google trade secret, but there has to be give-and-take.
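To make the point about validation concrete, here is a minimal sketch of the checks I mean: a held-out split, held-out accuracy and log loss, and a look at the fitted weights. It runs on synthetic data, not anything from the post, and the names `make_data`, `fit`, and `evaluate` are my own illustrative helpers. The fit is plain gradient descent in pure Python, which is an assumption for the sake of a self-contained example and not a claim about how Google trained their model.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp never overflows on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n, rng):
    """Synthetic rows: x1 drives the label, x2 is pure noise (both made up)."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = 1 if rng.random() < sigmoid(3 * x1) else 0
        rows.append(([x1, x2], y))
    return rows


def fit(rows, epochs=500, lr=0.5):
    """Full-batch gradient descent on the mean log loss."""
    w, b, n = [0.0, 0.0], 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def evaluate(rows, w, b):
    """Return (accuracy, mean log loss) on the given rows."""
    correct, loss = 0, 0.0
    for x, y in rows:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        correct += int((p >= 0.5) == (y == 1))
        loss -= math.log(p) if y == 1 else math.log(1.0 - p)
    return correct / len(rows), loss / len(rows)


rng = random.Random(0)
data = make_data(400, rng)
train, test = data[:300], data[300:]  # hold out 25% that the fit never sees

w, b = fit(train)
acc, loss = evaluate(test, w, b)
print("weights:", w, "bias:", b)
print("held-out accuracy:", acc, "held-out log loss:", loss)
```

Because both synthetic features are on the same scale, comparing the magnitudes of the two weights gives a rough read on which feature matters; with real features you would standardize first, and a log loss near ln 2 (about 0.69) would mean the model is no better than a coin flip.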
> we were able to ... utilize anonymous aggregated information from users who opt to share their location data<p>That should read, "from users who did not disable the on-by-default sharing of their location data"
"When we started the training process, many of us thought that the “fingerprint” feature described above would be the “silver bullet” that would crack the problem for us. We were surprised to note that this wasn’t the case at all — in fact, it was features based on the dispersion of parking locations that turned out to be one of the most powerful predictors of parking difficulty."<p>I assume dispersion of parking locations is the distance from parking location to destination? I would have liked to see more about what kinds of inputs they used and how they cleaned them up to account for the confounding factors they mention (public transit users, private parking.)
> in a pre-launch experiment, we saw a significant increase in clicks on the transit travel mode button, indicating that users with additional knowledge of parking difficulty were more likely to consider public transit rather than driving.<p>This shows pretty clearly that we shouldn't try to accommodate cars as much as possible when there is already good public transport at a given location.
I was just retaking CS261 on Coursera alongside a friend (we're in week 3) and they were asking, "What good is this anyways?"<p>Related techniques and how to implement them are covered in the first 2 weeks. While a lot more is going on in this system, one could call the core of the system that does this estimation "simple" for the field.
I am wondering how much of this gets to be real-time. Are they computing the difficulty of finding a spot based on Maps/Waze users' live data or using daily/weekly patterns on past data?