Data scientists blog about Caltrain data, come up with convoluted hypothesis about bias in sensors at two stations.<p>Commenter on blog notices that Caltrain is occasionally single-tracking between those stations due to a bridge replacement. [1]<p>"Data science" ends up with a bloody neck from Occam's razor.<p>[1] <a href="http://www.caltrain.com/projectsplans/Projects/Caltrain_Capital_Program/San_Mateo_Bridges_Replacement_Project.html" rel="nofollow">http://www.caltrain.com/projectsplans/Projects/Caltrain_Capi...</a>
Seriously, need to break that vicious cycle: you get used to things going bad, the people who run them get used to not delivering, breaking due process, posting unrealistic schedules, and BAM - now you think the problem can't be solved, only plotted.<p>I don't remember the last time I didn't see a train at a station at its scheduled time, and the place where I live doesn't inspire confidence. When something is late, it gets in the news. Not every week.
If only Caltrain had a decent API for developers that produced the data needed for any serious analysis (like train number, speed and lat/long). I hate having to resort to hacky workarounds like scraping for this info. I believe the NextBus API for MUNI has actual positional data and vehicle info. Caltrain, being the mediocre agency it is, has a next-to-useless API (if I remember the docs on 511.org correctly).