
Statistical Data Mining Tutorials (2005)

56 points by ValentineSmith almost 10 years ago

2 comments

graycat almost 10 years ago
Some of his math notation is not so good.

His 22 slides on game theory go on and on but are not clear on just the really simple solution: it's just a really simple linear programming problem. Could knock it off on one slide, two or three if one wanted to be verbose. I did that when I taught linear programming in college and an MBA program.

More generally, a large fraction of these topics, and a larger fraction of the more basic tools, are what was long called the *mathematical sciences*, where generally the work was done more carefully, in particular the mathematics of operations research along with pure and applied statistics.

He ends up with genetic algorithms and simulated annealing. Gee, I encountered such a problem only once: some guys had a resource allocation problem and formulated it as a 0-1 integer linear program with 40,000 constraints and 600,000 variables. They had tried simulated annealing, ran it for days, and stopped with results whose objective function value was an unknown distance from the optimal value.

I saw an approach via Lagrangian relaxation, which really needs most of a nice course in optimization, wrote some software, and got a feasible solution with objective function value guaranteed to be within 0.025% of optimality. My software ran for 905 seconds on an old 90 MHz PC.

For the bound of 0.025%, Lagrangian relaxation maintains both a lower bound and an upper bound on the optimal value of the objective function and, during the *relaxation*, lowers the upper bound and raises the lower bound. When the two bounds are close enough for the context, take the best feasible solution so far and call the work done.

I'd type in the basics here except I'd really need TeX.

The resource allocation problem was optimization, just optimization, and needed just some of what had long been known in optimization. Simulated annealing didn't look very good, and it wasn't.

Optimization, going back to mathematical programming (unconstrained, constrained, the Kuhn-Tucker conditions, linear programming, network linear programming, integer programming, dynamic programming, etc.), was a well-developed field starting in the late 1940s, with a lot of the work rock solid by 1980.

Good work has come from Princeton, Johns Hopkins, Cornell, Waterloo, Georgia Tech, the University of Washington, etc.
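For reference, the "one slide" solution is presumably the standard minimax linear program: for a zero-sum game with payoff matrix A = (a_ij) paid by the column player to the row player, the row player's optimal mixed strategy x and the game value v solve

```latex
\[
\begin{aligned}
\max_{x,\,v} \quad & v \\
\text{s.t.} \quad & \textstyle\sum_i a_{ij}\, x_i \;\ge\; v && \text{for every column } j,\\
& \textstyle\sum_i x_i = 1, \qquad x_i \ge 0.
\end{aligned}
\]
```

The column player's problem is exactly the dual LP, so strong LP duality recovers the minimax theorem in one line.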
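And a minimal Python sketch of the bound-tightening loop the comment describes. This is not graycat's software: the problem shape (a minimization 0-1 program, min c'x s.t. Ax <= b with nonnegative coefficients), the diminishing subgradient step, and the greedy repair heuristic that supplies feasible upper bounds are all assumptions here.

```python
import numpy as np

def lagrangian_relaxation(c, A, b, max_iters=1000, rel_gap=2.5e-4):
    """Subgradient-based Lagrangian relaxation for
        min c'x  s.t.  A x <= b,  x in {0,1}^n   (A assumed nonnegative).
    Relaxing A x <= b with multipliers lam >= 0 gives the dual function
        L(lam) = min over binary x of  c'x + lam'(A x - b),
    a lower bound on the optimum; any feasible x gives an upper bound.
    """
    m, n = A.shape
    lam = np.zeros(m)
    best_lower, best_upper, best_x = -np.inf, np.inf, None
    for k in range(1, max_iters + 1):
        # The relaxed problem separates by variable: set x_j = 1 exactly
        # when its reduced cost c_j + lam'A_j is negative.
        reduced = c + A.T @ lam
        x = (reduced < 0).astype(float)
        best_lower = max(best_lower, reduced @ x - lam @ b)  # raise lower bound

        # Hypothetical greedy repair: drop the chosen variable with the worst
        # reduced cost until A x <= b holds, yielding a feasible upper bound.
        x_feas = x.copy()
        while (A @ x_feas > b).any() and x_feas.any():
            chosen = np.flatnonzero(x_feas)
            x_feas[chosen[np.argmax(reduced[chosen])]] = 0.0
        if (A @ x_feas <= b).all() and c @ x_feas < best_upper:
            best_upper, best_x = c @ x_feas, x_feas  # lower the upper bound

        # Stop once the bounds certify near-optimality, e.g. the 0.025%
        # relative gap mentioned in the comment.
        if best_upper - best_lower <= rel_gap * abs(best_upper):
            break

        # Subgradient ascent on the dual: g = A x - b, diminishing step 1/k.
        lam = np.maximum(0.0, lam + (A @ x - b) / k)
    return best_x, best_lower, best_upper
```

In practice the repair heuristic and the step-size rule are where the engineering lives; a problem-specific version of both is presumably what made the 905-second run possible.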
Reply #9671587 not loaded.
cs702 almost 10 years ago
These slide tutorials are *excellent*: engaging and friendly but still rigorous enough that they can be used as reference materials. They're a great companion to "Introduction to Statistical Learning" and "The Elements of Statistical Learning" by Hastie, Tibshirani, et al. The author of these tutorials is Andrew Moore, Dean of the School of Computer Science at Carnegie Mellon.
Reply #9671786 not loaded.