A few comments have mentioned neural nets in this post. adamnemecek mentions in this thread that PGMs are a superset of neural networks, and Thomas Wiecki has a few excellent blog posts on creating Bayesian neural networks using PyMC3 [0][1][2]. If you're curious about how these two concepts can be brought together, I highly recommend reading through these three posts.

[0] http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/

[1] http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/

[2] http://twiecki.github.io/blog/2017/03/14/random-walk-deep-net/
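The gist of those posts, as a minimal sketch (the toy data and layer sizes here are mine, not Wiecki's): put priors on the network weights and approximate the posterior with variational inference.

    import numpy as np
    import pymc3 as pm

    # Toy binary-classification data, invented for illustration
    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)
    y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like labels

    n_hidden = 5
    with pm.Model() as bnn:
        # Priors over the weights instead of point estimates
        w_in = pm.Normal("w_in", mu=0, sd=1, shape=(2, n_hidden))
        w_out = pm.Normal("w_out", mu=0, sd=1, shape=(n_hidden,))

        # One hidden tanh layer feeding a sigmoid output
        hidden = pm.math.tanh(pm.math.dot(X, w_in))
        p = pm.math.sigmoid(pm.math.dot(hidden, w_out))
        pm.Bernoulli("obs", p=p, observed=y)

        # ADVI: variational approximation to the posterior over weights
        approx = pm.fit(n=20000, method="advi")
        trace = approx.sample(1000)  # weight samples -> uncertainty estimates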
PGMs are great, but my experience from Koller's course is that it is very hard to identify cases where they can be used.

Part of the reason is that you need a priori knowledge of the causal relationships between your variables, at least coarse-grained (i.e. the direction of the edges).

Presumably, if you're doing ML you don't know those causal relationships to begin with.

Particularly good fits are domains like physics where the laws are known.
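To make "you must supply the edge directions" concrete, here's a sketch using the pgmpy library (the class is BayesianModel in older versions) on the classic rain/sprinkler example; the structure and CPD numbers are illustrative assumptions, not from the comment above.

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    # The causal directions are supplied by hand: Rain -> WetGrass <- Sprinkler.
    # Getting these arrows right without prior knowledge is the hard part.
    model = BayesianNetwork([("Rain", "WetGrass"), ("Sprinkler", "WetGrass")])

    cpd_rain = TabularCPD("Rain", 2, [[0.8], [0.2]])
    cpd_sprinkler = TabularCPD("Sprinkler", 2, [[0.6], [0.4]])
    cpd_grass = TabularCPD(
        "WetGrass", 2,
        # P(WetGrass | Rain, Sprinkler), one column per parent combination
        [[1.0, 0.1, 0.1, 0.01],   # grass dry
         [0.0, 0.9, 0.9, 0.99]],  # grass wet
        evidence=["Rain", "Sprinkler"], evidence_card=[2, 2],
    )
    model.add_cpds(cpd_rain, cpd_sprinkler, cpd_grass)

    # Once the structure is fixed, inference is mechanical
    infer = VariableElimination(model)
    print(infer.query(["Rain"], evidence={"WetGrass": 1}))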
This is the best textbook on graphical models, also from Jordan but later (2008): https://people.eecs.berkeley.edu/~wainwrig/Papers/WaiJor08_FTML.pdf. It also covers some general theory of variational inference. Source: I worked on PGMs in grad school.
This course also refers to the M. I. Jordan book: http://imagine.enpc.fr/~obozinsg/teaching/mva_gm/fall2016/

One of the best courses I have ever taken; F. Bach and G. Obozinski are incredible teachers.
There's an excellent course on PGMs by Koller on Coursera. My friend took it and now he's a PGM evangelist. If you are wondering where PGMs lie in the spectrum of machine learning, you should research the difference between generative and discriminative modeling. We were driven to PGMs to solve an ML problem that was hard to frame as a NN, mainly because we had some priors we needed to encode to make the problem tractable. It reminds me a little of heuristics in search.

(The person I'm talking to: an early ML student.)
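A toy illustration of what "encoding a prior" buys you (the numbers are mine, not the poster's actual problem): with only a handful of observations, a Beta prior keeps the estimate of a coin's bias reasonable where the maximum-likelihood estimate is overconfident.

    # Beta-Bernoulli: a conjugate prior gives a closed-form posterior
    heads, tails = 3, 0          # tiny sample: three heads, no tails
    alpha, beta = 2.0, 2.0       # prior belief: the coin is roughly fair

    mle = heads / (heads + tails)  # 1.0 -- certain the coin always lands heads
    posterior_mean = (alpha + heads) / (alpha + beta + heads + tails)  # ~0.71

    print(f"MLE: {mle:.2f}, posterior mean: {posterior_mean:.2f}")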
Good article: "Big-data boondoggles and brain-inspired chips are just two of the things we're really getting wrong" - Michael I. Jordan. Ref: http://spectrum.ieee.org/robotics/artificial-intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-data-and-other-huge-engineering-efforts
From: http://spectrum.ieee.org/robotics/artificial-intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-data-and-other-huge-engineering-efforts

Jordan: Well, humans are able to deal with cluttered scenes. They are able to deal with huge numbers of categories. They can deal with inferences about the scene: “What if I sit down on that?” “What if I put something on top of something?” These are far beyond the capability of today’s machines. Deep learning is good at certain kinds of image classification. “What object is in this scene?”

I think Jordan is referring here to Bayesian models that incorporate gravity, occlusion, and other such concepts. For example, http://www.cv-foundation.org/openaccess/content_cvpr_2013/html/Jiang_Hallucinated_Humans_as_2013_CVPR_paper.html postulates entire humans to improve scene understanding.

What I get out of this: deep learning has to be enriched with progress from other machine learning fields.
Nice high-level talk on Statistical Inference for Big Data by Jordan. He's been one of my favorites since his LDA/PLSA papers in 2003 with Andrew Ng.
http://videolectures.net/colt2014_jordan_bigdata/?q=jordan
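For anyone who hasn't met LDA yet, here's a minimal sketch of fitting the topic model from that 2003 Blei/Ng/Jordan paper using scikit-learn; the toy corpus is invented for illustration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the cat sat on the mat",
        "dogs and cats are pets",
        "stocks fell as markets opened",
        "investors sold stocks and bonds",
    ]

    # Bag-of-words counts, then a 2-topic LDA fit
    counts = CountVectorizer().fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

    # Per-document topic mixtures: the latent structure LDA infers
    print(lda.transform(counts).round(2))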
Note there's an exploding literature that reads these models as causal models: http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
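To sketch what the causal reading adds (a made-up structural model, not one from the paper): conditioning on a variable and intervening on it give different answers whenever a confounder is present.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # Confounded structural model: Z -> X, Z -> Y, and X -> Y
    Z = rng.binomial(1, 0.5, size=n)
    X = rng.binomial(1, 0.2 + 0.6 * Z)            # X depends on Z
    Y = rng.binomial(1, 0.1 + 0.3 * X + 0.4 * Z)  # Y depends on X and Z

    # Observational: P(Y=1 | X=1), inflated because Z drives both X and Y
    p_cond = Y[X == 1].mean()

    # Interventional: P(Y=1 | do(X=1)) -- cut the Z -> X edge, force X = 1
    Y_do = rng.binomial(1, 0.1 + 0.3 * 1 + 0.4 * Z)
    p_do = Y_do.mean()

    print(f"P(Y | X=1) ~ {p_cond:.3f}   P(Y | do(X=1)) ~ {p_do:.3f}")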