Nice read. I did something similar with the same dataset about a year ago: I compared LDA (Latent Dirichlet Allocation) to TF-IDF as tools for finding similar beers based on their review text. LDA discovered lots of intuitive and funny topics.<p>I suggest you play with LDA; it seemed to work really well at generating topics. There is also a lot of fascinating, very readable research using it. Check out SNAP's work on the same dataset [1] and some of the Yelp Dataset Challenge winners [2]. If you end up trying it, Gensim [3] was pleasant enough to work with.<p>[1] <a href="http://snap.stanford.edu/data/web-BeerAdvocate.html" rel="nofollow">http://snap.stanford.edu/data/web-BeerAdvocate.html</a><p>[2] <a href="http://www.yelp.com/dataset_challenge" rel="nofollow">http://www.yelp.com/dataset_challenge</a><p>[3] <a href="https://radimrehurek.com/gensim/wiki.html#latent-dirichlet-allocation" rel="nofollow">https://radimrehurek.com/gensim/wiki.html#latent-dirichlet-a...</a>
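For anyone wondering what the TF-IDF side of that comparison might look like, here is a minimal sketch using only the standard library. It is not the code from the comparison above: the review strings, beer names, and function names are all made up for illustration, and the weighting (term frequency times log inverse document frequency, compared by cosine similarity) is one common variant among several.

```python
import math
from collections import Counter

# Hypothetical toy reviews; real BeerAdvocate text is not reproduced here.
reviews = {
    "stout_a": "roasted coffee chocolate dark malt",
    "stout_b": "dark chocolate roasted malt creamy",
    "ipa_a": "citrus pine hops bitter grapefruit",
}


def tfidf_vectors(docs):
    """Return {name: {term: tf-idf weight}} for whitespace-tokenized docs."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many reviews does each term appear?
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, vectors):
    """Return the other beer whose review vector is closest to `name`."""
    others = (other for other in vectors if other != name)
    return max(others, key=lambda other: cosine(vectors[name], vectors[other]))


vectors = tfidf_vectors(reviews)
print(most_similar("stout_a", vectors))
```

With real review text you would probably want proper tokenization and stop-word removal first. An LDA version would compare per-beer topic distributions instead of raw term weights.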
Great post! I've been thinking about writing something similar with that same BeerAdvocate data. Good job beating me to it :)<p>Instead, I ended up writing a satirical beer snob bot [1] which tweets nonsensical beer reviews using Markov Chains. Some are bad, but some are pure gold. You can read about it here [2]. The code's also on GitHub [3].<p>[1] <a href="https://twitter.com/BeerSnobSays" rel="nofollow">https://twitter.com/BeerSnobSays</a><p>[2] <a href="http://www.gregreda.com/2015/03/30/beer-review-markov-chains/" rel="nofollow">http://www.gregreda.com/2015/03/30/beer-review-markov-chains...</a><p>[3] <a href="https://github.com/gjreda/beer-snob-says" rel="nofollow">https://github.com/gjreda/beer-snob-says</a>
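For readers who have not seen the technique, a word-level Markov chain text generator can be sketched roughly as below. This is not the bot's actual code (that is in the linked repo); the corpus sentences, function names, and the choice of a first-order chain are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical mini-corpus standing in for real review text.
corpus = [
    "pours a hazy amber with a thin white head",
    "pours a dark brown with a creamy tan head",
    "smells of citrus and pine with a bitter finish",
]


def build_chain(texts, order=1):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, max_words=20, seed=None):
    """Random-walk the chain from a random starting state."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


chain = build_chain(corpus)
print(generate(chain, max_words=12, seed=0))
```

A higher `order` tends to give more coherent but less surprising output, since each next word is conditioned on more context.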
For anyone interested in beer and data science, my startup [1] uses machine learning and artificial intelligence to build flavor profiling and quality control tools for craft beverage producers.<p>Our models flag and predict flaws, taints, contaminations, and batch-to-batch deviations in real time from human sensory data. We then leverage our clients' quality control data for flavor profile optimization, demographic targeting, and cognitive marketing - helping them sell consistently better products to their most valuable consumers.<p>[1] www.Gastrograph.com
I just came across a relevant site this morning. Hilarious hipster brew review satire: <a href="http://vicioustasting.com/" rel="nofollow">http://vicioustasting.com/</a>