Ask HN: Help me choose a data science research project

6 points by csdrane over 11 years ago
I'm mulling over the idea of working my way through a data science text and self-teaching. I'd probably create a blog to document my efforts and to serve as notes to myself. I think I'd learn more and have more fun if I had a research project that I could work through as I learn.

I'm very much interested in finance and economics. Additionally, professionally I work in commercial real estate. However, I don't know how well these subjects would lend themselves to research projects. Generally trying to predict the markets is a fool's game. So I'm wondering what unexplored, worthwhile areas of research might exist. I'm reaching out to the HN community to see if you guys have any interesting ideas. Thanks!

5 comments

j2h6mW over 11 years ago
The hardest part of my practical coursework in statistics was picking good, free data sets for final projects. Pick something awesome, your final presentation will be awesome; pick something lame, and your final project gets you an A in Dejected Foot-Shuffling 101. If anything, the best data sets *weren't* from the sexy, unexplored fields. Remember how everyone tells you to pick classes by the professor, not the subject? It's a similar counter-intuitive thing for data sets. Find a data set that's rich and complete, and even if it's not a topic you're interested in now, you'll secretly love it by the end of the term.

Enough sermonizing. Here's a list of data set ideas that served me well in my youth:

1. R comes with a lot of built-in data sets. Open up R and run the command "data()" to see the list. Many R packages come with additional data sets (I like the diamonds one from ggplot2). All these built-in data sets are sort of small and not really project-worthy, but they're nice if you're just playing around with new techniques.

2. Government agencies release large, interesting data sets. Weather, census reports, travel statistics, public health data... The only problem is that they're usually a pain to query. Think outside your own country. And get ready for spatial stuff.

3. Academic institutes release pretty neat data, too. Natural science stuff, geology stuff... Again, here comes spatial data analysis.

4. Data journalists sometimes publish their data along with the story, and usually, they haven't found *nearly* all the cool stuff in there yet. This, for instance, looks insanely fun: http://project.wnyc.org/dogs-of-nyc/

5. Sports data is free like tap water, terrifyingly detailed, and deeply cool indeed.

6. Natural language processing. Check out Project Gutenberg! I like these analysis projects... http://lotrproject.com/statistics/books/, http://bost.ocks.org/mike/miserables/

7. Make your own data! Do you have a pedometer? Records of what temperature your house is? Some bloggers in the "Quantified Self" movement seem awfully cavalier about their own privacy, but they have undeniably boffo data.

8. And finally: commercial real estate?! There has got to be *so* much interesting data to work with there. I know you don't think you can predict the markets, but at the very least you could make pretty maps and pictures. Maybe your company will let you play with some data, provided you show them your insights? Don't know if they'd let you blog it all over town, though...

Congrats, my friend, you are one of us now. The people who drool over CSV files.
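For anyone wondering what a first pass over self-collected data (idea 7) might look like, here is a minimal sketch in Python using only the standard library. It is not from the original comment: the `summarize_steps` helper, the `date,steps` column layout, and the step counts are all made up for illustration, and a real project would read an actual exported file instead of an inline string.

```python
import csv
import io
import statistics

# Made-up pedometer log, for illustration only (hypothetical values and columns).
RAW_CSV = """date,steps
2013-08-26,4200
2013-08-27,8100
2013-08-28,6500
2013-08-29,10400
2013-08-30,7300
"""


def summarize_steps(csv_text):
    """Return day count, mean, median, and busiest day from a date,steps CSV."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    steps = [int(row["steps"]) for row in rows]
    busiest = max(rows, key=lambda row: int(row["steps"]))
    return {
        "days": len(rows),
        "mean_steps": statistics.mean(steps),
        "median_steps": statistics.median(steps),
        "busiest_day": busiest["date"],
    }


if __name__ == "__main__":
    for name, value in summarize_steps(RAW_CSV).items():
        print(f"{name}: {value}")
```

Swapping the inline string for `open("steps.csv").read()` (a hypothetical file name) would be enough to point the same summary at a real export.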
tmoullet over 11 years ago
I know that there are a few municipalities in the U.S. that have made some of their governmental records available via API. Off the top of my head, I'm not sure which ones, but there is likely a boatload of under-analyzed data there. Similarly, the Census Bureau has a lot of large data sets.

Which book are you going to be studying?
ig1 over 11 years ago
Have you looked at Kaggle? It's a good place to learn, as you can benchmark yourself against others, and after a contest has closed there's normally a fair amount of post-game analysis as people share approaches.
agibsonccc over 11 years ago
Browse through here: http://archive.ics.uci.edu/ml/

You might be able to pick up a few things you want to predict based on the datasets here.
stocktradr over 11 years ago
Are you looking for a programming challenge or something? I've got an idea off the top of my head that I'd love to share, but I don't know what kind of experience you have or are looking for.