TechEcho

Engineering Practices in Data Science

24 points by numlocked over 12 years ago

4 comments

jboggan over 12 years ago
Well, I guess I'll pat myself on the back for always using git. Having seen a lab's major university research project literally evaporate due to lack of proper version control, I've been keen on it ever since. Besides, I like the incremental feel of accomplishment when I make the next commit.

I agree that setting up a dedicated pipeline early on in a project and committing yourself to work within its confines aids organization, but it also contributes to putting your creativity in the right place. We often enjoy building things for the sake of building, and those of us with stronger proclivities towards engineering can sometimes get too jolly putting together new pipes when we should be training models. I have been guilty in the past of writing a Perl script that spawned shell scripts that cued Perl scripts on a cluster that ran R scripts that piped back into Perl scripts. Then again, I always liked the board game Mouse Trap.
rm999 over 12 years ago
Cool article, thanks. I used to spend 90% of my downtime trying to improve my stats and machine learning knowledge, but in the last couple years I've come to realize how much my lack of proper engineering was hurting me.

A (flawed) analogy: if data science is storytelling, stats is the story and engineering is the words you use to tell it. You need to do well at both to effectively tell your story.

> ...good engineering going out the window quickly with elaborate ensembling.

This is one of my criticisms of data mining contests (sorry kaggle!). When I was in grad school I liked doing these contests to get practical experience - my last company actually recruited me through one. But as they got more popular I found the best engineers and data scientists stopped having as much of a chance of winning. Good modelers get 90% of the way there and then are beaten by impractical solutions that would get someone fired from a job.
numlocked over 12 years ago
It appears our blog is down. Oof. Apologies for the inconvenience. While we're figuring it out, here is a mirror of the post: http://blog.untrod.com/2012/10/engineering-practices-in-data-science.html

Edit: And we're back up.
peatmoss over 12 years ago
I love definitions that wax my ego! I grok revision control, and can sling enough R / stats to officially make me a nerd of an urban planner. Does this mean I can start calling myself an "urban data scientist"? I definitely need a pay raise.