TechEcho

An Interview with an Anonymous Data Scientist (2016)

265 points by PaulJulius over 7 years ago

25 comments

Terr_ over 7 years ago

Good interview; there are a bunch of bits I feel like I ought to be Quoting For Truth, but then I'd end up with a pretty bloated reply.

> I want to emphasize that historically, from the very first moment somebody thought of computers, there has been a notion of: "Oh, can the computer talk to me, can it learn to love?" And somebody, some yahoo, will be like, "Oh absolutely!" And then a bunch of people will put money into it, and then they'll be disappointed.

Reminds me of a pre-transistor computing quote from Charles Babbage, about some overeager British politicians:

> On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
sriku over 7 years ago

> You become so acutely aware of the limitations of what you're doing that the interest just gets beaten out of you. You would never go and say, "Oh yeah, I know the secret to building human-level AI."

A colleague of mine called these "educated incapacities": we become so acutely aware of impossibilities that we lose sight of possibilities. Andrej Karpathy, in one of his interviews IIRC, said something like "if you ask folks in nonlinear optimization, they'll tell you that DL is not possible".

It is useful to keep that innocence alive despite being educated, especially if the cost of trying something out doesn't involve radical health risks. That, plus a balance with scholarship.

Knowledge, courage, and the means to execute are all needed.
nocoder over 7 years ago

I work at a tech company, and one of the things I have recently noticed is how the terms ML and AI are increasingly being used by business people with no technical understanding: accountants or marketing guys saying we should ask the tech team to design ML to solve these problems. It's as if ML is a thing to throw at every imaginable problem and it will magically be solved. I believe a lot of this has to do with the PR around it from big tech companies. Take, for example, the recent AlphaZero vs. Stockfish PR: Google has spun it as if it were magic. You hear a lot about how it took just 4 hours, and I find it hard to explain to people that the 4-hour figure is meaningless; what matters is how many games it could play in that time. Moreover, the match was between two systems on different hardware, which is a big difference, and it used an arbitrary time control of 1 min/move. Again, this can make a big difference, but it is a real struggle to get past the PR fluff. To be clear, I am not denying the advances made by DeepMind; I just want people to understand that they came on the back of probably the world's best team of scientists, alongside state-of-the-art Google-designed hardware and Google's incredible monetary resources.
trts over 7 years ago

This articulated so much of what I have learned about the field in the past 5 years. I inherited the title 'data scientist' because that's how my department designated us when it became fashionable, felt fraudulent due to the gap between the unlimited expectations of data science and what I understood it to be, and subsequently interviewed probably nearly a hundred data science and machine learning 'experts'. From that vantage point: there seems to be little cohesion to what these terms describe, little understanding by laypersons of data science beyond that it is some kind of magic that only the very gifted can command, and no greater distance between hubris and praxis that I have seen sustain itself for so long and so intensely.

The whole interview was an absolute joy to read.
carlsborg over 7 years ago

It was 2016, and he said: "Something I've noticed on AWS prices: a few months ago, the spot prices on their GPU compute instances were $26 an hour for a four-GPU machine, and $6.50 an hour for a one-GPU machine. That's the first time I've seen a computer that has human wages."

Minimum wage (or thereabouts, $7.20/hour) now gets you a whopping p2.8xlarge (8 GPUs, 32 vCPUs, 488 GB RAM), and the single-GPU p2.xlarge is now $0.90 per hour.

This is a crazy data point. What will minimum wage buy you five years from now?
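The comparison is easiest to see as cost per GPU-hour; a quick sketch using only the prices quoted in the comment:

```python
# Cost per GPU-hour, using the spot/on-demand prices quoted above.
prices = {
    "2016 spot, 4-GPU": (26.00, 4),   # ($/hour, number of GPUs)
    "2016 spot, 1-GPU": (6.50, 1),
    "p2.8xlarge":       (7.20, 8),    # ~minimum wage, per the comment
    "p2.xlarge":        (0.90, 1),
}

for name, (dollars_per_hour, gpus) in prices.items():
    print(f"{name}: ${dollars_per_hour / gpus:.2f} per GPU-hour")
```

By this arithmetic the price of a GPU-hour fell from $6.50 to roughly $0.90 over the period the comment describes.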
CalChris over 7 years ago
This reminds me of ... What’s the difference between a data scientist and a statistician? A data scientist lives in San Francisco.
sundarurfriend over 7 years ago

It's an interesting read, though not very enlightening in terms of new information: the same old pre-existing arguments in a more informal, more directly honest package.

As another person who's seen robots fall over again and again and has a sense of the difficulty of the problem, I'd say there's also the risk that the day-to-day failures make us lose sight of the forest for the trees, with availability bias working against us.

Also,

> the Y Combinator autistic Stanford guy thing

> the Aspy worldview

It's a bit worrying that the use of these terms has turned into a kind of slur, lumping an imagined stunted worldview together with a medical diagnosis. I'm not particularly pissed that this guy used them; I'm more worried about what it indicates: that they have become common enough to infiltrate friendly, informal conversations among seemingly intelligent people.
MikeGale over 7 years ago

It is just so amazingly refreshing to read something not put together by a know-nothing.

I wish I saw more than one or two of these a year.
comstock over 7 years ago

Any bets on when the current deep learning bubble is going to burst?

It's shocking to me how much technical people buy into this, how "this time it's different" and AI isn't "over-promising and substantially under-delivering" this time. Really odd to watch it come round again, when the reality is we're more likely to see modest incremental progress, fueled partly by more compute and algorithmic advances, and partly by a lot of PR.
deviationblue over 7 years ago

I've noticed an alarming uptick in articles around job titles and what people call themselves, so I feel compelled to say something. I couldn't care less what someone calls themselves as long as they can actually get shit done. The focus on titles is misplaced, especially for people who work at a BigCo, where most titles are handed down by HR anyway, so I don't put much weight on them. What is the person actually doing on a day-to-day basis? Is it stats? Is it exploratory analysis and modeling? Are they using ML, or working with data that doesn't fit on a single commodity machine? Writing people off based on titles they held at some job (which they probably had no control over) is a good way to lose out on talent you might have appreciated. Of course, this cuts both ways: would you want to work for someone who gets hung up on things like that?

Anyway, overall a great article, but this was the one thing that bothered me enough to comment.
nicolewhite over 7 years ago

I enjoyed his comments on Tensorflow.

> It's really bad to use. There's so much hype around it, but the number of people who are actually using it to build real things that make a difference is probably very low.

I wonder how many data scientists out there are actually developing Tensorflow models for a mission-critical project at work. I'm not. I have used Tensorflow successfully within my personal projects, but I've yet to need it for anything "real."
EdwardDiego over 7 years ago

Can anyone comment on his point about Spark's ML libs? I note that was from last year (about 2015 code), and I'm not sure what level of beta they were at. I use Spark for batch processing but have never used the ML side, so I'm just curious.

> And even up to last year, there's just massive bugs in the machine learning libraries that come bundled with Spark. It's so bizarre, because you go to Caltrain, and there's a giant banner showing a cool-looking data scientist peering at computers in some cool ways, advertising Spark, which is a platform that in my day job I know is just barely usable at best, or at worst, actively misleading.
Jesus_Jones over 7 years ago

Hah, this is a great interview! [You can't really trust someone who calls themselves a data scientist; they are just taking that exciting and financially rewarding name], loosely paraphrasing. Too bad it's anonymous. It totally fits my unfair preconceptions of the field. I know, I'm a "computer scientist" with a PhD; it's not a real science if you have to put "science" in the name, that's what they tell me.
perturbation over 7 years ago

I've been seeing nothing but negative, dismissive comments about data science on HN lately, which is really disappointing. There's definitely a lot of hype right now around DL, but almost all of my job does not deal with Big Data or Deep Learning, 'just' machine learning + stats + calc + scripting + data cleaning + deploying models.

I think most people don't have big data (Amazon has an x1 instance with 4 TB of RAM, after all!), and there's no shame in that. I'll use a big machine for grid search or other embarrassingly parallelizable stuff, but I can confirm that Spark is usually a bad tool for actual ML unless you use one of their out-of-the-box algos. Even then, tuning the cluster on EMR with YARN is a pain, especially for pyspark. There's a gap, I think, between the inflated expectations of "I'm going to get general AI in 5 years and CHANGE THE WORLD" and "this K-means clustering will be a good way to explore our reviews", but somewhere in the middle there is actual value.

(I also hate that "AI" is becoming the new hype train; I don't consider anything I do to be "AI", but you have people calling CNNs or even non-deep-learning models "AI".) This is only going to result in inflated expectations: DS practitioners have to communicate the value without hype, and also find a way to weed out charlatans.
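The "embarrassingly parallelizable" grid search mentioned here needs nothing heavier than the standard library. A minimal sketch, where the `score` objective is a hypothetical stand-in for fitting and evaluating a real model:

```python
from itertools import product
from multiprocessing import Pool

def score(params):
    """Placeholder loss: stands in for 'train a model with these
    hyperparameters and return its validation error'."""
    alpha, depth = params
    return (alpha - 0.1) ** 2 + (depth - 3) ** 2

# Every combination of the candidate hyperparameter values.
grid = list(product([0.01, 0.1, 1.0],   # alpha candidates
                    [2, 3, 5]))         # depth candidates

if __name__ == "__main__":
    with Pool() as pool:                 # one worker process per core
        losses = pool.map(score, grid)   # each point is scored independently
    best_loss, best_params = min(zip(losses, grid))
    print("best params:", best_params)
```

Because each grid point is evaluated independently, one big multi-core machine (or `n_jobs=-1` in scikit-learn's `GridSearchCV`) covers this use case without any cluster.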
otalp over 7 years ago

Jeff Hammerbacher, the guy who coined the term "data science", also said: "The best minds of my generation are thinking about how to make people click ads. That sucks."
d--b over 7 years ago

As important as it is to debunk the hype surrounding AI, it is also important to note that the recent advances in neural nets hint that we're onto something regarding the functioning of the brain, and in my opinion it would be equally foolish to dismiss the _possibility_ of a breakthrough that would get us much closer to general AI (for instance, if someone came up with some kind of short-term / long-term memory mechanism that works well).

I personally think the main reason general AI may be very far away is that there is little incentive today to work on it. Specialized AI seems good enough to drive cars. Specialized AI should be good enough to put objects in boxes, cut vegetables, flip burgers, and so on, and the economic impact of building that is much greater than the economic impact of making a robot that barely passes the Turing test and is otherwise fairly dumb or ethically unbounded.
brucephillips over 7 years ago

> the data sets have gotten large enough where you can start to consider variable interactions in a way that's becoming increasingly predictive. And there are a number of problems where the actual individual variables themselves don't have a lot of meaning, or they are kind of ambiguous, or they are only very weak signals. There's information in the correlation structure of the variables that can be revealed, but only through really huge amounts of data

This isn't really true, since it can be said of any ML model. ML is nothing new; deep learning is new. It works because we have so much data that we can start to extract complex, nonlinear patterns.
vadimberman over 7 years ago

> I feel like the Hollywood version of invention is: Thomas Edison goes into a lab, and comes out with a light bulb. And what you're describing is that there are breakthroughs that happen, either at a conceptual level or a technological level, that people don't have the capacity to take full advantage of yet, but which are later layered onto new advances.

Brilliant.
ramtatatam over 7 years ago

I'm not a native English speaker, and I find this sentence from the article odd:

> Because the frightening thing is that even if you remove those specific variables, if the signal is there, you're going to find correlates with it all the time, and you either need to have a regulator that says, "You can use these variables, you can't use these variables," or, I don't know, we need to change the law. As a data scientist I would prefer if that did not come out in the data. I think it's a question of how we deal with it. But I feel sensitive toward the machines, because we're telling them to optimize, and that's what they're coming up with.

So is he saying that he is worried optimisation throws up results that are not what he would like to see?
yters over 7 years ago

DL is hyped as a big thing, but why are multiple layers in a NN a breakthrough? The only breakthrough is hardware, but I don't see that hyped.
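One way to frame the question: depth alone is not the breakthrough, because without a nonlinearity between them, any number of stacked linear layers collapses to a single linear map; depth only buys extra expressiveness once nonlinear activations (and the hardware to train through them at scale) are in place. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))    # a batch of 5 four-dimensional inputs
W1 = rng.normal(size=(4, 8))   # layer 1 weights
W2 = rng.normal(size=(8, 3))   # layer 2 weights

# Two stacked *linear* layers...
deep_linear = x @ W1 @ W2
# ...are exactly one linear layer with the combined weight matrix W1 @ W2:
collapsed = x @ (W1 @ W2)
print(np.allclose(deep_linear, collapsed))  # True

# A nonlinearity (here ReLU) between the layers breaks that collapse:
deep_relu = np.maximum(x @ W1, 0) @ W2
print(np.allclose(deep_relu, collapsed))    # False
```

So "multiple layers" matter only in combination with the activation functions, training tricks, and compute that let deep stacks actually be optimized.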
kerbalspacepro over 7 years ago

Am I the only one who was expecting to learn about data science but instead got some moralising?
DrNuke over 7 years ago
Different communities play a game at different times: the pioneers at first, then the early comers, then the businessmen, then the masses, in the end the legislators.
reesefitz over 7 years ago

I feel like so many data scientists are bullshit. I had the worst interviews, like someone telling me how ARIMA is so good and why would I even use an LSTM network. Even worse, they cite some bullshit consulting article with skewed data to prove their point.
reesefitz over 7 years ago

Some interviewers ask me the stupidest questions: "how large is your dataset?", "have you ever worked with 100GB of data?" Fucking morons.
eanzenberg over 7 years ago

Eh, pretty disappointing interview. It doesn't take a team to utilize GPU computing; it takes one person, and I've done it. Also, you can't complain that there are no strong-AI companies and then list accomplishments of strong-AI companies.

I personally don't like the phrase "data scientist," but I get it, and I get why it's science as opposed to engineering. I personally like the split between machine learning, BI, and data engineering.