Every modeler is supposed to be a great Python programmer

189 points by jeffreyrogers, over 2 years ago

35 comments

version_five, over 2 years ago

This was something that surprised me after I did my PhD as well. I thought that employers would focus on my specialized skills and "someone else" would somehow pick up the pieces and make something out of what I did. Turns out this is completely wrong, and I now see how frustrating it is to work with people that have this kind of attitude.

Most of most jobs is a bunch of mundane stuff. I've seen it in software development, and I've seen it in management consulting. The best people, typically, are those that will happily do both, understanding that the fun stuff comes with a lot of baggage.

The "someone else is better at the stuff I don't want to do than me" argument rarely holds up either. The friction that comes from dividing the work along lines like modeling and production and trying to hand off is rarely worth it when one person can do both.

Anyway, I've been where the author is, but personally I think it's wishful thinking, unless maybe you want to start your own shop and structure it around yourself that way.

opportune, over 2 years ago

Everybody wants to work on the fun stuff. Turns out in software, getting something to actually deliver value (meaning converting it from a prototype or proof of concept to something running in production) is 90% not fun stuff. Most people learn that in their first job.

As a software engineer the only way I’d become a “data scientist/trader’s servant” is if I’m getting paid exorbitantly. Otherwise it’s the worst kind of work because someone else is going to get credit for everything that goes right while I take the flak for all the hard stuff.

The converse is that to get a software engineer to be your servant you better also be really good, or else you’re probably going to end up paired with someone who doesn’t have the luxury of taking other jobs and maybe won’t be particularly good at getting your stuff to actually run.

noobermin, over 2 years ago

The modeler who can write code, unlike you, will get hired first.

Okay, now that I said the provocative thing to kind of drive home how real and serious this point is, I will say that I am not a statistician, but I run more computational physics simulations, so less statistical modelling but modelling of experiments and systems based on the PDEs. The one thing I observe is that there really is only this patience for this lack of understanding your own tools for theorists. Experimentalists can fix their own tools, they can open up the casing and resolder the boards if they need to, heck most of them can fix their own cars. But you have no idea how many computational scientists just load up Lumerical or Ansys and just click around but really have no concept or idea how it works under the hood beyond just things they show on intro slides to talks. Some know how to script, say, Meep or something if they're good, but they've never implemented a DE solver themselves unless it was in a class in college or first year grad school, and then they forgot it all.

You really only have this disconnect from your own tools for theorists. Programming is your breadboard, your substrate. Code is the material you use to do your work. I don't understand why it is okay for theorists of all stripes to slide on never really understanding how their own research actually works on a computer whereas every experimentalist I've ever known could recreate their entire experimental apparatus from scratch if they were paid to do so. But that's okay, because that means as long as too many theorists can only write equations and then have to have someone to hand hold them so they actually *do* the things they've written down, I will be valuable and have job opportunities for myself. It would however make life easier for myself and lessen the many headaches I've been subject to, and heck, maybe science could move forward a little better, yadda yadda.

I probably shouldn't encourage my competition like that especially when they're injuring themselves but that "move science forward and lessen my headaches" vibe does make me want to share the sentiment so that theorists at least understood on some level how the libraries they import work sometimes.

bluedino, over 2 years ago

I loved Python. It was easy to learn, very powerful, has libraries for everything...

Then I started supporting researchers and scientists who wrote "python code" to run simulations etc.

Most of it's pretty basic: install some scientific code published by some research group. They chuck their data in and run it.

But then they started abusing virtual environments, writing their own code, cutting and pasting, commenting out random lines because they saw someone else "fix" something that way... and they all want Jupyter notebooks.

Now it's like an eternal September plus I get to deal with annoyingly slow package managers like Conda and rough academic projects with poor documentation and little testing.

FridgeSeal, over 2 years ago

> every one of them wants Python. I haven’t seen a single one where they’re looking for R or even C++; Python rules this roost.

Tried putting R into production recently? It’s a frustrating and brittle experience. Don’t get me wrong, R is fantastic at what it does - analysis, research, statistics, and arguably the APIs on the R data frame packages are a lot saner than Pandas.

C++ is out for different reasons I suspect. As this touches on, “modellers” (and data scientists/engineers) *ought* to be decent developers, but a lot of them are not, and in my experience actively refuse to learn any of these skills; the comfort zone is “jupyter notebooks” and that’s it. Getting them to write C++ (disregarding language debates), a language that is unequivocally more difficult and fraught with complexity than Python, is basically a non-starter.

Do I wish it was different? Yep. Do I wish there was some more variety in the “language ecosystem” so it’s more than just the “lowest common denominator Python” dominance? Absolutely.

a_t48, over 2 years ago

> I’d much rather get something working and then hand it off to someone else who can refactor it for speed and clarity, and have it conform to the desired style conventions, etc. etc.

I've been on the coding end of this - when everyone actually has those fixed roles and goes into it eyes wide open, it goes pretty well! When instead the PhD is assigned to go do the feature, and then the programmer is called in later when it's not implemented well, it tends to go quite poorly and nobody is happy, as everyone's time is wasted.

toddm, over 2 years ago

This resonates so strongly with me.

I'm a modeler and best described as a lifelong scientific programmer. I'm much, much better at doing the specialty science I was trained to do and have done than at writing unit tests (for Jupyter notebooks? why?) or struggling through big-O questions (again, why? I write ~100 line programs that are never production code). Like the author of the post, I am not a professional developer and don't pretend to be or want to be. There are people out there way better than me for doing those jobs.

Recently, I interviewed for a role as a computational chemist - this is an ideal fit for a person like me with the domain knowledge, advanced degree in the subject area, passion, and a proven (if dated) track record of publishing in the domain. The idea is to use software, amend what's there if/when needed, and apply my knowledge to the process and what comes out of it.

What did the interview start with? A surprise interactive coding challenge that I wasn't prepared for, and thankfully the interviewer was kind and professional enough to understand that my value-add is in, well, computational chemistry and not in the details covered by a CS major.

I thought Jesus, I bet not a single software engineer at this company got asked to do a simple organic synthesis or even a redox problem during their interview process.

c54, over 2 years ago

If OP wants to make models but never worry about them after doing the fun bits, it sounds like they might enjoy academia. The industry premium salary is in part from doing all the work around the “fun part”. As others have mentioned that’s where a lot of the value is, and nobody wants to be your servant.

Though even in academia, you have to write the paper yourself after doing the fun bits.

djha-skin, over 2 years ago

> If a company says that they need excellent Python skills, and they mean it, then I’m not the right person for that job.

Eh, let the company decide that. That's one of the biggest things I tell my buddy who thinks he's bad at programming and really doesn't want to have to look for a job. He's convinced he's horrible but I don't think he is. He just doesn't want to get rejected, so he just avoids interviews as much as possible and tries to stay at the company he's at.

I would imagine that most companies are okay with Python code that is rough and ready as long as it can be integrated by the rest of the team into the production code. Most data programmers are probably expected to be better at the data than the programming.

xLaszlo, over 2 years ago

He should find a SWE and partner up. As more clients enter the ML space there are more of the less sophisticated ones, who have less of a concept of cross-functional teams. Rather than educate themselves, these clients have unrealistic demands.

While I am a huge advocate of DSes writing better code, going fullstack is unrealistic for them. What does it even mean to integrate? FastAPI? Docker? Helm charts? Monitoring and observability? SRE? The list is endless.

This is a classic case of "the client doesn't care, they want business value".

If as a DS you want to get better at writing code, join our Code Quality for Data Science (CQ4DS) discord:

https://discord.com/invite/8uUZNMCad2

gundamdoubleO, over 2 years ago

As someone who was naively hired as a "data-scientist" without my employer checking if I actually knew anything about modeling, I ended up falling into the "person who fixes up the code, maintains it, tests it, etc." role, while my seniors, who were much less enthused about software engineering, were the ones pushing out models that were admittedly very impressive but complete disasters in terms of code.

We worked very well off of each other. It was interesting to pick at a model from a software engineering perspective: how the code could be structured and improved, where some tradeoffs would need to be made and how we would test and verify if it actually worked for our users. I eventually left because the company was more concerned with getting new models out as soon as possible regardless of their actual performance, but it did ignite my passion for software engineering and devops.

nevermore, over 2 years ago

> Python doesn’t yet have anything remotely close to ggplot for rapidly making exploratory graphics, for example.

Plug for plotnine (https://plotnine.readthedocs.io/en/stable/). I don't know R but use ggplot indirectly through this library for exploratory data analysis, and comparing the experience to any other Python plotting library, I understand why R folks are usually so sad to be using Python.

lp4vn, over 2 years ago

I think nobody here really pointed out a very relevant issue that's completely widespread, at least within the tech job market: companies don't want to pay people.

You don't see this kind of problem in other established professions. You don't expect an accountant to be able to perform the job of a lawyer, nor do you expect a nurse to be able to wear the hat of a nutritionist.

Now with the technological professions (let's use the term knowledge professions as an umbrella term), companies take advantage of the fact that these professions have not been around for that long and are not that established to keep expanding their list of responsibilities.

We see that all the time with tech companies. It's not rare that you're supposed to know the frontend, backend, testing, devops, sometimes even domain knowledge, and the list keeps expanding even though sometimes they entail different sets of skills. The salary, not surprisingly, doesn't grow proportionally to the list of requirements. Companies don't want specialized people anymore, they want someone who will quickly pick up the job of other people when/if they finish theirs.

That's what I believe the author's rant was about. He has been looking for a job in his field, he is not a software engineer. Yet people are expecting him to be a professional developer on top of being a professional data modeler.

zmachinaz, over 2 years ago

Not sure that I would recommend actually hiring that guy. He does not seem to understand that modelling is only one part of the equation. If you model something non-trivial, usually you cannot just hand over your R scripts to someone else and say: please implement that in Python/C. You have implementation constraints which feed back into the modelling itself, like latency or scalability. Furthermore, good luck letting someone translate your non-trivial math into another language and hoping that he won't break it in some subtle or non-subtle way. It's just far more efficient if you have someone who can actually do the prototype directly in Python or C, and then let a pro developer optimize specific parts of the code.

koliber, over 2 years ago

People who are experts at their domain are not always good at explaining things to others. This means that if a data scientist does not know how to code, they need to partner with a programmer and be able to explain things to them. Same goes for a bunch of other professions. For some people, it is easier to learn to code than it is to learn to communicate their ideas with people.

If you are not going to be able to implement something, whether in code, or with a saw and hammer, you must be able to explain things really well. If you can not do either, you will have limited ability to apply your craft.

jvans, over 2 years ago

Modeling is 1-2% of the overall work involved in having a model serve customers something useful. There's just so much other work to do that it rarely makes sense to have people who can only model. At some level of scale, building out infrastructure to support people who model full time and constantly run A/B tests makes sense, but that is not the vast majority of use cases for ML in the real world, and building out that support infrastructure is a huge investment.

bitwize, over 2 years ago

My dad got me into computers, but he's not a programmer by trade. He's a mechanical engineer, and he used BASIC on a TRS-80 to do engine simulations to characterize the mechanical forces on the crankshaft for different crank types. Back then, BASIC was basically what MATLAB is today for engineers. They just slammed some code in there to do the math they needed to do and ran it. They didn't care about things we care about: testability, maintainability, observability.

Similarly, Judge Alsup of Oracle v. Google fame writes astronomy software in QBasic. He doesn't give a shit about best practices, if it helps him aim his telescope correctly it's all good.

Welcome to a world with citizen programmers. A world of terrible code that does the job. I frickin' love it.

Fiahil, over 2 years ago

Yes, that’s expected of every scientist I am working with. The reason is quite simple and has little to do with engineering work being more expensive than scientific work (it’s actually the opposite): get a nice problem to solve or a great model to build, and very soon, you have an ivory tower completely disconnected from the original objective.

To be fair, I don’t ask scientists to understand concurrency issues in programming. Only the basic stuff that is required for delivering a functional program. Yes, Pandas and Scikit-learn belong to the basic stuff.

notacop31337, over 2 years ago

To be honest I don't believe this to be isolated to just modelling and production code. I feel as though this issue presents itself in most of the job ads I see today: backend roles with requirements for React, Python roles with requirements for JavaScript, backend roles with requirements for DevOps/SRE.

I don't necessarily have an issue with widely skilled engineers, but I would prefer it's for the right reasons, and I largely believe that it's an exercise in laziness on most companies' behalf. They just want to hire fewer people and have more of their workers do tasks that are outside of their remit.

I have zero interest in writing JavaScript, absolutely none. I don't want to do it, and I have pushed my career in directions that mean that for the most part, I don't have to. I'm happy with this decision and have made it willingly.

It's the same with a lot of "DevOps" tasks. Having previously been a DevOps Engineer, I now just want to write code, real code, but it feels as though most places now are just not hiring DevOps/Infra people, and just telling their other engineers to do it, which I understand, but it results in a far worse experience for both sides of it. I have to regularly force my hand down from volunteering for things that I have the technical experience to do, and do properly, versus colleagues that don't have the experience, because I'm tired of being shoehorned back into a role that I intentionally left. All of this is because the idea of "cross functional teams" no longer means hiring specialised engineers to do specialised roles, but just getting everyone to do everything, and then being surprised when the context switch penalty actually exists, and it's done to a worse standard than by someone who is skilled in that role.

lysecret, over 2 years ago

I know exactly what you feel like and I used to be exactly like you. Even down to preferring R to Python. All I can say is: bite the bullet. Learn Python, forget about R. R is nice and all but there is nothing that couldn't also be done in Python.

What helped me is: learn to appreciate the beauty in actual coding, in deployments, in environments, in well structured, maintainable code. In scaling issues, databases etc. There is an endless world out there which is extremely fascinating as soon as you get over the "all I want to do is modeling" mindset.

Good luck. You can definitely do it, because I did it as well.

Lyngbakr, over 2 years ago

I'm a data engineer who works in an R-centric engineering team, which is quite unusual. Our experience has been that R works well for our use cases and lets us work closely with our analysts and data scientists who are all R users. There are no silos due to different teams speaking different languages. That said, am I still writing Python whenever possible? Damn straight. I'm well aware of how peculiar our team is and, like the post says about modelling, Python (+ SQL) is the default language of the data engineering world. If I want another job, I need Python.

NHQ, over 2 years ago

Tensorflow.js is as good as the Python version, and Node.js is a better production platform, and models can run in browsers. Python is easy though; hiring based on a language is a sign management doesn't know enough about programming to trust their own judgement.

https://www.tensorflow.org/js

eftychis, over 2 years ago

Heads up to the author of the article:

a) writing is hard

b) nobody likes writing production code

c) nobody is good at production without trying

d) most people I have seen that have claimed they are good at modeling, but not good at writing that for production, have actually not reached their "I am good at modeling" state yet.

e) (Most) People don't write code to write code. Like fiction writers do not write lines of text to write text. Writing is a means to an end. It is part of the idea birth process.

TL;DR: Write code for production; you will be great at it relatively soon.

P.S. Curious if people have had the opposite experience or counterexamples. Edit: stylistic.

musicale, over 2 years ago

> II. Modelers have to be coders

Since you can basically use numpy like a large calculator, it seems like a potentially useful tool to have under your belt. And matplotlib is good for making graphs. Python/numpy/etc. seems like a reasonable alternative to MATLAB (etc.) in many cases.

Symbolic math tools like Mathematica are useful as well.

It's certainly helpful if modelers can understand the code that implements the model and spot obvious errors in the code as well as in the results. It's also extremely beneficial if whoever is writing, testing, and using the model code has a very good understanding of the model itself.

A potential step toward making this happen is implementing the model as a standalone library of very straightforward code that everyone on the project can understand.
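
A minimal sketch of the "standalone library of very straightforward code" idea above, using numpy as the "large calculator". The exponential-decay model and the names `decay_model` and `fit_decay` are hypothetical illustrations, not anything taken from the article or the thread:

```python
import numpy as np


def decay_model(t, amplitude, rate):
    """Exponential decay: amplitude * exp(-rate * t)."""
    t = np.asarray(t, dtype=float)
    return amplitude * np.exp(-rate * t)


def fit_decay(t, y):
    """Estimate (amplitude, rate) with a straight-line fit to log(y).

    Assumes every y is strictly positive, so that log(y) is defined.
    """
    t = np.asarray(t, dtype=float)
    log_y = np.log(np.asarray(y, dtype=float))
    # log(y) = log(amplitude) - rate * t, so a degree-1 polynomial fit
    # returns the slope (-rate) and the intercept (log(amplitude)).
    slope, intercept = np.polyfit(t, log_y, 1)
    return float(np.exp(intercept)), float(-slope)
```

Keeping the model equation and the fitting step in two small named functions is one way to let both the modeler and whoever productionizes the code check it against the math.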

mattnewton, over 2 years ago

Isn’t this because they are hiring to solve the problem, like, a function that predicts X, and if the model engineer is also the production engineer this is one hire that solves the business problem?

The alternative is hiring two engineers and possibly additional PM/management workflow to make sure they mesh, the prod engineer is not blocked on the model engineer, and the model engineer delivers things that are usable. It’s a bit like when we used to have a “webmaster”, or now I suppose they could be a “full stack consultant”, who was in charge of making sure the right pixels appeared on thecompanywebsite.com more or less by any means necessary because that was the business need.

bfung, over 2 years ago

In Silicon Valley back in 2014, when the seeds of ML/DS started to get traction, all the people with Data Science titles knew how to write Java webapps or Hadoop MR jobs to ingest, clean, transform, model/analyze, and serve results that went into production.

The specialization has definitely narrowed in scope in the last ~10 years, but the original roots were that: Java + stats + database knowhow.

So yes, learn some production level skills. Having far too specialized people also runs the risk of lost-in-translation models that only work in the original implementation, until edge cases show up and the model is out of date.

BlueTemplar, over 2 years ago

What stood out to me even more was the "machine learning" buzzword - even though it doesn't seem like there's any guarantee that training a neural network would actually improve the modeling (and it's just another tool that the modeler should be able to decide on their own to use or not).

Specifically, my advisor just suggested that I ignore this bit, and send in my resumé to those job offers anyway (this being something that we should be able to learn on the fly at our level anyway I guess...)

wwilim, over 2 years ago

Isn't what they mean just "be able to export the model in a form that a) is portable enough to be taken and integrated by someone else, b) has a reasonable runtime performance, and then c) explain how to use it to an application developer"? Those are high demands already. There are companies with excellent modelers who never actually get anything deployed because there is no one to bridge the technological gap between what they're doing and production services.
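
One possible reading of "export the model in a portable form", sketched under loud assumptions: the model is a plain logistic regression, the hand-off format is JSON, and the names `export_model` and `predict` are hypothetical. Real hand-offs often use other formats; this only illustrates the shape of the idea:

```python
import json
import math


def export_model(weights, bias, feature_names):
    """Serialize fitted logistic-regression parameters to a JSON string."""
    return json.dumps({
        "model_type": "logistic_regression",
        "feature_names": list(feature_names),
        "weights": [float(w) for w in weights],
        "bias": float(bias),
    })


def predict(model_json, features):
    """Score one record (dict of feature name -> value) with only the stdlib."""
    model = json.loads(model_json)
    z = model["bias"] + sum(
        weight * features[name]
        for name, weight in zip(model["feature_names"], model["weights"])
    )
    # Logistic function maps the linear score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))
```

The scoring side depends on nothing but the exported file, so an application developer could reimplement `predict` in another language from the JSON alone.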

ford, over 2 years ago

My hypothesis is that few companies are big enough to be able to dedicate many people to _exclusively_ modeling.

If a modeler has 4 weeks to spend on learning new things, odds are most companies would benefit more from the modeler learning how to do basic parts of the operational (Python) part of their job (which needs to happen for models to be useful work) than from spending 4 weeks diving deeper on some aspect of modeling (which may or may not yield percentage points of improvement on some problem).

programmarchy, over 2 years ago

Is there anything like Blender Nodes, but for data modeling? It's an amazingly powerful system. [1] The learning curve is still fairly steep in terms of having to learn all the nodes and how they interact, but you can't make a syntax error.

[1] https://www.youtube.com/watch?v=7EeIsUErzLE #nodevember - Simon Thommes - Procedural Shader Showreel (Blender)

hermitcrab, over 2 years ago

From the article:

> R is extremely slow at a lot of tasks, for one thing, even more than Python.

Base R is quite slow. R + data.table is faster than Python + Pandas in a benchmark that I did recently.

For a 1 million row CSV file, Read + Sort + self-Join + Write took on a Windows box:

Base R: 47.56s

Python + Pandas: 6.44s

R + data.table: 2.99s

More details at:

https://www.easydatatransform.com/data_wrangling_etl_tools.h...
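
A rough sketch of how the pandas leg of a read + sort + self-join + write benchmark could be timed. The comment does not give the original schema, join key, or settings, so the two-column synthetic file, the `id` join key, and the function name `run_benchmark` are all assumptions, and timings from this sketch are not comparable to the figures quoted above:

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def run_benchmark(n_rows):
    """Time read + sort + self-join + write; returns (seconds, rows_written)."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "input.csv")
        dst = os.path.join(workdir, "output.csv")

        # Synthetic input (hypothetical schema): a shuffled unique id
        # plus a random value. Creating it is not part of the timing.
        rng = np.random.default_rng(0)
        pd.DataFrame({
            "id": rng.permutation(n_rows),
            "value": rng.random(n_rows),
        }).to_csv(src, index=False)

        start = time.perf_counter()
        df = pd.read_csv(src)
        df = df.sort_values("id")
        # Self-join on the unique id, so the output has one row per input row.
        joined = df.merge(df, on="id", suffixes=("_left", "_right"))
        joined.to_csv(dst, index=False)
        elapsed = time.perf_counter() - start
        return elapsed, len(joined)
```

Calling `run_benchmark(1_000_000)` would approximate the file size mentioned in the comment; results will vary with hardware, library versions, and the shape of the data.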

PaulHoule, over 2 years ago

I was talking at the bus stop the other day to somebody I met back in my physics days, who still teaches scientific computing. We talked about how I got dragged kicking and screaming into Python (people just kept showing up with work to be done), and how I’d almost like to drop Python from my practice, because there is no way I can quit Java or JavaScript, but libraries keep me in the ecosystem. We are both shocked that anybody uses R.

jgalt212, over 2 years ago

As a manager, I have no interest in a model-based solution unless we have a good plan to refit / update and test the model on an on-going basis. And for that, Python >> R. Not my line, but I really like it: R is for Research, Python is for Production.

throwaway894345, over 2 years ago

I support a research environment for biostatisticians and other researchers and we have Python and R offerings, and R is the overwhelming favorite, or so product tells me. As someone who isn’t a modeler, it’s interesting how this varies by industry.

chrisgd, over 2 years ago

How does one learn “production code” outside of industry? Are there examples of programs that do something with production code?