I guess my response is "OK"? Maybe this is surprising for some people, but... why would it be surprising? Data engineers usually work in a few ways.<p>1. With in-memory datasets using Python notebooks<p>2. With big data storage systems like Hive<p>3. With something like Spark or maybe even Flink<p>All of these tools were built for people who, for the most part, only need to know Python and SQL. And they do a bunch of interesting stuff with those tools. Also, I guess maybe it's worth noting that under the hood the implementations are almost always C/C++.<p>I guess maybe someone could mistakenly think that this is a good thing? Or a bad thing? Is that the point? Sorry, I don't have Twitter so I don't know if there are further replies clarifying this. I'm sort of at a loss for the point because this is not surprising at all, but maybe this is targeting people really early on in their careers who aren't aware. That's sort of the problem with putting a Twitter post here: it feels like HN is not the right audience, but their followers may be really green and exactly the people who would not be aware.
Could be data scientists, in which case I would probably describe their skills as:<p>"Know Python and SQL and have post-graduate level skills in data analysis and mathematics"
The actual work is not the issue. It's getting a response from an application and then passing the interview that's the challenge.<p>I've applied to Netflix many times for many different positions and have never once gotten a response. I assume they are inundated with thousands of applications for every position.
Full tweet text:<p>>I know data engineers who know just Python and SQL who make $500k at Netflix. You don’t need to know the high performance languages to make a killing as a data engineer!<p>This post's title is a little bit rage-bait without the last part.
It's <i>data</i>. I won't say the tooling is irrelevant, but surely the important thing is understanding the domain and the finer points of statistical reasoning that regular people so often get wrong. I'm sure there are cryptographers who can't code their way out of a paper bag but are among the 50 people on the planet who actually understand how elliptic curves work. My wife is pretty invaluable at her job because of her background in electrical engineering, even though she's asking me dumb newb tech questions all the time. Someone needs to check the 25-year-old hotshots who come in knowing all the hottest JS frameworks but don't get why it makes no sense to just take the user options for tasking an electro-optical satellite and port them over exactly to a SAR platform. You can type check to perfection, but if your type doesn't accurately model the thing in the world you're trying to represent, no compiler can tell you that. You have to actually know something about the world outside of software.
I work with engineers who barely know Python and MATLAB who make as much or more than I do; they are experts in other things. There are things beyond programming languages you can become a valuable expert in.
This isn't interesting or new.<p>When Python and SQL support was added to Spark many years ago, Data Engineers quickly embraced it.<p>It's standard amongst <i>all</i> Data Science people that you know both. But the people who actually do well in the industry know a lot more, such as what the underlying infrastructure (e.g. the JVM) is doing, as well as some other languages like Scala or R for more specific tasks.<p>Data Engineers in particular are expected to take full ownership over performance and reliability as well. So there's more than a little bit of crossover with DevOps and SRE.
Life pro tip: when deciding between features to implement, if one of those features is a dashboard, do the dashboard. Executives love summary information.
Obviously, they also know about data manipulation and analysis.<p>What coding skills does your MD have? How much do they make?<p>Specialized and in-demand skills bring high salaries...
You can do anything in Python, and do it fast, as long as you don’t need it to actually run fast.<p>That is a completely acceptable scenario for a great deal of data analysis.