<p><pre><code> At this point a pipeline built on top of Stitch / Fivetran /
dbt is far more reliable than one built on top of custom-
built Airflow tasks.
</code></pre>
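To make the comparison concrete, here is a rough sketch (Airflow 2.x) of the kind of custom-built ingestion task I take the author to mean; the API endpoint, DAG, and task names are hypothetical, and the warehouse load is elided:<p><pre><code> import json
 import urllib.request
 from datetime import datetime

 from airflow import DAG
 from airflow.operators.python import PythonOperator

 # Hypothetical source API.
 API_URL = "https://api.example.com/orders"

 def extract_and_load():
     # Auth, pagination, retries, schema drift, backfills: in the
     # hand-rolled approach all of this is ours to build and maintain,
     # which is the reliability gap tools like Stitch/Fivetran target,
     # with dbt covering the in-warehouse transforms.
     with urllib.request.urlopen(API_URL) as resp:
         rows = json.load(resp)
     print(f"fetched {len(rows)} rows")  # load to warehouse elided

 with DAG(
     dag_id="orders_ingest",
     start_date=datetime(2021, 1, 1),
     schedule_interval="@daily",
     catchup=False,
 ) as dag:
     PythonOperator(task_id="extract_and_load",
                    python_callable=extract_and_load)
</code></pre>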
I'd be curious if anyone who has used or integrated these products into their infrastructure could verify or comment on whether they are as effective as the author seems to suggest.<p><pre><code> If you hire a data engineer who just wants to muck around in
the backend and hates working with less-technical folks,
you’re going to have a bad time.
</code></pre>
I'm not sure this was the intent, but I found this somewhat dismissive. Communication skills are indeed important, as is being able to effectively explain technical considerations to less-technical parties (or to parties whose technical expertise is not aligned with Data Engineering). But in my own experience I have encountered an active disregard for those considerations by data scientists, who treat them as orthogonal to their needs at best, or at worst as details with which they cannot be bothered. This is underscored by the notion that we, as Data Engineers, "muck around in the backend." We do, we have to, and it helps to like it.<p>There are a few other areas of input and contribution that a good data engineering team can provide which I don't think get enough attention in the post:<p><pre><code> 1) Machine Learning Productionization
2) Being a source of data expertise (consulting) for other
developers (working on services or the main product) in
the organization
</code></pre>
Regarding 1: while the author seems convinced that ELT/ETL tooling and ingestion pipeline building can be taken off the shelf, I don't know that there is the same kind of mature tooling for machine learning model deployment/integration. Though I believe that is changing, slowly.
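For comparison, here is roughly what a team ends up hand-rolling for model serving today. This is a minimal sketch, assuming a pickled scikit-learn-style model behind Flask; the artifact path and route are hypothetical:<p><pre><code> import pickle

 from flask import Flask, jsonify, request

 app = Flask(__name__)

 # Hypothetical artifact handed over by the data science team.
 with open("model.pkl", "rb") as f:
     model = pickle.load(f)

 @app.route("/predict", methods=["POST"])
 def predict():
     features = request.get_json()["features"]
     # Versioning, input validation, monitoring, and rollback all still
     # have to be built by hand; nothing here is covered the way
     # Stitch/Fivetran cover ingestion.
     return jsonify({"prediction": model.predict([features]).tolist()})

 if __name__ == "__main__":
     app.run(port=8000)
</code></pre>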