TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

AWS releases Glue Databrew, a visual ETL tool

169 points by ManWith2Plans over 4 years ago

19 comments

ctvo over 4 years ago
The thing folks don't mention regarding AWS is the inherent competitive advantage their micro-startups have. We focus on AWS launching managed ElasticSearch or managed Kafka, and talk about them (legally) using open source contributions to make money, but I think those are minor compared to things like this.

What AWS has is a culture and institutional knowledge of how to launch new products that take foundational AWS services (S3, Lambda, EC2, DDB, etc.) and glue (!) them together better than a competing non-AWS company can. This is a bold claim (since AWS launches some very crappy products), but imagine being able to use AWS infrastructure at cost, having internal knowledge of how best to optimize that infrastructure, and having access to the engineers who own those services while you build abstractions and better user experiences on top of them.

I don't know how companies that compete in any related space can survive. When AWS is willing to throw whatever against a wall (launching 50+ services a year) to see what sticks, sooner or later they're going to land in your space.

Become more locked into AWS's foundational services -> the abstractions on top of them start to make more sense in engineering complexity / delivery time / possible cost dimensions -> use more of them -> become more locked into AWS's foundational services.

This feels very different from Azure or GCP.
aketchum over 4 years ago
I am a big fan of AWS and am happily running our entire tech stack on their services for a very reasonable price. That said, Glue is an absolute dumpster fire of a product. My team and I have wasted countless hours trying to wrangle a DynamoDB -> Glue -> Athena -> QuickSight pipeline, and Glue refused to cooperate (we ended up building our own DDB-to-SQL pipeline after finally giving up on Glue). Hopefully this will make Glue more usable and actually enable out-of-the-box ETL.
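The comment doesn't describe how the replacement pipeline works, so the following is only a minimal sketch of what a hand-rolled DynamoDB-to-SQL step might look like. It assumes the items have already been exported in DynamoDB's typed JSON form, where every value is wrapped as `{"S": ...}`, `{"N": ...}`, `{"M": ...}` and so on. SQLite stands in for the real target database, and the `items` table and the column list are hypothetical. In production code, boto3's `TypeDeserializer` (in `boto3.dynamodb.types`) handles the unwrapping; it is written out by hand here to keep the example dependency-free. Binary (`B`/`BS`) values are deliberately not handled.

```python
import json
import sqlite3
from decimal import Decimal

def from_dynamo(value):
    """Convert one DynamoDB-typed attribute value, e.g. {"N": "3"}, to a plain Python value."""
    # Each attribute value is a single-key dict: {type_tag: payload}.
    (type_tag, payload), = value.items()
    if type_tag in ("S", "BOOL"):
        return payload
    if type_tag == "N":
        return Decimal(payload)  # numbers arrive as strings; Decimal keeps them exact
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_dynamo(v) for k, v in payload.items()}
    if type_tag == "L":
        return [from_dynamo(v) for v in payload]
    if type_tag == "SS":
        return sorted(payload)
    if type_tag == "NS":
        return sorted(Decimal(n) for n in payload)
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")

def load_items(items, columns, conn):
    """Flatten DynamoDB items into rows of a SQL table named `items` (hypothetical schema)."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS items ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    for item in items:
        record = {name: from_dynamo(v) for name, v in item.items()}
        row = []
        for col in columns:
            val = record.get(col)  # attributes missing from an item become NULL
            if isinstance(val, (dict, list)):
                val = json.dumps(val, default=str)  # nested data stored as JSON text
            elif isinstance(val, Decimal):
                val = str(val)  # sqlite3 has no built-in Decimal adapter
            row.append(val)
        conn.execute(f"INSERT INTO items VALUES ({placeholders})", row)
    conn.commit()
```

DynamoDB items are schemaless, so the sketch makes the SQL schema explicit through `columns`: attributes an item lacks become NULL, and nested maps and lists are kept as JSON text rather than exploded into extra tables.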
orf over 4 years ago
Glue is an absolutely horrendous mishmash of a mess that seemed to suffer from a serious lack of investment or vision. The managed Spark component is a good product buried under an all-round terrible developer and console UX, though the data catalog/schema crawling is really useful.

But I’m glad the lack of investment is turning around with this, the recently released Glue Studio, and the fantastic “Glue 2.0 fast startup” job types.
ghc over 4 years ago
Running any sort of innovative data infrastructure startup (whether data prep, database, data pipeline, etc.) is now an exercise in futility. The big three cloud providers *will* embrace your innovation, extend your product, and extinguish your business.

Given the market power of cloud providers, every infrastructure innovation is now a "sustaining innovation" in Christensen's terminology.

The key to success seems to be building a product for a niche the cloud providers think is too small, and then maximizing your value within that niche so that, if your market grows large enough for a cloud provider like AWS to come after you, you can pivot to providing customizations to your highest-margin customers. MongoDB is a good example of this.

On the other hand, none of the major cloud providers seem capable of moving up the value chain to the application level, so if I were starting a company today I would focus on leveraging my infrastructure-level innovation to create a vertical opportunity in a high-margin market instead of seeking to build a horizontal platform (an IoT platform, for example).
fs111 over 4 years ago
Don't buy the copy, buy the original: https://www.trifacta.com/
typpo over 4 years ago
I'm glad to see a competitor to Trifacta/Google Cloud Dataprep. My company relies on it heavily, but we constantly run into bugs, UI glitches, and crashes that can sometimes block people for hours or days. It's the sort of software that you hate to use, but the benefits are too good to ignore.

The benefit of visual ETL is that non-engineers can do a lot of basic data engineering. We tie this into our more complex code-based ETL pipelines. It was a game-changer for us and helps us get a lot more done.
seddonm1 over 4 years ago
These GUI-driven/visual ETL tools certainly have their place, but they sit firmly at one end of the continuum between ease of use and engineering-discipline-based ETL.

As other posters have commented, visual ETL tools often suffer from poor source control support or limited extensibility, but they do provide the rapid development environment that (generally more business-oriented) users seek. They also tend to trivialize the value of experience and discipline. For example, I go to an accountant for my taxes because they apply learned experience relating to tax that I do not have (even though the math is easy), whereas in data engineering, seemingly simple tasks such as correctly typing money values or dealing with timezones get glossed over in the pursuit of DIY, and then people wonder why their money columns don't reconcile or why they lose data in failure scenarios.

At the other end of the continuum, having large teams write bespoke ETL code for every job does not scale well for many reasons (https://reorchestrate.com/posts/code-doesnt-scale-for-etl/). I think the positive reaction to ideas like Data Mesh comes from the failures of these large, centralized teams, which coincided with the Hadoop era.

Our solution has been to develop an open source (MIT) declarative framework (https://arc.tripl.ai/) that allows configuration-driven ETL, mostly developed in a Jupyter Notebook environment (to allow rapid development and appeal to a larger audience), while making most of the difficult tasks mentioned above easier. It has been in development for a few years now and continues to evolve. We value your feedback.
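To make the "money typing and timezones" point concrete, here is a minimal sketch of the two habits the comment alludes to. This is not Arc's approach, just plain standard-library Python. It assumes amounts arrive as strings, and that source timestamps are naive local times whose zone is known from the source system. Python 3.9+ is required for `zoneinfo`.

```python
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system tz database (or the tzdata package)

CENT = Decimal("0.01")

def parse_amount(text: str) -> Decimal:
    """Parse a money value from its string form without ever going through float."""
    # Quantize to cents with an explicit rounding mode instead of relying on defaults.
    return Decimal(text).quantize(CENT, rounding=ROUND_HALF_UP)

def to_utc(local_text: str, tz_name: str) -> datetime:
    """Interpret a naive source timestamp in the source system's zone, then normalise to UTC."""
    naive = datetime.fromisoformat(local_text)
    # replace() keeps fold=0, so an ambiguous fall-back hour resolves to its earlier occurrence.
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)
```

Adding `0.1` three times as floats gives `0.30000000000000004`, while summing `parse_amount("0.10")` three times gives exactly `Decimal("0.30")`. Likewise, `to_utc("2020-11-11 09:00:00", "America/New_York")` lands on 14:00 UTC because New York is on UTC-5 on that date. The rounding mode and the treatment of ambiguous or skipped DST hours are exactly the kind of decisions the comment argues should be made deliberately rather than glossed over.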
somurzakov over 4 years ago
Looks like a pretty basic, limited copycat of Trifacta/Tableau Prep/Alteryx.

This tool requires ready, mostly clean-ish data to work with, but the #1 problem in data engineering is the lack of such data.
georgewfraser over 4 years ago
Visual ETL is not as good of an idea as it seems at first. You end up putting a ton of business logic into the menus of these tools, and it’s not version controlled, and it’s not searchable. You’re better off doing this kind of work in SQL.
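As an illustration of "doing this kind of work in SQL", here is a minimal sketch in which a transformation is kept as plain SQL text that can live under version control, be grepped, and be reviewed in a diff, with a tiny fixture that exercises it. SQLite is used only so the example runs standalone. The `orders` table, its columns, and the `daily_revenue` name are hypothetical.

```python
import sqlite3

# The transformation lives as plain SQL text. In a real project this would sit in
# its own .sql file under version control, where it can be searched and diffed.
DAILY_REVENUE_SQL = """
SELECT order_date, SUM(amount_cents) AS revenue_cents
FROM orders
WHERE status = 'complete'
GROUP BY order_date
ORDER BY order_date
"""

def daily_revenue(conn: sqlite3.Connection) -> list:
    """Run the versioned transformation and return (order_date, revenue_cents) rows."""
    return conn.execute(DAILY_REVENUE_SQL).fetchall()

if __name__ == "__main__":
    # Tiny in-memory fixture so the transformation can be exercised (or unit tested).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount_cents INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("2020-11-10", 500, "complete"), ("2020-11-10", 300, "complete"),
         ("2020-11-10", 999, "cancelled"), ("2020-11-11", 200, "complete")],
    )
    print(daily_revenue(conn))  # [('2020-11-10', 800), ('2020-11-11', 200)]
```

The business rule ("only completed orders count as revenue") lives in one searchable line of the query instead of in a filter dialog buried inside a tool's menus.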
iblaine over 4 years ago
Having used GUI ETL tools for years (SSIS, Informatica, Talend, Appworx) and now using Airflow (Prefect is an excellent alternative, btw), I hope to never go back. Great to see Glue improving, and for the industry’s sake I hope it doesn’t catch on. Most ETL should be treated as code. As code, ETL jobs are easier to write and maintain, and their complexity is easier to manage.
2wrist over 4 years ago
Have to say, as slick as stuff like this looks, I do find myself gravitating towards ETL in code (it feels easier to read/understand).

How would you put something like this under change control?
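One common answer is to export whatever the visual tool produces as text and review it like code. Below is a minimal sketch, assuming the recipe can be exported as a JSON list of steps. The step fields are hypothetical and are not DataBrew's actual export format. Canonicalising with sorted keys and fixed indentation keeps diffs free of key-order noise, and `difflib.unified_diff` produces a change that can be reviewed like any other patch.

```python
import difflib
import json

def canonical(recipe: list) -> list:
    """Stable text form of a recipe: sorted keys and fixed indentation, one line per element."""
    return json.dumps(recipe, sort_keys=True, indent=2).splitlines()

def recipe_diff(old: list, new: list) -> list:
    """Unified diff between two recipe versions; an empty list means no real change."""
    return list(difflib.unified_diff(
        canonical(old), canonical(new), fromfile="old", tofile="new", lineterm=""))
```

Committing the canonical JSON (rather than diffing on the fly) gives the usual benefits of version control: history, blame, and pull-request review of each change made in the visual editor.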
shmoogy over 4 years ago
Are there any alternatives to this style of application? I was going to try to make something similar for my team: a tool that would let them map columns and apply simple transforms, then push the resulting flow to me so I could move it into an Airflow DAG.

I would really like a visual editor that I can self-host, adjust with regex functions and mappings, and iterate on.
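The hand-off described here can be reduced to a small declarative format that a UI produces and code consumes. Here is a minimal sketch, assuming each mapping is a source column, a target column, and an optional regex substitution. `ColumnMapping`, `load_mappings`, and `apply_mappings` are hypothetical names, not part of any existing tool.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ColumnMapping:
    source: str                    # column name in the incoming data
    target: str                    # column name to emit
    pattern: Optional[str] = None  # optional regex applied to the value
    replacement: str = ""          # what each regex match is replaced with

def load_mappings(json_text: str) -> list:
    """Read mappings from a JSON array, e.g. one saved by a (hypothetical) visual editor."""
    return [ColumnMapping(**m) for m in json.loads(json_text)]

def apply_mappings(rows, mappings):
    """Select/rename columns and apply regex substitutions, one output dict per input row."""
    compiled = [(m, re.compile(m.pattern) if m.pattern else None) for m in mappings]
    out = []
    for row in rows:
        new_row = {}
        for m, rx in compiled:
            value = row.get(m.source)  # missing source columns yield None
            if rx is not None and value is not None:
                value = rx.sub(m.replacement, str(value))
            new_row[m.target] = value
        out.append(new_row)
    return out
```

For example, a mapping with `pattern=r"\D"` and an empty replacement turns `"(555) 010-2030"` into `"5550102030"`. Because the mappings are plain JSON, they can be committed alongside the DAG, and a Python task in the DAG could load and apply them. That keeps the editor's output reviewable, which speaks to the change-control concern raised elsewhere in the thread.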
crb002 over 4 years ago
https://conexus.com/ should get more love. It's based on https://www.categoricaldata.net/ and ensures that complex transforms are provably correct.
ManWith2Plans over 4 years ago
Haven't used this yet, but from their demo video this looks like a really good user experience. I haven't used competitors like Alteryx myself, but having this integrate so well into the AWS ecosystem makes it seem really useful.
ineedasername over 4 years ago
Seems like this would be a good fit to expand to cover SageMaker & Airflow, making a really powerful GUI workflow editor that includes ML directly.
manigandham over 4 years ago
Looks very similar to GCP's Cloud Dataprep (which itself is powered by Trifacta).
VectorLock over 4 years ago
The $1 per 30 minute session pricing really jumped out at me.
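To put the quoted figure in perspective, here is a back-of-the-envelope sketch. It takes the $1 per 30-minute session from the comment and assumes that every started 30-minute block is billed as a full session. That billing granularity is an assumption, not something stated in the thread, and batch job pricing is not modelled at all.

```python
import math
from decimal import Decimal

SESSION_MINUTES = 30
PRICE_PER_SESSION = Decimal("1.00")  # the figure quoted in the comment

def interactive_cost(minutes_used: int) -> Decimal:
    """Cost of interactive use, assuming each started 30-minute block counts as a session."""
    sessions = math.ceil(minutes_used / SESSION_MINUTES)
    return sessions * PRICE_PER_SESSION
```

Under that assumption, one analyst keeping a session open 8 hours a day for 20 working days (9,600 minutes) would use 320 sessions, or $320 a month, while a 31-minute session would cost $2 rather than $1.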
QuinnyPig over 4 years ago
This service name makes me viscerally angry.
awinter-py over 4 years ago
my brain keeps re-parsing this to 'grue datablew'