Wow, this is an oversimplification. I've had years of experience working in a data lake within a FAANG handling > 5 PB of data ingest per day. There are so many things this misses:<p>1. What if the domain teams don't actually care to maintain data quality, or even care about sharing data in the first place? This model requires every data producer to maintain a relationship with every data consumer. That's not gonna happen in a large company.<p>2. Who pays for query compute and data storage when you're dealing with petabytes and petabytes of data from different domains? If you (the data platform team) bill the domain teams, then see above: they'll just stop sending data.<p>3. Just figuring out what data exists in the data mart (which is essentially what this is describing) is a hassle and slows down business use cases, especially when you have 1000s of datasets. You need a team to act as a sort of "reference librarian" to help those querying the data. You can't easily decentralize this.<p>4. How do you get domain teams to produce data in a form that is easy to query? What if they write lots of small files that are computationally expensive to query; who's gonna advise them? Data production is closely tied to query performance at TB scale. The domain team is not gonna become experts, or care.<p>5. What do you do when a domain team has a lot of important data but no engineering resources? Do you just say "oh well, we're just a self-service data platform, so no one gets to access the data"?
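On point 4, the small-files problem is concrete enough to sketch. Here's a minimal, hypothetical stand-in (pure stdlib, using JSON-lines in place of a real Parquet/ORC compaction job) for the cleanup work the platform team ends up doing when a producer writes one tiny file per event:

```python
import json
import tempfile
from pathlib import Path

def compact_small_files(src_dir: Path, dst_file: Path) -> int:
    """Merge many small JSON-lines part files into one larger file.

    Stand-in for a real compaction job: query engines pay a per-file
    open/list/seek cost, so 10,000 x 1 KB files are far slower to scan
    than 10 x 1 MB files holding the same bytes.
    """
    rows = 0
    with dst_file.open("w") as out:
        for part in sorted(src_dir.glob("part-*.jsonl")):
            for line in part.open():
                out.write(line)
                rows += 1
    return rows

# Demo: a producer that writes one file per record (the anti-pattern)...
tmp = Path(tempfile.mkdtemp())
for i in range(100):
    (tmp / f"part-{i:05d}.jsonl").write_text(json.dumps({"id": i}) + "\n")

# ...and the compaction step someone has to own on their behalf.
merged = tmp / "compacted.jsonl"
n = compact_small_files(tmp, merged)
```

The point isn't this particular script; it's that somebody has to know this matters and run it, and the domain team usually won't.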
It really feels like data mesh is a fairly half-baked concept born out of short-term consulting gigs and a desire to become a technical thought leader.
Is there an underlying assumption here that all of the datasets' domains are perfectly in sync with each other when it comes to domain metadata?<p>As an example, Team1 might define the manufacturer of a Sprocket as the company that assembled it, whereas Team2 might define the manufacturer as the company that built the Sprocket's engine. Since the purpose of a data mesh is to enable other teams to perform cross-domain data analytics, these definitions need to be reconciled, or it'll become a datamess. Where does that get resolved?
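The Sprocket example can be made concrete. A toy sketch (all names hypothetical) of two domains publishing the same key and the same field name with different semantics, so a naive cross-domain join returns conflicting answers without anything in the schema flagging it:

```python
# Team1's dataset: "manufacturer" = the company that assembled the sprocket.
team1_sprockets = {"SPR-1": {"manufacturer": "AssembleCo"}}

# Team2's dataset: "manufacturer" = the company that built the sprocket's engine.
team2_sprockets = {"SPR-1": {"manufacturer": "EngineWorks"}}

def cross_domain_report(sprocket_id: str) -> dict:
    """Naive cross-domain join: same key, same column name, different meanings."""
    return {
        "team1_manufacturer": team1_sprockets[sprocket_id]["manufacturer"],
        "team2_manufacturer": team2_sprockets[sprocket_id]["manufacturer"],
    }

report = cross_domain_report("SPR-1")
# Both lookups succeed and both values are "valid" per their own domain,
# yet they disagree; only a shared semantic definition can catch this.
conflict = report["team1_manufacturer"] != report["team2_manufacturer"]
```

Schema compatibility checks pass here; the conflict is purely semantic, which is exactly why it needs an owner.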
It looks like a weird attempt to build a consulting business around a simple idea.<p>Treat data assets like microservices and pipelines like the network. Period.<p>Prescribing everything else rubs me the wrong way.<p>So, data mesh is: an architecture in which a company's data is organized into loosely coupled data assets.
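"Treat data assets like microservices" can be sketched as a published contract: each asset exposes an owner, a schema, and an SLA, and consumers depend on that contract rather than on the producer's pipeline internals. A hypothetical minimal version (all field names are assumptions, not part of any data mesh spec):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataAssetContract:
    """A loosely coupled data asset, analogous to a microservice's API contract."""
    name: str
    owner_team: str
    schema: dict              # column name -> type: the published interface
    freshness_sla_hours: int  # how stale the data is allowed to be

def is_compatible(contract: DataAssetContract, required_columns: dict) -> bool:
    """A consumer validates against the contract, not the pipeline internals."""
    return all(
        contract.schema.get(col) == typ for col, typ in required_columns.items()
    )

orders = DataAssetContract(
    name="orders_v1",
    owner_team="checkout",
    schema={"order_id": "string", "amount_usd": "double", "ts": "timestamp"},
    freshness_sla_hours=24,
)

# A downstream consumer checks only the columns it needs.
ok = is_compatible(orders, {"order_id": "string", "amount_usd": "double"})
```

Versioning the contract (`orders_v1`) is what lets the producer evolve the pipeline behind it, the same way a service versions its API.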
So if I understand this correctly, data mesh is just a data mart that doesn't bring data into a database as tables, but uses S3 storage instead (I assume because that's cheaper in the cloud?)
The concept of a data mesh is more of a business concept than a technical one. IMHO, the idea being proposed is that of a conceptual data server (not to be confused with a database server), much like an HTTP server or a mail server, where people can engage with data as a first-class citizen and create "data" products. This is especially true as we move from HTML to something like HDML (hyper data markup).<p>By making data the product (abstracting away all the gory details), you fundamentally engage with data through a UI or an API. As you expose these products they become accretive, while fundamentally encapsulating the domain expertise within them.
This seems like mostly common sense. Infrastructure teams should always be building tools that the org consumes (and ideally the general public)<p>In a lot of orgs this goes sideways and the infrastructure teams end up owning everything and never have time to do anything else. Usually this happens due to upper management putting on the squeeze.<p>In order for teams to actually own their infrastructure and data we need better tooling to help them. This is coming along nowadays but isn’t fully there.
Dunno about the merits of this, but it does seem to be part of the overall effort to rethink how to organize large groups of people working together. With the internet we can afford peer-to-peer communication, and we don't <i>have</i> to organize into hierarchies. But we can't just do full-mesh communication either, because that's overwhelming to individuals, as anyone who lived through the initial slack-and-zoom remote work of early 2020 can tell you. (Though lots of people are <i>still</i> living through it, unfortunately)<p>So what kind of communication structures are good, and in what circumstances? How do we structure work so that we don't have to communicate about <i>everything</i>? When do we fall back to ad-hoc video chat or even in-person meetings? These are the kinds of questions that 21st-century management has to answer. It's fascinating to watch people grapple with them.
Lots of concerns and scepticism in the discussions here. Any suggestions about good, achievable data strategies and data architecture that work at enterprise level?
It sounds almost entirely about team responsibility and governance, rather than technical architecture. What’s the difference from a data lake on a technical level?
Isn't this usually called a "data mart" as opposed to "data mesh"? Or is the "mesh" term intended to point to something more unstructured, like team- or business division-level equivalent to a data lake? But isn't that just a data pond?