
Designing Schemaless, Uber Engineering’s Scalable Datastore Using MySQL (2016)

83 points by mangatmodi, over 7 years ago

7 comments

dcposch, over 7 years ago

FYI, MySQL has a fresh new JSON data type now.

It has some great properties. It lets you mix data with a strict schema and data without a strict schema, getting some of the benefits of both worlds.

The JSON datatype avoids many of the annoying legacy considerations that other SQL column types have. You don't have to specify a length--so you won't make a VARCHAR(255), then get burned when one day a value has more than 255 characters. You don't have to specify a character encoding--JSON is always utf8mb4, the right one. (MySQL's 'utf8' encoding, perversely, supports only a subset of UTF-8 and will break if you try to write an emoji.)

Here's a table that illustrates some of the power:

    create table unitType (
      id bigint not null auto_increment,
      buildingId bigint not null,
      info json,
      -- generated column, extracted from the JSON document
      name varchar(255) as (info->>'$.name') not null,
      primary key(id),
      foreign key (buildingId) references building(id) on delete cascade,
      unique key(buildingId, name)
    );

We're modeling unit types in a building. For example, one building might contain 1-bedrooms, some nicer 1-bedrooms, and some 2-bedroom units.

- It's very easy to add new fields. If, tomorrow, we decide that each unit type needs a `minSqft` and `maxSqft`, I can add them with no database migration.

- We still get most of the benefits of a schema. The database makes it impossible for a unitType to exist that does not belong to a building. The database also makes it impossible for a single building to have two unitTypes with the same name. (With a truly schemaless DB like Mongo, the complexity of preventing or dealing with those kinds of invalid data ends up in the application code.)

- It makes it easy to use SQL directly, with no ORM. SQL is a powerful language; ORMs are often a leaky abstraction and a source of unessential complexity. With JSON columns for extensibility, you end up with way fewer migrations and way less need for auto-generated SQL.

- Computed columns (like name above) are really powerful.

Most of the above is possible in Postgres as well. Postgres does not have computed columns, as far as I can tell.

--

This is just to say: 99% of people on Hacker News are closer to where we are (rapid prototype phase) than where Uber is (Web Scale™). If that's you, consider just using JSON columns to maximize your development velocity! You can always do something fancier (like Schemaless) later on.
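
The pattern in the comment above can be tried without a MySQL server. What follows is a minimal sketch, not the commenter's setup: it assumes Python's bundled `sqlite3` module as a stand-in for MySQL (SQLite 3.31 or newer with its JSON functions compiled in), uses `json_extract()` where MySQL would use `->>`, and the helper names `open_db` and `add_unit_type` are hypothetical.

```python
import json
import sqlite3

# Assumption: SQLite stands in for MySQL. json_extract() plays the role of
# MySQL's ->> operator, and the JSON document is stored in a TEXT column.
SCHEMA = """
CREATE TABLE building (id INTEGER PRIMARY KEY);
CREATE TABLE unitType (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    buildingId INTEGER NOT NULL REFERENCES building(id) ON DELETE CASCADE,
    info TEXT,
    name TEXT GENERATED ALWAYS AS (json_extract(info, '$.name')) VIRTUAL NOT NULL,
    UNIQUE (buildingId, name)
);
"""


def open_db():
    """Create an in-memory database with foreign keys enforced (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_unit_type(conn, building_id, info):
    """Insert a unit type; only the JSON document is supplied, name is derived from it."""
    conn.execute(
        "INSERT INTO unitType (buildingId, info) VALUES (?, ?)",
        (building_id, json.dumps(info)),
    )


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO building (id) VALUES (1)")
    add_unit_type(conn, 1, {"name": "1-bedroom"})
    # New fields need no migration: they simply appear in the JSON document.
    add_unit_type(conn, 1, {"name": "2-bedroom", "minSqft": 900, "maxSqft": 1100})
    query = "SELECT name, json_extract(info, '$.minSqft') FROM unitType ORDER BY id"
    for row in conn.execute(query):
        print(row)
    try:
        add_unit_type(conn, 1, {"name": "1-bedroom"})  # duplicate name in building 1
    except sqlite3.IntegrityError as exc:
        print("rejected by the database:", exc)
```

The strict parts (the foreign key and the per-building unique name) are still enforced by the database, while `minSqft` and `maxSqft` live only in the JSON document.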

ronnier, over 7 years ago

Summary:

> We ended up building a key-value store which allows you to save any JSON data without strict schema validation, in a schemaless fashion (hence the name). It has append-only sharded MySQL with buffered writes to support failing MySQL masters and a publish-subscribe feature for data change notification which we call triggers. Lastly, Schemaless supports global indexes over the data.
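
To make the quoted summary concrete, here is a rough in-memory sketch of two of the ideas it names: append-only writes and publish-subscribe "triggers". It is not Uber's implementation. Sharding, buffered writes, MySQL storage, and global indexes are all left out, and the names `Cell`, `AppendOnlyStore`, `subscribe`, and `dispatch` are hypothetical.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Tuple


class Cell(NamedTuple):
    row_key: str
    column: str
    version: int
    body: Dict[str, Any]  # arbitrary JSON-like data, no schema validation


class AppendOnlyStore:
    """Hypothetical in-memory model: cells are appended, never updated in place."""

    def __init__(self) -> None:
        self._log: List[Cell] = []
        self._latest: Dict[Tuple[str, str], int] = {}  # (row_key, column) -> log index
        self._subscribers: List[list] = []  # each entry: [column, callback, next_offset]

    def put(self, row_key: str, column: str, body: Dict[str, Any]) -> int:
        """Append a new version of a cell and return its version number."""
        prev = self._latest.get((row_key, column))
        version = 1 if prev is None else self._log[prev].version + 1
        self._log.append(Cell(row_key, column, version, dict(body)))
        self._latest[(row_key, column)] = len(self._log) - 1
        return version

    def get(self, row_key: str, column: str) -> Optional[Cell]:
        """Return the newest version of a cell, or None if it was never written."""
        idx = self._latest.get((row_key, column))
        return None if idx is None else self._log[idx]

    def history(self, row_key: str, column: str) -> List[Cell]:
        """Return every version ever written for a cell, oldest first."""
        return [c for c in self._log if (c.row_key, c.column) == (row_key, column)]

    def subscribe(self, column: str, callback: Callable[[Cell], None]) -> None:
        """Register a trigger that will see every cell written to the given column."""
        self._subscribers.append([column, callback, 0])

    def dispatch(self) -> None:
        """Deliver unseen cells to each subscriber; this runs separately from put().

        A subscriber's offset only advances after its callback returns, so a
        cell whose callback raised is offered again on the next dispatch.
        """
        for sub in self._subscribers:
            column, callback = sub[0], sub[1]
            while sub[2] < len(self._log):
                cell = self._log[sub[2]]
                if cell.column == column:
                    callback(cell)
                sub[2] += 1


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.subscribe("BASE", lambda cell: print("trigger saw", cell))
    store.put("trip-1", "BASE", {"fare": 10})
    store.put("trip-1", "BASE", {"fare": 12})  # new version; the old one is kept
    store.dispatch()
    print(store.get("trip-1", "BASE"))
```

In this sketch a slow or failing subscriber only delays its own deliveries, because `put()` never calls into subscriber code.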

whalesalad, over 7 years ago

Still blows my mind that it took Uber so long to migrate away from a single DB solution. The bit about wanting an event system to handle downstream trip processing without having one failure block the whole job was shocking.

I'm all for avoiding premature optimization but this was taken to the extreme.

PostgreSQL is capable of all of this out of the box. Wonder why a custom tool was built instead?

mewse, over 7 years ago

I wish I had video of the faces I was undoubtedly pulling, during the few seconds I spent puzzling out the pronunciation and meaning of the word "Schemaless". /Shema-leez/ ? /Szhee-males/ ?

Naming products is demonstrably a hard problem.

mangatmodi, over 7 years ago

Any critique of Uber's use of triggers to kick off the billing service? I have been reading that triggers shouldn't be used, especially to call external services, because the external service might not be ACID compliant (no rollback?) and, if the call is expensive, it can hold the DB lock on the row for a really long time.

verst, over 7 years ago

This was discussed here 2 years ago.

https://news.ycombinator.com/item?id=10894047

rubyn00bie, over 7 years ago

Can this have a 2016 added to it, please?