TechEcho

Building a Distributed Log from Scratch, Part 1: Storage Mechanics

278 points by tylertreat over 7 years ago

4 comments

marknadal over 7 years ago
This is a really neat article! I gave a talk in Sweden the other month about how to build a distributed database; hopefully it will also be fun/useful/informative for others: https://youtu.be/5fCPRY-9hkc (it uses CRDTs instead, so it's a counterpart that isn't totally ordered and isn't append-only).

I like the OP article, though, because it introduced me to NATS Streaming, which I hadn't heard of before (I only knew Kafka). Will have to check it out.
nicolaslem over 7 years ago
I have to admit that I only recently got familiar with logs. I was designing a B+Tree[1] in Python for fun and was struggling to make it survive crashes: a single insertion into a tree often results in multiple page writes, which together are not atomic.

The solution to this problem is simple and elegant: a Write-Ahead Log. Every page write is appended to the log and only merged back into the tree file once the log is known to be safely written to storage.

SQLite documents its WAL file format extensively[2], which is great for learning.

[1] https://github.com/NicolasLM/bplustree

[2] https://www.sqlite.org/fileformat.html#the_write_ahead_log
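The write-ahead approach described in the comment above can be sketched in a few dozen lines of Python (the language the B+Tree was written in). This is a minimal illustration under stated assumptions, not the bplustree project's code and not SQLite's WAL format: the record layout, the `WriteAheadLog` class and its methods are hypothetical. A batch of page writes counts as committed only once its COMMIT record has been fsync'd; checkpointing copies committed pages into the tree file and then empties the log.

```python
import os
import struct

PAGE = 1    # hypothetical record type: a full page image follows the header
COMMIT = 2  # hypothetical record type: ends one atomic batch of page writes
HEADER = struct.Struct("<BQ")  # record type (1 byte) + page number (8 bytes)


class WriteAheadLog:
    """Toy WAL for a paged file; the on-disk format is made up for illustration."""

    def __init__(self, tree_path, log_path, page_size=4096):
        self.tree_path = tree_path
        self.log_path = log_path
        self.page_size = page_size
        open(tree_path, "ab").close()  # ensure the tree file exists for "r+b"
        self.recover()  # replay anything a previous crash left in the log

    def write_pages(self, pages):
        """Atomically log {page_number: bytes}; durable once this returns."""
        with open(self.log_path, "ab") as log:
            for page_no, data in sorted(pages.items()):
                if len(data) != self.page_size:
                    raise ValueError("page must be exactly page_size bytes")
                log.write(HEADER.pack(PAGE, page_no) + data)
            log.write(HEADER.pack(COMMIT, 0))
            log.flush()
            os.fsync(log.fileno())  # the batch is committed only after this

    def _committed_pages(self):
        """Scan the log, keeping only batches that end in a COMMIT record."""
        committed, pending = {}, {}
        try:
            with open(self.log_path, "rb") as log:
                while True:
                    header = log.read(HEADER.size)
                    if len(header) < HEADER.size:
                        break  # end of log, or a header torn by a crash
                    kind, page_no = HEADER.unpack(header)
                    if kind == COMMIT:
                        committed.update(pending)
                        pending = {}
                        continue
                    data = log.read(self.page_size)
                    if len(data) < self.page_size:
                        break  # page image torn by a crash: discard it
                    pending[page_no] = data
        except FileNotFoundError:
            pass  # no log yet, nothing to replay
        return committed  # whatever is left in `pending` never committed

    def checkpoint(self):
        """Merge committed pages into the tree file, then empty the log."""
        pages = self._committed_pages()
        with open(self.tree_path, "r+b") as tree:
            for page_no, data in pages.items():
                tree.seek(page_no * self.page_size)
                tree.write(data)
            tree.flush()
            os.fsync(tree.fileno())  # tree must be durable before log is cleared
        open(self.log_path, "wb").close()  # truncate the log

    # After a crash, recovery is just a checkpoint of whatever was committed.
    recover = checkpoint
```

Replaying committed pages is idempotent, so a crash in the middle of `checkpoint` is harmless in this sketch: the log still holds the same committed pages, and the next `recover` simply writes them again.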
kthielen over 7 years ago
This is a good introduction, and a very useful abstraction. At Morgan Stanley we have built a PL/compiler and tools around a method of logging like this: logging algebraic data types and querying them live with Haskell-like comprehensions/pattern-matching/etc.:

https://github.com/Morgan-Stanley/hobbes
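Hobbes has its own storage format and query language, so the following is not its API. It is only a hypothetical Python sketch of the general idea mentioned above: log values of an algebraic (sum) type and query them with comprehensions, using `isinstance` checks in place of pattern matching on the variant. The `Order`/`Fill` record types and the `append`/`fills_for` helpers are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Two hypothetical variants of one sum type; each log entry is exactly one of them.
@dataclass(frozen=True)
class Order:
    order_id: int
    symbol: str
    qty: int


@dataclass(frozen=True)
class Fill:
    order_id: int
    price: float


Event = Union[Order, Fill]

# In-memory stand-in for an append-only log of typed events.
log: List[Event] = []


def append(event: Event) -> int:
    """Append an event and return its offset (position in the log)."""
    log.append(event)
    return len(log) - 1


def fills_for(symbol: str) -> List[Tuple[int, float]]:
    """Comprehension-style query: (quantity, price) of every fill for `symbol`."""
    # isinstance checks stand in for pattern-matching on the variant tag.
    orders = {e.order_id: e for e in log if isinstance(e, Order) and e.symbol == symbol}
    return [(orders[e.order_id].qty, e.price)
            for e in log
            if isinstance(e, Fill) and e.order_id in orders]
```

Because every entry carries its variant's type, queries can select and join on structure rather than parsing free-form log lines.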
ww520 over 7 years ago
One question about the index file: "in Kafka, the index uses 4 bytes for storing an offset relative to the base offset and 4 bytes for storing the log position." Isn't the offset relative to the base offset already pointing to the physical location of the message in the segment file? What's the purpose of the second 4-byte field, the log position?
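The question hinges on the difference between a logical offset and a byte position. Kafka offsets number messages (0, 1, 2, ...), and messages vary in size, so a message's byte position in the segment file cannot be computed from its offset alone; the offset index is also sparse (roughly one entry per `index.interval.bytes` of data), so a lookup binary-searches for the nearest preceding entry and scans forward from that byte position. The simplified Python sketch below assumes a hypothetical record framing (8-byte absolute offset plus 4-byte length); it is not Kafka's actual on-disk format, and `build_segment`/`read` are illustrative helpers.

```python
import bisect
import struct

ENTRY = struct.Struct(">II")   # index entry: 4-byte relative offset, 4-byte byte position
RECORD = struct.Struct(">QI")  # hypothetical record header: absolute offset, payload length


def build_segment(base_offset, messages, index_interval_bytes):
    """Return (segment bytes, sparse index bytes) for a list of payloads."""
    segment, index = bytearray(), bytearray()
    bytes_since_entry = index_interval_bytes  # force an index entry for the first message
    for i, payload in enumerate(messages):
        if bytes_since_entry >= index_interval_bytes:
            # Relative offset i maps to the byte position where this record starts.
            index += ENTRY.pack(i, len(segment))
            bytes_since_entry = 0
        record = RECORD.pack(base_offset + i, len(payload)) + payload
        segment += record
        bytes_since_entry += len(record)
    return bytes(segment), bytes(index)


def read(segment, index, base_offset, target):
    """Find the payload for absolute offset `target` using the sparse index."""
    entries = [ENTRY.unpack_from(index, i) for i in range(0, len(index), ENTRY.size)]
    rel_offsets = [rel for rel, _ in entries]
    # Largest indexed relative offset <= the one we want.
    k = bisect.bisect_right(rel_offsets, target - base_offset) - 1
    if k < 0:
        raise KeyError(target)
    pos = entries[k][1]  # byte position: where to start scanning
    while pos < len(segment):
        offset, length = RECORD.unpack_from(segment, pos)
        if offset == target:
            start = pos + RECORD.size
            return segment[start:start + length]
        pos += RECORD.size + length  # variable-length records: skip by size
    raise KeyError(target)
```

If the index stored only relative offsets, `read` would have to scan the segment from the beginning, since nothing else records where message N starts in bytes.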