Tech Echo

4 comments

mike_hearn, over 1 year ago

It's a tech talk, so I watched it and made some notes.

It's an argument for using in-process databases when doing data science, rather than an external DB. The speaker pitches DuckDB as a concrete example, which seems to be an in-process DB for Python data frames.

The speaker presents measurements showing how much overhead the wire protocols of various DBs have. MySQL is the best; PostgreSQL is orders of magnitude worse due to a very inefficient binary format design. The best is still 10x worse than netcat.

Apache Arrow is trying to design a universal protocol for DB access that's more efficient than what's out there currently.

Speaker asserts that scale-out is usually not needed in data analytics: no need to use Spark etc. unless you want it on your CV.

Audience member asks "what about multi-user/multi-process access?"; speaker admits DuckDB basically doesn't do that.

Speaker pitches for using embedded in-proc DBs inside AWS Lambda functions. It's not practical to install Oracle RDBMS in something that only runs for 100 msec.

A web shell for DuckDB is demonstrated; it uses WASM.

Decentralization is pitched as a reason to avoid 2-tier architecture (a separate DB engine with a client protocol).
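To make the "in-process" idea concrete, here's a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for DuckDB (an assumption: DuckDB's own Python API is not shown here). The database lives inside the same process as the analysis code, so query results come back as Python objects with no separate server and no client wire protocol in between. The `trips` table and its columns are hypothetical, made up for illustration.

```python
import sqlite3


def average_fare_by_city(rows):
    """Load (city, fare) rows into an in-memory SQLite database and
    return the average fare per city, sorted by city name.

    sqlite3 is used here as a stand-in for an embedded DB like DuckDB;
    the table and column names are hypothetical."""
    # ":memory:" creates a database that exists only inside this process.
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE trips (city TEXT, fare REAL)")
        con.executemany("INSERT INTO trips VALUES (?, ?)", rows)
        cur = con.execute(
            "SELECT city, AVG(fare) FROM trips GROUP BY city ORDER BY city"
        )
        # Results are plain Python tuples; no socket or wire protocol involved.
        return cur.fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    data = [("Amsterdam", 12.0), ("Amsterdam", 18.0), ("Berlin", 9.5)]
    print(average_fare_by_city(data))  # [('Amsterdam', 15.0), ('Berlin', 9.5)]
```

The trade-off raised in the Q&A still applies: an embedded database like this is owned by one process, so multi-user or multi-process access needs something else.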

0xbadcafebee, over 1 year ago

One of the major failures of the modern computer science age (among others) is a lack of direction away from traditional i/o. We are still stuck on files and directories and tcp sockets. Yet what we actually want to do with i/o is not read a file from a local disk, or connect to a server and transmit the contents of the file over some additional protocol.

What we really want is to store some data somewhere and later be able to retrieve it, without necessarily knowing what it was we stored, or where, or how. And we don't want to think about what server it's on, or what hard drive, or what folder. And we don't want to think about client protocols or query languages.

All of that would be possible if we reinvented i/o. Basically, just imagine what you want your experience to be, and then start making up names for functions that do that. Stuff that in a kernel, or a standard library. Now you have i/o that's based on how you really want to use data. The backend implementation of it can vary, but the point is to make the user experience what we actually want rather than what somebody else thinks is practical. Make the data interface you want to use, and make it a standard.

blibble, over 1 year ago

that chart of the "inefficiency of client protocols" tripped my bullshit alarm

the paper is here: https://15721.courses.cs.cmu.edu/spring2023/papers/15-networking/p1022-muehleisen.pdf

it's a super-contrived example that's not using any of the functionality of the database and is just using it as "cat"

basically it's just doing cat over localhost. well, what a surprise: if you add a layer of serialisation, of course it's slower than just doing memcpy()

if you're using your database to store files... maybe don't do that
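For readers wondering what "a layer of serialisation" means here, a minimal sketch under stated assumptions: the length-prefixed binary row format below is made up for illustration and is not the wire protocol of MySQL, PostgreSQL, or any other real database. It only shows that a client protocol re-encodes every field on the way out and parses it again on the way in, whereas a plain byte copy (what netcat or `memcpy()` does) skips both steps. No timing numbers are claimed.

```python
import struct

# Hypothetical per-row header: 8-byte signed id + 4-byte string length,
# in network byte order ("!" means no padding between fields).
HEADER = struct.Struct("!qI")


def serialize_rows(rows):
    """Encode (int, str) rows into the made-up format above.
    Every field is converted and copied into a new buffer."""
    out = bytearray()
    for ident, name in rows:
        encoded = name.encode("utf-8")
        out += HEADER.pack(ident, len(encoded))
        out += encoded
    return bytes(out)


def deserialize_rows(payload):
    """Parse bytes produced by serialize_rows back into (int, str) tuples."""
    rows = []
    offset = 0
    while offset < len(payload):
        ident, length = HEADER.unpack_from(payload, offset)
        offset += HEADER.size
        name = payload[offset:offset + length].decode("utf-8")
        offset += length
        rows.append((ident, name))
    return rows


if __name__ == "__main__":
    rows = [(1, "alpha"), (2, "beta")]
    wire = serialize_rows(rows)
    # Round trip through the encode/decode layer.
    assert deserialize_rows(wire) == rows
    # A raw copy of bytes that are already serialized does none of that work.
    raw_copy = bytes(wire)
    print(len(wire), raw_copy == wire)
```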

sertbdfgbnfgsd, over 1 year ago

Starts by saying "i sent the abstract drunk, then i had to create a talk", basically admitting he started with the conclusion and then built the argument.

Two Tier Architectures Are Anachronistic [video]