How SQL Database Engines Work, by the Creator of SQLite (2008) [video]

813 pointsby zbaylinalmost 7 years ago

17 comments

ryanworlalmost 7 years ago

There is a more recent lecture on the same topic from 2015 at CMU: <a href="https://youtu.be/gpxnbly9bz4" rel="nofollow">https://youtu.be/gpxnbly9bz4</a>

评论 #17389277 未加载

评论 #17388143 未加载

评论 #17389047 未加载

评论 #17388579 未加载

评论 #17389056 未加载

dmorenoalmost 7 years ago

I recently created a database engine (exosql [1]), only query and no storage. It uses postgres-like foreign data wrappers to get all data.It's not valid for big datasets, as it stores all in memory (or maybe it is?), but as a learning experience has been amazing to think and develop a real database: planner, executor, choose algorithms, implement features as lateral joins and so on.I will definetly listen very carefully to these talks.[1] <a href="https://gitHub.com/Serverboards/exosql" rel="nofollow">https://gitHub.com/Serverboards/exosql</a>

评论 #17390074 未加载

评论 #17388580 未加载

pipualmost 7 years ago

I truly recommend CMU's Andy Pavlov's video lectures on the topic (and also more advanced stuff)<a href="https://www.youtube.com/playlist?list=PLSE8ODhjZXjYutVzTeAds8xUt1rcmyT7x" rel="nofollow">https://www.youtube.com/playlist?list=PLSE8ODhjZXjYutVzTeAds...</a>

评论 #17389001 未加载

评论 #17388431 未加载

评论 #17391618 未加载

logicalleealmost 7 years ago

Any video filter experts here?Request to any video filter expert------------------------------------I started watching this. The slides are unreadable but the camera is perfectly still and the slides are for several "key frames" where the compression algorithm decides to replace one set of compression artifacts for another.For example try to read the first keyword under "Translates into:":<a href="https://www.youtube.com/watch?v=Z_cX3bzkExE&t=2m14s" rel="nofollow">https://www.youtube.com/watch?v=Z_cX3bzkExE&t=2m14s</a>The keyword is unreadable at the start but as you keep looking at it over 50 keyframes it becomes readable to me.Since the camera is in a fixed position it should be possible to combine the data from those artifacts into a single superresolution video with very small assumptions. (i.e. the assumption that the image is the same image until at least 5% change or something). There's not even anyone moving in front of it.-> Can someone who actually knows this stuff apply a superresolution interlacing filter to this video and post the superresolution version somewhere?I hope this is not too much work, and I am sure we would all appreciate the results since the slides are not human-readable before applying some kind of superresolution!

评论 #17390251 未加载

评论 #17388590 未加载

评论 #17389584 未加载

dicrocealmost 7 years ago

so I was trying to figure out why a query was slow the other day... it was a nasty query with like 14 joins... I used explain and saw that it was a mess... now in my case I was able to switch to outer joints and nest related joins and got it fast.. but I had some interesting thoughts.In SQL, indexes are implicit.. they are used if available but it's easy get a large query to scan sometimes when it shouldnt... what if there was a different query language with explicit index syntax.. I think you'd get a lot more predictable performance.

评论 #17389307 未加载

评论 #17390871 未加载

评论 #17389214 未加载

评论 #17391784 未加载

apialmost 7 years ago

Sqlite is incredible. Tiny and usually used for small stuff but I have heard of 1TB+ databases with acceptable performance.

stevoskialmost 7 years ago

If you are a Java programmer and want to learn how an SQL database engine works, take a look at the source code of H2.Even better, try to add a basic feature to H2 (eg. a new built-in function). It is surprisingly easy, and you come away with a decent understanding of the basics of building an SQL database engine.

A_Personalmost 7 years ago

Gosh, I must say there seems to be some misunderstanding of RDBMS concepts in some posts in this thread!I was writing database systems professionally, back in the days before the RDBMS concept was even a thing. So here's my (enormously long and convoluted) 2 cents worth. Please be sure to pack a sandwich and get hydrated before you continue.Say you were dealing with doctors and patients, and needed to store that information in a database. Back in the day, you'd typically use a so-called hierarchical database. To design one of those, you need to decide, what is the most common access method expected to be: getting the patients for a given doctor, or the doctors for a given patient? You'd design the schema accordingly. Then, the preferred access method was easy to code, and efficient to run. The "other" access method was still possible, but harder to code, and slower to run. The database schema depended on how you thought the users would access the data.But that is absolutely what NOT what to do with an RDBMS. Certainly you look at the users' screens, reports, and so on - but that's just to determine what unique entities the system must handle - in this case, doctors and patients. Then you ignore how the users will access the data, and work out what are the inherent logical relationships between all the entities.Your initial answer might be this. A doctor can have many patients, and a patient can have many doctors. As any competent relational designer will instantly know, this means you need a resolving table whose primary key is a composite key comprising the primary keys of the other two tables. So if Mary was a patient of Tom, you'd add Mary to the patients table (if not already there), Tom to the doctors table (ditto), then add a Mary/Tom record to the resolving table. By that means, a doctor could have any number of patients, a patient could have any number of doctors, and it's trivially easy to write simple, performant SQL to access that data however you want.But then you'd have a ghastly thought: patients can also be doctors, and doctors can also be patients! Say Tom was also a patient of Mary! Now you need a Tom record in the patient's table, but that would inappropriately duplicate all his data from the doctors table! Something's clearly wrong. You'd soon see that from a data modelling viewpoint, you don't want doctors and patients as separate entities - you want a single entity Person, and a resolving table to relate arbitrary people in specified ways.So what?!!So this. IMHO, many developers using relational databases have absolutely no idea about any of that. They design hopelessly unnormalised schemas, which then need reams of ghastly SQL to get anything out. The query planner can barely parse all that crap, let along optimise it. The database has to stop every five minutes to wet its brow and take a drink.So here's my advice to any inexperienced relational database designers who have actually managed to get this far!! If you can answer the following questions off the top of your head, you're probably on the right track. If you can't, you're lacking basic knowledge that you need in order to use an RDBMS properly:- what is a primary key? - what is a foreign key? - what is an important difference between primary keys and foreign keys? - what is a composite key? When would you use one? - what are the three main relations? - what is normalisation? - what is denormalization? - what is a normal form? and so on.Just my 2c! :-)

评论 #17393966 未加载

评论 #17391022 未加载

评论 #17392507 未加载

okketalmost 7 years ago

Sadly bad audio (room mic with all the ambient noise) and bad video quality (slides are almost unreadable). But great content.

randopalmost 7 years ago

Thank you. Very educational. Interesting to know that ORDER BY includes significant performance penalty without LIMIT.

serioushahaalmost 7 years ago

slides : <a href="https://www.slideshare.net/VikasBansal23/how-sqlite-works" rel="nofollow">https://www.slideshare.net/VikasBansal23/how-sqlite-works</a>

angelfreakalmost 7 years ago

Really great, thanks for posting.

bitmapbrotheralmost 7 years ago

This is a much better talk with better video and sound.<a href="https://www.youtube.com/watch?v=Jib2AmRb_rk" rel="nofollow">https://www.youtube.com/watch?v=Jib2AmRb_rk</a>

cup-of-teaalmost 7 years ago

What is that acronym he keeps saying? MBCC?

评论 #17388687 未加载

anothergoogleralmost 7 years ago

Love how he started. People who don't stop their conversations for a presenter are the worst. People who don't stop their conversations for a presentation by Richard Hipp deserve a spell of laryngitis.

评论 #17391611 未加载

blackrockalmost 7 years ago

I'm not going to call anyone out here, but why do people keep using the word orthogonal?It doesn't even compute. It doesn't even make any sense, in how they use it in relation to the topic.Are the issues at right angles of one another? No.Are the issues statistically independent of one another? Perhaps.I suggest to use a more appropriate descriptive word to describe the situation.You folks should read the urban meaning of orthogonal, to understand how people roll their eyes at you, when you inappropriately use the term.<a href="https://www.urbandictionary.com/define.php?term=orthogonal" rel="nofollow">https://www.urbandictionary.com/define.php?term=orthogonal</a>Just another friendly PSA.

评论 #17389815 未加载

评论 #17390005 未加载

评论 #17390407 未加载

评论 #17390461 未加载

评论 #17391725 未加载

jokoonalmost 7 years ago

I don't like to use SQL engine because I don't understand how they work, I never really know if my query will be O(1), O(log(n)), O(n), etc, or what kind of algorithm will optimize my query.Who really does understand how a SQL engine work? Don't you usually require to understand how something work before starting using it? Which SQL analyst or DB architect really knows about the internals of a SQL engine? Do they know about basic data structures? Advanced data structures? Backtracking?That's why I tend to avoid systematically using a SQL engine unless the data schema is very very simple, and manage and filter the data case by case in code. SQL is good for archiving and storing data, and work as an intermediary, but I don't think it should drive how a software works. Databases can be very complex, and unfortunately, since developers like to make things complicated, it becomes hairy.I think SQL was designed when RAM was scarce and expensive, so to speed up data access, it has to be properly indexed with a database engine. I really wonder who, today, have data that cannot fit in RAM, apart from big actors.I tend to advocate for simple designs and avoid complexity as most as I can, so I might biased, but many languages already offers things like sets, maps, multimaps, etc. Tailoring data structures might yield good results too.Databases still scare me.

评论 #17388428 未加载

评论 #17388430 未加载

评论 #17388387 未加载

评论 #17388439 未加载

评论 #17388384 未加载

评论 #17389301 未加载

评论 #17440697 未加载

评论 #17388504 未加载

评论 #17390653 未加载

评论 #17388639 未加载

评论 #17388433 未加载

评论 #17388532 未加载