I recently got a Frontend Engineering offer from Facebook. I dabbled in backend work earlier in my career but never did anything beyond simple APIs on a single server. I set a goal for myself that every year I pick a computer science topic & study it thoroughly. I don't have a CS background but I am beginning to love CS.<p>I started with Data Structures & Algorithms last year, studied & did over 250 LeetCode problems, which is why I was able to land an offer with FAANG.<p>Next on my list is databases. I want to know how they work internally, build a simple RDBMS from scratch, and learn SQL (I know simple CRUD operations), including advanced concepts like stored procedures & whatever is in current use today.<p>I have googled, yes, but I haven't found any resource that meets my needs. I also plan to switch to backend soon.<p>Thanks in advance<p>Edit:
I know I will not be building databases at Facebook, & I also know they probably have internal tools or an ORM to access databases. My goal is not to become a database developer but to have a good knowledge of how they work, just to satisfy my curiosity.
It gets recommended all the time in these kinds of threads, but it's so good I don't care: Bill Karwin's SQL Antipatterns. You need a decent understanding of the basics to get the most from it, but there's some excellent information and examples of what to (and what not to) do.<p><a href="https://www.oreilly.com/library/view/sql-antipatterns/9781680500073/" rel="nofollow">https://www.oreilly.com/library/view/sql-antipatterns/978168...</a>
Great intro and overview: <a href="http://coding-geek.com/how-databases-work/" rel="nofollow">http://coding-geek.com/how-databases-work/</a><p>Great book: <a href="https://dataintensive.net/" rel="nofollow">https://dataintensive.net/</a><p>Also a great read and overview: <a href="http://www.redbook.io/" rel="nofollow">http://www.redbook.io/</a><p>Great paper giving an overview of the architecture of a DB: <a href="https://perspectives.mvdirona.com/content/binary/ArchitectureOfDatabaseSystem.pdf" rel="nofollow">https://perspectives.mvdirona.com/content/binary/Architectur...</a><p>If you're looking into building your own database, there are some great open source projects you can reference here: <a href="https://github.com/danistefanovic/build-your-own-x#build-your-own-database" rel="nofollow">https://github.com/danistefanovic/build-your-own-x#build-you...</a><p>If you want to actually dive into source code, SQLite is amazing. It has very clean and readable code, so I'd suggest using it as a reference as well: <a href="https://github.com/mackyle/sqlite" rel="nofollow">https://github.com/mackyle/sqlite</a>
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems<p><a href="https://dataintensive.net/" rel="nofollow">https://dataintensive.net/</a>
Anything by Joe Celko: <i>SQL for Smarties</i>, <i>Trees and Hierarchies in SQL for Smarties</i><p>Also, the internals of Django ORM (<a href="https://github.com/django/django/tree/2.2.5/django/db/models" rel="nofollow">https://github.com/django/django/tree/2.2.5/django/db/models</a>) and SQLAlchemy Core (<a href="https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/sqlalchemy/sql" rel="nofollow">https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...</a>) and its dialects (<a href="https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/sqlalchemy/dialects" rel="nofollow">https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...</a>) + ORM (<a href="https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/sqlalchemy/orm" rel="nofollow">https://github.com/sqlalchemy/sqlalchemy/tree/rel_1_3_8/lib/...</a>)
I have three things for you:<p>1. Designing Data-Intensive Applications<p>2. Database Internals <a href="https://www.amazon.com/Database-Internals-deep-dive-distributed-systems/dp/1492040347" rel="nofollow">https://www.amazon.com/Database-Internals-deep-dive-distribu...</a><p>3. Andy Pavlo's database course videos at CMU and the guest lecture series
<a href="https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA" rel="nofollow">https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA</a>
I really suggest against building a database from scratch. It's just too annoying, and there's so much code to write (parser, storage, indexing, query planner, connections). If you're interested in internals, I'd say look at the SQLite codebase instead: <a href="https://sqlite.org/src/doc/trunk/README.md" rel="nofollow">https://sqlite.org/src/doc/trunk/README.md</a> . If anything, reading code that works is probably more useful than writing code that almost certainly won't without months and possibly years of effort.<p>A lot of the more complex database things are only really learned by working with a large database system. Performance, distributed databases, and complex schemas come to mind here. Most of the time with simple examples, you'll do something wrong performance-wise (such as forgetting an index, or doing a bad join), but you'll never know because the data is too small for it to matter.<p>Many times, you don't need to know much about databases beyond some basic SQL.
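To make the "forgetting an index" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column, and index names (users, email, idx_users_email) and the query_plan helper are made up for illustration. It asks SQLite for its query plan before and after creating an index; on a 1,000-row table both versions feel instant, which is exactly why the mistake goes unnoticed at small scale.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the plan text


# Hypothetical schema, in memory so nothing touches disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

# Without an index on email, SQLite has to scan the whole table.
plan_without_index = query_plan(conn, lookup, args)

# With an index, the planner can search it instead of scanning.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = query_plan(conn, lookup, args)

result = conn.execute(lookup, args).fetchone()

print(plan_without_index)
print(plan_with_index)
print(result)
```

The exact plan wording varies between SQLite versions, but the first plan should mention a SCAN of the table and the second a SEARCH using idx_users_email. The query returns the same row either way; only the amount of work differs.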
These are more academic than practical (i.e. they won't walk you through building a DB from scratch), but still interesting, I think.<p><a href="https://github.com/rxin/db-readings" rel="nofollow">https://github.com/rxin/db-readings</a>
I found this an enjoyable resource for learning about one of the fundamentals of RDBMS, indices: <a href="https://use-the-index-luke.com/" rel="nofollow">https://use-the-index-luke.com/</a>
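As a rough illustration of why indices matter, here is a toy sketch in plain Python. It is not how any real RDBMS stores data (those typically use B-trees on disk); the names rows, full_scan, and index_lookup are invented for this example. It contrasts scanning every row with a binary search over a sorted copy of the key column.

```python
import bisect

# Toy "table": (rowid, email) pairs in insertion order.
rows = [(i, f"user{i}@example.com") for i in range(1000)]


def full_scan(rows, email):
    """Check rows one by one; return (rowid, rows_examined)."""
    examined = 0
    for rowid, value in rows:
        examined += 1
        if value == email:
            return rowid, examined
    return None, examined


# Toy "index": (email, rowid) pairs kept sorted by email.
index = sorted((email, rowid) for rowid, email in rows)
keys = [email for email, _ in index]


def index_lookup(keys, index, email):
    """Binary-search the sorted keys; needs O(log n) comparisons."""
    pos = bisect.bisect_left(keys, email)
    if pos < len(keys) and keys[pos] == email:
        return index[pos][1]
    return None


print(full_scan(rows, "user999@example.com"))
print(index_lookup(keys, index, "user999@example.com"))
```

The scan examines all 1,000 rows to find the last one, while the binary search needs about ten comparisons. The trade-off is that the sorted copy has to be maintained on every insert, update, and delete, which is the cost a real index pays too.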
Anything on this channel: <a href="https://m.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA" rel="nofollow">https://m.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA</a> (CMU Database Group).
All thanks to Andy Pavlo (<a href="https://mobile.twitter.com/andy_pavlo" rel="nofollow">https://mobile.twitter.com/andy_pavlo</a>), who has a quote, something like "I only love two things, my wife and the databases".
Follow his lectures and read his suggested papers.
I would pick one RDBMS and try to dissect it. There are a lot to choose from nowadays; you can check out db-engines to get a general sense of what's out there:<p><a href="https://db-engines.com/en/ranking" rel="nofollow">https://db-engines.com/en/ranking</a><p>From what I have seen, most enterprises today will be using Oracle or Microsoft; however, PostgreSQL seems to have gained popularity with the web developer and small business crowd (as well as with the HN community). I have been an Oracle database developer since 2015 and would definitely recommend going that route if it interests you. At the very least it might be a good starting point because of the fantastic documentation. Here's a great guide I recommend to get you started with all the basic concepts:<p><a href="https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/index.html" rel="nofollow">https://docs.oracle.com/en/database/oracle/oracle-database/1...</a>
AWS re:Invent 2018 talks:<p><a href="https://youtu.be/HaEPXoXVf2k" rel="nofollow">https://youtu.be/HaEPXoXVf2k</a><p>The speaker has a sequence of 2-3 great talks on DynamoDB, the history of relational databases, and the rise of access-pattern-oriented DB design.
I second the earlier recommendations of Designing Data-Intensive Applications and Database Internals. I would also suggest looking at the Jepsen tests, <a href="https://aphyr.com/tags/jepsen" rel="nofollow">https://aphyr.com/tags/jepsen</a>, and Adrian Colyer's blog, <a href="https://blog.acolyer.org/" rel="nofollow">https://blog.acolyer.org/</a>