I have a question about this that I probably can't explain well in a single comment. Normally I have well-defined schemas for my data, and obviously prefer this for its countless benefits.

However, over the years I have occasionally and reluctantly used the 'Derek table' for importing domain data with a dynamic schema, aware that I will be paying dearly for it.

My question is: what alternatives are there? Other kinds of databases, maybe those used for 'big data'?

I must clarify:

(1) I know I could instead create proper db tables on the fly - this way, I _can_ have varying columns depending on each new set of domain data I import.
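Roughly what I mean by that, as a sketch (all table/column names are invented, and real code would of course validate and quote the identifiers properly):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Sketch only: build a per-import table from whatever columns the
    // incoming dataset happens to have. Names here are made up.
    var columns = new Dictionary<string, string>
    {
        ["CustomerName"] = "nvarchar(200)",   // hypothetical imported fields
        ["OrderTotal"]   = "decimal(18,2)"
    };
    var tableName = "Import_ProjectX_2024";   // one table per imported dataset

    var columnDdl = string.Join(",\n    ",
        columns.Select(c => $"[{c.Key}] {c.Value} NULL"));
    var ddl = $"CREATE TABLE [{tableName}] (\n    [Id] int IDENTITY PRIMARY KEY,\n    {columnDdl}\n);";
    Console.WriteLine(ddl);
    // Every query against these tables then has to be built just as dynamically.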
However, IF I do this, I will now have to dynamically build my SQL queries to refer to these varying table and column names. The BUILDING of those SQL queries is nothing to fear, but their execution is, since each new variant means a previously unseen execution plan, and some of those will hit weird performance problems. I am painfully aware of this, because I have worked (and still do) 'lifeguard duty' on a production database, where I routinely have to investigate how a user has yet again created a DBMS-choking query with exactly this technique.

(2) With the Derek approach, the usual move is to violently retrieve 'all the query-relevant records' (e.g., the contents of 'project'/'document'),
and then do the actual calculation in code, e.g. C#.
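In outline it looks something like this - a rough sketch, where the name/value layout, the table and column names, and the placeholders are all just assumptions for the example:

    using System;
    using System.Collections.Generic;
    using Microsoft.Data.SqlClient;   // assumes the Microsoft.Data.SqlClient package

    // Sketch of the 'haul it all out, compute in C#' approach, against an
    // assumed name/value table. Everything here is illustrative.
    var connectionString = "...";   // placeholder
    long docId = 42;                // placeholder

    // One dictionary of field-name -> value per logical entity.
    var byEntity = new Dictionary<long, Dictionary<string, string?>>();

    using var conn = new SqlConnection(connectionString);
    conn.Open();
    using var cmd = new SqlCommand(
        @"SELECT EntityId, FieldName, FieldValue
            FROM DocumentFieldValue
           WHERE DocumentId = @docId", conn);
    cmd.Parameters.AddWithValue("@docId", docId);

    using var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        var entityId = reader.GetInt64(0);
        if (!byEntity.TryGetValue(entityId, out var fields))
            byEntity[entityId] = fields = new Dictionary<string, string?>();
        fields[reader.GetString(1)] = reader.IsDBNull(2) ? null : reader.GetString(2);  // one 'cell' per row
    }
    // ...and from here on, the pivoting/summarizing happens entirely in C#,
    // not in the DBMS.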
This of course has its downsides:

(2.1) We must haul huge (relatively speaking) amounts of raw data off the db, since we are not summarizing it down to the calculation end result beforehand.
But this is also the 'benefit': We know we won't bother the db further than this initial and rather simple haul/read.
(I AM aware I could also query the Derek monstrosity directly, but I have seen enough of that to know to avoid it, so as not to bring the DBMS to its knees.)

(2.2) This is an extension of 2.1: since we are working with the raw data, we must 'pay' both for moving the big chunk of data over the network, and often also for holding it in working memory on the external processing/calculation server (this may not apply if the calculation can be done piece-wise, working on a stream).
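The piece-wise variant would be something like the following - again just a sketch, where the rows are assumed to arrive as a stream (in practice, from the data-reader loop in the earlier sketch):

    using System;
    using System.Collections.Generic;

    // Piece-wise variant of 2.2: fold the raw name/value rows into the summary
    // as they stream past, instead of holding them all in memory first.
    static decimal SumField(IEnumerable<(long EntityId, string Field, string Value)> rows,
                            string field)
    {
        decimal total = 0;
        foreach (var row in rows)
            if (row.Field == field && decimal.TryParse(row.Value, out var v))  // culture handling omitted
                total += v;          // constant memory, single pass
        return total;
    }

    // Example usage with an in-memory stand-in for the streamed rows:
    var demoRows = new[]
    {
        (EntityId: 1L, Field: "OrderTotal", Value: "10.5"),
        (EntityId: 2L, Field: "OrderTotal", Value: "4.5")
    };
    Console.WriteLine(SumField(demoRows, "OrderTotal"));   // 15.0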
And, of course, there is the entire cost of 'a single table cell is now a whole db row'.

Echoing poor Derek: the benefit of the described approach is that it actually takes a relatively simple approach, and relatively simple code, to build the solution this way,
at a tremendous cost to resources/efficiency.
If I did 'the right thing', I would have to write considerably more, and more complex, code to handle the dynamic DDL/schema processing and work dynamically with the real DB schema.

Back in the 90s, I would by necessity have 'done the right thing', since the Derek approach was doomed performance-wise back then. But now, in the 2020s, we have so much computing power that we can survive - for a time - by wasting extravagant amounts of resources.

To recap/PS: Whenever I have done this, it has been for a small subset of specific data;
I have never done it in the insane 'one single table' form, with dynamic tables too. My case has always been 'dynamic fields'.

Also, for context: the 'calculation' to be done typically amounts to what could generously be referred to as a big pivot-table operation on a heterogeneous set of data (which is why, expressed in proper SQL, the query would be rather verbose and unwieldy, accounting for all those heterogeneous and possibly-present fields in the different source tables/datasets).
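For a feel of what that SQL ends up looking like, here is a sketch of the conditional-aggregation pivot over the name/value rows, shown as the string one would end up building in code (field and table names are invented; in reality there is one such line per field that might exist, per heterogeneous source):

    using System;

    // Illustrative only: the pivot one would have to write (or generate)
    // against an assumed name/value table. Names are made up.
    var pivotSql = @"
    SELECT  EntityId,
            MAX(CASE WHEN FieldName = 'CustomerName' THEN FieldValue END) AS CustomerName,
            MAX(CASE WHEN FieldName = 'OrderTotal'   THEN FieldValue END) AS OrderTotal,
            MAX(CASE WHEN FieldName = 'Region'       THEN FieldValue END) AS Region
            -- ...and so on, one line per possibly-present field...
    FROM    DocumentFieldValue
    WHERE   DocumentId = @docId
    GROUP BY EntityId;";
    Console.WriteLine(pivotSql);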