Not to play Code Golf here but the premise that retention takes 50+ lines of SQL is plain wrong.<p>Here's an implementation (T-SQL) in less than 10 lines. Table setup script is here: <a href="https://pastebin.com/9DajyzLp" rel="nofollow">https://pastebin.com/9DajyzLp</a> . There are probably even shorter ways of doing it. Anyone who objects to hardcoded dates, just swap out with a relative DATEADD.<p>;WITH lastweek AS (
SELECT USERID, TS, [ACTION] FROM Users WHERE TS BETWEEN '2022-06-13' AND '2022-06-19' )<p>SELECT ROUND(CAST(f1.[c] AS FLOAT) / CAST(f2.[c] AS FLOAT),2) FROM<p>( SELECT COUNT(*) [c] FROM lastweek w INNER JOIN Users u ON w.USERID = u.USERID AND w.[ACTION] = 'A' AND u.TS BETWEEN '2022-06-20' AND '2022-06-26' ) f1,<p>( SELECT COUNT(*) [c] FROM Users u WHERE u.[ACTION] = 'A' ) f2
Process Mining is a relatively new field of data analytics, focusing on exactly these types of questions using workflow or activity sequence data. Celonis, one of the biggest players in the field have even developed a query languages called PQL for querying these kind of information from “event logs”
The second example (with the sequences v. the table of boolean columns) seems like it'd be immediately resolved by replacing those booleans with timestamps.
The Arthur has no clue about SQL.
He does not seem to have heard of a basic SQL capability - CTEs (Common Table Expressions)
These make it very simple to sequentially process/query data.