I'm curious as to why you choose to break out specific headers in the schema.<p>For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.<p>If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.<p>For example<p><pre><code> CREATE TABLE IF NOT EXISTS "messages" (
"id" INTEGER NOT NULL PRIMARY KEY, -- internal id
"message_id" TEXT NOT NULL, -- Gmail message id
"thread_id" TEXT NOT NULL, -- Gmail thread id
"headers" JSON NOT NULL, -- JSON object of { "header": value },
"subject" TEXT GENERATED ALWAYS AS (json_extract("headers", '$.Subject')) VIRTUAL NOT NULL)
...
);
CREATE INDEX subjectidx on messages(subject);
</code></pre>
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as<p><pre><code> ALTER TABLE messages ADD dkim TEXT GENERATED ALWAYS AS (json_extract("headers", '$."Dkim-Signature"')) VIRTUAL NOT NULL);
CREATE INDEX dkimidx on messages(dkim);
SELECT dkim, COUNT(0) FROM messages GROUP BY dkim;
</code></pre>
or whatever you want.