We decided to use DynamoDB as a cache for our expensive-to-compute statistics from the data warehouse, so we can show them to users without putting a big read load on the warehouse. DynamoDB scales really well, right? And big statistics tables don't belong in the main app database. I'll build the statistics in our Postgres data warehouse and use a little Ruby script I wrote to push them from Postgres to Dynamo with batch_write_item; shouldn't take long.

After a couple of days in terraform (I'm no infra expert) creating roles to assume, setting up cross-account permissions, modifying my script to assume those roles, figuring out the ECS run-task API syntax, and some other things I'd rather forget about, I kicked off the jobs to copy the data and left for the weekend. A Sunday cost-alert email put an end to that: I had just spent 8k USD writing 2 billion rows to two tables, because I misunderstood how Dynamo charges for writes. I thought I was going to spend a few hundred (still a lot, but our bill is big anyway) because I was doing batch write requests of 25 items per call. But DynamoDB pricing doesn't care about API calls: it charges per item written (one write unit per KB of each item, rounded up), so every item in a 25-item batch is billed on its own. OK, so how do we backfill all the historical data into Dynamo without it costing like a new car for two tables?

Apparently you can create new DynamoDB tables via imports from S3 (you can't import into existing tables, though) for basically no money (the pricing is incomprehensible, but the numbers I can find look small).
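The import call itself is refreshingly small. Roughly this, sketched in Python/boto3 rather than my actual tooling, with invented table, bucket and key names:

```python
import boto3

# Rough sketch of the S3 -> DynamoDB import (names are made up, not my real schema).
# Note: this can only create a brand new table, not load into an existing one.
dynamodb = boto3.client("dynamodb")
resp = dynamodb.import_table(
    S3BucketSource={"S3Bucket": "analytics-exports", "S3KeyPrefix": "stats/"},
    InputFormat="DYNAMODB_JSON",          # line-delimited DynamoDB-flavored JSON
    InputCompressionType="NONE",
    TableCreationParameters={
        "TableName": "user_statistics",
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "stat_name", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "stat_name", "KeyType": "RANGE"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    },
)
# Poll describe_import() with this ARN to watch the (long) import run.
print(resp["ImportTableDescription"]["ImportArn"])
```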
Now I just need to write my statistics as line-delimited, DynamoDB-flavored JSON in S3 (statements dreamed up by the utterly deranged). In that format, every attribute value is wrapped in an object whose key is the type you want the value to have.
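To make that concrete, here is what one line has to look like, sketched in Python with invented attribute names:

```python
import json

# One JSON document per line, each shaped like {"Item": {...}}, with every value
# wrapped in a type descriptor: "S" for string, "N" for number (sent as a string).
def to_dynamodb_json_line(row):
    item = {
        "user_id":   {"S": str(row["user_id"])},
        "stat_name": {"S": row["stat_name"]},
        "value":     {"N": str(row["value"])},   # yes, numbers go over as strings
    }
    return json.dumps({"Item": item})

print(to_dynamodb_json_line({"user_id": 123, "stat_name": "views", "value": 42}))
# {"Item": {"user_id": {"S": "123"}, "stat_name": {"S": "views"}, "value": {"N": "42"}}}
```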
A little Postgres view with some CTEs to build the DynamoDB JSON, plus the aws_s3.query_export_to_s3 function in RDS Postgres, and I had a few hundred GB of nice special-snowflake JSON in my S3 bucket. Neat!
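The export step looks roughly like this. I'm sketching it driven from Python purely for illustration; the view name, bucket and region are invented, and the instance needs the aws_s3 extension installed plus an IAM role that lets it write to the bucket:

```python
import psycopg2

# Hypothetical sketch: kick off the RDS Postgres -> S3 export of the view that
# emits one DynamoDB-JSON document per row.
conn = psycopg2.connect("dbname=warehouse")  # connection details assumed
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT * FROM aws_s3.query_export_to_s3(
            'SELECT line FROM analytics.dynamodb_json_stats',
            aws_commons.create_s3_uri('analytics-exports', 'stats/part', 'eu-west-1'),
            options := 'format text'
        );
    """)
    print(cur.fetchall())  # rows_uploaded, files_uploaded, bytes_uploaded
```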
But the bucket is in the analytics account, and I needed the Dynamo tables in the prod and staging accounts. More cross-account permissions, more IAM. Now prod and staging can access the analytics bucket, cool!
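The bucket side of that is roughly a policy like the one below (account IDs and bucket name invented; the reading roles in prod and staging also need their own IAM policies granting the same actions, since cross-account access has to be allowed on both ends):

```python
import json
import boto3

# Sketch of a bucket policy letting the prod and staging accounts read the exports.
s3 = boto3.client("s3")
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": [
            "arn:aws:iam::111111111111:root",   # prod (made-up account id)
            "arn:aws:iam::222222222222:root",   # staging (made-up account id)
        ]},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::analytics-exports",
            "arn:aws:s3:::analytics-exports/*",
        ],
    }],
}
s3.put_bucket_policy(Bucket="analytics-exports", Policy=json.dumps(policy))
```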
But they aren't allowed to read the actual data, because they don't have access to the KMS keys used to encrypt the data in the analytics bucket.

OK, I'll create a customer managed KMS key in analytics and more IAM policies to allow the prod and staging accounts to use it to decrypt the data. But the data I'm writing from RDS is still encrypted with the AWS managed key, even after I set up my aws_s3_bucket_server_side_encryption_configuration in terraform to use my own key.
Turns out writes from RDS to S3 always use the S3 managed key; nobody cares about my aws_s3_bucket_server_side_encryption_configuration. "Currently, you can't export data (from RDS) to a bucket that's encrypted with a customer managed key."
Great. So I need to manually change the encryption of the exported files in S3 to my own key after RDS has written them (yes, I know I could figure out the API call and script it; something like the sketch at the end of this post). And now, four hours of un-abortable DynamoDB import jobs later, I finally have my tables in DynamoDB in prod and staging.

Now I just need to figure out the DynamoDB query language to actually read the data in the app. And how to mock that query language and the responses from Dynamo.

At least I'm learning a lot...
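For the record, the scripted version of that re-encryption step would look roughly like this: a Python/boto3 sketch with an invented bucket, prefix and key ARN, relying on the fact that S3 accepts copying an object over itself when the encryption settings change.

```python
import boto3

# Re-encrypt every exported object in place by copying it over itself with the
# customer managed KMS key, so the prod/staging accounts can decrypt it.
# (Objects over 5 GB would need a multipart copy instead of copy_object.)
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="analytics-exports", Prefix="stats/"):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket="analytics-exports",
            Key=obj["Key"],
            CopySource={"Bucket": "analytics-exports", "Key": obj["Key"]},
            ServerSideEncryption="aws:kms",
            SSEKMSKeyId="arn:aws:kms:eu-west-1:333333333333:key/made-up-key-id",
            MetadataDirective="COPY",  # keep the original metadata, only change encryption
        )
```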