
Multi-Tenant Architectures

185 points by sirkarthik over 4 years ago

22 comments

hashamali over 4 years ago
I’ve found shared app, shared database completely workable by utilizing Postgres’ row level security. Each row in any table is locked by a “tenant.id” value matching a tenant_id column. At the application level, make all requests set the appropriate tenant ID at request time. You get the data “isolation” while using the simplest infrastructure setup.
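A minimal sketch of that setup, assuming Postgres and psycopg2 (the orders table, the app.current_tenant setting, and uuid tenant ids are illustrative, not taken from the comment):

import psycopg2

# One-time setup (run by the table owner): enable RLS and add a policy that
# compares each row's tenant_id against a per-session setting. The app should
# connect as a role that is NOT the table owner, or the policy is bypassed.
SETUP_SQL = """
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::uuid);
"""

def rows_for_tenant(conn, tenant_id):
    # Scope every statement in this transaction to one tenant. set_config(..., true)
    # keeps the setting local to the transaction, so nothing leaks between requests.
    with conn, conn.cursor() as cur:
        cur.execute("SELECT set_config('app.current_tenant', %s, true)", (str(tenant_id),))
        cur.execute("SELECT id, total FROM orders")  # only this tenant's rows are visible
        return cur.fetchall()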
kissgyorgy over 4 years ago
This is my favorite article on the topic: http://ramblingsofraju.com/wp-content/uploads/2016/08/Multi-Tenant-Data-Architecture.pdf

It was once available on Microsoft's MSDN site, but I can't find it on microsoft.com anymore.
zzzeek over 4 years ago
Pretty telling that the top two comments at the moment endorse the two most diametrically opposite approaches possible: single app and single set of tables with row level security, vs. totally separate apps + separate dbs entirely.

I think it really depends on the kind of application and the kind of user / customer you're dealing with. I'd probably lean towards a "single" database with horizontal sharding.
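For context, "horizontal sharding" at the tenant level usually just means a deterministic tenant-to-shard mapping; a toy sketch (the shard DSNs are made up):

import hashlib

# Hypothetical shard DSNs; in practice these would come from configuration.
SHARDS = [
    "postgres://db0.internal/app",
    "postgres://db1.internal/app",
    "postgres://db2.internal/app",
    "postgres://db3.internal/app",
]

def shard_for_tenant(tenant_id):
    # Map a tenant to a shard deterministically via a stable hash, so the same
    # tenant always lands on the same database.
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]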
Geee over 4 years ago
Why is the shared app & shared database the most complex beast? Technically it's the simplest and requires the least amount of work to set up and maintain. You can even run the app & db on the same server. The complexity of maintaining multiple app & db instances isn't worth it, at least when you're starting out.
techdragon over 4 years ago
Writing a shared-app, shared-database tenant-isolation database middleware for Django was one of the most interesting challenges I've had over the years. The library was 100% tested with Hypothesis to randomise the data, and used ULIDs to allow for better long-term tenant sharding; since ULIDs are compatible with UUIDs, they can be dumped/propagated into other systems for analysis/analytical queries. It was quite a lesson in what 100% test coverage does not actually prove, since I still had bugs at 100% coverage that took work to chase down: side effects, false positives/negatives, etc.
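A property-based test in that spirit might look roughly like this (Hypothesis is the real library; the in-memory store is a stand-in for whatever tenant-scoped data layer is actually under test):

from hypothesis import assume, given, strategies as st

class TenantStore:
    """Toy stand-in for a tenant-scoped data layer; swap in the real middleware."""
    def __init__(self):
        self._rows = []

    def save(self, tenant_id, payload):
        self._rows.append((tenant_id, payload))

    def query(self, tenant_id):
        return [p for t, p in self._rows if t == tenant_id]

@given(tenant_a=st.uuids(), tenant_b=st.uuids(), payload=st.text())
def test_rows_never_leak_across_tenants(tenant_a, tenant_b, payload):
    # The property under test: data written for one tenant is never visible to another.
    assume(tenant_a != tenant_b)
    store = TenantStore()
    store.save(tenant_a, payload)
    assert payload in store.query(tenant_a)
    assert payload not in store.query(tenant_b)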
kissgyorgy over 4 years ago
I read multiple articles on the topic years ago, and have dealt with and designed multi-tenant systems. My approach is very simple for small developer teams: separate databases and separate app instances running on separate domains for every tenant. This is the least technically complex to implement, and there are very few mistakes you can make (only in deployment). There are a lot of tools nowadays that can help you automate and isolate the environment for these (Docker, Ansible, whatever). It can also be the most secure architecture of all.
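As an illustration of how little automation that setup needs, a stdlib-only sketch that stamps out one containerized app (and database URL, and domain) per tenant; the image name, domains, and compose layout are assumptions:

from pathlib import Path

TENANTS = ["acme", "globex", "initech"]  # placeholder tenant slugs

SERVICE_TEMPLATE = """\
  app_{tenant}:
    image: myapp:latest
    environment:
      DATABASE_URL: postgres://db_{tenant}.internal/app
      VIRTUAL_HOST: {tenant}.example.com
"""

def render_compose(tenants):
    # One isolated app service per tenant, each with its own database and domain.
    return "services:\n" + "".join(SERVICE_TEMPLATE.format(tenant=t) for t in tenants)

if __name__ == "__main__":
    Path("docker-compose.tenants.yml").write_text(render_compose(TENANTS))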
pegas1 over 4 years ago
One aspect left out is upgrading: can you always distribute new features to all tenants at the same time? If a new feature requires some training or organizational change, then you need to deploy at a moment agreed on with the tenant. From this point of view, models 1 and 3 are viable.

If extensions are rare, you can keep a switch for each and separate upgrades in model 4. However, if you want changes behind a switch, then you have to keep old code in conditional branches of your code forever. High cost of ownership.
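One common way to decouple deployment from enablement is a per-tenant switch with an agreed activation date; a minimal sketch (the feature and tenant names are invented):

from datetime import date
from typing import Optional

# Hypothetical per-tenant rollout schedule, e.g. loaded from a settings table.
FEATURE_ROLLOUT = {
    "new_invoicing": {"acme": date(2020, 9, 1), "globex": date(2020, 10, 15)},
}

def feature_enabled(feature: str, tenant: str, today: Optional[date] = None) -> bool:
    # The code ships to every tenant, but each tenant's switch flips on its agreed date.
    today = today or date.today()
    enabled_on = FEATURE_ROLLOUT.get(feature, {}).get(tenant)
    return enabled_on is not None and today >= enabled_on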
saberdancer over 4 years ago
A colleague implemented a hybrid system for a SaaS product he was leading. Normally the product is a single app, single database with a column as a tenant ID discriminator, but for specific tenants he built in an option to specify a separate database. This allowed tenants that wanted higher performance, or their data stored in a separate location, to buy in, while most tenants stayed in the multi-tenant database.

The solution I implemented on my project (IoT) was shared apps, shared database. When we were starting the project, we decided to use a column as a discriminator and designed the system so that, for entities that need to be tenant-specific, developers just extend an abstract class; the rest of the system detects when you are trying to save or load such an entity and applies a filter or automatically assigns the tenant ID. This means a developer can work just as they would on a single-tenant application. I feel this is pretty normal stuff.
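Reduced to code, the hybrid approach is a single lookup: most tenants resolve to the shared connection, a few override it. A rough sketch (the DSNs and table names are placeholders, not the system described above):

SHARED_DSN = "postgres://shared-db.internal/app"

# Tenants that bought higher performance or a specific data location get their
# own database; everyone else stays in the shared multi-tenant one.
DEDICATED_DSNS = {
    "bigcorp": "postgres://bigcorp-db.eu-west-1.internal/app",
}

def dsn_for_tenant(tenant_id):
    return DEDICATED_DSNS.get(tenant_id, SHARED_DSN)

def scoped_select(tenant_id):
    # The tenant_id discriminator is always applied; on a dedicated database it
    # is redundant, but it keeps a single code path for every tenant.
    return "SELECT * FROM devices WHERE tenant_id = %s", (tenant_id,)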
icedchai over 4 years ago
In practice, I've seen both shared app, shared DBs, and shared app, separate DBs. "Shared / separate DBs" is not actually so black and white. I recommend making your system configurable so you can dedicate DBs to a specific tenant (or group of tenants) if needed. Most of them probably won't need it...
awinter-py over 4 years ago
Such an interesting topic. Not faulting this article for staying focused, but other questions here are:

- hybrid model where paid / larger clients get dedicated hardware for perf reasons
- YAGNI -- will you even need multitenancy?
- business + legal considerations; which industries have legal or regulatory requirements not to intermix
- sharing and permissions -- what happens the first time someone needs to share a doc cross-account?
- tools and codebase strategies for verifying the permission model on a shared arch
RickJWagner over 4 years ago
The author doesn't mention application updates, which can be important.

Shared application scenarios can bring headaches when different customers want different application behavior implemented in upgrades.
aszen over 4 years ago
Earlier we used to have multiple web apps with separate databases; now we are using a single web app that can connect to different databases and configuration based on the subdomain. So far it's worked great, and having a single web app really makes development and deployment a lot easier. We often host multiple databases in a single RDS instance to reduce costs. We get data isolation and don't have to deal with sharding; of course, this works well for enterprise applications with just 50-100 tenants.
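A rough sketch of that subdomain lookup (the host parsing and config shape are assumptions, not the actual code described above):

# Hypothetical per-tenant configuration keyed by subdomain; note both tenants
# share one RDS instance but get separate databases.
TENANT_CONFIG = {
    "acme":   {"db_url": "postgres://rds-1.internal/acme",   "theme": "blue"},
    "globex": {"db_url": "postgres://rds-1.internal/globex", "theme": "green"},
}

def tenant_from_host(host):
    # Map e.g. "acme.example.com" to that tenant's database and settings.
    subdomain = host.split(":")[0].split(".")[0]
    try:
        return TENANT_CONFIG[subdomain]
    except KeyError:
        raise LookupError("unknown tenant subdomain: " + subdomain)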
lukevp over 4 years ago
It's really best to design everything for a shared app, shared db model, and to have a user/tenant hierarchy in every app. You can start with each user belonging to their own tenant. Later you can shard the db and/or app by tenant for scalability, or provide a dedicated instance as required for data residency or whatever.
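The "every user starts as their own tenant" idea is mostly schema; a minimal sketch, using sqlite3 here only to keep it self-contained:

import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tenants (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (id TEXT PRIMARY KEY,
                    tenant_id TEXT NOT NULL REFERENCES tenants(id),
                    email TEXT NOT NULL);
""")

def sign_up(email):
    # Every new user starts as the sole member of their own tenant; team accounts
    # later just point several users at one tenant_id.
    tenant_id, user_id = str(uuid.uuid4()), str(uuid.uuid4())
    conn.execute("INSERT INTO tenants VALUES (?, ?)", (tenant_id, email))
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", (user_id, tenant_id, email))
    conn.commit()
    return user_id, tenant_id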
rkagerer over 4 years ago
Perhaps I missed the point, but it seems strange and artificial to me that all this article considers when discussing multi-tenant architectures are the database and the app. There's so much more that goes into actually delivering multi-tenancy in production.

Long ago I was the product manager of a complex enterprise platform which had been heavily customized for one of our large banking customers. They hosted the database on SQL Server shared clusters and much of the application backend in VMware instances running on "mainframe-grade" servers (dozens of cores, exotic high-speed storage). The hardware outlay alone was many hundreds of thousands of dollars, and we interfaced with no fewer than 5 FTEs who comprised part of the teams maintaining it. Ours was one of a few applications hosted on their stack.

Despite repeated assurances of dedicated resource provisioning committed to us, our users often reported intermittent performance issues resulting in timeouts in our app. I was the first to admit our code had lots of runway remaining for performance optimization and more elegant handling of network blips. We embarked on a concerted effort to clean it up and saw huge improvements (which happily amortized to all of our other customers), but some of the performance issues still lingered. Over and over again in meetings, IT pointed their fingers at us.

Eventually we replicated the issue in their DEV environment using lots of scrubbed and sanitized data and small armies of volunteer users. I had a quite powerful laptop for the time (several CPU cores, 32GB RAM, high-end SSDs in RAID), and during our internal testing I actually hosted an entire scaled-down version of their DEV environment on it. During a site visit, we migrated their scrubbed data to my machine and connected all their clients to it. That's right, my little laptop replaced their whole back-end. It ran a bit slower, but after several hours the users reported zero timeouts. This cheeky little demonstration finally caught the attention of some higher-up VPs who pushed hard on their IT department. A week later they traced the issue to a completely unrelated application that somehow managed to monopolize a good chunk of their storage bandwidth at certain points in the day. Our application was one of their more-utilized ones, but I bet correcting this issue must also have brought some relief to their other "tenants".

I know this isn't a perfect example, but it demonstrates how architecture encompasses a whole lot more than just the DB and apps. There's underlying hardware, resource provisioning, trust boundaries, isolation and security guarantees, risk containment, management, performance, monitoring and alerting, backups, availability and redundancy, upgrade and rollback capabilities, billing, etc. When you scale up toward Heroku/AWS/Azure/Google Cloud size, I imagine such concerns must be quite prominent.
kaydub over 4 years ago
If you want to scale, you share the apps and databases.

Nothing worse than having to spin up more and more instances and manage more and more hardware / virtual hardware / services as your customer base grows.
amanzi over 4 years ago
I quite like the WordPress Multisite model, which deploys a separate set of tables for each blog in a single MySQL database. Then you can add on the HyperDB plugin, which lets you create rules to split the sets of tables into different databases. This gives a lot of flexibility.
polote over 4 years ago
Previous related discussion: https://news.ycombinator.com/item?id=23305111 ("Ask HN: Has anybody shipped a web app at scale with 1 DB per account?", 262 comments)
FpUser over 4 years ago
I design my products as multi-tenant (both code and database). This does not mean, however, that the result cannot be used as if it were a bunch of single-tenant instances. It is up to the client how they decide to deploy.
justincormack over 4 years ago
No mention of sharding by usage pattern, which is the usual pattern at scale, e.g. partitioning the app and database differently for users with different fan-out, scale, or other properties that affect scaling.
tappleby over 4 years ago
Are there any strategies for migrating from a separate db per tenant to shared db with scoped tenant_id? In this case each tenant would have overlapping primary keys.
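One answer people reach for is minting new surrogate keys during the copy and keeping an old-to-new mapping so child tables can translate their foreign keys; a rough sketch, not tied to any particular ORM:

import uuid

def migrate_tenant(src_rows, tenant_id):
    """Copy one tenant's rows into the shared schema, remapping old primary keys.

    src_rows: iterable of dicts read from the tenant's private database.
    Returns (new_rows, id_map) so child tables can translate their foreign keys.
    """
    id_map = {}     # old per-tenant id -> new globally unique id
    new_rows = []
    for row in src_rows:
        new_id = str(uuid.uuid4())
        id_map[row["id"]] = new_id
        new_rows.append({**row, "id": new_id, "tenant_id": tenant_id})
    return new_rows, id_map

The alternative is to keep the old ids and switch the shared tables to a composite primary key of (tenant_id, id), which avoids rewriting foreign keys at the cost of wider keys everywhere.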
yelloweyes over 4 years ago
> Separate Apps & Separate Databases

> Suitability: This is the best way to begin your SAAS platform, for product-market fitment until stability and growth.

Uh what? How is this even feasible when you get to 1000s of clients?
tracer4201 over 4 years ago
I worked on a system with the shared app + shared database model. At its core, we received events (5-10KB) from customers and did something with those events. In total, we were receiving 8K-10K events per second.

In terms of security and privacy, isolating an individual tenant from others wasn't so much of a concern, as each tenant was a customer within the organization with the same data classification level. So from a security perspective, we were "okay".

Where this gets interesting is that one tenant would suddenly decide to push a massive volume of data. Processing events within a specific SLA was a critical non-functional requirement of this system, so our on-call engineers would get alerts because shared messaging queues were getting backed up after Mr. Bob had decided to give us 3-5x his typical volume.

The traffic spike from one customer, which could last from minutes to hours, would negatively impact our SLAs with the other customers. Now all the customers would be upset. ^_0

Being internal customers, they were willing to pay for the excess traffic, but we didn't really have the tooling to auto-scale. Our customers also didn't want us to rate limit them. Their expectation was that when they have traffic spikes, we need to be able to deal with it.

Now, we didn't want to run extra machines that sat idle like 90% of the time. And when we had these traffic spikes, we'd see our metrics and find maxed-out CPU and memory, and even worse, we'd consume all disk space from the log volume on the machines, filling everything up. The hosts would become zombies until someone logged in and manually freed up disk.

There were a few lessons learned:

1. Rate limit your customers (if your organization allows).

2. If your customers are adamant that in some instances each month they need to be able to send you 5x the traffic without any notice, then you can't just rate limit them and be done with it. We adopted a solution where we would let our queues back up while monitors detected the excessive CPU or memory usage and started scaling out the infrastructure. Once our monitors saw the message queues were looking normal again, they'd wait a little while and then scale back down.

3. When you're processing from a message queue, you need to capture metrics to track which customer is sending you what volume. Otherwise, you can have metrics on the message queues themselves and have one queue per customer.

4. If it's a matter of life and death (it wasn't, but that's how one customer described it), something you can do is stop logging when disk space usage exceeds a specific amount.

5. Also, when you have a high-throughput system, think very carefully about every log statement you have. What is its purpose? Does it really add value?
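Lesson 1 in code form is usually a token bucket keyed by tenant; a minimal in-process sketch (the rate and burst numbers are made up):

import time

class TenantRateLimiter:
    """Simple per-tenant token bucket: burst up to `capacity`, refill at `rate` per second."""
    def __init__(self, rate=100.0, capacity=500.0):
        self.rate, self.capacity = rate, capacity
        self.buckets = {}  # tenant_id -> (tokens, last_refill_timestamp)

    def allow(self, tenant_id):
        now = time.monotonic()
        tokens, last = self.buckets.get(tenant_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.buckets[tenant_id] = (tokens, now)
            return False  # shed or queue this event; this tenant is over its budget
        self.buckets[tenant_id] = (tokens - 1.0, now)
        return True

In a multi-worker deployment the buckets would live in something shared (Redis, the message broker, an API gateway) rather than in process memory, but the accounting is the same.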