The Case for Multiple DBs in Multi Tenancy situations

Multitenancy isn’t easy, and one of the particularly challenging aspects is what to do with the database. In Ayende’s excellent posts on multi tenancy, he’s pretty clear that he feels a database per tenant is the way to go . There are two major, reasonable approaches that I know of: First, to have a database per tenant. (This is the approach discussed by Ayende in his multi tenancy series) A second common alternative is to have a tenant id column in all tables that require differentiation by client. My team was in a situation requiring multitenancy, and needed to select an approach. Wary of the idea of multiple databases, the team immediately decided to go with the tenant column approach.

Looking back, I think Ayende was right. The one advantage of the tenantid column system is not having to deal with multiple databases. There are many disadvantages. I’ll discuss the advantages, and then give some thoughts on how to deal wwith the complexity of working with multiple databases.

Advantages of multiple databases:

In my mind the most important developer-centric advantage of having multiple databases is the ability to refactor the database. Many times, developers will design a system a system with certain assumptions, then realize those assumptions are wrong. Other times, a major new feature would be much easier and cleaner with certain system changes. If the database is only talking to one system, then when the developer comes upon these situations, s/he can also refactor the database to reflect that understanding, and keep the mapping between database simple, clean and easy to understand. When the code is pushed live, db update scripts also get pushed live. However, if multiple systems (in this case tenants) talk to the same database, this kind of db structure change just can’t happen-pushing a db change would alter everyone’s db, meaning everyone’s code would also have to be updated at once. In many situations, that kind of simultaneous mass upgrade would be totally unacceptable, meaning that once this kind of multitenancy system is put in place, the single db can never again be altered.

Another advantage of multiple databases is the ability to easily load-balance by moving individual tenants to different db servers as necessary. If the databases are architected to be isolated, this isn’t a problem-change a few connection strings, and you’re done. In a shared system, all tenants suffer if one tenant is overloading the server.

On a related note, though nobody likes to think about it, clients sometimes leave, and when they do, they will want to be able to take their own data with them. If they have their own database, it’s as easy as zipping it up and getting it to them. If all tenants are on the same db, it could be a much more difficult exercise involving picking through every table and selecting out the tenant-specific data. Hopefully you dont give them anything that isn’t theirs. Which cleads to the next point.

From a business perspective, data separation is probably the most important reason to go with separate dbs. With a multitenant system, it’s impossible for one tenant to end up with anyone else’s data. Other tenants’ data is in a completely different system, so there’s no worry about messing up the tenantid and passing a client someone else’s data. Depending on the situation, this could just be embarrassing; in other situations, you might be looking at a lawsuit.

Lastly, having separated databases makes many things in the application more complicated, for the life of the application. Every join, every query, every new table, every mapping has to deal with the tenant id. If the strategy is to use the tenant id to make a composite key on the tables, then you’ve got that much additional complexity-and if you’re using a tool like NHibernate, you’ll soon find out that working with composite keys is a LOT less seamless than a single autogenerated key. Eliminate the tenantid, and a whole lot of things become forever simpler, clearer and have a lot less potential for error.

How to deal with multiple databases?

The trouble with multiple databases is that now you have 20 or even 100 databases to manage instead of one. Managing this manually would be a difficult, error-prone task that could become overwhelming. In addition, any shared data can become out of sync. The solution? Don’t do *anything* manually with these databases.

For versioning: each time you create a new version of your application, create a db script to update the db from the prior version from the last version to the newest one, and create a script to roll the database back from the newest version to the previous version. There are tools to help produce this kind of script… If you’re programming in .Net, Visual Studio Database edition has a tool to help do this for SQL Server, and Red Gate has one as well called SQL Compare. Bundle this script in with the deploy of the code, and push it all out together, to each tenant as they get upgraded.

For shared data: 99% of the time, shared data is of the sort where it needs to be updated soon, but not necessarily real-time. An example of this is cross-client data for statistical computation. To allow this kind of computation in a multiclient situation, create a process that goes to each system and collects the appropriate data, possibly in varying ways based on the tenant’s installed version, then pulls that data into a single system. The combined reporting system can be optimized and tweaked for reporting needs without affecting the designs of the individual tenants. Conversely, if there’s any reference data they all need to share (dropdown data, etc) this information can be coordinated with a similar process that does a data push to the different systems, based on application version.

This kind of automation is some up-front work, but once it’s done, it’s done, and it allows tremendous application flexibility… and once it’s set up, it’s easy-there’s no reason it couldn’t become fairly invisible from a management perspective.. In addition, it’s really a very separate task from the business application itself-it’s actually so separate that the syncing application can be written by a completely different team that knows little to nothing about the main application. For that matter, I’m not sure if there’s a tool out there that does all of these things already(I bet there is, come to think of it), but there’s no reason such a tool couldn’t exist.

The Case for Multiple DBs in Multi Tenancy situations

Comments

Leave a Reply