Monday, November 1st, 2010

NoSQL: Don’t Take the Drug Unless You Have the Symptoms

NoSQL databases are hot, and rightly so. Many of them aspire to solve some of the trickiest database problems of all, problems that have been nagging database research for decades: scaling up and scaling out.

In some trade press and other places the NoSQL discussion has to be simplified. Sometimes a casual reader may get the impression that NoSQL just makes relational databases old-fashioned. This is very far from the truth. NoSQL is there to cure specific ailments. If you don’t suffer from any of those specific symptoms, the side effects from taking NoSQL medication may cause more pain than relief. I’ll explain why, and you can judge whether you agree.

The long-time holy grail of databases goes something like this: When pressure mounts, just add a bunch of cheap commodity servers. The database seamlessly distributes itself to use the accumulated CPU power. If a server goes down, the others seamlessly take over its load. When availability is an issue you install a bunch of commodity servers at a secondary site and perhaps a third one. The database seamlessly and instantly replicates its data to the new sites. The sites are able to seamlessly take over each other’s duties and dynamically balance the load between themselves. Scalability and high availability without worries.

A problem that’s so easy to state can’t be too hard to solve, can it? That’s what your pointy-haired manager would insist. Sounds cute, let’s have it. In reality database researchers have scratched their heads about this for decades. They have actually come up with an answer: You can do it, and we can prove that the solution is two-phase commit.

Great, so there is a solution? Sorry, in theory only. Two-phase commit means, in essence, that you may partition and distribute your database, but at every change all the partitions must take a vote. All votes must be “go” for the change to become permanent. Perhaps this doesn’t sound too bad until you consider what happens if one of the partitions has a problem. All the other ones hold up their green “go” cards, but the transaction stalls helplessly until the last partition has had its say. Two-phase commit works, given a perfect network, faultless hardware and bug-free software. Hmm, wasn’t the original problem to compensate for an imperfect network, hardware faults and software bugs? It was, so two-phase commit gets us nowhere.
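The voting scheme described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Partition` class, its `healthy` flag and the `two_phase_commit` function are hypothetical names invented for illustration, and a partition that fails to answer is modeled as returning `None`. A real coordinator would sit and wait for that missing vote, which is exactly the stall described above. Here the silence is treated as a timeout and the change is aborted, so that the example terminates.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical partition holding one slice of the database."""
    name: str
    healthy: bool = True
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, key, value):
        """Phase 1: stage the change and vote. None means no answer arrived."""
        if not self.healthy:
            return None
        self.staged[key] = value
        return True

    def commit(self):
        """Phase 2, all votes were 'go': make the staged change permanent."""
        self.data.update(self.staged)
        self.staged.clear()

    def abort(self):
        """Phase 2, at least one vote was missing: discard the staged change."""
        self.staged.clear()


def two_phase_commit(partitions, key, value):
    # Phase 1: every partition must vote before anything becomes permanent.
    votes = [p.prepare(key, value) for p in partitions]
    # Phase 2: commit only if all votes are "go", otherwise roll back everywhere.
    if all(v is True for v in votes):
        for p in partitions:
            p.commit()
        return "committed"
    for p in partitions:
        p.abort()
    return "aborted"
```

With three healthy partitions the change goes through on all of them. Mark a single partition as unhealthy and no partition keeps the change, however many green cards the others held up.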

If two-phase commit is provably the mathematically correct solution, but doesn’t work, does this mean that there is no solution? That’s right. We can be quite sure that there is no practical solution to the general problem stated above. It’s been thoroughly investigated by brilliant minds over a long time.

Before you get too upset, note that I said the general problem. Let’s consider what this means for relational databases because they are positioned as general-purpose. For a wide range of problems you may be confident that you can use a relational database as long as you avoid the distribution and replication aspects portrayed in the holy grail scenario above. These days some very visible applications depend critically on wholesale distribution and replication. If you want to include those tricky aspects you cannot hope to come up with a general-purpose solution. This is why relational databases don’t always measure up.

NoSQL databases cut the Gordian knot by avoiding a general-purpose approach. For instance, CouchDB and others very consciously sacrifice airtight consistency for eventual consistency. This is acceptable for many useful applications, but excludes areas like billing and banking. The approach is to find a useful and practicable subset of the total holy grail and make it work. For the targeted applications the NoSQL trade-off may be completely reasonable. Now get ready for the next hurdle.
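To make the idea of eventual consistency concrete, here is a minimal sketch in Python. It assumes a simple last-write-wins rule keyed on a timestamp supplied by the caller. That rule is an illustrative simplification, not a description of how CouchDB or any other specific product resolves conflicts, and the `Replica` class and its methods are hypothetical names.

```python
class Replica:
    """Hypothetical replica that accepts writes without asking its peers."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Keep the incoming value only if it is newer than what we hold.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Replication happens later, in the background, not at write time.
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


site_a, site_b = Replica(), Replica()
site_a.write("balance", 100, timestamp=1)
site_b.write("balance", 80, timestamp=2)

# Before replication the two sites disagree about the same key.
before = (site_a.read("balance"), site_b.read("balance"))

site_a.sync_from(site_b)
site_b.sync_from(site_a)

# After replication both sites hold the newer write.
after = (site_a.read("balance"), site_b.read("balance"))
```

Between the writes and the synchronization, a reader gets a different answer depending on which site it asks, and the older write is silently discarded afterwards. That is tolerable for a comment counter, but it is the reason billing and banking are excluded.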

If you shift to NoSQL technology you would expect a certain learning curve. Of course it takes some time to get used to different ways of doing things. What you might not expect is that, even when you have educated yourself and your staff, you may have to write a lot more code to get things done than you did previously. Piloting a relational database is like controlling a powerful utility truck from an air-conditioned, noise-proofed cabin, seated in an ergonomic seat. The SQL ecosystem has produced tons of tools, patterns and helpful utilities.

There is no “NoSQL ecosystem” because, by definition, no two NoSQL databases are compatible or will ever be. Be prepared for an uncushioned, bumpy ride. Here are a few things for you to evaluate when checking out NoSQL databases. Each one of these factors will compel you to write more code than with a relational database.

  1. Lower level of abstraction. Expect a simple key-value database concept. Expect explicit navigation rather than a non-procedural query language.
  2. Schema-less database. This is sometimes touted as a feature, but amounts to shifting the burden of database consistency to applications. Don’t expect a NoSQL database to have the vaguest clue about your data model. It won’t stop a buggy application from filling your database with blatant nonsense. Thorough testing is absolutely necessary.
  3. Absence of utilities. You will probably have to write your own tools even for basic database browsing and for maintaining data models.
  4. Expect database structure to be immersed in application code. A lot more discipline is required to produce maintainable code.
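The first two points and the last one can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: a plain dictionary stands in for a schema-less key-value store, and the `order:` key prefix, the `ORDER_FIELDS` model and both functions are hypothetical. The point is only where the work ends up, namely in application code.

```python
# A plain dict stands in for a schema-less key-value store (assumption).
store = {}

# Hypothetical data model, enforced by the application, not by the database.
ORDER_FIELDS = {"customer": str, "total": float}


def put_order(key, doc):
    """Reject documents that do not match the application's own schema."""
    for name, expected in ORDER_FIELDS.items():
        if not isinstance(doc.get(name), expected):
            raise ValueError(
                f"order {key!r}: field {name!r} must be {expected.__name__}"
            )
    store[key] = doc


def total_for_customer(customer):
    """Explicit navigation: visit every key and filter by hand, where SQL
    would need one declarative query such as
    SELECT SUM(total) FROM orders WHERE customer = ?"""
    return sum(
        doc["total"]
        for key, doc in store.items()
        if key.startswith("order:") and doc["customer"] == customer
    )


put_order("order:1", {"customer": "acme", "total": 10.0})
put_order("order:2", {"customer": "acme", "total": 5.5})
put_order("order:3", {"customer": "other", "total": 1.0})
acme_total = total_for_customer("acme")
```

Note that nothing prevents another program from writing to `store` directly and bypassing `put_order`. The data model lives in this code alone, which is why discipline and thorough testing matter so much.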

This should not be taken as derogatory. If you get past those hurdles you can do things you simply couldn’t with a relational database. The point is there is a price to pay. It’s no coincidence that the businesses that use NoSQL with commercial success are giants with enormous development budgets. They have the resources to use or even invent the NoSQL technology required for their specific purposes. You may be better off using one of the application platforms some of them offer.

So when is a NoSQL database the right choice? The software-as-a-service (SaaS) business model is a prime candidate. In this business model a large number of customers pay a small fee for a service available over the Net. In many cases the only viable architecture is “single instance, multi-tenant”. Given thousands of customers, the cost of patching and upgrading each account individually would be prohibitive. So there can only be a single instance of the application. Nonetheless, the user experience should amount to having a dedicated installation, even including customizations. The focal point is the database supporting the application. Response time and availability requirements are formidable. Compare this mass production scenario to car production. The cost to create the tools for car production is enormous, but the incremental cost of producing one more car is low. Likewise setting up a SaaS facility may be very costly due to the specialized technology used, like NoSQL. The incremental cost of adding one more user is low which is vital for staying in business.

To sum it up, NoSQL technology is not a cure-all, but an answer to specific needs. Expect to spend more developer hours than with a relational database to get the same perceived functionality. You pay this price in order to get precious new options: scaling up and scaling out the application in a way that is very difficult with relational databases.

2 Comments on “NoSQL: Don’t Take the Drug Unless You Have the Symptoms”

  1. Thanks for a good article. I like the medicine analogy, with its (implied) corollary that the wrong medicine is effectively poison. Very apt.

    I do sort of disagree with your characterization of SaaS as a good fit for NoSQL. As I’ve noted on my own blog, many NoSQL options are notably lacking in any kind of security, let alone other aspects of multi-tenancy. Things have gotten better since then, but there’s still a long way to go. Until it improves further, I think NoSQL will continue to be best deployed wholly within a single trust domain (typically a single app or user).