In-Depth Report - Clustering is Not Enough
September 15th, 2008

By Albert Lee, co-founder and chief strategy officer of xkoto, Inc.

Clustering Defined
I was recently at a major database user group event and had the opportunity to speak to several hundred attendees, mostly DBAs, architects, and IT operations staffers. One of the questions I asked most often was, "What is your HA strategy?" To my surprise, a frequent response was, "What's HA?"

HA (high availability), according to Wikipedia, is:

"... a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period."

The most common HA implementation is clustering, which Wikipedia goes on to define as:

"... computer clusters that are implemented primarily for the purpose of improving the availability of services which the cluster provides. They operate by having redundant computers or nodes which are then used to provide service when system components fail. Normally, if a server with a particular application crashes, the application will be unavailable until someone fixes the crashed server".


Hearing these definitions, the user group members responded that clustering would be their most likely HA strategy.

The Legacy
Clustering has been the availability architecture of choice for decades, whether as a solution from a hardware vendor (e.g., Sun) or a software vendor (e.g., Veritas). In the early 1990s I was building database management solutions deployed in nuclear power station control rooms, where operative terms like "fail-safe" and "defense-in-depth" were design rules you didn't bend. I had single-CPU, 50 MHz UNIX servers packed with all of 256MB of memory to host my relational database. When the time came to make this environment production-grade, there was no need for a lot of analysis. The only answer was to install a second UNIX server in a traditional active/passive HA cluster that would trigger a failover to the passive server if there was a failure on the active box. This was the best availability money could buy.

Fast forward to today and not much has changed. Clustering databases, servers, storage, or networks still rules when it comes to availability solutions.

The Limitations
The problem with clustering is that it requires investment in redundancy (multiple servers and databases) without adding any capacity: transactions still flow only through the primary server. And the system remains vulnerable to a loss of service whenever a failover occurs. One industry analyst recently observed that clustering can be complex to manage, noting that one client measured better availability on its non-clustered systems than on its clustered systems because of the configuration challenges.

Despite these limitations, clustering is still considered good enough and remains the solution of choice because it can reduce downtime. That's not good enough. What is required is a solution that completely eliminates downtime.

Any decision to use legacy-style clustering for database protection must factor in the impact of a potential failure, both personal and corporate:

• Getting paged with severity one issues at 3AM;
• Dealing with irate customers and end users struggling with application outages;
• Missed KPIs and job performance targets;
• Lost transactions;
• Significant revenue loss;
• Lower employee productivity;
• Penalties for missing SLAs;
• Regulatory fines;
• Damage to corporate brand or reputation.

Add to this list the pressure that data center managers are under to get more out of their IT systems, and it becomes clear that paying for idle failover servers that consume power, generate heat, take up rack space, and add to lease costs is no longer good enough.

A Better Way
In the search for continuous availability, there are numerous alternatives to legacy HA clustering. One category beginning to emerge is database virtualization, which delivers significantly better database performance, reliability, and availability.
Database virtualization software is resetting traditional expectations: why settle for high availability when you can achieve continuous availability?

Database virtualization provides the ability to manage a pool of active-active databases that can be co-located in the same data center or geographically dispersed across remote data centers. Taking advantage of a shared-nothing storage model, each database holds a complete, consistent, independent copy of all the data; this independence permits the dispersion and enables scalable performance and continuous availability. With multiple copies of the data, any one database can be removed for planned maintenance or lost to an unplanned outage without any loss of service to the application. Unlike clustering, there is no failover and hence no failure for the end user.
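
To make the pool concept concrete, here is a minimal Java sketch, not xkoto's actual implementation: the DatabasePool class, its JDBC-URL node list, and the round-robin policy are all invented for illustration. The point is that removing a database is an ordinary list operation, not a failover event.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    // Illustrative sketch of an active-active database pool. Every node
    // holds a complete, shared-nothing copy of the data, so any node can
    // serve a request, and a node can be dropped with no failover.
    public class DatabasePool {
        private final List<String> activeNodes = new CopyOnWriteArrayList<>();
        private int next = 0;

        public void addNode(String jdbcUrl)    { activeNodes.add(jdbcUrl); }    // scale out
        public void removeNode(String jdbcUrl) { activeNodes.remove(jdbcUrl); } // maintenance or outage

        // Round-robin selection: requests simply flow to the remaining
        // nodes, so the application never observes a failover event.
        public synchronized String nextNode() {
            List<String> snapshot = new ArrayList<>(activeNodes);
            if (snapshot.isEmpty()) {
                throw new IllegalStateException("no databases available");
            }
            return snapshot.get(Math.floorMod(next++, snapshot.size()));
        }
    }

Because every node holds a full copy of the data, the routing decision is free to land on any surviving node.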

The database virtualization solution places a layer of software between the applications and this managed database pool; applications connect to that layer using industry-standard ODBC, JDBC, or DB2 CLI drivers. In this transparent fashion, applications believe they are connecting to a single database while gaining all the benefits of a virtualized database infrastructure that delivers continuous availability and scale-out.
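
For example, an application written against a standard JDBC driver needs no special code at all. In the hypothetical snippet below, only the host in the connection URL, here a made-up virtualization-layer address, distinguishes the virtualized pool from an ordinary single DB2 database:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SingleDatabaseView {
        public static void main(String[] args) throws Exception {
            // Standard DB2 JDBC driver; the host below points at the
            // virtualization layer, not at any one database. The URL and
            // credentials are hypothetical placeholders.
            Class.forName("com.ibm.db2.jcc.DB2Driver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:db2://virtualization-layer:50000/SAMPLE",
                     "appuser", "secret");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM orders")) {
                if (rs.next()) {
                    // Behind this single connection, the query may have
                    // been served by any database in the managed pool.
                    System.out.println("orders: " + rs.getInt(1));
                }
            }
        }
    }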

Using SQL statement replication, the database virtualization solution asynchronously replays DML statements against each managed database to keep the data identical in every instance. Without the performance penalties of synchronous schemes like two-phase commit (2PC), applications need only wait for the first database to commit successfully (instead of the last database, as in 2PC) before continuing. With continuous availability, rolling upgrades and planned maintenance no longer require taking applications down. Regular tasks like patching, backups, database reorgs, and schema changes can all be performed during regular business hours without disrupting end users.
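
A rough sketch of this first-commit behavior, again illustrative rather than the product's actual code, might look like the following. The StatementReplicator class is hypothetical, and a real replicator must also guarantee identical statement ordering on every replica:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.List;
    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch of SQL statement replication: the same DML statement is
    // replayed against every database in the pool, but the caller is
    // unblocked as soon as the FIRST replica commits; the others catch
    // up asynchronously. (Ordering guarantees and handling of a replica
    // that fails its replay are omitted from this sketch.)
    public class StatementReplicator {
        private final ExecutorService workers = Executors.newCachedThreadPool();

        public void replicate(List<String> jdbcUrls, String dml) throws Exception {
            CompletionService<Void> done = new ExecutorCompletionService<>(workers);
            for (String url : jdbcUrls) {
                done.submit(() -> {
                    try (Connection c = DriverManager.getConnection(url);
                         Statement s = c.createStatement()) {
                        s.executeUpdate(dml);  // replay the DML on this replica
                    }
                    return null;
                });
            }
            // Wait only for the first replay to finish, unlike 2PC, which
            // makes the application wait for every participant.
            done.take().get();
        }
    }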

The short list comparing legacy clustering to database virtualization looks like this:

Legacy HA Cluster

• Active-Passive: regardless of how many servers are clustered together, only the primary server processes requests
• High Availability: clusters can provide more availability, but not 100% availability
• Failover: when the primary server becomes unavailable, there is downtime and the potential for lost transactions while the idle standby server assumes control
• Local Protection: clustering requires a shared storage model, which constrains the cluster to a single local data center
• Not Scalable: regardless of how many servers are in the cluster, only one server carries the processing load at any given time; there is no net gain in performance
• Homogeneous: clustering typically requires all servers to be identical in sizing, operating system, and configuration

Database Virtualization

• Active-Active: every server can be used at any time to carry the processing load
• Continuous Availability: with three or more databases in the managed pool, applications can have non-stop availability, even through unplanned outages or rolling maintenance
• No Failure: if a server becomes unavailable, the end user is never disrupted; there is no failover at all, and the application remains connected because the database pool is virtualized
• Remote Protection: the active-active databases can reside in the same data center or be dispersed across different data centers for maximum protection
• Scalable: databases can be added to (or removed from) the pool on demand; with each added database, query performance can improve dramatically
• Heterogeneous: the pool can include different servers with variations in database versions and configuration


Where do Hypervisors Fit in?
Server virtualization is an important technology that is most commonly used today for server consolidation. Whether you use the hypervisor from VMware, Citrix, Microsoft, or another provider, virtualizing your server infrastructure can often improve utilization and configuration management while cutting costs.

However, once server consolidation has been achieved, the real value of server virtualization must be proven by running real production workloads in virtual machines (VMs). Database virtualization, already valuable on physical servers, becomes even more important in VM environments.

The reason is simple. While server consolidation typically drives down the number of physical servers in favor of a smaller number of big servers, a VM has a limit on how many CPUs/cores and how much memory can be allocated to it. As a result, scaling up a VM to handle a serious database workload is not really a viable option. Only a scale-out architecture, such as that offered in a virtual database environment, can deliver the full benefit of hypervisor technology for real enterprise database transactions. Deploying multiple active-active databases across local or dispersed VMs delivers the same benefits as in a physical server environment.

And Cloud Computing
There has been increasing interest in hosted database services as provided or announced by Amazon (EC2), Microsoft (SSDS), and others. These forays into cloud computing tap into an increasing desire to simplify IT operations by passing the burden of infrastructure investment and management to a third party. While these ideas have gained favor, we all remember high-profile service outages at hosted application firms like Salesforce.com, reputedly the result of database cluster problems.

The need for scalable shared services, particularly database services, that are 100% available 24x7 is critical to the success of cloud computing. Scott McNealy of Sun Microsystems used to set the target for reliable web services ("web tone") to match the reliability of telephony services ("dial tone"). Any database service offered "in the cloud" will need to be continuously available. As we have seen again from the Salesforce.com example, clustering is not good enough.

Summary
I have been a help desk analyst, DBA, database designer, and database operations manager, responsible for fighting fires, avoiding fires, and firing arsonists. From my database-centric perspective, business and IT always come down to the database. I've implemented a wide range of HA strategies over the years: disk mirroring, database mirroring, log shipping, and server clustering, to name a few.

Clustering has been an important HA strategy and undoubtedly it will not fade away overnight. If you choose to implement clustering, you just need to be aware of its limitations and those situations where it is no longer good enough.

Database virtualization brings a better way to address the critical needs of database availability, performance scale-out, and business agility.

Albert Lee is the co-founder and chief strategy officer of xkoto, Inc. A 20-year veteran of the software industry, Mr. Lee is an expert in the ins and outs of database management systems.