Fundamentals of High Availability

High availability concerns more and more companies, as well as individuals, because of the dependence created by the Internet and new technologies, which are not always available. There is no standard for the acceptable duration of a service interruption: it depends on the context and on the criticality of the application.

For example, an aircraft navigation system is designed to tolerate at most 5 minutes of downtime per year, while a company's billing application may be designed to tolerate one day of downtime per year.


High availability is defined as the ability of a system to ensure operational continuity of service over a given period. Availability is measured on a scale based on the number of nines in the availability percentage (99%, 99.9%, 99.99%, etc.). A highly available service is at least 99% available, i.e. unavailable for less than 3.65 days per year.
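The "nines" scale translates directly into a yearly downtime budget. The short Python sketch below assumes a 365-day year (8,760 hours) and ignores leap years; the function name is illustrative only.

```python
# Downtime allowed per year for a given availability level ("number of nines").
# Assumption: a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability: float) -> float:
    """Return the maximum yearly downtime, in hours, for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for level in (0.99, 0.999, 0.9999, 0.99999):
        hours = downtime_hours_per_year(level)
        print(f"{level * 100:g}% -> {hours / 24:.2f} days = {hours * 60:.1f} minutes")
```

With these assumptions, 99% allows about 3.65 days of downtime per year, while 99.999% ("five nines") allows only about 5.3 minutes, which is the order of magnitude of the aircraft navigation example.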

To calculate availability, the following metrics are used:

1. MTBF (Mean Time Between Failures): the estimated average time between two failures of a system.

2. MTTR (Mean Time to Resolution): the estimated average time needed to restore functionality.

Availability is then given by the formula:

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
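As a worked example of this formula, the sketch below computes availability from MTBF and MTTR. The figures used (one failure every 2,000 hours, 4 hours to restore service) are hypothetical and chosen only to illustrate the arithmetic.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical system: fails every 2,000 hours on average, 4 hours to restore.
    a = availability(mtbf_hours=2000, mttr_hours=4)
    print(f"availability = {a:.4%}")
```

Here the result is 2000 / 2004, about 99.8%: even with a quick repair, this system stays below three nines, which is why both increasing MTBF and reducing MTTR matter.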

Internet and High Availability

For more and more businesses, the Internet is at the heart of the activity, and the need for availability is constant. Indeed, this medium is used both to communicate externally and to support many business applications (CRM, ERP, etc.) and telephony.

It is therefore necessary to distinguish between two levels of company needs: services offered to customers versus services required for internal operations. One of the most telling examples is the company website, which is now at the center of communication and of the business of most enterprises.

High availability for websites is organized around several axes, each of which can be crucial:

• redundant hardware,

• location of equipment,

• application updates and server security,

• securing the corporate network,

• the permanent availability of a backup / emergency / disaster recovery solution,

• sizing of power equipment.

Hardware redundancy

Redundancy is the mechanism of duplicating one or more components of an architecture with one or more identical elements.

Having n server(s) on x site(s) provides redundancy of information, with a risk of failure divided by n x…
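The benefit of redundancy can be put into numbers with a common textbook model, which is an assumption here rather than the method described above: if each server is available with probability A and servers fail independently (which spreading them across sites helps approximate), the service is down only when all n servers are down at the same time.

```python
def parallel_availability(node_availability: float, n: int) -> float:
    """Availability of n redundant nodes when any single one is enough.

    Assumes independent failures and instantaneous failover,
    which real deployments only approximate.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    probability_all_down = (1.0 - node_availability) ** n
    return 1.0 - probability_all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{parallel_availability(0.99, n):.6f}")
```

Under this model, two servers at 99% each give 99.99%, and three give 99.9999%. The gain only materializes if traffic actually moves to a surviving server when one fails.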

However, systems are needed that can automatically switch from one site to another. The systems most commonly implemented to ensure this redundancy are clusters.

Clusters can be active/passive or active/active. In the first case, the infrastructure fails over to a standby machine when the active machine fails; in an active/active system, both machines run in parallel, and if one of them fails, the other can keep working alone.
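The difference between the two cluster modes can be sketched as two routing rules. The `Node` class and function names below are hypothetical; a real cluster manager also handles health checks, shared state and split-brain situations, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


def active_passive_target(nodes: list[Node]) -> Node | None:
    """Active/passive: all traffic goes to the first healthy node in priority order."""
    for node in nodes:
        if node.healthy:
            return node
    return None  # every node is down: the service is unavailable


def active_active_targets(nodes: list[Node]) -> list[Node]:
    """Active/active: every healthy node shares the load."""
    return [node for node in nodes if node.healthy]


if __name__ == "__main__":
    cluster = [Node("primary"), Node("standby")]
    print(active_passive_target(cluster).name)
    cluster[0].healthy = False  # simulate a failure of the primary node
    print(active_passive_target(cluster).name)
    print([node.name for node in active_active_targets(cluster)])
```

In active/passive mode the standby only receives traffic after a failure; in active/active mode both nodes serve traffic all the time, so each one must be sized to absorb the full load alone.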

Maintenance of applications and software updates

Applications can contain bugs, and updates can correct these defects. This prevents malicious people from exploiting a loophole that would give them access to company information. Having a maintenance service is therefore important, and, depending on the technical skills available in-house, it sometimes makes sense to outsource maintenance.

Disaster recovery at the heart of the setup

A disaster recovery plan allows you to resume all or part of the activity after a disaster occurs in the information system. Its purpose is to minimize the impact of the disaster on the company's activities.

The key points in a recovery plan are:

• backup equipment

• the availability of emergency equipment

• backup solutions with a degraded mode (reduced quality of service), e.g. a backup link with lower bandwidth

Application: Internet availability for businesses

More and more applications require a highly available Internet connection to function… The way the Internet itself works (see the article on lab CELESTE), but also specific Internet connections, help ensure its full availability "naturally".

Advanced solutions can greatly reduce the risk of service failure or degradation:

1. Have multiple Internet connections arriving through different physical paths

2. Ensure the permanent availability of a backup / emergency / disaster recovery solution (in transparent mode)

3. Opt for connections with a guaranteed restoration time

4. Choose connections with guaranteed bandwidth

Have multiple Internet connections

Having two Internet connections arriving through two different physical paths makes it possible to secure Internet access. If one of the lines fails, its traffic is automatically redirected to the second one. Using two routers in active/passive mode further strengthens the redundancy of the system. In that case, it is better to opt for an automatic backup that is transparent to users.
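Automatic, transparent switching between two links can be sketched as an active/passive selection driven by periodic probes. The class, the link names and the threshold of 3 failed probes are hypothetical choices for illustration; on real routers this logic is provided by the equipment's own failover features.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Link:
    name: str
    consecutive_failures: int = 0


class DualLinkFailover:
    """Active/passive selection between a primary and a backup Internet link.

    A link is considered down only after `threshold` consecutive failed probes,
    so a single lost probe does not trigger a switch (avoids flapping).
    """

    def __init__(self, primary: Link, backup: Link, threshold: int = 3) -> None:
        self.primary = primary
        self.backup = backup
        self.threshold = threshold

    def record_probe(self, link: Link, ok: bool) -> None:
        # A successful probe resets the counter; a failed one increments it.
        link.consecutive_failures = 0 if ok else link.consecutive_failures + 1

    def is_up(self, link: Link) -> bool:
        return link.consecutive_failures < self.threshold

    def active_link(self) -> Link | None:
        # Prefer the primary link; fall back to the backup; None if both are down.
        if self.is_up(self.primary):
            return self.primary
        if self.is_up(self.backup):
            return self.backup
        return None


if __name__ == "__main__":
    fiber, sdsl = Link("fiber"), Link("sdsl")
    failover = DualLinkFailover(primary=fiber, backup=sdsl)
    for _ in range(3):
        failover.record_probe(fiber, ok=False)  # primary stops answering
    print(failover.active_link().name)
    failover.record_probe(fiber, ok=True)  # primary recovers
    print(failover.active_link().name)
```

Traffic moves to the backup link after three failed probes and returns to the primary as soon as it answers again; a production setup would usually also wait before failing back.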

Have a backup plan

In case of hardware failure, the equipment can be made redundant on the operator's infrastructure: the backup equipment then takes over seamlessly, and, as before, deploying two routers reinforces the redundancy of the system.

Opt for a guaranteed restoration time

The GTR (Garantie de Temps de Rétablissement, or guaranteed restoration time) is the maximum time within which the operator commits to restoring a connection after a service interruption. It should be chosen so that the interruption is as harmless as possible for the company.

Internet connections with a 4-hour GTR are the ideal option for IP telephony solutions or for a corporate IP VPN, especially when the VPN gives access to a centralized ERP/CRM.
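Because the GTR caps the restoration time of each incident, it acts as an upper bound on MTTR in the availability formula. The sketch below computes the resulting availability floor; it assumes the operator always meets the GTR and uses a hypothetical number of incidents per year.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year


def worst_case_availability(incidents_per_year: int, gtr_hours: float) -> float:
    """Availability floor if every incident lasts exactly the GTR."""
    downtime_hours = incidents_per_year * gtr_hours
    return max(0.0, 1.0 - downtime_hours / HOURS_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical: 2 incidents per year, comparing a 4-hour and a 24-hour restoration time.
    for gtr in (4, 24):
        print(f"GTR {gtr} h -> at least {worst_case_availability(2, gtr):.2%}")
```

With two incidents a year, a 4-hour GTR limits downtime to 8 hours (about 99.91% availability), whereas a 24-hour restoration time would allow up to 48 hours (about 99.45%).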

Choose connections with guaranteed bandwidth

Even if the Internet service is not completely interrupted, it can be severely degraded. It is therefore worth checking with the service provider that the connection comes with a guaranteed rate. This is particularly important for IP telephony, because any degradation of link quality directly lowers the quality of telephone calls.

On all types of links (SDSL, optical fiber, but also ADSL), a guaranteed rate is obtained by configuring priority channels that reserve a minimum bandwidth for each specific application or use in the company (telephony, internet…).
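The sharing rule behind priority channels can be modeled in a few lines. This is only a simplified model of the idea, not how any particular router implements QoS: the function name, the traffic classes and the kbit/s figures are hypothetical. Each class first receives its guaranteed minimum (or less if it needs less), then leftover capacity goes to the remaining demand in priority order.

```python
from __future__ import annotations


def allocate_bandwidth(capacity_kbps: int, classes: list[tuple[str, int, int]]) -> dict[str, int]:
    """Share a link between traffic classes.

    `classes` holds (name, guaranteed_kbps, demand_kbps) tuples in priority order.
    Step 1: each class receives min(demand, guarantee).
    Step 2: remaining capacity is handed out in priority order to unmet demand.
    """
    if sum(guaranteed for _, guaranteed, _ in classes) > capacity_kbps:
        raise ValueError("guarantees exceed link capacity")
    allocation = {name: min(guaranteed, demand) for name, guaranteed, demand in classes}
    remaining = capacity_kbps - sum(allocation.values())
    for name, _, demand in classes:
        extra = min(demand - allocation[name], remaining)
        allocation[name] += extra
        remaining -= extra
    return allocation


if __name__ == "__main__":
    # Hypothetical 2 Mbit/s link with three traffic classes.
    print(allocate_bandwidth(2000, [
        ("telephony", 500, 400),
        ("internet", 1000, 3000),
        ("backup", 200, 1000),
    ]))
```

Even when web traffic asks for far more than the link can carry, telephony keeps the bandwidth it needs and the backup class keeps its reserved minimum.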

To provide a highly available service, you must ensure that the entire infrastructure delivering this service is functional 100% of the time. This article focuses mainly on Internet links, but do not forget power supply, air conditioning, servers, etc.

The criticality of an application or service, that is, the availability rate it requires, thus guides the choice of Internet connections for a high availability solution. And not all connections are equal!
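When every element of the chain (power, air conditioning, servers, Internet link) must be up for the service to work, their availabilities combine multiplicatively under the usual assumption of independent failures. The component figures below are hypothetical.

```python
from math import prod


def series_availability(components: dict[str, float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures: the overall availability is the product
    of the individual availabilities, so it is never higher than the weakest one.
    """
    return prod(components.values())


if __name__ == "__main__":
    # Hypothetical availabilities for each element of the chain.
    chain = {
        "power": 0.9999,
        "air_conditioning": 0.999,
        "servers": 0.999,
        "internet_link": 0.999,
    }
    print(f"overall availability = {series_availability(chain):.2%}")
```

With these figures the whole chain reaches only about 99.69%, lower than any single component, which is why every element has to be taken into account.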

The R&D service