What can we learn from the Westpac online banking outage?

Last week, Westpac’s new online banking system was offline for nine hours.

The bank released the following statement: 

“Westpac customers will not incur any additional fees as result of this delay. We encourage all customers including business customers who have any queries to contact their local bank manager or a customer service representative on 132 032.”

This would have caused some problems and a lot of frustration for anyone who was desperately trying to get something done via online banking, but the world did not stop spinning.

By the time you read this, most of the bank’s clients will have resolved any grievances, and the outage will be fading into memory as yet another service that was temporarily offline. The bank will assess its losses and make some changes to ensure that this particular series of events is prevented in the future.

Often in small business we think our systems should be bulletproof and have 100% uptime. We forget that all systems are susceptible to downtime, even those of the big banks, which have teams working on high availability, governance, disaster recovery planning and business continuity.
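To put uptime percentages in perspective, here is a minimal Python sketch of the arithmetic. The function name and the percentages chosen are illustrative assumptions, not figures from Westpac or any provider. It shows that a single nine-hour outage, if it were the only one in a year, works out to roughly 99.9% uptime.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Illustrative uptime targets (assumed values, not from the article)
for pct in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows about {hours:.1f} hours of downtime a year")
```

Each extra "nine" cuts the permitted downtime by a factor of ten, which is one way to see why 100% is not a realistic target for any budget.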

In SMEs it is important to ensure our data is safe and that we have the best sensible protection in place, because the best possible protection is bound to be too expensive to contemplate in a small business. Yet too often investment is insufficient, leading to a lack of redundancy and little or no option for recovery because of poor design or a poor backup process.

Recently I had a client complain about a mail server outage of just two hours. Yet the mail server involved is too old and poorly resourced for the environment and the investment in a better solution has been avoided despite advice to the contrary.

We can all cope with a short outage of a few hours even if we hate it, but very few businesses will survive a loss of critical data or a long-term outage while data is recovered manually.

It is important to contemplate a complex array of issues when it comes to systems downtime and recovery. First of all, do we have the kind of redundancy required for our central systems? To put it in plain English, if something breaks will the system keep going?

A well-designed server environment will protect us from many issues. Dual power supplies on different fuses at the switchboard can protect against simple power problems. RAID arrays for hard drives can protect against a simple hard drive failure.

We all know by now that all hard drives fail; it is just a question of when. A secondary internet connection configured for failover can prevent loss of connection to the internet. This could even be a failover to a 3G or 4G wireless network.

Running virtual servers on virtual storage across multiple physical servers starts to get more complicated, but is still cost-effective for mid-sized firms. Of course, cloud systems can remove the issue of hardware failure altogether for many systems. We then leave backup and recovery to the providers of the cloud and hope they got it right.

However, if we still have internal systems, which let’s face it most companies still do and will for a while yet, there is the issue of recovery if the systems do fail.

Recovery has two aspects: how long it will take to restore systems and recover data, and how old the restored data will be. In other words, how long will it take to get back online, and how much will we have lost in terms of revenue and data? The techs refer to these as the recovery time objective (RTO) and the recovery point objective (RPO).
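The two measures can be made concrete with a small Python sketch. Everything here is hypothetical: the `RecoveryPlan` class, the backup intervals, the restore times and the targets are assumptions chosen for illustration, not measurements from any real system. The sketch simply checks whether a backup arrangement meets a chosen RTO and RPO.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    backup_interval_hours: float  # time between backups (assumed value)
    restore_hours: float          # time to rebuild systems and restore data (assumed value)

    def worst_case_data_loss_hours(self) -> float:
        # A failure just before the next backup loses a full interval of work.
        return self.backup_interval_hours

    def meets(self, rto_hours: float, rpo_hours: float) -> bool:
        # RTO limits how long recovery may take; RPO limits how old restored data may be.
        return (self.restore_hours <= rto_hours
                and self.worst_case_data_loss_hours() <= rpo_hours)


# Two hypothetical setups: a nightly backup and 15-minute snapshots.
nightly = RecoveryPlan(backup_interval_hours=24, restore_hours=8)
snapshots = RecoveryPlan(backup_interval_hours=0.25, restore_hours=1)

# Hypothetical targets: back online within 4 hours, at most 1 hour of data lost.
print(nightly.meets(rto_hours=4, rpo_hours=1))    # False
print(snapshots.meets(rto_hours=4, rpo_hours=1))  # True
```

Setting the targets first, then testing the backup design against them, keeps the conversation about business impact rather than about hardware.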

Finding the right balance between the cost of the systems in place and the value of stability and speed of recovery is always delicate as there is no return on investment on these systems until a problem is avoided or recovery is required. It is basically a data insurance plan with limited guarantees.
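One rough way to weigh that balance is to compare the expected yearly cost of outages with and without protection. The Python sketch below uses entirely made-up numbers (outage frequency, outage length, cost per hour and the price of protection are all assumptions), so treat it as a template for your own figures rather than a benchmark.

```python
def expected_annual_loss(outages_per_year: float,
                         hours_per_outage: float,
                         cost_per_hour: float) -> float:
    """Expected yearly cost of downtime under a simple linear model."""
    return outages_per_year * hours_per_outage * cost_per_hour


# All figures below are hypothetical placeholders.
cost_per_hour = 1_000        # lost revenue and wages per hour offline
outages_per_year = 0.5       # one serious failure every two years

unprotected = expected_annual_loss(outages_per_year, 48, cost_per_hour)  # two-day manual recovery
protected = expected_annual_loss(outages_per_year, 2, cost_per_hour)     # two-hour recovery
protection_cost = 6_000      # assumed yearly cost of the recovery solution

net_benefit = unprotected - protected - protection_cost
print(f"Expected loss without protection: ${unprotected:,.0f}")
print(f"Expected loss with protection:    ${protected:,.0f}")
print(f"Net yearly benefit of protection: ${net_benefit:,.0f}")
```

The model ignores lost data, reputation damage and the chance that a long outage ends the business outright, so it will tend to understate the case for protection.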

There have been plenty of studies done around the world on how well businesses understand the risks and have their bases covered. The bad news is that Australian businesses are well behind the US, UK and Europe on this risk mitigation front. Considering our risks with bushfires and floods across Australia, this is, in my opinion, not well judged, but it reflects our risk-taking nature as a relatively young nation.

Today there are many options for backup and recovery that offer almost instant recovery via a combination of local hardware and cloud servers, so even if your servers are in your office, your backup and recovery can be off site. This will protect you from fire and flood. Solutions range from single-PC products to multi-server recovery systems with snapshot technology that keeps a near-constant backup of your data, so that any system failure leads to only minor data loss and disruption.

Only a few years ago these solutions were unaffordable to most businesses, but improved data services at better prices and reduced storage costs, aligned with cheaper and better software, have combined to make these solutions affordable and sensible.

If your business needs to be online most or all of the time, please seek advice on the newest range of smart solutions relevant to your scale of business. Today, more than ever, the cost of protection is significantly lower than the cost of downtime or data loss.

From sole trader to Westpac Bank there are solutions available to ensure you are less likely to be embarrassed or put out of business by an IT systems failure. The best part is that if they are set up well they will be remotely managed and not require any input from you or your staff.

David Markus is the founder of Combo – the IT services company that is known for solving business problems with IT. How can we help?
