Businesses are relying on technology more and more every day, which is great when it is business as usual. However, what happens when your business operations grind to a halt because of an unexpected event? If your leaders planned accordingly, your operations should resume, maybe with some extra steps and annoyances, but it will still be business as usual. Once recovery is complete, your infrastructure should function as it did before the event, but this can’t happen without proper planning and training.
When beginning the planning phase, gather your leaders, classify what counts as a business interruption, and then delegate responsibility for handling each one. This involves figuring out which functions are critical to the business and who manages them. Where those functions depend on IT resources, assign a criticality level to each operation and its corresponding infrastructure. In a nutshell, use the following line of questioning when starting these discussions:
- Who is managing the recovery effort?
- What IT hardware and applications does each department use? (Define critical operations.)
- When will the resources need to be functional?
- Where will new resources be deployed, and in which situations?
- How will those resources be recovered so that business can resume?
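One way to capture the answers to these questions is a simple record per business function. The sketch below is a minimal Python illustration, assuming a hypothetical schema and sample data (the field names, functions, and hours are made up for the example); it orders functions by how quickly they must be back online and flags which questions are still unanswered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CriticalFunction:
    """A business function and its answers to the planning questions (hypothetical schema)."""
    name: str
    owner: str = ""                                          # Who manages the recovery?
    resources: List[str] = field(default_factory=list)       # What hardware/applications?
    needed_within_hours: Optional[float] = None              # When must it work again?
    recovery_site: str = ""                                  # Where will resources be deployed?
    recovery_steps: List[str] = field(default_factory=list)  # How will it be recovered?


def missing_answers(func: CriticalFunction) -> List[str]:
    """Return the planning questions this function has not answered yet."""
    gaps = []
    if not func.owner:
        gaps.append("who")
    if not func.resources:
        gaps.append("what")
    if func.needed_within_hours is None:
        gaps.append("when")
    if not func.recovery_site:
        gaps.append("where")
    if not func.recovery_steps:
        gaps.append("how")
    return gaps


def recovery_order(funcs: List[CriticalFunction]) -> List[CriticalFunction]:
    """Sort so the most time-sensitive functions come first; an unanswered 'when' sorts last."""
    return sorted(
        funcs,
        key=lambda f: f.needed_within_hours if f.needed_within_hours is not None else float("inf"),
    )


# Hypothetical example entries, not taken from any real plan.
plan = [
    CriticalFunction("Payroll", "Finance lead", ["payroll app", "file server"],
                     72, "offsite backup", ["restore file server from backup"]),
    CriticalFunction("Order entry", "Sales operations", ["ERP", "internet link"], 4),
]

for func in recovery_order(plan):
    print(func.name, missing_answers(func))
```

Running it lists Order entry first, since it must be back within four hours, and flags that its “where” and “how” answers are still missing. A spreadsheet can hold the same information; the point is that every critical function gets an answer to every question.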
Working from this list, you can already begin building a recovery plan for certain disasters and events: basic at first, but a plan nonetheless. The specifics come into play when you review your overall environment, map those business functions onto it, and factor in how critical each function is. If you are starting from scratch, I have always suggested drawing up a network diagram first. Diagrams are easy to construct and read, yet they show how your infrastructure resources communicate with each other.
From here, you can begin tackling each of the general events that would impact the business. Power loss, internet outages, and hardware failures are all valid risks that can bring a business to its knees, but they can be mitigated if planned for. Remember the questions I posed earlier? Using the diagram you created, start recording how each business-critical component answers those questions. This may raise more scenarios and questions about resiliency, which is great, but don’t get caught in a “what if” loop of unreasonable scenarios. You can only reasonably plan for so much, and other factors might require favoring one function over another.
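The network diagram can also be treated as a dependency graph to see how far a single failure reaches. The following is a minimal sketch, assuming a hypothetical diagram recorded as a dictionary that maps each component to the components it depends on (the component names are invented for illustration); it walks the graph in reverse, breadth-first, to find everything affected when one component fails.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical network diagram: each component lists what it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "ERP": ["app server", "database server"],
    "app server": ["core switch", "UPS"],
    "database server": ["core switch", "UPS"],
    "email": ["internet link"],
    "internet link": ["firewall"],
    "firewall": ["UPS"],
    "core switch": ["UPS"],
    "UPS": [],
}


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the graph so we can look up who depends on a given component.
    dependents: Dict[str, List[str]] = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk outward from the failed component.
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_by("internet link", DEPENDS_ON)))
print(sorted(affected_by("UPS", DEPENDS_ON)))
```

With this sample data, a failed internet link only takes down email, while losing the UPS affects every other component. That is one way to see why an event like power loss deserves early attention in the plan.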
Ultimately, a solution is only as good as its implementation and the people who will carry out the recovery process. Planning and practice are the two key elements here: without covering both extensively, you will have an incomplete plan and faulty recovery methods, which could leave a business in disrepair. Be sure to practice the plan regularly and update it as your infrastructure’s needs change. The plan is only as good as what was put into it at the time; you don’t want “Bill” to be responsible for rebuilding an undocumented server if “Bill” left the company years after the plan was created. “Bill” isn’t going to help you, at least not for cheap.