In this text I discuss some practical aspects of the “multiple 9s” reliability percentages that vendors advertise. Oh, and how to achieve 100% uptime (NOT).
Definitions
SLA stands for Service Level Agreement; it is a binding contract between the vendor and the customer. Its availability target is usually expressed as the percentage of a reference time window (e.g. a year) during which the Service should be functioning normally, delivering its desired output.
The Uptime of the Service represents the numeric portion of the agreement above, expressed either as a percentage or by using time units.
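To make the percentage versus time units relation concrete, here is a minimal sketch that converts an SLA percentage into the downtime it tolerates. The function name `allowed_downtime` and the 365-day default window are my own assumptions for illustration; a real contract defines its own reference window and its own rules for what counts as downtime.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float,
                     window: timedelta = timedelta(days=365)) -> timedelta:
    """Return the maximum downtime an SLA percentage permits within the window.

    Assumption: the window is a plain 365-day year unless stated otherwise.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    # The tolerated downtime is the fraction of the window not covered by the SLA.
    return window * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99, 99.999):
        print(f"{sla}% uptime allows about {allowed_downtime(sla)} of downtime per year")
```

Under these assumptions, “three nines” (99.9%) leaves roughly 8.76 hours of downtime per year, and every extra 9 divides that budget by ten.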
Note: the uptime/downtime definitions above do not fully apply to Services provided through different infrastructure sets, e.g. to different geographical regions or from different data centers. A downtime in one geographical region does not mean that the Service is unavailable to every customer out there, so a different calculation method must be figured out. One solution may be to estimate the number of requests not served during the downtime by looking up historical data, and then derive the Service availability estimate from that figure.
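The request-based estimate from the note could be sketched roughly as follows. Everything here is hypothetical: the `Outage` type, the region names and the idea of using a flat historical requests-per-minute rate per region are assumptions of mine, not a method any vendor prescribes.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    region: str      # hypothetical region label, e.g. "eu-west"
    minutes: float   # how long the region was unavailable


def estimate_availability(served_requests: int,
                          outages: list[Outage],
                          historical_rate_per_minute: dict[str, float]) -> float:
    """Estimate availability (in percent) from requests instead of wall-clock time.

    Assumption: the requests missed during an outage equal the region's
    historical average request rate multiplied by the outage duration.
    """
    missed = sum(historical_rate_per_minute[o.region] * o.minutes for o in outages)
    total = served_requests + missed
    if total == 0:
        raise ValueError("no requests served or missed; availability is undefined")
    return 100 * served_requests / total


if __name__ == "__main__":
    outages = [Outage(region="eu-west", minutes=10)]
    rates = {"eu-west": 100.0, "us-east": 250.0}
    print(f"{estimate_availability(999_000, outages, rates):.3f}%")
```

A flat average rate is the crudest possible model; traffic that varies a lot by hour or by weekday would call for a lookup of the same time slot in previous weeks.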
How many 9s?
As of today I am one of the AWS CSA(A) certified professionals. The license number is generated sequentially, so it is easy to infer that I am the 16,891st person on this planet holding the title, but given the two-year recertification cycle I assume that many of those who were certified before me may not have renewed their certification.
Starting on the path
I registered my own private AWS account in the second part of 2014, around the time when I was assigned to the project I talked about in the previous text. I did not make much use of that account and still do not use it for more than cloud backups; without a professional motivator, getting on this path will not truly take anyone very far.
The game changer was the DevOps work I started doing for a customer of my employer at the time: they had a fully configured AWS environment that I was given access to. The year Amazon suggests you spend in a professional environment before trying to get certified is by no means a spurious requirement; there is an entire ecosystem that needs to be mastered in order to make the best use of it.
Preparation
The certification adventure started about a month ago by reading this Reddit post. I didn’t get the free full month of study on offer due to the timezone difference but the seed got planted. What happened next?