Let me introduce myself first. I am the co-founder and CTO of Advancity, a technology company that enables the digitalization of universities and schools. We offer learning management system (LMS) and virtual classroom products, both as cloud services and on-premises.
Since the Covid-19 pandemic emerged last year, demand for our services has increased substantially, so we expanded our capacity in terms of servers, bandwidth, and staff.
This unplanned and unexpected expansion on every front placed a serious burden on everyone's shoulders. It was a challenge, and we accepted it. Under such pressure, however, many pipes started leaking steam.
Long story short, we reached a point where, in order to cope with the growing stress and pressure, we needed a change of mindset to manage our systems and operations more successfully.
The following are some of the lessons we have learnt on our journey through the pandemic crisis.
- Establish a decent monitoring system to keep continuous watch over your systems (software services, infrastructure, external services, etc.).
- Set up a centralized logging system (or a few, for different services) to collect log data from your services.
- Analyze the logs and monitoring dashboards, and look for hidden indicators that may signal service degradation. This is essential for being proactive rather than reactive.
- Take all errors and warnings very seriously and act on them. Try hard to find the root cause of every problem. Fix even the seemingly trivial errors.
- Aim for zero errors. This is the ultimate goal that follows from the previous step. It is very hard to reach, but it may be one of the most important pillars of the mindset described above.
- Try to automate all operations. Every installation that needs manual intervention, every configuration change, and every update is a target for automation. Start with small steps, then aim for bigger changes. It really pays off in the long run: it reduces human error, standardizes processes, and removes the need to document manual steps that are no longer necessary.
- Trust your people, but leave room for errors. Human beings are prone to making mistakes. Accepting this fact up front is crucial, and it helps shift the mindset toward removing as much human intervention as possible from your operations. But you cannot automate everything, so continuously educating your staff (mentoring them, delegating important and difficult tasks to them, and motivating them to overcome technical challenges) is vital.
Last but not least, and perhaps most important of all, create an environment where sincere and honest communication is maintained. This leads to a healthy feedback loop, efficient operations, and happy people (staff, customers, etc.).
Of course, every organization has a different story. Still, I believe our story overlaps with those of many others, so I hope our findings will be of value to you.
Note: I am not a native English speaker, and this is my first blog post in English, so please forgive any mistakes. :)