This Week In Work (3 Part Series)
A week of challenges in my day job has led me to keep a journal of them, to give me time to reflect on what's going on. This is the first of (m)any entries.
Some context to my current professional role:
Our teams revolve around a web application and some supporting services, such as product creation and the storage of customers' assets. The service as a whole (which is how I'll refer to it) started life over ten years ago and still does essentially the same thing today. The "complications" have come about from providing various flexibilities through abstraction layers, as clients have come and gone and the business has "grown up" quickly over the latter half of that decade.
I've been involved for the last three of those years, and this year I've moved from mid-level to senior developer within the team. While most of our work is web-orientated, we don't think of ourselves strictly as "web developers", but more generally as "product developers/engineers".
We created an operations support team last year, essentially to act as second line to the first-line customer support. The ops team is made up of developers, infrastructure engineers, support staff, and their team managers, on a weekly rotation. As a result, we are much more aware of production issues and invest much more time in addressing them. The team also aims to improve our tooling, but we've been too busy fixing broken things in the meantime.
Here's what I've been paid to do this week.
My first task this week was to finish moving our application to a new version of an API. This has lasted for literally months (the service does one thing and does it well, so few people have worked on it), because the API provider required much more than was initially documented. At one point development, testing, and deployment were 100% successful; that lasted a week before it 100% broke, without warning. The months since have been spent catching up.
That we didn't notice the problem directly points to a lack of monitoring/reporting; we have since expanded our monitoring, which thankfully reports no other issues.
As our API issues show, you can't manage what you don't measure. Problems will surface that no-one has seen or predicted before. Adding measurement to an existing system feels daunting, but in reality it's incremental work, and regular reviews will show the improvements.
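To make the "measure it" point concrete: even a very small check on recent call outcomes would have caught the silent breakage described above within minutes rather than weeks. This is a minimal sketch, not our actual tooling; the class and parameter names are hypothetical.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the outcomes of recent API calls and raise an alarm when
    the failure rate over a sliding window crosses a threshold."""

    def __init__(self, window: int = 100, threshold: float = 0.5):
        # deque(maxlen=...) keeps only the most recent outcomes,
        # so old results age out automatically.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, success: bool) -> None:
        """Record one call outcome: True for success, False for failure."""
        self.outcomes.append(success)

    @property
    def failure_rate(self) -> float:
        """Fraction of recent calls that failed (0.0 if nothing recorded)."""
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def alarm(self) -> bool:
        """True when the recent failure rate has reached the threshold."""
        return self.failure_rate >= self.threshold


# Example: a healthy service, then a sudden total breakage.
monitor = ErrorRateMonitor(window=10, threshold=0.5)
for _ in range(10):
    monitor.record(True)    # all good -- no alarm
for _ in range(5):
    monitor.record(False)   # the API starts failing
```

In real use, `alarm()` would be wired to whatever notifies the on-call rotation; the point is simply that recording outcomes at all is what makes the sudden change visible.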
Today, I've not progressed the "active" issues I'm working on. Instead, I've been helping my colleagues with their tasks and other issues, writing reflective pieces like this one, and joining various project discussions.
Both of this team's leaders set aside a fixed time in the day for code review, stand-up, and everything else that is supposed to happen alongside the "real" work. It might sound obvious, but it is a deliberate act: context switching is a productivity killer, so don't do it; it happens only as a last resort.
Half an hour until a conference call, then home-time to my family.
The two words "job role" mean, to me, what you do and how you do it. Today, my job has been to be the "senior" rather than a "developer", and to help my team get better together. It's still been a good day. (Sorry if you were waiting on that issue.)
... some of which turn out to be wrong. We trust our commercial partner to tell us the product range, along with all the messy details we don't want to know. The ingestion of that data is independent of its usage, which is great for reliability but not so great for being able to tell directly what has caused a problem in another part of the system. You need to know both what you're looking at and what you're looking for; the two are likely to be different.
I don't think the problems we have on a day-to-day basis have been getting worse; I think time has simply exposed them to more people, through a constant stream of instances. They take everyone a long time to debug, mainly (in my opinion) because everyone has to go through the same learning to understand what's happening. Improving our tooling, both for our own use and so that more people can dig into how the process is designed, would help us better understand what's going on when issues arise. As I said earlier, we've been too busy fixing the fallout to fix the problem.
We considered the process "working" because it was in production with stable inputs (i.e. good data coming in, giving good outputs) and because some people (now one person) knew the full, end-to-end process. This is, of course, not sustainable long-term; people come and go, both inside and outside the domain, so new people must be able to quickly understand what's going on.
Improving the tools we use is an ongoing effort. The company sustains itself by hiring when colleagues move on, so we must likewise maintain the tools we all use. This does seem to take away from the capacity to deliver on clients' wishes, but being open with clients that we are hampered by a complicated, or at least non-intuitive, system and want to improve it will benefit us, them, and our customers.