Edward Huang

Posted on Sep 28, 2023 • Originally published at pathtosenior.substack.com on Sep 28, 2023

How to Design Resilient APIs That Prevent Site Incidents

#softwareengineer #api #softwaredevelopment #programming

Mistakes in software development can be costly, especially as you delve into critical areas like payments and manufacturing. As the reliance on your software grows, so does the impact of any errors. APIs, or Application Programming Interfaces, are pivotal in this scenario. They are like contracts that profoundly affect both internal and external teams.

In this article, I want to explore why APIs are the cornerstone of a well-designed system, focusing on their impact, best practices, and how to accommodate experimental changes.

Why is API design the most important step in defining a good system?

High Cost of Mistakes

What does it cost you to fix a mistake if you make it?

As you move into software that people rely on (such as payments or manufacturing), the cost and impact of a mistake on the customer grow by orders of magnitude. The more people use your software, the more reliable it needs to be. An API is a contract that heavily impacts other teams, whether internal or external.

Considering how many people rely on your API, having a comprehensive test matrix is crucial.

Before updating the API, the changes should be approved by a designated authority figure. Additionally, all users must be notified and instructed to adjust their systems according to the updated API. Any mistake in the API can escalate costs quickly, because every consumer has to absorb it.

The API represents your contract with your clients and plays a crucial role in the design of your system. If there is a need to modify the API contract, it may require creating a migration or even redesigning the entire system, affecting both your and the client's systems. Hence, the API contract is fundamental to building a robust and efficient system.

During the payment platform migration process, we faced the challenge of making it compatible with all verticals. We realized that the foremost aspect that needed attention was the API interface. Our team could make all the services underneath the API flexible because we owned them. However, making changes to the API contract required significant coordination and consumed a lot of resources from the multiple teams that interacted with the API.

Therefore, designing an API requires you to slow down instead of iterating fast, because the cost of a mistake is multiplied across everyone who uses your software.

Bad API Design Causes More Bugs and Additional Coordination

Have you ever tried to integrate your system with an external API and had to message their team or solution engineer to ask clarifying questions?

I encountered that once. Integrating a system with a badly designed API requires a longer development time for engineers. Some engineers may not have the time or patience to ask clarifying questions, so they make assumptions about API behavior. This leads to bugs during system integration, and to site incidents (SI), because the API was misunderstood.

Last week, I experienced an SI caused by poor API design. An upstream service developer omitted a crucial attribute needed to distinguish between push and email notifications. This resulted in a discrepancy in the number of notifications sent. After spending several hours trying to debug the problem, we realized that an attribute marked as optional, the "scheduled hour," was crucial. It represents the pub-sub topic to publish to in the API.

Good API Design Increases Discoverability

When existing business services and capabilities are not easily discovered, teams end up duplicating them. It should be easy for anyone in the company to browse a catalog of services and capabilities and quickly understand how to consume them.

You can break down complexity into manageable pieces that expose simple, well-designed, and loosely coupled interfaces that other teams can reuse without requiring lengthy discussions and negotiations. API design is critical - make it too configurable and complex, and it loses its simplicity and ease of adoption. If you make it too basic, it will lose its value.

Good API Design Best Practices

A lot of us design APIs with Convenience-driven Development. That means each layer exposes most of its data to the other layers for the sake of DRY. This is usually a sign of coupling. Bloated models bleed through the stack and pass on data that is not relevant to all clients, which increases complexity and unnecessary dependencies and results in longer development time and bottlenecks.

Resource Oriented Design

This is a very popular way to implement REST APIs, given that it lends itself well to the semantics of the HTTP protocol.

First, identify what resources are offered through your API (orders, catalogs, payments, notificationHistory, etc.).

Then, identify the actions that can affect them (Get, List, Create, Update, Delete, etc.).

Then, identify the expectations of the consumers of the API:

What requirements or prerequisites must they meet to use your APIs effectively?
Do they need all the data you have access to?
Should they be exposed to a piece of business logic related to your data, or should that be abstracted from them?

For instance, design a service where clients can retrieve a notification based on its notification ID. First, the resource that is being offered through your API is the notification. Second, identify the actions that can affect it. In this example, it will be a GET request. Lastly, what data do you want your client to receive? In this example, let's say we want the client to receive the notification channel, the timestamp, and the description.

To combine the above example, we can create an endpoint:

GET yourExample.api.com/notifications/{notificationId}

Did you notice that notifications is plural? The plural indicates that it is a collection of notifications, and we retrieve the notification with the given notificationId from that collection.
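As a rough illustration of the three steps above, here is a minimal Python sketch of a handler behind that endpoint. The in-memory `NOTIFICATIONS` store, the `NotificationResponse` model, the `internal_retry_count` field, and the status-code tuple are all assumptions made up for this example, not part of any real framework or of the original service.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple

# Hypothetical in-memory store standing in for a real database.
# The stored record holds more fields than the client needs.
NOTIFICATIONS: Dict[str, Dict[str, object]] = {
    "2": {
        "channel": "Email",
        "timestamp": "2023-09-28T10:00:00Z",
        "description": "Your order has shipped",
        "internal_retry_count": 3,  # internal detail, never returned
    }
}


@dataclass(frozen=True)
class NotificationResponse:
    """The only fields the consumer expects from this resource."""
    channel: str
    timestamp: str
    description: str


def get_notification(notification_id: str) -> Tuple[int, Dict[str, object]]:
    """Sketch of GET /notifications/{notificationId}: returns (status, body)."""
    record = NOTIFICATIONS.get(notification_id)
    if record is None:
        return 404, {"error": "notification not found"}
    response = NotificationResponse(
        channel=str(record["channel"]),
        timestamp=str(record["timestamp"]),
        description=str(record["description"]),
    )
    return 200, asdict(response)
```

The response model is declared separately from the stored record, so adding a column to the store does not silently change the contract.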

Only Pass In or Return What is Necessary

We once designed an API to return all the information that was available to us so that consumers would have all the data IF they needed to expand on it. This design leads to poor data ownership and a higher risk of bugs.

Least privilege is a principle of information security that gives an actor only the set of permissions needed to accomplish the task at hand. This also applies to the data the actor has access to.

You shouldn't design the data of your response with the mindset of "just in case that attribute might be used in the future."

For every element in an API model, you should ask yourself: "For what purpose is this here?" It shouldn't be included if you do not know the answer to the question.

Don't provide data in anticipation of every possible use case. If you do, you might find that your API usage evolves in unexpected ways, coupling your API to flows that it wasn't initially designed for. This can make maintaining your API more complex because you must support both expected and unexpected flows.

Rely on explicit definitions instead of implicit ones. I often see APIs designed like this:

getNotification(notificationId: String, categoryA:Option[String], categoryB: Option[String])

// REST version

yourExample.api.com/notifications/2?categoryA=Email // for categoryA

yourExample.api.com/notifications/2?categoryB=Push // for categoryB

// gRPC version

message Notification {
  string notification_id = 1;
  google.protobuf.StringValue category_a = 2;
  google.protobuf.StringValue category_b = 3;
}

The design should be a union (for example, a protobuf oneof) rather than multiple optional values, because the caller is expected to pick exactly one category.

The best time to make a drastic API change is when migrating your service from V1 to V2.

When migrating to a new payment architecture, I had to evaluate each input and output and ask the business whether each attribute was needed. It required a lot of coordination with other verticals and product managers to understand the purpose of each attribute. As we went through the business definition of every attribute, the design became clearer and clearer. We knew how to redesign our new payment API: does this field need a protobuf oneof because it behaves like a union rather than a bunch of optional values? Can we remove this attribute because it is never used anymore? And so on.
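To make the difference concrete, here is a small Python sketch of the same lookup with the category modeled as a union. The `CategoryA` and `CategoryB` wrapper classes and the string returned by `get_notification` are hypothetical names invented for illustration. The point is only that a caller cannot pass both categories, or neither, by accident.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CategoryA:
    value: str  # e.g. "Email"


@dataclass(frozen=True)
class CategoryB:
    value: str  # e.g. "Push"


# Exactly one of the two variants; there is no "both" or "neither" state.
Category = Union[CategoryA, CategoryB]


def get_notification(notification_id: str, category: Category) -> str:
    """Describe the lookup that would be performed (illustrative only)."""
    if isinstance(category, CategoryA):
        return f"notification {notification_id} filtered by categoryA={category.value}"
    if isinstance(category, CategoryB):
        return f"notification {notification_id} filtered by categoryB={category.value}"
    raise TypeError(f"unsupported category: {category!r}")
```

With two optional parameters, the behavior when both are set (or both are missing) has to be documented and guessed at. With a union, those states cannot be expressed at all.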

If you cannot spell out the impact of the input parameter on the behavior or the output of the service, then it should not be part of the API. If it is part of the API, then the impact of altering the value of this parameter should be extra clear.

Stay Functional

Similar to returning what is necessary, you should only return what is necessary to accomplish the function of your API.

Suppose you have data that will be used downstream of that client, for instance, analytics or tracking. In that case, you don't want to pass that tracking data around the flow of your application, because it increases the size of every request and may contain sensitive information that could be exposed through a security vulnerability.

Instead, you should find an alternate route for the data to travel to data stores. One tip is to store the value in the right data store and pass an identifier for that stored data through the return value.
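A minimal Python sketch of that tip follows. It assumes a hypothetical in-memory `TRACKING_STORE` standing in for the real analytics store, and made-up `create_order` and `load_tracking` functions. The response carries only an identifier, and whoever needs the tracking data looks it up.

```python
import uuid
from typing import Dict

# Hypothetical stand-in for an analytics or tracking data store.
TRACKING_STORE: Dict[str, Dict[str, str]] = {}


def save_tracking(data: Dict[str, str]) -> str:
    """Persist tracking data and return an identifier for it."""
    tracking_id = str(uuid.uuid4())
    TRACKING_STORE[tracking_id] = dict(data)
    return tracking_id


def create_order(item: str, tracking: Dict[str, str]) -> Dict[str, str]:
    """Return only what the caller needs, plus a reference to the tracking data."""
    tracking_id = save_tracking(tracking)
    return {"item": item, "status": "created", "tracking_id": tracking_id}


def load_tracking(tracking_id: str) -> Dict[str, str]:
    """Downstream consumers fetch the tracking data by identifier."""
    return TRACKING_STORE[tracking_id]
```

The tracking payload never travels through the application flow. Only the short identifier does, so access to the sensitive fields can be controlled at the store.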

Avoid Shared Model

The shared model is a practice that involves using a common model between different services. The main goal of this practice is to achieve the "Don't Repeat Yourself" (DRY) principle. Although it may save time creating APIs initially, it can create technical debt as APIs evolve. Since no API is ever truly finished and will always change, shared models lead to more maintenance and updates, which could increase tech debt.

If you use shared complex models between APIs, these models will change, creating a rippling effect in every service that uses these shared models in their own service definition.

In a shared-model architecture, changing an enum type in one service model can cause a company-wide SI when an engineer forgets to bump every dependent service that directly uses or extends that model.

To avoid high coupling and low cohesion, shared models should only use simple atomic data types that are unlikely to change. However, even with atomic data types, modifying the data type can still have a significant impact, so caution is advised.
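One way to picture the alternative is for each service to own its model and translate at the boundary. The Python sketch below is an assumption-laden illustration: `PaymentStatus`, `NotifiablePaymentState`, and `to_notifiable_state` are hypothetical names, and the wire format is assumed to be a plain string.

```python
from enum import Enum


class PaymentStatus(Enum):
    """Owned by the (hypothetical) payment service."""
    PENDING = "PENDING"
    SETTLED = "SETTLED"
    REFUNDED = "REFUNDED"  # added later by the payment team


class NotifiablePaymentState(Enum):
    """Owned by the (hypothetical) notification service."""
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"
    UNKNOWN = "UNKNOWN"


# The notification service maps only the wire values it understands.
_KNOWN_STATES = {
    "PENDING": NotifiablePaymentState.IN_PROGRESS,
    "SETTLED": NotifiablePaymentState.DONE,
}


def to_notifiable_state(wire_value: str) -> NotifiablePaymentState:
    """Translate a simple string from the wire into the local model."""
    return _KNOWN_STATES.get(wire_value, NotifiablePaymentState.UNKNOWN)
```

Because only an atomic string crosses the boundary, a new `REFUNDED` value degrades to `UNKNOWN` in the consumer instead of breaking every service compiled against a shared enum.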

Strive for Thin Client Over Thick Client

Thin clients present the information returned from the service with minimal processing, without performing any major transformations or business logic. On the other hand, thick clients can perform some transformations and business logic to turn the service response into something more useful.

In a thin client, most of the processing and business logic is handled by the backend/server, while the user interface on the client end only displays the data to the user. In contrast, in a thick client, business logic is processed on the components or user interface on the client end, and the client system performs more heavy lifting compared to the server.

Thick clients assume the tech stack of the consumer and are tightly coupled to it. Therefore, implementing a thick client is less flexible and increases coupling.

You should design your service so that if the consumer of your service completely changes its technology stack or ways of implementation, it can still use your API.

How to Accommodate Experimental Changes to the API

The goal is to make APIs that stakeholders can experiment with, without accumulating legacy parameters and endpoints. Many API changes happen when you need quick feature experimentation or a prototype.

One idea is to use maps of experimental data that offer no real type safety or guarantees. The fields can be formalized once the experimental features graduate to the core API. Conversely, unsuccessful experiments leave no trace.

A workaround like this has some advantages: it relieves core teams of some of the review and approval burden, and it keeps features easy to unplug when it comes time to decide whether they graduate.
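Here is a minimal Python sketch of that idea, assuming a hypothetical `SendNotificationRequest` with an untyped `experimental` map and a made-up `quiet_hours` experiment key. None of these names come from a real API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SendNotificationRequest:
    # Core, reviewed part of the contract.
    notification_id: str
    channel: str
    # Untyped bag for experiments: no type safety and no guarantees.
    experimental: Dict[str, str] = field(default_factory=dict)


def send(request: SendNotificationRequest) -> str:
    """Describe what would be sent (illustrative only)."""
    message = f"sending {request.notification_id} via {request.channel}"
    # Hypothetical experiment; deleting these two lines removes it entirely.
    if request.experimental.get("quiet_hours") == "true":
        message += " (respecting quiet hours)"
    return message
```

If the experiment succeeds, `quiet_hours` can graduate to a typed field on the request. If it fails, the key simply stops being sent and the core contract never changes.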

This experimental idea should apply to persistent data as well.

Recap

APIs are the backbone of a well-designed system, affecting everything from cost and coordination to discoverability and flexibility. By adhering to best practices and accommodating experimental changes, you can ensure that your APIs remain robust and adaptable in an ever-evolving technological landscape.


💡 Want more actionable advice about Software engineering?

I’m Edward. I started writing as a Software Engineer at Disney Streaming Service, trying to document my learnings as I step into a Senior role. I write about functional programming, Scala, distributed systems, and career development.

Subscribe to the FREE newsletter to get actionable advice every week and topics about Scala, Functional Programming, and Distributed Systems: https://pathtosenior.substack.com/
