Don’t limit yourself to pipeline tests
While tests in your pipeline are a must, you should not assume everything is fine just because your pipeline is green. Here are a couple of reasons why:
Complexity in Distributed Systems
You’re not alone. Nowadays, a service is rarely an isolated system with no connections to other components, be it databases, Kafka clusters, other services owned by your team, other teams’ services, third-party services, you name it.
Imagine the following system:
You build a shiny new feature for your web login in the web frontend service, test it locally, create unit and integration tests, deploy, and make sure it works in production.
After some time, you or someone else (of course, always someone else 😛) wants to provide an MFA feature for mobile apps and therefore modifies the account service to send some additional context to the apps, ending up breaking the login for the web frontend. Let’s say neither the account service nor the mobile app is your team’s responsibility.

How long would it take for you to know this feature is broken? Of course, you have metrics and alarms in place, but let’s make it less obvious: instead of breaking the feature completely, the change only breaks it for a small subset of users, for example. Depending on your alarm thresholds, evaluation periods, datapoints to alarm, and so on, it may take a very long time to detect it (and by very long, I already consider five minutes or more). Or worse, if you don’t have alarms and metrics in place (shame on you), you might only detect it during your next build or after a couple of customer complaints.
Fail Fast, Fix Faster
As mentioned in the previous section, it may take time to detect a failure and, as a consequence, even more time to find the root cause, since you may not be able to pinpoint when the problem started (e.g., short log retention, missing metrics, etc.). If you are constantly testing your system, you know exactly when something stopped working, making it easier to narrow down the set of changes during that timeframe that could have led to the issue.
Hidden Intermittent Failures
Few things irritate me more than green peas and the habit of restarting a failed build and, if it passes, proceeding as if everything is fine and it was just a ‘glitch’.
There is no such thing as a ‘glitch’ in mathematics and, therefore, in computer science. Behind everything there is a reason, and you should always know that reason so you do not get caught off guard in the near future. If an issue can happen, it will happen. Did you get it? Are you sure?
I’ve seen teams run buggy software for days, months, and even years without fixing intermittent failures because they seemed like mere randomness. No one could explain the reason, and the frequency was so low that nobody bothered to chase the root cause. At some point, that issue comes back and bites you: if you don’t know the reason, you might make the same mistake again in another scenario, service, or system, with a much higher impact on your business.
Chasing the reason why things happen should be the number one goal of software engineers, because only then can we learn and improve.
So, What’s My Suggestion?
Continuously Test Your Applications
By continuously, I really mean continuously, not only during deployments. Test at a one-minute frequency, for example, so you have enough resolution to know when things started to go bad and how frequently an issue occurs. Does it always occur? Every x requests? Only during the quiet night period? All these questions can help you find the root cause faster. Also, make sure those tests raise an alarm when they fail.
A Possible Solution with Functions
There are a couple of companies out there that offer continuous testing services, such as Uptrends. However, if you’re looking to run continuous integration tests, I believe you can build a much more cost-effective, simpler, and more useful solution on your own using Postman as a basis.
Postman is a great tool that has been on the market for a very long time. It is very reliable, has very good features for end users, and has enough flexibility to adapt to your needs.
More Useful
Every now and then, I realize that most developers are not very familiar with their own APIs. By that, I mean that they often don’t have a shared collection of API calls ready to run on demand against each stage, for example.
Postman allows you to share collections of HTTP, GraphQL, gRPC, WebSocket, Socket.IO, and MQTT requests and organize them into multiple environments, each with its own variables (e.g., hostname, secrets, user names, etc.).
By sharing these collections with the team, everyone can quickly understand your APIs by calling them whenever needed, against any stage, and then integrate them into their own systems.
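As a quick illustration (the variable names here are made up, not taken from any real collection), a request in a shared collection can reference environment variables with the {{...}} syntax, and scripts can read them through the pm API:

```javascript
// Request URL as stored in the collection, resolved per selected environment:
//   https://{{hostname}}/api/v1/login

// Pre-request script: read variables from the selected environment.
// "hostname" and "apiKey" are hypothetical variable names.
const hostname = pm.environment.get("hostname");
const apiKey = pm.environment.get("apiKey");

// Attach a header built from an environment-specific value.
pm.request.headers.add({ key: "x-api-key", value: apiKey });
console.log(`Calling ${hostname} with environment-specific credentials`);
```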
Simpler
Before implementing the solution described in this article, I encountered integration test suites written in Java. They had their own projects configured with Maven and a lot of verbose, redundant code for performing and verifying HTTP calls. These projects were checked out during the build and executed for each stage. The execution also required some Spring Boot bootstrap time, making the pipeline slower.
With Postman, creating new test cases is much quicker and simpler: they can be created in a user-friendly UI by entering the address, adding the variables you need for each environment, adding very straightforward individual assertions per test case, and running everything with the click of a button to verify it. See some examples here.
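For reference, a test script attached to such a request could look like the snippet below (the assertions are illustrative, not the ones from the linked examples):

```javascript
// Postman "Tests" tab: assertions executed after the response arrives.
pm.test("login returns 200", function () {
    pm.response.to.have.status(200);
});

pm.test("response contains a session token", function () {
    const body = pm.response.json();
    pm.expect(body.token, "token should be present").to.be.a("string");
});

pm.test("responds within 500 ms", function () {
    pm.expect(pm.response.responseTime).to.be.below(500);
});
```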
Cost Effective
You can use Postman for free with some limitations (you can share your collections with up to 3 people), and that is enough to implement the solution I’ll describe here. However, if you want to share the collections with a larger team, it’s worth looking at their plans and pricing.
Also, by building your own infrastructure to run them, you may even be able to run these tests almost for free! The idea behind this infrastructure is to execute the test collections exported from Postman with a Postman runner inside functions. Lambda functions are a very affordable way of executing code for a short period of time.
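To make the idea concrete, this is roughly what running an exported collection programmatically with Newman (the Postman runner used later in this solution) looks like; the file names below are placeholders for whatever you export from Postman:

```typescript
// A minimal programmatic Newman run; a sketch, not code from the article's repository.
import newman from "newman";

newman.run(
  {
    collection: "./MyService.postman_collection.json",   // exported test collection
    environment: "./staging.postman_environment.json",   // exported environment (variables)
    reporters: ["cli"],
  },
  (err, summary) => {
    if (err) throw err;
    console.log(`Failed assertions: ${summary.run.failures.length}`);
  }
);
```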
Solution
As you can see in the above diagram, EventBridge schedules a Lambda function to run periodically. This Lambda function retrieves the assets exported from Postman (test collection, environment, and global variables), injects secrets from Secrets Manager, executes the tests using the Newman npm package, and, in case of failures, updates metrics in CloudWatch and stores the test results in the S3 bucket. An alarm is triggered if the metric exceeds its threshold (in this case, a count of 1).
The complete solution with SAM is available here.
The infrastructure is defined in the template.yaml file, and the Lambda function handler with all the testing logic is defined in api-testing-handler.ts.
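If you just want the gist before opening the repository, here is a heavily simplified sketch of what such a handler can look like. The bucket, folder, metric, and environment variable names below are placeholders, and error handling plus the S3 result upload are omitted; treat it as an outline of the flow, not the actual handler from the repo.

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";
import { CloudWatchClient, PutMetricDataCommand } from "@aws-sdk/client-cloudwatch";
import newman from "newman";

const s3 = new S3Client({});
const secretsManager = new SecretsManagerClient({});
const cloudWatch = new CloudWatchClient({});

// Placeholder configuration; in the real stack these would come from the template parameters.
const BUCKET = process.env.ASSETS_BUCKET!;
const TEST_NAME = process.env.TEST_NAME ?? "MyService";
const SECRET_ID = process.env.SECRET_ID!;

async function loadJsonFromS3(key: string): Promise<any> {
  const response = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: key }));
  return JSON.parse(await response.Body!.transformToString());
}

export const handler = async (): Promise<void> => {
  // 1. Fetch the assets exported from Postman (placed in the bucket under the TestName folder).
  const collection = await loadJsonFromS3(`${TEST_NAME}/collection.json`);
  const environment = await loadJsonFromS3(`${TEST_NAME}/environment.json`);

  // 2. Inject secrets from Secrets Manager as extra environment variables for the run.
  const secret = await secretsManager.send(new GetSecretValueCommand({ SecretId: SECRET_ID }));
  const envVar = Object.entries(JSON.parse(secret.SecretString ?? "{}")).map(
    ([key, value]) => ({ key, value: String(value) })
  );

  // 3. Run the collection with Newman and count failed assertions.
  const failures = await new Promise<number>((resolve, reject) => {
    newman.run({ collection, environment, envVar, reporters: ["cli"] }, (err, summary) =>
      err ? reject(err) : resolve(summary.run.failures.length)
    );
  });

  // 4. Publish the failure count so a CloudWatch alarm can fire when it reaches 1.
  await cloudWatch.send(
    new PutMetricDataCommand({
      Namespace: "ApiTesting",
      MetricData: [{ MetricName: `${TEST_NAME}-failures`, Value: failures, Unit: "Count" }],
    })
  );
};
```

A CloudWatch alarm on that metric (with the threshold of 1 mentioned above) then closes the loop between the scheduled runs and your notifications.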
This infrastructure can be reused for any Postman testing (HTTP, REST APIs, etc.). An example of an exported Postman collection is available here. Please notice that these files were not created manually but exported from the UI. All these files must be placed inside the S3 bucket generated by the infrastructure, in the folder defined by the TestName parameter input during the infrastructure deployment (in this case, ‘MyService’ by default).
Also, notice that the SecretId secret must exist in order for the Lambda function to inject any secret needed by the test collection.
Have fun playing around with it.