Today, our products are deeply dependent on third-party integrations to run successfully. Common integrations that we have come across in building products include payment gateways, communication APIs, CRMs, client integrations and banking APIs.
Most common reasons for issues with integrations
Poor provisioning by service provider
Multi-tenant architectures with distributed load are the recommended way to build API products, yet we often have trouble reaching service providers because their capacity is not provisioned well. Peak traffic aside, we have seen vendor services go down even during a lean period after only a minor increase in request volume.
Latency breaches → Timeouts
To keep existing workflows and thread pools from choking, we agree on pre-defined latency thresholds with service providers for their APIs. When those thresholds are breached, requests time out at our end.
Unhandled error codes
A new error code, whether produced by an edge case or otherwise, that our exception handling does not cover can disrupt our workflows.
Incorrect response
Changes in a third party's APIs can lead to unexpected responses and response bodies, causing downstream APIs to reject the response, throw errors or behave unexpectedly.
Webhook Deactivations
Modern integrations depend on webhooks to receive data from third parties. We have seen instances where webhooks were deactivated at the service provider's end without appropriate notification.
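Two of the failure modes above, unhandled error codes and unexpected response bodies, can be softened with defensive parsing. The sketch below is a minimal illustration, not a prescribed implementation: the error codes, the field names and the `Outcome` categories are all hypothetical, and a real provider documents its own.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    RETRYABLE = "retryable"
    FAILED = "failed"
    UNKNOWN = "unknown"


# Hypothetical provider error codes, for illustration only.
KNOWN_CODES = {
    "OK": Outcome.SUCCESS,
    "RATE_LIMITED": Outcome.RETRYABLE,
    "INSUFFICIENT_FUNDS": Outcome.FAILED,
}

# Hypothetical fields we expect in every response body, with their types.
REQUIRED_FIELDS = {"code": str, "transaction_id": str}


def classify(code: str) -> Outcome:
    """Map a provider code to an outcome; unseen codes become UNKNOWN instead of raising."""
    return KNOWN_CODES.get(code, Outcome.UNKNOWN)


def validate_body(body: dict) -> list[str]:
    """Return the problems found in a response body (empty list if it looks fine)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in body:
            problems.append(f"missing field: {field}")
        elif not isinstance(body[field], expected_type):
            problems.append(f"wrong type for {field}: {type(body[field]).__name__}")
    return problems
```

Routing unseen codes to an explicit `UNKNOWN` outcome lets the workflow park the request and raise an alert rather than fail in an unrelated code path.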
Remediation Strategies
Monitoring API latencies, throughput and error rates
Multiple integrations for critical services - As described above, a service provider's APIs can falter for many reasons. To reduce dependency on any single provider, set up multiple integrations for critical services
Setting up fallback options in case of failures - Set up circuit breakers with a default fallback option for when a call fails
When requests consistently exceed latency thresholds, dropping them is critical to avoid queuing
Enable webhook monitoring and set up alerts for a sudden dip in transaction frequency
Store API response bodies in logs or databases so that they can be retrieved to identify data issues during debugging
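The fallback and request-dropping strategies above are commonly combined in a circuit breaker. The following is a minimal sketch under stated assumptions: the class name, parameters and default values are hypothetical, and the primary call is assumed to enforce its own timeout (for example through the HTTP client) and raise when it fails.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a retry after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 latency_threshold: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.latency_threshold = latency_threshold
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, primary, fallback):
        if self.is_open():
            # Skip the provider entirely instead of queuing another request.
            return fallback()
        start = self.clock()
        try:
            result = primary()
        except Exception:
            self._record_failure()
            return fallback()
        if self.clock() - start > self.latency_threshold:
            # A slow response counts as a failure even though it returned.
            self._record_failure()
        else:
            self.failures = 0
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A call site might look like `breaker.call(lambda: charge_via_primary(order), lambda: charge_via_backup(order))`, where both functions are placeholders for your own integrations. With multiple integrations in place, the fallback can simply be the second provider.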
Introducing Dr Droid: Monitor third-party integrations
Dr Droid helps you monitor third-party integrations, covering both performance and context monitoring.
Context Monitoring
Change in response body format or variables
Data variation / sudden change for a specific parameter
A mismatch between the status code and API response body
Mapping between API call and webhook response
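To make the first and third checks above concrete, here is a rough sketch of how a change in response shape and a status/body mismatch could be detected. It is not a description of how Dr Droid works internally: the function names are hypothetical, and the mismatch check assumes the provider signals failures with an "error" key in the body.

```python
def body_shape(body: dict) -> dict:
    """Reduce a JSON body to {field: type name} so that shapes can be compared."""
    return {key: type(value).__name__ for key, value in body.items()}


def shape_diff(baseline: dict, current: dict) -> dict:
    """Report fields added, removed, or changed in type relative to a baseline shape."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "retyped": sorted(k for k in baseline.keys() & current.keys()
                          if baseline[k] != current[k]),
    }


def status_body_mismatch(status_code: int, body: dict) -> bool:
    """Flag a 2xx response whose body carries an error, or a non-2xx response without one.

    Assumption: errors appear under an "error" key (hypothetical convention).
    """
    has_error = bool(body.get("error"))
    is_success = 200 <= status_code < 300
    return is_success == has_error
```

Comparing each response against a shape recorded from earlier, known-good responses surfaces renamed or retyped fields before a downstream API rejects them.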
Performance Monitoring
Error rates and error codes
API latency
Ack to webhook delay
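The three metrics above can be computed from data most teams already log. The sketch below is illustrative only: it assumes each API call is logged with a request id, a status code, a latency and an acknowledgement timestamp, and that the matching webhook carries the same request id. All names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def error_rate(status_codes: list[int]) -> float:
    """Share of calls that returned a 4xx or 5xx status."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if s >= 400) / len(status_codes)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def ack_to_webhook_delays(acks: dict[str, datetime],
                          webhooks: dict[str, datetime]):
    """Match API acks to webhooks by request id.

    Returns (delays, pending): the delay per matched request id, and the
    sorted ids that were acknowledged but have no webhook yet.
    """
    delays: dict[str, timedelta] = {
        rid: webhooks[rid] - acked_at
        for rid, acked_at in acks.items() if rid in webhooks
    }
    pending = sorted(rid for rid in acks if rid not in webhooks)
    return delays, pending
```

A growing `pending` list is also a useful early signal of the webhook deactivations described earlier, since acknowledged requests stop receiving their callbacks.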