TL;DR notes from articles I read today.
For highly cohesive functions that live in the same repository, share code via a module inside that repository. To share code between functions across service boundaries, you can either use shared libraries (perhaps published as private npm packages available only to your team) or encapsulate the business logic in a service. To choose, consider:
- Visibility: A dependency on a library is declared explicitly, but a dependency on a service often is not, so you need logging or explicit tracing to see it.
- Deployment: With a shared library, you rely on consumers to update when you publish a new version. With a service, you decide when to deploy and can control deployment better.
- Versioning: There will be times when multiple versions of the library are active. With services, you control when and how to run multiple versions.
- Backward compatibility: With a shared library, you communicate compatibility with semantic versioning (a major update signals a breaking change). With a service, it’s your choice.
- Isolation: You expose more of the internal workings with a shared library. With a service, you exercise more control.
- Failure: When a library fails, you know your code has failed, and stack traces show what went wrong. With a service, a failed call may be an actual failure or a timeout, and the consumer cannot tell whether the service is down or just slow. That is a problem if the action is not idempotent, and partial failures require elaborate rollbacks.
- Latency: Calling a service adds significant network latency compared with an in-process library call.
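To make the failure point concrete: one common way to make retries safe after an ambiguous timeout is an idempotency key. The sketch below is not from the article. `PaymentService`, `charge`, and the key format are hypothetical names, and the in-memory `Map` stands in for whatever durable store a real service would use.

```typescript
// Hypothetical service-side handler that deduplicates requests by idempotency key.
type ChargeResult = { chargeId: string; amountCents: number };

class PaymentService {
  // Assumption: an in-memory map; a real service would need durable storage.
  private readonly results = new Map<string, ChargeResult>();
  private nextId = 1;

  charge(idempotencyKey: string, amountCents: number): ChargeResult {
    const previous = this.results.get(idempotencyKey);
    if (previous !== undefined) {
      // A retry of a request that was already processed: return the stored
      // result instead of charging a second time.
      return previous;
    }
    const result: ChargeResult = { chargeId: `ch_${this.nextId++}`, amountCents };
    this.results.set(idempotencyKey, result);
    return result;
  }

  get chargeCount(): number {
    return this.results.size;
  }
}

// The consumer sends the same key on the retry, so it does not matter whether
// the first attempt actually failed or only its response was lost to a timeout.
const service = new PaymentService();
const first = service.charge("order-42", 1999);
const retry = service.charge("order-42", 1999);
console.log(first.chargeId === retry.chargeId, service.chargeCount);
```

With this in place the consumer can retry without knowing which kind of failure it saw. It does not remove the need for rollbacks when a multi-step operation fails partway.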
Full post here, 9 mins read
- Identify weaknesses before they manifest in system-wide aberrant behaviours: improper fallback settings when a service is unavailable, retry storms from poorly tuned timeouts, outages when a downstream dependency gets too much traffic, cascading failures, etc.
- Lambda functions have specific vulnerabilities. There are many more functions than services, and you need to harden boundaries around every function and not just the services. There are more intermediary services with their own failure modes (Kinesis, SNS, API Gateway) and more configurations to get right (timeout, IAM permissions).
- Apply stricter timeout settings for intermediate services than those at the edge.
- Check for missing error handling that allows exceptions from downstream services to escape.
- Check for missing fallbacks when a downstream service is unavailable or experiences an outage.
- Monitor metrics carefully, especially client-side metrics, which show how the user experience is affected.
- Design controlled experiments to probe the limits of your system.
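The timeout and fallback checks above can be sketched in a few lines. This is an illustration under stated assumptions, not the article's code: `downstreamTimeoutMs` and `callWithFallback` are hypothetical helpers, and the 500 ms reserve and 3000 ms cap are made-up defaults. In a Lambda handler, the remaining time would typically come from `context.getRemainingTimeInMillis()`.

```typescript
// Hypothetical helper: give a downstream call less time than the caller has
// left, so the caller can still run its fallback and respond before its own
// timeout fires.
function downstreamTimeoutMs(
  remainingMs: number, // time the calling function has left
  reserveMs = 500,     // assumed headroom for the fallback and the response
  capMs = 3000,        // assumed upper bound for any single downstream call
): number {
  return Math.max(0, Math.min(capMs, remainingMs - reserveMs));
}

// Hypothetical wrapper: a timeout or an error from the downstream call is
// handled here and turned into a fallback value instead of escaping.
async function callWithFallback<T>(
  call: () => Promise<T>,
  timeoutMs: number,
  fallback: (reason: unknown) => T,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    return await Promise.race([call(), timeout]);
  } catch (err) {
    return fallback(err);
  } finally {
    // Clear the timer so it does not outlive the call.
    clearTimeout(timer);
  }
}
```

Note that the wrapper stops waiting but does not cancel the underlying request; cancelling it would need something like an `AbortSignal` passed into `call`. A chaos experiment can then inject latency or errors into `call` and confirm that the fallback path is the one users actually see.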
Full post here, 4 mins read