Want to process incoming events exactly-once?
Well, any distributed systems pedant will tell you that you can't, because it's theoretically impossible. And technically, they're right: if you send a message and get no reply, you have no way of knowing whether the message was lost, the receiver crashed, it's just slow, or it processed the message and only the reply was lost. So if you want the message processed, you eventually have no choice but to send it again, which means it may be delivered more than once.
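To make the retry argument concrete, here is a minimal plain-Python sketch of at-least-once delivery. The names `send_with_retries` and `make_flaky_receiver` are hypothetical. The receiver processes every delivery, but its first two acknowledgments are lost. The sender can't tell a lost ack from a lost message, so it keeps retrying, and the receiver ends up doing the same work three times.

```python
def send_with_retries(deliver, message, max_attempts=5):
    """Resend `message` until `deliver` reports an ack (at-least-once delivery).

    A missing ack can mean the message was lost, or that it was processed and
    only the ack was lost; the sender cannot tell which, so it retries.
    """
    for attempt in range(1, max_attempts + 1):
        if deliver(message):
            return attempt  # number of attempts it took to see an ack
    raise TimeoutError(f"no ack for {message!r} after {max_attempts} attempts")


def make_flaky_receiver(acks_to_drop):
    """Hypothetical receiver that processes every delivery, but whose first
    `acks_to_drop` acknowledgments are lost on the way back to the sender."""
    processed = []

    def deliver(message):
        processed.append(message)             # the work happens on every delivery
        return len(processed) > acks_to_drop  # False means the ack was lost

    return deliver, processed


deliver, processed = make_flaky_receiver(acks_to_drop=2)
attempts = send_with_retries(deliver, "charge order 42")
print(attempts)   # → 3
print(processed)  # the same message was processed three times
```

The sender did exactly what it should, yet the charge was applied three times. That duplicate work is the problem the rest of this post is about.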
So if exactly-once processing is impossible, why do many systems, including DBOS, claim to provide it?
The trick is to leverage another property: idempotence. If you design a message receiver to be idempotent, you can safely deliver a message to it multiple times, because the duplicate deliveries have no additional effect. Thus, at-least-once delivery (resending until the receiver acknowledges) combined with idempotent processing is, in practice, equivalent to exactly-once semantics.
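As a minimal sketch (plain Python, not DBOS code), an idempotent receiver can remember which message IDs it has already handled and ignore repeats. The `IdempotentReceiver` class and its in-memory `seen_ids` set are hypothetical; a real receiver would have to record seen IDs durably, in the same transaction as its side effects, for the guarantee to survive crashes.

```python
class IdempotentReceiver:
    """Applies each message ID's effect at most once; redeliveries are no-ops.

    Hypothetical sketch: `seen_ids` lives in memory here, whereas a real
    system must store it durably and atomically with the side effect.
    """

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message_id, amount):
        if message_id in self.seen_ids:
            return False            # duplicate delivery: no effect
        self.seen_ids.add(message_id)
        self.balance += amount      # the side effect happens once per ID
        return True


receiver = IdempotentReceiver()
# At-least-once delivery: "msg-1" arrives three times, "msg-2" once.
for message_id, amount in [("msg-1", 10), ("msg-1", 10), ("msg-2", 5), ("msg-1", 10)]:
    receiver.handle(message_id, amount)
print(receiver.balance)  # → 15
```

Four deliveries arrive, but the balance reflects each distinct message exactly once, which is what "exactly-once in practice" means here.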
Under the hood, this is exactly how DBOS event receivers (such as the Kafka consumer) work. They derive a unique key from each event (for Kafka, from its topic, partition, and offset) and use it as the idempotency key for the workflow that processes the event. That way, even if an event is delivered multiple times, the workflow processes it only once.
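The sketch below models that idea in plain Python. It is not DBOS's implementation: `Event`, `idempotency_key`, the `topic:partition:offset` key format, and the in-memory `WorkflowRunner` are hypothetical stand-ins. The runner executes a workflow only the first time it sees a given ID and returns the recorded result for any later start with the same ID, so a record delivered three times is processed once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a Kafka record."""
    topic: str
    partition: int
    offset: int
    value: bytes


def idempotency_key(event):
    # Topic + partition + offset uniquely identifies a Kafka record, so a
    # redelivered record always maps to the same key. (Format is hypothetical.)
    return f"{event.topic}:{event.partition}:{event.offset}"


class WorkflowRunner:
    """Hypothetical in-memory stand-in for a durable workflow engine: a
    workflow ID that has already run returns its recorded result instead of
    executing again. A real engine would persist `results`."""

    def __init__(self):
        self.results = {}
        self.executions = 0

    def start(self, workflow_id, func, *args):
        if workflow_id not in self.results:
            self.executions += 1
            self.results[workflow_id] = func(*args)
        return self.results[workflow_id]


def process(event):
    # Hypothetical processing step
    return event.value.decode().upper()


runner = WorkflowRunner()
record = Event("orders", partition=0, offset=42, value=b"new order")
for _ in range(3):  # the same record is delivered three times
    print(runner.start(idempotency_key(record), process, record))  # → NEW ORDER
print(runner.executions)  # → 1
```

Because the key depends only on where the record sits in Kafka, not on when or how often it arrives, every redelivery collapses onto the same workflow ID.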
Here's all the code you need to process Kafka messages exactly-once:
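```python
from dbos import DBOS, KafkaMessage

# Placeholder consumer settings; point these at your own broker and group.
config = {"bootstrap.servers": "localhost:9092", "group.id": "dbos-group"}

@DBOS.kafka_consumer(config, ["topic"])
@DBOS.workflow()
def test_kafka_workflow(msg: KafkaMessage):
    # Runs once per message, keyed by its topic, partition, and offset
    DBOS.logger.info(f"Message received: {msg.value.decode()}")
```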