When all you have is a handler... • busstop.dev

Everything needs to be a message!

I have worked on a software stack that makes heavy use of async messaging (using NServiceBus). If you asked the devs, they would probably describe it as a heavily distributed, event-driven system, and I agree with that... but not in a positive way. The usage of NSB/messaging is very overdone, and I personally think the second most impactful mistake was the failure to establish a standard for when to use it.¹

I understand the appeal. These frameworks are genuinely powerful, and when you first experience the simplicity and elegance of decoupled message handlers, it is tempting to reach for them everywhere. But I think async messaging is a specific tool for a specific class of problems.

The key feature of async messaging

I think if you strip out everything, at its core, async messaging is characterized by having a broker. This has the following positive effects:

temporal decoupling - sender and receiver don’t have to be online at the same time to communicate
location transparency - senders don’t need to know where their receivers are or how many there are; they address a queue or topic rather than a specific endpoint.

The key tradeoff for temporal decoupling specifically is that all operations are fire and forget: it is not possible to get feedback in a synchronous way. There are ways to design around this of course, but that is not the point.
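To make the tradeoff concrete, here is a minimal sketch of temporal decoupling, assuming a toy in-memory broker. `InMemoryBroker`, the queue name, and the message shape are all made up for illustration and are not NServiceBus APIs. The sender gets nothing back except the fact that the broker took the message.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for a real broker: holds messages until someone receives them."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, queue_name, message):
        # Fire and forget: the sender only learns that the broker accepted the message.
        self._queues[queue_name].append(message)

    def receive(self, queue_name):
        # Returns the oldest pending message, or None when the queue is empty.
        queue = self._queues[queue_name]
        return queue.popleft() if queue else None


broker = InMemoryBroker()

# The sender runs while no receiver is online; send() returns nothing useful.
result = broker.send("sms-requests", {"to": "+4700000000", "text": "hello"})

# Later, a receiver comes online and picks the message up.
received = broker.receive("sms-requests")
```

The sender addresses a queue name rather than a specific receiver, which is the location transparency part. The price is that `result` carries no information about what the receiver did, or whether it ever ran.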

The next impactful tradeoff is that the code is genuinely harder to read at a macro level, and the stack trace is often lacking in context. This can be alleviated with some tools (distributed tracing for example), but it is still worth mentioning.

Where it genuinely shines

I think the clearest and most defensible use case is at domain boundaries: things like publishing or subscribing to events, or sending commands to another domain for it to do something. Coupling via events in particular is almost always fire-and-forget, precisely because we want both temporal decoupling and location transparency. A statement that something has happened (“an SMS was sent”) does not require us to know who cares and whether they are currently listening.
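As a rough illustration of event publishing across a boundary, here is a small sketch. `EventBus`, the event names, and the handlers are hypothetical, and delivery is synchronous and in-process purely to keep the example short (a real broker would deliver asynchronously). The publisher states that something happened and never learns who, if anyone, was listening.

```python
from collections import defaultdict


class EventBus:
    """Toy publish/subscribe hub keyed by event type."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher never sees who is subscribed or what the handlers return.
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
audit_log = []
billing = {"sms_sent": 0}


def count_for_billing(event):
    billing["sms_sent"] += 1


# Two unrelated domains subscribe to the same event.
bus.subscribe("SmsSent", audit_log.append)
bus.subscribe("SmsSent", count_for_billing)

bus.publish("SmsSent", {"message_id": "m-1"})
bus.publish("OrderShipped", {"order_id": "o-1"})  # nobody listens, and that is fine
```

Adding a third subscriber would not require touching the publishing code at all.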

Making another domain do something via commands can be fire-and-forget, depending heavily on the context. Sometimes we need feedback immediately, but other times, the broker acknowledgement is enough. This eventual consistency is a powerful tool, when used in the right context.

Where it creates problems

Anything that requires a response

This one should be obvious, but I have seen examples where people try desperately to make everything into a handler, and they use complex mechanisms to get back a response. Async messaging is in direct opposition to any use case that requires immediate feedback. UI interactions, external API calls that block on a result, operations where the user is waiting — most if not all of these operations require some form of feedback, even if just a ticket ID to poll for long running operations. And the broker simply cannot give this feedback. Forcing them to use messaging requires you to bolt on a response mechanism anyway, which gives you the worst of both worlds: the complexity of messaging with the latency of waiting.
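The "ticket ID to poll" shape mentioned above can be sketched like this. `ReportService` and its methods are invented for the example, and `run_pending` stands in for whatever background worker actually does the slow part. The point is that both accepting the request and checking on it are plain synchronous calls.

```python
import uuid


class ReportService:
    """Hypothetical service: accepts work synchronously, finishes it later."""

    def __init__(self):
        self._jobs = {}

    def start_report(self, customer_id):
        # Synchronous part: the caller gets a ticket ID right away.
        ticket_id = str(uuid.uuid4())
        self._jobs[ticket_id] = {
            "status": "pending",
            "customer_id": customer_id,
            "result": None,
        }
        return ticket_id

    def get_status(self, ticket_id):
        # Synchronous poll: callers can always ask where their request stands.
        job = self._jobs.get(ticket_id)
        if job is None:
            return {"status": "unknown"}
        return {"status": job["status"], "result": job["result"]}

    def run_pending(self):
        # Stand-in for a background worker completing the slow work.
        for job in self._jobs.values():
            if job["status"] == "pending":
                job["result"] = f"report for {job['customer_id']}"
                job["status"] = "done"


service = ReportService()
ticket = service.start_report("c-42")
before = service.get_status(ticket)   # still pending
service.run_pending()
after = service.get_status(ticket)    # done, with a result
```

How the background work is triggered is a separate decision; what matters to the caller is that it never waits on a broker to find out what happened.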

Internal domain coordination

This one is most likely controversial, but I have come to see it as almost always a sign of bad design. The argument goes: we have multiple things that need to happen, and messaging lets us coordinate them loosely and retry each one independently if something fails.

My experience is that when you actually work through this, the need for that kind of coordination usually points to something that could have been structured more simply. If multiple things need to happen, just do them. If the work is long-running or requires steps that you want to retry individually, messaging is still probably not the right tool — it is just the most available one.
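For short-lived work, "just do them" can be as plain as the sketch below. The step names and the `run_with_retry` helper are hypothetical, and the failing first charge attempt is simulated. Each step is an ordinary function call, and each step is still retried on its own.

```python
def run_with_retry(step, attempts=3):
    """Call step() until it succeeds; re-raise the error from the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == attempts:
                raise


calls = {"charge": 0}
log = []


def reserve_stock():
    log.append("reserved")


def charge_card():
    # Simulated flaky dependency: fails once, then succeeds.
    calls["charge"] += 1
    if calls["charge"] < 2:
        raise RuntimeError("temporary failure")
    log.append("charged")


def send_receipt():
    log.append("receipt sent")


def place_order():
    # The whole flow is readable top to bottom, with one stack trace if it fails.
    for step in (reserve_stock, charge_card, send_receipt):
        run_with_retry(step)


place_order()
```

This does not survive a process crash halfway through, which is exactly the point where a dedicated tool becomes the better answer than a chain of handlers.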

Sagas

I am specifically talking about NSB sagas (which IMO are not real sagas). I think they look good in theory, but in practice they are easy to misuse. Sagas (from what I discovered from reading about the topic) are about maintaining data consistency across domains without resorting to distributed transactions/multi-phase commits. I think an NSB saga would be good if that’s what you use it for. But in practice, I don’t see them used for this… instead, I see them used like a job scheduler, because the API they present leans in that direction without actually giving you all the tools you need. I think that last problem (lacking appropriate tools) is very underrated, and it is a problem that might not reveal itself for a long time. This is also one of the cases of internal domain coordination (see above) and, in my opinion, is almost always a code smell.

Scheduling

In a very similar vein to my issue with sagas, using a message broker to schedule future work is particularly painful. The typical approach, publishing a delayed message, produces something that is essentially invisible. You cannot easily query what is scheduled, modify it, cancel it, or get any kind of overview of what the system intends to do and when. Most frameworks treat scheduled messages as second-class citizens. If scheduling is a real requirement, it deserves a real tool.

Fire and forget as a default

Since cross-domain communication relies heavily on async messaging, and that is often the default, it is easy to fall into the trap of using it for all inter-domain communication. It is true that you have to be careful with synchronous calls to prevent a waterfall of requests, but I think never considering them is equally bad. There are genuine cases where async messaging would not be my default. Here are a few examples:

transmitting large amounts of data - messaging will often require you to transmit the bulk of the data out of band. That is more complex: you have to worry about timing, and the API boundary can become muddy
BFF communication - anything that comes from a BFF most likely needs some feedback
external API - when exposing an API to clients, they will almost always need immediate feedback
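To show what "out of band" means for the large-data case in the list above, here is a rough sketch. `blob_store`, `queue`, and both functions are stand-ins invented for the example, not any real storage or broker API. The message carries only a reference, so the receiver depends on a second system being in the right state at the right time.

```python
import uuid

blob_store = {}  # stand-in for object storage
queue = []       # stand-in for a broker queue


def send_large(payload: bytes) -> str:
    # Step 1: upload the bulk data out of band.
    blob_id = str(uuid.uuid4())
    blob_store[blob_id] = payload
    # Step 2: send only a small reference through the broker.
    queue.append({"type": "ImportRequested", "blob_id": blob_id, "size": len(payload)})
    return blob_id


def handle_next() -> bytes:
    message = queue.pop(0)
    # Timing concern: the blob must still exist when the message is processed.
    payload = blob_store.get(message["blob_id"])
    if payload is None:
        raise LookupError(f"blob {message['blob_id']} is missing or expired")
    return payload


send_large(b"x" * 1000)
data = handle_next()
```

The contract between the two sides is now split across the message schema and the storage layout, which is where the boundary starts to get muddy.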

Other tools to consider

Synchronous request/response

Good old HTTP or gRPC covers the majority of use cases that get incorrectly routed through messaging. When the caller needs a response, synchronous request/response is almost always the answer. I think people tend to avoid this out of overcompensation, since HTTP has had the same problem before: it was used for EVERYTHING, and a single request would cascade into a bazillion others, causing hard-to-debug timeout issues.

Like the other tools here, you can definitely abuse this, so it is very important to be mindful when crossing boundaries. The dependency direction is especially important, but yeah, considering HTTP instead of always defaulting to messaging can only do you good.

A job scheduler

This is an honorable mention because not having a good job scheduler is one of the leading causes of tool abuse (citation needed lol). Jokes aside, I think it’s true. It’s easy to reach for that sweet saga or schedule a message if you have no easy way to do it otherwise.

If what you actually need is to run work in the background, sequence multiple steps, prioritize work, schedule things in the future, handle recurring schedules, track the progress of a running job, or retry individual steps in a sequence — that is a job scheduler. There are tools built specifically for this. They give you visibility into what is queued, what is running, what has failed, and what is coming up. They let you inspect and modify the schedule at runtime. I think it is important to consider this too, when you are thinking about requirements like this.
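The visibility part is what a delayed message cannot give you. As a minimal sketch, assuming an in-memory store (`JobStore`, `Job`, and the job names are made up here; real schedulers persist this and do far more), scheduled work becomes data you can list, change, and cancel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    job_id: str
    name: str
    run_at: datetime
    status: str = "scheduled"  # scheduled | done | cancelled


class JobStore:
    def __init__(self):
        self._jobs = {}

    def schedule(self, job_id, name, run_at):
        self._jobs[job_id] = Job(job_id, name, run_at)

    def upcoming(self):
        # Visibility: anyone can ask what the system intends to do and when.
        pending = [job for job in self._jobs.values() if job.status == "scheduled"]
        return sorted(pending, key=lambda job: job.run_at)

    def reschedule(self, job_id, run_at):
        self._jobs[job_id].run_at = run_at

    def cancel(self, job_id):
        self._jobs[job_id].status = "cancelled"

    def run_due(self, now):
        # Marks every due job as done and returns their names in run order.
        due = [job for job in self.upcoming() if job.run_at <= now]
        for job in due:
            job.status = "done"
        return [job.name for job in due]


now = datetime(2024, 1, 1, 12, 0)
store = JobStore()
store.schedule("j1", "send reminder", now + timedelta(hours=1))
store.schedule("j2", "expire trial", now + timedelta(hours=2))

store.cancel("j2")                                   # cancel before it runs
store.reschedule("j1", now + timedelta(minutes=30))  # modify at runtime
planned = [job.name for job in store.upcoming()]
ran = store.run_due(now + timedelta(hours=1))
```

Every operation here is something the Scheduling section lists as hard to do once the work is hidden inside a delayed message.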

The thing that frustrates me about overusing messaging is not the tool itself — it is actually awesome when applied to the right problems. What frustrates me is when it is used in the wrong context and without understanding the tradeoffs. All I’m saying is that there are other tools…

Footnotes

¹ The first, and arguably most impactful, mistake was blurry, and often non-existent, domain boundaries.