Distributed Communications Part 1: Common Issues

Intro

Nowadays most of the relatively complex and big systems have a distributed architecture. That technically means that the system consists of multiple standalone services communicating between each other in order to solve different business scenarious.

In “Distributed Communications” series I’d like to cover the topic of distributed communication, the issues it raises and the ways to minimise the impact or solve them.

Common types of communication

There are multiple ways to pass the data or commands between services. Some of them are more or less standard and some of them are hacky solutions build by some crazy minds due to either limitations of the environment or just lack of knowledge in the area.

Here I’ll list the most common ways:

Shared Database
Synchronous Communicaiton: API calls
Asynchronous Communicaiton: Messaging via Message Brokers (Apache Kafka, RabbitMQ, SQS/SNS, etc)

Lets review each of them in more details.

Shared Database

Shared database is the easiest way to separate two services or share the data between them.

Its has the following advantages:

Easy to implement, easy to understand especially for junior developers
It’s a good intermediate step during decomposition of monolitic application.

But it has its disadvantages:

Hard to troubleshoot.
If multiple applicaitons are writing/reading the same table its hard to maintain and figure out who actually do what and what causes locking issues for example.
Making change to database structure can cause dependend services to fail and require a coordinated update.
Hard to scale.
If you decide to replicate or shard your database you need to find, review and adjust all the applications which are accessing it. It may be not an easy task and you may forget/miss something moreover it’ll be newrly impossible to maintain.

So even though it has some advantages and its easy to implement, as a long running solution it usually not a good choice.

Synchronous API Calls

API Calls

One of the common ways of communication between services in distributed environment is to call each other using API calls.

Its commonly used in the cases when you need to get the result of the operation or retrieve data in a response to your request.

But it has set of issues you need to be aware and know how to mediate.

Issue	Mediation
Long running requests/Hanging requests	Define strict timeout setting for your API calls. So the service don’t spend resources waiting for the response which never comes back.
Transient Errors (408, 5xx, Network failure)	Setup a retry mechanism for your requests. For .NET for example there is a great library named Polly which allows you to define retry policies.
Service Unavailability / Overload	In such cases it make sense to stop sending requests to the struggling service for a while in order to reduce resource utilization and time spend processing thee reuest which is likely to fail. For such scenarious you can add Circuit Breaker to your application which will be determining whether to allow you hit the service or not.

Asynchronous Messaging

Messaging

Asynchronous communication takes place usually when your service does not expect immediate response or the processing takes time and its better to notify user that operation is in progress and notify when the status changes.

Examples of such actions may be: email sending or transform audio uploaded to text etc.

This way of communication removes the burden of the producer services (the one which publishes the massage) to know about who should receive this message. Its a responsibility of the consumer service (the one which listens for the events/messages) to subscribe to specific notificaton sources (topics/streams/queues).

In this way of integration you should be aware of the next concepts and issues.

Concepts/Issues	Details/Mediation
Eventual Consistency	When Service A publishes the message it does not know if Service B processed it. Moreover Service B may be down or busy handling other processes. That means that the data between Service A and B will be inconsistent for a period of time until Service B processes the message. So you should design your system to be tolerant to such temporary inconsistencies.
Message Broker Is Down	Services does not communicate between each other directly and that guarantees that the Service B unavailability does not affect Service A availability, but we still have a component which handles communication and this is Message Broker. In cases when message broker is down we will not be able to send a message. It may be mediated by retries (if unavailability is just a glitch or temporary) or by introducing transactional outbox (in cases when we notify about data change)
Structure of the message should be changed	Its easy to change the structure of the message on the producer level, but consumer may not handle this change automatically. So to mediate this you need to design your messages with the versioning in mind and implement the consumers to properly handle new versions of the message coming (or not handling actually)

Dive into the Software