Wiring Communication Between Microservices
Choosing a means to connect microservices is never an easy task; many factors must be weighed before settling on an option. If you are building a production-ready system, I guess the principle of weighing all the factors holds true. Yes, I know this doesn’t apply to visionaries :)
In this article, I will run through some common communication means, briefly describe the background of our project, and share my arguments for choosing RPC over the remaining options.
Before deciding on how we should wire our microservices, we have to understand two concepts:
- Architectural Style
- Transport Protocol
Think about: How is the payload formed when consuming a service? Is it stateless or stateful? Should we use REST, SOAP, JSON, XML, or some other messaging format?
Think about: Which transport protocol should we use? Should we call a remote service over HTTP, HTTP/2, a message bus, a TCP socket, or even UDP?
Popular Communication Means
Let us look at some relatively popular options available:
- REST over HTTP(S)
- Messaging over Message Broker
- RPC (cross-language or single-language)
REST over HTTP(S)
Ever since the RESTful architectural style was proposed by Roy Fielding, we’ve seen a huge wave of adoption, especially in web application development. The constraints Fielding proposed, despite not being a standard, should always be adhered to before declaring an API RESTful.
There are many variants of REST over HTTP(S), since there is no standard to enforce one. Developers are free to form a request payload in JSON, XML, or some self-defined format.
REST over HTTP(S) simply means using the REST architectural style and sending requests over HTTP(S).
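A minimal sketch of what this looks like on the wire: a resource-oriented URL, an HTTP verb, and a JSON payload. The endpoint and fields below are made up for illustration.

```python
import json
from urllib.request import Request

# Build (but don't send) a RESTful request: the URL names a resource,
# the verb names the action, and the body carries a JSON representation.
payload = json.dumps({"name": "alice"}).encode("utf-8")
req = Request(
    "https://api.example.com/users",   # hypothetical resource URL
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",                     # the HTTP verb expresses the action
)
```

The same resource would be fetched with `GET` and removed with `DELETE`; the payload format is whatever the API chooses, since REST itself does not mandate one.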
Messaging over Message Broker
This basically works by connecting microservices to a centralized message bus; all communication between services is done by sending messages through this backbone.
E.g. Nameko in Python
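The broker pattern can be illustrated with a toy in-process bus: every service talks only to the bus, never directly to another service. A real system would use RabbitMQ, Kafka, or a framework like Nameko; everything below is a simplified stand-in.

```python
class MessageBus:
    """A minimal in-process stand-in for a message broker."""

    def __init__(self):
        self.topics = {}

    def subscribe(self, topic, handler):
        # A service registers interest in a topic.
        self.topics.setdefault(topic, []).append(handler)

    def publish(self, topic, message):
        # The publisher never knows who consumes the message.
        for handler in self.topics.get(topic, []):
            handler(message)

bus = MessageBus()
received = []
bus.subscribe("user.created", lambda msg: received.append(msg))
bus.publish("user.created", {"id": 42})
```

The key property is decoupling: the publishing service only depends on the bus, which is also why the bus becomes a central piece of infrastructure.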
RPC (cross-language or single-language)
Remote Procedure Call is not a new thing in distributed systems; it works by executing functions/methods/procedures on another machine over the network.
According to RFC 5531, the standard specifying the RPC protocol:
- RPC should be transport-protocol agnostic: TCP, UDP, it doesn’t matter! Thus, reliability is not guaranteed.
- A transaction ID is used to ensure execute-at-most-once semantics and to allow a client application to match replies to calls.
- Time-outs and reconnection are required to handle server crashes, even when a connection-oriented protocol (TCP) is used.
- It does not specify the binding of services and clients; that is up to the implementer.
- Mandatory requirements for an RPC implementation: (1) a unique specification of the procedure to be called; (2) provisions for matching response messages to request messages; (3) provisions for authenticating the caller to the service and vice versa.
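Requirement (2) — matching replies to calls — is exactly what the transaction ID (xid) is for. A toy sketch of the client side, with all names made up for illustration:

```python
import itertools

class RpcClient:
    """Toy client showing xid-based reply matching, per RFC 5531's model."""

    def __init__(self):
        self._xid = itertools.count(1)
        self._pending = {}  # xid -> procedure name of an outstanding call

    def call(self, procedure, *args):
        # Each call gets a fresh transaction ID.
        xid = next(self._xid)
        self._pending[xid] = procedure
        return {"xid": xid, "proc": procedure, "args": args}

    def on_reply(self, reply):
        # A reply is accepted only if its xid matches an outstanding call;
        # duplicate or stale replies are rejected (at-most-once semantics).
        proc = self._pending.pop(reply["xid"], None)
        if proc is None:
            raise LookupError("no matching call for xid %r" % reply["xid"])
        return reply["result"]

client = RpcClient()
request = client.call("validate_user", {"id": 42})
result = client.on_reply({"xid": request["xid"], "result": True})
```

A duplicate reply with the same xid would raise, which is how the client avoids executing the same completion twice.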
Background of The Project
In the organization we work for, we have a monolithic web application (written in Django) with acceptable performance. Some of its services can be decoupled into separate services, and I took the initiative of gradually transforming our system architecture into a microservice architecture. One of the important decisions is choosing the communication means.
Why I chose RPC over the other two popular options
Take a look at RPC before ruling it out. I’ve read articles and comments advocating replacing RPC with REST. Some argued RPC is stone-age technology; some said RPC is simply not easy to use. My stance is neutral, as the choice depends on the individual use case.
These are our main requirements:
- No single point of failure -> this ruled message queues out
- Errors are propagated back to the caller/client/consumer
- A service interface that provides a native calling experience
Since error propagation to callers is important to us, RPC is a good candidate: many RPC frameworks return any exception raised in a server function back to the RPC caller.
Most RPC frameworks also eliminate the need for a message broker, thus avoiding a single point of failure.
Most RPC frameworks allow a remote procedure call to look like a local one:

```python
import logging

import my_remote_function  # remote service exposed as a local module

try:
    my_remote_function.validate_user(my_user)
except ValueError as e:
    # the exception raised on the server is re-raised on the client
    logging.error(str(e))
```
Given my scenario, what better option is there than RPC?
Categories of RPC Frameworks
In the course of exploring different RPC frameworks, I roughly categorized them into:
- Monolingual RPC frameworks
- Cross-language RPC frameworks
A monolingual framework, well, supports only a single programming language. A good candidate in this category for Python is RPyC. RPyC comes with easy-to-use, fairly standard RPC features and uses TCP as its transport protocol. The upside of a monolingual framework like RPyC is that there is no need to write a separate service interface. The downsides are insufficient support for different Python versions and, of course, no cross-language support, as the name suggests.
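Why does a monolingual framework need no interface file? Because both ends speak the same language, they can serialize native objects directly. A toy RPC-over-TCP sketch (not RPyC itself, and not production code — pickle is unsafe across trust boundaries):

```python
import pickle
import socket
import threading

class Service:
    """Server-side object; its methods become remotely callable."""
    def add(self, a, b):
        return a + b

def serve_one(sock, service):
    # Accept one connection, decode (name, args), dispatch by attribute
    # lookup — no contract file needed, Python introspection is enough.
    conn, _ = sock.accept()
    with conn:
        name, args = pickle.loads(conn.recv(4096))
        result = getattr(service, name)(*args)
        conn.sendall(pickle.dumps(result))

server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_one, args=(server, Service()), daemon=True).start()

client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(pickle.dumps(("add", (2, 3))))
answer = pickle.loads(client.recv(4096))
client.close()
```

The convenience is exactly the trap: since the wire format is Python-specific, no other language can join the conversation.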
Cross-language RPC frameworks, on the other hand, support multiple programming languages, at a cost. gRPC is one of the frameworks I have used. Backed by Google, gRPC covers a wide range of programming languages, from C++ and Ruby to Python and Dart.
To support multiple languages, a common service contract has to be defined, usually as a protocol buffer (.proto) file. A service contract defines the functions (and their arguments) provided by servers and consumed by clients, as well as the message format to be transported. In the case of gRPC, the protocol buffer file is then compiled into a language-specific file (e.g. a .py file for Python). This becomes a problem once you have multiple versions of a service contract: it is difficult to keep track of the different versions of client stubs and server functions.
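For a rough idea of what such a contract looks like, here is a hypothetical .proto file for the user-validation example; the service and field names are made up for illustration:

```protobuf
syntax = "proto3";

// Hypothetical contract: each language compiles this same file
// into its own stubs (e.g. a .py file for Python).
service UserService {
  rpc ValidateUser (UserRequest) returns (UserReply) {}
}

message UserRequest {
  int64 user_id = 1;
}

message UserReply {
  bool valid = 1;
  string error = 2;
}
```

Every field carries a numbered tag; changing those tags or the message shapes is exactly the versioning problem described above.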
To wrap up, communication mediums should be chosen case by case; these are simply my humble considerations in selecting a communication medium between microservices. Feel free to suggest improvements to my choice.
First published on 2018-09-09
Republished on Hackernoon