Introduction
Microservices are all the rage these days. I’m part of a team that is relatively new to this trend in software architecture, and we’re trying to figure out how, and indeed whether, we should decompose our monolithic applications into smaller components.
This means I have a lot to learn. This article is my attempt to document that journey of discovery; writing down what I learn will hopefully help me understand it better. To get started, I read a good article by Martin Fowler that explains quite a lot. After reading several more articles and watching a bunch of videos, I’d like to summarize the salient points that, for me, make the most sense in understanding Microservices. Hopefully, this will benefit others too.
Note that this article isn’t intended to replace or one-up any of the fantastic knowledge out there. Thousands of hours have been spent producing videos and writing amazing articles, and many books have been written on the subject. This is merely a place for me to reflect on my understanding, and if this article benefits someone out there in improving themselves, so much the better.
Also, this article doesn’t cover modernization of a legacy monolithic system.
SOA Done Right
As I read through various articles, the ideas seemed really similar to the SOA architectural patterns of the last decade or so. For example, in both patterns there can be an arbitrary number of services, each encapsulating business logic. Also, it’s possible for a microservice to call another microservice, just as one web service call often gets chained to another.
But there are significant differences. SOA services can be very large. They are often not deployable independently of the applications that depend on them. Also, SOA services often share databases, or may connect to several at a time to fulfill a single request.
There are other differences but the point is they’re significant and warrant a closer look.
For the remainder of the article, I’m going to focus on core design principles that seem most relevant based on what I’ve learned.
Highly Cohesive
Microservices are said to be highly cohesive. High cohesion is not a new concept and should be familiar to anyone who has experience with object-oriented design. In the abstract, this represents the idea that functionality that belongs together is tightly grouped into a single module (or, in this case, a service) that is responsible for carrying out those tasks. This generally results in reduced complexity, better maintainability, and therefore more reusability.
Autonomous
Microservices are autonomous if they can be independently deployed and/or updated without affecting any other part of the system they compose. They may communicate with other applications or services via an HTTP REST interface or a SOAP-based service contract. However, simplicity is greatly preferred. Using a fancy or proprietary communication mechanism between services increases dependence on code or a library that has nothing to do with the business problem solved by that service. It needlessly adds complexity for very little benefit.
Resilient
Microservices are resilient and can tolerate failure. Failure can come from many sources, such as network connectivity, a database’s log drives running out of space, or a practically infinite number of other causes. Your Microservice should continue to function in a degraded state if some operational dependency fails. This might mean that a service, upon detecting a failure, could return default data that still satisfies its contract, but without the business domain-specific data. For example, if a Microservice’s database is offline or otherwise unavailable to serve requests, the service could still return something hard-coded but ‘valid’, even though it might not be able to service the actual request. In other words, there should be some type of response that is either degraded or defaulted.
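The degraded-default idea above can be sketched in a few lines. This is a minimal illustration, not a production pattern; the `fetch_account` function, the `DEFAULT_ACCOUNT` shape and the failing lookup are all hypothetical names I made up for the example.

```python
# Hypothetical sketch: fall back to hard-coded but contract-valid data on failure.
DEFAULT_ACCOUNT = {"id": None, "name": "unknown", "status": "degraded"}

def fetch_account(account_id, db_lookup):
    """Return account data, falling back to a valid default if the lookup fails."""
    try:
        return db_lookup(account_id)
    except Exception:
        # The database is unavailable: still honor the contract, with default data.
        return dict(DEFAULT_ACCOUNT, id=account_id)

def broken_lookup(account_id):
    raise ConnectionError("database offline")

print(fetch_account(42, broken_lookup))  # degraded, but still a valid response
```

The key point is that the caller always gets a response with the agreed shape, even when the real data source is down.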
Another point on resilience and failure. In a Microservices architecture, as in SOA, there can potentially be many different services composing the larger business service or application. It’s critical that the application as a whole is still able to function, albeit in a degraded state, if one of those Microservices fails or otherwise becomes unavailable. This is known as Design for Failure. See Martin Fowler’s post for more.
One final point on resilience. No discussion of this topic would be complete without mentioning the benefits of the Circuit Breaker pattern, described in Michael Nygard’s Release It. At a high level, this allows API calls to a given Microservice to fail gracefully into a degraded state when the internal call fails. Please read Netflix’s blog entry, and Michael’s book as well, for a thorough explanation.
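To make the pattern concrete, here is a toy circuit breaker under assumptions of my own (the class name, thresholds and the `flaky` dependency are all invented for illustration; real implementations such as the ones in Nygard’s book also handle half-open probing and thread safety):

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: open after N consecutive failures, retry after a cooldown."""
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        # While the breaker is open, short-circuit to the fallback until cooldown elapses.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback()
            self.opened_at = None  # cooldown over: allow a trial call ("half-open")
            self.failures = 0
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures = 0
        return result

breaker = CircuitBreaker(max_failures=2)

def flaky():
    raise TimeoutError("downstream call failed")

for _ in range(3):
    print(breaker.call(flaky, lambda: "degraded response"))
```

After two consecutive failures the breaker trips, and further calls skip the broken dependency entirely and return the degraded response immediately.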
Fast Failure
Microservices should also be designed to ‘fail fast’. I’ve worked in several teams where performance was treated as an afterthought. However, the reality is that performance, responsiveness and timeliness are critical non-functional requirements. These days, everyone expects systems to respond nearly instantaneously. If you have a Microservice, or in the worst case several cascading Microservices, that are slow to respond, it’s simply not acceptable.
To help ensure really fast performance, we use timeouts for various external connection points, such as database connections or connections to any other external systems, including other Microservices. And we set the timeouts aggressively low so that work fails if it takes too long.
The idea is that if the work exceeds the timeout threshold, the caller of that work, such as the upstream/calling service, considers the request a failure and responds using a degraded execution flow.
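A minimal sketch of this fail-fast behavior, using Python’s standard `concurrent.futures` (the `slow_dependency` function and the specific timeout value are made up for the example; in practice you would set socket- or client-level timeouts on the actual connection):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_timeout(work, timeout, fallback):
    """Run `work` against an aggressive deadline; fall back if it is too slow."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(work)
        try:
            return future.result(timeout=timeout)
        except FutureTimeout:
            # The dependency is too slow: treat it as failed and degrade.
            return fallback()

def slow_dependency():
    time.sleep(0.5)   # simulates a hung database or downstream service
    return "real data"

print(call_with_timeout(slow_dependency, timeout=0.1, fallback=lambda: "default data"))
```

The aggressive 0.1-second deadline means the caller gives up on the slow dependency and answers with its degraded default instead of hanging.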
Another benefit of using timeouts is the fast detection of infrastructure performance problems, such as network latency. For example, given a cluster of identical services, if one of them consistently flips into a degraded state, it’s likely that the particular instance of that Microservice has an infrastructure issue. So the low tolerances and fast failure make it much easier to pinpoint non-code problems.
Elastic
While this concept doesn’t seem to make up part of the traditional or formal definition of Microservices, Elasticity solves important business, technical and operational requirements.
Among other things, Elasticity helps ensure operational resiliency. If one instance of a containerized Microservice fails, and it’s a member of a dynamically allocated pool of load-balanced nodes for that service, then it’s reasonable to assume that other instances of the service will take over the load for the dead instance.
Small, Bounded Context
Bounded Context is a fundamental principle of Domain Driven Design. It is the idea that a domain model should be as small as possible while still properly encapsulating the business domain for that unit of work. Additionally, the team responsible for that code or system should have sole authority over a given bounded context. Similarly, in the Microservices architecture model, a given service should be limited to the data model it needs to fulfill its responsibility, and no more. This means that a data model used by a Microservice may not have all the fields/attributes that exist for that entity across the entire company, because the company-wide model is larger and therefore out of scope for that bounded context.
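As a small illustration of service-local models, consider two hypothetical services that each keep only the account fields they need (the class and field names here are invented; nothing about them comes from a real system):

```python
from dataclasses import dataclass

# Hypothetical bounded-context models: each service owns only the fields it needs.

@dataclass
class BillingAccount:            # owned by a billing service
    account_id: str
    balance_cents: int

@dataclass
class ShippingAccount:           # owned by a shipping service
    account_id: str
    delivery_address: str

# Neither model carries the full company-wide "account" entity; only account_id
# is shared, so each team can evolve its own model independently.
acct = BillingAccount(account_id="A-17", balance_cents=1250)
print(acct)
```

The shared identifier is the only overlap; everything else stays inside its own bounded context.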
This architectural model, with its implied team organizational boundaries, creates a strong dependency on Continuous Integration coupled with effective unit- and integration-level test automation. With multiple large teams, as is typical in an Enterprise environment, it’s very difficult to keep the interfaces, or service contracts, between bounded contexts and their associated services in alignment with each other. This means that without rigorous, automated tests, the various services will drift apart, but nobody will know until either QA runs their tests or, at worst, customers see errors after a release. And nobody wants that, right?
Observable
A non-functional requirement of nearly all software is that its behavior and state be observable. Microservices certainly fall into this category. In addition to monitoring the applications’ health, it is also critical to monitor the health of any hosts/servers.
In general, this can be accomplished through logging to a central repository. Many organizations use a centralized event collection, aggregation and reporting tool such as Splunk. I love Splunk. It also provides a robust and powerful feature set for important things such as trends, high level reports and the ability to drill down to see necessary details. Splunk also has a powerful and expressive search feature that can be used to define alerts as well as reports/dashboards. Finally, the historical record retention and reporting that Splunk offers is a great way to look in the past in case of an audit or if an incident otherwise needs to be revisited.
Notes on Logging
Common attributes for logging should be used, such as level, date/time, hostname and service name. You get most of these using one of the standard logging libraries, such as SLF4J or Log4Net.
Some key events that are generally good to log are app/service startup, shutdown/crash (if possible), timeouts, handled and unhandled exceptions, requests, responses and decisions. You may want to place some of these routine logging statements at more verbose, less frequently enabled levels, such as DEBUG or TRACE. The reason is that you may not always need these data, and the logs/reports can get very noisy if this information is constantly captured but infrequently used.
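A quick sketch of this with Python’s standard `logging` module (the service name and messages are invented; the point is that routine request logging goes to DEBUG, which the production configuration can switch off):

```python
import io
import logging

# Route routine, high-volume events to DEBUG so they can be disabled in production.
log = logging.getLogger("account-service")
log.setLevel(logging.INFO)                      # production default: INFO and above
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
log.addHandler(handler)

log.debug("request received: GET /accounts/42")  # suppressed at INFO level
log.info("service startup complete")             # captured
log.error("timeout calling inventory service")   # captured

print(buffer.getvalue())
```

The standard formatter gives you level, date/time and logger name for free; flipping the level to DEBUG during troubleshooting turns the noisy per-request lines back on.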
One really important piece of data to log is a CorrelationID as described by Yan Cui here. This allows us to track distributed transactions across multiple Microservices, especially given the tendency to use asynchronous requests. The idea is that the initiating service, or possibly even the caller, generates a new GUID that is passed to each subsequent step and/or service in the entire application. Therefore, each service that processes a transaction does so in the context of the passed-in correlationID. It then becomes possible to trace an individual transaction through the entire system, irrespective of the degree of distribution or the number of asynchronous steps.
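The correlation-ID flow can be sketched like this (the `X-Correlation-ID` header name and the handler functions are my own illustrative choices, not something prescribed by the pattern):

```python
import uuid

def ensure_correlation_id(headers):
    """Reuse the caller's correlation ID, or mint a new one if this service initiates."""
    if "X-Correlation-ID" not in headers:       # assumed header name, for illustration
        headers["X-Correlation-ID"] = str(uuid.uuid4())
    return headers["X-Correlation-ID"]

def handle_order(headers):
    cid = ensure_correlation_id(headers)
    # Every log line and every downstream call carries the same ID.
    print(f"[{cid}] processing order")
    call_payment_service({"X-Correlation-ID": cid})

def call_payment_service(headers):
    print(f"[{headers['X-Correlation-ID']}] charging card")

handle_order({})
```

Because the downstream service logs the same ID it received, a search for one GUID in the central log store reconstructs the whole distributed transaction.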
Communication
We strongly prefer HTTP/REST APIs to RPC or SOAP. With REST, there is a natural decoupling since all communication can be done with the standard, open HTTP verbs for CRUD operations.
Another technique for communication is to use HATEOAS. I’ll leave it to the reader to learn more, but in summary this specifies that the implementing Microservice returns a set of links associated with the requested action, rather than the entire object. For example, if a request is made to an Account Microservice to create a new account, the Account service can return a link to that new account, rather than returning a JSON representation of the account object. This makes the communication between services much less chatty and prevents unnecessary network traffic.
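A response in that style might look like the following sketch (the endpoint paths, the `_links` key and the hard-coded id are all hypothetical; real HATEOAS payloads vary by convention, e.g. HAL):

```python
import json

def create_account(name):
    """Sketch of a HATEOAS-style response: return links instead of the full object."""
    new_id = 42  # pretend the account was persisted and assigned this id
    return {
        "id": new_id,
        "_links": {
            "self": {"href": f"/accounts/{new_id}"},
            "deposits": {"href": f"/accounts/{new_id}/deposits"},
        },
    }

print(json.dumps(create_account("alice"), indent=2))
```

The caller follows the `self` link if and when it actually needs the full representation, instead of receiving it eagerly in every response.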
We also strongly favor asynchronous communication for Microservices. We accomplish this by injecting a message bus as an intermediary layer between services. This makes a caller of a service more like a message publisher, and a recipient of a call more like a subscriber. This explicit decoupling allows for more resiliency of the entire system by ensuring functionality even when one of the subscribers (or recipients) are unavailable for message processing.
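The publisher/subscriber decoupling described above can be shown with a tiny in-memory bus (purely a sketch; a real system would use a broker such as RabbitMQ or Kafka, and the topic name here is invented):

```python
from collections import defaultdict

class MessageBus:
    """Tiny in-memory pub/sub bus, standing in for a real message broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher knows only the topic, never the concrete recipients.
        for handler in self.subscribers[topic]:
            handler(message)

bus = MessageBus()
received = []
bus.subscribe("order.created", lambda msg: received.append(msg))
bus.publish("order.created", {"order_id": 7})
print(received)
```

Because the publisher addresses a topic rather than a service, subscribers can come and go (or be temporarily down, with a durable broker buffering messages) without the publisher changing at all.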
Containerization and Microservices
Microservices are excellent candidates for containerization. Containers are sort of like VMs, but they don’t include the full operating system that a VM typically has. Instead, they contain only what’s necessary to run a given application. Because of this, containers use significantly fewer host resources than a full VM, which improves performance in general and dramatically shortens boot time for new nodes.
More traditional deployment models have a large number of services living on a single VM. These services are typically assigned a particular port, and each service/port pair makes up a node in the pool behind a load-balanced VIP.
In containerization, there’s a 1:1 relationship between a container and a single app. This means that if you need 3 instances of a Microservice running, you deploy 3 containers that each host that one application. This obviously makes horizontal scaling easier, since you only need to add more containers hosting a given Microservice to a cluster instead of spinning up new VMs and deploying apps to them.
Registration and Discovery
Another key difference between the more traditional SOA architectures and Microservices is in the way a new service gets ‘put into action’ after it’s deployed.
I’m used to something similar to the following when rolling out a new version or an N+1 instance of a Web Service: Deploy the new service, then add that new instance to a pool of nodes already known by a load balancer, such as an F5, so the new instance can start taking traffic and servicing client requests.
With Microservices, we add a new infrastructure component known as a registry database. In this model, the services register themselves in the database, and also query that database to discover what other services exist and how to talk to them. This is usually accomplished via either DNS entries or HTTP REST API calls. This is generally called client-side service discovery.
Another really neat concept here is de-registration. This happens when an instance of a Microservice breaks, times out excessively or otherwise becomes unavailable. By de-registering from the set of known working live nodes, the broken instance of the service is taken out of rotation. This is very similar to an F5 monitor detecting the failure of a node and taking that node out of rotation. The difference is that there is no separate load balancer per se; this all happens in the container cluster and service registry. Incidentally, this can be described as ‘self-healing’. One popular such tool is Consul by HashiCorp.
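The register/deregister/lookup cycle can be sketched with an in-memory registry (a deliberately simplified stand-in for a tool like Consul; the class, the service name and the addresses are all invented, and real registries add health checks, TTLs and replication):

```python
import time

class ServiceRegistry:
    """In-memory sketch of a service registry with explicit deregistration."""
    def __init__(self):
        self.instances = {}   # service name -> {address: last registration time}

    def register(self, name, address):
        self.instances.setdefault(name, {})[address] = time.monotonic()

    def deregister(self, name, address):
        # A failed or unhealthy instance is removed from the live set.
        self.instances.get(name, {}).pop(address, None)

    def lookup(self, name):
        return list(self.instances.get(name, {}))

registry = ServiceRegistry()
registry.register("accounts", "10.0.0.5:8080")
registry.register("accounts", "10.0.0.6:8080")
registry.deregister("accounts", "10.0.0.5:8080")   # broken instance: out of rotation
print(registry.lookup("accounts"))
```

Clients doing client-side discovery would call `lookup` before each request (or cache its result), so the dead instance simply stops receiving traffic.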
Another huge advantage of containerization and service discovery is that it becomes possible, and pretty easy, to horizontally auto-scale an application based on demand. The two platforms that I’m evaluating, Kubernetes and Cloud Foundry, both provide native support for this.
API Gateway
The last concept I’m going to cover in this article is the API Gateway. By comparison with the more traditional architectures such as SOA or Monolithic applications, this is similar to a load balancer sitting between public internet traffic and the pool of nodes hosting applications or services. The API Gateway basically acts as a facade, or abstraction layer for one or more Microservices. This provides a single point of entry for a client, such as a mobile application, so the app doesn’t need to know any details of the internal network/routing structure of the Microservices architecture.
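At its core, the gateway’s facade role is path-based routing to internal backends. Here is a toy sketch (the route table, the internal hostnames and the prefix-matching rule are all assumptions for illustration; a real gateway also handles auth, rate limiting, TLS and so on):

```python
# Hypothetical route table: public path prefix -> internal service address.
ROUTES = {
    "/accounts": "http://accounts.internal:8080",
    "/orders":   "http://orders.internal:8080",
}

def route(path):
    """Map a public request path to the internal service that should handle it."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend + path
    return None  # unknown route: a real gateway would return 404 here

print(route("/accounts/42"))
```

The mobile client only ever sees the gateway’s public address; the internal hostnames and topology stay hidden behind the route table.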
Additionally, the API Gateway is a good candidate to add a caching layer for static content to increase performance of callers. In Kubernetes’ nomenclature, this is referred to as Ingress, and load balancers also come into play.
Next Steps
I hope this article helps someone learn about Microservices. Please leave any feedback in the comments and let me know if there’s anything else you’d like to see covered. Cheers!