Sunday, December 11

Containers & Microservices

To successfully host internet-scale applications, organizations are seriously considering migrating to containers and microservices. Few companies had containers on their product roadmap in 2014; by 2015 almost every organization was testing or evaluating them.
With compute instances running in the hundreds if not thousands, and the need to absorb any surge in user traffic, organizations see the solution in a containers-and-microservices approach. Soon containers and microservices will be table stakes.

Agility

Being slow is the new dead when it comes to internet-scale applications. The ability of an organization to rapidly adapt to market and technology changes is extremely important to its success.
A microservices architecture breaks things into smaller services, which makes responding to a bug or a new feature request much faster. When these services are developed in containers, deploying them to QA or production systems is quick and easy.
Environment deployment times have dropped from 4-6 hours to minutes thanks to containers. Containers are created from an image, and image creation is fast and easy. The whole cycle from development to deployment is drastically shortened, giving companies an edge by letting them release features at a faster pace.

Operability

Most organizations are scaling up quite fast and need technologies that are easy to operate at that scale. Containers promise to ease the whole task of operability as applications scale out horizontally and elastically. The container is slowly becoming the fundamental, standard unit of deployment.

Portability

Even with VMs, shipping code from a development environment to production is a problem: VM images have to be converted and transferred, both of which are slow operations. Unlike VMs, containers are not restricted to a specific cloud provider.
Containers provide a write-once, run-anywhere design that improves portability in a big way. Containers are easy to move because container images are broken down into layers; when updating and distributing an image, only the changed layers are shipped. One can build, ship, and run any app, anywhere.
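The layer-shipping idea can be sketched in a few lines of plain Python (an illustration of the concept, not Docker's actual implementation): an image is a list of content-addressed layers, and only the layers the remote host does not already have are transferred.

```python
import hashlib

def layer_digest(content: bytes) -> str:
    """Content-address a layer, as registries do: by the digest of its bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()

def layers_to_ship(image_layers, remote_digests):
    """Return only the layers the remote side is missing."""
    return [d for d in image_layers if d not in remote_digests]

# Two image versions that share a base layer: only the changed layer moves.
base = layer_digest(b"ubuntu base filesystem")
app_v1 = layer_digest(b"app code v1")
app_v2 = layer_digest(b"app code v2")

remote = {base, app_v1}  # the production host already has v1
print(layers_to_ship([base, app_v2], remote))  # only the new app layer
```

Because the base layer is unchanged, updating the app costs one small layer transfer instead of a whole VM image.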

Cost Savings

With containers, businesses can optimize infrastructure resources, standardize environments, and reduce time to market. Running containers is less resource-intensive than running VMs. Docker lets users run their containers wherever they please, in the cloud or on-premises, thus avoiding vendor or cloud lock-in. One can move containers to any Docker host on any infrastructure, whether Amazon, Google, or Azure, whichever best fits the budget.

Scalability

Elastic scaling is the new normal. Monolithic applications are difficult to scale, let alone scale elastically. The solution is to migrate to a microservices architecture. Scaling a particular microservice is far easier: if it becomes a bottleneck due to slow execution, multiple instances of just that service can run on different machines and process data in parallel.
With a monolithic system you would have to run a copy of the complete system on another machine, which is far more difficult. With containers and microservices, organizations can elastically and horizontally scale their applications dynamically in response to spikes in user traffic.
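The point about scaling only the bottleneck can be shown with a small Python sketch; `slow_service` is a hypothetical stand-in for the one slow microservice, and running several instances of it in parallel is modeled with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def slow_service(item):
    """Stand-in for the bottleneck microservice (e.g. image resizing)."""
    return item * 2

def scale_out(items, instances):
    """Run several instances of just this one service in parallel,
    instead of replicating the entire monolith."""
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(slow_service, items))

print(scale_out([1, 2, 3, 4], instances=4))  # [2, 4, 6, 8]
```

Only the slow component is duplicated; everything else in the system keeps running at one instance.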

Resilience

With a microservices architecture, even if a particular service goes down it does not crash the whole application. Services are designed up front for failure and for the unavailability of other services.
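One common way to design for the unavailability of another service is a fallback: catch the failure and degrade gracefully. A minimal Python sketch, with a hypothetical recommendations service as the failing dependency:

```python
def call_with_fallback(service, fallback):
    """Design for failure: if a dependency is down, return a degraded
    result instead of crashing the whole application."""
    try:
        return service()
    except Exception:
        return fallback

def recommendations_service():
    # Hypothetical dependency that is currently unreachable.
    raise ConnectionError("recommendations service is down")

# The product page still renders; only the recommendations widget degrades.
print(call_with_fallback(recommendations_service, fallback=[]))  # []
```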

Developer Experience

The developer experience with Docker has improved many times over in the recent past. Developers love working with containers because they can package an application with all the parts it needs (such as libraries), ship it, and run it. An application that runs on a developer's machine works in QA and production without hassle. A developer machine can run tons of containers, which may not be possible with VMs. The same containers built for development are also deployed to production.

Summary

While containers and microservices have existed independently for a long time, they work best when put together. The benefits of microservice architectures are amplified when combined with containers. Container-based development is a great fit for microservices, DevOps, and continuous deployment, all of which are critical to an organization's success.
Containers can share more context than VMs, making them a better partner for microservices, since they help decouple complexity. Docker is available on all modern Linux variants, many IaaS providers offer server images with Docker, and there are multiple mature container orchestration engines such as Kubernetes, Docker Swarm, and Apache Mesos.

Thursday, December 8

Stream Processing at Scale: Kafka & Samza

Businesses today generate millions of events as part of their daily operations. Uber is one example: opening the Uber app to see how many cars are nearby is an "eyeball" event, booking a cab is an event, the driver accepting your request is another event, and there are many more.
Unbounded, unordered, large-scale data sets have become increasingly common, and they come from varied sources such as satellite data, scientific instruments, stock tickers, and traffic control. This is data that arrives continuously, is large in scale, and never ends. In short, an entire business can be represented as streams of data, and turning those streams into valuable, actionable information in real time is critical to the success of any organization.

Challenges

These requirements lead to applications whose primary job is to consume a never-ending, continuous stream of events and process it successfully in near real time. The number of events from such a business is extremely high. Each event needs to be sent somewhere, and in most cases multiple applications want to process the same event.

Stream Processing

Stream processing can be looked at in two parts: how the application gets its input, and how it produces its output. Getting the stream input can be owned by a message broker; producing the output can be owned by a processing framework.

Message Broker

A message broker has to deal with a never-ending fire hose of events (100k+/sec), so it needs to be scalable, fault-tolerant, and high-throughput. It should also support multiple subscribers, since more than one application may process the messages, and it should be able to persist messages. Raw performance is another key requirement.

Apache Kafka

Currently Apache Kafka is the near-unanimous choice of message broker for stream processing applications. Performance-wise, Kafka blows away the competition, handling tens of millions of reads and writes per second from thousands of clients all day long on modest hardware. In Kafka, messages belonging to a topic are distributed among partitions.

This ability of a Kafka topic to be divided into partitions lets Kafka score high on scalability. Kafka is distributed from the ground up: it runs as a cluster of one or more servers, each called a broker. Kafka's message delivery is durable, since it writes everything to disk while maintaining performance. Kafka can process 8 million messages per second at peak load.
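How messages spread across partitions can be sketched in plain Python. Kafka's default partitioner hashes the message key and takes it modulo the partition count (the Java client uses a murmur2 hash; this sketch substitutes CRC32 for simplicity), so all events for one key land on one partition and keep their relative order.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Simplified model of Kafka's default key-based partitioner:
    hash the key, mod the partition count."""
    return zlib.crc32(key.encode()) % num_partitions

# Hypothetical Uber-style events keyed by user.
events = [("user-42", "open_app"), ("user-7", "book_cab"),
          ("user-42", "cancel_cab")]
for key, event in events:
    print(key, event, "-> partition", partition_for(key, num_partitions=4))
```

Both `user-42` events map to the same partition, which is what lets consumers see one user's events in order while the topic as a whole scales across many partitions.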

Processing Framework

The other half of stream processing is the processing framework, which consumes a message, processes it, and produces an output message. A stream processing framework needs a one-at-a-time processing model: data has to be processed immediately upon arrival. It also needs low, sub-second latency, as it is critical to keep the data moving. The requirement to produce a result for every event means the framework must be scalable, highly available, and fault tolerant.
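The one-at-a-time consume-process-produce loop can be sketched as follows (plain Python with hypothetical event shapes; real frameworks add checkpointing and fault tolerance around the same core loop):

```python
def process(event):
    """Per-event transformation: enrich or filter as the job requires."""
    return {"type": event["type"], "seen": True}

def run_task(input_stream, output):
    """One-at-a-time model: each event is handled immediately on arrival,
    not buffered into batches."""
    for event in input_stream:
        output.append(process(event))

out = []
run_task(iter([{"type": "click"}, {"type": "view"}]), out)
print(out)
```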

[Figure: Stream Processing Frameworks Comparison. Source: http://www.cakesolutions.net]


Apache Samza

The stream processing framework space is crowded with multiple players: Storm, Flink, Spark Streaming, Samza, Kafka Streams, Dataflow. Apache Samza is probably the least well-known stream processing framework, still trying to make a space for itself. Yet data-intensive organizations like Uber, Netflix, and LinkedIn, which process millions of events every second, have Samza in their stream processing architecture.
Samza has an advantage when it comes to performance, stability, and support for a variety of input sources. Since Samza and Kafka were developed at LinkedIn around the same time, Samza is very Kafka-centric and has excellent integration with Kafka.
The key difference between Samza and other streaming technologies is its stateful stream processing capability. Samza tasks have a dedicated key/value store co-located on the same machine as the task, which delivers far better read/write performance than querying a remote store on every event. The Kafka-and-Samza stream processing architecture has proved its mettle in data-intensive, high-frequency, unbounded stream processing use cases.
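Samza's local-state idea can be sketched in plain Python; a dict stands in for the key/value store (RocksDB in Samza) that is co-located with each task, so state lookups never leave the machine.

```python
class StatefulTask:
    """Sketch of Samza-style stateful processing: each task owns a local
    key/value store (a dict here; Samza co-locates a RocksDB store with
    the task), avoiding a remote database round trip per event."""

    def __init__(self):
        self.store = {}  # local state, one store per task/partition

    def process(self, event):
        key = event["user"]
        # Example aggregation: running count of events per user.
        self.store[key] = self.store.get(key, 0) + 1
        return key, self.store[key]

task = StatefulTask()
for e in [{"user": "u1"}, {"user": "u2"}, {"user": "u1"}]:
    print(task.process(e))
```

Because the counter lives next to the task, each event costs an in-memory (or local-disk) update rather than a network call, which is the source of the performance advantage described above.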

Sunday, December 4

The Reactive Manifesto

Today’s requirements demand new technologies: applications are deployed on many kinds of devices (mobile, tablets, cloud), on machines with thousands of multicore processors. User expectations of response time are in milliseconds, or even microseconds. Systems are expected to be up 100% of the time and to churn through big data (petabytes).

In summary, today’s problems are far bigger in scale and complexity than those of the recent past. This calls for design principles applied well in advance. One such approach is The Reactive Manifesto, which summarizes key principles for designing highly scalable and reliable applications.

The Reactive Manifesto

In 2013 a group of individuals came up with a collection of design principles to help build systems that are responsive, maintainable, elastic, and scalable from the outset. This collection was published as a manifesto called The Reactive Manifesto. The manifesto attempts to summarize how to design highly scalable and reliable applications. The principles are listed as architecture best practices, and they define a common vocabulary that eases discussion of topics like architecture and scalability among stakeholders such as engineering managers, developers, architects, and CTOs.

High Level Traits

Reactive systems exhibit four high-level traits: responsive, resilient, elastic, and message-driven.
A reactive system is responsive, meaning it reacts to users in a timely manner. For users, when the response time exceeds their expectations, the system is down.

A resilient system keeps processing transactions even when transient impulses, persistent stresses, or component failures disrupt normal processing. This is what most people mean when they simply say "stability."

Resilient systems are loosely coupled in order to achieve a high degree of resilience. This is achieved with a shared-nothing architecture, clear boundaries, and microservices, which lead to single-responsibility, independent components.
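A circuit breaker is one well-known pattern for this kind of failure isolation (an illustrative sketch, not something prescribed by the manifesto itself): after repeated failures, calls to the troubled component fail fast instead of letting the stress cascade.

```python
class CircuitBreaker:
    """Minimal circuit-breaker sketch: after `threshold` consecutive
    failures the circuit opens and further calls fail fast, isolating
    the failing component from the rest of the system."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
            self.failures = 0  # a success closes the circuit again
            return result
        except Exception:
            self.failures += 1
            raise

breaker = CircuitBreaker(threshold=2)

def flaky():
    # Hypothetical downstream dependency that is currently failing.
    raise ConnectionError("downstream service unavailable")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass  # real failures, counted by the breaker

# A third call never reaches the failing service: the circuit is open.
```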

Elasticity plays a major role in scalability. A reactive system scales up to respond to an increase in the number of users and scales down to save cost as the number of users decreases.
The fourth trait, message-driven, underpins the others: reactive systems rely on asynchronous message passing to establish boundaries between components, ensuring loose coupling and isolation. Reactive systems are more flexible, loosely coupled, and scalable. They are easier to develop and open to change. They are significantly more tolerant of failure. Reactive systems are highly responsive, giving users effective interactive feedback.
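The scale-up/scale-down decision can be reduced to a toy calculation (a deliberately simplified sketch; real autoscalers also smooth load over time windows and apply cooldowns):

```python
import math

def desired_instances(requests_per_sec, capacity_per_instance, minimum=1):
    """Elasticity in one line: enough instances to carry the current load,
    never fewer than a floor, shrinking again as traffic drops."""
    return max(minimum, math.ceil(requests_per_sec / capacity_per_instance))

print(desired_instances(950, capacity_per_instance=200))  # 5 during a spike
print(desired_instances(50, capacity_per_instance=200))   # 1 at night
```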

The manifesto is aimed at end-user projects as well as at reusable libraries and frameworks. One can look at it as a dictionary of best practices.

The long-term advantage of the manifesto is a shared set of principles that avoids confusion and facilitates ongoing dialogue and improvement for scalable, resilient, and responsive systems.

The Reactive Manifesto is about effectively modelling the use cases in our problem domain and writing solutions that scale and live up to customers’ expectations. The reactive characteristics can be considered a list of service requirements.

The requirements of a system need to be captured and factored into the design right at the onset, and the Reactive Manifesto gathers these characteristics in one place, letting one check whether a particular characteristic applies to their system.