As companies today are re-architecting to become fully digital, they are putting events and data at the center of the business. In this journey, it's critical to start off on the right foot. Here’s a guide on what makes a good first project and how to use change data capture (CDC) to set yourself, and the entire company, up for success.
The first step
Sometimes the first project just shows up: there is a business problem to be solved, and the architects who lead the project agree that Apache Kafka is part of the solution and the best way to solve the problem. One project leads to another and next thing you know, everything that happens in the company is streamed in real time to a central platform, where it is available for anything else to tap into, process and react to: a central exchange of events and data that connects the entire organization.
Other times, the driver will instead be a broader vision within the organization for digital transformation and a move to a sustainable architecture based on a streaming platform. Apache Kafka is a streaming platform that enables real-time, scalable and event-driven integration and processing across the enterprise. With such a vision, you need a plan to get there. This search for a good place to start, a good first project on the path to a long-term architecture, should not be confused with “a solution looking for a problem.” Rather, it is a strategic exercise of finding a path that leads the organization to its goal; oftentimes, this goal is becoming a fully digital business.
Creating early success
There are many ways to help an organization on the path to a new platform, but the one we’ve seen work consistently is early success. If you manage to prove your ideas are successful in a project, it will be much easier to convince others to jump on board. Saying “use Kafka because it worked well for LinkedIn and Netflix” isn’t quite as convincing as saying “our security team used Kafka to improve their intrusion detection capabilities, saved $30M in loss prevention and their architect got promoted to a director based on the project success.” Who wouldn’t join your digital transformation efforts after hearing a story like this? Everyone wants to have a positive impact on their business.
To create a successful first project, there are a few things you want to pay attention to:
1. Business impact
You want your project to save money or make money in a measurable way. Projects like mainframe offloading and fraud detection are great at saving money. Projects like targeted marketing, or customer service improvements such as customer 360, tend to help make money. Having such measurable goals will let everyone see the project’s success.
2. Ease of execution
It is easy to accidentally bite off more than you can chew, particularly if you have a target architecture in mind. Remember that the goal is to have business impact within a few weeks, maybe two months at most. If the project starts dragging on, it becomes more vulnerable to random additional requirements, changes in priorities, and other disruptions. Focus on the smallest part of the platform you need to demonstrate business impact.
3. Minimal dependencies
You’re going to need to get data into Kafka, and usually this means from systems owned by a different team. However, as a new project with no proven business impact yet, it will often be difficult to get other teams to prioritize working with you. You can’t ask other services to just write events to Kafka, because there’s a good chance they are not convinced they need to use Kafka at all.
These requirements form a Catch-22: you can’t create business impact without the data, but to get the data quickly you need help from other teams, and they are unlikely to help until you have shown some business impact.
This is where change data capture (CDC) comes in.
What is CDC?
At the heart of most databases sits the transaction log. It goes by different names in different technologies, such as redo log, binlog, or write-ahead log (WAL), but the fundamental concept remains the same. The transaction log is an immutable, sequential log of every transaction performed by every user and application of the database: inserts, updates, deletes, commits, and rollbacks. In other words, a stream of events. If you are using Kafka to build an event-driven streaming platform, this is exactly the data you need.
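To make the "log as a stream of events" idea concrete, here is a minimal sketch in plain Python. The event shape (`op`, `key`, `after`) and the function name `apply_change_log` are assumptions made for illustration, not the format of any particular database or CDC tool. The sketch shows that replaying the change events in order is enough to rebuild the current state of a table.

```python
from typing import Any, Dict, Iterable

# Hypothetical, simplified change event: the operation, the row's primary key,
# and the row contents after the change (None for deletes).
ChangeEvent = Dict[str, Any]


def apply_change_log(events: Iterable[ChangeEvent]) -> Dict[Any, Any]:
    """Replay change events in order and return the resulting table state."""
    table: Dict[Any, Any] = {}
    for event in events:
        op = event["op"]
        key = event["key"]
        if op in ("insert", "update"):
            # The latest version of the row replaces any earlier one.
            table[key] = event["after"]
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return table


# An illustrative log for a made-up customers table.
log = [
    {"op": "insert", "key": 1, "after": {"name": "Ada", "tier": "basic"}},
    {"op": "insert", "key": 2, "after": {"name": "Grace", "tier": "basic"}},
    {"op": "update", "key": 1, "after": {"name": "Ada", "tier": "gold"}},
    {"op": "delete", "key": 2, "after": None},
]

current_state = apply_change_log(log)
```

Any consumer that reads the same events in the same order arrives at the same state, which is why the log is such a convenient source for a streaming platform.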
So you need data from those applications, but you can’t convince the owner to publish events into your platform? They are probably using existing databases and message queues. CDC software connects to the existing databases, collects these events either from the database directly or from the transaction logs on disk, and lets you stream these events into Kafka where they can be stored, processed and accessed by the new microservices and pipelines that your team is building. This is a way to cut through the organizational complexity and demonstrate the value of a new platform, without having to convince anyone else to use the platform first.
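As a rough illustration of what a CDC tool does on the way into Kafka, the sketch below turns a captured change into a keyed record. Everything here is an assumption made for the example: the `Record` type, the `to_record` function, the `cdc.<table>` topic naming and the JSON encoding. Real CDC connectors define their own formats. The one convention borrowed from Kafka itself is that a record with a null value acts as a tombstone for its key on compacted topics.

```python
import json
from typing import Any, Dict, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical stand-in for a record about to be produced to Kafka."""
    topic: str
    key: bytes
    value: Optional[bytes]


def to_record(event: Dict[str, Any], topic_prefix: str = "cdc") -> Record:
    """Map one captured change to a keyed record (illustrative format only)."""
    # One topic per source table, so consumers can subscribe selectively.
    topic = f"{topic_prefix}.{event['table']}"
    # Keying by primary key keeps all changes to one row in order.
    key = json.dumps(event["key"]).encode("utf-8")
    if event["op"] == "delete":
        # A null value marks the key as deleted (a tombstone).
        return Record(topic, key, None)
    payload = {"op": event["op"], "after": event["after"]}
    return Record(topic, key, json.dumps(payload, sort_keys=True).encode("utf-8"))


update = to_record(
    {"table": "customers", "op": "update", "key": 1, "after": {"tier": "gold"}}
)
delete = to_record({"table": "customers", "op": "delete", "key": 1, "after": None})
```

Notice that the team that owns the source database changes nothing: the records are derived entirely from changes it was already making.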
Delivering value with CDC and stream processing
We’ve seen multiple organizations use CDC to stream customer information from the many places it is stored within their organization, then use stream processing technology to join it together and create a comprehensive view of a customer, available at the fingertips of everyone from customer service to product management.
Of course, this all depends on how easy it is to capture changes out of a database and make them accessible to the organization. Once data has been liberated from your source database, you can take advantage of the platform’s stream processing capabilities to deliver value from your project even quicker.
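The customer view described above can be sketched as a join of change events from several systems on a shared customer ID. The code is plain Python standing in for a stream processor, and the source names (`crm`, `billing`, `support`), field names and the function `build_customer_view` are all hypothetical. A real deployment would run this kind of logic continuously in a stream processing engine rather than over a list.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def build_customer_view(
    events: Iterable[Dict[str, Any]],
) -> Dict[Any, Dict[str, Any]]:
    """Fold change events from many sources into one view per customer."""
    view: Dict[Any, Dict[str, Any]] = defaultdict(dict)
    for event in events:
        # Keep the latest data per (customer, source); later events win.
        view[event["customer_id"]][event["source"]] = event["data"]
    return dict(view)


# Illustrative events, as they might arrive from three separate databases.
events = [
    {"source": "crm", "customer_id": 42, "data": {"name": "Ada"}},
    {"source": "billing", "customer_id": 42, "data": {"plan": "gold"}},
    {"source": "support", "customer_id": 7, "data": {"open_tickets": 2}},
    {"source": "billing", "customer_id": 42, "data": {"plan": "platinum"}},
]

customer_360 = build_customer_view(events)
```

Each new source that is captured adds another facet to the same view without any change to the systems that already feed it.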
An event streaming platform lets you rapidly tackle an interesting use case with positive impact on the business. Once you have a success story under your belt and the data in a centralized streaming platform, it becomes much easier to onboard additional projects.
This is the real power of a good platform—the more projects that use it, the more useful it becomes. Each new project is easier and delivers even more impact, because there is more data, better tooling and experience with the platform.