A Technical Article Series for Data Architects
This multi-part article series is intended for data architects and anyone else interested in learning how to design modern real-time data analytics solutions. It explores key principles and implications of event streaming and streaming analytics, and concludes that the biggest opportunity to derive meaningful value from data – and gain continuous intelligence about the state of things – lies in the ability to analyze, learn and predict from real-time events in concert with contextual, static and dynamic data. This article series places continuous intelligence in an architectural context, with reference to established technologies and use cases in place today.
Part 1: Introduction
Organizations are overwhelmed by streams of live data from their products, assets, partners, people, cloud services, applications, and IT infrastructure. “Store then analyze” architectures struggle to meet the need for granular, context-rich, real-time insights (see figure 1). Most streaming data is only ephemerally useful, yet it is also boundless. High data volumes and distributed data sources make centralized collection expensive and slow, and make applications fragile. Insights are coarse and arrive too late for decision makers on the ground, who need granular, situationally relevant, continuous operational intelligence.
Use cases for continuous intelligence are broad. They include dynamically managing connection quality in mobile networks, real-time traffic monitoring and routing in cities, optimizing the performance of SaaS apps, asset and equipment inventory management and tracking, dynamic response to changing customer demand on power grids, real-time contextualized customer experiences, and detection of fraud and hackers in complex IT systems and mobile networks. All are characterized by a need to understand streaming data and respond automatically, in context – in real time, concurrently, and at scale. Continuous intelligence applications statefully fuse streaming and traditional data, analyzing, learning, and predicting on the fly in response to streaming data from distributed sources.
Swim makes it easy to develop and run business-critical continuous intelligence applications that use granular contextual relationships to analyze, learn and predict from streaming data on the fly to deliver locally relevant, real-time responses – at any scale.
Continuous intelligence embraces infrastructure patterns like “pub/sub” from event streaming, but it addresses an application-layer need: helping organizations develop, deploy, and operate stateful applications that consume streaming events, analyzing, learning, and predicting on the fly to deliver streams of real-time insights and responses. Whereas modern databases store data for later analysis, updating relational tables or modifying graphs, continuous intelligence drives analysis from the arrival of data – adopting an “analyze-then-store” architecture that automatically builds and continuously executes a distributed, live model from streaming data. And whereas streaming analytics applications rely on a top-down, user-driven control loop of UI or query/response, continuous intelligence applications continuously compute and stream insights, delivering truly real-time user experiences and facilitating automatic responses at massive scale.
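To make the analyze-then-store idea concrete, here is a minimal Java sketch – not Swim’s actual API, and all names here are hypothetical – of a stateful processor that fuses each arriving reading with static context and pushes an insight downstream immediately, rather than waiting to be queried:

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of "analyze-then-store": a stateful processor that
// fuses each streaming reading with static context (a per-sensor limit)
// on arrival, and pushes insights to subscribers instead of answering queries.
public class ContinuousIntelligenceSketch {
    private final Map<String, Double> limits;     // contextual, static data
    private final Consumer<String> insightStream; // push, not query/response

    public ContinuousIntelligenceSketch(Map<String, Double> limits,
                                        Consumer<String> insightStream) {
        this.limits = limits;
        this.insightStream = insightStream;
    }

    // Driven by the arrival of data: analyze first, store (if at all) later.
    public void onReading(String sensorId, double value) {
        double limit = limits.getOrDefault(sensorId, Double.MAX_VALUE);
        if (value > limit) {
            insightStream.accept(sensorId + " exceeded limit: " + value);
        }
    }
}
```

A real continuous intelligence platform adds distribution, learning, and durable state on top of this; the point of the sketch is only that analysis is driven by the arrival of data, not by a query.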
Event Streaming for an ‘Always-on’ World
Event streaming has emerged as a powerful cloud-native pattern that allows any number of “producers” – applications, containers, users, devices, and infrastructure – to asynchronously publish events that record their state changes to topics managed by a broker, which queues events by topic in arrival order (see figure 2). Applications subscribe to relevant topics to receive and process events in arrival order.
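As an illustration, a producer might publish a state-change event using the Java client of Apache Kafka (one of the brokers discussed below); the broker address and topic name here are illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Asynchronously publish a state-change event to a topic;
            // the broker appends it to the topic's queue in arrival order.
            producer.send(new ProducerRecord<>("device-state", "light-42", "RED"));
        }
    }
}
```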
The most widely known open source event streaming technology is Apache Kafka; Apache Pulsar is another popular broker, and stream processing frameworks such as Apache Beam and Apache Samza are often used alongside them. CNCF NATS is optimized for performance and simplicity. There are many others, and all major cloud providers offer a managed event streaming service. So, is event streaming the answer?
No. Event streaming is a useful infrastructure pattern that helps organizations buffer asynchronously generated events before they are analyzed; the applications that analyze those events are beyond its scope. The application-layer challenge, therefore, is to analyze and respond to events, given their contextual meaning, in real time.
Event streaming decouples publishers from subscribers (hence “pub/sub”) so they can operate independently, but it doesn’t specifically address application needs. The broker is a data mover that buffers events from any number of data sources for delivery, by topic, in arrival order, to different apps whenever they are ready to consume them. Each application consumes events independently and runs at its own pace. Event streaming is thus an important enabler of next-generation applications that are driven by the arrival of data, much as the database update is an enabler for legacy apps. But event streaming infrastructure does not understand the meaning of events – that is an application-layer concern.
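A matching consumer sketch, again using Kafka’s Java client with illustrative names, shows how each application pulls events at its own pace – and is left to interpret their meaning itself:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EventSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-analytics-app"); // each app consumes independently
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("device-state"));
            while (true) {
                // Pull events at the application's own pace, in arrival order.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Interpreting the event's meaning is the application's job.
                    System.out.println(record.key() + " -> " + record.value());
                }
            }
        }
    }
}
```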
Streaming Infrastructure Management
For all its successes, event streaming infrastructure is tricky to operate at large scale, often because scaling the broker across a cluster while also scaling topic queue storage and distribution is hard. Brokers are increasingly relied upon to store events indefinitely, so that after a catastrophic failure the state of an application can be recreated by replaying the events. But eventually a more durable (and lower-volume) representation of the data must be found. Raw data isn’t especially useful for analysis; what matters is the representation of state changes on the part of the data sources. A traffic light that reports “red” every second is still red, and the voltage transitions in the light’s relays are not useful either: we just need to know that the light is red, and when it changes.
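That observation suggests the compact, durable representation: keep only the state transitions. A minimal sketch (all names hypothetical) of such a filter:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a state-change filter: a light that reports "RED" every second
// yields one recorded transition, not thousands of redundant events.
public class StateChangeFilter {
    private final Map<String, String> lastState = new HashMap<>();

    // Returns true only when the source's state actually changed,
    // i.e., when the event carries new information worth keeping.
    public boolean isTransition(String sourceId, String state) {
        String previous = lastState.put(sourceId, state);
        return previous == null || !previous.equals(state);
    }
}
```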
Centralizing event queues can increase networking costs and lengthen application response times by adding another tier of storage just for buffering – the topic queues (see figure 3). Sizing and scaling out broker clusters is difficult because there is no way to determine a priori how many sources will publish, or at what event rates. Furthermore, there is no way for the broker to drive the rate at which events are consumed by communicating queue depth to an application stream processor whose goal is to stay up to date. This is worrisome because the stream processor might reach a critical event just too late, even if earlier messages contained ample evidence of an important issue.
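Some brokers do let an application estimate its own backlog. With Kafka’s Java client, for instance, a consumer can compare the broker’s end offsets with its current positions – a hedged sketch, assuming the consumer already has a partition assignment:

```java
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Sketch: an application-side estimate of its own backlog (lag), computed by
// comparing the broker's end offsets with the consumer's current positions.
// The broker never pushes this information; the application must ask.
public class LagMonitor {
    public static long totalLag(KafkaConsumer<?, ?> consumer) {
        Set<TopicPartition> partitions = consumer.assignment();
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
        long lag = 0;
        for (TopicPartition tp : partitions) {
            // position() assumes the consumer has already fetched or sought
            // on this partition; otherwise it will block to resolve one.
            lag += endOffsets.get(tp) - consumer.position(tp);
        }
        return lag;
    }
}
```

Even with such a workaround, the burden of staying current rests entirely on each application: nothing in the broker slows producers down or prioritizes critical events.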
About the Author
Simon Crosby is CTO at Swim. Swim offers the first open core, enterprise-grade platform for continuous intelligence at scale, providing businesses with complete situational awareness and operational decision support at every moment. Simon co-founded Bromium (now HP SureClick) in 2010 and currently serves as a strategic advisor. Previously, he was the CTO of the Data Center and Cloud Division at Citrix Systems; founder, CTO, and vice president of strategy and corporate development at XenSource; and a principal engineer at Intel, as well as a faculty member at Cambridge University, where he led research on network performance and control, and on multimedia operating systems.
Simon is an equity partner at DCVC, serves on the board of Cambridge in America, and is an investor in and advisor to numerous startups. He is the author of 35 research papers and patents on a number of data center and networking topics, including security, network and server virtualization, and resource optimization and performance. He holds a PhD in computer science from the University of Cambridge, an MSc from the University of Stellenbosch, South Africa, and a BSc (with honors) in computer science and mathematics from the University of Cape Town, South Africa.