Talks

Flink Forward – Tale of Stateful Stream to Stream Processing by Ajit Koti

Streaming engines like Apache Flink are redefining how we process data. Flink makes it possible to extract, transform, and write data with an ease matching that of batch data processing frameworks. There are plenty of known and proven examples of converting a single batch job into a streaming job. However, there are quite a few challenges in converting a stateful end-to-end batch workflow into multiple stateful stream jobs.
Netflix processes payments for 180M+ members across 190 countries. Payment processing and transaction data are critical for measuring the operational health and performance of our payments platform. We decided to move the existing batch workflow entirely to streaming. Things started to get exciting when we wanted to introduce multiple streaming jobs with zero data loss and high accuracy.
In this talk, we describe how we converted a conventional, complex stateful batch workflow into a multi-step stateful streaming workflow at Netflix using Flink.
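The abstract stays at the workflow level, so as a rough illustration of the core building block such jobs rely on, here is a minimal Scala sketch of a keyed, stateful Flink operator that maintains a per-member running total. The PaymentEvent shape, field names, and job wiring are illustrative assumptions, not the actual Netflix pipeline.

```scala
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Hypothetical event shape; not Netflix's actual payment schema.
case class PaymentEvent(memberId: String, amountCents: Long)

// Keyed state lives per member and survives restarts via Flink checkpoints.
class RunningTotal extends KeyedProcessFunction[String, PaymentEvent, (String, Long)] {
  @transient private var total: ValueState[java.lang.Long] = _

  override def open(parameters: Configuration): Unit =
    total = getRuntimeContext.getState(
      new ValueStateDescriptor[java.lang.Long]("total", classOf[java.lang.Long]))

  override def processElement(
      event: PaymentEvent,
      ctx: KeyedProcessFunction[String, PaymentEvent, (String, Long)]#Context,
      out: Collector[(String, Long)]): Unit = {
    // State is null on the first event seen for a key.
    val next = Option(total.value()).map(_.longValue).getOrElse(0L) + event.amountCents
    total.update(next)
    out.collect((event.memberId, next))
  }
}

object RunningTotalJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env
      .fromElements(
        PaymentEvent("m1", 999), PaymentEvent("m2", 1499), PaymentEvent("m1", 999))
      .keyBy(_.memberId)
      .process(new RunningTotal)
      .print()

    env.execute("running-total")
  }
}
```

Keyed state like this is exactly what Flink checkpoints, which is the property that makes chaining multiple stateful streaming jobs with zero data loss plausible in the first place.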

Reactive Summit – Tale of Stateful Stream to Stream Processing by Ajit Koti

Streaming engines like Apache Flink are redefining how we process data. Flink makes it possible to extract, transform, and write data with an ease matching that of batch data processing frameworks. There are plenty of known and proven examples of converting a single batch job into a streaming job. However, there are quite a few challenges in converting a stateful end-to-end batch workflow into multiple stateful stream jobs. Netflix processes payments for 180M+ members across 190 countries. Payment processing and transaction data are critical for measuring the operational health and performance of our payments platform. We decided to move the existing batch workflow entirely to streaming. Things started to get exciting when we wanted to introduce multiple streaming jobs with zero data loss and high accuracy. In this talk, we describe how we converted a conventional, complex stateful batch workflow into a multi-step stateful streaming workflow at Netflix using Flink.

Strange Loop Conf – Pattern Matching @ Scale Using Finite State Machine by Ajit Koti and Rashmi Shamprasad

Working with data often means trying to locate data that fits patterns, akin to finding a needle in a haystack. When we add big data from non-homogeneous sources to the mix, this problem becomes exponentially complex. One of the use cases at Netflix is improving the sign-up experience through experimentation. Being able to find user journeys that follow certain patterns, across billions of events, is a key insight into simplifying the sign-up process.

This gave us the idea to build a framework to express these user journey patterns so they could be translated into a Non-Deterministic Finite State Machine. One of the ideas we adapted from Ken Thompson’s 1968 CACM paper was to create an NDFA around patterns defined using regexes that could support backtracking. The next step was applying the state machine across billions of events at scale using Spark. The final piece of the puzzle was making it easily usable by Data Engineers, Scientists, and Analysts alike.
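The abstract doesn’t spell out the pattern representation, so here is a minimal Scala sketch of the general technique it names: a small regex-like pattern algebra over events, matched with a backtracking search. The Pattern constructors, Event shape, and predicates are hypothetical stand-ins, not Conduit’s actual DSL.

```scala
// Hypothetical event and pattern algebra; not Conduit's actual DSL.
case class Event(name: String, page: String)

sealed trait Pattern
case class Step(pred: Event => Boolean) extends Pattern // match exactly one event
case class Cat(a: Pattern, b: Pattern) extends Pattern  // concatenation: a then b
case class Star(inner: Pattern) extends Pattern         // zero or more repetitions

object Matcher {
  // True if the whole journey is consumed by the pattern.
  def matches(p: Pattern, journey: List[Event]): Boolean =
    run(p, journey, _.isEmpty)

  // Continuation-passing backtracking: k is called with the events left
  // after p has matched some prefix; alternatives are explored via ||.
  private def run(p: Pattern, evs: List[Event], k: List[Event] => Boolean): Boolean =
    p match {
      case Step(pred) => evs match {
        case e :: rest if pred(e) => k(rest)
        case _                    => false
      }
      case Cat(a, b) => run(a, evs, rest => run(b, rest, k))
      case Star(inner) =>
        // Try the empty match first, then one more repetition; the length
        // guard stops infinite loops when inner matches nothing.
        k(evs) || run(inner, evs, rest =>
          rest.length < evs.length && run(Star(inner), rest, k))
    }
}
```

With this algebra, a journey pattern like “a landing event, then anything, then a signup event” becomes `Cat(Step(_.name == "landing"), Cat(Star(Step(_ => true)), Step(_.name == "signup")))`.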

In this talk, we will cover how we built this framework (dubbed “Conduit”) and the design decisions that came out of challenges along the way. We will also talk about how this can be adapted to real-time applications in the future.
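As a follow-on sketch of the Spark application step the abstract mentions, the matcher above could be run per user by reassembling each user’s ordered journey with groupByKey. The RawEvent schema, input path, and job wiring here are assumptions for illustration, reusing the Event, Pattern, and Matcher definitions from the sketch above.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative raw schema; the real event model and storage layout may differ.
case class RawEvent(userId: String, ts: Long, name: String, page: String)

object JourneyMatchJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("journey-match").getOrCreate()
    import spark.implicits._

    // Hypothetical input path.
    val events = spark.read.parquet("/warehouse/signup_events").as[RawEvent]

    // "Landing, then anything, then signup", using the pattern algebra above.
    val pattern = Cat(Step(_.name == "landing"),
      Cat(Star(Step(_ => true)), Step(_.name == "signup")))

    // Reassemble each user's ordered journey and run the matcher on it.
    val matchedUsers = events
      .groupByKey(_.userId)
      .mapGroups { (userId, evs) =>
        val journey = evs.toSeq.sortBy(_.ts).map(e => Event(e.name, e.page)).toList
        userId -> Matcher.matches(pattern, journey)
      }
      .filter(_._2)
      .map(_._1)

    matchedUsers.show(20, truncate = false)
  }
}
```

Sorting events inside mapGroups assumes a single user’s journey fits in executor memory, which is the usual trade-off of this per-key reassembly approach.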

Scale By The Bay – Finding Needles In Big Data Haystacks… by Rashmi Shamprasad & Ajit Koti

Working with data often means trying to locate data that fits patterns, akin to finding a needle in a haystack. When we add big data from non-homogeneous sources to the mix, this problem becomes exponentially complex. One of the use cases at Netflix is improving the sign-up experience through experimentation. Being able to find user journeys that follow certain patterns, across billions of events, is a key insight into simplifying the sign-up process. This gave us the idea to build a framework to express these user journey patterns so they could be translated into a Non-Deterministic Finite State Machine. One of the ideas we adapted from Ken Thompson’s 1968 CACM paper was to create a Non-Deterministic Finite Automaton around patterns defined using regexes. The next step was applying the state machine across billions of events at scale using Spark. The final piece of the puzzle was making it easily usable by Data Engineers, Scientists, and Analysts alike. In this talk, we will cover how we built this framework (dubbed “Conduit”) and the design decisions that came out of challenges along the way. We will also talk about how this can be adapted to real-time applications in the future.