Apache Flink: Stateful Computations over Data Streams
True streaming-first engine with native event-time processing, watermarks for out-of-order events, and barrier-based exactly-once checkpointing. Covers Q38–Q42.
Apache Flink was designed from the ground up as a streaming-first system. Unlike Spark Streaming (which approximates streaming via micro-batches), Flink processes each event individually with true streaming semantics — enabling sub-millisecond latencies and native event-time processing with watermarks.
💡 Flink's streaming-first philosophy
Flink vs Spark: Streaming Philosophy
- Each event processed individually as it arrives
- Latency: sub-millisecond to milliseconds
- Event time is first-class (watermarks built in)
- Batch = bounded stream (unified model)
- State is scoped per-key, stored in RocksDB
- Windows are triggered by watermarks (not batch intervals)
- Events collected into micro-batches (0.5–5s intervals)
- Latency = batch interval (100ms minimum)
- Processing time native; event time added in Structured Streaming
- Streaming = scheduled batch jobs on D-Streams
- State stored in RDD partitions + checkpoints
- Windows are RDD operations over time intervals
Flink Architecture
Flink's layered architecture separates user-facing APIs from the core runtime engine. Click each layer to learn more.
Select a layer to learn about it.
Flink's architecture separates user-facing APIs from the runtime engine, enabling the same core to handle both bounded (batch) and unbounded (streaming) data.
Windowing Strategies
Windows are the key abstraction for processing unbounded streams in bounded chunks. Flink supports three core window strategies, each with event-time semantics.
Non-overlapping windows of fixed size. Each event belongs to exactly one window. Good for period-over-period comparisons (hourly totals, daily summaries).
Watermarks: Handling Out-of-Order Events
Real-world event streams arrive out of order (network delays, retries, mobile offline sync). Watermarks tell Flink how far behind the current processing time the event time has progressed, enabling windows to fire correctly even when events arrive late.
Barrier-Based Checkpointing
Flink achieves exactly-once semantics by periodically inserting barriersinto the data stream. Barriers are special markers that flow with the data, triggering state snapshots when they reach each operator — without pausing the stream.
JobManager injects a barrier into every source stream. Barrier carries the checkpoint ID.
Source operator snapshots its state (e.g., Kafka offset) when it emits the barrier.
Operators receive barriers on all input channels. They perform barrier alignment: buffer records received after the barrier, wait for barriers on ALL inputs before proceeding.
Once all barriers aligned: operator snapshots its state to the state backend (RocksDB / HDFS).
Operator forwards the barrier downstream and resumes processing buffered records.
When the Sink receives all barriers: reports checkpoint complete to JobManager. Global consistent snapshot achieved.
Exactly-Once Semantics
Events may be lost but never duplicated. No fault tolerance. Fastest.
Events never lost but may be reprocessed (duplicated) on recovery. Simple: replay from last checkpoint.
Each event affects state exactly once. Achieved via barrier checkpoints + recovery replays only uncommitted records.
Requires transactional sinks (Kafka transactions, JDBC transactions) so committed outputs are not duplicated even if the sink is replayed.
Flink vs Spark Streaming: Detailed Comparison
| Feature | Spark Streaming | Apache Flink |
|---|---|---|
| Processing model | Micro-batch (D-Streams) | True streaming (per-event) |
| Latency | 100ms – seconds (= batch interval) | Milliseconds (sub-ms possible) |
| Time semantics | Processing time native; event time added in v2.x | Event time native with watermarks |
| State management | RDD lineage + Structured Streaming state | Rich keyed state (ValueState, MapState, etc.) |
| State backend | In-memory RDD or checkpoint | MemoryStateBackend / RocksDB (large state) |
| Windowing | Batch window over DStream time slices | True windowing with watermark-triggered firing |
| Fault tolerance | RDD lineage recomputation | Chandy-Lamport barrier checkpoints |
| Exactly-once | Yes (Structured Streaming + WAL) | Yes (barrier checkpoints, native) |
| Backpressure | Manual tuning | Automatic backpressure (credit-based) |
| Batch support | Yes (primary use case) | Yes (bounded DataSet API, same runtime) |
Flink Exam Questions (Q38–Q42)
These questions cover the core Flink concepts tested in the exam.