Watermarking and Late Data Handling in Spark Structured Streaming
5d ago · 23 min read · TLDR: A watermark tells Spark Structured Streaming: "I will accept events up to N minutes late, and then I am done waiting." Spark tracks the maximum event time seen per partition, takes the global minimum across all partitions, subtracts the thresho...
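The core arithmetic behind "accept events up to N minutes late" can be sketched in plain Python. This is a deliberately simplified, single-stream model and not Spark's internals: it collapses the per-partition bookkeeping into one running maximum, advances the watermark only between micro-batches, and drops any event older than the current watermark. The function name `run_batches` and its parameters are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def run_batches(batches, threshold):
    """Simulate watermark-based late-data filtering (illustrative only).

    batches:   list of micro-batches, each a list of event-time datetimes
    threshold: timedelta, how late an event may arrive and still be accepted
    Returns (accepted, dropped, final_watermark).
    """
    max_event_time = None  # largest event time observed so far
    watermark = None       # no watermark until one batch has completed
    accepted, dropped = [], []

    for batch in batches:
        for event_time in batch:
            # An event older than the current watermark is too late.
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)
            else:
                accepted.append(event_time)
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time

        # Assumption: the watermark moves only at batch boundaries
        # and never moves backwards.
        if max_event_time is not None:
            candidate = max_event_time - threshold
            if watermark is None or candidate > watermark:
                watermark = candidate

    return accepted, dropped, watermark
```

With a 5-minute threshold and a first batch whose latest event is at 12:10, the watermark becomes 12:05, so an event stamped 12:04 arriving in the next batch is dropped while one stamped 12:06 is still accepted. In a real job the threshold is declared on the streaming DataFrame with `withWatermark(eventTimeColumn, delayThreshold)` rather than computed by hand.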