4 Kafka Streams -> Stateful Operations of KStreams and KTables without Joins and Window

KTable:

  • groupBy

KGroupedStream / KGroupedTable:

  • Count
  • Reduce
  • Aggregate

KStreams:

  • Peek
  • Transform / TransformValues

KTable GroupBy()

4 Kafka Streams -> Stateful Operations of KStreams and KTables without Joins and Window

KGroupedStream / KGroupedTable Count()

KGroupedStream comes after groupBy() or groupByKey() call on KStream

Count counts the number of record by grouped key

If used on KGroupedStream:

  • Null keys or values are ignored

If used on KGroupedTable:

  • Null keys are ignored
  • Null values are treated as "delete" (=tombstones)

KGroupedStream / KGroupedTable Aggregate()

For KGroupedStream aggregate() you need an initializer, an adder, a Serde and a StateStore name (name of your aggregation)

4 Kafka Streams -> Stateful Operations of KStreams and KTables without Joins and Window

For KGroupedTable aggregate() you need an initializer, an adder, a substractor, a Serde and a StateStore name (name of your aggregation)

4 Kafka Streams -> Stateful Operations of KStreams and KTables without Joins and Window

substractor is used for keep current state of data, if key1, value1 is updated to key1, value2 then by count we will got key1, (count of value1 - 1)

Why do we need  substractor in KGroupedTable, but not in KGroupedStream? Because in Stream we only will perform insert operation, which for aggregate() could be handled with add.  In KGroupedTable we perform not only insert, but also update/delete, that's why we need a substractor for it.

KGroupedStream / KGroupedTable Reduce()

Reduce is a simplified version of aggregate(), but the result type has to be the same as an input:

(int, int) => int or (String, String) => String (example concat(a,b))

4 Kafka Streams -> Stateful Operations of KStreams and KTables without Joins and Window

KStream Peek()

Peek allows you to apply a side-effect operation to a KStream and get the same KStream as a result

For example:

  • Printing the stream to the consolo
  • Statistics collection

Warning: it could be executed multiple times (cause of at least once concept) as it is side effect (in case of failures)

Like for debugging on the console:

4 Kafka Streams -> Stateful Operations of KStreams and KTables without Joins and Window

KStream Transform() / TransformValues()

Trust me, you won't need them !!!

Summary

This diagram explains, how different Kafka object interact and transform:

4 Kafka Streams -> Stateful Operations of KStreams and KTables without Joins and Window