Streaming Architecture Development – Points to ponder

What is a Data Stream

Any data that flows continuously: an unbounded sequence of records produced over time, rather than a fixed, finite data set

Main Stages of Data Stream Management

Collect – Analyze – Consume
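
The three stages above can be sketched as a chain of Python generators, where each record flows through every stage as soon as it arrives. This is only an illustration: the function names, the `sensor,value` input format, and the threshold are assumptions, not part of any particular streaming product.

```python
from collections import Counter
from typing import Iterable, Iterator


def collect(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Collection: parse raw input into events, one at a time."""
    for line in raw_lines:
        sensor, value = line.split(",")
        yield {"sensor": sensor, "value": float(value)}


def analyze(events: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Analysis: pass on only events whose value exceeds the threshold."""
    for event in events:
        if event["value"] > threshold:
            yield event


def consume(events: Iterable[dict]) -> Counter:
    """Consumption: count the surviving events per sensor."""
    alerts = Counter()
    for event in events:
        alerts[event["sensor"]] += 1
    return alerts


# Hypothetical input; a real source would be a socket, queue, or log tail.
raw = ["a,1.0", "b,7.5", "a,9.2", "b,0.3", "a,8.1"]
alerts = consume(analyze(collect(raw), threshold=5.0))
print(alerts)
```

Because generators are lazy, no stage waits for the whole input to be collected first, which is the property that separates stream processing from batch processing.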

Architecture Components

  1. Collection (Data Integration)
  2. Storage
  3. Stream Processing / Analysis
  4. Consumption

Architecture Characteristics

  1. Data Stream Availability (arrival pattern: fixed, patterned, or intermittent)
    1. Real-Time (microseconds to milliseconds)
    2. Near Real-Time (seconds to minutes)
    3. Mini-Batch (hourly)
    4. Immutable Data Stream
  2. Payload / Type of Data
    1. In Bytes / KBs
    2. In MBs (Images, files)
  3. Storage & Data Stream Consumption Access Pattern
    1. Storage
      1. Time oriented (time series)
      2. Shard / Partition support
      3. Consumer Re-playability / Multiple re-reads
    2. Consumption Access Pattern
      1. Point-to-Point Consumption – store data as-is, in the same form as the source data
      2. Multiple consumers – canonical view (JSON, AVRO)
      3. Latest state of a given type of data stream (the type could be keyed by a primary key, a composite key, or the latest state of a whole data source) – structured streaming
  4. Security
    1. Row level security
    2. Field level security
    3. Data Source level security
  5. Data Stream Processing
    1. Row oriented processing
    2. Mini-Batch processing
    3. Incremental and continuous processing
    4. Stateful and Stateless data stream management
    5. Serverless
    6. Chaining data stream processing
    7. Infrastructure as Code support to kickstart Data Stream Processing
  6. Source Data Stream & Processed Output Data – Schema Management and Registry
  7. Scalability, Failover and Accessibility
    1. Fault-tolerant system
    2. Highly Available
    3. Distributed data storage management and data stream processing
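
Several of the storage and consumption characteristics listed above (immutable stream, shard/partition support, consumer re-playability, multiple consumers, and a latest-state view per key) can be illustrated with a small in-memory log. This is a minimal sketch under stated assumptions, not how any specific platform implements them: `PartitionedLog` and all of its methods are hypothetical names, and a real system would persist and replicate the partitions.

```python
from collections import defaultdict
from zlib import crc32


class PartitionedLog:
    """In-memory, append-only log split into partitions by record key."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        # (consumer name, partition index) -> next offset to read
        self.offsets = defaultdict(int)

    def partition_for(self, key: str) -> int:
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        return crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: dict) -> None:
        # Immutable stream: records are only appended, never updated in place.
        self.partitions[self.partition_for(key)].append((key, value))

    def poll(self, consumer: str, partition: int) -> list:
        """Return the records this consumer has not yet read."""
        start = self.offsets[(consumer, partition)]
        records = self.partitions[partition][start:]
        self.offsets[(consumer, partition)] = start + len(records)
        return records

    def rewind(self, consumer: str, partition: int, offset: int = 0) -> None:
        """Re-playability: move a consumer back to re-read older records."""
        self.offsets[(consumer, partition)] = offset

    def latest_by_key(self) -> dict:
        """Latest-state view: the most recent value appended for each key."""
        latest = {}
        for partition in self.partitions:
            for key, value in partition:
                latest[key] = value  # later records overwrite earlier ones
        return latest


log = PartitionedLog(num_partitions=2)
log.append("order-1", {"status": "created"})
log.append("order-2", {"status": "created"})
log.append("order-1", {"status": "shipped"})

p = log.partition_for("order-1")
first_read = log.poll("billing", p)   # reads everything in the partition
log.rewind("billing", p)              # reset the offset to the beginning
replayed = log.poll("billing", p)     # the same records are read again
audit_read = log.poll("audit", p)     # a second consumer has its own offset
```

Keeping offsets per consumer, rather than deleting records once they are read, is what lets multiple consumers read the same stream independently and replay it. Routing by key keeps all records for one key in one partition, so their relative order is preserved.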