Kafka

Kafka provides a high-throughput, fault-tolerant, and scalable log ingestion layer for the Logwise system.

Overview

Kafka acts as the message broker that receives processed logs from Vector and buffers them for downstream consumers like Apache Spark.

Vector → Kafka → Spark Jobs

Kafka enables:

High throughput - Handles large log volumes by processing partitions in parallel
Fault tolerance - Data is replicated across brokers, with automatic failover
Scalability - Scales horizontally by adding brokers and spreading partitions across them

Vector dynamically creates Kafka topics using the service_name tag. Topics follow the naming convention: logs.{service_name}.

Format: logs.{service_name}

Example (hypothetical service name): a service tagged service_name=checkout would write to logs.checkout.

This automatic topic creation enables organized log routing and processing.
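The naming convention can be sketched as a small helper. This is only an illustration, not Vector's or Logwise's actual implementation: the function name topic_for_service and the service name checkout are hypothetical. The validation reflects Kafka's general rule that topic names may contain only letters, digits, '.', '_' and '-', and may be at most 249 characters long.

```python
import re

# Kafka topic names may only contain these characters.
_LEGAL_CHARS = re.compile(r"[a-zA-Z0-9._-]+")
_MAX_TOPIC_LENGTH = 249  # Kafka's limit on topic name length


def topic_for_service(service_name: str) -> str:
    """Build the topic name for a service using the logs.{service_name} convention.

    Hypothetical helper for illustration; it is not part of Vector or Logwise.
    """
    if not _LEGAL_CHARS.fullmatch(service_name):
        raise ValueError(f"service_name has characters Kafka rejects: {service_name!r}")
    topic = f"logs.{service_name}"
    if len(topic) > _MAX_TOPIC_LENGTH:
        raise ValueError(f"topic name exceeds {_MAX_TOPIC_LENGTH} characters")
    return topic


print(topic_for_service("checkout"))
```

A consumer that knows a service's name can use the same rule to work out which topic to subscribe to.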

New topics are created with 3 partitions, the base count set by the num.partitions configuration.
The partition count can be increased manually if your throughput requires it. Kafka does not support reducing the partition count of an existing topic.
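One common rule of thumb for sizing partitions is to divide the target throughput by the throughput a single partition sustains on the producer side and on the consumer side, then take the larger result. The sketch below assumes that rule. The function name and all throughput figures are hypothetical, so measure your own cluster before relying on the numbers.

```python
import math


def estimate_partitions(
    target_mb_s: float,
    producer_mb_s_per_partition: float,
    consumer_mb_s_per_partition: float,
    current_partitions: int = 3,  # base count from num.partitions
) -> int:
    """Estimate how many partitions a topic needs for a target throughput.

    Hypothetical helper: the per-partition rates must come from your own
    measurements; they are not values documented by Logwise.
    """
    if producer_mb_s_per_partition <= 0 or consumer_mb_s_per_partition <= 0:
        raise ValueError("per-partition throughput must be positive")
    needed = max(
        target_mb_s / producer_mb_s_per_partition,
        target_mb_s / consumer_mb_s_per_partition,
    )
    # Partitions can only be increased, so never go below the current count.
    return max(current_partitions, math.ceil(needed))


# Assumed figures: 50 MB/s target, 10 MB/s per partition when producing,
# 5 MB/s per partition when consuming.
print(estimate_partitions(50, 10, 5))
```

With these assumed figures the consumer side is the bottleneck, so the estimate is driven by the consumer rate rather than the producer rate.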

By default, topics retain messages for 1 hour. Messages older than that are deleted automatically.
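Kafka expresses per-topic retention in milliseconds through the retention.ms topic setting, so 1 hour corresponds to 3600000. The sketch below converts hours to that value and assembles the arguments for Kafka's kafka-configs.sh tool. The helper names, the topic logs.checkout, and the broker address localhost:9092 are assumptions for illustration; how Logwise itself applies retention is described in the setup guide, not here.

```python
def retention_ms(hours: float) -> int:
    """Convert a retention period in hours to a retention.ms value."""
    if hours <= 0:
        raise ValueError("retention must be positive")
    return round(hours * 60 * 60 * 1000)


def retention_command(topic: str, hours: float, bootstrap: str = "localhost:9092") -> list[str]:
    """Build a kafka-configs.sh argument list that sets retention.ms on one topic.

    Hypothetical helper: the topic name and broker address are placeholders.
    """
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", f"retention.ms={retention_ms(hours)}",
    ]


# Default 1 hour retention, then a 24 hour override for one topic.
print(retention_ms(1))
print(" ".join(retention_command("logs.checkout", 24)))
```

The argument list can be passed to subprocess.run on a host where the Kafka command-line tools are installed.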

Increase retention beyond 1 hour for:

See the Kafka Setup Guide for installation and configuration.