# LogWise - Local Development Setup
A complete end-to-end logging system that streams logs from Vector → Kafka → Spark → S3/Athena, with a Spring Boot Orchestrator, Grafana dashboards, and automated cron jobs.
## Architecture
```
┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐
│ Vector  │────▶│  Kafka  │────▶│  Spark  │────▶│   S3    │
│ (Logs)  │     │(Stream) │     │(Process)│     │(Storage)│
└─────────┘     └─────────┘     └─────────┘     └─────────┘
                                                     │
                                                     ▼
                                             ┌─────────────┐
                                             │   Athena    │
                                             │   (Query)   │
                                             └─────────────┘
                                                     │
                                                     ▼
                                             ┌─────────────┐
                                             │   Grafana   │
                                             │ (Dashboard) │
                                             └─────────────┘
```

Components:
- Vector: Log collection and forwarding
- Kafka: Message streaming (KRaft mode)
- Spark 3.1.2: Stream processing and Parquet writing
- S3: Object storage for processed logs
- Athena: Query engine for S3 data
- Grafana: Visualization and dashboards
- Orchestrator: Spring Boot service for job management
- MySQL: Database for orchestrator configuration
## Prerequisites
### Required
- Docker (v20.10+) and Docker Compose (v2.0+)
- Make (for convenience commands)
- AWS Credentials with access to:
  - S3 bucket (read/write)
  - Athena workgroup (query execution)
Note: The `setup.sh` script will automatically install Docker, Make, and other prerequisites if they're missing (on macOS and Debian/Ubuntu Linux). For other systems, install these manually before running setup.
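If you want to confirm the required tooling is already present before running `setup.sh`, a quick manual check (not part of the repository):

```bash
docker --version          # should report 20.10 or newer
docker compose version    # should report v2.0 or newer
make --version
```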
### Optional
- Maven 3.2+ (if building Spark JAR locally)
- Java 11+ (if building Spark JAR locally)
## Mandatory: S3 & Athena Setup (Must Complete First)
Before proceeding with the Docker setup, you MUST complete the S3 & Athena configuration. This is a required prerequisite as the LogWise stack depends on AWS S3 for log storage and Athena for querying.
### Steps to Complete
1. Follow the S3 & Athena Setup Guide (see the CLI sketch below for an AWS CLI alternative) to:
   - Create an S3 bucket with `logs` and `athena-output` folders
   - Create an AWS Glue database
   - Create an Athena workgroup
   - Create the `application-logs` table
2. Note down the following information (you'll need it for the `.env` file):
   - S3 bucket name
   - S3 URI for logs (e.g., `s3://your-bucket-name/logs/`)
   - S3 URI for Athena output (e.g., `s3://your-bucket-name/athena-output/`)
   - Athena workgroup name
   - Athena database name (typically `logs`)
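For those who prefer the command line over the AWS console, here is a minimal sketch of the same resources using the AWS CLI. Bucket, database, and workgroup names are placeholders; the `application-logs` table DDL is defined in the S3 & Athena Setup Guide and is not repeated here.

```bash
# Bucket plus the two prefixes ("folders") LogWise expects
aws s3 mb s3://your-bucket-name --region us-east-1
aws s3api put-object --bucket your-bucket-name --key logs/
aws s3api put-object --bucket your-bucket-name --key athena-output/

# Glue database that Athena will query against
aws glue create-database --database-input Name=logs

# Athena workgroup pointing query results at the athena-output prefix
aws athena create-work-group --name logwise \
  --configuration 'ResultConfiguration={OutputLocation=s3://your-bucket-name/athena-output/}'
```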
Return to this page after completing the S3 & Athena setup to continue with the Docker deployment.
> **Critical**
> Do not proceed with the Docker setup until you have completed the S3 & Athena configuration. The setup will fail without properly configured AWS resources.
## Quick Start
### One-Command Setup
The easiest way to get started is with our one-click setup script:
```bash
cd deploy
./setup.sh
```

This single command will:
- Install prerequisites (Docker, Make, AWS CLI, etc.) if needed
- Create the `.env` file from the template (`.env.example`)
- Prompt you to fill in AWS credentials
- Start all services (Vector, Kafka, Spark, Grafana, Orchestrator, MySQL)
- Wait for services to become healthy
- Create Kafka topics automatically
That's it! Your LogWise stack will be up and running.
## Accessing Services
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin (default) |
| Spark Master UI | http://localhost:18080 | - |
| Spark Worker UI | http://localhost:8081 | - |
| Orchestrator | http://localhost:8080 | - |
| Orchestrator Health | http://localhost:8080/healthcheck | - |
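To verify the stack is actually responding, a couple of quick checks from the host (run `docker compose` commands from the `deploy` directory):

```bash
# Orchestrator liveness
curl -s http://localhost:8080/healthcheck

# Container status and health, as reported by Docker Compose
docker compose ps
```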
## Configuration Details
The `.env` file contains all configuration for the LogWise stack. When you run `setup.sh`, it automatically creates this file from `.env.example`. Here are the key configuration sections:
### AWS Configuration (Required)
```bash
AWS_REGION=us-east-1                  # AWS region for S3 and Athena
AWS_ACCESS_KEY_ID=your-access-key     # AWS access key ID
AWS_SECRET_ACCESS_KEY=your-secret-key # AWS secret access key
AWS_SESSION_TOKEN=                    # Optional: for temporary credentials
```

### S3 Configuration (Required)
```bash
S3_BUCKET_NAME=your-bucket-name       # S3 bucket for storing processed logs
S3_PREFIX=logs/                       # Prefix/path within the bucket
```

### Athena Configuration (Required)
```bash
S3_ATHENA_OUTPUT=s3://bucket/athena-output/ # S3 path for Athena query results
ATHENA_WORKGROUP=primary                    # Athena workgroup name
ATHENA_CATALOG=AwsDataCatalog               # Athena data catalog
ATHENA_DATABASE=logwise                     # Athena database name
```

### Kafka Configuration
```bash
KAFKA_BROKERS=kafka:9092                # Kafka broker address (default for Docker)
KAFKA_TOPIC=logs                        # Kafka topic name for logs
KAFKA_CLUSTER_ID=9ZkYwXlQ2Tq8rBn5JcH0xA # Kafka cluster ID (KRaft mode)
```
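`setup.sh` creates the topic automatically, but if you need to inspect or recreate it by hand, a sketch against the containerized broker follows; the exact script name and path depend on the Kafka image used in `docker-compose.yml`:

```bash
# List topics on the broker (run from the deploy directory)
docker compose exec kafka kafka-topics.sh --bootstrap-server localhost:9092 --list

# Recreate the logs topic if it is missing (partition/replication values are illustrative)
docker compose exec kafka kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic logs --partitions 1 --replication-factor 1
```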
### Spark Configuration

```bash
SPARK_MASTER_URL=spark://spark-master:7077   # Spark master URL
SPARK_STREAMING=true                         # Enable Spark streaming
SPARK_MASTER_UI_PORT=18080                   # Spark Master UI port
SPARK_VERSION_MATCH=3.1.2                    # Spark version
HADOOP_AWS_VERSION=3.2.0                     # Hadoop AWS library version
AWS_SDK_VERSION=1.11.375                     # AWS SDK version
MAIN_CLASS=com.logwise.spark.MainApplication # Spark application main class
```
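For orientation, these values map roughly onto a manual `spark-submit` invocation. In this stack the job submission is handled by the containers and the Orchestrator, so treat the following only as an illustration of how the variables fit together; the JAR path is a placeholder, and the custom image already bundles the listed dependencies, which makes `--packages` largely redundant:

```bash
# Illustrative only -- the stack normally submits the job for you
docker compose exec spark-client spark-submit \
  --master spark://spark-master:7077 \
  --class com.logwise.spark.MainApplication \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk-bundle:1.11.375 \
  /path/to/logwise-spark.jar
```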
### Database Configuration

```bash
MYSQL_DATABASE=myapp          # MySQL database name
MYSQL_USER=myapp              # MySQL user
MYSQL_PASSWORD=myapp_pass     # MySQL password
MYSQL_ROOT_PASSWORD=root_pass # MySQL root password
```

### Other Configuration
```bash
ORCH_PORT=8080        # Orchestrator service port
TENANT_VALUE=ABC      # Tenant identifier
```

For a complete list of all environment variables, see `.env.example` in the `deploy` directory.
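If you want to sanity-check the required variables before starting the stack, a small hypothetical helper (not part of the repository; assumes it is run from the repository root):

```bash
#!/usr/bin/env bash
# Hypothetical check: report any required variable that is empty in deploy/.env
set -a; source deploy/.env; set +a

for var in AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
           S3_BUCKET_NAME S3_ATHENA_OUTPUT ATHENA_WORKGROUP ATHENA_DATABASE; do
  if [ -z "${!var}" ]; then
    echo "Missing required variable: $var" >&2
  fi
done
```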
## Common Commands
```bash
# Start all services
make up

# Stop all services
make down

# View logs
make logs

# Check service status
make ps

# Stop and remove volumes
make teardown

# Reset Kafka (fix cluster ID issues)
make reset-kafka
```

## Troubleshooting
### Spark Worker Not Accepting Resources

Symptom: `WARN Master: App requires more resource than any of Workers could have`
Solution:
- Check worker memory:

  ```bash
  docker compose logs spark-worker | grep "Starting Spark worker"
  ```

- Ensure the worker has enough memory. The worker needs:
  - Memory for driver + executor + overhead
  - Default: 512m driver + 512m executor = ~1GB minimum
- Adjust the memory settings in `.env`:

  ```bash
  SPARK_DRIVER_MEMORY=400m
  SPARK_EXECUTOR_MEMORY=400m
  ```

- Or increase the worker memory limit in `docker-compose.yml`:

  ```yaml
  spark-worker:
    mem_limit: 3g
  ```
### ClassNotFoundException for S3 or Kafka

Symptom: `java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem`
Solution:
- The custom Spark Dockerfile includes the required JARs:
  - `hadoop-aws-3.2.0.jar`
  - `aws-java-sdk-bundle-1.11.375.jar`
  - `spark-sql-kafka-0-10_2.12-3.1.2.jar`
  - `kafka-clients-2.6.0.jar`
- Rebuild the Spark image:

  ```bash
  docker compose build spark-worker spark-master spark-client
  ```
### AWS Access Denied (403 Forbidden)

Symptom: `AccessDeniedException: 403 Forbidden`
Solution:
- Verify the AWS credentials in `.env`:

  ```bash
  AWS_ACCESS_KEY_ID=your-key
  AWS_SECRET_ACCESS_KEY=your-secret
  AWS_SESSION_TOKEN=your-token   # If using temporary credentials
  AWS_REGION=us-east-1
  ```

- Ensure IAM permissions include:
  - `s3:GetObject`, `s3:PutObject`, `s3:ListBucket` on the target bucket
  - `athena:StartQueryExecution`, `athena:GetQueryResults` (if using Athena)
- Restart the Spark client:

  ```bash
  docker compose restart spark-client
  ```
### Port Conflicts

Symptom: `Error: bind: address already in use`
Solution:
- Change the conflicting ports in `.env`:

  ```bash
  GRAFANA_PORT=3001
  ORCH_PORT=8081
  ```
### Kafka Cluster ID Mismatch
Symptom: Cluster ID mismatch errors
Solution:
```bash
make reset-kafka
make up
```

### Disk Space Issues

Symptom: `no space left on device`
Solution:
```bash
# Clean up Docker
docker system prune -a --volumes

# Remove unused images
docker image prune -a
```

### Spark Worker Not Registering
Symptom: Worker fails to connect to master
Solution:
- Check network connectivity:

  ```bash
  docker compose exec spark-worker curl http://spark-master:8080
  ```

- Verify the master is running:

  ```bash
  docker compose logs spark-master | grep "Successfully started service"
  ```

- Check the worker logs:

  ```bash
  docker compose logs spark-worker | grep -i "error\|exception"
  ```
## Project Structure
```
logwise/
├── deploy/
│   ├── docker-compose.yml         # Main orchestration file
│   ├── Makefile                   # Convenience commands
│   ├── setup.sh                   # One-click setup script
│   ├── grafana/provisioning/      # Grafana dashboards & datasources
│   └── healthcheck-dummy/
│       └── Dockerfile             # Healthcheck test service
├── vector/
│   ├── vector.yaml                # Vector configuration
│   └── logwise-vector.desc        # Protobuf descriptor
├── spark/
│   └── docker/Dockerfile          # Spark container image
└── orchestrator/
    ├── docker/Dockerfile          # Orchestrator container image
    └── db/init/                   # Database initialization scripts
```

## Security Notes
- Never commit the `.env` file - it contains sensitive AWS credentials
- Use IAM roles in production instead of access keys
- Enable TLS/SSL for production deployments
- Restrict network access to services in production
Happy Logging!
