S3 & Athena — Self-Hosted Setup
Follow these steps to set up Amazon S3 and Athena for LogWise to store and query your application logs.
Prerequisites
- An AWS account with appropriate permissions to create S3 buckets, Glue databases/tables, and Athena workgroups
- Access to AWS Console (or AWS CLI configured)
1) Create S3 Bucket and Folders
- Go to Amazon S3 in the AWS Console
- Create a new S3 bucket (or use an existing one)
- Inside the bucket, create two folders:
  - `logs` - for storing log data
  - `athena-output` - for Athena query results
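If you prefer the AWS CLI, the same bucket layout can be sketched as follows; `your-bucket-name` is a placeholder, and `us-east-1` is an assumed region:

```shell
# Create the bucket (us-east-1 shown; other regions also require
# --create-bucket-configuration LocationConstraint=<region>)
aws s3api create-bucket --bucket your-bucket-name --region us-east-1

# S3 has no real folders; zero-byte objects with a trailing slash
# act as the folder placeholders shown in the console
aws s3api put-object --bucket your-bucket-name --key logs/
aws s3api put-object --bucket your-bucket-name --key athena-output/
```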
2) Copy S3 URI of Logs Directory
- Navigate to the `logs` folder you just created
- Copy the S3 URI (e.g., `s3://your-bucket-name/logs/`)
- You'll need this URI in the next steps
3) Create AWS Glue Database
- Go to AWS Glue Service → Data Catalog → Databases
- Click Create database
- Set the database name to: `logs`
- Set the location to the S3 URI you copied in step 2 (e.g., `s3://your-bucket-name/logs/`)
- Click Create database
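The same database can be created from the CLI; replace the `LocationUri` with the S3 URI you copied in step 2:

```shell
# Create the Glue database that will hold the logs table
aws glue create-database --database-input '{
  "Name": "logs",
  "LocationUri": "s3://your-bucket-name/logs/"
}'
```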
4) Create Athena Workgroup
- Go to Amazon Athena → Workgroups
- Click Create workgroup
- Configure the workgroup:
  - Set the output location to your S3 bucket's `athena-output` folder (e.g., `s3://your-bucket-name/athena-output/`)
- Complete the workgroup creation
TIP
The workgroup's output location stores query results. Make sure you have appropriate IAM permissions for Athena to write to this location.
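A CLI equivalent of this step is sketched below; `logwise` is an example workgroup name, not one prescribed by this guide:

```shell
# Create the workgroup and point query results at the
# athena-output folder from step 1
aws athena create-work-group \
  --name logwise \
  --configuration 'ResultConfiguration={OutputLocation=s3://your-bucket-name/athena-output/}'
```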
5) Create Table
- Go to Amazon Athena → Query editor
- In the query editor, select the workgroup you created in step 4 from the workgroup dropdown
- Run the following DDL to create the table:
```sql
CREATE EXTERNAL TABLE `application-logs`(
  `hostname` string,
  `message` string,
  `source_type` string,
  `timestamp` string)
PARTITIONED BY (
  `service_name` string,
  `time` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://your-bucket-name/logs'
TBLPROPERTIES (
  'compressionType'='gzip',
  'parquet.ignore.statistics'='true',
  'projection.enabled'='true',
  'projection.service_name.type'='injected',
  'projection.time.format'='\'year=\'yyyy\'/month=\'MM\'/day=\'dd\'/hour=\'HH\'/minute=\'mm',
  'projection.time.interval'='1',
  'projection.time.interval.unit'='MINUTES',
  'projection.time.range'='NOW-1YEARS,NOW',
  'projection.time.type'='date',
  'storage.location.template'='s3://your-bucket-name/logs/service_name=${service_name}/${time}')
```

Important
The table schema includes partition keys (`service_name`, `time`), which are essential for efficient querying in Athena. Ensure your log data in S3 is organized with these partitions in the path structure.
Before running the query, replace `s3://your-bucket-name/logs` with your actual S3 URI from step 2 in both the `LOCATION` clause and the `storage.location.template` property.
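Once the table exists, queries can be run from the query editor or the CLI. Note that because `projection.service_name.type` is `injected`, Athena requires every query to pin `service_name` with an equality predicate. A minimal sketch, assuming the `logwise` workgroup name from step 4 and an example service called `my-service`:

```shell
# service_name must be supplied explicitly (injected projection);
# the double-quoted identifiers are needed because of the hyphen
# in the table name and the reserved word "timestamp"
aws athena start-query-execution \
  --work-group logwise \
  --query-execution-context Database=logs \
  --query-string "SELECT \"timestamp\", hostname, message
                  FROM \"application-logs\"
                  WHERE service_name = 'my-service'
                  LIMIT 100"
```

`start-query-execution` returns a query execution ID; results land in the workgroup's `athena-output` location and can be fetched with `aws athena get-query-results`.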
