The project focuses on running the pgBadger log analyser for the PostgreSQL DB as a daily scheduled job using AWS Data Pipeline
Project Details
Since the company worked with a lot of data, such as ship locations, weather, and ship information, it was all stored in the RDS database. The engineering team was running the pgBadger log analyser manually on an EC2 instance. The project was to eliminate the disadvantages of the existing method and make it more efficient.
Existing Method
The reports were generated manually on the EC2 instance where the PostgreSQL DB was running. This was cumbersome and had a lot of disadvantages. The steps followed were:
- SSH into the EC2 instance
- Run pgBadger against the PostgreSQL DB
- Archive the generated report
- SCP the report to the local machine
- Send the report to the engineering team
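Roughly, a manual run looked something like the sketch below; the hostname, log paths, and pgBadger options are placeholders rather than the actual environment:

```bash
# Manual workflow sketch -- hostname, paths, and options are placeholders/assumptions.
ssh ec2-user@<postgres-ec2-host>                  # 1. SSH into the EC2 instance

# 2. On the instance: run pgBadger against the PostgreSQL logs
pgbadger -o report.html /var/log/postgresql/*.log

# 3. Archive the generated report
tar -czf report.tar.gz report.html
exit

# 4. Back on the local machine: SCP the report down, then mail it to the team
scp ec2-user@<postgres-ec2-host>:~/report.tar.gz .
```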
Risks & Disadvantages
- EC2 Instance runs 24x7x365 costing more money than required
- Entire process is manual and time-consuming
- Relies on a developer's mailbox to send reports
- Reports need to be downloaded from the mailbox to view
- Reports are sent at different times every day/week
- No Log Archival
Project Requirements
- AWS Data Pipeline
- S3 Buckets
  - Log Archival Bucket
  - Static Hosting for Reports
- Bash Script
- Terraform
New Method
The project was approached in 2 steps.
First Step
A Bash script ran nightly to archive the day's logs from the PostgreSQL EC2 instance and upload them to the S3 bucket. The upload went through AWS PrivateLink by creating VPC endpoints, since the EC2 instances running the PostgreSQL DB sat in a private subnet and the data transfer cost was minimal when PrivateLink was used.
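A minimal sketch of what that nightly archival script could look like, assuming a standard PostgreSQL log directory and a hypothetical bucket name (neither is taken from the actual project):

```bash
#!/usr/bin/env bash
# Nightly log archival sketch -- log path and bucket name are assumptions.
set -euo pipefail

LOG_DIR="/var/log/postgresql"                    # assumed PostgreSQL log directory
ARCHIVE_BUCKET="s3://example-pg-log-archive"     # hypothetical archival bucket
DAY="$(date -d 'yesterday' +%F)"                 # archive the previous day's logs

ARCHIVE="/tmp/pg-logs-${DAY}.tar.gz"
tar -czf "${ARCHIVE}" -C "${LOG_DIR}" .          # compress the day's log files

# With the S3 VPC endpoint in place, this copy stays on the AWS network.
aws s3 cp "${ARCHIVE}" "${ARCHIVE_BUCKET}/${DAY}/"

rm -f "${ARCHIVE}"
```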
Second Step
AWS Data Pipeline was used to run the pgBadger job every day at a set time. The pipeline spins up a Spot Instance in the private subnet to run the tool. Once the EC2 instance is ready, a Bash script downloads the required log file from the S3 bucket and runs pgBadger. After the tool completes successfully, the reports are uploaded to the S3 bucket directly from the EC2 instance. That bucket was already set up as a static hosting website, so the reports were available to anyone on the VPN.
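Sketched below is what the pipeline's shell activity might run on the Spot Instance; the bucket names, archive layout, and pgBadger flags are assumptions for illustration:

```bash
#!/usr/bin/env bash
# Data Pipeline job sketch -- bucket names and file layout are assumptions.
set -euo pipefail

ARCHIVE_BUCKET="s3://example-pg-log-archive"     # hypothetical archival bucket
REPORT_BUCKET="s3://example-pgbadger-reports"    # hypothetical static-hosting bucket
DAY="$(date -d 'yesterday' +%F)"

WORKDIR="$(mktemp -d)"
cd "${WORKDIR}"

# Pull down the day's archived logs and unpack them
aws s3 cp "${ARCHIVE_BUCKET}/${DAY}/pg-logs-${DAY}.tar.gz" .
tar -xzf "pg-logs-${DAY}.tar.gz"

# Generate the HTML report with pgBadger
pgbadger -o "report-${DAY}.html" ./*.log

# Publish the report straight to the static-hosting bucket
aws s3 cp "report-${DAY}.html" "${REPORT_BUCKET}/${DAY}/report.html"
```

Because the report bucket is configured for static website hosting, the report is viewable in a browser as soon as the final copy finishes.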
At the start of the project there was no coding beyond the Bash scripts. However, to spice things up, I started using Terraform as my Infrastructure as Code to create the AWS Data Pipeline and S3 buckets. This was later migrated to AWS CloudFormation to keep things consistent across the company.