The project focuses on running the pgBadger log analyser for the PostgreSQL DB as a daily scheduled job using AWS Data Pipeline
Project Details
Since the company worked with a lot of data, such as ship locations, weather, and ship information, it was all stored in the RDS database. The engineering team was running the pgBadger log analyser manually on an EC2 instance. The project was to eliminate the disadvantages of the existing method and make it more efficient.
Existing Method
The reports were generated manually on the EC2 instance where the PostgreSQL DB was running. This was cumbersome and had a lot of disadvantages. The steps followed were:
- SSH into the EC2 instance
- Run pgBadger against the PostgreSQL DB
- Archive the generated report
- SCP the report to the local machine
- Send the report to the engineering team
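Roughly, a manual run looked something like the sketch below; the hostname, log paths, and pgBadger options are placeholders rather than the actual environment:

```bash
# Manual workflow sketch -- hostname, paths, and options are placeholders/assumptions.
ssh ec2-user@<postgres-ec2-host>                  # 1. SSH into the EC2 instance

# 2. On the instance: run pgBadger against the PostgreSQL logs
pgbadger -o report.html /var/log/postgresql/*.log

# 3. Archive the generated report
tar -czf report.tar.gz report.html
exit

# 4. Back on the local machine: SCP the report down, then mail it to the team
scp ec2-user@<postgres-ec2-host>:~/report.tar.gz .
```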
Risks & Disadvantages
- EC2 Instance runs 24x7x365 costing more money than required
- Entire process is manual and time-consuming
- Relies on a developer's mailbox to send reports
- Reports need to be downloaded from the mailbox to view
- Reports are sent at different times every day/week
- No Log Archival
Project Requirements
- AWS Data Pipeline
- S3 Buckets
  - Log Archival Bucket
  - Static Hosting for Reports
- Bash Script
- Terraform
New Method
The project was approached in 2 steps.
First Step
A Bash script ran nightly to archive the day's logs from the PostgreSQL EC2 instance and upload them to the S3 bucket. The upload went through AWS PrivateLink by creating VPC endpoints, since the EC2 instances running the PostgreSQL DB sat in a private subnet and the data transfer cost was minimal when PrivateLink was used.
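A minimal sketch of what that nightly archival script could look like, assuming a standard PostgreSQL log directory and a hypothetical bucket name (neither is taken from the actual project):

```bash
#!/usr/bin/env bash
# Nightly log archival sketch -- log path and bucket name are assumptions.
set -euo pipefail

LOG_DIR="/var/log/postgresql"                    # assumed PostgreSQL log directory
ARCHIVE_BUCKET="s3://example-pg-log-archive"     # hypothetical archival bucket
DAY="$(date -d 'yesterday' +%F)"                 # archive the previous day's logs

ARCHIVE="/tmp/pg-logs-${DAY}.tar.gz"
tar -czf "${ARCHIVE}" -C "${LOG_DIR}" .          # compress the day's log files

# With the S3 VPC endpoint in place, this copy stays on the AWS network.
aws s3 cp "${ARCHIVE}" "${ARCHIVE_BUCKET}/${DAY}/"

rm -f "${ARCHIVE}"
```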
Second Step
AWS Data Pipeline was used to run the pgBadger job every day at a set time. The pipeline spins up a Spot Instance in the private subnet to run the tool. Once the EC2 instance is ready, a Bash script downloads the required log file from the S3 bucket and runs pgBadger. After the tool completes successfully, the reports are uploaded to the S3 bucket directly from the EC2 instance. That bucket was already set up as a static hosting website, so the reports were available to anyone on the VPN.
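Sketched below is what the pipeline's shell activity might run on the Spot Instance; the bucket names, archive layout, and pgBadger flags are assumptions for illustration:

```bash
#!/usr/bin/env bash
# Data Pipeline job sketch -- bucket names and file layout are assumptions.
set -euo pipefail

ARCHIVE_BUCKET="s3://example-pg-log-archive"     # hypothetical archival bucket
REPORT_BUCKET="s3://example-pgbadger-reports"    # hypothetical static-hosting bucket
DAY="$(date -d 'yesterday' +%F)"

WORKDIR="$(mktemp -d)"
cd "${WORKDIR}"

# Pull down the day's archived logs and unpack them
aws s3 cp "${ARCHIVE_BUCKET}/${DAY}/pg-logs-${DAY}.tar.gz" .
tar -xzf "pg-logs-${DAY}.tar.gz"

# Generate the HTML report with pgBadger
pgbadger -o "report-${DAY}.html" ./*.log

# Publish the report straight to the static-hosting bucket
aws s3 cp "report-${DAY}.html" "${REPORT_BUCKET}/${DAY}/report.html"
```

Because the report bucket is configured for static website hosting, the report is viewable in a browser as soon as the final copy finishes.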
At the start of the project there was no coding beyond the Bash scripts. However, to spice things up, I started using Terraform as my Infrastructure as Code to create the AWS Data Pipeline and S3 buckets. This was later migrated to AWS CloudFormation to keep things consistent across the company.