Why we built it and what problem it actually solves
Cloud infrastructure is easy to set up and dangerously easy to misconfigure. An S3 bucket left public, an IAM role with overly broad permissions, a security group with port 22 open to the world — these aren’t hypothetical risks. They’re the actual vulnerabilities that lead to breaches, and they happen because infrastructure grows faster than the team’s ability to audit it.
We built Infra as an internal tool first — something we needed ourselves. Managing multiple client environments means dozens of cloud resources across different accounts, and manually checking configurations doesn’t scale. The goal was simple: a system that continuously scans infrastructure, flags what’s wrong, and tells us about it before it becomes a problem.
“The question wasn’t whether we had misconfigurations — we knew we did. The question was how fast we could find and fix them before they mattered.”
— Internal engineering team
The architecture: serverless by necessity, not by trend
We chose serverless (AWS Lambda) for a specific reason: the scanner doesn’t need to run 24/7. It needs to run frequently, finish quickly, and cost almost nothing when it’s idle. A traditional server running scheduled cron jobs would work, but we’d be paying for a machine that sat idle 95% of the time.
- Scheduled Lambda functions that scan specific resource types on a configurable cadence — IAM policies every hour, security groups every 30 minutes, storage configurations daily. Each scan is a discrete function with a focused scope.
- A rules engine that evaluates scan results against security benchmarks (CIS, AWS best practices) and custom rules specific to our environments. Severity levels are assigned automatically: critical, high, medium, informational.
- Real-time alerting via Slack and email. Critical findings trigger immediately. Everything else aggregates into a daily digest. The goal is signal, not noise — alert fatigue defeats the purpose.
- A reporting dashboard that shows trends over time: are we getting better or worse? Which resource types have the most recurring issues? Where should we invest in automation or training?
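The rules engine and alert routing described above can be sketched roughly as follows. This is a minimal illustration, not the actual Infra implementation: the `Rule` structure, the example rule names, and the critical-vs-digest split are our own assumptions about how such a pipeline fits together.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    severity: str                  # "critical" | "high" | "medium" | "informational"
    check: Callable[[dict], bool]  # returns True when the resource violates the rule

# Two illustrative rules in the spirit of the CIS / AWS best-practice checks.
RULES = [
    Rule("public-storage-bucket", "critical",
         lambda r: r.get("type") == "s3" and r.get("public_access", False)),
    Rule("wildcard-iam-action", "high",
         lambda r: r.get("type") == "iam_role" and "*" in r.get("actions", [])),
]

def evaluate(resources: list[dict]) -> dict:
    """Evaluate scan results against the rules and route findings:
    critical ones go out immediately, everything else joins the daily digest."""
    immediate, digest = [], []
    for resource in resources:
        for rule in RULES:
            if rule.check(resource):
                finding = {"resource": resource["id"],
                           "rule": rule.name,
                           "severity": rule.severity}
                (immediate if rule.severity == "critical" else digest).append(finding)
    return {"immediate": immediate, "digest": digest}
```

In the real system each scheduled Lambda would feed its scan results into something like `evaluate`, with the `immediate` list pushed to Slack and the `digest` list aggregated for the daily email.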
What it catches (and what it doesn’t)
The scanner is effective at catching configuration drift — things that were set up correctly but changed over time, or resources provisioned quickly during an incident that were never hardened afterward. Common findings:
- Storage resources with public access policies (intentional or accidental)
- IAM roles with wildcard permissions that should have been scoped
- Security groups allowing ingress from 0.0.0.0/0 on sensitive ports
- Resources missing encryption at rest or in transit
- Outdated runtimes on Lambda functions or container images
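One of the checks above, flagging security groups that allow ingress from 0.0.0.0/0 on sensitive ports, can be sketched as a pure function. The input shape mirrors what boto3’s `describe_security_groups` returns; the function name and the `SENSITIVE_PORTS` set are illustrative assumptions, not the scanner’s actual code.

```python
SENSITIVE_PORTS = {22, 3389, 3306, 5432}  # SSH, RDP, MySQL, PostgreSQL

def open_to_world(security_group: dict) -> list[int]:
    """Return the sensitive ports this group exposes to the whole internet."""
    exposed = []
    for perm in security_group.get("IpPermissions", []):
        # Does any CIDR range in this permission cover the entire internet?
        world = any(r.get("CidrIp") == "0.0.0.0/0"
                    for r in perm.get("IpRanges", []))
        if not world:
            continue
        lo, hi = perm.get("FromPort"), perm.get("ToPort")
        if lo is None:
            # IpProtocol "-1" (all traffic) omits the port range entirely.
            exposed.extend(sorted(SENSITIVE_PORTS))
        else:
            exposed.extend(p for p in sorted(SENSITIVE_PORTS) if lo <= p <= hi)
    return exposed
```

A scheduled scan would call this for each group returned by the EC2 API and hand any non-empty result to the rules engine as a finding.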
What it doesn’t do: application-level security testing. It’s not a penetration testing tool. It’s a hygiene tool — it makes sure the foundation is solid so that the application-level work has something secure to build on.
What building this taught us
The biggest insight wasn’t technical — it was behavioural. Having automated scans changed how our team thinks about provisioning. When you know a scanner will flag a misconfiguration within 30 minutes, you’re more careful during setup. The scanner didn’t just find problems — it prevented them, because the feedback loop was tight enough to change habits.
“Before Infra, we did quarterly manual audits. By the time we found an issue, it had been sitting there for months. Now we know within the hour.”
— Bhoja Solutions engineering
Should you build or buy?
For most businesses, a managed tool like AWS Config Rules, Prowler, or ScoutSuite will cover 80% of what you need. We built our own because we needed custom rules, multi-account scanning, and integration with our specific alerting workflow. If your needs are simpler, start with an open-source scanner and customise from there.
The important thing isn’t which tool — it’s that you’re scanning continuously. A quarterly audit is an archaeology project. Continuous scanning is prevention.