Defense in Depth for AI-Assisted Development

Pre-commit hooks, review agents, and CI that catch LLM mistakes

Brooks McMillin | Infrastructure Security

Things I've caught Claude doing in the past month

LLMs optimize for task completion,
not operational safety.

The solution isn't to stop using them — it's to build defensive layers that catch dangerous code before it ships.

Three layers, each catching what the others miss

| Layer            | What It Catches                  | When           | Bypassable?     |
|------------------|----------------------------------|----------------|-----------------|
| Pre-commit hooks | Patterns, secrets, type errors   | Before commit  | --no-verify     |
| Review agents    | Logic errors, context mismatches | On PR creation | Ignore comments |
| CI workflows     | Everything above + integration   | Before merge   | No ✓            |
01

Pre-commit Hooks

The first line of defense — block problems before they reach git

The pre-commit stack

# .pre-commit-config.yaml
repos:
  # Secret detection
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0  # pin to a current release
    hooks:
      - id: detect-secrets

  # Linting + formatting
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9  # pin to a current release
    hooks:
      - id: ruff
      - id: ruff-format

  # Type checking (local hook so it runs in the project environment)
  - repo: local
    hooks:
      - id: pyright
        name: pyright
        entry: uv run pyright
        language: system
        types: [python]
        pass_filenames: false

  # Security scanning
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.10  # pin to a current release
    hooks:
      - id: bandit

🔑 detect-secrets

Blocks commits containing credential patterns. Catches API keys LLMs paste directly into code.

📐 pyright

Enforces strict typing. LLMs are sloppy with types — this catches inconsistencies early.
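
A minimal illustration (mine, not from the repo) of the kind of type drift strict mode flags:

# An LLM declares "-> str" but quietly returns None on one branch.
def get_user_email(users: dict[int, str], user_id: int) -> str:
    if user_id not in users:
        return None  # pyright (strict): "None" is not assignable to "str"
    return users[user_id]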

🛡️ bandit

Flags insecure crypto, SQL injection risks, hardcoded passwords, eval() usage.

Live Demo

Catching LLM mistakes before commit

Demo 1: Secret Detection

Commit a file with a hardcoded API key

# LLM-generated config.py
API_KEY = "sk-proj-abc123..."
DB_PASSWORD = "prod_admin_2024"

→ detect-secrets blocks the commit

Demo 2: Security Anti-pattern

Commit code using eval() for config parsing

# LLM's "quick fix" for parsing
config = eval(user_input)

→ bandit flags B307: eval() usage

Pre-commit hooks add ~30 seconds per commit. Easiest win you can adopt today.
02

Code Review Agents

Logic analysis that static tools can't do

Local agents: specialized and parallel

🔍 Code Optimization Agent

Performance bottlenecks, unnecessary complexity

🧪 Test Coverage Agent

Missing test cases, edge case gaps

📦 Dependency Agent

Outdated packages, license issues

🛡️ Security Agent

Auth gaps, injection risks, data exposure

Why specialize?

LLM performance degrades as context gets polluted with unrelated concerns. Each agent has a focused context window.

All agents are read-only — they report findings, humans decide what to fix.

~250k tokens per run • ~$4-5 cost • Run weekly
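
A sketch of the dispatch loop, assuming a hypothetical run_agent() wrapper around your LLM client (illustrative names, not the actual implementation):

import asyncio

# One focused prompt per agent keeps each context window clean.
AGENT_PROMPTS = {
    "optimization": "Find performance bottlenecks and unnecessary complexity.",
    "tests": "Find missing test cases and edge-case gaps.",
    "dependencies": "Find outdated packages and license issues.",
    "security": "Find auth gaps, injection risks, and data exposure.",
}

async def run_agent(name: str, prompt: str, diff: str) -> str:
    # Placeholder: call your LLM client here with a focused, read-only
    # prompt. Agents return findings; they never modify the repo.
    return f"[{name}] findings for a {len(diff)}-char diff"

async def review(diff: str) -> dict[str, str]:
    # Each specialist runs in parallel over the same code, with a
    # context window that contains only its own concern.
    findings = await asyncio.gather(
        *(run_agent(name, p, diff) for name, p in AGENT_PROMPTS.items())
    )
    return dict(zip(AGENT_PROMPTS, findings))

print(asyncio.run(review("example diff")))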

CI review agents: every PR gets reviewed

# claude-code-review.yml
- name: Run Claude Code Review
  uses: anthropics/claude-code-action@v1
  with:
    prompt: |
      Review this PR for:
      - Code quality and best practices
      - Potential bugs or issues
      - Performance considerations
      - Security concerns
      - Test coverage
      Use CLAUDE.md for guidance.

Two agents per PR

General review — code quality, bugs, tests

Security review — auth, injection, data exposure

Key catch: File truncation

The CI agent regularly finds files that Claude truncated or emptied during local editing sessions. Full rewrites from memory, output token limits, context pollution — all cause data loss.

Example: PR #159

Real catches: logic flaws static analysis misses

PR #160 Open Redirect

Redirect validation using startswith() is bypassable:

# Bypassable!
if not url.startswith(settings.frontend_url):

//evil.com and localhost.evil.com both pass
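
One possible fix, sketched here (not necessarily the patch that landed): parse the URL and compare the origin exactly:

from urllib.parse import urlparse

def is_safe_redirect(url: str, frontend_url: str) -> bool:
    target = urlparse(url)
    allowed = urlparse(frontend_url)
    # Relative paths like "/dashboard" are fine; "//evil.com" is not,
    # because urlparse reads it as a network location.
    if not target.scheme and not target.netloc:
        return True
    # Exact scheme + host comparison defeats "localhost.evil.com"-style
    # suffix tricks that startswith() lets through.
    return (target.scheme, target.netloc) == (allowed.scheme, allowed.netloc)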

PR #141 Race Condition

TOCTOU in single-use registration codes:

await validate(code)    # Check
# ... user creation ...
await use_code(code)    # Too late!

10 concurrent requests → 10 users on a max_uses=1 code
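
A common fix pattern, sketched with SQLAlchemy-style names that are assumptions, not the project's code: collapse check and consume into one atomic statement:

from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

async def consume_code(session: AsyncSession, code: str) -> None:
    # Check-and-consume in a single atomic UPDATE: the WHERE clause
    # only matches while uses remain, so concurrent requests can't
    # all pass the way separate validate()/use_code() calls can.
    result = await session.execute(
        text(
            "UPDATE registration_codes"
            " SET uses = uses + 1"
            " WHERE code = :code AND uses < max_uses"
        ),
        {"code": code},
    )
    if result.rowcount == 0:
        raise ValueError("registration code invalid or exhausted")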

PR #152 Incomplete Auth

IDOR-adjacent: updating a todo with an unauthorized project_id:

# Sets project_name to None but doesn't
# REJECT the request — silent failure

User associates todo with unauthorized project
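
The fail-closed shape of the fix, sketched with hypothetical helper names:

from fastapi import HTTPException

async def resolve_project(project_id: int, current_user, repo):
    # Fail closed: if the user can't access the project, reject the
    # request instead of silently nulling the association.
    project = await repo.get_project_for_user(project_id, current_user.id)  # hypothetical helper
    if project is None:
        raise HTTPException(status_code=403, detail="Project not accessible")
    return project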

The pattern: review agents catch logic flaws (wrong validation, races, incomplete auth) — not syntactic issues. That's the key differentiator from static analysis.
03

CI Workflows

The enforced gate that can't be bypassed

Multi-tool security scanning

| Tool               | What It Catches               | Example                                  |
|--------------------|-------------------------------|------------------------------------------|
| bandit             | Python-specific anti-patterns | eval(), weak crypto, hardcoded passwords |
| safety / pip-audit | Known CVEs in dependencies    | Vulnerable Flask versions                |
| semgrep            | Semantic patterns             | SQL injection via f-strings              |
| CodeQL             | Advanced taint analysis       | User input → os.system()                 |
| trivy              | Container/filesystem vulns    | Secrets in Docker layers                 |

LLMs might suggest old library versions, deprecated crypto, or unsafe SQL. Multiple specialized tools provide comprehensive coverage — no single tool catches everything.
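
For example, the f-string pattern that semgrep-style rules flag, next to the parameterized form (illustrative sqlite3 snippet, not from the repo):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
user_id = "1 OR 1=1"  # attacker-controlled input

# Flagged: input interpolated straight into the query.
# "1 OR 1=1" makes this match every row in the table.
conn.execute(f"SELECT * FROM users WHERE id = {user_id}")

# Safe: parameter binding treats the input as data, not SQL.
conn.execute("SELECT * FROM users WHERE id = ?", (user_id,))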

CLAUDE.md: prevention over detection

Lives in the repo root. Teaches LLMs your patterns before they generate code.

  • Quick commands — prevents LLM from inventing commands
  • Architecture overview — enforces consistency
  • Security requirements — explicit guardrails
  • Common tasks — provides working examples
  • Anti-patterns — "never do this" list
Takes 30 min to write. Saves hours of LLM output correction.

An excerpt:
## Response Formats

### Success:
{
  "data": [...],
  "meta": { "count": 10 }
}

### Errors:
{
  "detail": {
    "code": "AUTH_001",
    "message": "Invalid credentials",
    "details": { ... }
  }
}

## Anti-patterns
NEVER use f-strings for SQL queries
NEVER commit .env files
NEVER use pip directly (use uv)
NEVER use Optional[str] (use str | None)

The cost is trivial. The risk isn't.

What it costs

  • Pre-commit hooks: ~30s/commit
  • CI pipeline: ~3-5 min/PR
  • Review agents (CI): $10-30/mo
  • Local agents (weekly): ~$4-5/run
  • Security tools: free / OSS

What you avoid

  • Credential leak incident response: $$$$$
  • Production auth bypass: $$$$$
  • Manual review of LLM output: 10x slower
  • Truncated/lost code in prod: 💀

Adoption path: start today

Start here

Pre-commit hooks

Easiest win. 15 min setup. Catches secrets and anti-patterns immediately.

Highest ROI

Security review agents

Catches logic flaws static analysis can't. claude-code-action in GitHub Actions.

Full coverage

Comprehensive CI

Multi-tool scanning. Reusable workflows. The enforced gate.

All patterns are running in my open-source repos:

github.com/brooksmcmillin/taskmanager · agents · workflows · claude-code-agents

Questions?

Brooks McMillin · Infrastructure Security