Defense in Depth for AI-Assisted Development

Pre-commit hooks, review agents, and CI that catch LLM mistakes

Brooks McMillin | Infrastructure Security

Things I've caught Claude doing in the past month

LLMs optimize for task completion,
not operational safety.

The solution isn't to stop using them — it's to build defensive layers that catch dangerous code before it ships.

Three layers, each catching what the others miss

| Layer            | What It Catches                  | When           | Bypassable?     |
|------------------|----------------------------------|----------------|-----------------|
| Pre-commit hooks | Patterns, secrets, type errors   | Before commit  | --no-verify     |
| Review agents    | Logic errors, context mismatches | On PR creation | Ignore comments |
| CI workflows     | Everything above + integration   | Before merge   | No ✓            |
01

Pre-commit Hooks

The first line of defense — block problems before they reach git

The pre-commit stack

# .pre-commit-config.yaml
repos:
  # Secret detection
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0  # pin to a current release
    hooks:
      - id: detect-secrets

  # Linting + formatting
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9  # pin to a current release
    hooks:
      - id: ruff
      - id: ruff-format

  # Type checking (local hook so it runs in the project environment)
  - repo: local
    hooks:
      - id: pyright
        name: pyright
        entry: uv run pyright
        language: system
        types: [python]
        pass_filenames: false

  # Security scanning
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.10  # pin to a current release
    hooks:
      - id: bandit

🔑 detect-secrets

Blocks commits containing credential patterns. Catches API keys LLMs paste directly into code.

📐 pyright

Enforces strict typing. LLMs are sloppy with types — this catches inconsistencies early.
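
A minimal illustration (mine, not from the repo) of the kind of type drift strict mode flags:

# An LLM declares "-> str" but quietly returns None on one branch.
def get_user_email(users: dict[int, str], user_id: int) -> str:
    if user_id not in users:
        return None  # pyright (strict): "None" is not assignable to "str"
    return users[user_id]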

🛡️ bandit

Flags insecure crypto, SQL injection risks, hardcoded passwords, eval() usage.

Live Demo

Catching LLM mistakes before commit

Demo 1: Secret Detection

Commit a file with a hardcoded API key

# LLM-generated config.py
API_KEY = "sk-proj-abc123..."
DB_PASSWORD = "prod_admin_2024"

→ detect-secrets blocks the commit

Demo 2: Security Anti-pattern

Commit code using eval() for config parsing

# LLM's "quick fix" for parsing
config = eval(user_input)

→ bandit flags B307: eval() usage

Pre-commit hooks add ~30 seconds per commit. Easiest win you can adopt today.
02

Code Review Agents

Logic analysis that static tools can't do

Local agents: specialized and parallel

🔍 Code Optimization Agent

Performance bottlenecks, unnecessary complexity

🧪 Test Coverage Agent

Missing test cases, edge case gaps

📦 Dependency Agent

Outdated packages, license issues

🛡️ Security Agent

Auth gaps, injection risks, data exposure

Why specialize?

LLM performance degrades as context gets polluted with unrelated concerns. Each agent has a focused context window.

All agents are read-only — they report findings, humans decide what to fix.

~250k tokens per run • ~$4-5 cost • Run weekly
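
A sketch of the dispatch loop, assuming a hypothetical run_agent() wrapper around your LLM client (illustrative names, not the actual implementation):

import asyncio

# One focused prompt per agent keeps each context window clean.
AGENT_PROMPTS = {
    "optimization": "Find performance bottlenecks and unnecessary complexity.",
    "tests": "Find missing test cases and edge-case gaps.",
    "dependencies": "Find outdated packages and license issues.",
    "security": "Find auth gaps, injection risks, and data exposure.",
}

async def run_agent(name: str, prompt: str, diff: str) -> str:
    # Placeholder: call your LLM client here with a focused, read-only
    # prompt. Agents return findings; they never modify the repo.
    return f"[{name}] findings for a {len(diff)}-char diff"

async def review(diff: str) -> dict[str, str]:
    # Each specialist runs in parallel over the same code, with a
    # context window that contains only its own concern.
    findings = await asyncio.gather(
        *(run_agent(name, p, diff) for name, p in AGENT_PROMPTS.items())
    )
    return dict(zip(AGENT_PROMPTS, findings))

print(asyncio.run(review("example diff")))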

CI review agents: every PR gets reviewed

# claude-code-review.yml
- name: Run Claude Code Review
  uses: anthropics/claude-code-action@v1
  with:
    prompt: |
      Review this PR for:
      - Code quality and best practices
      - Potential bugs or issues
      - Performance considerations
      - Security concerns
      - Test coverage
      Use CLAUDE.md for guidance.

Two agents per PR

General review — code quality, bugs, tests

Security review — auth, injection, data exposure

Key catch: File truncation

The CI agent regularly finds files that Claude truncated or emptied during local editing sessions. Full rewrites from memory, output token limits, context pollution — all cause data loss.

Example: PR #159

Real catches: logic flaws static analysis misses

PR #160 Open Redirect

Redirect validation using startswith() is bypassable:

# Bypassable!
if not url.startswith(settings.frontend_url):

//evil.com and localhost.evil.com both pass
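
One possible fix, sketched here (not necessarily the patch that landed): parse the URL and compare the origin exactly:

from urllib.parse import urlparse

def is_safe_redirect(url: str, frontend_url: str) -> bool:
    target = urlparse(url)
    allowed = urlparse(frontend_url)
    # Relative paths like "/dashboard" are fine; "//evil.com" is not,
    # because urlparse reads it as a network location.
    if not target.scheme and not target.netloc:
        return True
    # Exact scheme + host comparison defeats "localhost.evil.com"-style
    # suffix tricks that startswith() lets through.
    return (target.scheme, target.netloc) == (allowed.scheme, allowed.netloc)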

PR #141 Race Condition

TOCTOU in single-use registration codes:

await validate(code)    # Check
# ... user creation ...
await use_code(code)    # Too late!

10 concurrent requests → 10 users on a max_uses=1 code
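
A common fix pattern, sketched with SQLAlchemy-style names that are assumptions, not the project's code: collapse check and consume into one atomic statement:

from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

async def consume_code(session: AsyncSession, code: str) -> None:
    # Check-and-consume in a single atomic UPDATE: the WHERE clause
    # only matches while uses remain, so concurrent requests can't
    # all pass the way separate validate()/use_code() calls can.
    result = await session.execute(
        text(
            "UPDATE registration_codes"
            " SET uses = uses + 1"
            " WHERE code = :code AND uses < max_uses"
        ),
        {"code": code},
    )
    if result.rowcount == 0:
        raise ValueError("registration code invalid or exhausted")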

PR #152 Incomplete Auth

IDOR-adjacent: updating a todo with an unauthorized project_id:

# Sets project_name to None but doesn't
# REJECT the request — silent failure

User associates todo with unauthorized project
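
The fail-closed shape of the fix, sketched with hypothetical helper names:

from fastapi import HTTPException

async def resolve_project(project_id: int, current_user, repo):
    # Fail closed: if the user can't access the project, reject the
    # request instead of silently nulling the association.
    project = await repo.get_project_for_user(project_id, current_user.id)  # hypothetical helper
    if project is None:
        raise HTTPException(status_code=403, detail="Project not accessible")
    return project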

The pattern: review agents catch logic flaws (wrong validation, races, incomplete auth) — not syntactic issues. That's the key differentiator from static analysis.
03

CI Workflows

The enforced gate that can't be bypassed

Multi-tool security scanning

| Tool               | What It Catches               | Example                                  |
|--------------------|-------------------------------|------------------------------------------|
| bandit             | Python-specific anti-patterns | eval(), weak crypto, hardcoded passwords |
| safety / pip-audit | Known CVEs in dependencies    | Vulnerable Flask versions                |
| semgrep            | Semantic patterns             | SQL injection via f-strings              |
| CodeQL             | Advanced taint analysis       | User input → os.system()                 |
| trivy              | Container/filesystem vulns    | Secrets in Docker layers                 |

LLMs might suggest old library versions, deprecated crypto, or unsafe SQL. Multiple specialized tools provide comprehensive coverage — no single tool catches everything.
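
For example, the f-string pattern that semgrep-style rules flag, next to the parameterized form (illustrative sqlite3 snippet, not from the repo):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
user_id = "1 OR 1=1"  # attacker-controlled input

# Flagged: input interpolated straight into the query.
# "1 OR 1=1" makes this match every row in the table.
conn.execute(f"SELECT * FROM users WHERE id = {user_id}")

# Safe: parameter binding treats the input as data, not SQL.
conn.execute("SELECT * FROM users WHERE id = ?", (user_id,))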

CLAUDE.md: prevention over detection

Lives in the repo root. Teaches LLMs your patterns before they generate code.

  • Quick commands — prevents LLM from inventing commands
  • Architecture overview — enforces consistency
  • Security requirements — explicit guardrails
  • Common tasks — provides working examples
  • Anti-patterns — "never do this" list
Takes 30 min to write. Saves hours of LLM output correction.

An excerpt:
## Response Formats

### Success:
{
  "data": [...],
  "meta": { "count": 10 }
}

### Errors:
{
  "detail": {
    "code": "AUTH_001",
    "message": "Invalid credentials",
    "details": { ... }
  }
}

## Anti-patterns
NEVER use f-strings for SQL queries
NEVER commit .env files
NEVER use pip directly (use uv)
NEVER use Optional[str] (use str | None)

The cost is trivial. The risk isn't.

What it costs

  • Pre-commit hooks: ~30s/commit
  • CI pipeline: ~3-5 min/PR
  • Review agents (CI): $10-30/mo
  • Local agents (weekly): ~$4-5/run
  • Security tools: free / OSS

What you avoid

  • Credential leak incident response: $$$$$
  • Production auth bypass: $$$$$
  • Manual review of LLM output: 10x slower
  • Truncated/lost code in prod: 💀

Adoption path: start today

Start here

Pre-commit hooks

Easiest win. 15 min setup. Catches secrets and anti-patterns immediately.

Highest ROI

Security review agents

Catches logic flaws static analysis can't. claude-code-action in GitHub Actions.

Full coverage

Comprehensive CI

Multi-tool scanning. Reusable workflows. The enforced gate.

All patterns are running in my open-source repos:

github.com/brooksmcmillin/taskmanager · agents · workflows · claude-code-agents

Questions?

Brooks McMillin · Infrastructure Security