Personal project · Case study

Inter: running a multi-agent AI operation under real governance

Can one person direct a team of AI agents with the discipline an enterprise would demand of any production system, and have them do real work? This project is my working answer. The control set covers documented change management, separation of duties, independent review, incident response, and auditability.

Jonathan Franks, CISSP, CRISC · 25+ years in IT & cybersecurity leadership

What it is

Inter is a small fleet of AI agents with defined roles, running on separate platforms. Together they operate a production job-search automation pipeline end to end: multi-source aggregation, LLM-assisted scoring with grounding verification, deduplication, a database-backed coordination layer that can wake agents on demand, and a web operations dashboard. I serve as Release Manager, the human authority that every consequential decision routes through.

The pipeline is the workload. The governance is what I built it to prove out.

~2 hrs: task dispatched > agent woken autonomously > built & tested > independently reviewed > deployed to production

375: tests passing on a release authored end-to-end by an autonomously woken agent

<$0.01: marginal cost per production pipeline run, by deliberate model-tier and batch economics

~2 min: detection-to-revocation on a real P1 credential exposure, under a pre-written incident protocol

The governance, actually practiced

The control set is derived from a policy framework designed toward ISO 27001, ISO 27701, ISO 42001, and SOC 2 alignment. It is scaled honestly to a single-operator project. Where cost or scale justified a deviation, it went into a maintained exceptions register.

Change control & release discipline

Versioned, frozen release packages with document-history requirements on every artifact. Shipped versions are never edited in place. Deviations from this rule have happened, and each one was caught, documented, and remediated on the record.
Separation of duties: the agent that authors a release never reviews it. A second seat performs code review and security architecture review on every release.
Every release package ships with evidence, not assertions: a full passing test suite, static security analysis, a dependency vulnerability audit, and a secret scan, all archived with the release. Deployments end with smoke tests, and those tests got stricter after a live incident showed that a page can return HTTP 200 while everything behind it is broken.
Mandatory independent review gate on every release, performed by a different vendor's model as a deliberate cross-vendor check. It has caught real defects before deploy, including a credential-leak path in exception messages. Disputed findings get verified against primary sources before acceptance, because the reviewer has also been wrong, and that was caught too.
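
The release rules above lend themselves to a mechanical pre-deploy check. The following is a minimal sketch of what such a gate might look like, not the project's actual implementation. The names ReleasePackage, gate_failures, and the evidence identifiers are hypothetical, chosen only to mirror the controls described in the list.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# The four evidence types named in the text; the identifiers are hypothetical.
REQUIRED_EVIDENCE = frozenset(
    {"test_suite", "static_analysis", "dependency_audit", "secret_scan"}
)


@dataclass(frozen=True)
class ReleasePackage:
    version: str
    author: str                # seat that authored the release
    author_vendor: str         # model vendor behind the authoring seat
    reviewer: str              # seat that performed code and security review
    independent_vendor: str    # vendor of the independent review model
    evidence: FrozenSet[str] = frozenset()


def gate_failures(pkg: ReleasePackage) -> List[str]:
    """Return the reasons a package may not ship; an empty list means pass."""
    failures = []
    if pkg.author == pkg.reviewer:
        failures.append("separation of duties: author reviewed own release")
    if pkg.author_vendor == pkg.independent_vendor:
        failures.append("independent review: reviewer shares the author's vendor")
    for name in sorted(REQUIRED_EVIDENCE - pkg.evidence):
        failures.append(f"missing evidence: {name}")
    return failures
```

Returning a list of reasons rather than a single boolean keeps the gate auditable: the failure text can be archived with the release record alongside the evidence itself.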

Security operations

P1–P5 incident classification with response timelines. Exercised on real incidents: a credential exposure went detection > revocation > rotation > clean audit > closure documentation inside the protocol's deadlines.
Recurring infrastructure audits mapped to framework control families, producing findings registers, a Plan of Action & Milestones, and a precondition gate that must pass before any internet exposure decision.
Secrets discipline born of incident lessons: metadata-only verification of secret-bearing files, single-key extraction, no byte-level inspection. These rules are written into a non-negotiable floor that every agent session inherits at boot.
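
As an illustration of the metadata-only rule, a check on a secret-bearing file can be built on os.stat alone, so the file is never opened and its bytes never reach a log or a model context. This is a sketch under assumptions: the function name and the returned fields are hypothetical, and the owner-only permission test assumes a POSIX filesystem.

```python
import os
import stat
from pathlib import Path


def secret_file_metadata(path: Path) -> dict:
    """Describe a secret-bearing file via os.stat only; the file is never opened."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # permission bits only
    return {
        "size_bytes": st.st_size,
        "mode": oct(mode),
        # True when group and other have no access at all (e.g. 0o600).
        "owner_only": (mode & 0o077) == 0,
        "modified_epoch": int(st.st_mtime),
    }
```

A non-zero size, a recent modification time, and owner-only permissions are usually enough to confirm that a rotation landed, without any byte-level inspection.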

AI-specific management (the ISO 42001 layer)

Every agent operates under an acknowledged role charter, and structural changes require re-acknowledgment. One gateway-based agent is classified as a tool, not a seat, with its own tasking rules, model-selection cost controls, and explicit scope fences.
Autonomous wake-on-dispatch (a database write can spawn a working agent session) shipped with its prompt-injection risk formally accepted and documented. The compensating controls are sender allowlists, database-verified dispatch content, single-flight locks, cooldowns, session logging, and periodic log review. Each new wake surface re-opens the acceptance.
Honest capability reporting is engineered in. Ungrounded model output gets flagged as unverifiable at ingestion, surfaced at triage, and aggressively filtered, because some of it proved to be hallucinated listings. Pipeline runs report degraded status instead of an optimistic "ok".
An anti-pattern ledger records the failure modes agents actually exhibited in practice (invented policy exceptions, misplaced log entries, overconfident timing claims). Each one was converted into a standing check that future sessions boot against.
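
Three of the wake controls named above (sender allowlist, single-flight lock, cooldown) can be sketched as a single admission check. This is an in-memory, single-process illustration under assumptions, not the project's wake function: the WakeGate class and its method names are hypothetical, and a real deployment would more plausibly hold the lock and timestamps in the database so they survive restarts.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class WakeGate:
    allowed_senders: frozenset
    cooldown_seconds: float
    _last_wake: dict = field(default_factory=dict)   # agent -> last wake time
    _in_flight: set = field(default_factory=set)     # agents with a live session

    def try_acquire(self, sender: str, agent: str,
                    now: Optional[float] = None) -> Tuple[bool, str]:
        """Decide whether a dispatch from `sender` may wake `agent`."""
        now = time.monotonic() if now is None else now
        if sender not in self.allowed_senders:
            return False, "sender not on allowlist"
        if agent in self._in_flight:
            return False, "session already in flight"
        last = self._last_wake.get(agent)
        if last is not None and now - last < self.cooldown_seconds:
            return False, "cooldown active"
        self._in_flight.add(agent)
        self._last_wake[agent] = now
        return True, "wake permitted"

    def release(self, agent: str) -> None:
        """Mark the agent's session finished so a later dispatch can wake it."""
        self._in_flight.discard(agent)
```

Every refusal carries a reason string, which is what makes session logging and periodic log review useful as controls: a reviewer can see not only that a wake was denied but which rule denied it.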

Architecture, briefly

Three Claude-based seats (operations/review, engineering, infrastructure) run on separate platforms, with a Gemini-based independent reviewer and a multi-model gateway agent at tool tier. Coordination runs on PostgreSQL with event-driven notification. When a dispatch row addressed to an agent appears, the coordination layer wakes a headless session for that agent. The session boots against the governance record, does bounded work, reports back, and marks its own dispatch complete. The human-facing layer is a Django operations dashboard with a curated triage workflow and a bounded two-way sync to a pre-existing datastore during a controlled migration. Cost engineering is explicit: batch APIs for bulk work at half price, model tiers matched to task difficulty, and fleet-wide spend telemetry.
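
The batch economics reduce to simple arithmetic. The sketch below shows the shape of the calculation only. The function name is hypothetical and the per-million-token prices in the usage note are placeholder assumptions, not the project's actual rates or any vendor's published pricing. The only figure taken from the text is the half-price batch discount.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.5")  # "half price" for batch work, per the text
MILLION = Decimal(1_000_000)


def run_cost(input_tokens: int, output_tokens: int,
             input_price_per_mtok: Decimal, output_price_per_mtok: Decimal,
             batched: bool) -> Decimal:
    """Estimate the model cost of one run in currency units.

    Prices are quoted per million tokens; all prices are caller-supplied.
    """
    cost = (Decimal(input_tokens) * input_price_per_mtok
            + Decimal(output_tokens) * output_price_per_mtok) / MILLION
    return cost * BATCH_DISCOUNT if batched else cost
```

With placeholder prices of 1.00 and 5.00 per million input and output tokens, a run of 10,000 input and 2,000 output tokens comes to 0.02 unbatched and 0.01 batched. Matching the model tier to task difficulty changes the two price arguments, while batching changes only the final multiplier.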

Multi-agent orchestration

PostgreSQL + event-driven wake

Django

LLM grounding verification

Batch API cost engineering

Hardened service configurations

Cross-vendor review

A technical appendix covering the coordination-layer design, the wake function's containment model, audit methodology, and the incident timeline is available on request.

Why it matters

Most AI-governance experience today is policy written for systems someone else runs. This project closes that loop. The person writing the control set has to live under it while agents do real work, fast, with real credentials and real data. The governance survived contact with autonomous agents, and the places it bent are documented, because keeping that record honest is the actual practice.

It also reflects how I approach the discipline professionally. Frameworks should be working tools, not shelf-ware. Risk acceptance should be a documented decision. And I don't believe you really own a control until you can explain the failure mode it addresses.