Inter: running a multi-agent AI operation under real governance
Can one person direct a team of AI agents with the discipline an enterprise
would demand of any production system, and have them do real work? This project is my
working answer. The control set covers documented change management, separation of duties,
independent review, incident response, and auditability.
Jonathan Franks, CISSP, CRISC · 25+ years in IT & cybersecurity
leadership
What it is
Inter is a small fleet of AI agents with defined roles, running on separate platforms.
Together they operate a production job-search automation pipeline end to end: multi-source
aggregation, LLM-assisted scoring with grounding verification, deduplication, a database-backed
coordination layer that can wake agents on demand, and a web operations dashboard. I serve as
Release Manager, the human authority that every consequential decision routes through.
The pipeline is the workload. The governance is what I built it to prove out.
~2 hrs: task dispatched > agent woken autonomously > built & tested > independently reviewed > deployed to production
375 tests passing on a release authored end-to-end by an autonomously woken agent
<$0.01 marginal cost per production pipeline run, achieved through deliberate model-tier and batch-API economics
~2 min from detection to revocation on a real P1 credential exposure, under a pre-written incident protocol
The governance, actually practiced
The control set is derived from a policy framework designed for alignment with
ISO 27001, ISO 27701, ISO 42001, and SOC 2. It is scaled honestly
to a single-operator project. Where cost or scale justified a deviation, it went into a
maintained exceptions register.
Change control & release discipline
Versioned, frozen release packages with document-history requirements on every artifact.
Shipped versions are not edited in place. When that rule has been broken, each deviation
was caught, documented, and remediated on the record.
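One way to make "not edited in place" checkable rather than aspirational is to fingerprint a package at the moment it is frozen. The sketch below assumes a release package is a directory of files; `build_manifest` and `verify_frozen` are hypothetical helpers, not part of Inter's tooling.

```python
import hashlib
from pathlib import Path


def build_manifest(package_dir):
    """Record a SHA-256 digest for every file in a release package at freeze time."""
    root = Path(package_dir)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_frozen(package_dir, manifest):
    """Compare a package against its freeze-time manifest.

    Returns the files that changed, appeared, or disappeared since the freeze.
    """
    current = build_manifest(package_dir)
    changed = sorted(k for k in manifest.keys() & current.keys() if manifest[k] != current[k])
    added = sorted(current.keys() - manifest.keys())
    removed = sorted(manifest.keys() - current.keys())
    return {"changed": changed, "added": added, "removed": removed}
```

Archiving the manifest with the release evidence means an in-place edit surfaces as a changed digest instead of depending on someone remembering it.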
Separation of duties: the agent that authors a release never reviews it. A second seat
performs code review and security architecture review on every release.
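The separation-of-duties rule reduces to a check on the release record. A minimal sketch, assuming a hypothetical record shape in which `reviews` maps each required review type to the seat that performed it:

```python
# Review types every release must carry (taken from the rule above).
REQUIRED_REVIEWS = ("code", "security-architecture")


def separation_of_duties(release):
    """Return a list of violations for one release record (hypothetical shape)."""
    problems = []
    author = release["author"]
    reviews = release.get("reviews", {})  # review type -> seat that performed it
    for kind in REQUIRED_REVIEWS:
        reviewer = reviews.get(kind)
        if reviewer is None:
            problems.append(f"missing {kind} review")
        elif reviewer == author:
            # The authoring seat may never review its own release.
            problems.append(f"{kind} review performed by the author ({author})")
    return problems
```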
Every release package ships with evidence, not assertions: a full passing test suite,
static security analysis, dependency vulnerability audit, and a secret scan, all archived
with the release. Deployments end with smoke tests, which got stricter after a live lesson
showed that a page can return HTTP 200 while everything behind it is broken.
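The smoke-test lesson generalizes: a status code alone proves little. A minimal sketch of a stricter check, assuming each deployed page has hypothetical `required_markers`, meaning text that only renders when the data behind the page actually loaded:

```python
import urllib.request
from urllib.error import HTTPError, URLError


def evaluate_page(status, body, required_markers):
    """Return failure reasons for one page; an empty list means it passed."""
    failures = []
    if status != 200:
        failures.append(f"unexpected status {status}")
    # A 200 with an empty body is still a failure.
    if not body.strip():
        failures.append("empty body")
    # Markers prove the data layer answered, not just the web server.
    for marker in required_markers:
        if marker not in body:
            failures.append(f"missing marker: {marker!r}")
    return failures


def smoke_test(url, required_markers, timeout=10.0):
    """Fetch a deployed page and apply the content checks above."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_page(resp.status, body, required_markers)
    except HTTPError as exc:  # 4xx/5xx responses arrive as exceptions
        return [f"unexpected status {exc.code}"]
    except URLError as exc:  # DNS, connection refused, timeouts
        return [f"request failed: {exc.reason}"]
```

Keeping `evaluate_page` separate from the network call lets the pass/fail rules be tested without a live deployment.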
Mandatory independent review gate on every release, performed by a
different vendor's model as a deliberate cross-vendor check. It has caught real defects
before deploy, including a credential-leak path in exception messages. Disputed findings get
verified against primary sources before acceptance, because the reviewer has also been wrong,
and that was caught too.
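The credential-leak finding belongs to a common class of bug: a connection string or token lands in an exception's text and from there in logs. A minimal sketch of one mitigation, assuming secrets appear either as known values or as `user:password@` credentials inside a URL; `safe_error_text` is a hypothetical helper, not the fix that was actually shipped:

```python
import re

# Matches the password part of "scheme://user:password@host".
_URL_CREDENTIALS = re.compile(r"(://[^:/\s@]+:)[^@\s]+@")


def redact(message, secrets):
    """Replace any known secret value in a message with a fixed placeholder."""
    for secret in secrets:
        if secret:  # skip empty values, which would match everywhere
            message = message.replace(secret, "[REDACTED]")
    return message


def safe_error_text(exc, secrets):
    """Render an exception for logging with URL credentials and known secrets removed."""
    text = f"{type(exc).__name__}: {exc}"
    text = _URL_CREDENTIALS.sub(r"\1[REDACTED]@", text)
    return redact(text, secrets)
```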
Security operations
P1–P5 incident classification with response timelines. Exercised on real incidents:
a credential exposure went detection > revocation > rotation > clean audit > closure
documentation inside the protocol's deadlines.
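A pre-written protocol makes the timeline checkable after the fact. The sketch below compares recorded milestones against per-step deadlines measured from detection. The step names follow the sequence above, but the deadline values are placeholders, not the protocol's actual P1 or P2 timelines:

```python
from datetime import datetime, timedelta

# Placeholder per-step deadlines, measured from detection (not the real protocol's values).
DEADLINES = {
    "P1": {"revocation": timedelta(minutes=15), "rotation": timedelta(hours=1),
           "audit": timedelta(hours=4), "closure": timedelta(days=1)},
    "P2": {"revocation": timedelta(hours=1), "rotation": timedelta(hours=4),
           "audit": timedelta(days=1), "closure": timedelta(days=3)},
}
STEP_ORDER = ["revocation", "rotation", "audit", "closure"]


def check_timeline(priority, detected_at, completed):
    """Return problems: unrecorded steps, out-of-order steps, or steps past deadline."""
    problems = []
    deadlines = DEADLINES[priority]
    previous = detected_at
    for step in STEP_ORDER:
        done_at = completed.get(step)
        if done_at is None:
            problems.append(f"{step}: not recorded")
            continue
        if done_at < previous:
            problems.append(f"{step}: recorded before the preceding step")
        elapsed = done_at - detected_at
        if elapsed > deadlines[step]:
            problems.append(f"{step}: late by {elapsed - deadlines[step]}")
        previous = done_at
    return problems
```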
Recurring infrastructure audits mapped to framework control families, producing
findings registers, a Plan of Action & Milestones, and a precondition gate that must
pass before any internet exposure decision.
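The precondition gate can be expressed as a rule over the findings register. A minimal sketch, assuming hypothetical severity levels and a rule that blocks exposure on any open finding at or above "high", or on any open finding whose POA&M milestone is missing or overdue. The control-family labels are illustrative:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

SEVERITY_RANK = {"low": 1, "moderate": 2, "high": 3, "critical": 4}


@dataclass
class Finding:
    finding_id: str
    control_family: str              # illustrative label, e.g. "access-control"
    severity: str                    # key of SEVERITY_RANK
    status: str                      # "open" or "closed"
    poam_due: Optional[date] = None  # Plan of Action & Milestones target date


def exposure_gate(findings, today, block_at="high"):
    """Return (passed, reasons) for an internet-exposure decision."""
    reasons = []
    threshold = SEVERITY_RANK[block_at]
    for f in findings:
        if f.status != "open":
            continue
        if SEVERITY_RANK[f.severity] >= threshold:
            reasons.append(f"{f.finding_id}: open {f.severity} finding")
        elif f.poam_due is None:
            reasons.append(f"{f.finding_id}: no POA&M milestone")
        elif f.poam_due < today:
            reasons.append(f"{f.finding_id}: POA&M milestone overdue")
    return (not reasons, reasons)
```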
Secrets discipline born of incident lessons: metadata-only verification of
secret-bearing files, single-key extraction, no byte-level inspection. These rules are
written into a non-negotiable floor that every agent session inherits at boot.
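The secrets rules translate into two narrow operations: verify a secret-bearing file from its metadata alone and, when a value is genuinely needed, extract exactly one key. A sketch assuming POSIX permissions and a dotenv-style `KEY=value` file; the `0o600` ceiling and the function names are illustrative assumptions:

```python
import os
import stat


def verify_secret_file(path, max_mode=0o600):
    """Check a secret-bearing file using metadata only; never opens or reads it."""
    st = os.stat(path)
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    mode = stat.S_IMODE(st.st_mode)
    if mode & ~max_mode:  # any permission bit beyond the allowed ceiling
        problems.append(f"permissions too open: {oct(mode)}")
    if st.st_size == 0:
        problems.append("file is empty")
    if hasattr(os, "getuid") and st.st_uid != os.getuid():
        problems.append("not owned by the current user")
    return problems


def read_single_key(path, key):
    """Return one value from a KEY=value file; other lines are parsed and discarded."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == key:
                return value.strip()
    raise KeyError(key)
```

`verify_secret_file` calls only `os.stat`, so a session can confirm a file is present and locked down without seeing its contents. `read_single_key` necessarily parses the file, but it returns a single value and never prints or returns the other lines.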
AI-specific management (the ISO 42001 layer)
Every agent operates under an acknowledged role charter, and structural changes require
re-acknowledgment. One gateway-based agent is classified as a tool, not a seat,
with its own tasking rules, model-selection cost controls, and explicit scope fences.
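Re-acknowledgment on structural change can be enforced mechanically by fingerprinting only the parts of a charter that count as structural. In this sketch the dictionary shape of a charter and the contents of `STRUCTURAL_FIELDS` are assumptions; the `tier` field is one way to record the seat-versus-tool distinction:

```python
import hashlib
import json

# Hypothetical: which charter fields count as "structural".
STRUCTURAL_FIELDS = ("role", "authority", "scope", "tier")


def structural_digest(charter):
    """Fingerprint only the structural fields, in a stable key order."""
    core = {k: charter.get(k) for k in STRUCTURAL_FIELDS}
    payload = json.dumps(core, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def may_boot(agent, charter, acknowledgments):
    """Allow a session only if the agent acknowledged the current structural digest."""
    return acknowledgments.get(agent) == structural_digest(charter)
```

Editing a non-structural field leaves the digest unchanged, while widening an agent's scope invalidates its acknowledgment until it is renewed.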
Autonomous wake-on-dispatch (a database write can spawn a working agent session) shipped
with its prompt-injection risk formally accepted and documented: sender allowlists,
database-verified dispatch content, single-flight locks, cooldowns, session logging, and
periodic log review as compensating controls. Each new wake surface re-opens the acceptance.
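Several of those compensating controls can sit in one gate in front of the wake function. A minimal sketch, assuming a hypothetical `fetch_dispatch` callable that re-reads the dispatch row from the database; `WakeGuard`, the 300-second default cooldown, and the row fields are illustrative, not Inter's implementation:

```python
import logging
import threading
import time

log = logging.getLogger("wake")


class WakeGuard:
    """Hypothetical gate in front of wake-on-dispatch."""

    def __init__(self, allowed_senders, cooldown_s=300.0, clock=time.monotonic):
        self.allowed_senders = set(allowed_senders)
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._running = {}    # agent -> Lock held while a session runs (single-flight)
        self._last_wake = {}  # agent -> clock() value of the last accepted wake
        self._mutex = threading.Lock()

    def try_wake(self, agent, sender, dispatch_id, fetch_dispatch):
        """Return True if a session may start for this dispatch; log every decision."""
        if sender not in self.allowed_senders:
            log.warning("refused %s: sender %r not allowlisted", agent, sender)
            return False
        # Re-read the row from the database instead of trusting the notification payload.
        row = fetch_dispatch(dispatch_id)
        if row is None or row.get("recipient") != agent or row.get("sender") != sender:
            log.warning("refused %s: dispatch %s failed verification", agent, dispatch_id)
            return False
        with self._mutex:
            now = self.clock()
            last = self._last_wake.get(agent)
            if last is not None and now - last < self.cooldown_s:
                log.info("refused %s: cooldown active", agent)
                return False
            lock = self._running.setdefault(agent, threading.Lock())
            if not lock.acquire(blocking=False):
                log.info("refused %s: a session is already running", agent)
                return False
            self._last_wake[agent] = now
        log.info("waking %s for dispatch %s", agent, dispatch_id)
        return True

    def finish(self, agent):
        """Release the single-flight lock when the session ends."""
        self._running[agent].release()
```

The caller invokes `finish(agent)` when the session ends. Passing a fake `clock` makes the cooldown testable, and every refusal and wake goes to the logger, which is what a periodic log review would read.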
Honest capability reporting is engineered in. Ungrounded model output gets flagged as
unverifiable at ingestion, surfaced at triage, and aggressively filtered, because some of it
proved to be hallucinated listings. Pipeline runs report degraded status instead of an
optimistic "ok".
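Grounding verification can start as simply as requiring every extracted field to appear in the source text the model was given. A deliberately strict sketch, assuming extractions arrive as a dict of field values; real matching would need more tolerance (dates, abbreviations), and treating an empty run as degraded is an assumption:

```python
def _norm(text):
    """Lowercase and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(text.lower().split())


def grounding_status(extracted, source_text):
    """Mark an extraction 'grounded' only if every non-empty field value appears in the source."""
    source = _norm(source_text)
    missing = [k for k, v in extracted.items() if v and _norm(str(v)) not in source]
    return ("grounded" if not missing else "unverifiable", missing)


def run_status(results):
    """Report 'degraded' rather than 'ok' when any item could not be verified."""
    if not results:
        return "degraded"  # assumption: an empty run is not a success
    return "ok" if all(r == "grounded" for r in results) else "degraded"
```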
An anti-pattern ledger records the failure modes agents actually exhibited in practice
(invented policy exceptions, misplaced log entries, overconfident timing claims). Each one
was converted into a standing check that future sessions boot against.
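An anti-pattern ledger becomes enforceable when each entry carries a check. A sketch under stated assumptions: the three checks mirror the failure modes named above, but the report fields, the exceptions-register format, and running the checks against a session's draft report are all hypothetical:

```python
def no_invented_exceptions(report, context):
    """Flag policy exceptions that are not in the maintained register."""
    unknown = set(report.get("exceptions_cited", [])) - set(context["exceptions_register"])
    return f"cites unregistered exceptions: {sorted(unknown)}" if unknown else None


def log_in_right_place(report, context):
    """Flag log entries written somewhere other than the expected log."""
    path = report.get("log_path")
    if path is not None and path != context["expected_log_path"]:
        return f"log entry written to {path}, expected {context['expected_log_path']}"
    return None


def timing_claims_have_timestamps(report, context):
    """Flag duration claims that lack recorded start and finish times."""
    if report.get("duration_claim") and not (report.get("started_at") and report.get("finished_at")):
        return "timing claim without recorded start/finish timestamps"
    return None


# Each observed failure mode becomes a named, standing check.
LEDGER = {
    "AP-1 invented policy exception": no_invented_exceptions,
    "AP-2 misplaced log entry": log_in_right_place,
    "AP-3 overconfident timing claim": timing_claims_have_timestamps,
}


def run_ledger(report, context):
    """Run every ledger check and collect the failures."""
    failures = []
    for name, check in LEDGER.items():
        problem = check(report, context)
        if problem:
            failures.append(f"{name}: {problem}")
    return failures
```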
Architecture, briefly
Three Claude-based seats (operations/review, engineering, infrastructure) run on separate
platforms, with a Gemini-based independent reviewer and a multi-model gateway agent at tool
tier. Coordination runs on PostgreSQL with event-driven notification. When a dispatch row
addressed to an agent appears, it wakes a headless session that boots against the governance
record, does bounded work, reports back, and marks its own dispatch complete. The human-facing
layer is a Django operations dashboard with a curated triage workflow and a bounded two-way
sync to a pre-existing datastore during a controlled migration. Cost engineering is explicit:
batch APIs for bulk work at half price, model tiers matched to task difficulty, and fleet-wide
spend telemetry.
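The dispatch lifecycle described above (write a row, wake the recipient, claim, work, report, complete) can be sketched around an atomic status transition. This sketch uses Python's built-in `sqlite3` as a stand-in so it runs anywhere. In PostgreSQL the event-driven part would typically be a trigger calling `pg_notify` with a listener on `LISTEN`, though whether Inter uses exactly that mechanism is an assumption. Table, column, and function names are hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE dispatch (
    id INTEGER PRIMARY KEY,
    recipient TEXT NOT NULL,
    body TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending -> claimed -> complete
    report TEXT
)
"""


def dispatch(conn, recipient, body):
    """Write a dispatch row; in PostgreSQL a trigger would emit the wake notification here."""
    cur = conn.execute("INSERT INTO dispatch (recipient, body) VALUES (?, ?)", (recipient, body))
    conn.commit()
    return cur.lastrowid


def claim(conn, dispatch_id, agent):
    """Atomically move a pending row addressed to this agent to 'claimed'."""
    cur = conn.execute(
        "UPDATE dispatch SET status = 'claimed' "
        "WHERE id = ? AND recipient = ? AND status = 'pending'",
        (dispatch_id, agent),
    )
    conn.commit()
    if cur.rowcount != 1:
        return None  # already claimed, wrong recipient, or missing
    return conn.execute("SELECT body FROM dispatch WHERE id = ?", (dispatch_id,)).fetchone()[0]


def handle_wake(conn, dispatch_id, agent, do_work):
    """Claim the dispatch, do bounded work, report back, and mark it complete."""
    body = claim(conn, dispatch_id, agent)
    if body is None:
        return False
    conn.execute(
        "UPDATE dispatch SET status = 'complete', report = ? WHERE id = ? AND status = 'claimed'",
        (do_work(body), dispatch_id),
    )
    conn.commit()
    return True
```

The conditional `UPDATE ... WHERE status = 'pending'` is what makes a duplicate wake harmless: only one session can move the row out of `pending`.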
Multi-agent orchestration
PostgreSQL + event-driven wake
Django
LLM grounding verification
Batch API cost engineering
Hardened service configurations
Cross-vendor review
A technical appendix covering the coordination-layer design, the wake
function's containment model, audit methodology, and the incident timeline is available
on request.
Why it matters
Most AI-governance experience today is policy written for systems someone else runs. This
project closes that loop. The person writing the control set has to live under it while agents
do real work, fast, with real credentials and real data. The governance survived contact with
autonomous agents, and the places it bent are documented, because keeping that record honest
is the actual practice.
It also reflects how I approach the discipline professionally. Frameworks should be working
tools, not shelf-ware. Risk acceptance should be a documented decision. And I don't believe
you really own a control until you can explain the failure mode it addresses.