Agentic Airbyte — Agent plans. Crabbox runs. Airbyte moves. Evidence decides.

§01 the loop

Intent becomes evidence, one boundary at a time.

Scroll. The board is the lecture — each step lights its own lane while you read what happens, who owns it, and what crosses the boundary.

01 · owner · the agent

The agent compresses intent into a job spec.

It reads the goal, repo state, schemas, previous evidence, and policy. It emits one bounded run. It does not move data.

In: goal + policy + repo state

Out: job JSON + Crabbox command

02 · owner · Crabbox

Crabbox creates the execution boundary.

It finds or creates a ready worker, hydrates the repo, attaches cache, and starts the requested command under a durable run id.

In: pool id + command + artifact rules

Out: worker lease + run id

03 · owner · profile + Crabbox

Credentials enter as scoped environment.

The prompt carries only a profile name. Crabbox resolves it into allowed variables inside the worker — never back into the model context.

In: credential_profile + allow_env

Out: worker env + redacted env report
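As an illustration of how a profile might be narrowed to an allowlist, here is a minimal Python sketch. It assumes `allow_env` entries are shell-style glob patterns (as the `AIRBYTE_*` form in the job spec suggests) and that the profile has already been resolved to a name/value mapping inside the worker. The function name `scope_env` and the sample variables are hypothetical, not Crabbox's actual implementation.

```python
from fnmatch import fnmatchcase


def scope_env(profile_vars, allow_env):
    """Return (worker_env, redacted_report) for an already-resolved profile.

    profile_vars: mapping of variable name -> secret value (hypothetical shape).
    allow_env: list of glob patterns such as "SOURCE_*".
    """
    worker_env = {
        name: value
        for name, value in profile_vars.items()
        if any(fnmatchcase(name, pattern) for pattern in allow_env)
    }
    # The report carries names only; every value is masked so it can be
    # returned as evidence without leaking a secret.
    redacted_report = {name: "***" for name in sorted(worker_env)}
    return worker_env, redacted_report


# Hypothetical profile contents, for illustration only.
profile = {
    "SOURCE_API_TOKEN": "s3cr3t",
    "TARGET_DSN": "duckdb:///warehouse.db",
    "HOME": "/root",
}
env, report = scope_env(profile, ["AIRBYTE_*", "SOURCE_*", "TARGET_*"])
```

Only the masked report would leave the worker; `HOME` is dropped because no pattern matches it.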

04 · owner · Airbyte

Airbyte moves data where the agent cannot see it.

The connector reads the source and writes the target inside the worker. Rows never pass through a prompt.

In: source ref + target ref + connector config

Out: target writes + sync status

05 · owner · Crabbox + worker

The run returns evidence, not guesses.

Logs, metrics, JUnit, counts, and redacted config come back under one run id. The next decision becomes auditable.

In: exit code + reports + artifacts

Out: structured evidence bundle

06 · owner · the agent

The agent repairs from the failing boundary.

Finish, retry, repair, or alert. The next command changes one bounded input and keeps the same audit shape. Then the loop closes.

In: evidence + failure owner

Out: finish · retry · repair job · alert
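The four outcomes of this step can be sketched as a small decision function. This is an assumption-laden illustration: the evidence field names (`exit_code`, `checks_passed`, `failure_class`, `failure_owner`) are hypothetical, and the retry defaults simply mirror the `retry` block of the job spec shown later on this page.

```python
def next_action(evidence, attempts, max_attempts=2,
                retry_when=("rate_limit", "transient_network")):
    """Pick one of: finish, retry, repair, alert.

    evidence: dict parsed from the run's evidence bundle (hypothetical shape).
    attempts: number of attempts already made for this job.
    """
    if evidence.get("exit_code") == 0 and evidence.get("checks_passed"):
        return "finish"
    # Retry only failure classes the spec declares as retryable,
    # and only while attempts remain.
    if evidence.get("failure_class") in retry_when and attempts < max_attempts:
        return "retry"
    # A known owner means there is one bounded input to change.
    if evidence.get("failure_owner"):
        return "repair"
    # No owner and no retry budget: hand it to a human.
    return "alert"
```

The ordering is a design choice in this sketch: a clean run always finishes, retry is tried before repair, and alert is the fallback when no owner can be named.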

§02 the evidence

A real run, audited four ways.

“It ran” is not proof. After the move, the worker compares source and destination — and exits non-zero unless all four checks pass. This is the actual output of the run on the receipt.

exhibit a · four parity checks

1 · Row count

PASS

Every record made it across. No drops, no duplicates.

source: 50,000 · destination: 50,000

2 · Revenue sum

PASS

The decimal aggregate survives the type boundary exactly.

source: $611,815.02 · destination: $611,815.02

3 · Per-type tally

PASS

Group-by counts agree for all five event types.

event type	source	destination
page_view	29,958	29,958
search	7,598	7,598
add_to_cart	6,014	6,014
checkout	3,928	3,928
purchase	2,502	2,502

4 · Content SHA-256

PASS

Byte-exact: a hash of every (event_id, type, revenue) tuple, sorted. The strongest check.

source: a82239cc…c73ebcb · destination: a82239cc…c73ebcb
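The four checks above could be sketched in Python roughly as follows. This is not the repo's verifier: it assumes each row has already been reduced to an `(event_id, event_type, revenue)` tuple with `revenue` as a `Decimal`, and the line format fed to the hash is an arbitrary choice made for this illustration.

```python
import hashlib
from collections import Counter
from decimal import Decimal


def content_hash(rows):
    """SHA-256 over the sorted (event_id, event_type, revenue) tuples."""
    digest = hashlib.sha256()
    for event_id, event_type, revenue in sorted(rows):
        # Fixed two-decimal formatting so 9.5 and 9.50 hash identically.
        digest.update(f"{event_id}|{event_type}|{revenue:.2f}\n".encode())
    return digest.hexdigest()


def parity_checks(source, destination):
    """Return a dict of check name -> bool for the four parity checks."""
    zero = Decimal("0")
    return {
        "row_count": len(source) == len(destination),
        "revenue_sum": sum((r[2] for r in source), zero)
        == sum((r[2] for r in destination), zero),
        "per_type_tally": Counter(r[1] for r in source)
        == Counter(r[1] for r in destination),
        "content_sha256": content_hash(source) == content_hash(destination),
    }


def exit_code(results):
    """Non-zero unless every check passed."""
    return 0 if all(results.values()) else 1
```

Sorting before hashing makes the digest independent of read order, so a destination that returns rows in a different order still matches.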

exhibit b · schema map

The schema crossed intact.

The worker reads the source catalog and maps each ClickHouse type to a DuckDB type before a single row moves — decimals stay decimals, datetimes stay timestamps.

column	ClickHouse · source	DuckDB · destination
event_id	UInt64	BIGINT
user_id	UInt32	BIGINT
session_id	UInt32	BIGINT
event_type	LowCardinality(String)	VARCHAR
channel	LowCardinality(String)	VARCHAR
device	LowCardinality(String)	VARCHAR
country	LowCardinality(String)	VARCHAR
url	String	VARCHAR
revenue	Decimal(12, 2)	DECIMAL(18,2)
ts	DateTime	TIMESTAMP
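A type map like the one in this table could be expressed as a small lookup function. The sketch below is an assumption, not the worker's code: it covers only the types that appear in the table, unwraps `LowCardinality(...)`, and keeps the decimal scale while widening precision to 18, which is what the `revenue` row shows.

```python
import re

# Base mappings copied from the schema map above.
BASE_TYPES = {
    "UInt64": "BIGINT",
    "UInt32": "BIGINT",
    "String": "VARCHAR",
    "DateTime": "TIMESTAMP",
}


def map_type(clickhouse_type):
    """Map a ClickHouse type name to a DuckDB type name (table rows only)."""
    ch = clickhouse_type.strip()
    # LowCardinality is a storage hint; map the wrapped type instead.
    wrapped = re.fullmatch(r"LowCardinality\((.+)\)", ch)
    if wrapped:
        return map_type(wrapped.group(1))
    decimal = re.fullmatch(r"Decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)", ch)
    if decimal:
        precision, scale = int(decimal.group(1)), int(decimal.group(2))
        if precision > 18:
            raise ValueError(f"precision {precision} exceeds DECIMAL(18, s)")
        return f"DECIMAL(18,{scale})"
    if ch in BASE_TYPES:
        return BASE_TYPES[ch]
    # Fail loudly so an unmapped column stops the run before rows move.
    raise ValueError(f"unmapped ClickHouse type: {ch}")
```

One caveat worth knowing: DuckDB's `BIGINT` is a signed 64-bit integer, so a `UInt64` value above 2^63 - 1 would not fit. If such values are possible in the source, `UBIGINT` may be the safer target.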

exhibit c · provenance

Where it ran — full provenance.

Not a laptop. A fresh, repo-defined islo microVM, captured live and then torn down.

sandbox id	019ea238-1f81-7950-80a9-1b80a5e0b556
image	docker.io/library/python:3.12
kernel	Linux 6.16.9+ · x86_64
vCPU / memory	4 vCPU · 3930 MB
compute region	ca.compute.islo.dev (Canada)
source engine	ClickHouse 26.6.1.472
destination engine	DuckDB 1.5.3
bytes read	9,450,617 (~9.0 MiB)
batches	10 × 5,000 records
read / write split	0.186 s read · 0.145 s write

exhibit d · the run, phase by phase

#	phase	what happens
1	bootstrap	ClickHouse binary + Python venv into the fresh box
2	boot_clickhouse	local server up, answering on HTTP
3	seed	50,000 deterministic events: the system of record
4	discover	source catalog read, types mapped
5	write_setup	typed destination table created from the mapped catalog
6	sync	RECORD batches out, bulk-load in (0.332 s)
7	verify	four parity checks; non-zero exit unless all pass
8	analytics	queries on the destination prove it's usable
9	emit	metrics.json + STATE: evidence for the loop

Nine ::CRABBOX_PHASE:: markers split the job into steps the orchestrator can time, attach evidence to, and reason over.
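To illustrate how an orchestrator might split a log on these markers, here is a minimal Python sketch. It assumes each marker sits at the start of its own line, as in the raw log on this page. The function name `split_phases` is hypothetical and this is not Crabbox's own parser.

```python
MARKER = "::CRABBOX_PHASE::"


def split_phases(log_text):
    """Group log lines under the phase marker that precedes them.

    Returns a dict of phase name -> list of lines, in log order.
    Lines before the first marker are ignored.
    """
    phases = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            current = line[len(MARKER):].strip()
            phases[current] = []
        elif current is not None:
            phases[current].append(line)
    return phases
```

A phase with no output, such as `verify` in the log on this page, still gets an empty entry, so a missing phase can be told apart from a silent one.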

exhibit e · raw log

The raw tail, unedited.

Straight from the sandbox. Full artifacts live in the repo.

airbyte-etl · /workspace/agentic-airbyte/poc · islo · exit 0

::CRABBOX_PHASE::bootstrap
[e2e] installing clickhouse static binary
::CRABBOX_PHASE::boot_clickhouse
[e2e] ClickHouse up: 26.6.1.472
::CRABBOX_PHASE::seed
[seed] analytics.events ready: rows=50000 total_revenue=611815.02
::CRABBOX_PHASE::discover
{"type":"LOG","log":{"level":"INFO","message":"discovered stream 'events'"}}
::CRABBOX_PHASE::write_setup
::CRABBOX_PHASE::sync
{"type":"LOG","log":{"level":"INFO","message":"synced 5000/50000 records"}}
          … 10 batches …
{"type":"LOG","log":{"level":"INFO","message":"synced 50000/50000 records"}}
::CRABBOX_PHASE::verify
::CRABBOX_PHASE::analytics
::CRABBOX_PHASE::emit
{"type":"STATE","state":{"records_moved":50000,"status":"SUCCEEDED"}}
{"type":"LOG","log":{"message":"sync SUCCEEDED: moved 50000 rows in 0.549s (150700.2 rows/s); checks_passed=True"}}
EXIT=0

What this proves — and what it doesn't.

This run uses the Airbyte source→destination contract on a custom-connector (Airbyte CDK) path, not a full packaged connector deployment — that's what lets it run self-contained in a sandbox in under a second. What it does prove is the part that matters for agentic data movement: a goal-driven worker can be dispatched into an isolated box, move real typed data end-to-end, and return evidence strong enough — a byte-exact checksum — for a harness to trust the result and decide what to do next. The full proof appendix walks every phase.

§03 the contracts

A useful agent output is not prose.

It is three contracts: a spec the agent writes, a handoff a runner can execute, and a repair rule that keeps the proof shape intact.

ai-agent-dispatch.sh

# Goal: sync CRM accounts into the warehouse safely.

crabbox pool ensure example-org/data-movement/main/provider/linux/etl \
  --min-ready 3 --create -- --cache-volume airbyte-etl

cat > .crabbox/generated/accounts-sync.json <<'JSON'
{
  "movement": "source_to_target",
  "source_ref": "source.crm.accounts",
  "target_ref": "warehouse.analytics.accounts",
  "credential_profile": "etl-warehouse",
  "allow_env": ["AIRBYTE_*", "SOURCE_*", "TARGET_*"],
  "idempotency_key": "accounts_sync:daily",
  "retry": { "max_attempts": 2, "when": ["rate_limit", "transient_network"] },
  "validation": ["row_count", "schema_drift", "freshness"],
  "artifacts": ["reports/**", "metrics.json", "redacted-config.json"],
  "redact": ["password", "token", "secret"]
}
JSON

crabbox run --pool example-org/data-movement/main/provider/linux/etl \
  --shell 'python -m workers.airbyte_sync --config .crabbox/generated/accounts-sync.json' \
  --allow-env 'AIRBYTE_*,SOURCE_*,TARGET_*' \
  --env-from-profile etl-warehouse \
  --artifact-glob 'reports/**,metrics.json,redacted-config.json' \
  --junit reports/

crabbox results <run-id> --json
crabbox artifacts download <run-id> --out evidence/<run-id>

The spec names things. (plan)

The agent writes references and rules — source, target, profile name, allowlists, validation, retry, redaction, artifact globs. It never writes secret values or row payloads.

Never: raw secrets · copied rows · prompt transcripts
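One way a harness might enforce that a spec carries references rather than secrets is to scan its keys against the spec's own `redact` words before dispatch. The sketch below is a hypothetical pre-flight check, not part of Crabbox; the function name `find_secret_fields` and the dotted-path output format are assumptions.

```python
def find_secret_fields(spec, redact_words, path=""):
    """Return dotted paths of keys whose names contain a redact word.

    spec: parsed job JSON (dicts, lists, scalars).
    redact_words: lowercase substrings such as "password" or "token".
    """
    hits = []
    if isinstance(spec, dict):
        for key, value in spec.items():
            here = f"{path}.{key}" if path else key
            if any(word in key.lower() for word in redact_words):
                hits.append(here)
            hits.extend(find_secret_fields(value, redact_words, here))
    elif isinstance(spec, list):
        for index, item in enumerate(spec):
            hits.extend(find_secret_fields(item, redact_words, f"{path}[{index}]"))
    return hits
```

Run against the `accounts-sync.json` spec above with its own `redact` list, this should return an empty list, since that spec names a `credential_profile` instead of embedding credentials. A key-name scan is only a heuristic: it will not catch a secret stored under an innocuous key.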

The handoff is executable. (lease)

Crabbox receives a pool id, command, profile name, and artifact contract. It returns a run id and a bounded evidence bundle — nothing else crosses.

Never: unbounded shell · missing run id · missing artifact capture

Repair preserves the proof shape. (repair)

The next run changes one bounded input tied to the failing owner. Validation, redaction, and artifact capture stay on, so every attempt stays comparable.

Never: changing many inputs at once · retrying partial writes without idempotency
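The repair rule can be checked mechanically by diffing two job specs. The sketch below assumes specs are flat-enough dicts compared at the top-level key, and treats `validation`, `redact`, and `artifacts` as the "proof shape" that must not change. Both choices are assumptions made for this illustration.

```python
# Keys assumed to define the proof shape; a repair must leave them untouched.
PROOF_KEYS = ("validation", "redact", "artifacts")


def changed_inputs(previous, repaired):
    """Return the sorted top-level keys whose values differ between specs."""
    keys = previous.keys() | repaired.keys()
    return sorted(k for k in keys if previous.get(k) != repaired.get(k))


def is_bounded_repair(previous, repaired):
    """True when exactly one input changed and the proof shape is intact."""
    changed = changed_inputs(previous, repaired)
    return len(changed) == 1 and changed[0] not in PROOF_KEYS
```

For example, a repair that only swaps `credential_profile` passes, while one that also drops a validation check is rejected because the two attempts would no longer be comparable.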

§04 the failure map

First find the owner. Then read the signal.

Failures are not mysteries — they are boundary breaks. Each class tells you where to look first and what you are allowed to change.

class	when	owner	signal to read	smallest repair
F1 · Plan	before lease	the agent	spec diff	fix refs, profile, validation, or retry policy, then rerun
F2 · Capacity	before command	Crabbox	pool + lease status	fix pool capacity, image, cache, or repo hydration
F3 · Credentials	before sync	profile	redacted env report	fix the profile mapping or `allow_env`
F4 · Connector	during sync	Airbyte / source	connector log	fix auth scope, schema, API limit, or cursor
F5 · Partial write	after write	worker / target	target counts + sync state	verify idempotency before any retry
F6 · Validation	after proof	the next plan	JUnit + metrics	compile a repair job from the failing checks, or alert

triage rule — owner + signal = the one bounded input the next run is allowed to change.
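The failure map is small enough to encode as a lookup table that a harness could consult before compiling the next run. This sketch copies owners, signals, and repairs from the table above; the `Triage` type and `triage` function are hypothetical names.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    owner: str
    signal: str
    repair: str


# Rows copied from the failure map; keys are the class ids F1..F6.
FAILURE_MAP = {
    "F1": Triage("the agent", "spec diff",
                 "fix refs, profile, validation, or retry policy"),
    "F2": Triage("Crabbox", "pool + lease status",
                 "fix pool capacity, image, cache, or repo hydration"),
    "F3": Triage("profile", "redacted env report",
                 "fix the profile mapping or allow_env"),
    "F4": Triage("Airbyte / source", "connector log",
                 "fix auth scope, schema, API limit, or cursor"),
    "F5": Triage("worker / target", "target counts + sync state",
                 "verify idempotency before any retry"),
    "F6": Triage("the next plan", "JUnit + metrics",
                 "compile a repair job from the failing checks, or alert"),
}


def triage(failure_class):
    """Return the owner, signal, and smallest repair for a failure class."""
    try:
        return FAILURE_MAP[failure_class]
    except KeyError:
        raise ValueError(f"unknown failure class: {failure_class}") from None
```

Keeping the map as data rather than branching logic means a new failure class is one added row, and an unknown class fails loudly instead of falling through to a default repair.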

§05 run it yourself

One command. Same bytes, every time.

The seed is deterministic, so the SHA-256 on the receipt is reproducible. Borrow a box, hydrate it from the repo, run the proof, tear it down.

The islo way — persistent box

Lease a sandbox, hydrate from the repo, run the proof.

your machine

islo use airbyte-etl \
  --config poc/islo.yaml \
  --source github://zozo123/agentic-airbyte \
  -- bash poc/run_e2e.sh

The crabbox way — ephemeral worker

Dispatch it as a governed run with evidence capture.

your harness

crabbox run --pool org/data-movement/main/... \
  --shell 'bash poc/run_e2e.sh' \
  --artifact-glob 'poc/reports/**' \
  --junit poc/reports/

source — run_e2e.sh · worker/etl.py · worker/seed.py · islo.yaml

The agent plans. Crabbox runs. Airbyte moves. Evidence decides.

The cast — who runs what

The agent

Crabbox

islo sandbox

Airbyte worker

The evidence

The loop is simple because the boundaries are hard.