
Orchestrating agents with GitHub Actions

I moved my whole autonomous coding pipeline onto GitHub Actions in two days, because Anthropic shut the door on the previous setup overnight. Notes on what changed and what I would do differently.

  • agents
  • github-actions

Getting an AI agent to write code is the easy part. The hard problem, the one most teams hit after the novelty fades, is everything around it. How do you schedule agents reliably and know when they have silently failed? How do you replicate the setup to a new project without rebuilding it? The orchestration layer is where autonomous coding pipelines actually break.

I learned that while running an agent factory on IGNIO. The earlier post describes the principles that made it work; this one covers what happened when I had to move the whole thing onto GitHub Actions in two days, and what I learned along the way.

The forced migration

The factory ran through OpenClaw on my local CI machine, spawning Claude Code sessions backed by subscription tokens. OpenClaw was the orchestration layer, and it was the weak point. If the process crashed, the pipeline went silent. The audit trail was whatever I could pull out of session logs after the fact. Replicating the setup to another repo meant copying scripts manually and hoping nothing was missed.

I already knew the architecture was wrong. Then, on April 4, Anthropic made it urgent. They blocked subscription tokens from third-party tools like OpenClaw. Developers had been routing frontier AI through flat-rate subscriptions while consuming compute that should have been billed per token. Anthropic closed the arbitrage.

I bit the bullet. I moved everything to pay-per-token API keys, dropped the spawned subscription sessions, and chose GitHub Actions as the new orchestration runtime. No grey areas, no workarounds.

Why GitHub Actions, not the official action

Anthropic does ship a first-party GitHub Action, and it is excellent for interactive cases: PR review, @claude mentions, and similar conversational triggers. It is the wrong fit for long-running autonomous agents. The action is optimized for shorter interactions, and its permission model requires enumerating every allowed tool explicitly. For ninety-minute sessions with five hundred max turns doing full-issue implementation, calling the Claude Code CLI directly is simpler and gives full control. The pipeline’s real complexity is shell logic that runs around the Claude step (board scanning, priority ranking, dependency detection, GraphQL mutations), and the action does not help with any of that.
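A minimal sketch of what calling the CLI directly can look like from a runner step, written in Python rather than the pipeline's shell. It assumes the `claude` binary is on the runner's PATH and uses the CLI's print mode (`-p`) plus `--model` and `--max-turns`. The ninety-minute timeout and 500-turn cap come from the text; the function names and the exit code 124 (borrowed from GNU `timeout`) are illustrative. Permission flags are deliberately left out, so check `claude --help` for the options your installed version supports before relying on this.

```python
import subprocess

# Limits taken from the text; adjust per project.
SESSION_TIMEOUT_SECONDS = 90 * 60
MAX_TURNS = 500


def build_claude_command(prompt: str, model: str, max_turns: int = MAX_TURNS) -> list[str]:
    """Return argv for a headless Claude Code run (print mode exits when done)."""
    return [
        "claude",
        "-p", prompt,
        "--model", model,
        "--max-turns", str(max_turns),
    ]


def run_session(prompt: str, model: str) -> int:
    """Run one session and return an exit code the workflow step can fail on."""
    try:
        result = subprocess.run(
            build_claude_command(prompt, model),
            timeout=SESSION_TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; report it as a failure
        # instead of letting the job hang.
        return 124
    return result.returncode
```

A step-level `timeout-minutes` in the workflow file gives a similar guarantee at the runner level; the in-process timeout just makes the failure mode explicit in the script.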

The migration took about two days. The machine running the agents did not change; my self-hosted runner is the same dev environment I always use. What changed is everything else: orchestration now lives in version-controlled workflow files inside .github/workflows/, every run is auditable through the Actions UI, and failures surface as notifications instead of silence. Copying the directory to a new repo replicates the entire pipeline.
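Failure alerts do not need much machinery. The sketch below assumes the alerts go to a Telegram bot (the pipeline already uses Telegram). It builds a message from GitHub Actions' default environment variables (`GITHUB_WORKFLOW`, `GITHUB_REPOSITORY`, `GITHUB_SERVER_URL`, `GITHUB_RUN_ID`) and posts it with the Bot API's `sendMessage` method. The secret names `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are hypothetical; the script would run as a final step guarded by `if: failure()`.

```python
import json
import os
import urllib.parse
import urllib.request


def build_failure_message(env: dict[str, str]) -> str:
    """Compose a one-line alert plus a link to the failed run in the Actions UI."""
    run_url = (
        f"{env['GITHUB_SERVER_URL']}/{env['GITHUB_REPOSITORY']}"
        f"/actions/runs/{env['GITHUB_RUN_ID']}"
    )
    return f"FAILED: {env['GITHUB_WORKFLOW']} in {env['GITHUB_REPOSITORY']}\n{run_url}"


def send_telegram(token: str, chat_id: str, text: str) -> dict:
    """POST to the Telegram Bot API sendMessage method and return the JSON reply."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(url, data=payload, timeout=10) as resp:
        return json.loads(resp.read())


if __name__ == "__main__" and "TELEGRAM_BOT_TOKEN" in os.environ:
    # Hypothetical secret names, exposed to the step via `env:` in the workflow.
    send_telegram(
        os.environ["TELEGRAM_BOT_TOKEN"],
        os.environ["TELEGRAM_CHAT_ID"],
        build_failure_message(dict(os.environ)),
    )
```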

The architecture, in one paragraph

Six workflows make up the system. The Implementer runs hourly, picks the top-priority issue on the board, and hands it to the Claude Code CLI for a full session. The Fixer runs every thirty minutes, scans autoagent PRs for CI failures or review feedback, and asks Claude for a patch. The Merger runs every two hours, processes green PRs sequentially, and uses Claude to verify that review feedback was actually addressed before landing. Two supporting workflows handle the rest: Board Sync moves project cards on PR events, and Week Rollover creates the next milestone and carries forward unfinished work every Monday. The whole thing is shell, GraphQL, and Telegram. No framework, no SDK, no dependencies beyond gh, jq, and curl. The full architecture is on the project page.
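The Fixer and Merger both start by sorting open PRs into "needs a patch" and "ready to land". Here is a minimal Python sketch of that triage, assuming PR records shaped like the output of `gh pr list --json number,labels,statusCheckRollup,reviewDecision` and assuming agent PRs carry an `autoagent` label. The label name and routing rules are assumptions, not the pipeline's actual logic. The sketch also simplifies on purpose: a PR with requested changes always goes to the Fixer, whereas the real Merger asks Claude whether the feedback was addressed.

```python
# Check outcomes treated as green vs. red. CheckRun entries report `conclusion`;
# legacy commit statuses (StatusContext) report `state`.
PASSING = {"SUCCESS", "NEUTRAL", "SKIPPED"}
FAILING = {"FAILURE", "ERROR", "TIMED_OUT", "CANCELLED", "ACTION_REQUIRED", "STARTUP_FAILURE"}


def check_state(check: dict) -> str:
    # An in-progress CheckRun has no conclusion yet, so it falls through to PENDING.
    return (check.get("conclusion") or check.get("state") or "PENDING").upper()


def triage(prs: list[dict], label: str = "autoagent") -> tuple[list[int], list[int]]:
    """Split agent PRs into (fixer_queue, merger_queue); pending PRs are left alone."""
    fixer, merger = [], []
    for pr in prs:
        if label not in {lbl["name"] for lbl in pr.get("labels", [])}:
            continue  # not opened by the agent
        states = [check_state(c) for c in pr.get("statusCheckRollup") or []]
        ci_failed = any(s in FAILING for s in states)
        ci_green = bool(states) and all(s in PASSING for s in states)
        if ci_failed or pr.get("reviewDecision") == "CHANGES_REQUESTED":
            fixer.append(pr["number"])
        elif ci_green:
            merger.append(pr["number"])
    # The Merger lands PRs one at a time; oldest (lowest number) first is an assumption.
    return sorted(fixer), sorted(merger)
```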

Steering work with six labels

The selection logic in the Implementer’s orchestrator is the smallest piece of the system, and the one that has paid off the most.

The board uses six priority labels, p0 through p5. The orchestrator sorts candidate issues by priority first, then by issue body length as a tiebreaker, longest first. Longer descriptions tend to produce cleaner PRs, because the agent has more constraints to work against. Vague issues fail more often, so the bot self-selects toward the issues you actually thought about. No dependency graph resolver, no topological sort, no priority queue with weighted scoring. Just labels.
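The ranking rule is small enough to show in full. A minimal Python sketch, assuming issue records shaped like the output of `gh issue list --json number,labels,body`. Two behaviors are assumptions rather than the pipeline's documented rules: issues without a pN label are skipped, and an issue with several priority labels takes the most urgent one.

```python
import re
from typing import Optional

# Priority labels p0 (most urgent) through p5.
PRIORITY_LABEL = re.compile(r"^p([0-5])$")


def priority(issue: dict) -> Optional[int]:
    """Return the most urgent pN label on the issue, or None if it has none."""
    ranks = []
    for label in issue.get("labels", []):
        match = PRIORITY_LABEL.match(label["name"])
        if match:
            ranks.append(int(match.group(1)))
    return min(ranks) if ranks else None


def rank_issues(issues: list[dict]) -> list[dict]:
    """Order candidates by priority, then by body length (longest first)."""
    labeled = [i for i in issues if priority(i) is not None]
    return sorted(labeled, key=lambda i: (priority(i), -len(i.get("body") or "")))


def pick_next_issue(issues: list[dict]) -> Optional[dict]:
    """The issue the next hourly run would hand to the agent, if any."""
    ranked = rank_issues(issues)
    return ranked[0] if ranked else None
```

Reprioritizing is then just relabeling: moving an issue to p0 puts it at the front of the next run's ranking, with no other state to update.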

Reprioritization is trivial: change a label, and the orchestrator picks it up on the next hourly run. When I tagged an unbuilt prerequisite as p0 and the rest of the phase as p1, the agent stopped getting confused by the dependency chain and started working in the right order.

What’s auditable, what’s conversational

OpenClaw stayed in the picture as the conversational channel, now running on API keys. The split turned out to be more than a workaround. It maps to a real distinction in how agent work breaks down.

Some work is inherently conversational. One evening I asked my agent to disable all the agentic workflows on a repo, comment out the cron triggers, push directly to main, then create a new week-rollover workflow. That required reading each file, deciding what to comment versus delete, and pushing multiple commits. Some edge cases needed judgment, like the fact that the Copilot workflow could not be disabled via API. A ten-minute back-and-forth with a decision at every step. You would not put that in a workflow file. It needs context and judgment, including the ability to ask “wait, should I also disable the cron jobs on the OpenClaw side?”

So I run two systems: GitHub Actions for autonomous repeatable work (issue implementation, CI fixes, board and sprint management) and Telegram for interactive contextual work (elaborating issues, quick reactions, cross-project coordination). It is the right split. Some work needs autonomy, some needs conversation, and forcing both into one system makes both worse.

The numbers

After several weeks of running the GitHub Actions version: 240 issues implemented, 232 PRs opened, and a 95% success rate, where “success” means merged without significant manual intervention. The five percent that needed help were usually issues with ambiguous requirements or edge cases that required domain knowledge the agent did not have. Writing better issue descriptions directly raises the success rate.

What I would do differently

Three things stand out from running this at scale.

Write better issue descriptions from the start. The number-one predictor of autonomous implementation success is the quality of the issue. Vague issues produce vague PRs. Issues with clear acceptance criteria and references to existing code produce clean implementations. Every hour spent on the issue description saves three on the PR review.

Start with the Fixer, not the Implementer. The Fixer delivers value immediately on any project with CI; it just watches for failures and pushes patches. The Implementer needs a project board, priority labels, and an orchestrator before it can do anything. If I were starting over, I would deploy the Fixer first and add the rest incrementally.

Watch token usage. Running Opus with a five-hundred-turn cap is expensive, so track API costs per workflow run. Sonnet works fine for straightforward issues and costs a fraction of what Opus does. A model dropdown on the workflow inputs lets you choose per run, and that switch alone changed the monthly cost more than any optimization in the pipeline.
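Per-run cost tracking can be as simple as multiplying token counts by per-model rates and summing by workflow. A minimal sketch, assuming you already collect input and output token counts for each run (for example from the CLI's usage report or the Anthropic console). The rates are parameters because prices change, so fill them in from the current pricing page rather than hardcoding them. Prompt-cache reads and writes are billed differently and are ignored here for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens for one model; fill in from current pricing."""
    input_per_mtok: float
    output_per_mtok: float


def run_cost_usd(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Estimated cost of one run, ignoring prompt-cache pricing."""
    return (input_tokens * rates.input_per_mtok + output_tokens * rates.output_per_mtok) / 1_000_000


def cost_by_workflow(runs: list[dict], rates_by_model: dict[str, Rates]) -> dict[str, float]:
    """Sum the estimated cost of each run under its workflow name."""
    totals: dict[str, float] = defaultdict(float)
    for run in runs:
        rates = rates_by_model[run["model"]]
        totals[run["workflow"]] += run_cost_usd(run["input_tokens"], run["output_tokens"], rates)
    return dict(totals)
```

Pricing the same token counts under two models' rates is a quick way to see what the model dropdown is worth before switching a workflow's default.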

The takeaway

You do not need a complex framework. GitHub Actions and Claude Code, driven by a well-structured prompt, are enough to build a full autonomous development pipeline. The orchestration is shell scripts and GraphQL queries. The intelligence comes from the model, not the framework.

When the platform you were using changes its rules overnight, having your agent infrastructure on GitHub’s own rails means you are standing on solid ground.