Ko must explicitly provide or approve the stream before task work begins.
If no stream or area was provided, stop and ask. You may suggest likely streams, but never choose one silently. Default only when already established by Ko or steering:
Whole Task Flow
This skill owns the complete task lifecycle:
task.pr must promote the task into the stream and return the stream review PR unless Ko explicitly asks for task-only mode.
Canonical flow:
OS Tool Surface
The OS app exposes exactly two MCP tools:
task.start returns data.taskSession.
Treat that exact value as the task handle for the rest of the task:
Pass taskSession at the top level of every task-scoped os.call:
Do not rely on:
- task.pin
- root .task/current.json
- current branch
- ambient task selection
- taskSession inside input
The server propagates the top-level taskSession into the facade. input.taskSession exists only for compatibility and must match the top-level value if both are present.
Branch names still matter for GitHub, PRs, logs, and debugging. Normal OS calls should be task-session scoped instead of branch-threaded.
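As a rough sketch of the top-level rule above, the helper below builds a task-scoped call payload and rejects a mismatched input.taskSession. The payload keys `tool` and `input` and the helper name `build_os_call` are assumptions made for illustration; only the top-level taskSession placement and the must-match rule come from the text.

```python
from typing import Any, Dict, Optional


def build_os_call(
    tool: str,
    task_session: str,
    input_data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a hypothetical task-scoped os.call payload.

    The payload layout is an assumption; the real schema may differ.
    """
    if not task_session:
        raise ValueError("taskSession is required for task-scoped calls")

    payload_input = dict(input_data or {})
    nested = payload_input.get("taskSession")
    # input.taskSession is compatibility-only and must match the top-level value.
    if nested is not None and nested != task_session:
        raise ValueError("input.taskSession must match the top-level taskSession")

    return {"tool": tool, "taskSession": task_session, "input": payload_input}
```

Keeping the handle at the top level means nested input never has to carry task context, which matches the propagation behavior described above.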
Phase 1 — Orient in the Stream
Run stream context before starting task work:
Run stream.list first only when the correct stream area is unknown:
Check the following:
- stream branch
- current open task PRs
- recent stream commits
- current worktrees
- stream ahead/behind state
- any obvious active work that could conflict
Phase 2 — Create the Task
Create one focused task branch:
Use startFrom: "stream" only when Ko explicitly says the task should stack on current stream work, or when the task is a direct follow-up to unshipped stream changes.
Capture these fields from the result:
- data.taskSession
- data.branch or data.taskBranch
- data.worktreePath
- data.prUrl
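The captured fields above could be pulled out of a task.start result along these lines. This is a minimal sketch that assumes the result is a plain dictionary with a `data` object; the function name `capture_task_start` is hypothetical.

```python
from typing import Any, Dict, Optional


def capture_task_start(result: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Extract the fields worth keeping from a task.start result.

    Assumes a dict-shaped result with a `data` object (not confirmed by the source).
    """
    data = result.get("data") or {}
    task_session = data.get("taskSession")
    if not task_session:
        raise ValueError("task.start result is missing data.taskSession")

    return {
        "taskSession": task_session,
        # The branch may arrive under either key.
        "branch": data.get("branch") or data.get("taskBranch"),
        "worktreePath": data.get("worktreePath"),
        "prUrl": data.get("prUrl"),
    }
```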
Use taskSession for later task-scoped calls:
Use task.pr results to inspect PR state.
Use task.prs only if that tool is present in the current manifest.
For diffs, use the workspace/GitHub tool surface where available. Only fall back to task.exec for git diff when there is not yet a typed tool that exposes the exact local diff you need.
Task Session Handling — Canonical Task Context
taskSession is the canonical handle for task-scoped work.
When task.start returns:
VALIDATION_ERROR.
If a task-scoped call returns TASK_SESSION_REQUIRED or TASK_SESSION_NOT_FOUND, first check that the exact taskSession returned by task.start was passed at the top level. Do not switch to branch-threading or root task metadata as the default recovery path.
Inside code.run and batch, pass taskSession on the outer os.call. Nested workspace.* calls inherit task context.
Non-Negotiable: Scoped Workpad Writes
Every task must keep its task-local scoped workpad current enough for another agent to continue without chat history. The workpad is the durable task record. Chat, terminal output, traces, and memory are supporting evidence; the scoped workpad is where the task’s current truth belongs.
The active task workpad lives at:
- Immediately after task.start
  - acceptance criteria
  - plan
  - initial assumptions
  - Test-first contract stub
- Before any meaningful production code edit
  - behavior under test
  - existing local test pattern to follow
  - new or changed tests
  - focused red command
  - expected red failure
  - no-test waiver, only when genuinely appropriate
- After the focused red run
  - red command
  - red result
  - meaningful failure signal
- Before task.push, task.pr, or task.finish
  - final summary
  - files changed
  - key decisions
  - green evidence
  - broader validation evidence
  - issues encountered
Also update the workpad when:
- a blocker appears
- the plan changes
- context reveals prior decisions
- validation fails
- metadata conflicts appear
- a tooling gap is discovered
Temporary and Smoke-Test Task PRs
Temporary validation tasks must be obvious from the title, branch, and PR. Use this naming format:
Account for each temporary task PR with:
- temporary task PR link
- whether it was closed, merged, or intentionally left open
- reason if left open
- any cleanup that remains
Phase 3 — Research Through the Decision Engine and Context
Use the decision engine and project context before direct symbol hunting, and return to them whenever new uncertainty appears. Context and exploration are not one-time kickoff steps; they are tools for staying aligned throughout the task.
Use exploit when the path is clear enough to commit to an editing target:
Use fs.search only after the decision engine has narrowed the direction:
Confidence Guidance
| Score | Action |
|---|---|
| < 0.55 | Continue evidence gathering |
| 0.55–0.75 | Read at least one more high-value implementation file or test |
| >= 0.75 | Proceed when acceptance criteria and edit target are clear |
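The table above can be read as a simple threshold function. The sketch below assumes scores fall in the range 0 to 1 (the source does not state the range) and resolves the shared boundary by treating exactly 0.75 as the ">= 0.75" row; the function name is hypothetical.

```python
def confidence_action(score: float) -> str:
    """Map a confidence score to the action from the guidance table.

    Assumes a 0..1 score range; 0.75 itself falls in the ">= 0.75" row.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score is assumed to be between 0 and 1")
    if score < 0.55:
        return "Continue evidence gathering"
    if score < 0.75:
        return "Read at least one more high-value implementation file or test"
    return "Proceed when acceptance criteria and edit target are clear"
```

A high score still only permits proceeding when the acceptance criteria and edit target are clear; it does not replace validation.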
Context Is a Live Memory Layer
Use context.search throughout the task, not only at kickoff.
The workspace context store contains prior handoffs, workpads, task notes, decisions, failures, and session memory. Treat it as project memory.
It often explains:
- why a workflow exists
- why a previous agent chose a path
- which failures already happened
Search context at these points:
- task kickoff
- planning
- implementation when a pattern seems non-obvious
- validation failures
- repeated tool failures
- merge or metadata conflicts
- surprising test behavior
- production/debugging investigations
- final handoff or workpad writing
Search again when:
- a tool fails in a surprising way
- tests fail for unclear reasons
- the code disagrees with the plan
- an old handoff gives shell commands
- a branch or PR state looks stale
- another agent’s work appears nearby
- a repeated workaround starts to look like a missing tool
Use context.search results directly when they contain enough detail. If the result points to a handoff, workpad, or file, read that source through the matching typed workspace tool.
Use short, strong keywords instead of long sentence searches.
Good keywords:
Each source answers a different question:
- explore finds where to look in the repo.
- context.search finds what the team already learned.
- fs.read verifies what the code says now.
- tests/logs/runtime prove what actually happens.
Phase 4 — Use code.run for Semantic Workspace Workflows
Use code.run as the default tool for semantic workspace work: one task that requires several related workspace operations.
A semantic workflow is work where the next step depends on what the previous step found, or where the result needs to be summarized before deciding what to do next.
Use code.run for:
- investigate a failure
- search → read → decide
- read → edit → reread
- edit → validate
- inspect task/PR state and explain what remains
- run a focused validation and return only the important output
- coordinate several typed workspace tools in one pass
- avoid long shell strings, heredocs, and repeated tool-call chatter
Inside code.run, prefer the typed workspace.* helper surface.
Pass taskSession on the outer os.call; nested calls inherit the task context.
Choose the mode that matches the work:
- mode: "read" for investigation
- mode: "edit" when writes may happen
- mode: "verify" for validation orchestration
code.run is a composer, not a policy bypass. It must use the same typed workspace tools the agent would call directly.
The underlying tools still own:
- schemas
- task scoping
- branch/worktree routing
- durable-action boundaries
- trace IDs
- review gates
Do not use code.run for final durable transitions such as:
- task.push
- task.pr
- stream merges
- deploys
- publishing
- destructive cleanup
Run those as direct os.call operations so the state transition is visible.
Phase 4b — Use batch Only for Fixed Independent Lists
Use batch when the steps are already known and do not require reasoning between results.
Good batch uses:
- read several known files
- run several read-only inspections
- collect status + doctor + context output
- execute a fixed mechanical checklist where later steps do not depend on earlier outputs
Use code.run instead when:
- later steps depend on earlier results
- output needs trimming or summarizing
- the workflow needs branching logic
- a failure should change what happens next
- edits and rereads happen together
batch shape:
Inside batch, pass taskSession on the outer os.call. Child steps inherit task context.
Do not pass raw branch state unless debugging task-session routing.
Batch must preserve guardrails. The server recursively inspects child steps before execution.
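The choice between batch and code.run described above can be sketched as a small predicate. The `WorkflowTraits` class and `choose_composer` function are hypothetical names; the five traits mirror the conditions listed for preferring code.run.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WorkflowTraits:
    """Traits that push a workflow toward code.run (illustrative only)."""

    later_steps_depend_on_results: bool = False
    output_needs_summarizing: bool = False
    needs_branching_logic: bool = False
    failure_changes_next_step: bool = False
    edits_and_rereads_together: bool = False


def choose_composer(traits: WorkflowTraits) -> str:
    """Return "code.run" if any trait applies, otherwise "batch"."""
    return "code.run" if any(asdict(traits).values()) else "batch"
```

A fixed list of independent reads has none of these traits, so it stays with batch; any single trait is enough to move the work to code.run.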
Phase 5 — Implement With Typed Tools First
Make changes only inside the task worktree. Prefer typed workspace tools and code.run over raw shell commands.
Tool preference order:
- context.search, explore, decideNext, and confidenceScore for discovery and prior context.
- code.run for semantic workflows that compose multiple typed tools.
- fs.read, fs.search, fs.list, fs.patch, fs.write, and fs.trash for exact file work.
- batch for independent read-only calls or fixed mechanical checklists.
- git.diff for structured diff inspection after edits.
- status, audit, review.run, verify, task.push, task.pr, and task.merge for known workflows.
- github for GitHub/PR state; current gh only as a temporary fallback.
- task.exec only for focused package/test/build commands with no typed equivalent.
Keep the workpad current with:
- acceptance criteria
- implementation plan
- files changed
- key decisions
- notes for Ko
- improvements noticed
- errors or blockers
- validation commands and results
- context searched and relevant prior handoffs/workpads found
While implementing:
- search context before guessing about prior decisions
- use explore before broad file search
- read relevant policy/control files fully
- use decideNext when the next step is unclear
- patch only after the edit target is supported by evidence
- reread changed ranges after writing
- run the smallest meaningful test first
- run review/verify gates as appropriate
- record surprising failures in the workpad
After edits, inspect changes with git.diff.
Use summary-first diff inspection:
base:
Use patch: true only when the actual changed lines are needed:
Do not run git diff through task.exec unless git.diff cannot express the needed view. Repeated fallback diff usage is a tooling gap.
For detailed diff review, prefer bounded output. Do not return giant diffs into chat.
Dev Tooling First for Service-Backed Workflows
When a task needs local infrastructure, prefer the dev workspace tool surface instead of hand-written setup. Use dev for:
- starting/stopping Postgres and Redis
- verifying pgvector or required extensions
- exporting Keychain-backed credentials into the server process
- starting app services with the correct env
- starting public tunnels for webhooks/callbacks
- checking public callback reachability
- running scenario modes
- inspecting Redis locks safely
- collecting transcripts and runtime logs
Phase 6 — Validate Before Publishing
Run validation that matches the change. For syntax-level changes:
Use noReview, noDb, or focused validation only when the reason is explicit and recorded in the workpad.
Do not publish based on vibes. Fix failures in scope. Stop and escalate when the failure requires product, architecture, destructive, or cross-task judgment.
Run Development E2E Validation for Behavior Changes
For product behavior changes, focused unit tests are not enough. The agent must run the smallest meaningful end-to-end path that proves the changed behavior in a realistic development environment. Use the dev workspace tooling when available. Treat it as the standard production-like local environment for behavior validation. Minimum dev tooling capabilities:
- Postgres
- Redis
- Twilio live/test credential loading
- safe test number allowlists
- public callback tunnel support
- server start/restart with env loaded
- callback route reachability checks
- scenario/test-number execution
- lock inspection
- transcript/log artifact capture
Validation layers:
- focused unit/integration tests
- local service-backed E2E through dev
- mock scenario where applicable
- live/test-provider scenario where applicable
- production/deployed log validation when the change affects production runtime
Phase 7 — Prove the Change
Behavior Proof Standards
For behavior changes:
- run the exact script, tool, endpoint, UI flow, scenario, or E2E path
- use dev tooling for service-backed workflows when available
- inspect state/logs/callbacks after the run
- then use confirm when it adds evidence
- typecheck
- focused tests
- review/verify
- run the actual endpoint, script, workflow, UI flow, or scenario
- inspect logs/state after the run
- verify database/Redis/job/callback side effects where relevant
For provider-backed workflows:
- use mock mode first
- use provider test mode where it proves request construction or deterministic provider errors
- use live mode only with explicit safe allowlists and required credentials
- verify callbacks/webhooks through reachable public URLs before placing live calls
- confirm cleanup: no orphaned jobs, calls, locks, sessions, or stale state
A complete E2E proof shows:
- request entered the intended app contract
- backend service handled the request
- external provider was called only when intended
- callback/webhook returned to the app
- terminal state was recorded
- locks/resources were released
- logs/transcripts redact secrets and full phone numbers
| Task start point | Review/verify base |
|---|---|
| startFrom: "main" | origin/main |
| startFrom: "stream" | origin/stream/(area) |
If startFrom is main, validate/review against origin/main.
If startFrom is stream, validate/review against origin/stream/(area).
Use the relevant truth source. Confirmation should produce evidence. A syntax check alone is not enough for a behavior change.
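The start-point table above reduces to a two-case lookup. The sketch below uses a hypothetical `review_base` helper; the area argument stands in for the `(area)` placeholder and is not a real stream name.

```python
def review_base(start_from: str, area: str = "") -> str:
    """Return the review/verify base for a task start point."""
    if start_from == "main":
        return "origin/main"
    if start_from == "stream":
        if not area:
            raise ValueError("a stream area is required when startFrom is 'stream'")
        return f"origin/stream/{area}"
    raise ValueError(f"unknown startFrom value: {start_from!r}")
```

Raising on an unknown start point avoids silently reviewing against the wrong base.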
Phase 8 — Publish Task Branch Updates
task.push sends current task changes to the remote task branch through the workspace publish path.
Typical call:
Verify the result:
- ok === true
- code === "OK"
- returned branch matches the active task branch
- remote update succeeded
- changed file set matches task intent
- stderr has no outcome-changing warning
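The mechanical part of these checks could be automated roughly as follows. The result layout (a `data.branch` field in particular) is an assumption, and `push_result_problems` is a hypothetical helper; judging whether the file set matches intent or whether a stderr warning changes the outcome still needs a reader.

```python
from typing import Any, Dict, List


def push_result_problems(result: Dict[str, Any], active_branch: str) -> List[str]:
    """List the mechanical problems found in a task.push result.

    Assumes `ok`, `code`, and `data.branch` fields; the real payload may differ.
    """
    problems: List[str] = []
    if result.get("ok") is not True:
        problems.append("ok is not true")
    if result.get("code") != "OK":
        problems.append(f"code is {result.get('code')!r}, expected 'OK'")
    returned_branch = (result.get("data") or {}).get("branch")
    if returned_branch != active_branch:
        problems.append(
            f"returned branch {returned_branch!r} does not match {active_branch!r}"
        )
    return problems
```

An empty list means only that the mechanical checks passed, not that the push is fully verified.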
task.push:
- uses taskSession / task metadata for branch context
- supports full changed set or explicit file selection
- can run multiple times before PR promotion
- rejects invalid or ambiguous task context
- preserves review metadata on GitHub
Phase 9 — Promote Task to Stream and Create the Stream Review PR
Run the full PR promotion flow:
If task.pr returns only a task PR, treat that as incomplete unless taskOnly: true was explicitly requested. Inspect the result, run task.prs if that tool exists, and continue until the stream review PR exists or a real blocker is identified.
taskOnly is an explicit stop mode. Use it only when Ko explicitly asks to stop after creating or refreshing the task PR.
Required checks:
- task PR exists or was created
- task PR targets stream/(area)
- task merge into stream succeeds when applicable
- stream review PR exists, is reused, or is updated
- stream review PR targets main
- returned payload includes Graphite stream PR number and URL
Phase 10 — Handle Metadata Conflicts Safely
Task metadata belongs on GitHub because it gives review context and agent traceability. The current durable metadata model is scoped. Primary task metadata:
Phase 11 — Ship the Stream to Main When Explicitly Asked
When Ko asks to ship the stream, merge the stream review PR:
If stream.sync reports metadata-only conflicts, resolve by policy.
If it reports code/docs/config/test conflicts, stop and ask Ko or resolve only with clear evidence.
After the stream PR merges, tell Ko to pull and restart the server when the changed code affects the local MCP server or workspace scripts.
Use typed workspace server tooling when available:
os.call.
Phase 12 — Finish and Clean Up Only When Safe
Use task.finish only when cleanup is safe:
- task branch is confirmed merged into the stream
- stream has been shipped or Ko confirms local cleanup is safe
- cleanup target matches the active task branch
- worktree cleanup succeeds
- local branch cleanup succeeds when applicable
- associated task tmux session is closed
- result confirms completion
- there are no pending cleanup warnings
Final Report Contract
Before reporting done, confirm:
- task.push returned ok: true
- task.pr returned ok: true
- stream review PR URL was captured
- task PR exists when applicable
- task PR targets stream/(area)
- stream review PR targets main
- Graphite stream PR link is included for reviewable work
- temporary/smoke task PRs are closed or explicitly accounted for
- metadata-only conflicts were resolved by policy
- mixed conflicts were escalated or resolved with evidence
- workpad was updated before publish with final decisions, files changed, validation evidence, and issues encountered
- git.diff was used after edits or the reason it was skipped is stated
- validation commands match the actual change
- stderr contains no outcome-changing warnings
For temporary task PRs:
- close unneeded temporary task PRs
- leave a comment or workpad note with the evidence captured
- do not promote temporary PRs into the stream unless Ko asked for the temp work to become durable
- do not leave temporary PRs open without naming the reason
Temporary PRs must include tmp((stream-area)): ... in the title so Ko can identify them quickly.
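A title check for the temporary-PR convention might look like the sketch below. It assumes the format reads as `tmp(` followed by the stream area, then `): ` and a description; that reading of the placeholder is an interpretation, and the area name in the usage example is invented.

```python
import re
from typing import Optional

# Assumed shape: tmp(<stream-area>): <description>
TMP_TITLE = re.compile(r"^tmp\((?P<area>[^()\s]+)\): \S.*$")


def temp_task_area(title: str) -> Optional[str]:
    """Return the stream area if the title follows the assumed tmp format."""
    match = TMP_TITLE.match(title)
    return match.group("area") if match else None
```

For example, `temp_task_area("tmp(billing): smoke test callback")` yields the area, while an ordinary title yields `None`, which makes stray temporary PRs easy to list before the final report.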
Failure Policy
When any publish step returns non-OK:
- Stop the pipeline.
- Record the failed command, result code, message, and trace ID in the scoped workpad.
- Fix the root cause inside the current task scope.
- Rerun from the failed step.
- Continue only after the returned result is verified.
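The stop, record, and rerun steps above could be sketched as a small runner. Everything here is illustrative: the step callables, the result keys `code`, `message`, and `traceId`, and the list standing in for the scoped workpad are assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], Dict[str, Any]]]


def run_publish_steps(steps: List[Step], workpad: List[str], start_at: int = 0) -> int:
    """Run publish steps in order and stop at the first non-OK result.

    Returns the index of the failed step, or len(steps) when all succeed.
    Pass the returned index back as start_at to rerun from the failed step.
    """
    for index in range(start_at, len(steps)):
        name, call = steps[index]
        result = call()
        if result.get("ok") is not True:
            # Record the failure before anything else happens.
            workpad.append(
                f"FAILED {name}: code={result.get('code')} "
                f"message={result.get('message')} traceId={result.get('traceId')}"
            )
            return index
    return len(steps)
```

Fixing the root cause happens between calls; the runner only guarantees that nothing after the failed step executes until it is rerun and returns OK.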
Stale Scoped Metadata Is Not Active Task Truth
Completed tasks may leave scoped metadata on main, such as:
status may report this as staleTask when the metadata belongs to a different branch than the current checkout.
Do not treat staleTask as active context. It is historical metadata.
Continue using the explicit taskSession returned by task.start.
Only repair metadata when it affects the active task session, the active task worktree, or the current publish/merge operation.
Decision Workflow Summary
Use the decision engine to move from broad uncertainty to evidence-backed action. Normal loop:
- explore to get candidate files and graph context.
- decideNext to choose the next highest-value action.
- fs.read the recommended file or section.
- Mark evidence when useful with decideNext input such as markRead, markRelevant, or markIrrelevant.
- Run confidenceScore.
- Repeat until confidence is high enough to exploit.
- Run exploit to commit to the implementation path.
- After implementation, use real validation plus confirm when useful.
What each step means:
- explore is retrieval, not proof.
- decideNext is the policy layer.
- confidenceScore is an evidence check, not permission to skip tests.
- exploit means “stop wandering; edit this path.”
- confirm means “belief meets reality.”
- audit is for tool/docs/index drift, not general confirmation.
Explore Is a Discovery Command
Use explore anywhere you would otherwise start guessing paths, grepping broadly, or asking “where is this implemented?”
Treat it as a workspace navigation primitive alongside fs.read and fs.search.
- explore finds likely files, symbols, tests, docs, and related implementation paths.
- fs.read verifies actual content.
- fs.search follows up with exact targeted symbol/string lookup after direction is narrowed.
- decideNext decides what evidence/action should come next.
- confirm proves behavior against reality.
Use explore early, especially before broad fs.search or raw shell search.
Hard Rules
- decision engine is the default discovery method
- fs.search is targeted follow-up, not the first move
- confidence comes from evidence, not retrieval score
- read files before editing
- run tests or smoke checks that match the behavior changed
- report stream PRs, not only task PRs
- context and decision engine can be used at any point
- if root .task/current.json, .task/session.json, .task/workpad.md, or .task/verify.json appears, treat it as legacy/stale unless the task explicitly concerns migration compatibility
- active task truth comes from taskSession and scoped .task/(area)/(task-slug)/... metadata
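The legacy-versus-scoped rule above can be expressed as a path classifier. This sketch assumes repo-relative POSIX paths and treats any path with at least an area and a task-slug directory under .task/ as scoped; the function and label names are hypothetical.

```python
from pathlib import PurePosixPath

# Root-level files named in the hard rules as legacy/stale.
LEGACY_ROOT_FILES = {"current.json", "session.json", "workpad.md", "verify.json"}


def classify_task_path(path: str) -> str:
    """Classify a repo-relative path as legacy root, scoped, or other."""
    parts = PurePosixPath(path).parts
    if not parts or parts[0] != ".task":
        return "not-task-metadata"
    if len(parts) == 2 and parts[1] in LEGACY_ROOT_FILES:
        return "legacy-root"
    # Assumed scoped layout: .task/<area>/<task-slug>/<file...>
    if len(parts) >= 4:
        return "scoped"
    return "unknown"
```

A "legacy-root" result is a cue to ignore the file for active task truth, unless the task itself concerns migration compatibility.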
Finish the Task or Name the Real Blocker
Do not stop at the first tool failure when the user asked for a shippable change. Tool failures are work to diagnose, not completion states. For any requested code, docs, workflow, or repo change, continue until exactly one of these terminal states is true:
- The change is merged to the requested target branch and local state is updated when requested.
- The change is pushed to a review PR and the user explicitly asked to stop at review.
- A real blocker remains after recovery attempts, and the blocker is named with exact evidence.
When a tool call fails:
- Read the structured error envelope.
- Identify whether the failure is input shape, timeout budget, task-session resolution, stale metadata, merge conflict, dirty worktree, safety filtering, missing dependency, or external service state.
- Retry once with the smallest corrected workspace call.
- If the same class of error repeats, switch to the next workspace-supported path.
- If fallback tooling is required, state why the workspace facade could not complete the operation and keep the fallback scoped to the task worktree.
- Continue toward ship/review after recovery.
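The retry-once-then-switch rule in the recovery steps above can be sketched as a function over the error classes seen so far. The class names come from the list above; the function name and the returned action strings are illustrative.

```python
from typing import List

ERROR_CLASSES = {
    "input shape", "timeout budget", "task-session resolution",
    "stale metadata", "merge conflict", "dirty worktree",
    "safety filtering", "missing dependency", "external service state",
}

RETRY = "retry once with the smallest corrected workspace call"
SWITCH = "switch to the next workspace-supported path"


def next_recovery_action(history: List[str]) -> str:
    """Pick the next action from the error classes observed, newest last."""
    for error_class in history:
        if error_class not in ERROR_CLASSES:
            raise ValueError(f"unclassified failure: {error_class!r}")
    if not history:
        return "proceed"
    # The same class twice in a row means the corrected retry did not help.
    if len(history) >= 2 and history[-1] == history[-2]:
        return SWITCH
    return RETRY
```

Forcing every failure into a named class first keeps the recovery path from drifting into undirected retries.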
Report completion with:
- target branch or PR
- commit SHA or merge SHA
- files changed
- validation run
- local state if the user requested