Appendix A — Bootstrapping an Architecture 2.0 Workflow

Author

Affiliation

Harvard John A. Paulson School of Engineering and Applied Sciences

Published

June 25, 2026

The smallest useful Architecture 2.0 workflow is not “install an agent.” It is an explicit loop with a task, representation, tool wrapper, method role, evidence standard, and human accept/reject decision. The first loop should be small enough to inspect, cheap enough to run repeatedly, and constrained enough that failure is informative.

Bootstrap loop. A bootstrap loop is the smallest credible Architecture 2.0 workflow: one bounded task, one representation, one tool interface, one method role, one evidence standard, and one accountable human decision.

The goal of this appendix is to help a reader start without overbuilding. A large agentic research harness may eventually include many tools, critics, planners, datasets, dashboards, and review steps. That is not the first move. The first move is a bounded loop that can produce a trace another architect can read.

Figure A.1 gives the bootstrap pattern. It is deliberately minimal: choose one task, one representation, one tool wrapper, one method role, one set of evidence and rejection gates, and one decision owner. If that small loop cannot reject a result, a larger version will only hide the problem.

Figure A.1: A first Architecture 2.0 workflow should be small and rejectable: The workflow should be bounded, logged, rejectable, and owned by an architect. The loop can grow only after the task, representation, tool interface, method role, evidence gates, and human decision point are visible.

A.1 The Silicon Playbook

If an engineer is asked how to incorporate AI into a silicon or architecture workflow, the first answer is not to pick a model. The first answer is to turn one part of the workflow into a bounded, represented, rejectable loop. AI then has a method role inside that loop. It may generate, predict, search, summarize, critique, verify, or coordinate, but it does not own the architecture commitment.

Engineer move

Start with a decision the team already has to make. Name the candidate space, the tool feedback, the evidence path, the rejection rule, and the architect who accepts or escalates the result. Then choose the narrow AI role that makes that loop cheaper, broader, faster, or more inspectable without weakening the evidence standard.

The practical question is therefore not “Where can we add AI?” It is “Which architecture decision is bottlenecked by representation, search, prediction, review, or evidence?” Table A.1 gives the compact playbook.

Table A.1: The silicon playbook starts from decisions, not tools: Each step asks what an engineer should represent, what AI role is legitimate, and what gate prevents a plausible output from becoming an unsupported architecture commitment.

Engineer move	Represent	AI role to allow	Gate before trust
Bound the decision	Workload slice, objective, non-goals, and hard constraints.	Summarizer or planner that exposes missing state.	Reject if the task cannot fail visibly.
Expose the candidate space	Knobs, legal actions, invalid states, and prior rejections.	Generator or searcher constrained to legal moves.	Reject outputs that invent actions, interfaces, or assumptions.
Wrap the feedback	Tool versions, configs, seeds, cost, latency, and failure logs.	Tool caller or coordinator.	Reject runs without provenance or negative traces.
Choose the evidence ladder	Proxy, replay, simulation, synthesis, emulation, or deployment evidence.	Predictor, surrogate, or active learner.	Escalate when proxy confidence exceeds its authority.
Review the result	Alternatives, sensitivity, uncertainty, and rejected regions.	Critic, verifier, or explanation generator.	Reject if the result cannot explain what would change the decision.
Commit or revise	Human-owned acceptance, escalation, or rollback decision.	Decision-support only.	The architect signs off; the method never owns the commitment.

This is deliberately smaller than an enterprise AI strategy. It is a silicon playbook because it treats architecture work as coupled to tools, costs, constraints, evidence, and irreversible commitments. A team can repeat it for an accelerator search, a memory-hierarchy study, a compiler/runtime option, a benchmark update, or a verification triage task, but the same rule holds: make the loop explicit before giving the method more authority.

A.2 Choose a Bounded Task

Start with a task where success and failure can be inspected. Good first tasks include a small design-space exploration, workload characterization, configuration search, benchmark generation, design review, or report critique. Avoid starting with “design a processor” or “automate the flow.” Those are too large to debug.

A bounded task has three properties. First, the input is known: a workload slice, design question, simulator configuration, benchmark version, or review packet. Second, the output is inspectable: a ranked list, plot, rejected candidate set, evidence packet, critique, or recommendation. Third, failure is useful: if the loop gives a bad answer, the trace explains whether the problem was the task, representation, tool wrapper, method, evidence, or human instruction.

For the lighthouse prompt, the first task should not be the whole mobile XR compute subsystem. A better first task might be: characterize an XRBench workload slice and produce three candidate accelerator/memory configurations with a latency-energy evidence packet and rejected alternatives. That is still hard, but it is a loop rather than a wish.

A.3 Choose a Representation

The representation is what the loop can see and change. A minimal representation may include configuration files, workload traces, simulator outputs, architecture descriptions, scripts, plots, notes, constraints, and prior rejected candidates. It does not have to be perfect. It does have to be explicit.

Write down four boundaries before running anything:

What the loop may read.
What the loop may write.
What the loop must not change.
What assumptions live outside the representation.

The last item matters. Early workflows often fail because important state is outside the loop: a simulator default, a benchmark version, an undocumented constraint, a hidden preprocessing step, a fragile script, or a human judgment that never gets recorded. Those gaps are not embarrassing. They are exactly what the bootstrap loop is meant to expose.
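The four boundaries can be written down as data before any tool is run. The following is a minimal Python sketch under stated assumptions: the class name `Boundaries`, its field names, and the example paths are all hypothetical, and the outside assumptions are only recorded, not enforced.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


def _under(path: str, roots: tuple) -> bool:
    """True if path equals one of the roots or lies beneath it."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents for r in roots)


@dataclass(frozen=True)
class Boundaries:
    """Hypothetical record of what the loop may read, write, and must not change."""
    may_read: tuple
    may_write: tuple
    must_not_change: tuple
    outside_assumptions: tuple  # state the loop cannot see; recorded, not enforced

    def can_read(self, path: str) -> bool:
        return _under(path, self.may_read)

    def can_write(self, path: str) -> bool:
        # A protected path wins even when it sits inside a writable directory.
        return _under(path, self.may_write) and not _under(path, self.must_not_change)


# Illustrative values only; a real loop would list its own files and assumptions.
bounds = Boundaries(
    may_read=("workloads", "configs", "runs"),
    may_write=("runs",),
    must_not_change=("configs/baseline.yaml", "runs/archive"),
    outside_assumptions=("simulator default values", "benchmark version"),
)
```

With these values, `bounds.can_write("runs/0001/metrics.json")` is true, while writes to `runs/archive/old.json` or `configs/baseline.yaml` are refused. The check is purely lexical: it does not resolve `..` segments or symbolic links, so a real wrapper would need to normalize paths first.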

A.4 Wrap the Environment

An environment is more than a command that returns a number. It defines the actions the loop can take, the observations it receives, the constraints it must obey, the cost of each evaluation, and the provenance recorded for each run.

For a first wrapper, keep the interface narrow:

a small action space, such as a few tunable architecture parameters;
a fixed workload or small workload set;
explicit invalid-action checks;
one low-fidelity metric and one higher-fidelity check;
logged tool versions, seeds, configurations, and errors;
a run directory that preserves successful and failed attempts.

The wrapper should make failure visible. If a configuration does not compile, times out, violates a constraint, uses a stale benchmark, or produces an incomplete log, that result should be recorded as a negative trace rather than deleted. A first environment that records failures is more valuable than a larger environment that only reports successes.
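One way to make failure visible is to write a record for every attempt, whatever its outcome. The sketch below is an assumption-laden illustration, not a reference wrapper: `run_once`, `LEGAL_CACHE_KB`, `toy_tool`, the `cache_kb` parameter, and the record fields are hypothetical names, and the tool is passed in as a plain callable so that no real simulator API is implied.

```python
import json
import time
from pathlib import Path
from typing import Callable

LEGAL_CACHE_KB = (16, 32, 64)  # hypothetical action space: one tunable parameter


def run_once(run_dir: Path, config: dict, seed: int, tool_version: str,
             tool: Callable[[dict, int], dict]) -> dict:
    """Evaluate one configuration and persist a record whether or not it succeeds."""
    record = {"config": config, "seed": seed, "tool_version": tool_version,
              "started_unix": time.time()}
    if config.get("cache_kb") not in LEGAL_CACHE_KB:
        # Explicit invalid-action check: the tool is never called.
        record.update(status="invalid_action",
                      error=f"cache_kb must be one of {LEGAL_CACHE_KB}")
    else:
        try:
            record.update(status="ok", metrics=tool(config, seed))
        except Exception as exc:
            # Timeouts, crashes, and bad output become negative traces.
            record.update(status="failed", error=f"{type(exc).__name__}: {exc}")
    run_dir.mkdir(parents=True, exist_ok=True)
    index = len(list(run_dir.glob("run_*.json")))
    (run_dir / f"run_{index:04d}.json").write_text(json.dumps(record, indent=2))
    return record


def toy_tool(config: dict, seed: int) -> dict:
    """Hypothetical stand-in for a low-fidelity metric; not a real simulator."""
    return {"latency_proxy": 1000.0 / config["cache_kb"]}
```

Calling `run_once(Path("runs"), {"cache_kb": 48}, seed=0, tool_version="sim-0.1", tool=toy_tool)` writes a record with status `invalid_action` instead of silently skipping the attempt. The file index is derived from a directory listing, which is adequate for a single-process first loop but would need a safer scheme under concurrent runs.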

A.5 Assign the Method Role

Do not begin by asking a model or agent to do everything. Choose one method role and make it explicit. The role might be generator, searcher, predictor, summarizer, critic, planner, tool caller, verifier, or coordinator.

The role should match the task and feedback budget. If evaluations are cheap, a search or optimization role may be reasonable. If evaluations are expensive, a critic, summarizer, or surrogate predictor may be more useful. If the representation is messy, the first useful role may be extraction and organization, not optimization. If the tool wrapper is fragile, the first role may be a verifier that checks whether runs are valid.
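The matching described above can be read as a short decision procedure. The sketch below is one possible encoding, not a rule from the text: the function name `suggest_first_role` and its three boolean inputs are hypothetical, the order in which the conditions are checked is an assumption, and mapping "extraction and organization" to the summarizer role is an interpretation.

```python
def suggest_first_role(wrapper_stable: bool, representation_clean: bool,
                       evaluations_cheap: bool) -> str:
    """Suggest one starting method role from the state of the loop.

    Assumed precedence: fix run validity first, then the representation,
    and only then let the feedback budget decide.
    """
    if not wrapper_stable:
        return "verifier"      # check that runs are valid before anything else
    if not representation_clean:
        return "summarizer"    # extraction and organization, not optimization
    if evaluations_cheap:
        return "searcher"      # cheap feedback can support search or optimization
    return "critic"            # expensive feedback: a summarizer or surrogate predictor could also fit


role = suggest_first_role(wrapper_stable=True, representation_clean=False,
                          evaluations_cheap=True)
```

Here `role` is `"summarizer"`: evaluations are cheap, but the messy representation takes precedence under the assumed ordering. The function returns a suggestion for a human to confirm, not a decision.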

A useful rule is to write the method sentence before implementing the method:

This system will act as a [role] that takes [inputs], is allowed to perform [actions], receives [feedback], and produces [evidence] for [a human decision].

If that sentence cannot be completed, the loop is not ready for method work.
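The readiness test can be made mechanical by treating the method sentence as a template whose slots must all be filled. This is a minimal sketch: the class `MethodSentence`, its field names, and the example values are hypothetical, and the role list is taken from the roles named earlier in this section.

```python
from dataclasses import dataclass, fields

ROLES = {"generator", "searcher", "predictor", "summarizer", "critic",
         "planner", "tool caller", "verifier", "coordinator"}


@dataclass
class MethodSentence:
    role: str
    inputs: str
    actions: str
    feedback: str
    evidence: str
    decision: str

    def render(self) -> str:
        """Return the completed sentence, or raise if any slot is empty."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError("method sentence incomplete, missing: " + ", ".join(missing))
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        return (f"This system will act as a {self.role} that takes {self.inputs}, "
                f"is allowed to perform {self.actions}, receives {self.feedback}, "
                f"and produces {self.evidence} for {self.decision}.")


# Illustrative values only.
sentence = MethodSentence(
    role="critic",
    inputs="three candidate configurations and their run logs",
    actions="read-only log inspection",
    feedback="simulator outputs",
    evidence="a written critique",
    decision="the architect's accept/reject decision",
)
```

`sentence.render()` returns the filled-in sentence. Leaving any field blank raises a `ValueError` that names the empty slots, which is the signal that the loop is not yet ready for method work.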

A.6 Write the Evidence and Rejection Rules

Before the first run, state what evidence is enough, what evidence is not enough, and what forces rejection or escalation. This is the smallest version of the trust chapter.

For example:

A proxy estimate can rank candidates but cannot justify a design conclusion.
A simulator result is valid only if the workload version, seed, configuration, and tool version are logged.
A candidate is rejected if it violates a hard constraint, fails to run, or improves one metric only by worsening the overall architectural objective.
A result escalates to higher fidelity only after it passes the low-cost checks and preserves its assumptions.
A human architect must approve any claim that changes the commitment level of the result.

The rejection rules are not pessimism. They are what make automation useful. Without rejection, the loop can only produce artifacts. With rejection, it can produce evidence.
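The example rules can be expressed as a single gate function that runs before any result is discussed. The sketch below makes several assumptions: the function name `gate`, the record fields (`status`, `power_w`, `objective`, `fidelity`), the power budget as the hard constraint, and the convention that a lower objective is better are all hypothetical. The gate never returns an accepting verdict, because that decision stays with the architect.

```python
REQUIRED_PROVENANCE = ("workload_version", "seed", "config", "tool_version")


def gate(result: dict, power_budget_w: float, baseline_objective: float) -> tuple:
    """Return (verdict, reason); verdict is 'reject', 'escalate', or 'human_review'."""
    if result.get("status") != "ok":
        return "reject", "run failed"
    missing = [k for k in REQUIRED_PROVENANCE if k not in result]
    if missing:
        return "reject", "missing provenance: " + ", ".join(missing)
    if result["power_w"] > power_budget_w:
        return "reject", "hard constraint violated: power budget"
    if result["objective"] > baseline_objective:  # lower is better in this sketch
        return "reject", "architectural objective is worse than baseline"
    if result["fidelity"] == "proxy":
        # A proxy may rank candidates but cannot justify a design conclusion.
        return "escalate", "proxy passed low-cost checks; rerun at higher fidelity"
    return "human_review", "evidence packet ready for the architect"


# Illustrative record only; the numbers are made up for the example.
candidate = {"status": "ok", "workload_version": "v1", "seed": 0,
             "config": {"cache_kb": 32}, "tool_version": "sim-0.1",
             "power_w": 1.8, "objective": 0.9, "fidelity": "proxy"}
verdict, reason = gate(candidate, power_budget_w=2.0, baseline_objective=1.0)
```

For this record the verdict is `"escalate"`: the proxy result passes every low-cost check, so it earns a higher-fidelity run rather than a design conclusion. Removing the `seed` field would turn the same record into a rejection for missing provenance.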

A.7 Fill in the Minimal Design-Loop Card

Appendix B gives the full design-loop card and review rubric. For a first bootstrap pass, use the compact checklist in Table A.2. Fill it in before running the loop, then revise it after the first run.

Table A.2: The bootstrap checklist keeps the first loop auditable: A small workflow should name the task, representation, environment, feedback, evidence, rejection rule, and human decision before it runs.

Step	Output to record	Stop or revise if
Bound task	One inspectable architecture question, output type, and non-goal.	The task cannot fail in an informative way.
Representation	Files, traces, constraints, assumptions, and allowed writes.	Important state remains hidden or undocumented.
Environment	Actions, observations, invalid states, cost, logs, and tool versions.	The wrapper hides failures, provenance, or action semantics.
Method role	One explicit role: generator, searcher, critic, verifier, summarizer, or coordinator.	The method is asked to generate, verify, decide, and explain without boundaries.
Evidence rules	What counts as sufficient, insufficient, and higher-fidelity evidence.	A cheap proxy is being used as a final architectural claim.
Rejection rules	Constraint failures, invalid actions, proxy mismatch, missing logs, and escalation triggers.	Nothing in the loop can say no.
Human decision	The named architect-owned decision and commitment level.	The tool appears to own the final commitment.
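A team that keeps the checklist as a small file can verify that it is filled in before the loop runs. The following sketch assumes a plain dictionary with one key per row of Table A.2; the key names, the helper `unfilled_steps`, and the draft values are hypothetical.

```python
CARD_STEPS = ("bound_task", "representation", "environment", "method_role",
              "evidence_rules", "rejection_rules", "human_decision")


def unfilled_steps(card: dict) -> list:
    """Return the checklist steps that are absent, None, or blank."""
    return [step for step in CARD_STEPS if not str(card.get(step) or "").strip()]


# Illustrative draft card; two steps are deliberately left unfinished.
draft = {
    "bound_task": "Rank three cache configurations for one workload slice",
    "representation": "configs/, workloads/, runs/; baseline config is read-only",
    "environment": "one simulator wrapper with logged versions and seeds",
    "method_role": "searcher",
    "evidence_rules": "proxy ranks only; simulation needed for any claim",
    "rejection_rules": "",
    "human_decision": None,
}
todo = unfilled_steps(draft)
```

Here `todo` is `["rejection_rules", "human_decision"]`, so this draft is not ready to run: nothing in the loop can say no, and no architect owns the outcome. The check only confirms that each step has some text; judging whether that text is adequate remains a review task.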

After the first run, ask five questions:

Did the loop produce a trace another architect can read?
Did it preserve both successful and failed attempts?
Did the evidence match the commitment level?
Did any rejection rule fire, and was that failure informative?
What should be revised first: task, representation, environment, method, evidence rule, or human decision?

If the answer to the first question is no, do not add more agents. Make the loop visible. If the answer to the fourth question is no because nothing could reject a result, do not trust the output. Add a rejection gate. The simplest credible Architecture 2.0 workflow is not the one with the most automation. It is the one whose evidence and failure modes are visible enough to improve.