Appendix B — Design-Loop Card and Review Rubric

Affiliation: Harvard John A. Paulson School of Engineering and Applied Sciences

Published: June 25, 2026

The design-loop card is the practical form of the Architecture 2.0 ontology. It is meant to be filled in for a paper, a project proposal, a class exercise, or an internal design review. The card does not replace a technical report. It exposes the loop behind the report: what the system is trying to do, what it can see, what it can change, how feedback is obtained, what evidence supports a claim, what was rejected, and what judgment remains with the architect.

Design-loop card. A design-loop card is a one-page record of the task, representation, environment, method role, feedback budget, evidence, negative traces, rejection authority, and human decision behind an architecture claim.

The card should be short enough to use. If it becomes a long form, people will not fill it in. If it is too vague, it will not reveal anything. The right level is one page for a first pass and a few supporting notes for the fields where evidence is disputed.

B.1 Why a Card, Not a Paper Summary

A conventional paper summary usually asks for the problem, method, result, and limitations. That is useful, but it often hides the design loop. It may not say what simulator state was assumed, which actions were illegal, how many samples were spent, what alternatives failed, how a proxy was calibrated, or what could have rejected the result. Those omissions matter more once AI systems begin to generate candidates, call tools, choose experiments, or summarize evidence.

The design-loop card asks different questions:

  • What architectural intent is being translated into work?
  • What bounded task is the loop performing?
  • What representation and world model make the work legible?
  • What environment defines valid actions and feedback?
  • What method role is being played?
  • What is the feedback budget?
  • What evidence supports the claim?
  • What negative traces were captured?
  • What can say no?
  • What does the human architect still decide?

This makes the card useful in three settings. In research, it helps compare papers that may use different methods but operate on similar loops. In design reviews, it reveals whether a result is backed by enough evidence for the commitment being made. In teaching, it gives students a disciplined way to read Architecture 2.0 work without reducing it to a list of model names.

Figure B.1 shows the operating pattern. Fill in the card, apply the review lens, and assign a readiness status. The point is not to grade the prose of a paper. The point is to expose whether the loop behind the claim is visible enough for another architect to judge.

Figure B.1: The design-loop card and rubric make loop review reusable: The card exposes loop fields, the rubric reviews evidence and rejection structure, and the status records whether the claim is ready, needs evidence, or is unsafe at its current commitment level.

B.2 The Design-Loop Card Fields

Table B.1 gives the working card. The fields are ordered to match the ontology used throughout the lecture.

Table B.1: The design-loop card names the minimum review fields: Each field asks for enough state to understand the task, representation, environment, feedback, evidence, rejection rule, and human decision.
Field | Question
Intent | What architectural objective is being pursued, and what constraints, non-goals, risks, or deployment assumptions matter?
Task | What bounded work is the loop doing: generation, prediction, optimization, critique, verification, workload characterization, benchmark construction, or design-space exploration?
Representation | What does the loop know, read, write, or assume? What state, dynamics, objectives, constraints, costs, and uncertainties are represented?
Environment | What can the loop act on, observe, and measure? Which actions are invalid, expensive, nondeterministic, or irreversible?
Method role | Is the method generating, predicting, optimizing, critiquing, verifying, planning, calling tools, coordinating, or combining several roles?
Feedback budget | How many evaluations are available, at what latency, cost, fidelity, and sample efficiency requirement?
Evidence | What supports the claim: proxy metrics, simulation, synthesis, verification, deployment telemetry, silicon data, expert review, or sensitivity analysis?
Negative traces | What failed, was rejected, violated constraints, crashed tools, disappeared at higher fidelity, or was ruled out by human judgment?
Rejection authority | What can say no: type checks, simulator failures, tests, formal tools, signoff, cross-tool disagreement, deployment signals, or expert review?
Human decision | What remains an architect-owned judgment, and what commitment does the decision authorize?
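When the card is kept next to a project's scripts rather than in a slide deck, it can be stored as a structured record. The Python sketch below is one hypothetical encoding: the class `DesignLoopCard` and its `missing_fields` helper are illustrative assumptions, not an existing tool. It only reports which of the ten fields are still blank, which is the first thing a reviewer would check.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DesignLoopCard:
    """One card per architecture claim; attribute names mirror Table B.1."""

    intent: str = ""
    task: str = ""
    representation: str = ""
    environment: str = ""
    method_role: str = ""
    feedback_budget: str = ""
    evidence: str = ""
    # A list, so failed candidates can accumulate across loop turns.
    negative_traces: List[str] = field(default_factory=list)
    rejection_authority: str = ""
    human_decision: str = ""

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty (blank strings or empty lists)."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            is_empty = not value.strip() if isinstance(value, str) else not value
            if is_empty:
                missing.append(f.name)
        return missing


card = DesignLoopCard(
    intent="Improve efficiency for real-time mobile XR",
    task="Bounded design-space exploration of accelerator/memory organization",
)
print(card.missing_fields())
# ['representation', 'environment', 'method_role', 'feedback_budget', 'evidence',
#  'negative_traces', 'rejection_authority', 'human_decision']
```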

The card deliberately includes negative traces and rejection authority. These are often missing from published artifacts, but they are essential for AI-mediated design loops. A system that only remembers successful candidates does not learn the shape of the design space. A system that cannot say what rejects a candidate has not earned architectural trust.
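The point about remembering failures can be made concrete with a small trace store. The sketch below assumes candidates are flat dictionaries of configuration parameters; `NegativeTraceLog`, `Rejection`, and `screen` are hypothetical names, and the parameter values are invented. Before spending feedback budget, new proposals are checked against recorded rejections, so known failures surface with their reasons instead of being evaluated again.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rejection:
    reason: str  # why the candidate was rejected
    source: str  # what said no: constraint checker, simulator, reviewer, ...


class NegativeTraceLog:
    """Hypothetical store of rejected candidates, keyed by their configuration."""

    def __init__(self) -> None:
        self._rejections: Dict[tuple, Rejection] = {}

    @staticmethod
    def _key(config: dict) -> tuple:
        # Sorting makes the key independent of dict insertion order.
        return tuple(sorted(config.items()))

    def record(self, config: dict, reason: str, source: str) -> None:
        self._rejections[self._key(config)] = Rejection(reason, source)

    def lookup(self, config: dict) -> Optional[Rejection]:
        return self._rejections.get(self._key(config))


def screen(candidates: List[dict], log: NegativeTraceLog) -> Tuple[list, list]:
    """Split proposals into new candidates and repeats of recorded failures."""
    fresh, repeats = [], []
    for cand in candidates:
        hit = log.lookup(cand)
        if hit is None:
            fresh.append(cand)
        else:
            repeats.append((cand, hit.reason))
    return fresh, repeats


log = NegativeTraceLog()
log.record({"vector_width": 512, "l2_kib": 256}, "exceeds power target", "constraint checker")
proposals = [{"l2_kib": 256, "vector_width": 512}, {"vector_width": 256, "l2_kib": 256}]
fresh, repeats = screen(proposals, log)
print(fresh)    # [{'vector_width': 256, 'l2_kib': 256}]
print(repeats)  # [({'l2_kib': 256, 'vector_width': 512}, 'exceeds power target')]
```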

B.3 The Review Rubric

The review rubric asks whether each field is strong enough for the claim being made. The standard should rise with commitment. A speculative idea can survive with weak evidence if it is labeled as such. A tool recommendation, RTL change, or physical-design decision needs a stronger evidence chain.
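One way to state a rising standard explicitly is to place commitments and evidence on two ordered scales and require a minimum evidence level per commitment. The commitment levels below come from the text; the evidence ladder and the `REQUIRED` mapping are assumptions for illustration, and a real review would set its own thresholds.

```python
from enum import IntEnum


class Commitment(IntEnum):
    """Commitment levels named in the text, ordered from lowest to highest."""
    SPECULATIVE_IDEA = 1
    TOOL_RECOMMENDATION = 2
    RTL_CHANGE = 3
    PHYSICAL_DESIGN_DECISION = 4


class Evidence(IntEnum):
    """Assumed evidence ladder; a real review would define its own ordering."""
    NONE = 0
    PROXY_METRIC = 1
    SIMULATION = 2
    SYNTHESIS = 3
    SIGNOFF = 4


# Hypothetical minimum evidence per commitment level.
REQUIRED = {
    Commitment.SPECULATIVE_IDEA: Evidence.NONE,  # acceptable only if labeled speculative
    Commitment.TOOL_RECOMMENDATION: Evidence.SIMULATION,
    Commitment.RTL_CHANGE: Evidence.SYNTHESIS,
    Commitment.PHYSICAL_DESIGN_DECISION: Evidence.SIGNOFF,
}


def evidence_gap(commitment: Commitment, strongest: Evidence) -> int:
    """Rungs by which the strongest available evidence falls short (0 means adequate)."""
    return max(0, REQUIRED[commitment] - strongest)


print(evidence_gap(Commitment.RTL_CHANGE, Evidence.PROXY_METRIC))        # 2
print(evidence_gap(Commitment.SPECULATIVE_IDEA, Evidence.PROXY_METRIC))  # 0
```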

Table B.2 makes that standard inspectable. It separates a pass signal from a warning sign so review can focus on evidence, rejection, and commitment rather than polish.

Table B.2: The review rubric separates readiness from polish: A loop is ready only when its evidence, rejection structure, and commitment level match the claim being made.
Field | Pass signal | Warning sign
Intent and task | The task is bounded, measurable, and tied to an architectural objective. | The task is “use AI” or “generate a design” without a clear decision boundary.
Representation | The loop exposes the state needed to make valid architectural actions. | Important constraints live in hidden scripts, defaults, or informal assumptions.
Environment | Actions, observations, invalid states, costs, and provenance are defined. | The tool wrapper returns numbers but hides semantics, failures, or version state.
Method role | The method’s job is clear and matched to the feedback budget. | The method is chosen because it is fashionable, not because the task needs it.
Feedback budget | Latency, fidelity, sample count, and cost are explicit. | Claims ignore simulator time, EDA cost, expert review, or license limits.
Evidence | The evidence is relevant to the claim and calibrated to the commitment level. | A proxy metric is treated as truth without validation or uncertainty.
Negative traces | Failed candidates and rejected alternatives are captured with reasons. | Only successful runs are recorded.
Rejection authority | The loop states what can reject a candidate and what happens next. | There is no clear way to say no to a plausible but invalid result.
Human decision | Human judgment and accountability are explicit. | The loop implies that the method decides, but no one owns the commitment.
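The environment warning sign above, a wrapper that returns bare numbers, is easy to avoid in code. The sketch below assumes a hypothetical `run_tool` wrapper around any scoring function: it rejects invalid actions before running, converts crashes into recorded failures, and attaches provenance to every observation. The latency model, the tile-size limit, and the provenance keys are placeholders, not a real simulator interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Observation:
    """Structured tool result: the value plus the semantics a bare number would hide."""
    metric: str                # what was measured, e.g. "latency_ms"
    value: Optional[float]     # None when no valid measurement exists
    status: str                # "ok", "invalid_action", or "tool_failure"
    reason: str = ""           # failure explanation, empty on success
    provenance: dict = field(default_factory=dict)  # tool version, seed, flags


def run_tool(tool: Callable[[dict], float], action: dict, metric: str,
             valid: Callable[[dict], bool], provenance: dict) -> Observation:
    """Check validity first, then call the tool and capture failures as data."""
    if not valid(action):
        return Observation(metric, None, "invalid_action",
                           f"rejected before run: {action}", dict(provenance))
    try:
        value = tool(action)
    except Exception as exc:  # the tool crashed; keep the failure as a negative trace
        return Observation(metric, None, "tool_failure", repr(exc), dict(provenance))
    return Observation(metric, value, "ok", "", dict(provenance))


# Stand-in for a simulator; the model and limits are invented for illustration.
def toy_latency_model(action: dict) -> float:
    return 1000.0 / action["tile"]  # raises ZeroDivisionError when tile == 0


def in_range(action: dict) -> bool:
    return 0 <= action["tile"] <= 64


prov = {"tool": "toy_latency_model", "version": "0.1", "seed": 0}
for tile in (16, 0, 128):
    obs = run_tool(toy_latency_model, {"tile": tile}, "latency_ms", in_range, prov)
    print(obs.status, obs.value)
# ok 62.5
# tool_failure None
# invalid_action None
```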

The rubric is not a scoring system by default. A simple three-level annotation is often enough:

  • Ready: the field is explicit and adequate for the commitment.
  • Needs evidence: the field is plausible but underspecified.
  • Unsafe: the field is missing or inconsistent with the claim.
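The three-level annotation needs an aggregation rule once a whole card is reviewed. A minimal sketch, assuming the rule that a card is only as ready as its weakest field and that an unannotated field counts as unsafe; the names `Status`, `card_status`, and `fields_to_revisit` are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    """The three rubric annotations, ordered so a larger value is a worse outcome."""
    READY = 0
    NEEDS_EVIDENCE = 1
    UNSAFE = 2


# Row names from Table B.2.
RUBRIC_FIELDS = (
    "intent_and_task", "representation", "environment", "method_role",
    "feedback_budget", "evidence", "negative_traces",
    "rejection_authority", "human_decision",
)


def card_status(annotations: dict) -> Status:
    """Overall status is the worst field status; an unannotated field counts as unsafe."""
    return max(annotations.get(name, Status.UNSAFE) for name in RUBRIC_FIELDS)


def fields_to_revisit(annotations: dict) -> list:
    """Fields that are not yet ready, in rubric order."""
    return [name for name in RUBRIC_FIELDS
            if annotations.get(name, Status.UNSAFE) != Status.READY]


review = {name: Status.READY for name in RUBRIC_FIELDS}
review["evidence"] = Status.NEEDS_EVIDENCE
print(card_status(review).name)   # NEEDS_EVIDENCE
print(fields_to_revisit(review))  # ['evidence']
del review["rejection_authority"]
print(card_status(review).name)   # UNSAFE
```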

The most important review question is not whether every field is perfect. It is whether the loop exposes enough structure for another architect to judge, repeat, reject, or extend the work.

B.4 Paper-to-Loop Exercise

To use the card in a reading group or class, choose a paper and fill in the fields before discussing the claimed result. The exercise usually reveals one of three things.

First, some papers make the loop explicit. They name the task, action space, environment, feedback budget, and evidence chain. These papers are easier to teach and compare because their claims are grounded in a visible process.

Second, some papers have strong technical results but implicit loop structure. They may report a better Pareto point or speedup without exposing enough about the search budget, failed candidates, tool settings, or rejection rules. The card helps readers separate a useful artifact from a fully auditable loop.

Third, some papers make broad claims from narrow evidence. A method may work for one benchmark, simulator, or proxy metric but be presented as a general design method. The card reveals the mismatch between claim scope and evidence scope.
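The mismatch between claim scope and evidence scope can be checked mechanically once both are written down as sets. In this sketch the axes (benchmarks, simulators, metrics) follow the sentence above, while the function name `scope_mismatch` and the benchmark and simulator names are hypothetical.

```python
def scope_mismatch(claim: dict, evidence: dict) -> dict:
    """For each scope axis, list what the claim covers but the evidence never exercised."""
    gaps = {}
    for axis, claimed in claim.items():
        uncovered = set(claimed) - set(evidence.get(axis, ()))
        if uncovered:
            gaps[axis] = sorted(uncovered)
    return gaps


# Hypothetical scopes: the claim reads as general, the evidence comes from one setting.
claim = {
    "benchmarks": {"bench_a", "bench_b", "bench_c"},
    "simulators": {"sim_x", "sim_y"},
    "metrics": {"ipc"},
}
evidence = {"benchmarks": {"bench_a"}, "simulators": {"sim_x"}, "metrics": {"ipc"}}
print(scope_mismatch(claim, evidence))
# {'benchmarks': ['bench_b', 'bench_c'], 'simulators': ['sim_y']}
```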

A simple classroom exercise is to assign two students the same paper. One summarizes the paper conventionally. The other fills in the design-loop card. The class then asks what the card exposed that the summary hid.

B.5 Teaching Uses across the Lecture

The card is the central teaching artifact, but each chapter gives instructors a different classroom move. Table B.3 maps the lecture into reusable exercises. The goal is not to turn every chapter into a problem set. It is to make the framework active: students should decompose a prompt, inspect a loop, identify missing state, judge evidence, and state what the architect still owns.

Table B.3: Each chapter should produce a reusable teaching artifact: The artifacts help students decompose prompts, inspect loops, identify missing state, judge evidence, and state what the architect owns.
Unit | Artifact | Classroom use
Ch. 1: Moonshot | Lighthouse prompt decomposition. | Ask students to turn one prompt phrase into architecture decisions, evidence needs, and rejection conditions.
Ch. 2: Loop pressure | Design-loop pressure ledger. | Have students identify which costs are candidate count, feedback latency, verification effort, software drift, or human review.
Ch. 3: Framework | Ontology and design-loop card preview. | Map a paper or project onto task, representation, environment, method role, feedback, evidence, and decision.
Ch. 4: Data and world models | Architecture data-state checklist. | Compare what a paper preserves with what a loop would need: traces, configs, provenance, failed runs, and tacit assumptions.
Ch. 5: Environments and tools | Environment contract. | Specify actions, observations, invalid states, costs, and rejection paths for a simulator, compiler, or benchmark harness.
Ch. 6: Method roles | Method-role matrix. | Classify a method as generation, prediction, optimization, critique, verification, or coordination, then ask what evidence can reject it.
Ch. 7: Feedback and trust | Trust checklist. | Decide whether feedback is strong enough for the claimed commitment level and where escalation is required.
Ch. 8: Running loop | Lighthouse loop walkthrough. | Run the opening prompt through one end-to-end loop turn, from intent and representation to evidence, rejection, and revision.
Ch. 9: Loop patterns | Loop-pattern comparison. | Compare a fast software loop with a high-commitment architecture loop using the same card fields.
Ch. 10: Ownership | Architect-owned boundary. | Ask what can be assisted, what must be owned by the architect, and who is accountable for the final commitment.
Appendix A | Bootstrap recipe. | Design a minimal loop for a new subfield with one task, one representation, one environment, and one rejection rule.
Appendix B | Full card and rubric. | Use the blank card as a paper review, project proposal, or final design-review handout.

B.6 Lighthouse Mini-Card

Table B.4 gives a deliberately incomplete mini-card for the lighthouse prompt. It is not a finished design. It shows how a short prompt becomes a loop that must be specified before any result can be trusted.

Table B.4: The lighthouse mini-card is a deliberately incomplete first pass: It shows how a short prompt becomes loop state before any generated answer should be trusted.
Field | Sketch
Intent | Improve efficiency for real-time mobile XR under strict power, memory, software, reliability, and deployment constraints.
Task | Bounded design-space exploration for a RISC-V-based compute subsystem, initially scoped to accelerator/memory organization for an XRBench-class workload slice.
Representation | Workload traces, architecture description, configurable memory/compute parameters, compiler assumptions, power model, latency targets, and uncertainty about workload drift.
Environment | Simulator or cost model plus workload harness, with actions such as changing vector width, memory hierarchy parameters, accelerator tiling, voltage/frequency assumptions, or dataflow choices.
Method role | Candidate generator, search/optimization method, surrogate predictor, critic for invalid assumptions, and report generator.
Feedback budget | Many cheap proxy evaluations, fewer simulator evaluations, and only a small number of high-fidelity checks before human review.
Evidence | Pareto comparison over latency, energy, area proxy, memory traffic, software compatibility, and sensitivity to workload assumptions.
Negative traces | Configurations that violate the 3 W target, miss real-time latency, exceed memory bandwidth, require unsupported software, or fail at higher fidelity.
Rejection authority | Constraint checker, simulator failure, power/thermal limit, workload QoS violation, compiler/runtime incompatibility, or architect review.
Human decision | Decide whether the candidate merits deeper modeling, different representation, stronger fidelity, or rejection.
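The negative-trace and rejection-authority rows describe a constraint checker that records why a candidate fails, not only whether it fails. In the sketch below, only the 3 W power target comes from the mini-card; the latency budget, bandwidth limit, candidate names, and estimates are placeholders, and `Estimate` and `check` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Estimate:
    """Hypothetical per-candidate estimates from a proxy model or simulator."""
    power_w: float
    frame_latency_ms: float
    bandwidth_gbps: float
    needs_unsupported_sw: bool


POWER_LIMIT_W = 3.0          # the 3 W target from the mini-card
LATENCY_BUDGET_MS = 11.1     # placeholder real-time budget, not from the source
BANDWIDTH_LIMIT_GBPS = 25.0  # placeholder memory-bandwidth limit


def check(est: Estimate) -> List[str]:
    """Return every violated constraint; an empty list means the candidate passes this gate."""
    reasons = []
    if est.power_w > POWER_LIMIT_W:
        reasons.append(f"power {est.power_w:.2f} W exceeds {POWER_LIMIT_W} W target")
    if est.frame_latency_ms > LATENCY_BUDGET_MS:
        reasons.append(f"latency {est.frame_latency_ms:.1f} ms misses real-time budget")
    if est.bandwidth_gbps > BANDWIDTH_LIMIT_GBPS:
        reasons.append("exceeds memory bandwidth")
    if est.needs_unsupported_sw:
        reasons.append("requires unsupported software")
    return reasons


candidates = {
    "cand_a": Estimate(2.4, 9.0, 18.0, False),
    "cand_b": Estimate(3.6, 8.0, 30.0, False),
}
accepted, negative_traces = [], {}
for name, est in candidates.items():
    reasons = check(est)
    if reasons:
        negative_traces[name] = reasons  # keep the reasons, not just the verdict
    else:
        accepted.append(name)
print(accepted)         # ['cand_a']
print(negative_traces)  # {'cand_b': ['power 3.60 W exceeds 3.0 W target', 'exceeds memory bandwidth']}
```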

This mini-card also shows why the book does not treat the lighthouse prompt as a one-shot generation request. The prompt is useful because it exposes the state that must be represented, not because it eliminates the loop.
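The feedback-budget row describes a fidelity funnel: many cheap evaluations narrow the field before fewer, more expensive ones. A minimal sketch, assuming each fidelity level is a scoring function (lower is better) with a fixed number of survivors per stage; the scorers are toys whose optima deliberately disagree, and the stage sizes are not taken from the lighthouse prompt.

```python
from typing import Callable, Dict, List, Tuple

# (stage name, score function where lower is better, number of survivors to keep)
Stage = Tuple[str, Callable[[int], float], int]


def funnel(candidates: List[int], stages: List[Stage]) -> Tuple[List[int], Dict[str, int]]:
    """Evaluate survivors at each fidelity level and keep the best `keep` of them.

    Returns the finalists handed to human review and the evaluations spent per stage.
    """
    spent: Dict[str, int] = {}
    survivors = list(candidates)
    for name, score, keep in stages:
        spent[name] = len(survivors)  # one evaluation per surviving candidate
        survivors = sorted(survivors, key=score)[:keep]  # sorted() is stable on ties
    return survivors, spent


# Toy scorers: each fidelity level places the optimum somewhere slightly different.
stages = [
    ("proxy", lambda c: abs(c - 480), 50),
    ("simulator", lambda c: abs(c - 500), 5),
    ("high_fidelity", lambda c: abs(c - 505), 2),
]
finalists, spent = funnel(list(range(1000)), stages)
print(spent)      # {'proxy': 1000, 'simulator': 50, 'high_fidelity': 5}
print(finalists)  # [502, 501]
```

Because the proxy favors candidates near 480 while the higher-fidelity scorers favor candidates near 500 to 505, the funnel recovers good finalists only if the proxy stage keeps enough survivors. Shrinking the first stage to a handful would discard them, a small instance of the proxy mismatch listed among the common failure modes.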

B.7 Common Failure Modes

The card is most useful when it reveals failures early. Common failure modes include:

  • Missing evidence: the claim is plausible, but the supporting measurement is absent, low fidelity, or unrelated to the decision.
  • No negative traces: the loop records only successful candidates, so future methods repeat known failures.
  • Hidden simulator state: defaults, flags, seeds, workload versions, and tool revisions are not recorded.
  • Proxy mismatch: the method improves a metric that does not track the architectural objective.
  • Invalid action space: the agent can propose configurations that cannot compile, simulate, synthesize, meet timing, or satisfy constraints.
  • Unsupported autonomy: the method is allowed to make decisions whose commitment level exceeds the evidence available.
  • No rejection authority: there is no explicit mechanism that can reject a plausible but wrong result.
  • Unowned commitment: the workflow obscures who accepts risk and who remains accountable for the final decision.

These are not only documentation failures. They are design-loop failures. A loop that hides negative traces, invalid actions, or rejection authority is hard to improve because it cannot distinguish a weak candidate from a weak process.
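Several of these failure modes can be detected from the card alone, before any loop runs; proxy mismatch and invalid action spaces generally need the loop's own feedback to expose. The sketch below assumes a card stored as a dictionary with hypothetical keys (`provenance`, `commitment_level`, `evidence_level`, `decision_owner`) and an assumed set of required provenance keys; `lint_card` is an illustrative name.

```python
# Assumed minimum provenance for a simulator run; adjust per tool flow.
REQUIRED_PROVENANCE = ("tool_revision", "workload_version", "seed", "flags")


def lint_card(card: dict) -> list:
    """Return the failure modes that are visible from the card itself, in B.7 order."""
    findings = []
    if not card.get("evidence"):
        findings.append("missing evidence")
    if not card.get("negative_traces"):
        findings.append("no negative traces")
    provenance = card.get("provenance", {})
    hidden = [key for key in REQUIRED_PROVENANCE if key not in provenance]
    if hidden:
        findings.append("hidden simulator state: " + ", ".join(hidden))
    # Commitment and evidence are integers on review-defined ordered scales.
    if card.get("commitment_level", 0) > card.get("evidence_level", 0):
        findings.append("unsupported autonomy: commitment exceeds evidence")
    if not card.get("rejection_authority"):
        findings.append("no rejection authority")
    if not card.get("decision_owner"):
        findings.append("unowned commitment")
    return findings


card = {
    "evidence": ["simulator Pareto sweep"],
    "negative_traces": [],
    "provenance": {"tool_revision": "r42", "seed": 7},
    "rejection_authority": ["constraint checker"],
    "commitment_level": 2,
    "evidence_level": 1,
}
for finding in lint_card(card):
    print(finding)
# no negative traces
# hidden simulator state: workload_version, flags
# unsupported autonomy: commitment exceeds evidence
# unowned commitment
```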

B.8 Blank Template

Table B.5 is the one-page blank form. It is sufficient for a first pass because it forces the loop owner to name the task, evidence, rejection path, and decision boundary before running the workflow.

Table B.5: The blank card provides a reusable loop template: A reader can fill it in for a paper, tool, benchmark, classroom project, or internal workflow before judging the claim.
Field | Entry
Intent |
Task |
Representation |
Environment |
Method role |
Feedback budget |
Evidence |
Negative traces |
Rejection authority |
Human decision |

After filling in the card, ask five final questions:

  1. Is the task bounded enough that the loop can be evaluated?
  2. Is the representation sufficient for the actions the method is allowed to take?
  3. Is the feedback budget realistic for the method and claim?
  4. Does the evidence match the commitment level?
  5. What can reject the result, and who owns the final decision?

If those questions cannot be answered, the project may still be promising, but it is not yet a credible Architecture 2.0 loop.