Appendix C — Resource Catalog for Architecture 2.0 Loops
This catalog is neither a directory of links nor an endorsement list. Links change, tools age, and benchmark versions move. The stable question is what role a resource plays in the design loop. Does it provide workload state? Does it define valid actions? Does it return feedback? Does it expose evidence that can reject a candidate? Does it preserve provenance or negative traces?
The specific examples named below are a snapshot. The durable content is the role each resource plays. The current list of tools, datasets, and benchmarks moves quickly, so it belongs with the community forming around this topic, where a companion edition can keep it current without reprinting the book.
Architecture 2.0 resource. A resource is useful for Architecture 2.0 when it makes some part of the design loop explicit: task, representation, environment, method role, feedback, evidence, rejection, or human decision.
Table C.1 gives a first-pass catalog. The examples are deliberately representative, not exhaustive. A reader should use the table to ask what is missing from a loop before adding another model or tool.
| Resource family | Examples | Loop role | Watch for |
|---|---|---|---|
| Architecture corpora and QA | Paper/manual corpora, DBLP spines, QuArch-style QA and reasoning data (Prakash et al. 2025b, 2025a). | Bootstrap architecture vocabulary, concepts, and literature-grounded reasoning. | Paper text rarely preserves simulator state, failed candidates, tool logs, or review judgment. |
| Workloads and benchmarks | XRBench, MLPerf, and maintained benchmark suites (Kwon et al. 2023; Mattson et al. 2020; Reddi et al. 2020). | Define workload state, scenarios, metrics, rules, and comparability. | Coverage, drift, update policy, and proxy validity must remain visible. |
| Evaluation harnesses and environments | ArchGym-style environments, benchmark harnesses, simulator wrappers, and tool-calling APIs (Krishnan et al. 2023). | Define valid actions, observations, feedback cost, logging, and rejection behavior. | A wrapper can hide tool semantics, unsupported actions, nondeterminism, and failure modes. |
| Mapping and DSE frameworks | Timeloop and MAESTRO-style mapping/dataflow tools (Parashar et al. 2019; Kwon et al. 2019). | Make architecture search spaces and constraints explicit enough to explore. | Fast feedback is still a model; calibration, workload scope, and invalid candidates matter. |
| Compiler, autotuning, and codegen resources | AutoTVM, Ansor, MLIR, and kernel-generation benchmarks (Chen et al. 2018; Zheng et al. 2020; Lattner et al. 2020; Ouyang et al. 2025). | Connect specialized hardware ideas to executable software paths. | A kernel, schedule, or IR result is not automatically a system-level architecture result. |
| Evidence and provenance artifacts | Design-loop cards, source packets, seeds, configs, tool logs, calibration records, and negative traces. | Make claims auditable, reproducible, rejectable, and teachable. | These records are often uncodified, private, or discarded because they are not publication artifacts. |
C.1 Use the Catalog as a Loop Checklist
The catalog is most useful as a checklist. For a new Architecture 2.0 project, choose one resource for each role:
- a workload or benchmark that defines the task boundary;
- a representation that records the state the loop can read and change;
- an environment or harness that defines valid actions and observations;
- a feedback source with an explicit latency, fidelity, and cost model;
- an evidence record that preserves configurations, assumptions, and negative traces;
- a rejection rule and human decision owner.
If one of these fields is missing, the loop may still be useful, but its claim should be bounded accordingly. A paper-reading agent can help with literature triage even if it cannot act on RTL. A simulator environment can support design-space exploration even if it cannot validate timing closure. A kernel-generation benchmark can reveal code-generation capability even if it does not prove system-level efficiency.
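The checklist above can be kept as a small record that reports which roles are still unfilled, so a claim can be bounded before it is made. The following is a minimal sketch, not a standard format: the class name `LoopChecklist`, its field names, and the example values are all hypothetical and chosen only to mirror the six roles listed above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class LoopChecklist:
    """One slot per checklist role; None means the role is unfilled.

    Field names are illustrative assumptions, not an established schema.
    """
    workload: Optional[str] = None         # defines the task boundary
    representation: Optional[str] = None   # state the loop can read and change
    environment: Optional[str] = None      # valid actions and observations
    feedback: Optional[str] = None         # latency, fidelity, and cost model
    evidence: Optional[str] = None         # configs, assumptions, negative traces
    rejection_owner: Optional[str] = None  # rejection rule and human decision owner

    def missing_roles(self) -> List[str]:
        """Return the names of unfilled roles, in checklist order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Hypothetical example: a paper-reading agent used only for literature triage.
triage = LoopChecklist(
    workload="literature triage over an architecture paper corpus",
    representation="paper text and citation metadata",
    evidence="source packets and query logs",
    rejection_owner="reviewing architect",
)

print(triage.missing_roles())  # ['environment', 'feedback']
```

In this example the agent has no environment and no feedback source, which matches the bounded claim in the text: it can help with triage, but it cannot act on a design.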
C.2 Missing Infrastructure
Larger corpora are not the only important future resources. Architecture also needs shared records of design-loop state:
- negative-trace repositories that preserve failed candidates and reasons;
- environment schemas that state actions, observations, costs, and invalid states;
- benchmark-update protocols that record drift and version changes;
- confidentiality-preserving ways to share tool traces and design reviews;
- standard design-loop cards for papers, artifacts, and class projects.
These resources would make the field more teachable and more cumulative. They would also make AI-assisted architecture work easier to evaluate, because the community could ask whether a method improved the loop rather than merely whether it produced a plausible artifact.
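To make the idea of a negative trace concrete, the sketch below shows one possible record for a failed candidate, serialized as a single JSON line so that many traces can be appended to a shared log. No such format is standardized; the class `NegativeTrace`, every field name, and all example values are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class NegativeTrace:
    """A failed candidate plus the context needed to audit the rejection.

    All fields are hypothetical; a real repository would fix its own schema.
    """
    candidate_id: str
    config: Dict[str, Any]   # the design point that was tried
    benchmark: str           # workload that defined the task boundary
    benchmark_version: str   # recorded so drift stays visible
    tool: str                # simulator or model that produced the feedback
    seed: int                # needed to reproduce nondeterministic runs
    rejection_reason: str    # why the candidate was rejected
    rejected_by: str         # rule or human decision owner


def to_json_line(trace: NegativeTrace) -> str:
    """Serialize a trace as one JSON line with stable key order."""
    return json.dumps(asdict(trace), sort_keys=True)


# Hypothetical example values, not results from any real experiment.
trace = NegativeTrace(
    candidate_id="cand-0001",
    config={"pe_rows": 16, "pe_cols": 16, "buffer_kib": 256},
    benchmark="example-suite",
    benchmark_version="v0",
    tool="example-simulator",
    seed=7,
    rejection_reason="exceeded area budget",
    rejected_by="area-limit rule",
)

line = to_json_line(trace)
restored = NegativeTrace(**json.loads(line))
print(restored == trace)  # True
```

Recording the benchmark version and seed next to the rejection reason is the design choice that matters here: it lets a later reader check whether a rejection still holds after the benchmark or tool has changed.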