Appendix D — Architecture 2.0 Resource Directory

Affiliation: Harvard John A. Paulson School of Engineering and Applied Sciences

Published: June 25, 2026

These links are not a general computer-architecture directory. A resource earns space here only if it helps an Architecture 2.0 loop name a task, represent state, expose actions, return feedback, preserve evidence, or reject a result. Use the list as a starting point and check current versions before relying on any benchmark, dataset, simulator, or tool.
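The admission test above can be written down as a small checklist. The Python below is a minimal sketch, not part of any listed tool: `LoopRole`, `ResourceEntry`, `earns_space`, and `ready_to_rely_on` are hypothetical names, and the roles assigned to the sample entry are an assumption based on how this appendix describes ArchGym.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class LoopRole(Enum):
    """The six ways a resource can help an Architecture 2.0 loop."""
    NAME_TASK = "name a task"
    REPRESENT_STATE = "represent state"
    EXPOSE_ACTIONS = "expose actions"
    RETURN_FEEDBACK = "return feedback"
    PRESERVE_EVIDENCE = "preserve evidence"
    REJECT_RESULT = "reject a result"


@dataclass(frozen=True)
class ResourceEntry:
    """Hypothetical directory entry; field names are illustrative only."""
    name: str
    roles: FrozenSet[LoopRole] = frozenset()
    checked_version: Optional[str] = None  # None: current version not yet verified


def earns_space(entry: ResourceEntry) -> bool:
    """A resource is listed only if it serves at least one loop role."""
    return len(entry.roles) > 0


def ready_to_rely_on(entry: ResourceEntry) -> bool:
    """Listing is a starting point; reliance also needs a version check."""
    return earns_space(entry) and entry.checked_version is not None


# Assumed role assignment for illustration.
archgym = ResourceEntry(
    name="ArchGym",
    roles=frozenset({LoopRole.EXPOSE_ACTIONS, LoopRole.RETURN_FEEDBACK}),
)
print(earns_space(archgym))       # True
print(ready_to_rely_on(archgym))  # False: no version has been checked yet
```

Separating `earns_space` from `ready_to_rely_on` mirrors the two sentences above: one decides whether a resource belongs in the list, the other whether a particular study may depend on it.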

D.1 Architecture 2.0 Framing

D.2 Architecture Reasoning and Design-Problem Benchmarks

  • QuArch: quarch.ai. Architecture question-answering and reasoning benchmark. Use it to test whether a model can reason over architecture concepts, but not as evidence that a loop can act through tools or reject candidates.

  • CVDP benchmark: NVlabs/cvdp_benchmark and Hugging Face dataset. Comprehensive Verilog design problems for RTL design and verification. Use it when the loop claim involves HDL generation, test harnesses, simulation feedback, or verification failures.

  • VerilogEval: NVlabs/verilog-eval. Specification-to-RTL and Verilog code-generation benchmark with executable checks. Use it for method-role claims about RTL generation and compile/test feedback, not for system-level architecture claims.

  • KernelBench: ScalingIntelligence/KernelBench. GPU-kernel generation benchmark with correctness and performance evaluation. Use it to study software-loop feedback, codegen, and performance evidence before claiming architecture-level benefit.

D.3 Architecture Environments and Design-Space Exploration

  • ArchGym: Architecture Gym. Environment interface for ML-assisted architecture design. Use it as a concrete example of actions, observations, costs, and feedback in a bounded architecture loop.

  • Timeloop: NVlabs/timeloop. Mapping, modeling, and code-generation tool for tensor workloads on accelerator architectures. Use it for dataflow, mapping, memory hierarchy, and accelerator design-space loops.

  • Accelergy: Accelergy. Energy-estimation infrastructure for accelerators. Use it when a loop needs an explicit energy feedback source and calibration boundary.

  • MAESTRO: maestro-project/maestro. Analytical cost model for DNN dataflows and tiling. Use it as a fast-feedback model whose limits must be recorded before higher commitment.
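To make "actions, observations, costs, and feedback in a bounded loop" concrete, here is a toy sketch in Python. It does not use the ArchGym, Timeloop, Accelergy, or MAESTRO APIs. `ToyAcceleratorEnv`, `Action`, and `sweep` are hypothetical names, and the latency and area formulas are made-up proxies standing in for a fast analytical model whose limits would need to be recorded.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Action:
    """One candidate design point (hypothetical parameters)."""
    pe_count: int
    buffer_kib: int


class ToyAcceleratorEnv:
    """Bounded design space with an invented analytical cost proxy."""
    PE_CHOICES = (16, 32, 64, 128)
    BUFFER_CHOICES = (64, 128, 256)

    def __init__(self, area_budget: float = 150.0):
        self.area_budget = area_budget

    def step(self, action: Action):
        # Made-up proxy: more PEs and more buffer both reduce latency.
        latency = 1000.0 / action.pe_count + 2000.0 / action.buffer_kib
        area = 1.0 * action.pe_count + 0.25 * action.buffer_kib
        observation = {"latency": latency, "area": area}
        feasible = area <= self.area_budget  # rejection rule
        cost = latency if feasible else float("inf")
        return observation, cost, feasible


def sweep(env: ToyAcceleratorEnv):
    """Evaluate every action, keep a full log, and return the best feasible one."""
    log, best = [], None
    for pe, buf in product(env.PE_CHOICES, env.BUFFER_CHOICES):
        action = Action(pe, buf)
        observation, cost, feasible = env.step(action)
        log.append((action, observation, cost, feasible))  # preserved evidence
        if feasible and (best is None or cost < best[1]):
            best = (action, cost)
    return best, log


best, log = sweep(ToyAcceleratorEnv())
print(best)  # (Action(pe_count=64, buffer_kib=256), 23.4375)
print(sum(1 for entry in log if not entry[3]))  # 2 candidates rejected on area
```

The log keeps rejected candidates alongside accepted ones, so a later reader can see what the area rule excluded and why the chosen point won under this particular proxy.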

D.4 Full-System Simulation and Hardware/Software Harnesses

  • gem5: gem5 simulator. Modular computer-system simulator. Use it for architecture feedback that needs workload execution, timing behavior, and reproducible simulator state.

  • FireSim: FireSim. FPGA-accelerated full-system simulation. Use it when the loop needs stronger hardware/software feedback than a software-only proxy can provide.

  • Chipyard: Chipyard docs. Integrated framework for generating and evaluating hardware systems. Use it when the loop must connect generators, RTL, simulation, and implementation artifacts.

D.5 Physical-Design and EDA Evidence

  • OpenROAD: OpenROAD project. Open-source RTL-to-GDS flow. Use it for loops that need physical-design feedback, timing/area/power evidence, or signoff-adjacent rejection.

  • CircuitNet: CircuitNet. VLSI CAD dataset for machine-learning applications in EDA. Use it for cross-stage prediction and physical-design learning claims, while preserving tool provenance and task scope.

  • ChiPBench: AI chip-placement benchmark. Benchmark focused on end-to-end physical-design impact for AI chip placement. Use it when placement evidence must be tied to downstream physical metrics, not only intermediate scores.

D.6 Workload and Benchmark Governance

  • XRBench: XRBench paper. Extended-reality machine-learning benchmark suite. Use it as a workload anchor for mobile-XR architecture loops, including scenario definition and workload coverage questions.

  • MLCommons benchmarks: MLCommons benchmarks. Benchmark governance and reporting infrastructure. Use it as a model for workload versions, run rules, comparability, and community-maintained evidence rather than as a generic performance leaderboard.
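The governance concerns named above (workload versions, run rules, comparability) can be illustrated with a small record type. This is a hedged Python sketch, not an MLCommons schema: `RunRecord`, its fields, `comparable`, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical evidence record for one benchmark run."""
    workload: str
    workload_version: str
    run_rules: str      # identifier of the rule set the run followed
    tool: str
    tool_version: str
    metric: str
    value: float


def comparable(a: RunRecord, b: RunRecord) -> bool:
    """Two results are compared only if workload, version, rules, and metric match."""
    return (a.workload, a.workload_version, a.run_rules, a.metric) == (
        b.workload, b.workload_version, b.run_rules, b.metric
    )


# Placeholder values for illustration only.
baseline = RunRecord("toy-workload", "1.0", "rules-A", "sim-x", "2.1", "latency_ms", 12.0)
candidate = RunRecord("toy-workload", "1.0", "rules-A", "sim-x", "2.2", "latency_ms", 10.5)
stale = RunRecord("toy-workload", "0.9", "rules-A", "sim-x", "2.1", "latency_ms", 9.0)

print(comparable(baseline, candidate))  # True
print(comparable(baseline, stale))      # False: workload version differs
```

Tool and tool version are recorded but deliberately left out of the comparability check in this sketch. A stricter policy could require them to match as well.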