# **OpenROAD** Tutorial

**Open-Source ASIC Design for Computer Architects** 

Austin Rovinski, Tutu Ajayi, Christopher Batten

#### Presenters / Organizers



- Austin Rovinski, Cornell University
- Tutu Ajayi, University of Michigan
- Chris Batten, Cornell University

#### Audience

Computer architects looking to:

- Enhance research with accurate modeling
- Learn the basics about chip design
- Explore OpenROAD and other open-source resources
- Explore chip design techniques and algorithms
  - Improve hardware design across the stack
  - Improve EDA tools



#### **Motivation**

Why should computer architects and researchers care?

- Extending algorithms and techniques to real hardware designs
- More accurate design space exploration
- Hands-on experience for job opportunities

Why choose Open Source?

- Easier collaboration using publicly available IP and kits
- Reproducibility and Apples-to-Apples comparison of new implementations
- Easily re-use publicly available flows, best practices, designs and IP cores
- Support from the open-source community
- Opportunities for free/sponsored tape-outs
- FREE!



#### Goals / Schedule

- Introduction to chip design and flow
  - Basic demonstration and illustration
- OpenROAD Tutorial
  - Overview of OpenROAD flows and abstractions
  - Demos and exercises using the OpenROAD flow
- Further discussions
  - OpenROAD limitations
  - OpenROAD roadmap?
  - How can I contribute to OpenROAD?
  - Additional information on other open-source resources



## Chip Design Flow

1. Design Specification and Algorithm
2. RTL Implementation and Simulation
3. Synthesis to Gate Level
4. Physical Implementation
5. Verification and Signoff



#### **Design and Flow Preparation**

- Design Preparation
  - RTL Files
  - Timing Constraints
  - Design Parameters
- Flow Setup
  - EDA Tool Setup
  - Design selection

- Process development kit
- Standard cell libraries

Flow diagram: RTL, Synthesis, Floorplan, Place, CTS, Route, Finish, Verification, GDS

#### **Design Synthesis**

- Synthesis transforms RTL to netlist
  - RTL Parsing and Design Elaboration
  - Generic Mapping
  - Generic Optimizations
  - Technology Mapping
  - Technology Driven Optimization
  - Constraint Checking / Adherence
- OpenROAD flows leverage Yosys

Example:

- RTL equations: `x = a'bc + a'bc'`, `y = b'c' + ab' + ac`
- After optimization: `x = a'b`
- Technology-mapped netlist (excerpt):

`INVX1SC(.A(a),.Z(U1));`
`AND2X1SC(.A(U1),.B(b),.Z(U55));`
`AND2X1SC(.A(U2),.B(U3),.Z(U23));`
`OR2X1SC(.A(U23),.B(U21),.Z(y));`
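The simplification of `x` in the example above can be checked by brute force. The following is a minimal sketch that compares truth tables; the function names are hypothetical and it is not how a synthesis tool performs optimization, only a way to confirm that `a'bc + a'bc'` and `a'b` agree on every input.

```python
from itertools import product


def x_original(a, b, c):
    # x = a'bc + a'bc'
    return ((not a) and b and c) or ((not a) and b and (not c))


def x_optimized(a, b, c):
    # x = a'b (c has been factored out, since c + c' = 1)
    return (not a) and b


def equivalent(f, g, n_inputs=3):
    """Return True if f and g agree on every combination of boolean inputs."""
    return all(
        bool(f(*bits)) == bool(g(*bits))
        for bits in product([False, True], repeat=n_inputs)
    )


print("x forms equivalent:", equivalent(x_original, x_optimized))
```

Exhaustive comparison only scales to a handful of inputs; it is meant to illustrate what "logically equivalent" means, not to replace formal equivalence checking.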




#### **OpenROAD** Synthesis

# Demo



#### **OpenROAD** Synthesis

1. Perform file preprocessing (mainly for Yosys)

2. Parse input files

3. Elaborate the design

4. Optimize the netlist

5. Map the generic netlist cells to technology-specific cells

6. Generate Verilog netlist

   Yosys log: `17. Executing Verilog backend.`
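The steps above map roughly onto a short Yosys command sequence. The sketch below builds such a script as text; it is an illustration under assumptions, not the script that OpenROAD-flow-scripts actually runs. The helper name `build_yosys_script` and all file names are hypothetical; the Yosys commands themselves (`read_verilog`, `hierarchy`, `synth`, `dfflibmap`, `abc`, `clean`, `write_verilog`) are standard.

```python
def build_yosys_script(verilog_files, top, liberty, out_netlist):
    """Return a Yosys script (as a string) for a basic synthesis run."""
    # Parse input files
    lines = [f"read_verilog {path}" for path in verilog_files]
    lines += [
        # Elaborate the design and check the hierarchy
        f"hierarchy -check -top {top}",
        # Generic mapping and generic optimizations
        f"synth -top {top}",
        # Map flip-flops to cells from the liberty file
        f"dfflibmap -liberty {liberty}",
        # Map combinational logic to cells from the liberty file
        f"abc -liberty {liberty}",
        # Remove unused cells and wires
        "clean",
        # Generate the Verilog netlist
        f"write_verilog -noattr {out_netlist}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only.
print(build_yosys_script(["gcd.v"], "gcd", "cells.lib", "gcd_netlist.v"))
```

The generated text could be saved to a file and passed to `yosys -s`, assuming Yosys is installed and the referenced files exist.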




## **Design Floorplanning**

- **ASIC** Fundamentals
  - Standard Cells
  - Standard Cell Rows
  - Metal Stack
  - Power Grid
  - Macros
- **Design Specific**
  - Setting Die Area
  - Assigning pin locations
  - Placing hard macros
  - Placing "guides" for cell placement
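Setting the die area usually starts from the synthesized cell area and a target utilization. The sketch below shows that arithmetic, assuming utilization is defined as cell area divided by core area and that a fixed margin separates core and die. The function names and the numbers in the example are hypothetical.

```python
import math


def core_dimensions(cell_area_um2, utilization, aspect_ratio=1.0):
    """Return (width, height) of the core in um.

    utilization  = cell area / core area, in (0, 1]
    aspect_ratio = height / width
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = core_area / width
    return width, height


def die_dimensions(core_width, core_height, margin_um):
    """Add the same core-to-die margin on all four sides."""
    return core_width + 2 * margin_um, core_height + 2 * margin_um


# Hypothetical design: 4000 um^2 of cells at 40% utilization.
w, h = core_dimensions(4000.0, 0.40)
print("core:", w, h)
print("die: ", die_dimensions(w, h, margin_um=10.0))
```

Lowering the utilization target enlarges the core for the same cell area, which leaves more room for buffering and routing.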




#### **OpenROAD** Floorplanning

# Demo




#### **OpenROAD** Floorplanning

1. Initialize chip area

2. I/O pin placement

3. Insert tapcells and endcaps

4. Generate power grid

[INFO PDN-0001] Inserting grid: grid

### **Design Placement**

- Global Placement
  - Minimize congestion and long wires
- Placement Optimizations
  - Resizing
  - Buffering
- Detail Placement
  - Overlap
  - Orientation







#### **OpenROAD** Placement

# Demo



#### **OpenROAD** Placement

#### Global Placement

2. Nesterov gradient descent (with timing-driven weighting)

3. Timing optimization and electrical rule fixing

Perform port buffering
[INFO RSZ-0027] Inserted 35 input buffers.
[INFO RSZ-0028] Inserted 18 output buffers.
Perform buffer insertion
[INFO RSZ-0058] Using max wire length 661um.
[INFO RSZ-0039] Resized 39 instances.
Repair tie lo fanout
Repair tie hi fanout




#### **OpenROAD** Placement

#### Detailed Placement

1. Optimize and legalize placement

Detailed placement improvement.

2. Cell mirroring

[INFO DPL-0020] Mirrored 20 instances
[INFO DPL-0021] HPWL before 2703.4 u
[INFO DPL-0022] HPWL after 2700.8 u
[INFO DPL-0023] HPWL delta -0.1 %
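HPWL in the log above stands for half-perimeter wirelength, a common estimate of wire length used during placement. A minimal sketch of the metric follows, assuming each net is given as a list of `(x, y)` pin coordinates; the function names and the example net are hypothetical, while the before/after values are the ones reported in the log.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: the half perimeter of the
    bounding box that encloses all of the net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of per-net HPWL; `nets` maps net name -> list of (x, y) pins."""
    return sum(hpwl(pins) for pins in nets.values())


def percent_delta(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical three-pin net.
print(hpwl([(0.0, 0.0), (3.0, 4.0), (1.0, 2.0)]))

# Values from the log above: HPWL before and after cell mirroring.
print(round(percent_delta(2703.4, 2700.8), 1))
```

Mirroring a cell swaps its pin positions left to right, which can shrink the bounding boxes of the nets attached to it without moving the cell.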


#### Clock Tree Synthesis (CTS)

- Clock trees are built and buffered
- Reducing Skew (setup/hold time)
- Inserting buffers for high fanout signals
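Skew is the spread in clock arrival times across the clock sinks. The sketch below computes it from per-sink latencies; the sink names and latency values are hypothetical and only illustrate the definition.

```python
def clock_skew(sink_latencies_ns):
    """Global skew: latest clock arrival minus earliest clock arrival.

    `sink_latencies_ns` maps a sink (flip-flop clock pin) name to the
    delay from the clock source to that sink, in ns.
    """
    arrivals = list(sink_latencies_ns.values())
    return max(arrivals) - min(arrivals)


# Hypothetical latencies for three flip-flops.
latencies = {"ff_a/CK": 0.42, "ff_b/CK": 0.47, "ff_c/CK": 0.40}
print("skew (ns):", clock_skew(latencies))
```

A balanced, buffered tree aims to keep this spread small, because skew between a launching and a capturing flip-flop directly changes the setup and hold margins of the path between them.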




#### OpenROAD CTS

# Demo




#### OpenROAD CTS

7. Insert filler cells

[INFO DPL-0001] Placed 704 filler instances.
average displacement 0.0 u
max displacement 2.2 u
original HPWL 2896.4 u



## **Design Routing**

- Global Routing
- Detail Routing
- Routing optimization/fixing



#### **OpenROAD** Routing

# Demo






#### **OpenROAD** Routing

1. Generate routing grid

2. Perform global routing

3. Check for antenna violations

[INFO ANT-0002] Found 0 net violations.
[INFO ANT-0001] Found 0 pin violations.




#### **OpenROAD** Routing

2. Region query

3. Post-process guides

4. Track assignment

5. Detailed routing

```
[INFO DRT-0194] Start detail routing.
[INFO DRT-0195] Start 0th optimization iteration.
Completing 10% with 0 violations.
elapsed time = 00:00:00, memory = 96.41 (MB).
Completing 20% with 0 violations.
elapsed time = 00:00:00, memory = 96.70 (MB).
```


## **Design Finish**

- Parasitic extraction
- Timing Signoff
- Dummy Metal Fill
- Export
  - Layout (GDS)
  - Netlist (Verilog)
- Reports
- KLayout for GDS Export and Viewing

#### **OpenROAD** Finishing

# Demo








#### **Design Verification**

- Design Rule Check (DRC)
- Layout vs Schematic (LVS)
- Back-annotated Simulations
- Ready to send to fab!



# Break

#### **OpenROAD-flow-scripts Structure**



#### Platform Configs vs. Design Configs



**Platform config**

```
# Technology files
export TECH_LEF = ...
export SC_LEF = ...
export LIB_FILES = ...
export GDS_FILES = ...

# Good default parameters
export CELL_PAD_IN_SITES_GLOBAL_PLACEMENT ?= ...
export CELL_PAD_IN_SITES_DETAIL_PLACEMENT ?= ...
export PLACE_DENSITY ?= ...
```

**Design config**

- Design files
- Parameter overrides: `export PLACE_DENSITY = ...`
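The split between `?=` in the platform config and `=` in the design config relies on Make semantics: `?=` assigns a value only if the variable is not already defined. The sketch below mimics that resolution with dictionaries, assuming the design config is read before the platform config. The function name and the numeric values are hypothetical.

```python
def resolve_config(design_config, platform_defaults):
    """Combine a design config with platform defaults.

    Values set by the design (Make's `=`) win; platform defaults
    (Make's `?=`) only fill in variables the design left undefined.
    """
    resolved = dict(design_config)
    for name, default in platform_defaults.items():
        resolved.setdefault(name, default)
    return resolved


# Hypothetical values, for illustration only.
platform_defaults = {
    "PLACE_DENSITY": 0.30,
    "CELL_PAD_IN_SITES_GLOBAL_PLACEMENT": 2,
    "CELL_PAD_IN_SITES_DETAIL_PLACEMENT": 1,
}
design_config = {"PLACE_DENSITY": 0.55}

for name, value in sorted(resolve_config(design_config, platform_defaults).items()):
    print(name, "=", value)
```

With this arrangement a design only needs to list the parameters it wants to change, and every other parameter falls back to the platform's default.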

# **Debugging Common Design Problems**

#### What Do Messages Mean?

- INFO: Report data, status, or current progress
- WARNING: Unexpected situation, but the tools will do their best to continue
  - Designer should fix warnings or validate they are benign
- ERROR: Unexpected situation, tools cannot work around issue
- CRIT: openroad must exit immediately (rare)
  - All segfaults / asserts / crashes are bugs :)
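Messages in the logs shown earlier follow the pattern `[SEVERITY TOOL-NUMBER] text`, for example `[INFO RSZ-0027] Inserted 35 input buffers.` The sketch below parses that pattern and tallies messages by severity. It is an assumption-laden illustration: the regular expression is inferred from the log excerpts in this tutorial, the `CRIT`/`CRITICAL` spelling is a guess, and the `DEMO-0001` warning is invented for the example.

```python
import re
from collections import Counter

# Pattern inferred from log excerpts such as "[INFO DRT-0194] Start detail routing."
MESSAGE_RE = re.compile(
    r"^\[(INFO|WARNING|ERROR|CRIT(?:ICAL)?) ([A-Z0-9]+)-(\d+)\]\s*(.*)$"
)


def parse_messages(log_text):
    """Return a list of dicts, one per tagged message, in log order."""
    messages = []
    for line in log_text.splitlines():
        match = MESSAGE_RE.match(line.strip())
        if match:
            severity, tool, number, text = match.groups()
            messages.append(
                {"severity": severity, "tool": tool, "id": int(number), "text": text}
            )
    return messages


def count_by_severity(messages):
    return Counter(m["severity"] for m in messages)


def first_problem(messages):
    """Earliest message that is not INFO, or None if the log is clean."""
    return next((m for m in messages if m["severity"] != "INFO"), None)


SAMPLE_LOG = """\
[INFO RSZ-0027] Inserted 35 input buffers.
[INFO RSZ-0028] Inserted 18 output buffers.
Perform buffer insertion
[WARNING DEMO-0001] Hypothetical warning, invented for this example.
[INFO DRT-0194] Start detail routing.
"""

parsed = parse_messages(SAMPLE_LOG)
print(count_by_severity(parsed))
print(first_problem(parsed))
```

Finding the earliest non-INFO message is useful because early warnings are often the cause of errors later in the flow.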

#### **Debugging Strategy**

- Review error which caused flow to abort
- Check warnings and errors starting from beginning of flow
  - Early warnings can be cause of later errors
- Try to identify root cause of issue
  - Designer problem?
  - Tool problem?
  - Unrealistic expectations?

#### **Common Problems and Solutions**

- Utilization too high fails placement
  - Increase die area or decrease core utilization
- Utilization too high fails resizing
  - Check for proper SDC constraints
  - Check that user-generated macros have reasonable constraints (e.g. good .lib files)
- Congestion too high fails global routing
  - Try previous fixes
  - Try decreasing layer adjustment
- Congestion too high fails detail routing
  - Try previous fixes
  - Try adding cell padding to space cells further apart
  - If violations always occur on same cell(s), try marking those cells as dont\_use
- Design too small fails PDN generation
  - Try increasing design size or reducing power grid pitch

## **Common Problems and Solutions**

- Design runtime too long
  - Increase utilization if too low
  - Relax timing constraints
  - Reduce design complexity
  - Faster machine :)
- Failing setup time
  - Hard problem; may just need to relax constraints
  - Change architecture: more pipelining, reduce complexity
- Failing hold time
  - Check that user cells (e.g. SRAM) are properly constrained
  - Check design constraints are valid (SDC)
    - Designs with multiple clocks are tricky!
  - Check that your PDK has properly correlated parasitics
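The different fixes for setup and hold follow from how each slack is computed. The sketch below uses the standard single-cycle, first-order formulas and ignores clock uncertainty and on-chip variation; the function names and delay values are hypothetical.

```python
def setup_slack(period, launch_latency, capture_latency,
                clk_to_q, max_logic_delay, setup_time):
    """Setup slack in ns: data must arrive before the next capture edge."""
    arrival = launch_latency + clk_to_q + max_logic_delay
    required = period + capture_latency - setup_time
    return required - arrival


def hold_slack(launch_latency, capture_latency,
               clk_to_q, min_logic_delay, hold_time):
    """Hold slack in ns: data must not change too soon after the same edge."""
    arrival = launch_latency + clk_to_q + min_logic_delay
    required = capture_latency + hold_time
    return arrival - required


# Hypothetical path, all values in ns.
print("setup slack:", setup_slack(2.0, 0.4, 0.5, 0.1, 1.7, 0.05))
print("hold slack: ", hold_slack(0.4, 0.5, 0.1, 0.02, 0.05))
```

The clock period appears only in the setup equation, so relaxing the clock constraint can fix setup failures but never hold failures. Hold slack depends on clock latencies, cell delays, and parasitics, which is why the checks above focus on cell constraints, SDC validity, and parasitic correlation.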

## Exercises 1 & 2

# Analyzing Your Design

#### **Reporting Chip Metrics – Area**

- Different area numbers mean different things
- Some metrics assume 100% utilization; 70-90% is more typical
- Buffering and clock tree can add significant area (20%+)
- Chip I/O (pad rings, etc.) & fab markers (fiducials, etc.) rarely accounted for
- Test interfaces can add significant area too!

|                 | Logic | SRAM | Buffers | Clock tree | Chip I/O   | Fab markers | Unutilized space |
|-----------------|-------|------|---------|------------|------------|-------------|------------------|
| Synthesized     | ✓     | ✓    | Some    | ×          | Usually no | ×           | ×                |
| Placed & Routed | ✓     | ✓    | ✓       | ✓          | Usually no | ×           | ×                |
| "Die area"      | ✓     | ✓    | ✓       | ✓          | Sometimes  | Usually no  | ✓                |
| "Die size"      | ✓     | ✓    | ✓       | ✓          | ✓          | ✓           | ✓                |

#### **Reporting Chip Metrics – Power**

- Buffers and clock tree consume significant power (40%+)
- Chip I/O can be simulated but usually isn't
- Simulation type makes a huge difference!
  - Activity factor vs. switching activity (SAIF) vs. vector (VCD)

|                 | Logic | SRAM | Buffers | Clock tree | Chip I/O   | Supply losses |
|-----------------|-------|------|---------|------------|------------|---------------|
| Synthesized     | ✓     | ✓    | Some    | ×          | Usually no | ×             |
| Placed & Routed | ✓     | ✓    | ✓       | ✓          | Usually no | ×             |
| Real chip       | ✓     | ✓    | ✓       | ✓          | Sometimes  | ×             |
| Wall power      | ✓     | ✓    | ✓       | ✓          | ✓          | ✓             |
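The sensitivity to simulation type comes from the switching term in dynamic power, commonly approximated as P = α · C · V² · f. The sketch below evaluates that first-order formula for two assumed activity factors; the function name and all numbers are hypothetical, and leakage and short-circuit power are ignored.

```python
def dynamic_power_w(activity_factor, switched_capacitance_f, vdd_v, frequency_hz):
    """First-order dynamic power in watts: alpha * C * V^2 * f."""
    return activity_factor * switched_capacitance_f * vdd_v ** 2 * frequency_hz


# Hypothetical block: 1 nF of total switched capacitance, 1.0 V, 1 GHz.
for alpha in (0.1, 0.2):
    watts = dynamic_power_w(alpha, 1e-9, 1.0, 1e9)
    print(f"activity factor {alpha}: {watts:.3f} W")
```

Power scales linearly with the activity factor, so a blanket guess can be off by a large factor compared with per-net switching activity (SAIF) or a full vector trace (VCD) from a representative workload.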

### **Reporting Chip Metrics – Frequency**

- Classic synthesis can provide mediocre/poor estimates of real chip frequency
- Physical synthesis provides much better estimates
- Place & route offers excellent estimates
  - Typical, best, worst, and other modeling corners
- Real chips have a distribution of frequencies and are binned

|                    | Parasitics model | Gate timing model        | Clock tree model |
|--------------------|------------------|--------------------------|------------------|
| Synthesis          | Wire-load        | Usually "typical corner" | Ideal            |
| Physical Synthesis | Estimated        | Usually "typical corner" | Estimated        |
| Placed & Routed    | Extracted        | Usually "typical corner" | Extracted        |
| Real chip          | Binned           |                          |                  |
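A frequency estimate at any of these stages is typically derived from the target clock period and the worst setup slack reported by timing analysis. The sketch below shows that conversion as a first-order estimate; the function name and the numbers are hypothetical.

```python
def max_frequency_mhz(clock_period_ns, worst_setup_slack_ns):
    """Estimate the achievable frequency from the constrained period and
    the worst setup slack (negative slack means the target was missed)."""
    min_period_ns = clock_period_ns - worst_setup_slack_ns
    if min_period_ns <= 0:
        raise ValueError("slack must be smaller than the clock period")
    return 1000.0 / min_period_ns


# Hypothetical results for a 2.0 ns (500 MHz) target.
print(max_frequency_mhz(2.0, -0.5))  # target missed by 0.5 ns
print(max_frequency_mhz(2.0, 0.4))   # target met with 0.4 ns to spare
```

The estimate is only as good as the models behind the slack number, which is why the same design can report different frequencies after synthesis, after physical synthesis, and after place and route.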

# Demo 2

## Exercise 3

## Exercise 4

# Demo 3

## Exercise 5

## **Limitations and Future Directions**

### OpenROAD Roadmap – Active Projects

- Ease of use
  - Simplify install process
  - Broaden OS support
  - Python API, Python module
  - Documentation improvements
- Improved support
  - Support and tune additional PDKs
  - Support additional technology rules
- Enhanced features
  - Hierarchical implementation
  - Universal Power Format (UPF) support
- Maintenance
  - Code cleanup and optimization

## OpenROAD Roadmap – Long-term Projects

- Enhanced features
  - Vector-based power calculation
  - CCS timing engine
  - Incremental implementation
- COPILOT: >100x improvement to tool throughput
  - Massively distributed workloads
- ML-based EDA
  - Interfaces for data collection
  - ML-guided optimization
- Education and outreach
  - Courses, tutorials, and more!

## **OpenROAD** Limitations

- Ease of use
  - Mediocre support for SystemVerilog (yosys)
  - Lack of design checking / sanity checking
- Quality of Results
  - No multi-Vt flow yet
  - No automatic clock gating yet
  - Lacking quality hierarchical implementation
    - Slow runtime on large designs
- Design features
  - Hierarchical extraction accuracy is limited
  - Analog / mixed signal support is very preliminary

## **OpenROAD** Advantages

- Accessibility
  - No license limitations / license servers!
    - Run 100s of OpenROAD instances for free
  - Access to source code for debugging / modification
  - Share and get help with tool questions (no paywalls)
- Active community
  - Updates nearly daily
  - Issues fixed and upstreamed in days, not months
  - Pull requests accepted for any useful fixes / features
- Reproducibility
  - Easy to package designs and reproduce exactly
  - Able to validate others' research

# Demo 4

#### OpenLane vs. OpenROAD-flow-scripts

**OpenROAD-flow-scripts**

- Based on OpenROAD, Yosys
- Support for several PDKs
- Focus on full-chip closed-source signoff
- Make-based flow
- Supports Docker, native execution
- Maintained by OpenROAD team

**OpenLane**

- Based on OpenROAD, Yosys
- Support only for sky130
- Focus on full-chip open-source signoff for sky130
- Tcl-based flow
- Only supports Docker
- Maintained by Efabless

# Thank you for attending!

