



## Hardware Evaluation of NIST PQC Round-2 Algorithms

Deepraj Soni<sup>1</sup>, Kanad Basu<sup>2</sup>, Mohammed Nabeel<sup>3</sup>, Ramesh Karri<sup>1</sup> New York University

<sup>1</sup>New York University - New York, NY, USA <sup>2</sup>University of Texas at Dallas - Dallas, Texas, USA <sup>3</sup>New York University - Abu Dhabi, Abu Dhabi, UAE

Paper: <u>https://eprint.iacr.org/2019/047.pdf</u> Website: <u>https://wp.nyu.edu/hipqccheck/</u>

# Outline

- High-Level Synthesis (HLS)
- Design Space Exploration (DSE)
- HLS-based Design flow
- Design Space Exploration example for a PQC algorithm
- Security level-2 Signature schemes comparison
- FPGA Demo
- Conclusion
- Future Work
- Acknowledgement





# Motivation for High Level Synthesis (HLS)

Two approaches: (1) Register-Transfer Level (RTL) - based implementation and (2) High Level Synthesis (HLS) - based implementation.

### **RTL-Based Implementation:**

- Better performance and less resource utilization (area overhead).
- Requires more time for implementation and verification.
- The architecture is fixed.

### **HLS-Based Implementation:**

- Might be less efficient and customizable.
  - Depends upon application, skill of the engineer, etc.
- Faster and easier implementation due to algorithmic approach.
- Easy to change the design and architecture.
  - Useful for Design-space exploration.





# **High Level Synthesis (HLS)**

**High-Level Synthesis (HLS)** is used to automatically generate RTL designs starting from a high-level specification.

- It leverages state-of-the-art compilers (e.g., GCC or LLVM).
- It implements several hardware-oriented and technology-aware optimizations.



#### HLS based design process





## **Design Space Exploration**

Design Space Exploration (DSE) refers to systematic analysis and removing of unwanted design points based on parameters of interest.

It helps to evaluate the trade-off between parameters of interest.

For IoT devices, area is the most important parameter.

For Servers, speed is the most important parameter.



#### **Design-space exploration**





# **Design Space Exploration (DSE)**

We are focusing on three parameters: Security, Time and Area.

For each algorithm security level, tens of different design points are identified.



#### **3-D Design-space exploration**









HLS-based implementation of PQC algorithms.



•



## **Examples of C code changes**

NYU





## Design Flow for PQC algorithm Part-2: Generation of RTL

- The modules/loops/functions which take more time or area is defined as critical modules/loops/functions.
- Loop Unrolling and Pipelining improves performance.
- C/RTL co-simulation for verification of Hardware.
- Final optimized RTL verified with KATs.



## HLS-based design exploration flow of PQC algorithms.

Paper: <u>https://eprint.iacr.org/2019/047.pdf</u> Website: <u>https://wp.nyu.edu/hipqccheck/</u>



## Scope of the ongoing study

- 17 KEMs.
- 9 Signature schemes.

| <sup>•</sup> Algorithm | Security<br>Level 1 | Security<br>Level 2 | Security<br>Level 3 | Security<br>Level 4 | Security<br>Level 5 | Key Generation         |
|------------------------|---------------------|---------------------|---------------------|---------------------|---------------------|------------------------|
| Crystals-Dilithium     | х                   | х                   | Х                   | Х                   |                     | Signature Generation   |
| qTESLA                 | Х                   | Х                   | Х                   |                     | Х                   | Signature Verification |
| MQDSS                  |                     | Х                   |                     | Х                   |                     | Signature Vernication  |
| LUOV                   |                     | Х                   |                     | Х                   | Х                   |                        |
| SPHINCS+               | Х                   |                     | Х                   |                     | Х                   |                        |
| PICNIC                 | Х                   |                     | Х                   |                     | Х                   |                        |
| FALCON                 | Х                   |                     | Х                   |                     | Х                   |                        |
| GeMSS                  | Х                   |                     | Х                   |                     | Х                   |                        |
| Rainbow                | х                   |                     | Х                   |                     | Х                   |                        |

10





## **Design Space Exploration (DSE) of CRYSTALS-Dilithium**



- Design-space exploration of CRYSTALS-Dilithium is normalised with baseline security level-1 LUT and latency.
- The area overhead is similar for different security level.
- The Latency increases as security strength increases.





## **DSE of CRYSTALS-Dilithium**



### **Signature Generation**

- Design-space exploration of CRYSTALS-Dilithium is normalised with baseline security level-1 LUT and latency.
- The optimization directives improves the performance and area in hardware compared to baseline implementation.
- The area overhead is similar for different security level.
- The Latency increases as security strength increases.





## **DSE of CRYSTALS-Dilithium**



Signature Verification

- Design-space exploration of CRYSTALS-Dilithium is normalised with baseline security level-1 LUT and latency.
- The difference in area and latency is less as the points are closer to each other.
- The Latency increases as security strength increases.





### Area Comparison for Security level-2 Signature Schemes



LookUp Table comparison for security level-2 of signature schemes.

14





### **Performance Comparison for Security level-2 Signature Schemes**







### **FPGA** Demo for CRYSTALS-Dilithium Signature Generation







## Conclusion

- The RTL generated by HLS can be used for hardware design of the PQC algorithm. It can be used as first implementation of hardware. Manual implementation can further improve the design.
- Other teams are focusing on software/hardware co-design and speed up or implementing some part of the design in hardware. We are focusing on complete hardware design and its evaluation.
- For Security level-2
  - o CRYSTALS-Dilithium has the best performance in signature schemes.
  - o CRYSTALS-Dilithium and MQDSS have less area while LUOV area overhead is significantly more.
- With HLS, design-space exploration is analyzed. Design-Space exploration helps to estimate performance and area of hardware architecture.





### **Future Research**

- For design-space exploration, **POWER** would be added as one more parameter.
- FPGA implementation and analysis of the PQC algorithms.
- Automate the HLS-synthesizable C generation process.
- Evaluate the hardware implementations against side-channel attacks.





### Acknowledgement

Special thanks to NIST PQC algorithm Developers for helping us realize the hardware by answering questions while implementation.

- o Dr. Nina Bindel for qTESLA.
- o Dr. Ward Beullens for LUOV.
- o Dr. Greg Zaverucha and Dr. Sebastian Ramacher for Picnic.
- o Dr. Jintai Ding and Dr. Ming-Shing Chen for RAINBOW.
- o Dr. Alessandro Barenghi for LEDAcrypt.
- o Dr. Xianhui Lu for LAC.
- o Dr. Ludovic Perret for GeMSS
- o Dr. Marc Manzano and Dr. Najwa Aaraj of DarkMatter inc. Abu Dhabi, UAE offered timely and insightful feedback (especially to explore the security-informed trade-offs) on the early drafts of the report.



