#### Protected Hardware Implementation of WAGE

Yunsi Fei<sup>1</sup> and Guang Gong<sup>2</sup> and Cheng Gongye<sup>1</sup> and Kalikinkar Mandal<sup>3</sup> and Raghvendra Rohit<sup>4</sup> and Tianhong Xu<sup>1</sup> and Yunjie Yi<sup>2</sup> and Nusa Zidaric<sup>2</sup>

<sup>1</sup>Department of Electrical and Computer Engineering, Northeastern University, USA <sup>2</sup>Department of Electrical and Computer Engineering, University of Waterloo, Canada <sup>3</sup>Faculty of Computer Science, University of New Brunswick, Canada <sup>4</sup>University of Rennes, CNRS, IRISA, France

> NIST LWC Workshop, October 21, 2020

#### Protected Hardware Implementation of WAGE

## Outline

- Introduction to WAGE
- The masking scheme for WAGE
- Hardware implementation
- Conclusion and future work

## Introduction to WAGE

- WAGE is a hardware oriented authenticated encryption scheme (128-bit security)
- (unprotected) hardware implementations of WAGE have a small footprint<sup>1</sup>
  - minimal interface: 2900 GE (STMicro 65 nm), 3290 GE (TSMC 65 nm)
  - LWC hw API<sup>2</sup> : **332 slices** (Xilinx Artix-7)
    - among 10 smallest Xilinx Artix-7 candidates
    - the datapath (CryptoCore) takes less than 30% area
- WAGE is built on top of the initialization phase of the Welch-Gong stream cipher
  - theoretical foundations from the 90's
  - WG-29 proceeded to Phase 2 of the eSTREAM competition
- WAGE can be configured as a psuedorandom bit generator WG-PRBG

<sup>&</sup>lt;sup>1</sup> comparison with other candidates is difficult due to differences in ASIC libraries and optimization levels for synthesis tools

<sup>&</sup>lt;sup>2</sup> source: latest report by CERG, GMU, https://eprint.iacr.org/2020/1207

## Introduction to WAGE

- WAGE permutation has 111 iterations of a round function
- round function: an LFSR, decimated WGPs and small Sboxes SB, round constants
- defined over  $\mathbb{F}_{2^7}$ , with internal state 259 bits (37 stage LFSR)



### Introduction to WAGE

| $\mathbb{F}_{2^7}$      | $f(x) = x^7 + x^3 + x^2 + x + 1$ , $f(\omega) = 0$ polynomial basis: $PB = \{1, \omega, \dots, \omega^6\}$                                                                                              |  |  |  |
|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|
| $x\in \mathbb{F}_{2^7}$ | $x = \sum_{i=0}^{6} x_i \omega^i, x_i \in \mathbb{F}_2$ vector representation: $[x]_{PB} = (x_0, x_1, x_2, x_3, x_4, x_5, x_6)$                                                                         |  |  |  |
|                         |                                                                                                                                                                                                         |  |  |  |
| LFSR                    | $fb = S_{31} \oplus S_{30} \oplus S_{26} \oplus S_{24} \oplus S_{19} \oplus S_{13} \oplus S_{12} \oplus S_8 \oplus S_6 \oplus (\omega \otimes S_0)$                                                     |  |  |  |
|                         | $\omega \otimes (x_0, x_1, x_2, x_3, x_4, x_5, x_6) \leftarrow (x_6, x_0 \oplus x_6, x_1 \oplus x_6, x_2 \oplus x_6, x_3, x_4, x_5)$                                                                    |  |  |  |
|                         |                                                                                                                                                                                                         |  |  |  |
| WGP                     | $WGP7(x^d) = x^d + (x^d + 1)^{33} + (x^d + 1)^{39} + (x^d + 1)^{41} + (x^d + 1)^{104},  d = 13$                                                                                                         |  |  |  |
|                         |                                                                                                                                                                                                         |  |  |  |
| SB Q                    | $ \left  \begin{array}{cc} Q(x_0, x_1, x_2, x_3, x_4, x_5, x_6) = (x_0 \oplus (x_2  x_3), x_1, x_2, \overline{x}_3 \oplus (x_5  x_6), x_4, \overline{x}_5 \oplus (x_2  x_4), x_6) \end{array} \right  $ |  |  |  |
| SB P                    | $P(x_0, x_1, x_2, x_3, x_4, x_5, x_6) = (x_6, x_3, x_0, x_4, x_2, x_5, x_1)$                                                                                                                            |  |  |  |
| SB R                    | $R(x_0, x_1, x_2, x_3, x_4, x_5, x_6) = (x_6, \overline{x}_3 \oplus (x_5  x_6), x_0 \oplus (x_2  x_3), x_4, x_2, \overline{x}_5 \oplus (x_2  x_4), x_1)$                                                |  |  |  |
|                         | $(x_0, x_1, x_2, x_3, x_4, x_5, x_6) \leftarrow R^5(x_0, x_1, x_2, x_3, x_4, x_5, x_6)$                                                                                                                 |  |  |  |
| SB                      | $(x_0, x_1, x_2, x_3, x_4, x_5, x_6) \leftarrow Q \ (x_0, x_1, x_2, x_3, x_4, x_5, x_6)$                                                                                                                |  |  |  |
|                         | $(x_0, x_1, x_2, x_3, x_4, x_5, x_6) \leftarrow  (\overline{x}_0, x_1, \overline{x}_2, x_3, x_4, x_5, x_6)$                                                                                             |  |  |  |
| []                      |                                                                                                                                                                                                         |  |  |  |
| state                   | $S_{24} \leftarrow S_{24} \oplus SB(S_{27}) \qquad \qquad S_5 \leftarrow S_5 \oplus SB(S_8) \qquad \qquad S_j \leftarrow S_{j+1} \text{ for}$                                                           |  |  |  |
| update                  | $S_{30} \leftarrow S_{30} \oplus SB(S_{34}) \qquad \qquad S_{11} \leftarrow S_{11} \oplus SB(S_{15}) \qquad \qquad 0 \le j \le 35$                                                                      |  |  |  |
| function                | $fb \leftarrow fb \oplus WGP(S_{36}) \oplus rc_1 \qquad S_{19} \leftarrow S_{19} \oplus WGP(S_{18}) \oplus rc_0 \qquad \qquad S_{36} \leftarrow fb$                                                     |  |  |  |
|                         |                                                                                                                                                                                                         |  |  |  |

## The masking scheme for WAGE

- masking:
  - variable x is masked with a random value r:  $x' = x \oplus r$
  - notation:  $\left[ \left[ x \right] \right] = \left( r, x' \right)$
- ${f \bullet}$  adversarial model: attacker can probe up to t intermediate variables in the circuit
- number of shares n = t + 1 (t-SNI or t-strong non interference security)
- *n*-order masking:
  - variable x is shared among n variables:  $x = x^1 \oplus x^2 \oplus \cdots \oplus x^n$
  - notation:  $[[x]] = (x^1, x^2, ..., x^n)$

• 
$$[[x]] \oplus [[y]] = (x^1 \oplus y^1, x^2 \oplus y^2, \dots, x^n \oplus y^n)$$
  
•  $[[\bar{x}]] = (\bar{x}^1, x^2, \dots, x^n)$   
•  $[[x]] [[y]]$ : use *t*-SNI secure AND gadget

#### Protected Hardware Implementation of WAGE

## The masking scheme for WAGE

n shares:

| state    | $\mathbf{S} = \mathbf{S}^1 \oplus \mathbf{S}^2 \oplus \cdots \oplus \mathbf{S}^n  \text{ where } \qquad \mathbf{S}^k = (S_{36}^k, \dots, S_0^k), \ 1 \leq k \leq n$ |  |  |
|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| stage    | $[[S_j]] = (S_j^1, S_j^2, \dots, S_j^n)$ or bit-wise $([[x_{j,0}]], [[x_{j,1}]], \dots, [[x_{j,6}]]), 0 \le j \le 36$                                               |  |  |
|          |                                                                                                                                                                     |  |  |
| LFSR     | $fb^k = FB(\mathbf{S}^k), \ 1 \le k \le n$                                                                                                                          |  |  |
| (linear) | $\omega \otimes [[S_0]] \leftarrow ([[x_6]], [[x_0]] \oplus [[x_6]], [[x_1]] \oplus [[x_6]], [[x_2]] \oplus [[x_6]], [[x_3]], [[x_4]], [[x_5]])$                    |  |  |
|          |                                                                                                                                                                     |  |  |
| WGP      | need ANF expressions for each of the 7 output bits: $WGP \Rightarrow SecWGP$                                                                                        |  |  |
|          |                                                                                                                                                                     |  |  |
| SB       | $Q(x_0, x_1, x_2, x_3, x_4, x_5, x_6) = (x_0 \oplus (x_2 - x_3), x_1, x_2, \overline{x}_3 \oplus (x_5 - x_6), x_4, \overline{x}_5 \oplus (x_2 - x_4), x_6)$         |  |  |
| ₩        | $\Rightarrow \qquad (\dots, [[\overline{x_3}]] \oplus ([[x_5]]  [[x_6]]), \dots)$                                                                                   |  |  |
| SecSB    | use t-SNI secure AND gadgets                                                                                                                                        |  |  |
| [        |                                                                                                                                                                     |  |  |
| state    | $[[S_{24}]] \leftarrow [[S_{24}]] \oplus SecSB([[S_{27}]])  [[S_5]] \ \leftarrow [[S_5]] \ \oplus SecSB([[S_8]])$                                                   |  |  |
| update   | $[[S_{30}]] \leftarrow [[S_{30}]] \oplus SecSB([[S_{34}]]) \qquad [[S_{11}]] \leftarrow [[S_{11}]] \oplus SecSB([[S_{15}]]) \qquad shift shared state:$             |  |  |
| function | $[[tmp]] \leftarrow SecWGP([[S_{36}]]) \qquad \qquad [[S_{19}]] \leftarrow [[S_{19}]] \oplus SecWGP([[S_{18}]]) \qquad [[S_j]] \ \leftarrow [[S_{j+1}]]$            |  |  |
|          | $fb^1 \leftarrow fb^1 \oplus tmp^1 \oplus rc_1 \qquad \qquad S^1_{19} \leftarrow S^1_{19} \oplus rc_0 \qquad \qquad \text{for } 0 \leq j \leq 35$                   |  |  |
|          | $fb^k \leftarrow fb^k \oplus tmp^k, 2 \le k \le n \qquad \qquad [[S_{36}]] \leftarrow [[fb]]$                                                                       |  |  |
|          |                                                                                                                                                                     |  |  |

# The masking scheme for WAGE

n shares:

| state    | $[[S_{24}]] \leftarrow [[S_{24}]] \oplus SecSB([[S_{27}]])$ | $[[S_5]] \leftarrow [[S_5]] \oplus SecSB([[S_8]])$           |                                  |
|----------|-------------------------------------------------------------|--------------------------------------------------------------|----------------------------------|
| update   | $[[S_{30}]] \leftarrow [[S_{30}]] \oplus SecSB([[S_{34}]])$ | $[[S_{11}]] \leftarrow [[S_{11}]] \oplus SecSB([[S_{15}]])$  | shift shared state:              |
| function | $[[tmp]] \leftarrow SecWGP([[S_{36}]])$                     | $[[S_{19}]] \leftarrow [[S_{19}]] \oplus SecWGP([[S_{18}]])$ | $[[S_j]] \leftarrow [[S_{j+1}]]$ |
|          | $fb^1 \leftarrow fb^1 \oplus tmp^1 \oplus rc_1$             | $S_{19}^1 \leftarrow S_{19}^1 \oplus rc_0$                   | for $0 \le j \le 35$             |
|          | $fb^k \leftarrow fb^k \oplus tmp^k, 2 \le k \le n$          |                                                              | $[[S_{36}]] \leftarrow [[fb]]$   |

Schematic of the masked WAGE permutation for 1-order protection (n = 2):



- protected hardware implementation is built on top of original WAGE hardware
- the datapath was modified with additional:
  - state registers and LFSR feedback XOR gates (e.g.  $\mathbf{S}^1, \dots, \mathbf{S}^n$ )
  - XOR gates and MUXes for non-linear components (e.g. SBmux)
  - MUXes to support the mode (e.g. Amux<sub>0</sub>, RLmux<sub>0</sub>)



recall:

| SB           | $Q(x_0, x_1, x_2, x_3, x_4, x_5, x_6) = (x_0 \oplus (x_2 - x_3), x_1, x_2, \overline{x}_3 \oplus (x_5 - x_6), x_4, \overline{x}_5 \oplus (x_2 - x_4), x_6)$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |  |  |  |
|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|
| $\Downarrow$ | $\Rightarrow \qquad (\dots, [[\overline{x_3}]] \oplus ([[x_5]]  [[x_6]]), \dots)$                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |
| SecSB        | use t-SNI secure AND gadgets                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |  |  |  |
| SecSB        | $([[x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \leftarrow R^5([[x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \\ ([[x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \leftarrow Q (([x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \\ ([x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \leftarrow Q (([x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \\ ([x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \leftarrow Q (([x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \\ ([x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \leftarrow Q (([x_0]], [[x_1]], [[x_2]], [[x_3]], [[x_4]], [[x_5]], [[x_6]]) \\ ([x_0]], [[x_1]], [[x_2]], [[x_1]], [[x_1]], [[x_1]], [[x_1]], [[x_1]], [[x_1]], [[x_1]], [[x_1]], [[x_2]], [[x_1]], [[x_2]], [[x_1]], [[x_2]], [[x_1]], [[x_1]], [[x_2]], [[x_1]], [[x_1]],$ |  |  |  |
|              | ([[w]]), [[w1]]), [[w2]]), [[w3]]), [[w4]]), [[w3]]), [[w3]]), ([w3])), ([w3])), ([w1])), ([w2])), ([w3])), ([w3]), ([w3])), ([w3]))), ([w3]))), ([w3]))), ([w3]))), ([w3]))), ([w3]))), ([w3]))))))))))))))))))))))))))))))))))))                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |

- SecMult: t-SNI secure AND gadget <sup>1</sup>
- number of random bits needed:  $\frac{n(n-1)}{2}$
- assume random bits always available
- $\bullet \ \mbox{unrolled: } [[z]] \ \mbox{computed in a single clock cycle}$
- replace each in SB with SecMult gadget
- SB also unrolled

<sup>1</sup>Barthe et al., "Strong Non-Interference and Type-Directed Higher-Order Masking", CCS'16

1: for i = 1 to n do 2:  $z^i \leftarrow x^i y^i$ 3: end for 4: for i = 1 to n do 5: for j = i + 1 to n do 6:  $r \leftarrow^{\$} \mathbb{F}_2, z^i \leftarrow z^i \oplus r$ 7:  $t \leftarrow x^i y^j, r \leftarrow r \oplus t$ 8:  $t \leftarrow x^j y^i, r \leftarrow r \oplus t$ 9:  $z^j \leftarrow z^j \oplus r$ 10: end for 11: end for

recall:

| WGP | $WGP7(x^d) = x^d + (x^d+1)^{33} + (x^d+1)^{39} + (x^d+1)^{41} + (x^d+1)^{104},  d=13$ |
|-----|---------------------------------------------------------------------------------------|
|     | need ANF expressions for each of the 7 output bits: $WGP\RightarrowSecWGP$            |

design flow:

- 1. ANF: obtain algebraic normal form for every component of the WGP output
- 2. AOXI<sup>1</sup>: synthesis tools restricted to use only AND, OR, XOR, and NOT gates
- 3. AXI: replace OR gates by AND and NOT gates, optimize
- 4. SecWGP (protected AXI): replace AND gates by SecMult gadgets

| Implementation | Area | Number of gates |    |     | 5   |
|----------------|------|-----------------|----|-----|-----|
| Approach       | [GE] | AND             | OR | XOR | INV |
| Constant array | 258  |                 |    | -   |     |
| ANF            | 958  | 1132            | -  | 439 | -   |
| AOIX           | 825  | 146             | 30 | 310 | 39  |
| AXI            | 759  | 172             | -  | 313 | 66  |

<sup>1</sup>INV is used instead of NOT to avoid confusion with number of shares n

• Hardware implementation area breakdown by components using SecMult



- Implementation area for ST Micro 65nm (post-PAR) [GE]
- Common share SecMult for SB: (x<sub>2</sub> x<sub>3</sub>) and (x<sub>2</sub> x<sub>4</sub>) common share SecMult implementation of WAGE omitted for brevity

| SB          |         |           |  |  |  |
|-------------|---------|-----------|--|--|--|
| Algorithm   | SecMult | SecMult+  |  |  |  |
|             |         | comm. sh. |  |  |  |
| Unprotected | 63      |           |  |  |  |
| n=2         | 285     | 285       |  |  |  |
| n = 3       | 626     | 715       |  |  |  |
| n = 4       | 1140    | 1275      |  |  |  |

|              | WGP     | WAGE    |
|--------------|---------|---------|
| Algorithm    | SecMult | SecMult |
| Const. array | 258     | 2900    |
| Unprotected  | 759     | 3830    |
| n = 2        | 2830    | 11177   |
| n = 3        | 6030    | 21566   |
| n = 4        | 10200   | 33985   |

## Conclusion and future work

- we presented the first high-order masking scheme of WAGE
- implementation results for ST Micro 65 nm and TSMC 65nm
- comparison with other candidates is difficult different countermeasures, ASIC libraries, optimization levels for synthesis tools
- future work:
  - iterative implementation of SecSB and SecWGP to reduce number of random bits needed per single clock cycle
  - explore tradeoffs among the throughput, hardware area, and the amount of randomness available per single clock cycle
  - exploring other countermeasures threshold, unified masked multiplication, domain oriented masking
- more details available at https://eprint.iacr.org/2020/1202 this work will also be presented in SAC2020 tomorrow
- Acknowledgements: Hardware implementations in this work are based on the original WAGE hardware. Authors would like to thank Dr. Mark Aagaard for the help with synthesis tools.

Thank You!