# Efficient Cryptographic Computation for Real-World Programs and People Advancing Algorithms and Systems Yibin Yang ### Applied Cryptographer Zero-Knowledge Proof Secure Multi-Party Computation Patient data ## Zero-Knowledge Proof (ZKP) [GMR85] #### Diagnostic model #### Diagnostic model ### Secure Multi-Party Computation (MPC) [Yao86] # ZKP and MPC are *Generic*: Being capable of *any* function # ZKP and MPC are *Generic*: Being capable of *any* function ### Why Zero-Knowledge Proofs Will Shape The Future Of Data Privacy Zero-knowledge proofs have enormous potential to improve data privacy protocols in ways that benefit both organizations and individuals. They... Oct 31, 2024 Why Zero-Knowledge Proofs Will Shape The Future Of Data Privacy Zero-knowledge proofs have enorr ways that benefit both organizatio Oct 31, 2024 Secure Multiparty Computation (SMPC) Market to Reach \$1.64 Billion by 2031 As Revealed In New Report The secure multiparty computation (SMPC) market revolves around technologies enabling multiple parties to compute a function collaboratively without revealing... Nov 25, 2024 ### Why Zero-Knowledge Proofs Will Shape The Future Of Data Privacy Zero-knowledge proofs have enorr ways that benefit both organizatio Oct 31, 2024 WhaTech #### Secure Multiparty Computation (SMPC) Market to Reach \$1.64 Billion by 2031 As Revealed In New Report The secure multiparty computation Cointelegraph Nov 25, 2024 #### enabling multiple parties to comp European Central Bank is exploring blockchain and MPC technology The central bank has been experimenting with multiparty computation, which could support the entire European economy in the future. Jul 10, 2024 Why Zero-Knowledge Proofs Will Shape The Future Of Data **Privacy** Zero-knowledge proofs have enorr ways that benefit both organizatio Oct 31, 2024 WhaTech Secure Multiparty Computation (SMPC) Market to Reach \$1.64 Billion by 2031 As Revealed In New Report The secure multiparty computation Cointelegraph Nov 25, 2024 enabling multiple parties to comp European Central Bank is exploring blockchain and MPC technology Jul 10, 2024 # ZKP and MPC deployments are rare ## Generic Toolchains: Being capable of *any real-world* computation ## Generic Toolchains: Being capable of *any real-world* computation ## Generic Toolchains: Being capable of *any real-world* computation Efficient # Generic Toolchains: Being capable of *any real-world* computation ## My Research Focus **Generic Toolchains:** Being capable of any real-world computation Compile Crypto 8 Secure Efficient ## My Research Focus # Do We Already Have Such a VM? # Do We Already Have Such a VM? # Do We Already Have Such a VM? Existing Generic Methods: Being capable of *any* computation ``` void merge(int arr[], int l, int m, int r) int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for (\underline{i} = 0; i < n1; i++) L[i] = arr[l + i]; for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j]; // Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1; while (i < n1 && j < n2) {</pre> if (L[i] <= R[j]) {... else { … k++; // Copy the remaining elements of L[] if there are any while (i < n1) { … // Copy the remaining elements of R[] if there are any while (j < n2) { … ``` ``` void merge(int arr[], int l, int m, int r) int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for (\underline{i} = 0; i < n1; i++) L[i] = arr[l + i]; for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j]; // Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1; while (i < n1 && j < n2) {</pre> if (L[i] <= R[j]) {... else {-- k++; // Copy the remaining elements of L[] if there are any while (i < n1) { … // Copy the remaining elements of R[] if there are any while (j < n2) { … ``` #### My Generic Toolchains: Being capable of *any real-world* computation Branching if $C_0$ else $C_1$ ``` void merge(int arr[], int l, int m, int r) int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for (\underline{i} = 0; [i < n1; i++) L[i] = arr[l + i]; for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j]; // Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1; while (i < n1 && j < n2) { if (L[i] <= R[j]) { - else { … k++; Copy the remaining elements of L[] if there are any while (i < n1) { … // Copy the remaining elements of R[] if there are any while (j < n2) { … ``` ### Branching if $C_0$ else $C_1$ $$C = (1 - b)C_0 + bC_1 | C | \approx |C_0| + |C_1|$$ ``` void merge(int arr[], int l, int m, int r) int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for (\underline{i} = 0; [i < n1; i++) L[i] = arr[l + i]; for (j = 0; | j < n2; | j++) R[j] = arr[m + 1 + j]; // Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1; while (i < n1 && j < n2) { if (L[i] <= R[j]) {- else { … k++; Copy the remaining elements of L[] if there are any while (i < n1) { … // Copy the remaining elements of R[] if there are any while (j < n2) { … ``` Branching if $C_0$ else $C_1$ $$C = (1 - b)C_0 + bC_1 | C | \approx |C_0| + |C_1|$$ Memory M[i] ``` void merge(int arr[], int l, int m, int r) int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for (\underline{i} = 0; i < n1; i++)[L[i] = arr[l + i]; for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j]; // Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1; while (i < n1 && j < n2) {</pre> if (L[i] <= R[j]) {...</pre> else { … k++; // Copy the remaining elements of L[] if there are any while (i < n1) { … // Copy the remaining elements of R[] if there are any while (j < n2) { … ``` ### Branching if $C_0$ else $C_1$ $$C = (1 - b)C_0 + bC_1 | C | \approx |C_0| + |C_1|$$ ### Memory M[i] $$C = \sum_{j=1}^{N} (i \stackrel{?}{=} j) \cdot M[j] \qquad |C| \approx N$$ ``` void merge(int arr[], int l, int m, int r) int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for (\underline{i} = 0; i < n1; i++)[L[i] = arr[l + i]; for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j]; // Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1; while (i < n1 && j < n2) {</pre> if (L[i] <= R[j]) {...</pre> else { … k++; // Copy the remaining elements of L[] if there are any while (i < n1) { … // Copy the remaining elements of R[] if there are any while (j < n2) \{ \dots \} ``` Prior ZK VMs [BCTV14b, BCTV14a, BCG+13] "<10Hz"! **Prior Work** **Prior Work** [H<sup>1</sup>Y<sup>1</sup>DK S&P'21] [YHKD EuroS&P'22] [YH Security'24] 1 Co-first Authorship **Prior Work** [H<sup>1</sup>Y<sup>1</sup>DK S&P'21] [YHKD EuroS&P'22] [YH Security'24] 1 Co-first Authorship [YHHKV CCS'23] ♥ [HHKVY Asiacrypt'24] ↓ [Yang ePrint'25] **↓** Alphabetic Order **Prior Work** [H<sup>1</sup>Y<sup>1</sup>DK S&P'21] [YHKD EuroS&P'22] [YH Security'24] 1 Co-first Authorship [YHHKV CCS'23] ♥ [HHKVY Asiacrypt'24] ↓ [Yang ePrint'25] **↓** Alphabetic Order Distinguished Paper Award [YPHK CCS'23] [YHHKV CCS'24] **↓** Alphabetic Order Distinguished Paper Award 1 Co-first Authorship • A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz} \; (\approx \! 1000 \text{x})$ - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz} \ (\approx \! 1000 \text{x})$ - A two-party computation (2PC) full-toolchain system for any assembly program at $\approx 1$ KHz ( $\approx 1000 x$ ) - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz} \ (\approx \! 1000 \text{x})$ - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity Alphabetic Order - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at pprox 10KHz (pprox 1000 x) - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity - A zero-knowledge (ZK) branching protocol achieving optimal complexity Alphabetic Order - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz} \; (\approx \! 1000 \text{x})$ - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity - A zero-knowledge (ZK) branching protocol achieving optimal complexity - A zero-knowledge (ZK) CPU+RAM achieving optimal complexity (pprox 100 x) - 1. A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz} \; (\approx \! 1000 \text{x})$ - 2. A zero-knowledge (ZK) read-write memory achieving optimal complexity - 3. A zero-knowledge (ZK) branching protocol achieving optimal complexity ## Notation ## Notation ### Notation void merge(int arr[], int l, int m, int r) void merge(int arr[], int l, int m, int r) int L[n1], R[n2]; int i, j, k, n1 = m - l + 1, n2 = r - m; void merge(int arr[], int l, int m, int r) int i, j, k, n1 = m - l + 1, n2 = r - m; void merge(int arr[], int l, int m, int r) void merge(int arr[], int l, int m, int r) #### void merge(int arr[], int l, int m, int r) Soundness Zero Knowledge int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for $(\underline{i} = 0; i < n1; i++) L[i] = arr[l + i];$ for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j];// Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1;while (i < n1 & j < n2) { I want to know more! I know w if (L[i] <= R[j]) {... else {… // Copy the remaining elements of L[] if there are any // Copy the remaining elements of R[] if there are any W Prover **ADD ADD ADD** Verifier MUL MUL MUL • • • SHL SHL SHL **AND** AND AND $b_0, i_0, D_0 \quad M[i_0]$ $b_1, i_1, D_1$ $M[i_1]$ #### void merge(int arr[], int l, int m, int r) Soundness Zero Knowledge int i, j, k, n1 = m - l + 1, n2 = r - m; int L[n1], R[n2]; // Copy data to temp arrays L[] and R[] for $(\underline{i} = 0; i < n1; i++) L[i] = arr[l + i];$ for (j = 0; j < n2; j++) R[j] = arr[m + 1 + j];// Merge the temp arrays back into arr[l..r] i = 0, j = 0, k = 1;while (i < n1 & j < n2) { I want to know more! I know w if (L[i] <= R[j]) {... else {--// Copy the remaining elements of L[] if there are any // Copy the remaining elements of R[] if there are any W Prover **ADD ADD ADD** Verifier MUL MUL MUL • • • SHL SHL SHL **AND** AND AND $[b_0], [i_0], [D_0]$ $[M[i_0]]$ $[b_1], [i_1], [D_1]$ $[M[i_1]]$ void merge(int arr[], int l, int m, int r) #### Soundness #### Zero Knowledge I want to know more! • • • Each costs O(1): Verifier $$[x] + [y] = [x + y]$$ $$[x] \cdot [y] = [x \cdot y]$$ $test\_zero([x])$ Verifier needs to read every instruction; otherwise, the unread one is not executed Verifier needs to read every instruction; otherwise, the unread one is not executed Verifier needs to read every slot; otherwise, the unread one is not executed Verifier needs to read every instruction; otherwise, the unread one is not executed Verifier needs to read every slot; otherwise, the unread one is not executed Verifier needs to read every instruction; otherwise, the unread one is not executed Verifier needs to read every slot; otherwise, the unread one is not executed 1. We <u>repeatedly</u> use the same branching or memory, the linear cost can be effectively amortized over multiple accesses Verifier needs to read every instruction; otherwise, the unread one is not executed Verifier needs to read every slot; otherwise, the unread one is not executed - 1. We <u>repeatedly</u> use the same branching or memory, the linear cost can be effectively amortized over multiple accesses - 2. knows <u>everything</u>, only needs to verify (Remark: can still cheat!) Verifier needs to read every instruction; otherwise, the unread one is not executed Verifier needs to read every slot; otherwise, the unread one is not executed Tech. 1: reuse and amortize Tech. 2: P knows and helps #### IEEE S&P 2021 # Zero Knowledge for Everything and Everyone: Fast ZK Processor with Cached ORAM for ANSI C Programs David Heath\* Yibin Yang\* David Devecsery Vladimir Kolesnikov Georgia Institute of Technology Email: {heath.davidanthony, yyang811, ddevec, kolesnikov}@gatech.edu \*Authors contributed equally # A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx\!10\text{KHz}\,(\approx\!1000\text{x})$ Design ### Demo #### Demo ``` #include <stdio.h> gzip1.3 #include <ctype.h> #include <sys/types.h> #include <sys/stat.h> #include <errno.h> #include <signal.h> 1904 h = 0; do { 1905 hufts = 0; 1906 if ((r = inflate_block(&e)) != 0) 1907 1908 return r; if (hufts > h) 1909 h = hufts; 1910 } while (!e); 1911 char *p = strrchr(name, '.'); 4454 4455 if (p == NULL) return; if (p == name) p++; 4456 4457 do { 4458 if (*--p == '.') *p = '_'; } while (p != name); 4459 ``` #### Demo I can hack gzip1.3 (found as the CVE-2005-1228 bug) ``` #include <stdio.h> #include <ctype.h> #include <sys/types.h> #include <sys/stat.h> #include <errno.h> #include <signal.h> h = 0; 1904 do { 1905 hufts = 0; 1906 if ((r = inflate_block(&e)) != 0) 1907 1908 return r; if (hufts > h) 1909 h = hufts; 1910 } while (!e); 1911 4454 char *p = strrchr(name, '.'); 4455 if (p == NULL) return; if (p == name) p++; 4456 4457 do { 4458 if (*--p == '.') *p = '_'; } while (p != name); 4459 ``` foo.zip # gzip1.3 ``` // Check begin int len_if = strlen(ifname); int len_of = strlen(ofname); if (len_of > len_if) asm("CALL Proof"); for (int ind = 0; ind < len_of; ind++)</pre> if (ifname[ind] != ofname[ind]) asm("CALL Proof"); // Check end ``` Check that the output path is different ``` gconeice@gconeice-ThinkPad-X1-Carbon-Gen-9:~/prover 69x39 gconeice@gconeice-ThinkPad-X1-Carbon-Gen-9:~/verifier 69x39 ~/verifier > ls /prover ) ls compile.sh data compile.sh crypt.h getopt.h gzip.h revision.h lzw.h gzip.c dir-traversal-bug.gz gzip.h tailor.h config.h revision.h config.h data gzip.c l<u>z</u>w.h ~/verifier > ./compile.sh gzip.c gzip_input tailor.h getopt.h crypt.h /prover > cat gzip_input gzip -N -d dir-traversal-bug.gz -/prover > ./compile.sh gzip.c ``` Efficient # Is $\approx 10$ KHz Fast Enough? # Is $\approx 10$ KHz Fast Enough? # Is $\approx 10$ KHz Fast Enough? V.S. We can run Linux programs gzip/sed/bzip, and prove the existence of CVE bugs in <20s # ZK Branching #### ZK Branching We carefully choose the instruction set, resulting in a relatively small CPU "unit" circuit | | Syntax | Semantics | |--------------------|-----------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | | MOV $tar\ \{src\}$ | $\mathcal{R}[tar] \leftarrow val(src)$ | | | | $val(src_1)$ , if $\mathcal{R}[src_0] \neq 0$ | | | CMOV $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \begin{cases} val(src_1), & \text{if } \mathcal{R}[src_0] \neq 0 \\ \mathcal{R}[tar], & \text{otherwise} \end{cases}$ | | | ADD $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] + val(src_1)$ | | | SUB $tar\ src_0\ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] - val(src_1)$ | | | MUL $tar\ src_0\ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \cdot val(src_1)$ | | | XOR $tar\ src_0\ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \oplus val(src_1)$ | | Algebra | AND $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \wedge val(src_1)$ | | | OR $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \vee val(src_1)$ | | | , , | $\mathcal{R}[tar] \leftarrow \begin{cases} 1, & \text{if } \mathcal{R}[src] = 0 \\ 0, & \text{otherwise} \end{cases}$ $\mathcal{R}[tar] \leftarrow \begin{cases} 1, & \text{if } \mathcal{R}[src] \geq 2^{31} \\ 0, & \text{otherwise} \end{cases}$ | | | EQZ $tar\ src$ | $\mathcal{R}[tar] \leftarrow \begin{cases} 0, & \text{otherwise} \end{cases}$ | | | | $\begin{cases} 0, & \text{otherwise} \\ 1, & \text{otherwise} \end{cases}$ | | | $\mathtt{MSB}\ tar\ src$ | $\mathcal{R}[tar] \leftarrow \begin{cases} 1, & \text{if } \mathcal{R}[src] \geq 2^{sr} \end{cases}$ | | | | 0, otherwise | | | POW2 tar src | $\mathcal{R}[tar] \leftarrow 2^{\mathcal{R}[src]}$ | | Control Flow | JMP $\{dst\}$ | $pc \leftarrow val(dst)$ | | | DN7 one (dot) | $\int val(dst)$ , if $\mathcal{R}[src] \neq 0$ | | | BNZ $src$ $\{dst\}$ | $pc \leftarrow \begin{cases} val(dst), & \text{if } \mathcal{R}[src] \neq 0 \\ pc + 1, & \text{otherwise} \end{cases}$ | | | PC $tar \{src\}$ | $\mathcal{R}[tar] \leftarrow pc + val(src) \; ; \; pc \leftarrow pc + 1$ | | | HALT | – no effect, pc unchanged – | | | QED | – no effect, pc unchanged – | | Memory | LOAD $tar \ addr_0 \ \{addr_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{M}[\mathcal{R}[addr_0] + val(addr_1)]$ | | | STORE $src \ addr_0 \ \{addr_1\}$ | $\mathcal{M}[\mathcal{R}[addr_0] + val(addr_1)] \leftarrow \mathcal{R}(src)$ | | $\mathcal P$ Input | INPUT $tar$ | $\mathcal{R}[tar] \leftarrow x \text{ where } x \in \{02^{32} - 1\} \text{ is chosen by } \mathcal{P}$ | | | ORACLE $\{id\}$ | honest $\mathcal{P}$ privately calls oracle procedure $val(id)$ ; $pc \leftarrow pc + 1$ | | | | $\int_{\mathcal{R}} x$ if $x$ is an immediate | | | $val(x) \triangleq$ | $\begin{cases} x, & \text{if } x \text{ is an immediate} \\ \mathcal{R}[x], & \text{if } x \text{ is a register id} \end{cases}$ | | | | [K[x], if x is a register id] | # ZK Branching We carefully choose the instruction set, resulting in a relatively small CPU "unit" circuit | | Syntax | Semantics | |---------------------|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Algebra | MOV $tar\ \{src\}$ | $\mathcal{R}[tar] \leftarrow val(src)$ | | | CMOV $tar\ src_0\ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \begin{cases} val(src_1), & \text{if } \mathcal{R}[src_0] \neq 0 \\ \mathcal{R}[tar], & \text{otherwise} \end{cases}$ | | | ADD $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] + val(src_1)$ | | | SUB $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] - val(src_1)$ | | | MUL $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \cdot val(src_1)$ | | | XOR $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \oplus val(src_1)$ | | | AND $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \wedge val(src_1)$ | | | OR $tar \ src_0 \ \{src_1\}$ | $\mathcal{R}[tar] \leftarrow \mathcal{R}[src_0] \vee val(src_1)$ | | | EQZ $tar\ src$ | $\mathcal{R}[tar] \leftarrow \begin{cases} 1, & \text{if } \mathcal{R}[src] = 0\\ 0, & \text{otherwise} \end{cases}$ | | | MSB $tar\ src$ | $\mathcal{R}[tar] \leftarrow \begin{cases} 1, & \text{if } \mathcal{R}[src] = 0 \\ 0, & \text{otherwise} \end{cases}$ $\mathcal{R}[tar] \leftarrow \begin{cases} 1, & \text{if } \mathcal{R}[src] \geq 2^{31} \\ 0, & \text{otherwise} \end{cases}$ | | | POW2 $tar\ src$ | $\mathcal{R}[tar] \leftarrow 2^{\mathcal{R}[src]}$ | | Control Flow | JMP $\{dst\}$ | $pc \leftarrow val(dst)$ | | | ${\tt BNZ} \ src \ \{dst\}$ | $pc \leftarrow \begin{cases} val(dst), & \text{if } \mathcal{R}[src] \neq 0 \\ pc + 1, & \text{otherwise} \end{cases}$ | | | PC $tar \{src\}$ | $\mathcal{R}[tar] \leftarrow \mathtt{pc} + val(src) \; ; \; \mathtt{pc} \leftarrow \mathtt{pc} + 1$ | | | HALT | – no effect, pc unchanged – | | | QED | – no effect, pc unchanged – | | Memory | | $\mathcal{R}[tar] \leftarrow \mathcal{M}[\mathcal{R}[addr_0] + val(addr_1)]$ $\mathcal{M}[\mathcal{R}[addr_0] + val(addr_1)] \leftarrow \mathcal{R}(src)$ | | $\mathcal{P}$ Input | INPUT $tar$ | $\mathcal{R}[tar] \leftarrow x \text{ where } x \in \{02^{32} - 1\} \text{ is chosen by } \mathcal{P}$ | | | ORACLE $\{id\}$ | honest $\mathcal{P}$ privately calls oracle procedure $val(id)$ ; $pc \leftarrow pc + 1$ | | | $val(x) \triangleq$ | $\begin{cases} x, & \text{if } x \text{ is an immediate} \\ \mathcal{R}[x], & \text{if } x \text{ is a register id} \end{cases}$ | $$O(N) \Rightarrow O(\log N)$$ per access Assuming a read-write memory with N slots, we propose BubbleCache: $O(N) \Rightarrow O(\log N)$ per access [b], [i], [D] $[M_0]$ $[M_1]$ $[M_2]$ $[M_3]$ $[M_4]$ $[M_5]$ $[M_6]$ $[M_7]$ Assuming a read-write memory with N slots, we propose BubbleCache: $$O(N) \Rightarrow O(\log N)$$ per access $[M_5]$ $[M_6]$ $$[M_0]$$ [if $(i = 0) (1 - b)M_0 + bD$ , else $M_0$ ] [ $$M_1$$ ] [if $(i = 1) (1 - b)M_1 + bD$ , else $M_1$ ] [ $$M_2$$ ] [if $(i = 2) (1 - b)M_2 + bD$ , else $M_2$ ] [ $$M_3$$ ] [if $(i = 3) (1 - b)M_3 + bD$ , else $M_3$ ] [ $$M_4$$ ] [if $(i = 4) (1 - b)M_4 + bD$ , else $M_4$ ] [if $$(i = 5) (1 - b)M_5 + bD$$ , else $M_5$ ] [if $$(i = 6) (1 - b)M_6 + bD$$ , else $M_6$ ] [ $$M_7$$ ] [if $(i = 7) (1 - b)M_7 + bD$ , else $M_7$ ] $$O(N) \Rightarrow O(\log N)$$ per access #### USENIX Security 2024 #### Two Shuffles Make a RAM: Improved Constant Overhead Zero Knowledge RAM Yibin Yang Georgia Institute of Technology David Heath University of Illinois Urbana-Champaign Assuming a read-write memory with N slots, we propose a ZK memory: $$O(N) \Rightarrow O(\log N) \Rightarrow O(1)$$ per access Assuming a read-write memory with N slots, we propose a ZK memory: $$O(N) \Rightarrow O(\log N) \Rightarrow O(1)$$ per access #### "Two Shuffles (i.e., Permutations) Make a RAM" Assuming a read-write memory with N slots, we propose a ZK memory: $$O(N) \Rightarrow O(\log N) \Rightarrow O(1)$$ per access #### "Two Shuffles (i.e., Permutations) Make a RAM" Assuming a read-write memory with N slots, we propose a ZK memory: $$O(N) \Rightarrow O(\log N) \Rightarrow O(1)$$ per access #### "Two Shuffles (i.e., Permutations) Make a RAM" Assuming a read-write memory with N slots, we propose a ZK memory: $$O(N) \Rightarrow O(\log N) \Rightarrow O(1)$$ per access #### "Two Shuffles (i.e., Permutations) Make a RAM" $[0], [M_0]$ [4], [5] $[6], [M_6]$ $[2], [M_2]$ $[1], [M_1]$ $[2], [M_2]$ $[3], [M_3]$ $[4], [M_4]$ $[5], [M_5]$ $[6], [M_6]$ $[7], [M_7]$ Assuming a read-write memory with N slots, we propose a ZK memory: $$O(N) \Rightarrow O(\log N) \Rightarrow O(1)$$ per access #### "Two Shuffles (i.e., Permutations) Make a RAM" Global final check # Constant-Overhead ZK Memory Assuming a read-write memory with N slots, we propose a ZK memory: $$O(N) \Rightarrow O(\log N) \Rightarrow O(1)$$ per access ## "Two Shuffles (i.e., Permutations) Make a RAM" (T denotes the time of accesses) ## Simplifications: $\label{eq:Read-Write} \textbf{Read-Write}([1],[D_0]) \quad \textbf{Read-Write}([0],[D_1]) \quad \textbf{Read-Write}([0],[D_2]) \quad \textbf{Read-Write}([1],[D_3])$ Read-Write([1],[ $D_0$ ]) Read-Write([0],[ $D_1$ ]) Read-Write([0],[ $D_2$ ]) Read-Write([1],[ $D_3$ ]) $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ #### Tech. 2: P knows and helps Maintain triples: (index, value, timestamp) Time = 1 Time = 2 Time = 3 Time = 4 $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ ## Tech. 2: P knows and helps Maintain triples: (index, value, timestamp) Record $$\left[ [0], [M_0], [0] \right] \left[ [1], [M_1], [0] \right] \left[ [1], [D_0], [1] \right] \left[ [0], [D_1], [2] \right] \left[ [0], [D_2], [3] \right] \left[ [1], [D_3], [4] \right]$$ Time = 1 Time = 2 Time = 3 Time = 4 $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ ### Tech. 2: P knows and helps Time = 1 Time = 2 Time = 3 Time = 4 $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ ## Tech. 2: P knows and helps $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ ### Tech. 2: P knows and helps Record $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ ## Tech. 2: P knows and helps $\mbox{Read-Write}([1],[D_0]) \ \mbox{Read-Write}([0],[D_1]) \ \mbox{Read-Write}([0],[D_2]) \ \mbox{Read-Write}([1],[D_3])$ ## Tech. 2: P knows and helps # Concrete improvements over BubbleCache: $12-160\times$ , depending on the network settings # Concrete improvements over BubbleCache: $12-160\times$ , depending on the network settings "One Shuffle (i.e., Permutation) Makes a ROM" Concrete improvements over BubbleCache: $12-160\times$ , depending on the network settings "One Shuffle (i.e., Permutation) Makes a ROM" ## ZK RAM and ROM over vectors Total cost: O(Nm + Tm) for T accesses and length-m vectors ## ACM CCS 2023 # Batchman and Robin: Batched and Non-batched Branching for Interactive ZK Yibin Yang Georgia Institute of Technology, USA yyang811@gatech.edu David Heath University of Illinois Urbana-Champaign, USA daheath@illinois.edu Carmit Hazay Bar-Ilan University, Ligero Inc., Israel Carmit.Hazay@biu.ac.il Vladimir Kolesnikov Georgia Institute of Technology, USA kolesnikov@gatech.edu Muthuramakrishnan Venkitasubramaniam Ligero Inc., USA muthu@ligero-inc.com Containing 2 Branches and 1 Repetition I know the inputs that make one of these two circuits to produce an output of zero SHL AND Containing 2 Branches and 1 Repetition I know the inputs that make one of these two circuits to produce an output of zero SHL AND Containing 2 Branches and 1 Repetition I know the inputs that make one of these two circuits to produce an output of zero #### Straightforward-but-expensive approach: SHL AND LT XOR ... Containing $\boldsymbol{B}$ Branches and 1 Repetition I know the inputs that make one of these B circuits to produce an output of zero SHL AND LT $\times R$ XOR ... Containing B Branches and R Repetition I know R inputs that make one of these B circuits to produce an output of zero SHL AND LT $\times R$ XOR ... Containing B Branches and R Repetition I know R inputs that make one of these B circuits to produce an output of zero $$O(RB \mid C \mid)$$ $$\leftarrow$$ $$O(B \mid C \mid + R \mid C \mid)$$ Topology vector: public and determined by the circuit inner\_product $(\overrightarrow{tv}, [in_1 \ in_2 \ in_3 \ in_4 \ \ell_1 \ r_1 \ \ell_1r_1 \ \ell_2 \ r_2 \ \ell_2r_2]) = [0]$ $[in_1 \quad in_2 \quad in_3 \quad in_4 \quad \ell_1 \quad r_1 \quad \ell_1 r_1 \quad \ell_2 \quad r_2 \quad \ell_2 r_2]$ $\exists i$ , inner\_product $(\overrightarrow{tv_i},[in_1 \ in_2 \ in_3 \ in_4 \ \ell_1 \ r_1 \ \ell_1 r_1 \ \ell_2 \ r_2 \ \ell_2 r_2]) = [0]$ ## Tech. 2: P knows and helps $\exists i$ , inner\_product $(\overrightarrow{tv_i},[in_1 \ in_2 \ in_3 \ in_4 \ \ell_1 \ r_1 \ \ell_1 r_1 \ \ell_2 \ r_2 \ \ell_2 r_2]) = [0]$ inner\_product( $[\overrightarrow{tv_i}]$ ,[ $in_1$ $in_2$ $in_3$ $in_4$ $\ell_1$ $r_1$ $\ell_1r_1$ $\ell_2$ $r_2$ $\ell_2r_2$ ]) = [0] inner\_product( $[\overrightarrow{tv_i}]$ ,[ $in_1$ $in_2$ $in_3$ $in_4$ $\ell_1$ $r_1$ $\ell_1r_1$ $\ell_2$ $r_2$ $\ell_2r_2$ ]) = [0] inner\_product( $[\overrightarrow{tv_i}]$ ,[ $in_1$ $in_2$ $in_3$ $in_4$ $\ell_1$ $r_1$ $\ell_1r_1$ $\ell_2$ $r_2$ $\ell_2r_2$ ]) = [0] $\operatorname{inner\_product}([\overrightarrow{tv_{i_1}}],[in_1^{(1)}\ in_2^{(1)}\ in_3^{(1)}\ in_4^{(1)}\ \ell_1^{(1)}\ \ell_1^{(1)}\ \ell_1^{(1)}\ \ell_1^{(1)}\ \ell_2^{(1)}\ \ell_2^{(1)}\ \ell_2^{(1)}\ \ell_2^{(1)}])=[0]$ $\text{inner\_product}([\overrightarrow{tv_{i_2}}],[in_1^{(2)} \ in_2^{(2)} \ in_3^{(2)} \ in_4^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)}]) = [0]$ $$\text{inner\_product}([\overrightarrow{tv_{i_R}}],[in_1^{(R)} \ in_2^{(R)} \ in_3^{(R)} \ in_4^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ell_1^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ell_2^{(R)}]) = [0]$$ $$[i_1][i_2]\cdots[i_R]$$ ### Total cost: O(Nm + Tm) for Taccesses and length-m vectors $$\begin{aligned} & \text{inner\_product}([\overrightarrow{tv_{i_1}}], [in_1^{(1)} & in_2^{(1)} & in_3^{(1)} & in_4^{(1)} & \ell_1^{(1)} & r_1^{(1)} & \ell_1^{(1)} r_1^{(1)} & \ell_2^{(1)} & r_2^{(1)} & \ell_2^{(1)} r_2^{(1)}]) = [0] \\ & \text{inner\_product}([\overrightarrow{tv_{i_2}}], [in_1^{(2)} & in_2^{(2)} & in_3^{(2)} & in_4^{(2)} & \ell_1^{(2)} & r_1^{(2)} & \ell_1^{(2)} r_1^{(2)} & \ell_2^{(2)} & r_2^{(2)} & \ell_2^{(2)} r_2^{(2)}]) = [0] \end{aligned}$$ $$\text{inner\_product}([\overrightarrow{tv_{i_R}}],[in_1^{(R)} \ in_2^{(R)} \ in_3^{(R)} \ in_4^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ell_1^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ell_2^{(R)}]) = [0]$$ $$[i_1][i_2]\cdots[i_R]$$ **ZK ROM** $$[i_1][i_2]\cdots[i_R] \qquad \overrightarrow{tv_{i_1}}[\overrightarrow{tv_{i_2}}]\cdots[\overrightarrow{tv_{i_R}}]$$ $$O(B \mid C \mid + R \mid C \mid)$$ $$\begin{split} & \text{inner\_product}([\overrightarrow{tv_{i_1}}],[in_1^{(1)} \ in_2^{(1)} \ in_3^{(1)} \ in_4^{(1)} \ \ell_1^{(1)} \ r_1^{(1)} \ \ell_1^{(1)} \ r_1^{(1)} \ \ell_2^{(1)} \ r_2^{(1)} \ \ell_2^{(1)} \ r_2^{(1)}]) = [0] \\ & \text{inner\_product}([\overrightarrow{tv_{i_2}}],[in_1^{(2)} \ in_2^{(2)} \ in_3^{(2)} \ in_4^{(2)} \ \ell_1^{(2)} \ r_1^{(2)} \ \ell_1^{(2)} \ r_1^{(2)} \ \ell_2^{(2)} \ r_2^{(2)} \ \ell_2^{(2)} \ r_2^{(2)}]) = [0] \end{split}$$ $$\text{inner\_product}([\overrightarrow{tv_{i_R}}],[in_1^{(R)} \ in_2^{(R)} \ in_3^{(R)} \ in_4^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ell_1^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ell_2^{(R)}]) = [0]$$ $$[\overrightarrow{tv_{i_1}}][\overrightarrow{tv_{i_2}}]\cdots[\overrightarrow{tv_{i_R}}]$$ $$O(B \mid C \mid + R \mid C \mid)$$ $$\begin{split} & \text{inner\_product}([\overrightarrow{tv_{i_1}}], [in_1^{(1)} \ in_2^{(1)} \ in_3^{(1)} \ in_4^{(1)} \ \ell_1^{(1)} \ \ell_1^{(1)} \ \ell_1^{(1)} \ \ell_1^{(1)} \ \ell_2^{(1)} \ \ell_2^{(1)} \ \ell_2^{(1)} \ \ell_2^{(1)} \ \ell_2^{(1)}]) = [0] \\ & \text{inner\_product}([\overrightarrow{tv_{i_2}}], [in_1^{(2)} \ in_2^{(2)} \ in_3^{(2)} \ in_4^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)}]) = [0] \\ & \underbrace{O(R \mid C \mid)} \\ & \underbrace{O(R \mid C \mid)} \end{split}$$ $\text{inner\_product}([\overrightarrow{tv_{i_R}}],[in_1^{(R)} \ in_2^{(R)} \ in_3^{(R)} \ in_4^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ell_1^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ell_2^{(R)}]) = [0]$ $$[\overrightarrow{tv_{i_1}}][\overrightarrow{tv_{i_2}}]\cdots[\overrightarrow{tv_{i_R}}]$$ $$O(B \mid C \mid + R \mid C \mid)$$ $$\begin{split} & \text{inner\_product}([\overrightarrow{tv_{i_1}}],[in_1^{(1)} \ in_2^{(1)} \ in_3^{(1)} \ in_4^{(1)} \ \ell_1^{(1)} \ \ell_1^{(1)} \ \ell_1^{(1)} \ \ell_1^{(1)} \ \ell_2^{(1)} \ \ell_2^{(1)} \ \ell_2^{(1)} \ \ell_2^{(1)}]) = [0] \\ & \text{inner\_product}([\overrightarrow{tv_{i_2}}],[in_1^{(2)} \ in_2^{(2)} \ in_3^{(2)} \ in_4^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_1^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)} \ \ell_2^{(2)}]) = [0] \\ & & O(R \mid C \mid) \end{split}$$ $\text{inner\_product}([\overrightarrow{tv_{i_R}}],[in_1^{(R)} \ in_2^{(R)} \ in_3^{(R)} \ in_4^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ \ell_1^{(R)} \ell_1^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ \ell_2^{(R)} \ell_2^{(R)}]) = [0]$ - 1. A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz}$ ( $\approx \! 1000 \text{x}$ ) - 2. A zero-knowledge (ZK) read-write memory achieving optimal complexity - 3. A zero-knowledge (ZK) branching protocol achieving optimal complexity Distinguished Paper Award \* Co-first Authorship # Future Work #### **KHz** ## My PhD **Past** Hz **Future Work** MHz GHz #### **KHz** ## My PhD - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at pprox 10KHz (pprox 1000 x) - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity - A zero-knowledge (ZK) branching protocol achieving optimal complexity - A zero-knowledge (ZK) CPU+RAM achieving optimal complexity (pprox 100 x) - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at pprox 10KHz (pprox 1000 x) - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity - A zero-knowledge (ZK) branching protocol achieving optimal complexity - A zero-knowledge (ZK) CPU+RAM achieving optimal complexity (pprox 100 x) - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at pprox 10KHz (pprox 1000 x) - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity - A zero-knowledge (ZK) branching protocol achieving optimal complexity - A zero-knowledge (ZK) CPU+RAM achieving optimal complexity (pprox 100 x) - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz} \; (\approx \! 1000 \text{x})$ - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity - A zero-knowledge (ZK) branching protocol achieving optimal complexity - A zero-knowledge (ZK) CPU+RAM achieving optimal complexity (pprox 100 x) - A zero-knowledge (ZK) full-toolchain system for any ANSI C program at $\approx \! 10 \text{KHz} \; (\approx \! 1000 \text{x})$ - A two-party computation (2PC) full-toolchain system for any assembly program at pprox 1KHz (pprox 1000 x) - A zero-knowledge (ZK) read-write memory achieving optimal complexity - A zero-knowledge (ZK) branching protocol achieving optimal complexity - A zero-knowledge (ZK) CPU+RAM achieving optimal complexity (pprox 100 x) Compiler Efficiency PL Hardware Cryptography Al ML Network Database Bioinfo .... Cryptography Privacy, Robustness Network Bioinfo .... The state-of-the-art ZKML (Hao et al. USENIX Security'24) uses my *open-sourced* ZK ROM to improve efficiency! Distinguished Paper Award 1 Co-first Authorship Distinguished Paper Award 1 Co-first Authorship Distinguished Paper Award 1 Co-first Authorship ## Acknowledgements