# Heterogeneous Isolated Execution for Commodity GPUs

Insu Jang<sup>1</sup>, Adrian Tang<sup>2</sup>, Taehoon Kim<sup>1</sup>, Simha Sethumadhavan<sup>2</sup>, and Jaehyuk Huh<sup>1</sup>

<sup>1</sup> KAIST, School of Computing
 <sup>2</sup> Columbia University, Department of Computer Science





Heterogeneous computing is emerging (GPUs, FPGAs, etc.)

- Machine Learning
- Image Processing
- Complex Calculations

Problem: lack of trusted execution environment in devices

Untrusted Kernel Space

Device Driver

User



- Problem: lack of trusted execution environment in devices
- Existing work on TEEs for peripheral devices
  - SGXIO [Weiser, CODASPY'17]: uses a trusted hypervisor
  - Graviton [Volos, OSDI'18]: uses a modified GPU with a root of trust

All device I/O accesses from software are handled by the CPU

Process



**DRAM** 

**GPU** 







Idea: Keep attackers out of device I/O by securing the I/O path!



- Implementation based on Intel SGX (a basic TEE is a prerequisite)
- Extend the TEE to the I/O path (from the SGX enclave to the device)






## **Contributions and Threat Model**




- Provide confidentiality and integrity for user data on the GPU
- No GPU modifications are required
  - Provide a GPU TEE by securing the I/O path
  - Software-based attacks are prevented; physical attacks are not covered

#### Threat Model

- Attackers hold all privileged permissions at the software level
- Physical attacks on hardware are not considered
- Goal: protect the system from privileged software attacks

# **HIX Architecture**

- Trusted GPU Device Driver: GPU Enclave
- MMIO Protection
- Inter-Enclave Communication

#### **HIX: Architecture Overview**



Three communication paths to be protected














































#### **GPU Enclave: Trusted Device Driver**

- Move the device driver from untrusted kernel space into a trusted enclave
- An extended SGX enclave owns and controls the GPU inside the TEE
- The GPU enclave has exclusive access to the GPU in the system through MMIO

#### **MMIO Access Validation**

- How does the GPU enclave get exclusive access to the GPU through MMIO?
- Extend the SGX EPC access validation mechanism to MMIO
  - Validate address translation information during TLB misses

Figure labels: Virtual Address Space (MMIO VA), Physical Address Space (MMIO PA, Main Memory), GPU, DRAM, GPU Enclave
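The validation step above can be sketched as a small software model. This is only an illustration under stated assumptions, not the HIX hardware logic: `MmioRegistration`, `is_translation_allowed`, and the example addresses are hypothetical names and values, and the check is modeled loosely on how SGX ties a protected page to an owner enclave and an expected virtual address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MmioRegistration:
    """Hypothetical record created when the GPU enclave claims the GPU."""
    owner_enclave: int  # enclave ID allowed to touch the range
    pa_base: int        # first physical address of the GPU MMIO range
    size: int           # length of the range in bytes
    va_base: int        # virtual address the owner mapped the range at


def is_translation_allowed(reg: MmioRegistration,
                           enclave_id: Optional[int],
                           va: int, pa: int) -> bool:
    """Model of the check made on a TLB miss, before the entry is cached.

    enclave_id is None when the access comes from outside any enclave.
    """
    if not (reg.pa_base <= pa < reg.pa_base + reg.size):
        return True  # not protected GPU MMIO; other checks are out of scope
    if enclave_id != reg.owner_enclave:
        return False  # kernel or another enclave targeting the GPU
    # The page tables are controlled by the untrusted OS, so the VA -> PA
    # pair must match what the owner registered.
    return va - reg.va_base == pa - reg.pa_base
```

In this model, a translation produced by a tampered page table (right physical page, unexpected virtual address) is rejected just like an access from the kernel, because both would let untrusted software reach the GPU's MMIO range.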

































- PCIe hardware registers can be manipulated by software
- Solution: freeze MMIO routing information (MMIO lockdown)
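The lockdown idea can be illustrated with a toy model. Everything here is an assumption made for illustration: the class `RoutingRegisters`, the register name `gpu_bar0`, and the choice to silently discard writes after lockdown (real hardware could equally raise a fault) are hypothetical and not taken from the talk.

```python
class RoutingRegisters:
    """Toy model of the registers that decide where an MMIO address is routed."""

    def __init__(self, initial):
        self._values = dict(initial)
        self._locked = False

    def lockdown(self):
        """Freeze the routing information; there is no unlock in this model."""
        self._locked = True

    def write(self, name, value):
        """Return True if the write took effect, False if it was discarded."""
        if self._locked:
            return False  # privileged software can no longer reroute MMIO
        self._values[name] = value
        return True

    def read(self, name):
        return self._values[name]


regs = RoutingRegisters({"gpu_bar0": 0xF0000000})
regs.write("gpu_bar0", 0xE0000000)           # allowed before lockdown
regs.lockdown()
rerouted = regs.write("gpu_bar0", 0xD0000000)  # discarded after lockdown
```

After `lockdown()`, the address that was programmed last stays in effect, so an MMIO range that was validated once cannot later be pointed at a different device.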



- PCIe hardware registers can be manipulated by software
- Solution: freeze MMIO routing information (MMIO lockdown)



- PCIe hardware registers can be manipulated by software
- Solution: freeze MMIO routing information (MMIO lockdown)



- PCle hardware registers can be manipulated by software
- Solution: freeze MMIO routing information (MMIO lockdown)



- PCle hardware registers can be manipulated by software
- Solution: freeze MMIO routing information (MMIO lockdown)



#### **Architecture Review**



- Inter-process communication: message queue & shared memory
- Confidentiality & integrity provided by authenticated encryption
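A minimal sketch of protecting a message that travels through untrusted shared memory is shown below. It is a stand-in, not the scheme used by HIX: to stay within the Python standard library it uses a SHA-256 counter keystream with HMAC-SHA256 in an encrypt-then-MAC arrangement, which is for illustration only and not for production use. The function names `seal` and `open_sealed` and the two-key layout are assumptions.

```python
import hashlib
import hmac
import os

NONCE_LEN = 12
TAG_LEN = 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Illustrative keystream: SHA-256 over key, nonce, and a counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then authenticate nonce and ciphertext (encrypt-then-MAC)."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes):
    """Return the plaintext, or None if the blob was modified in transit."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        return None
    nonce = blob[:NONCE_LEN]
    ciphertext = blob[NONCE_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # integrity check failed
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))
```

The sender would place the output of `seal` in the shared-memory region and the receiver would call `open_sealed` on what it reads back; any modification by untrusted software in between makes `open_sealed` return `None`.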








### **Communication Challenge: DMA**

#### Challenge

- SGX does not allow DMA from devices into enclave memory
- Without DMA, data can be copied only through (slow) MMIO



### **Trusted DMA Support**

- The GPU copies encrypted data from shared memory into GPU memory via DMA
- The GPU enclave launches an in-GPU decryption kernel
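The data flow described above can be modeled in a few lines. This is a sketch under assumptions: the class `TrustedDmaModel`, its method names, and the way the key reaches the decryption step are hypothetical, the keystream cipher is an illustrative stand-in, and integrity tags are omitted for brevity.

```python
import hashlib


def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Illustrative stand-in cipher: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class TrustedDmaModel:
    def __init__(self):
        self.shared_memory = b""  # readable by the untrusted OS
        self.gpu_memory = b""     # reached only through the GPU enclave here

    def user_enclave_stage(self, key: bytes, plaintext: bytes) -> None:
        """The user enclave encrypts before anything leaves the enclave."""
        self.shared_memory = xor_keystream(key, plaintext)

    def gpu_dma_in(self) -> None:
        """The GPU pulls the ciphertext; no enclave memory is touched by DMA."""
        self.gpu_memory = self.shared_memory

    def in_gpu_decrypt_kernel(self, key: bytes) -> None:
        """Decryption happens inside GPU memory, after the DMA transfer."""
        self.gpu_memory = xor_keystream(key, self.gpu_memory)


model = TrustedDmaModel()
secret = b"input matrix bytes" * 4
model.user_enclave_stage(b"session-key", secret)
model.gpu_dma_in()
model.in_gpu_decrypt_kernel(b"session-key")
```

The point of the model is where plaintext exists: in the user enclave and in GPU memory, but never in the shared-memory buffer that DMA reads from.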



# **Evaluation**


- Prototype Implementation
  - Hardware changes are emulated in a KVM/QEMU virtual machine
  - GPU enclave implementation is based on Gdev [Kato, ATC'12]

- Performance analysis: Rodinia GPU microbenchmark
  - Measure overheads due to cryptography, etc.
  - Baseline: unmodified Gdev NVIDIA GPU driver

|                   | Baseline                | HIX                     |
|-------------------|-------------------------|-------------------------|
| Trusted Execution | No                      | Yes                     |
| Encryption        | N/A                     | AES-OCB [Rogaway '14]   |
| GPU               | NVIDIA GeForce GTX 580* | NVIDIA GeForce GTX 580* |

<sup>\*</sup> Newer devices are not supported by Gdev





| App name    | SRAD    | PF       | NN       | NW       | LUD     | HS      | GS      | BFS     | BP       |
|-------------|---------|----------|----------|----------|---------|---------|---------|---------|----------|
| Memcpy size | 48.4 MB | 256.0 MB | 501.2 KB | 192.1 MB | 32.0 MB | 12.0 MB | 64.0 MB | 46.9 MB | 159.8 MB |






**Large Amount of Data** → **High Cryptography Overheads** 











**High Computational Ratio** → **Reduced Cryptography Overhead Ratio**
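The two observations can be made concrete with a simple back-of-the-envelope model. This is an assumption-laden sketch, not a result from the evaluation: the function `crypto_overhead_ratio`, the 1 GiB/s cryptography throughput, and the baseline run times are all hypothetical values chosen to show the trend.

```python
def crypto_overhead_ratio(bytes_copied: float,
                          crypto_bytes_per_sec: float,
                          baseline_seconds: float) -> float:
    """Extra time spent on cryptography, relative to the baseline run time."""
    crypto_seconds = bytes_copied / crypto_bytes_per_sec
    return crypto_seconds / baseline_seconds


MIB = 2 ** 20
GIB = 2 ** 30

# Same amount of copied data, hypothetical throughput of 1 GiB/s.
copy_bound = crypto_overhead_ratio(64 * MIB, 1 * GIB, baseline_seconds=0.5)
compute_bound = crypto_overhead_ratio(64 * MIB, 1 * GIB, baseline_seconds=5.0)
```

With these made-up numbers the cryptography time is identical in both cases, but it is a tenth as large relative to the run time when the application spends ten times longer computing, which is the trend the slide describes.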


















HIX provides a trusted execution environment for commodity GPUs

#### Access granted to their own devices



#### **Expandable Device Protection**

# Heterogeneous Isolated Execution for Commodity GPUs

Thank you for Listening!

Q&A



