WHITE PAPER
Massive Disaggregated Processing for Sensors at the Edge
incorporate a PCIe switch architecture to operate as either
root or endpoints for GPUs, NVMe storage, and other PCIe
devices. Another critical edge function is crypto acceleration
performed inline for the IPsec and TLS security protocols.
A DESIGN THAT ADAPTS DPUs AND GPUs
FOR EDGE APPLICATIONS
A rugged, small form factor product
combining a DPU and a GPU
Mercury's recently announced Rugged Distributed
Processing (RDP) product family is a new class of
processing systems for edge applications. Rugged and
built to meet SWaP constraints, the RDP products feature
powerful GPUs and DPUs from NVIDIA. The first member
of the product family is the RDP 1U Rugged Distributed
GPGPU Server, described here in some detail.
The BlueField DPU
Central to the RDP 1U architecture is the NVIDIA®
BlueField® DPU. It is ideally suited to support high-
performance, distributed GPU edge processing, offering:
▪ 200 Gb/s of Ethernet connectivity to
sensors, storage, and other systems
▪ A PCIe Gen4 switch + 16 lanes of connectivity for high-
bandwidth data movement to and from GPU processing
▪ 8 ARM CPU cores to initiate stream processing
applications and control the routing of
data streams without adding latency
▪ Accelerated switching and packet processing
(ASAP²) engine for bolstering advanced networking
▪ Dedicated encryption acceleration engines,
supporting security at line-rate speeds
▪ Storage control engines, with compression
and decompression acceleration
▪ Path to add GPU processing to an existing
high-speed compute rack without requiring
the replacement of compute servers.
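As a rough sanity check on the numbers above (the arithmetic is ours, not from the product literature), the sketch below confirms that a PCIe Gen4 x16 link has enough raw headroom to carry the DPU's full 200 Gb/s Ethernet ingest to the GPU. It assumes the standard PCIe Gen4 figures of 16 GT/s per lane with 128b/130b encoding, before protocol overhead.

```python
# Back-of-envelope check: can PCIe Gen4 x16 carry the BlueField DPU's
# full 200 Gb/s Ethernet ingest to the GPU without becoming a bottleneck?

ethernet_gbps = 200                  # DPU Ethernet connectivity, Gb/s
ethernet_gbytes = ethernet_gbps / 8  # 25.0 GB/s payload upper bound

pcie_gen4_gt_per_lane = 16           # GT/s per lane (PCIe Gen4 standard)
encoding_efficiency = 128 / 130      # 128b/130b line encoding
lanes = 16

# Raw per-direction bandwidth, before PCIe protocol overhead
pcie_gbytes = pcie_gen4_gt_per_lane * encoding_efficiency * lanes / 8

print(f"Ethernet ingest: {ethernet_gbytes:.1f} GB/s")
print(f"PCIe Gen4 x16:   {pcie_gbytes:.1f} GB/s")
print(f"Headroom:        {pcie_gbytes - ethernet_gbytes:.1f} GB/s")
```

The roughly 31.5 GB/s of raw PCIe Gen4 x16 bandwidth comfortably exceeds the 25 GB/s that a fully saturated 200 Gb/s Ethernet link can deliver, which is why the DPU's integrated PCIe switch can feed the GPU at line rate.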
The A100 Tensor Core GPU
The RDP 1U edge server delivers cutting-edge computing power
from NVIDIA's A100 Tensor Core GPU. Its highly parallel math
operations are ideal for signal processing and AI algorithms.
The A100's performance-enhancing features include:
▪ 6,912 processing cores that can be partitioned
into seven isolated GPU instances to dynamically
adjust to shifting workload demands
▪ 80GB of GPU memory supporting 2 TB/s of
memory bandwidth to enable extremely high-
speed processing of huge data streams
▪ Native support for a range of math precisions,
including double precision (FP64), single precision
(FP32), half precision (FP16), and integer (INT8)
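To illustrate why the range of native precisions matters (an illustration of ours, not a benchmark from the document): at the A100's fixed 2 TB/s of memory bandwidth, halving the element size doubles the number of elements that can stream through memory per second. The element sizes are the standard widths for each format; the resulting rates are bandwidth upper bounds only.

```python
# Illustrative only: element-streaming upper bounds at the A100's
# 2 TB/s memory bandwidth. Smaller precisions move more elements
# per second through the same memory system.

MEM_BANDWIDTH_TBS = 2.0                 # A100 80GB memory bandwidth, TB/s
bytes_per_second = MEM_BANDWIDTH_TBS * 1e12

precisions = {
    "FP64 (double)": 8,   # bytes per element
    "FP32 (single)": 4,
    "FP16 (half)":   2,
    "INT8":          1,
}

for name, size in precisions.items():
    elems = bytes_per_second / size
    print(f"{name:14s} {elems / 1e12:5.2f} trillion elements/s streamed")
```

For bandwidth-bound signal-processing kernels, this is the practical payoff of dropping from FP32 to FP16 or INT8: up to a 2x or 4x increase in data throughput before compute limits even come into play.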
The 1U Short-Depth Chassis
The physical form factor and high-speed Ethernet
connectivity of the RDP chassis reflect its design
goal—disaggregated parallel processing for edge
applications. Some of its defining characteristics are:
▪ A 19" rack-mountable unit, suitable for shipboard,
large aircraft, or remote ground installations
▪ Compact dimensions of 1U height (1.75"),
17" width, and just 20" depth
▪ Integrated air cooling
▪ Tested for 0°C to 35°C operation
▪ 16 lanes of internal PCIe Gen4 communications,
linking the DPU and GPU
▪ Two 100 Gb/s (or a single 200 Gb/s) fiber-
optic Ethernet network ports delivering
data streams to and from the DPU
▪ A 1 Gb/s copper Ethernet network port
for system control communications
Figure 3: NVIDIA A100 80 GB PCIe GPU