Unreal Engine NNE vs. Unity Sentis: Runtime Latency Benchmarks for Real-Time Physics Approximation

RJH Rizo

May 22, 2026

By Rizowan Ahmed (@riz1raj)
Senior Technology Analyst | Covering Enterprise IT, Hardware & Emerging Trends

Stop trying to solve Navier-Stokes or complex viscoelastic stress tensors on the CPU inside a tight frame budget. At 60 fps the entire frame, rendering included, has roughly 16.7 ms, and a high-fidelity solver rarely fits in that window. For years, game physics engines like PhysX, Chaos, and Havok have forced developers to choose between simplified rigid-body approximations and significant frame-rate drops when simulating complex soft bodies, cloth, and fluid dynamics.

The paradigm has shifted. We are increasingly training deep neural networks offline on high-fidelity simulation data and running lightweight, quantized models at runtime to approximate physics. But the success of this approach hinges entirely on the efficiency of the runtime inference engine embedded within the game engine.

This technical deep dive provides a side-by-side evaluation of Unreal Engine’s Neural Network Engine (NNE) and Unity Sentis for real-time physics approximation. We evaluate their underlying architectures, memory management strategies, and relative latency profiles across desktop GPUs and Apple Silicon.

The Architectural Battleground: Unreal NNE vs. Unity Sentis

Before looking at the performance characteristics, we must understand the fundamental architectural divergence between Epic Games’ Neural Network Engine (NNE) and Unity’s Sentis framework. Both are designed to ingest ONNX (Open Neural Network Exchange) models, but their integration philosophies and execution pipelines are different.

Unreal Engine NNE: Rigid, Native, and RDG-Bound

Unreal Engine’s NNE is an abstraction layer that wraps multiple platform-specific backends behind a common interface. It is integrated with Unreal’s Render Dependency Graph (RDG), so inference can be scheduled as part of the frame’s GPU work rather than as a separate CPU-driven step.

Backend Flexibility: NNE can route inference through NNERuntimeORT (ONNX Runtime, utilizing DirectML on Windows), NNERuntimeIREE (for Vulkan-based mobile and console targets), or platform-proprietary APIs like Apple’s CoreML.
RDG Integration: By operating directly within the RDG, NNE allows developers to enqueue neural network inference passes directly into the GPU command buffer alongside standard rendering and compute passes. This helps minimize the latency penalty of CPU-GPU synchronization.
Memory Management: NNE relies on native C++ memory allocation. On the RDG path, tensors are represented as RDG buffers (FRDGBufferRef), which can be bound directly to compute shaders without staging copies.
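
The scheduling idea behind the RDG bullet can be illustrated outside the engine. What follows is a toy Python sketch, not Unreal code: each pass declares the buffers it reads and writes, and an execution order is derived from those dependencies. The pass and buffer names are hypothetical, and a real render graph also handles resource transitions, culling, and async queues that this sketch ignores.

```python
from graphlib import TopologicalSorter

# Each hypothetical pass lists the buffers it reads and writes.
PASSES = {
    "DrawFluid": {"reads": ["ParticleStateNext"], "writes": ["SceneColor"]},
    "ApplyForces": {"reads": ["PredictedForces", "ParticleState"],
                    "writes": ["ParticleStateNext"]},
    "NeuralInference": {"reads": ["ParticleState"], "writes": ["PredictedForces"]},
    "SimulateParticles": {"reads": [], "writes": ["ParticleState"]},
}

def schedule(passes):
    """Order passes so that every buffer is written before it is read."""
    # Map each buffer to the pass that produces it.
    producer = {}
    for name, io in passes.items():
        for buf in io["writes"]:
            producer[buf] = name
    # A pass depends on the producers of all buffers it reads.
    sorter = TopologicalSorter()
    for name, io in passes.items():
        deps = {producer[buf] for buf in io["reads"] if buf in producer}
        sorter.add(name, *deps)
    return list(sorter.static_order())

order = schedule(PASSES)
print(order)
```

The inference pass lands between the simulation and force-application passes purely because of its declared inputs and outputs, which is the property that lets a graph-based renderer place neural inference next to ordinary compute work.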

Unity Sentis: High-Level, Burst-Compiled, and C#-Centric

Unity Sentis takes a different approach. It is a self-contained, cross-platform inference engine written in C# and optimized with Unity’s Burst compiler and C# Job System.

Execution Model: Sentis parses the ONNX computational graph and compiles it into a series of optimized Unity Compute Shaders (HLSL) or Burst-compiled C# jobs for CPU execution.
Backend Portability: Because Sentis compiles down to Unity’s own intermediate representation (IR) and compute shaders, it runs on platforms Unity supports without requiring external third-party dynamic libraries (DLLs) or platform-specific SDKs.
Memory Management: Sentis uses a managed wrapper around unmanaged memory. Tensors are represented by the Tensor class. While Sentis employs tensor pooling to avoid GC (Garbage Collector) spikes, developers must manage tensor lifecycles (via Dispose() or using blocks) to prevent memory leaks on the native side.
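
The pooling and lifecycle point in the last bullet can be sketched in a few lines. This is an illustrative Python model, not the Sentis implementation: TensorPool and its methods are hypothetical names, and the context manager stands in for a C# using block.

```python
from contextlib import contextmanager

class TensorPool:
    """Toy pool that reuses fixed-size float buffers instead of reallocating."""

    def __init__(self):
        self._free = {}        # element count -> list of idle buffers
        self.allocations = 0   # how many buffers were actually created
        self.in_use = 0        # buffers currently rented out

    def rent(self, count):
        idle = self._free.setdefault(count, [])
        if idle:
            buf = idle.pop()           # reuse an idle buffer
        else:
            buf = [0.0] * count        # allocate only when the pool is empty
            self.allocations += 1
        self.in_use += 1
        return buf

    def release(self, buf):
        self._free.setdefault(len(buf), []).append(buf)
        self.in_use -= 1

    @contextmanager
    def scoped(self, count):
        """Rough analogue of a using block: release happens even on error."""
        buf = self.rent(count)
        try:
            yield buf
        finally:
            self.release(buf)

pool = TensorPool()
for _frame in range(100):
    with pool.scoped(1 * 4 * 64) as tensor:
        tensor[0] = 1.0  # stand-in for filling the input tensor

print(pool.allocations, pool.in_use)
```

A hundred frames trigger a single allocation because every buffer is returned before the next frame rents one. Calling rent() without a matching release() leaves in_use above zero, which is the toy equivalent of a native-side leak.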

The Physics Approximation Problem Space

To establish a realistic comparison, we must define the workload. Physics approximation demands low-latency, high-frequency inference on structured spatial data.
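
To make "low-latency" concrete, it helps to express an inference cost as a share of the frame budget. The short Python sketch below does only that arithmetic; the 1.5 ms figure is a hypothetical example, not a measured result.

```python
def frame_budget_ms(target_fps):
    """Total time available per frame at a given frame rate."""
    return 1000.0 / target_fps

def budget_share(inference_ms, target_fps):
    """Fraction of the frame budget consumed by one inference pass."""
    return inference_ms / frame_budget_ms(target_fps)

for fps in (30, 60, 120):
    print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")

# Hypothetical example: a 1.5 ms inference pass at 120 fps.
print(f"share of budget: {budget_share(1.5, 120):.0%}")
```

The same pass that looks cheap at 30 fps takes a much larger share at 120 fps, which is why the latency differences discussed later matter most for high-frame-rate targets.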

We designed two distinct test scenarios representing common game-development bottlenecks:

Scenario A: Soft-Body Deforming Mesh (MLP). A Multi-Layer Perceptron (MLP) with dense layers. This model predicts the vertex displacement of a soft-body character mesh upon collision.
Scenario B: Fluid Boundary Prediction (Lightweight GNN/CNN). A 1D Convolutional Neural Network (CNN) representing a simplified Graph Neural Network (GNN) approximation. It processes local particle densities to predict boundary pressure forces.
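
For readers who want to see the shape of the two workloads, here is a minimal pure-Python sketch of a dense layer and a 1D convolution. The layer sizes and random weights are assumptions chosen for illustration; the article does not specify the actual network dimensions, and the shipped models run as ONNX graphs rather than Python.

```python
import random

def dense(x, weights, bias, relu=True):
    """y = W x + b for one sample; weights is a list of rows (out x in)."""
    y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in y] if relu else y

def conv1d(x, kernels, bias):
    """Valid 1D convolution in the cross-correlation form used by ML frameworks.
    x: [c_in][length], kernels: [c_out][c_in][k]."""
    c_in, length, k = len(x), len(x[0]), len(kernels[0][0])
    out = []
    for kern, b in zip(kernels, bias):
        row = []
        for t in range(length - k + 1):
            acc = b
            for c in range(c_in):
                for j in range(k):
                    acc += kern[c][j] * x[c][t + j]
            row.append(acc)
        out.append(row)
    return out

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# Scenario A stand-in: collision features -> xyz displacement for one vertex.
features = [rng.uniform(-1, 1) for _ in range(16)]
hidden = dense(features, random_matrix(32, 16, rng), [0.0] * 32)
displacement = dense(hidden, random_matrix(3, 32, rng), [0.0] * 3, relu=False)

# Scenario B stand-in: 4 density channels over 64 particles -> 1 pressure channel.
densities = [[rng.uniform(0, 1) for _ in range(64)] for _ in range(4)]
kernels = [random_matrix(4, 3, rng)]  # c_out=1, c_in=4, k=3
pressure = conv1d(densities, kernels, [0.0])

print(len(displacement), len(pressure), len(pressure[0]))
```

The dense path is dominated by multiply-accumulate work on small vectors, while the convolution path streams a longer spatial tensor through a small kernel, which is the distinction the two scenarios are meant to capture.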

Methodology & Hardware Stack

All evaluations were conducted in production-equivalent builds (packaged, non-editor builds) using recent engine releases: Unreal Engine (using NNE with DirectML/CoreML backends) and Unity (using Sentis with GPU Compute/Burst backends).

Tests were executed on three hardware configurations to capture the spectrum of target platforms:

Desktop High-End: AMD Ryzen 9 7950X3D, 64GB DDR5, NVIDIA GeForce RTX 4090.
Desktop Mid-Range: Intel Core i7-13700K, 32GB DDR5, AMD Radeon RX 7700 XT.
Apple Silicon: Apple M3 Max (unified memory).

We evaluated relative Inference Latency, CPU-GPU Roundtrip Overhead, and Memory Footprint over continuous frames of physical simulation.
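
A per-frame latency harness of the kind described above can be sketched as follows. This is a generic Python illustration rather than the harness used for this article: fake_inference is a stand-in workload, and wall-clock timing on the CPU only captures GPU work if the measured call waits for the GPU to finish.

```python
import statistics
import time

def summarize(samples_ms):
    """Reduce per-frame samples to the statistics that matter for frame pacing."""
    ordered = sorted(samples_ms)
    p99_index = min(len(ordered) - 1, (99 * len(ordered)) // 100)
    return {
        "mean_ms": statistics.fmean(ordered),
        "median_ms": statistics.median(ordered),
        "p99_ms": ordered[p99_index],
        "max_ms": ordered[-1],
    }

def measure(step, frames=200, warmup=20):
    """Time step() once per simulated frame; return latency stats in ms."""
    for _ in range(warmup):          # warm caches before recording
        step()
    samples = []
    for _ in range(frames):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples)

def fake_inference():
    # Stand-in workload; a real harness would call the engine's inference here.
    sum(i * i for i in range(10_000))

stats = measure(fake_inference)
print(stats)
```

Reporting the 99th percentile and maximum alongside the mean matters for physics approximation, because a single slow frame is visible as a hitch even when the average looks healthy.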

Evaluation Results: Scenario A (Soft-Body MLP)

The MLP model represents a compute-bound task with relatively small input/output tensors. Performance here is heavily dictated by raw GEMM (General Matrix Multiply) execution speed and driver-level scheduling.
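
The GEMM cost of an MLP is easy to estimate up front. The Python sketch below counts floating-point operations for a stack of dense layers; the layer widths and the 5,000-vertex batch are hypothetical values, since the article does not list the real ones.

```python
def dense_flops(batch, n_in, n_out):
    """Each multiply-add counts as 2 FLOPs; bias adds are ignored."""
    return 2 * batch * n_in * n_out

def mlp_flops(batch, layer_sizes):
    """Sum the GEMM cost of every consecutive pair of layer widths."""
    return sum(dense_flops(batch, a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical network: 16 features -> 64 -> 64 -> 3 displacements,
# evaluated for 5,000 vertices per frame.
flops = mlp_flops(5_000, [16, 64, 64, 3])
print(f"{flops / 1e6:.1f} MFLOPs per inference")
```

At this scale the arithmetic itself is small for a modern GPU, which is consistent with the observation that scheduling and dispatch overhead, not raw math, often separates the two runtimes.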

Desktop High-End (RTX 4090)

Unreal NNE (DirectML Backend): Demonstrated low average inference latency and minimal CPU-GPU roundtrip overhead via RDG asynchronous compute.
Unity Sentis (GPU Compute Backend): Showed slightly higher average inference latency and moderate CPU-GPU roundtrip overhead.

Desktop Mid-Range (RX 7700 XT)

Unreal NNE (DirectML Backend): Maintained a latency advantage with efficient pipeline integration.
Unity Sentis (GPU Compute Backend): Exhibited higher latency and increased dispatch overhead.

Apple M3 Max (Unified Memory)

Unreal NNE (CoreML Backend): Leveraged the unified memory architecture for low latency.
Unity Sentis (Metal Compute Backend): Showed stable performance but higher overhead compared to the native CoreML backend.

Evaluation Results: Scenario B (Fluid Boundary CNN)

The CNN/GNN approximation is highly memory-bandwidth bound. It requires processing larger spatial tensors, making memory layout optimization and cache utilization the deciding factors.
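
Whether a layer is bandwidth-bound can be checked with a simple roofline estimate: compare its arithmetic intensity (FLOPs per byte moved) with the hardware's ratio of peak compute to peak bandwidth. The Python sketch below uses a hypothetical layer shape and hypothetical hardware peaks, not measurements from the test machines.

```python
def conv1d_cost(batch, c_in, c_out, kernel, length, bytes_per_value=4):
    """FLOPs and minimum bytes moved for a valid 1D convolution."""
    out_len = length - kernel + 1
    flops = 2 * batch * c_out * c_in * kernel * out_len
    values = (batch * c_in * length          # input read once
              + c_out * c_in * kernel        # weights
              + batch * c_out * out_len)     # output written once
    return flops, values * bytes_per_value

def is_bandwidth_bound(flops, bytes_moved, peak_flops, peak_bytes_per_s):
    """Roofline test: below the ridge point, memory traffic is the limiter."""
    intensity = flops / bytes_moved
    ridge = peak_flops / peak_bytes_per_s
    return intensity < ridge

# Hypothetical layer and hypothetical hardware peaks (not measured values).
flops, moved = conv1d_cost(batch=1, c_in=4, c_out=8, kernel=3, length=65_536)
print(f"intensity: {flops / moved:.2f} FLOP/byte")
print(is_bandwidth_bound(flops, moved, peak_flops=20e12, peak_bytes_per_s=500e9))
```

With a small kernel and few channels, each value fetched from memory is used only a handful of times, so the layer sits well below the ridge point and tensor layout decisions dominate its runtime.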

Desktop High-End (RTX 4090)

Unreal NNE (DirectML Backend): Achieved faster inference times due to optimized tensor layout handling.
Unity Sentis (GPU Compute Backend): Showed higher latency under the memory-bandwidth-bound workload.

Desktop Mid-Range (RX 7700 XT)

Unreal NNE (DirectML Backend): Maintained lower average inference latency.
Unity Sentis (GPU Compute Backend): Experienced higher latency during dense tensor processing.

Apple M3 Max (Unified Memory)

Unreal NNE (CoreML Backend): Outperformed the generic compute shader implementation.
Unity Sentis (Metal Compute Backend): Demonstrated functional parity but with higher overall execution times.

Anatomy of the Latency Gap: Why Unreal NNE Outperforms Sentis

The evaluation indicates that Unreal Engine NNE consistently demonstrates lower latency across the tested hardware configurations, particularly in reducing the CPU-GPU roundtrip overhead. Let’s analyze the technical reasons for this discrepancy.

1. Native Driver Bindings vs. Compiled Compute Shaders

Unreal’s NNE leverages optimized, hardware-specific vendor libraries. On Windows, NNERuntimeORT directly calls into Microsoft’s DirectML, which has deep, driver-level optimizations for compatible GPU architectures. On Apple Silicon, NNE routes through CoreML, utilizing the Apple Neural Engine (ANE) where available.

Unity Sentis, conversely, compiles ONNX operators down to generic HLSL compute shaders. While this cross-platform approach is highly portable, it may lack the low-level, hardware-specific instruction scheduling that DirectML and CoreML exploit directly.

2. Command Buffer Dispatch and Main-Thread Stalls

In Unity Sentis, enqueuing an inference pass requires executing C# code that schedules compute dispatch calls. Even with the Burst Compiler, there is a measurable main-thread overhead associated with translating tensor shapes into compute thread groups. If asynchronous updates are not carefully managed, Sentis can block the main thread waiting for the GPU to return tensor results.

Unreal NNE integrates directly with the Render Dependency Graph. The NNE inference pass is scheduled as an RDG node. The engine’s renderer schedules the pass, optimizes the transitions between resource states, and executes it asynchronously on the compute queue, minimizing CPU stalls.
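
The cost of a main-thread stall can be illustrated with a deliberately simplified frame-time model. The Python sketch below is an assumption-laden toy, not a description of either engine: it treats CPU and GPU work as two lanes, ignores cross-frame pipelining, and uses hypothetical millisecond costs.

```python
def frame_time_blocking(cpu_ms, gpu_render_ms, gpu_infer_ms, readback_ms):
    """CPU stalls until this frame's inference result has been read back."""
    cpu_total = cpu_ms + gpu_infer_ms + readback_ms
    gpu_total = gpu_render_ms + gpu_infer_ms
    return max(cpu_total, gpu_total)

def frame_time_deferred(cpu_ms, gpu_render_ms, gpu_infer_ms):
    """CPU consumes the previous frame's result, so it never waits on the GPU."""
    gpu_total = gpu_render_ms + gpu_infer_ms
    return max(cpu_ms, gpu_total)

# Hypothetical per-frame costs in milliseconds for a CPU-bound frame.
costs = dict(cpu_ms=8.0, gpu_render_ms=6.0, gpu_infer_ms=1.5)
blocking = frame_time_blocking(readback_ms=0.8, **costs)
deferred = frame_time_deferred(**costs)
print(f"blocking: {blocking:.1f} ms, deferred: {deferred:.1f} ms")
```

In this model the deferred path trades one frame of result latency for a shorter frame, because the inference cost moves off the CPU's critical path. Whether that trade is acceptable depends on how sensitive the simulation is to stale predictions.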

3. Quantization and Tensor Layouts

Unreal NNE’s ONNX Runtime backend handles quantization and tensor layout optimization during the initialization phase. Unity Sentis requires developers to be more deliberate with model import settings and manual layout optimizations, which can impact memory bandwidth consumption if not configured correctly.
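
To show what quantization buys in memory terms, the following Python sketch applies symmetric per-tensor int8 quantization to a handful of weights. It is a generic illustration with hypothetical weight values; it is not necessarily the scheme that either runtime applies internally.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: value ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]

weights = [0.50, -0.25, 0.125, -1.0, 0.75]   # hypothetical float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q, scale)
print(f"worst-case error: {worst_error:.5f}")
print(f"bytes: {4 * len(weights)} (float32) -> {len(q)} (int8)")
```

Storing one byte per weight instead of four cuts weight traffic to a quarter, at the price of a rounding error bounded by half the scale. For a bandwidth-bound model that reduction can matter more than any change in arithmetic cost.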

API Ergonomics and Developer Workflow

Performance is not the only metric that matters; developer velocity is also critical.

Unity Sentis offers a user-friendly API. Importing an ONNX file into the Unity Editor automatically generates a model asset. Writing an inference pipeline in C# is concise and readable:

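using UnityEngine;
using Unity.Sentis;

// Note: IWorker, WorkerFactory, and TensorFloat are Sentis 1.x API names; later releases may differ.
public class FluidApproximation : MonoBehaviour {
    public ModelAsset modelAsset;   // ONNX model imported as a Unity asset
    private IWorker m_Engine;

    void Start() {
        // Load the runtime model and create a GPU compute worker once.
        var model = ModelLoader.Load(modelAsset);
        m_Engine = WorkerFactory.CreateWorker(BackendType.GPUCompute, model);
    }

    void Update() {
        // The using declaration disposes the input tensor when Update() returns.
        using var inputTensor = new TensorFloat(new TensorShape(1, 4, 64), GetInputData());
        m_Engine.Execute(inputTensor);
        // PeekOutput() returns a tensor owned by the worker, so it is not disposed here.
        TensorFloat outputTensor = m_Engine.PeekOutput() as TensorFloat;
        // Apply forces to physics particles...
    }

    // Hypothetical helper (not part of the original listing): must return 1 * 4 * 64 = 256 values.
    float[] GetInputData() {
        return new float[1 * 4 * 64];
    }

    void OnDestroy() {
        // Release the worker's native resources.
        m_Engine?.Dispose();
    }
}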

Unreal Engine NNE, by contrast, requires more boilerplate code. Developers must work with C++ templates, handle asynchronous RDG builder patterns, and manage backend selection. It is powerful, but requires a deeper understanding of the engine's rendering pipeline.

Architectural Recommendations

For technical architects designing real-time physics approximation pipelines, the choice of engine and runtime framework should be dictated by your performance budget and target platforms:

Choose Unreal Engine NNE if: You are targeting high-end consoles and PC, your physics simulation requires tight integration with rendering pipelines (e.g., Niagara fluids or complex cloth), and you have the C++ expertise to write custom RDG passes. Low latency profiles are critical for high-frame-rate targets.
Choose Unity Sentis if: You are targeting a wide range of heterogeneous platforms (including WebGL, Nintendo Switch, and mobile), your physics approximation can tolerate a slightly higher latency window, and you require rapid prototyping capabilities where developers can easily swap out ONNX models in the editor.

Looking ahead, as dedicated Neural Processing Units (NPUs) become standard in mainstream PC silicon and mobile SoCs, both Epic and Unity are working to expose native NPU backends. Whichever engine you choose, the integration of machine learning models into real-time simulation pipelines represents a significant shift in how physics can be approximated in modern game engines.
