nn-deploy
Model DSL
model MNIST_MLP {
  input x: Tensor<float32>[1, 784]

  // Hidden layer
  h1 = MatMul(x, w1)
  h1b = Add(h1, b1)
  a1 = ReLU(h1b)

  // Output layer
  h2 = MatMul(a1, w2)
  h2b = Add(h2, b2)
  probs = Softmax(h2b)

  output probs
}
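The DSL above leaves the weight shapes implicit. A minimal NumPy sketch of the same forward pass, assuming a hidden width of 128 and 10 output classes (both assumptions; the DSL does not declare them here):

```python
import numpy as np

def mnist_mlp(x, w1, b1, w2, b2):
    """Forward pass matching the MNIST_MLP graph: MatMul -> Add -> ReLU -> MatMul -> Add -> Softmax."""
    h1 = x @ w1                   # MatMul
    h1b = h1 + b1                 # Add
    a1 = np.maximum(h1b, 0.0)     # ReLU
    h2 = a1 @ w2                  # MatMul
    h2b = h2 + b2                 # Add
    # Numerically stable Softmax: subtract the row max before exponentiating
    e = np.exp(h2b - h2b.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 784)).astype(np.float32)
w1 = rng.standard_normal((784, 128)).astype(np.float32)
b1 = np.zeros(128, dtype=np.float32)
w2 = rng.standard_normal((128, 10)).astype(np.float32)
b2 = np.zeros(10, dtype=np.float32)
probs = mnist_mlp(x, w1, b1, w2, b2)
```

The output is a `[1, 10]` probability vector that sums to 1.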
Compile & Optimize (Cmd+Enter)
Graph IR
Optimized
Schedule
Generated Code
Target Backend
JavaScript
WebGPU (WGSL)
WASM
Optimization Passes
Shape Inference
Propagate tensor shapes through the graph using operation semantics
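A simplified sketch of this pass, assuming the graph is a list of `(name, op, inputs)` tuples (an assumed representation, not the playground's actual IR): shapes are propagated in topological order using one rule per op.

```python
def infer_shapes(graph, shapes):
    """Propagate tensor shapes node by node using per-op rules (simplified sketch)."""
    for name, op, ins in graph:
        s = [shapes[i] for i in ins]
        if op == "MatMul":
            # [m, k] x [k, n] -> [m, n]; inner dimensions must agree
            assert s[0][1] == s[1][0], "MatMul inner-dimension mismatch"
            shapes[name] = (s[0][0], s[1][1])
        elif op in ("Add", "ReLU", "Softmax"):
            # Elementwise / shape-preserving ops: output shape equals input shape
            shapes[name] = s[0]
        else:
            raise NotImplementedError(op)
    return shapes

shapes = infer_shapes(
    [("h1", "MatMul", ["x", "w1"]), ("a1", "ReLU", ["h1"])],
    {"x": (1, 784), "w1": (784, 128)},
)
```

A real pass would also handle broadcasting for `Add` and rank-generic ops; this shows only the core propagation loop.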
Constant Folding
Evaluate operations with all-constant inputs at compile time
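A minimal sketch of constant folding over the same assumed `(name, op, inputs)` tuple representation: any node whose inputs are all known constants is evaluated now and removed from the runtime graph.

```python
def fold_constants(graph, consts):
    """Evaluate nodes whose inputs are all compile-time constants (sketch)."""
    remaining = []
    for name, op, ins in graph:
        if all(i in consts for i in ins):
            vals = [consts[i] for i in ins]
            if op == "Add":
                consts[name] = vals[0] + vals[1]
                continue  # folded: node no longer needed at runtime
            if op == "Mul":
                consts[name] = vals[0] * vals[1]
                continue
            # Op has no folding rule: keep it in the graph
        remaining.append((name, op, ins))
    return remaining, consts

graph = [("s", "Add", ["c2", "c3"]), ("y", "Mul", ["x", "s"])]
remaining, consts = fold_constants(graph, {"c2": 2.0, "c3": 3.0})
```

Here `s = 2 + 3` folds to a constant at compile time, while `y` survives because `x` is a runtime input.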
Dead Code Elimination
Remove operations not reachable from any output
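One way to sketch this pass: walk backwards from the outputs, mark every reachable node as live, and drop the rest (again over the assumed tuple representation).

```python
def eliminate_dead_code(graph, outputs):
    """Keep only nodes reachable backwards from the graph outputs."""
    producers = {name: ins for name, _, ins in graph}
    live, stack = set(), list(outputs)
    while stack:
        n = stack.pop()
        if n in live or n not in producers:
            continue  # already visited, or a graph input/weight
        live.add(n)
        stack.extend(producers[n])  # visit this node's inputs next
    return [node for node in graph if node[0] in live]

graph = [
    ("h1", "MatMul", ["x", "w1"]),
    ("a1", "ReLU", ["h1"]),
    ("unused", "Add", ["x", "b0"]),   # feeds no output: dead
    ("probs", "Softmax", ["a1"]),
]
kept = eliminate_dead_code(graph, ["probs"])
```

The `unused` node is removed because no output depends on it; the surviving nodes keep their original order.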
Operator Fusion
Fuse compatible operation sequences into single optimized kernels
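A greedy peephole sketch of one common fusion, MatMul → Add → ReLU collapsing into a single fused node. It assumes each intermediate has a single consumer; a real pass would check use counts before fusing.

```python
def fuse_matmul_add_relu(graph):
    """Greedily fuse adjacent MatMul -> Add -> ReLU chains into one node (sketch)."""
    fused, i = [], 0
    while i < len(graph):
        if (i + 2 < len(graph)
                and graph[i][1] == "MatMul"
                and graph[i + 1][1] == "Add" and graph[i + 1][2][0] == graph[i][0]
                and graph[i + 2][1] == "ReLU" and graph[i + 2][2][0] == graph[i + 1][0]):
            name = graph[i + 2][0]                    # fused node takes the ReLU's name
            ins = graph[i][2] + graph[i + 1][2][1:]   # [x, w] + [b]
            fused.append((name, "FusedMatMulAddReLU", ins))
            i += 3
        else:
            fused.append(graph[i])
            i += 1
    return fused

graph = [
    ("h1", "MatMul", ["x", "w1"]),
    ("h1b", "Add", ["h1", "b1"]),
    ("a1", "ReLU", ["h1b"]),
    ("probs", "Softmax", ["a1"]),
]
fused = fuse_matmul_add_relu(graph)
```

Fusion pays off because the intermediate tensors `h1` and `h1b` never need to be written to memory; the backend emits one kernel instead of three.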
Quantization
Convert float32 weights and activations to int8 for faster inference
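A sketch of the simplest scheme, symmetric per-tensor quantization: pick a scale so the largest weight magnitude maps to 127, then round. (Real deployments often use per-channel scales and calibrated activation ranges; this shows only the weight side.)

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, scale = quantize_int8(w)
```

The round-trip error `|scale * q - w|` is bounded by half a quantization step, `scale / 2`.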
Layout Optimization
Convert tensor layouts from NCHW to NHWC for optimal GPU execution
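The layout change itself is just an axis permutation; the interesting part of the pass is rewriting weights once at compile time and cancelling back-to-back transposes so no permutation runs at inference. The permutation can be sketched as:

```python
import numpy as np

def nchw_to_nhwc(t):
    """Permute a 4-D tensor from NCHW (batch, channel, height, width) to NHWC."""
    assert t.ndim == 4
    # Element (n, c, h, w) moves to position (n, h, w, c)
    return np.ascontiguousarray(np.transpose(t, (0, 2, 3, 1)))

t = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
nhwc = nchw_to_nhwc(t)
```

In NHWC, the channel values for a given pixel are contiguous in memory, which tends to give coalesced reads in GPU convolution kernels.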
Memory Planning
Analyze tensor lifetimes and assign memory offsets to minimize peak usage
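A first-fit sketch of this idea, assuming each tensor is described by `(size, start, end)`, where `start`/`end` are the indices of the first and last node that touch it: two tensors may share the same offset range only if their live intervals are disjoint.

```python
def plan_memory(tensors):
    """Greedy first-fit offset assignment from (size, start, end) lifetimes (sketch)."""
    placed, offsets, peak = [], {}, 0
    # Place tensors in order of when they first become live
    for name, (size, start, end) in sorted(tensors.items(), key=lambda kv: kv[1][1]):
        offset = 0
        for o, s, ls, le in sorted(placed):  # scan existing blocks by offset
            lives_overlap = not (end < ls or le < start)
            mem_overlap = offset < o + s and o < offset + size
            if lives_overlap and mem_overlap:
                offset = o + s  # bump past the conflicting block
        offsets[name] = offset
        placed.append((offset, size, start, end))
        peak = max(peak, offset + size)
    return offsets, peak

# "a" dies before "c" is born, so "c" can reuse "a"'s slot
offsets, peak = plan_memory({"a": (4, 0, 1), "b": (4, 1, 2), "c": (4, 2, 3)})
```

Three 4-byte tensors fit in an 8-byte arena because `c` reuses `a`'s offset; without lifetime analysis the peak would be 12. Production planners use fancier heuristics (best-fit, size-ordered placement), but the interference rule is the same.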