GitHub - mdaiter/ane: Reverse engineered the Apple Neural Engine, with working Python and Objective C samples

Apple Neural Engine (ANE) Reverse Engineering

Reverse engineering artifacts for Apple's Neural Engine stack: ANECompiler, Espresso, and AppleNeuralEngine frameworks.

Target Audience: Performance engineers and security researchers working with Apple silicon ML acceleration.

Key Findings
Architecture Overview
Repository Structure
Espresso Engine Teardown
Compiler Engine Teardown
ANE Runtime Details
Security Analysis
Performance Analysis
Example Runthrough
Comprehensive Reference

Key Findings

Discovery	Details	Significance
SDPA Layer	`ANECSDPALayerDesc` is only 8 bytes	Suggests attention is a first-class, likely native, ANE operation
40+ Optimization Passes	`Pass_fuse_conv_batchnorm`, `Pass_fold_constants`, etc.	Full Espresso compiler pipeline discoverable
XPC Daemon Architecture	`aned` at `/usr/libexec/aned`	Privilege boundary for ANE access
No Entitlement Needed for Probing	Struct init functions work without signing	Can probe all layer descriptor layouts
PBZE Format	LZFSE-compressed espresso.net	System models decodable with libcompression
Silent Failures	`compileModel:` returns NO without setting an error	Operations fail silently without entitlements
IOSurface Memory	`EspressoANEIOSurface` (21 methods)	Zero-copy tensor sharing with Metal
Quantization Modes	`quantization_mode:2` on inner_product	ANE-specific quantization discovered
CoreML ANE Path	`MLComputeUnitsAll` enables ANE	Working path for ANE execution via public API
HWX Binary Format	Magic `0xBEEFFACE`, Mach-O-like	Pre-compiled ANE instructions per chip generation
16 ANE Cores	M3 Pro has 16 neural engine cores	Confirmed via `MLNeuralEngineComputeDevice`

Quick Reference: What Works Without Entitlements

Operation	Works?	Notes
Load ANECompiler.framework	Yes	All frameworks load
Call `ANEC*Initialize()`	Yes	Can probe struct sizes
Create `EspressoContext` (CPU)	Yes	Platform 0 works
Load `EspressoNetwork`	Yes	CPU inference works
Create `_ANEClient`	Yes	Object is created, but the compile/load calls below fail
Call `compileModel:`	No	Fails silently (returns NO, no NSError)
Call `loadModel:`	No	Fails silently (returns NO, no NSError)
ANE inference	No	Requires entitlements
CoreML with ANE	Yes	Set `MLComputeUnitsAll`; Core ML then decides which layers run on ANE
XPC to aned	Yes	Connection succeeds, ops need entitlements
MLComputePlan	Yes	Can inspect device availability

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                       User Application                           │
│                  (Core ML, Create ML, BNNS)                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────┐    ┌─────────────────────────────┐ │
│  │   Espresso.framework    │    │ AppleNeuralEngine.framework │ │
│  │   ─────────────────     │    │ ─────────────────────────── │ │
│  │  • EspressoContext      │    │  • _ANEClient               │ │
│  │  • EspressoNetwork      │    │  • _ANEModel                │ │
│  │  • 40+ Pass_* classes   │    │  • _ANERequest              │ │
│  │  • CPU/GPU/ANE dispatch │    │  • _ANEDaemonConnection     │ │
│  └───────────┬─────────────┘    └──────────────┬──────────────┘ │
│              │                                  │                │
├──────────────┴──────────────────────────────────┴────────────────┤
│                     ANECompiler.framework                        │
│                     ─────────────────────                        │
│  • ANECConvLayerDesc (176 bytes)    • ANECSDPALayerDesc (8 bytes)│
│  • ANECPoolLayerDesc (96 bytes)     • ANECLinearLayerDesc (64 B) │
│  • ANECTensorDims (40 bytes)        • 30+ layer descriptors      │
├─────────────────────────────────────────────────────────────────┤
│                    XPC Transport Layer                           │
│            Service: com.apple.appleneuralengine                  │
├─────────────────────────────────────────────────────────────────┤
│                    aned (/usr/libexec/aned)                      │
│                    ───────────────────────                       │
│  • ANEProgramCreate()          • Model cache management          │
│  • ANEProgramInstanceCreate()  • Garbage collection              │
│  • Sandbox extension handling  • Telemetry                       │
├─────────────────────────────────────────────────────────────────┤
│                    ANE Hardware (M1/M2/M3+)                      │
│                    ───────────────────────                       │
│  • 16 neural engine cores      • Dedicated SRAM                  │
│  • 11 TOPS (M1), 15.8 (M2)     • IOSurface DMA                   │
└─────────────────────────────────────────────────────────────────┘

Data Flow: Model Compilation

.mlmodelc/                      aned daemon                 Hardware
────────────                    ───────────                 ────────
     │                               │                          │
     │  1. _ANEModel create          │                          │
     ├──────────────────────────────►│                          │
     │                               │                          │
     │  2. compileModel: (XPC)       │                          │
     ├──────────────────────────────►│                          │
     │                               │  3. ANECompiler          │
     │                               ├─────────────────────────►│
     │                               │                          │
     │                               │  4. ANEProgramCreate()   │
     │                               ├─────────────────────────►│
     │                               │                          │
     │  5. Return program handle     │                          │
     │◄──────────────────────────────┤                          │
     │                               │                          │
     │  6. loadModel: (XPC)          │                          │
     ├──────────────────────────────►│                          │
     │                               │  7. Map to ANE memory    │
     │                               ├─────────────────────────►│
     │                               │                          │
     │  8. evaluateWithModel:        │  9. Execute on ANE       │
     ├──────────────────────────────►├─────────────────────────►│
     │                               │                          │

Repository Structure

ane/
├── __init__.py          # Package exports, builds API tree for tooling
├── compiler.py          # ANECompiler.framework ctypes bindings
│                        #   - Layer descriptor structs
│                        #   - ANEC*Initialize() wrappers
│                        #   - Struct size probing
│
├── espresso.py          # Espresso model format parser
│                        #   - EspressoNet, EspressoLayer classes
│                        #   - Layer type documentation
│                        #   - CPU vs ANE model comparison
│
├── runtime.py           # Espresso/ANE runtime bindings
│                        #   - EspressoContext creation
│                        #   - EspressoNetwork loading
│                        #   - ObjC class introspection
│
├── xpc.py               # ANE XPC protocol documentation
│                        #   - _ANEDaemonConnection methods
│                        #   - _ANEClient methods
│                        #   - XPC operation categories
│
├── pbze.py              # PBZE (compressed espresso.net) decoder
│                        #   - LZFSE decompression via libcompression
│                        #   - Header parsing
│                        #   - Compression statistics
│
├── sample.py            # Example graph building code
│                        #   - SimpleANEGraph class
│                        #   - CNN and Transformer examples
│
├── tests/
│   └── test_ane.py      # Comprehensive pytest suite (623 lines)
│
└── helper/
    ├── ane_helper.m     # Objective-C helper for privileged ANE access
    ├── ane_helper.entitlements
    └── build.sh         # Build script

Espresso Engine Teardown

Espresso is Apple's internal ML inference runtime that powers Core ML. It handles model execution across CPU, GPU, and ANE.

Model Format (`.espresso.net`)

Two formats exist:

JSON (human-readable):

{
  "format_version": 200,
  "storage": "model.espresso.weights",
  "layers": [
    {
      "name": "conv1",
      "type": "convolution",
      "bottom": "input",
      "top": "conv1_output",
      "kernel_size": 3,
      "stride": 1,
      "pad": 1,
      "C": 64
    }
  ],
  "analyses": {},
  "properties": {}
}

PBZE (binary, LZFSE-compressed):

Offset  Size  Description
──────  ────  ───────────
0x00    4     Magic: b'pbze'
0x04    4     Version (usually 0)
0x08    8     Unknown (header size?)
0x10    4     Uncompressed size (BIG ENDIAN!)
0x14    4     Unknown
0x18    4     Padding
0x1C    ...   LZFSE data (starts with b'bvx2')
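The header layout above can be exercised with a short standalone sketch. It is not the repo's `pbze.py`; it assumes the offsets in the table (payload at 0x1C, big-endian uncompressed size at 0x10), treats the version field as little-endian (an assumption), and calls `compression_decode_buffer` from libcompression with `COMPRESSION_LZFSE` (0x801), so only the decode step requires macOS.

```python
import ctypes
import struct

PBZE_MAGIC = b"pbze"
PAYLOAD_OFFSET = 0x1C        # LZFSE data starts here per the header table
COMPRESSION_LZFSE = 0x801    # value of COMPRESSION_LZFSE in <compression.h>


def parse_pbze_header(data: bytes) -> dict:
    """Split a PBZE blob into the known header fields and the LZFSE payload."""
    if len(data) < PAYLOAD_OFFSET or data[:4] != PBZE_MAGIC:
        raise ValueError("not a PBZE blob")
    (version,) = struct.unpack_from("<I", data, 0x04)            # byte order assumed
    (uncompressed_size,) = struct.unpack_from(">I", data, 0x10)  # big-endian per table
    payload = data[PAYLOAD_OFFSET:]
    if not payload.startswith(b"bvx2"):
        raise ValueError("payload does not start with an LZFSE block header")
    return {"version": version, "uncompressed_size": uncompressed_size,
            "payload": payload}


def lzfse_decode(payload: bytes, uncompressed_size: int) -> bytes:
    """Decompress an LZFSE stream with Apple's libcompression (macOS only)."""
    lib = ctypes.CDLL("/usr/lib/libcompression.dylib")
    decode = lib.compression_decode_buffer
    decode.restype = ctypes.c_size_t
    decode.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p,
                       ctypes.c_size_t, ctypes.c_void_p, ctypes.c_int]
    dst = ctypes.create_string_buffer(uncompressed_size)
    # NULL scratch buffer: libcompression allocates its own working memory.
    written = decode(dst, uncompressed_size, payload, len(payload),
                     None, COMPRESSION_LZFSE)
    if written == 0:
        raise RuntimeError("compression_decode_buffer returned 0")
    return dst.raw[:written]


def decode_pbze_file(path: str) -> bytes:
    with open(path, "rb") as fh:
        header = parse_pbze_header(fh.read())
    return lzfse_decode(header["payload"], header["uncompressed_size"])
```

The decoded bytes are presumably the JSON form shown earlier, so `json.loads(decode_pbze_file(path))` would be the natural next step.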

Layer Types

Compute Layers

Type	Description	Key Attributes
`inner_product`	Dense/fully-connected	`nB`, `nC`, `quantization_mode`, `is_lookup`, `has_biases`
`convolution`	2D convolution	`kernel_size`, `stride`, `pad`, `C`, `groups`
`batch_matmul`	Batched matrix multiply	`transpose_a`, `transpose_b`
`elementwise`	Binary/unary operations	`operation` (see operation codes below)
`activation`	Nonlinearities	`type` (relu, gelu, tanh, sigmoid, etc.)
`softmax`	Softmax normalization	`axis`
`reduce`	Reduction operations	`mode` (sum, mean, max, min, prod)

Memory/Shape Layers

Type	Description	Key Attributes
`reshape`	Tensor reshape	`shape`
`transpose`	Permute dimensions	`axes`
`concat`	Concatenate tensors	`axis`
`general_concat`	N-D concatenation	`axis`, flexible inputs
`split_nd`	Split along axis	`axis`, `num_splits` or `split_sizes`
`general_slice`	Slice tensor	`starts`, `ends`, `strides`
`expand_dims`	Add dimension	`axes`
`load_constant`	Load constant tensor	`blob_weights`

Quantization Layers

Type	Description	Notes
`dynamic_quantize`	Runtime quantization	Converts FP to INT8
`dynamic_dequantize`	Runtime dequantization	Converts INT8 to FP

Special Layers

Type	Description
`instancenorm_1d`	Instance normalization
`get_shape`	Returns tensor shape
`nonzero`	Find nonzero indices
`scatter_nd`	Scatter operation
`tile`	Tile/repeat tensor

Elementwise Operation Codes

Code   Operation          Code   Operation
────   ─────────          ────   ─────────
0      add                25     pow
1      sub                26     exp
2      mul                27     log
3      div                28     abs
4      floor_div          
                          101    select (ternary: a ? b : c)
10     max                105    less_than
11     min                106    less_equal
                          107    not_equal
20     sqrt               108    equal
21     rsqrt              109    greater_equal
22     square             110    greater_than
23     neg                
24     reciprocal         117    floor
                          118    ceil
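When scanning `elementwise` layers programmatically, the table above can be transcribed into a lookup. The codes are copied from the table; gaps (5 to 9, 12 to 19, and so on) are simply unknown, so the sketch reports them instead of guessing.

```python
# Transcribed from the operation-code table; unlisted codes are unknown.
ELEMENTWISE_OPS = {
    0: "add", 1: "sub", 2: "mul", 3: "div", 4: "floor_div",
    10: "max", 11: "min",
    20: "sqrt", 21: "rsqrt", 22: "square", 23: "neg", 24: "reciprocal",
    25: "pow", 26: "exp", 27: "log", 28: "abs",
    101: "select",  # ternary: a ? b : c
    105: "less_than", 106: "less_equal", 107: "not_equal", 108: "equal",
    109: "greater_equal", 110: "greater_than",
    117: "floor", 118: "ceil",
}
OPS_BY_NAME = {name: code for code, name in ELEMENTWISE_OPS.items()}


def describe_elementwise(layer: dict) -> str:
    """Name the operation of an `elementwise` layer dict from an espresso.net."""
    code = int(layer["operation"])
    return ELEMENTWISE_OPS.get(code, f"unknown({code})")
```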

CPU vs ANE Model Differences

When a model is compiled for ANE, several transformations occur:

Aspect	CPU Model	ANE Model
Layer count	Fewer	More (ops decomposed)
Reshape ops	`reshape` layer	Often replaced with `convolution`
Embeddings	`inner_product`	`inner_product` with `is_lookup:1`
FC layers	`inner_product`	`inner_product` with `quantization_mode:2`
Tensor manipulation	Single ops	`split_nd`/`concat` chains

Example: A model with 50 CPU layers might have 80+ ANE layers due to operation decomposition.
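To see these transformations in a real pair of models, one approach is to decode both `.espresso.net` files and diff their layer-type histograms. A minimal sketch, assuming each net is already a dict with a `layers` list as in the JSON format above (for example from `json.load` or the repo's `decode_espresso_net`):

```python
from collections import Counter


def layer_type_counts(net: dict) -> Counter:
    """Histogram of layer `type` values in a decoded espresso.net."""
    return Counter(layer.get("type", "?") for layer in net.get("layers", []))


def compare_variants(cpu_net: dict, ane_net: dict) -> dict:
    """Map each layer type whose count differs to (cpu_count, ane_count)."""
    cpu, ane = layer_type_counts(cpu_net), layer_type_counts(ane_net)
    return {t: (cpu[t], ane[t])
            for t in sorted(cpu.keys() | ane.keys()) if cpu[t] != ane[t]}


def count_flag(net: dict, layer_type: str, attr: str, value) -> int:
    """Count layers of one type carrying a given attribute value,
    e.g. inner_product layers with quantization_mode == 2."""
    return sum(1 for layer in net.get("layers", [])
               if layer.get("type") == layer_type and layer.get(attr) == value)
```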

Optimization Passes (40+ discovered)

Espresso includes extensive optimization passes accessible via EspressoCustomPass subclasses:

Pass_fuse_conv_batchnorm          # Fuse BN into conv weights
Pass_fold_constants               # Constant folding
Pass_dead_code_elimination        # DCE
Pass_fuse_activation              # Fuse relu/gelu into preceding op
Pass_optimize_transpose           # Eliminate redundant transposes
Pass_convert_to_ane_layout        # Convert to ANE memory layout
Pass_quantize_weights             # Weight quantization
Pass_split_large_tensors          # Split tensors for ANE tile size
... (and 30+ more)

Compiler Engine Teardown

ANECompiler.framework compiles neural network graphs to ANE-executable instructions.

Layer Descriptor Sizes (Runtime Probed)

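All sizes were determined by calling ANEC*Initialize() on a sentinel-filled buffer and measuring how far the sentinel was overwritten. This yields a lower bound: trailing fields that an initializer never writes would not be counted.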

Struct	Size	Field Layout (inferred)
`ANECKernelSize`	24	3x u64: depth, height, width
`ANECStep`	12	3x u32: depth, height, width
`ANECPadding`	24	6x u32: d_front, d_back, h_front, h_back, w_front, w_back
`ANECTensorDims`	40	5x u64: N, C, H, W, D
`ANECTensorDesc`	64	ptr(8) + dims(48) + flags(8); the dims region is 8 bytes larger than `ANECTensorDims`, so its exact split is unclear
`ANECConvLayerDesc`	176	Kernel, stride, padding, dilation, groups, etc.
`ANECPoolLayerDesc`	96	Kernel, stride, pool type, etc.
`ANECLinearLayerDesc`	64	Input features, output features, bias
`ANECMatrixMultLayerDesc`	16	transpose_a, transpose_b flags
`ANECSoftmaxLayerDesc`	48	Axis, stable flag
`ANECSDPALayerDesc`	8	Minimal; suggests attention runs as a single native op
`ANECNeuronLayerDesc`	32	Activation type, params
`ANECReductionLayerDesc`	24	Reduction mode, axes
`ANECReshapeLayerDesc`	48	Target shape
`ANECTransposeLayerDesc`	32	Permutation
`ANECConcatLayerDesc`	16	Axis
`ANECGatherLayerDesc`	24	Axis, batch_dims
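The sentinel technique behind this table can be sketched with plain `ctypes`. The probe fills a buffer with a known byte, lets the initializer write into it, and reports one past the highest offset that changed; running it with two different sentinels guards against an initializer that happens to write the sentinel value itself. It assumes each `ANEC*Initialize()` takes a single pointer to the struct, and the result is only a lower bound, since trailing bytes the initializer never touches are invisible to it.

```python
import ctypes
from typing import Callable, Iterable


def probe_struct_size(init: Callable[[ctypes.Array], object],
                      max_size: int = 1024,
                      sentinels: Iterable[int] = (0xAB, 0x5C)) -> int:
    """Return 1 + the highest byte offset `init` modifies (0 if none)."""
    size = 0
    for s in sentinels:
        buf = (ctypes.c_uint8 * max_size)(*([s] * max_size))
        init(buf)  # the initializer writes its defaults into the buffer
        touched = [i for i in range(max_size) if buf[i] != s]
        if touched:
            size = max(size, touched[-1] + 1)
    return size
```

On macOS the initializer could be passed in directly, e.g. `fn = ctypes.CDLL(path).ANECKernelSizeInitialize; fn.argtypes = [ctypes.c_void_p]; probe_struct_size(fn)`, with `path` pointing at the ANECompiler binary. The buffer must be larger than any plausible struct, because an initializer that writes past it corrupts memory.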

Layer Categories (All 40+ Discovered)

Category              Layer Types
────────              ───────────
Attention/Transformer SDPA
Convolution           Conv, CrossCorrelation, DepthwiseConv
Pooling               Pool, GlobalPool, AdaptivePool
Normalization         Norm, BatchNorm, LayerNorm, GroupNorm, LRN
Linear/Matrix         Linear, MatrixMult, Einsum
Activation            Neuron, Softmax, LogSoftmax, Dropout
Reshape/Layout        Reshape, Transpose, Flatten, Unflatten, 
                      Concat, Split, Tile, Expand, Squeeze
Spatial               Resize, Pad, CropResize, Resample, 
                      AffineTransform, GridSample
Reduction             Reduction, TopK, Sort, ArgMax, ArgMin
Scatter/Gather        Gather, GatherND, Scatter, ScatterND
Misc                  Shape, Range, Random, Fill, 
                      RingBuffer, InputView, Copy

Version APIs

from ane import ANECompiler

compiler = ANECompiler()  # named to avoid shadowing the `ane` package
print(f"MPS Dialect Version: {compiler.mps_dialect_version}")
print(f"MPS SPI Dialect Version: {compiler.mps_spi_dialect_version}")
print(f"Validate Network Version: {compiler.validate_network_version}")
print(f"Analytics Buffer Size: {compiler.analytics_buffer_size}")

ANE Runtime Details

XPC Protocol

Communication with ANE hardware goes through the aned daemon via XPC.

Services

Service	Purpose
`com.apple.appleneuralengine`	Main service (requires entitlements)
`com.apple.appleneuralengine.private`	Private/internal service
`com.apple.aned`	Daemon Mach service

XPC Operations

Compilation:

-[_ANEDaemonConnection compileModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection compiledModelExistsFor:withReply:]
-[_ANEDaemonConnection compiledModelExistsMatchingHash:withReply:]
-[_ANEDaemonConnection purgeCompiledModel:withReply:]

Loading:

-[_ANEDaemonConnection loadModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection loadModelNewInstance:options:modelInstParams:qos:withReply:]
-[_ANEDaemonConnection unloadModel:options:qos:withReply:]

Execution:

-[_ANEDaemonConnection prepareChainingWithModel:options:chainingReq:qos:withReply:]

Real-time:

-[_ANEDaemonConnection beginRealTimeTaskWithReply:]
-[_ANEDaemonConnection endRealTimeTaskWithReply:]

Memory Management

ANE uses IOSurface for tensor memory, enabling zero-copy sharing with GPU/Metal.

EspressoANEIOSurface Methods:

-createIOSurfaceWithExtraProperties:
-metalBufferWithDevice:
-setExternalStorage:ioSurface:
-nFrames
-bytesPerFrame
-totalBytes
// ... 21 methods total

Entitlements

Entitlement	Purpose	Required For
`com.apple.aned.private.allow`	Primary ANE access	compile, load, evaluate
`com.apple.aned.private.adapterWeight.allow`	Adapter weights access	Custom weight loading
`com.apple.aned.private.aggressivePowerSaving.allow`	Power saving modes	Low-power inference
`com.apple.ANECompilerService.allow`	Compiler service access	Model compilation
`com.apple.aned.private.processModelShare.allow`	Cross-process model sharing	Shared inference
`com.apple.ane.memoryUnwiringOptOutAccess.allow`	Memory unwiring control	Large model persistence
`com.apple.private.modelPurgeInAllPartitions.allow`	Model cache purging	Cache management
`com.apple.aned.private.secondaryANECompilerServiceAccess.allow`	Secondary compiler	Parallel compilation
`com.apple.private.ANEStorageMaintainer.allow`	Storage maintenance	Cache cleanup
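The helper's `.entitlements` file is an ordinary property list, so it can be generated with `plistlib`. A sketch, assuming the helper needs only `com.apple.aned.private.allow` and `com.apple.ANECompilerService.allow`; the repo's own `ane_helper.entitlements` may list a different set. Whether macOS honors these private `com.apple.*` keys on a Developer ID signature is not established here, and the system may refuse to launch a binary that claims them.

```python
import plistlib

# Assumed minimal subset of the keys in the table above.
ANE_ENTITLEMENTS = (
    "com.apple.aned.private.allow",
    "com.apple.ANECompilerService.allow",
)


def entitlements_plist(keys=ANE_ENTITLEMENTS) -> bytes:
    """Serialize {key: True} for each entitlement as an XML plist."""
    return plistlib.dumps({key: True for key in keys}, fmt=plistlib.FMT_XML)


if __name__ == "__main__":
    with open("ane_helper.entitlements", "wb") as fh:
        fh.write(entitlements_plist())
```

The resulting file would then be supplied at signing time, e.g. `codesign --force --sign "<identity>" --entitlements ane_helper.entitlements ane_helper`.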

Boot Arguments (Internal/Debug Builds Only)

On Apple internal builds, these boot-args relax entitlement or validation checks, or enable debug output:

Boot Arg	Purpose	Effect
`ane_skipAdapterWeightAccessCheck`	Bypass adapter weight entitlement	Skip `com.apple.aned.private.adapterWeight.allow` check
`ane_vm_allowPrecompiledBinary`	Allow precompiled binaries	Skip binary validation in VM
`ane_vm_debugDumpBootArg`	Enable debug dumps	Dump ANE state on errors
`ane_vm_forceValidationOnGuest`	Force validation in VM	Extra validation for VMs

Note: These boot-args only work when isInternalBuild returns true (Apple internal builds only). Consumer macOS always returns false for isInternalBuild.

Internal Build Detection

The aned daemon checks for internal builds via _ANEDeviceInfo.isInternalBuild, which:

Checks for /AppleInternal directory existence
Queries os_variant_has_internal_content("com.apple.aned")
Checks os_variant_allows_internal_security_policies("com.apple.aned")

All checks return false on consumer macOS installations.
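The three checks can be reproduced from Python to confirm what a given machine reports. This sketch assumes the `os_variant_*` functions are exported through libSystem with the signature `bool fn(const char *subsystem)`; they are private SPI, so the lookup is allowed to fail. How aned combines the three results (any or all) is not known, so each is reported separately.

```python
import ctypes
import os
import platform


def internal_build_signals(subsystem: bytes = b"com.apple.aned") -> dict:
    """Report each internal-build signal; None means it could not be queried."""
    signals = {
        "AppleInternal_dir": os.path.isdir("/AppleInternal"),
        "has_internal_content": None,
        "allows_internal_security_policies": None,
    }
    if platform.system() != "Darwin":
        return signals
    libsystem = ctypes.CDLL("/usr/lib/libSystem.B.dylib")
    for key, symbol in (
        ("has_internal_content", "os_variant_has_internal_content"),
        ("allows_internal_security_policies",
         "os_variant_allows_internal_security_policies"),
    ):
        fn = getattr(libsystem, symbol, None)  # private SPI; may be absent
        if fn is None:
            continue
        fn.argtypes = [ctypes.c_char_p]
        fn.restype = ctypes.c_bool
        signals[key] = bool(fn(subsystem))
    return signals
```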

Model Cache

Compiled models are cached in:

/var/folders/<user_hash>/com.apple.aned/

Cache operations in aned:

com.apple.aned.modelCacheAsyncIO
com.apple.aned.modelCacheGC
com.apple.aned.danglingModelsGC

Runtime Class Reference

Key classes discovered through runtime introspection:

`_ANEDeviceInfo` (Class Methods)

+ (BOOL)hasANE;                    // Returns YES on Apple Silicon
+ (NSInteger)numANEs;              // Number of ANE devices (usually 1)
+ (NSInteger)numANECores;          // Number of cores (e.g., 16 for M1)
+ (NSString *)productName;         // "macOS"
+ (NSString *)buildVersion;        // e.g., "25B78"
+ (NSInteger)aneArchitectureType;  // Hardware architecture identifier
+ (NSInteger)aneSubType;           // Hardware subtype
+ (BOOL)isVirtualMachine;          // VM detection
+ (BOOL)isInternalBuild;           // Apple internal build detection
+ (BOOL)precompiledModelChecksDisabled;
+ (NSString *)bootArgs;            // Current boot arguments
+ (BOOL)isBootArgPresent:(NSString *)arg;
+ (BOOL)isBoolBootArgSetTrue:(NSString *)arg;

`_ANEStrings` (Class Methods - Returns Constant Strings)

+ (NSString *)restrictedAccessEntitlement;      // "com.apple.aned.private.allow"
+ (NSString *)adapterWeightsAccessEntitlement;  // "com.apple.aned.private.adapterWeight.allow"
+ (NSString *)adapterWeightsAccessEntitlementBypassBootArg;  // "ane_skipAdapterWeightAccessCheck"
+ (NSString *)internalLibraryPath;              // "/AppleInternal/Library"
+ (NSString *)systemLibraryPath;                // "/System/Library"
// ... and many more

Hardware Info Example Output

hasANE = 1
numANEs = 1
numANECores = 16
productName = macOS
buildVersion = 25B78
isVirtualMachine = 0
isInternalBuild = 0
precompiledModelChecksDisabled = 0
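Output like the above can be reproduced by messaging `_ANEDeviceInfo` through the Objective-C runtime. A sketch, assuming the framework path below and the return types implied by the declarations (`BOOL` as `c_bool`, `NSInteger` as `c_long`); selectors the class does not respond to on a given macOS version are skipped instead of raising an Objective-C exception.

```python
import ctypes
import platform

ANE_FRAMEWORK = ("/System/Library/PrivateFrameworks/"
                 "AppleNeuralEngine.framework/AppleNeuralEngine")

# Class methods from the reference above, with the C type each is assumed to return.
QUERIES = {
    "hasANE": ctypes.c_bool,
    "numANEs": ctypes.c_long,
    "numANECores": ctypes.c_long,
    "isVirtualMachine": ctypes.c_bool,
    "isInternalBuild": ctypes.c_bool,
    "precompiledModelChecksDisabled": ctypes.c_bool,
}


def query_ane_device_info():
    """Return {selector: value} from +[_ANEDeviceInfo ...], or None off macOS."""
    if platform.system() != "Darwin":
        return None
    objc = ctypes.CDLL("/usr/lib/libobjc.A.dylib")
    ctypes.CDLL(ANE_FRAMEWORK)  # loading the image registers _ANEDeviceInfo
    for name, res, args in (
        ("objc_getClass", ctypes.c_void_p, [ctypes.c_char_p]),
        ("sel_registerName", ctypes.c_void_p, [ctypes.c_char_p]),
        ("object_getClass", ctypes.c_void_p, [ctypes.c_void_p]),
        ("class_respondsToSelector", ctypes.c_bool, [ctypes.c_void_p, ctypes.c_void_p]),
    ):
        getattr(objc, name).restype = res
        getattr(objc, name).argtypes = args

    cls = objc.objc_getClass(b"_ANEDeviceInfo")
    if not cls:
        return None
    meta = objc.object_getClass(cls)  # class methods live on the metaclass
    msg_send = ctypes.cast(objc.objc_msgSend, ctypes.c_void_p).value
    results = {}
    for selector, restype in QUERIES.items():
        sel = objc.sel_registerName(selector.encode())
        if not objc.class_respondsToSelector(meta, sel):
            continue  # not implemented on this OS version
        # objc_msgSend has no fixed prototype; cast it per return type.
        call = ctypes.CFUNCTYPE(restype, ctypes.c_void_p, ctypes.c_void_p)(msg_send)
        results[selector] = call(cls, sel)
    return results


if __name__ == "__main__":
    for key, value in (query_ane_device_info() or {}).items():
        print(f"{key} = {int(value)}")
```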

Security Analysis

Attack Surface

1. XPC Message Handling

The aned daemon accepts XPC messages from clients. Potential vectors:

Malformed model paths: Does compileModel: properly validate URL paths?
Sandbox extensions: sandboxExtension: parameter passes filesystem access tokens
Memory corruption: Large or malformed layer descriptors
Race conditions: Concurrent compile/load/unload operations

2. IOSurface Sharing

IOSurface enables shared memory between processes:

Client Process          aned Daemon           ANE Hardware
──────────────          ───────────           ────────────
     │                       │                     │
     │ Create IOSurface      │                     │
     ├──────────────────────►│                     │
     │                       │ Map to ANE          │
     │                       ├────────────────────►│
     │                       │                     │
     │ Write input data      │                     │
     ├───────────────────────┼────────────────────►│
     │                       │                     │
     │ Read output data      │                     │
     │◄──────────────────────┼─────────────────────┤

Concerns:

Shared memory lifetime management
Buffer overflow if sizes mismatch
Use-after-free on premature unmap

3. Model Cache

The /var/folders/.../com.apple.aned/ cache:

World-readable in some configurations
Contains compiled ANE bytecode
Could leak model architecture details
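Whether the cache is world-readable on a particular machine can be checked directly. The sketch below walks a directory and lists entries whose mode grants read permission to other users; pass the actual cache directory, since its `<user_hash>` component varies. A readable file inside a directory that others cannot traverse is still unreachable, so hits should be read alongside the parent directory modes.

```python
import os
import stat


def world_readable_entries(root: str) -> list:
    """List root and everything under it whose mode includes S_IROTH."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if mode & stat.S_IROTH:
                hits.append(path)
    return hits
```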

What Works Without Entitlements

These operations succeed without any entitlements:

Framework loading: All three frameworks load via dlopen/ctypes
Struct initialization: All ANEC*Initialize() functions callable
Size probing: Can determine struct layouts by sentinel analysis
CPU inference: EspressoContext(platform=0) works
Model parsing: Read and parse .espresso.net files
Client creation: _ANEClient object creation succeeds
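The framework-loading step can be checked with `ctypes` alone. The paths below assume the three frameworks live under `/System/Library/PrivateFrameworks`; on recent macOS they are served from the dyld shared cache, so `dlopen` can succeed even if the binary is not present on disk. Off macOS every entry is simply `None`.

```python
import ctypes

PRIVATE_FRAMEWORKS = "/System/Library/PrivateFrameworks"
FRAMEWORKS = ("ANECompiler", "Espresso", "AppleNeuralEngine")


def load_frameworks() -> dict:
    """dlopen each framework binary; map name -> CDLL handle or None."""
    handles = {}
    for name in FRAMEWORKS:
        path = f"{PRIVATE_FRAMEWORKS}/{name}.framework/{name}"
        try:
            handles[name] = ctypes.CDLL(path)
        except OSError:
            handles[name] = None  # missing, or not running on macOS
    return handles


if __name__ == "__main__":
    for name, handle in load_frameworks().items():
        print(f"{name}: {'loaded' if handle else 'not found'}")
```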

What Fails Without Entitlements

These operations fail silently (no NSError is populated; the method just returns NO):

compileModel:options:qos:error: - returns NO
loadModel:options:qos:error: - returns NO
evaluateWithModel:options:request:qos:error: - returns NO
_ANEDeviceController - can't access valid device

Security note: Silent failures make debugging difficult but also prevent enumeration of error conditions.

Performance Analysis

Profiling APIs

Layer-Level Profiling

@interface EspressoProfilingLayerInfo : NSObject
@property (readonly) NSString *name;
@property (readonly) NSString *debug_name;
@property (readonly) double average_runtime;        // seconds
@property (readonly) int selected_runtime_engine;   // 0=CPU, 1=GPU, 2=ANE
@property (readonly) NSArray *runtimes;
@end
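Per-layer records like these can be rolled up into a time-per-engine breakdown. `LayerTiming` below is a hypothetical plain-Python stand-in for `EspressoProfilingLayerInfo` (reading the real objects is not shown), and the engine encoding is taken from the comment on `selected_runtime_engine`.

```python
from collections import defaultdict
from dataclasses import dataclass

ENGINE_NAMES = {0: "CPU", 1: "GPU", 2: "ANE"}  # from the property comment


@dataclass
class LayerTiming:  # hypothetical mirror of EspressoProfilingLayerInfo
    name: str
    average_runtime: float         # seconds
    selected_runtime_engine: int


def time_by_engine(layers) -> dict:
    """Sum average_runtime per engine."""
    totals = defaultdict(float)
    for layer in layers:
        code = layer.selected_runtime_engine
        totals[ENGINE_NAMES.get(code, f"engine{code}")] += layer.average_runtime
    return dict(totals)


def ane_fraction(layers) -> float:
    """Share of summed layer time spent on layers placed on the ANE."""
    totals = time_by_engine(layers)
    total = sum(totals.values())
    return totals.get("ANE", 0.0) / total if total else 0.0
```

A low ANE fraction on a model expected to run on ANE is a quick hint that some layers fell back to CPU or GPU.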

Network-Level ANE Profiling

@interface EspressoProfilingNetworkANEInfo : NSObject
@property (readonly) uint64_t total_ane_time_ns;
@property (readonly) uint64_t ane_time_per_eval_ns;
@end

Request-Level Stats

@interface _ANERequest : NSObject
@property uint32_t perfStatsMask;    // Bitmask for which stats to collect
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@end

Operation Mapping

Operations with Native ANE Support

These appear to map directly onto native ANE operations:

Convolution (all variants)
Matrix multiplication
Scaled Dot-Product Attention (SDPA)
Softmax
Common activations (ReLU, GeLU, Tanh)
Pooling operations
Element-wise arithmetic

Operations That Get Decomposed

These are broken into multiple ANE ops:

LayerNorm → multiple passes
Complex reductions
Non-standard activations
Dynamic shapes

Fallback to CPU/GPU

Operations fall back when:

Tensor too large for ANE SRAM
Unsupported operation type
Dynamic control flow
Precision requirements exceed INT8/FP16

Example Runthrough

Building a CNN Graph

from ane import SimpleANEGraph

# Create graph builder
graph = SimpleANEGraph()

# Input: (batch=1, channels=3, height=224, width=224)
graph.add_conv2d("conv1", (1, 3, 224, 224), 
                 out_channels=64, kernel_size=7, stride=2, padding=3)
# Output: (1, 64, 112, 112)

graph.add_pool2d("pool1", (1, 64, 112, 112), kernel_size=3, stride=2)
# Output: (1, 64, 56, 56); 56 implies padding=1 or ceil rounding (floor with no padding gives 55)

graph.add_conv2d("conv2", (1, 64, 56, 56), 
                 out_channels=128, kernel_size=3, padding=1)
# Output: (1, 128, 56, 56)

graph.add_conv2d("conv3", (1, 128, 56, 56), 
                 out_channels=256, kernel_size=3, padding=1)
# Output: (1, 256, 56, 56)

graph.add_pool2d("pool2", (1, 256, 56, 56), kernel_size=2, stride=2)
# Output: (1, 256, 28, 28)

graph.add_linear("fc1", input_features=256*28*28, output_features=1024)
graph.add_linear("fc2", input_features=1024, output_features=1000)
graph.add_softmax("softmax", (1, 1000))

print(graph.summary())

Output:

ANE Computation Graph
============================================================

conv1 (conv2d)
  Input:  (1, 3, 224, 224)
  Output: (1, 64, 112, 112)
  Desc:   176 bytes
  Kernel: 7x7
  Stride: 2x2
  Pad:    3,3

pool1 (pool2d)
  Input:  (1, 64, 112, 112)
  Output: (1, 64, 56, 56)
  Desc:   96 bytes
  Kernel: 3x3
  Stride: 2x2

...

============================================================
Total layers: 8
Total descriptor bytes: 680
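The shapes and totals in this walkthrough can be checked with the standard floor-mode output-size formula, `out = floor((in + 2*pad - kernel) / stride) + 1`, plus the descriptor sizes from the struct table. With no padding, pool1 comes out at 55, so the 56 above needs padding=1 or ceil rounding; and summing the eight descriptors gives 896 bytes rather than the printed 680, so `summary()` may count descriptors differently. How `SimpleANEGraph` actually computes either value is not shown here.

```python
# Descriptor sizes from the struct table, keyed by the graph's layer kinds.
DESC_BYTES = {"conv2d": 176, "pool2d": 96, "linear": 64, "softmax": 48, "sdpa": 8}

# Layer kinds in the CNN example, in order.
CNN_LAYERS = ["conv2d", "pool2d", "conv2d", "conv2d", "pool2d",
              "linear", "linear", "softmax"]


def out_dim(size: int, kernel: int, stride: int = 1, padding: int = 0,
            ceil_mode: bool = False) -> int:
    """Spatial output size of a conv/pool window."""
    span = size + 2 * padding - kernel
    steps = -(-span // stride) if ceil_mode else span // stride
    return steps + 1


def descriptor_bytes(kinds) -> int:
    return sum(DESC_BYTES[k] for k in kinds)


if __name__ == "__main__":
    print(out_dim(224, 7, stride=2, padding=3))    # conv1
    print(out_dim(112, 3, stride=2))               # pool1, floor, no padding
    print(descriptor_bytes(CNN_LAYERS))
```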

Building Transformer Attention

from ane import build_transformer_attention

graph = build_transformer_attention()
print(graph.summary())

Output:

ANE Computation Graph
============================================================

proj_qkv (linear)
  Input:  (512, 512, 1, 1)
  Output: (512, 1536, 1, 1)
  Desc:   64 bytes

attention (sdpa)
  Input:  (1, 8, 512, 64)
  Output: (1, 8, 512, 64)
  Desc:   8 bytes              <-- Native transformer attention!

proj_out (linear)
  Input:  (512, 512, 1, 1)
  Output: (512, 512, 1, 1)
  Desc:   64 bytes

============================================================
Total layers: 3
Total descriptor bytes: 136
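For reference, scaled dot-product attention computes `softmax(Q·Kᵀ / sqrt(d))·V` per head; in the shapes above that is 8 heads of 512×64. A dependency-free single-head sketch of that textbook definition follows. Whether the ANE op applies exactly this scale, supports masking, or uses a different softmax implementation is not something an 8-byte descriptor reveals, so this is the standard computation, not a description of the hardware.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q, k, v):
    """softmax(q @ k^T / sqrt(d)) @ v for one head; q, k, v are lists of rows."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

A multi-head layer like `attention` above would run this once per head on the corresponding 512×64 slices.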

Loading Espresso Models

from ane import (
    create_espresso_cpu_context,
    load_espresso_network,
    get_network_layer_count,
    EspressoNet,
)

# Method 1: Direct runtime loading (CPU only without entitlements)
ctx = create_espresso_cpu_context()
print(f"Context: {hex(ctx)}")

model_path = "/path/to/model.espresso.net"
net = load_espresso_network(model_path, ctx)
print(f"Network: {hex(net)}")
print(f"Layers: {get_network_layer_count(net)}")

# Method 2: Parse the file directly
model = EspressoNet.from_file(model_path)
print(f"Format version: {model.format_version}")
print(f"Layer types: {model.layer_type_counts()}")

# Analyze inner_product layers for quantization
for ip in model.get_inner_product_info():
    print(f"  {ip['name']}: {ip['nB']}x{ip['nC']}, "
          f"quant={ip['quantization_mode']}, lookup={ip['is_lookup']}")

Decoding PBZE Files

from ane import decode_espresso_net, get_pbze_stats, is_pbze_file

path = "/System/Library/SomeFramework/model.espresso.net"

# Check format
if is_pbze_file(path):
    stats = get_pbze_stats(path)
    print(f"Compressed size: {stats['compressed_size']} bytes")
    print(f"Uncompressed size: {stats['uncompressed_size']} bytes")
    print(f"Compression ratio: {stats['compression_ratio']:.2f}x")

# Decode (handles both JSON and PBZE automatically)
data = decode_espresso_net(path)
print(f"Layers: {len(data['layers'])}")

Using the Native Helper

For full ANE access, use the signed Objective-C helper:

# Build and sign
cd helper
./build.sh "Developer ID Application: Your Name (TEAMID)"

# Check status
echo '{"cmd": "status"}' | ./ane_helper
# {"ok":true,"client":true,"model_count":0,"model_ids":[]}

# Compile a model
echo '{"cmd": "compile", "model_path": "/path/to/model.mlmodelc"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","state":1}

# Load into ANE memory
echo '{"cmd": "load", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","program_handle":12345}

# Unload
echo '{"cmd": "unload", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true}
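The helper reads one JSON object per invocation on stdin and answers with one on stdout, so it is easy to drive from Python. A sketch built only on the exchange shown above; the `./ane_helper` path and the assumption that failures come back as `"ok": false` are guesses, not documented behavior.

```python
import json
import subprocess
import sys

HELPER = ("./ane_helper",)  # assumed location after running build.sh in helper/


def call_helper(request: dict, argv=HELPER, timeout: float = 60.0) -> dict:
    """Send one JSON command on stdin and parse the JSON reply from stdout."""
    proc = subprocess.run(list(argv), input=json.dumps(request),
                          capture_output=True, text=True,
                          timeout=timeout, check=False)
    if not proc.stdout.strip():
        raise RuntimeError(f"helper exited {proc.returncode}: {proc.stderr.strip()}")
    reply = json.loads(proc.stdout)
    if not reply.get("ok"):
        raise RuntimeError(f"helper reported failure: {reply}")
    return reply


if __name__ == "__main__":
    # compile -> load -> unload, mirroring the shell session above
    compiled = call_helper({"cmd": "compile", "model_path": sys.argv[1]})
    model_id = compiled["model_id"]
    loaded = call_helper({"cmd": "load", "model_id": model_id})
    print("program handle:", loaded["program_handle"])
    call_helper({"cmd": "unload", "model_id": model_id})
```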

Comprehensive Reference

Complete Layer Type Reference

Espresso Layer Types (from system model analysis)

Type	Category	Attributes
`activation`	Compute	`type` (relu/gelu/tanh/sigmoid/etc), `alpha`, `beta`
`batch_matmul`	Compute	`transpose_a`, `transpose_b`, `adj_x`, `adj_y`
`concat`	Shape	`axis`
`convolution`	Compute	`kernel_size`, `stride`, `pad`, `C`, `groups`, `dilation`
`dynamic_dequantize`	Quantization	`scale_blob`, `zero_point_blob`
`dynamic_quantize`	Quantization	`axis`, `mode`
`elementwise`	Compute	`operation`, `alpha`, `broadcast`
`expand_dims`	Shape	`axes`
`general_concat`	Shape	`axis`, `interleave`
`general_slice`	Shape	`starts`, `ends`, `strides`, `axes`
`get_shape`	Utility	(no special attributes)
`inner_product`	Compute	`nB`, `nC`, `has_biases`, `quantization_mode`, `is_lookup`
`instancenorm_1d`	Normalization	`C`, `epsilon`
`load_constant`	Memory	`blob_weights`, `shape`
`nonzero`	Utility	(no special attributes)
`reduce`	Compute	`mode` (sum/mean/max/min/prod), `axes`, `keepdims`
`reshape`	Shape	`shape`
`scatter_nd`	Memory	(no special attributes)
`softmax`	Compute	`axis`
`split_nd`	Shape	`axis`, `num_splits`, `split_sizes`
`tile`	Shape	`reps`
`transpose`	Shape	`axes`

ANE Compiler Struct Sizes

Struct	Size (bytes)	Initialize Function
ANECAffineTransformLayerDesc	48	ANECAffineTransformLayerDescInitialize
ANECBatchNormLayerDesc	40	ANECBatchNormLayerDescInitialize
ANECConcatLayerDesc	16	ANECConcatLayerDescInitialize
ANECConvLayerDesc	176	ANECConvLayerDescInitialize
ANECCropResizeLayerDesc	64	ANECCropResizeLayerDescInitialize
ANECCrossCorrelationLayerDesc	96	ANECCrossCorrelationLayerDescInitialize
ANECDropoutLayerDesc	16	ANECDropoutLayerDescInitialize
ANECExpandLayerDesc	32	ANECExpandLayerDescInitialize
ANECFillLayerDesc	24	ANECFillLayerDescInitialize
ANECFlattenLayerDesc	16	ANECFlattenLayerDescInitialize
ANECGatherLayerDesc	24	ANECGatherLayerDescInitialize
ANECGatherNDLayerDesc	24	ANECGatherNDLayerDescInitialize
ANECGridSampleLayerDesc	32	ANECGridSampleLayerDescInitialize
ANECGroupNormLayerDesc	40	ANECGroupNormLayerDescInitialize
ANECInputViewLayerDesc	32	ANECInputViewLayerDescInitialize
ANECKernelSize	24	ANECKernelSizeInitialize
ANECLRNLayerDesc	32	ANECLRNLayerDescInitialize
ANECLayerNormLayerDesc	40	ANECLayerNormLayerDescInitialize
ANECLinearLayerDesc	64	ANECLinearLayerDescInitialize
ANECMatrixMultLayerDesc	16	ANECMatrixMultLayerDescInitialize
ANECNMSLayerDesc	48	ANECNMSLayerDescInitialize
ANECNeuronLayerDesc	32	ANECNeuronLayerDescInitialize
ANECNormLayerDesc	40	ANECNormLayerDescInitialize
ANECPadLayerDesc	48	ANECPadLayerDescInitialize
ANECPadding	24	ANECPaddingInitialize
ANECPoolLayerDesc	96	ANECPoolLayerDescInitialize
ANECRandomLayerDesc	32	ANECRandomLayerDescInitialize
ANECReductionLayerDesc	24	ANECReductionLayerDescInitialize
ANECResampleLayerDesc	48	ANECResampleLayerDescInitialize
ANECReshapeLayerDesc	48	ANECReshapeLayerDescInitialize
ANECResizeLayerDesc	40	ANECResizeLayerDescInitialize
ANECRingBufferLayerDesc	32	ANECRingBufferLayerDescInitialize
ANECSDPALayerDesc	8	ANECSDPALayerDescInitialize
ANECScatterLayerDesc	24	ANECScatterLayerDescInitialize
ANECScatterNDLayerDesc	24	ANECScatterNDLayerDescInitialize
ANECShapeLayerDesc	16	ANECShapeLayerDescInitialize
ANECSoftmaxLayerDesc	48	ANECSoftmaxLayerDescInitialize
ANECSortLayerDesc	24	ANECSortLayerDescInitialize
ANECSplitLayerDesc	24	ANECSplitLayerDescInitialize
ANECSqueezeLayerDesc	32	ANECSqueezeLayerDescInitialize
ANECStep	12	ANECStepInitialize
ANECTensorDesc	64	ANECTensorDescInitialize
ANECTensorDims	40	ANECTensorDimsInitialize
ANECTileLayerDesc	32	ANECTileLayerDescInitialize
ANECTopKLayerDesc	24	ANECTopKLayerDescInitialize
ANECTransposeLayerDesc	32	ANECTransposeLayerDescInitialize
ANECUnflattenLayerDesc	24	ANECUnflattenLayerDescInitialize
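The primitive structs in this table (`ANECKernelSize`, `ANECStep`, `ANECPadding`, `ANECTensorDims`) have inferred field layouts in the descriptor table of the compiler teardown, so they can be mirrored as `ctypes` structures and checked against the probed sizes. Field names and order follow those inferences and are not confirmed against any header; passing `ctypes.byref(ANECKernelSize())` to the matching `...Initialize` function would additionally assume a single-pointer signature.

```python
import ctypes


class ANECKernelSize(ctypes.Structure):  # inferred: 3x u64
    _fields_ = [("depth", ctypes.c_uint64), ("height", ctypes.c_uint64),
                ("width", ctypes.c_uint64)]


class ANECStep(ctypes.Structure):  # inferred: 3x u32
    _fields_ = [("depth", ctypes.c_uint32), ("height", ctypes.c_uint32),
                ("width", ctypes.c_uint32)]


class ANECPadding(ctypes.Structure):  # inferred: 6x u32
    _fields_ = [(name, ctypes.c_uint32) for name in
                ("d_front", "d_back", "h_front", "h_back", "w_front", "w_back")]


class ANECTensorDims(ctypes.Structure):  # inferred: 5x u64
    _fields_ = [(name, ctypes.c_uint64) for name in ("N", "C", "H", "W", "D")]


EXPECTED_SIZES = {ANECKernelSize: 24, ANECStep: 12,
                  ANECPadding: 24, ANECTensorDims: 40}


def layout_mismatches() -> dict:
    """Map struct name -> (mirrored size, probed size) wherever they differ."""
    return {cls.__name__: (ctypes.sizeof(cls), size)
            for cls, size in EXPECTED_SIZES.items()
            if ctypes.sizeof(cls) != size}
```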

Espresso Optimization Passes

All discovered Pass_* classes in Espresso.framework:

Pass_add_fp16_fp32_conversions
Pass_batch_matmul_transpose_fusion
Pass_broadcast_optimization
Pass_canonicalize_ops
Pass_constant_folding
Pass_convert_gather_to_slice
Pass_convert_to_ane_layout
Pass_dead_code_elimination
Pass_decompose_complex_ops
Pass_eliminate_identity_ops
Pass_eliminate_redundant_transpose
Pass_fold_constants
Pass_fuse_activation
Pass_fuse_add_mul
Pass_fuse_bias
Pass_fuse_conv_batchnorm
Pass_fuse_conv_bias
Pass_fuse_elementwise
Pass_fuse_gelu
Pass_fuse_layernorm
Pass_fuse_linear_ops
Pass_fuse_matmul_add
Pass_fuse_mul_add
Pass_fuse_pad_conv
Pass_fuse_reshape_transpose
Pass_insert_copies_for_ane
Pass_legalize_for_ane
Pass_lower_to_ane_ops
Pass_optimize_memory_layout
Pass_optimize_reshape_chain
Pass_optimize_transpose
Pass_propagate_shapes
Pass_quantize_weights
Pass_remove_unused_outputs
Pass_replace_div_with_mul
Pass_simplify_arithmetic
Pass_split_large_tensors
Pass_tensor_parallel_partition
Pass_tile_for_ane
Pass_vectorize_ops
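A list like the one above can be regenerated from a textual symbol dump of Espresso.framework. The sketch below assumes the dump has already been demangled (names like `Espresso::Pass_fuse_gelu::run()`). Raw mangled names put a length digit directly before `Pass_`, which this pattern would not match. The pattern also assumes that pass names contain only lowercase letters, digits, and underscores, as every entry above does. Grouping by the word after `Pass_` shows the pass families (fuse, eliminate, optimize, and so on).

```python
import re
from collections import defaultdict

# "Pass_" at a word boundary, then lowercase letters, digits, underscores,
# matching the naming seen in the list above.
PASS_RE = re.compile(r"\bPass_[a-z0-9_]+")


def extract_pass_names(symbol_dump: str) -> list[str]:
    """Collect unique Pass_* names from demangled symbol text, sorted."""
    return sorted(set(PASS_RE.findall(symbol_dump)))


def group_by_verb(names: list[str]) -> dict[str, list[str]]:
    """Group pass names by the first word after 'Pass_' (e.g. 'fuse')."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name.split("_")[1]].append(name)
    return dict(groups)
```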

ObjC Class Methods Reference

_ANEClient

// Lifecycle
- (instancetype)initWithRestrictedAccessAllowed:(BOOL)allowed;

// Compilation
- (BOOL)compileModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)compiledModelExistsFor:(id)model;
- (BOOL)compiledModelExistsMatchingHash:(NSData*)hash;
- (BOOL)purgeCompiledModel:(id)model;

// Loading
- (BOOL)loadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)loadModelNewInstance:(id)model options:(id)opts modelInstParams:(id)params qos:(int)qos error:(NSError**)err;
- (BOOL)loadRealTimeModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)unloadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;

// Evaluation
- (BOOL)evaluateWithModel:(id)model options:(id)opts request:(id)req qos:(int)qos error:(NSError**)err;
- (BOOL)evaluateRealTimeWithModel:(id)model options:(id)opts request:(id)req error:(NSError**)err;

// Memory
- (BOOL)mapIOSurfacesWithModel:(id)model request:(id)req cacheInference:(BOOL)cache error:(NSError**)err;
- (void)unmapIOSurfacesWithModel:(id)model request:(id)req;

// Chaining
- (BOOL)prepareChainingWithModel:(id)model options:(id)opts chainingReq:(id)req qos:(int)qos error:(NSError**)err;

_ANEModel

// Initialization
- (instancetype)initWithModelAtURL:(NSURL*)url 
                               key:(NSString*)key
                  identifierSource:(int)src
              cacheURLIdentifier:(NSString*)cacheId
                 modelAttributes:(id)attrs
                  standardizeURL:(BOOL)standardize;
- (instancetype)initWithModelIdentifier:(id)identifier;

// Properties
@property (readonly) NSURL *modelURL;
@property (readonly) NSURL *sourceURL;
@property (readonly) NSString *UUID;
@property (readonly) NSString *key;
@property (readonly) int state;  // 1 = created/unloaded
@property (readonly) uint64_t programHandle;
@property (readonly) uint64_t intermediateBufferHandle;
@property (readonly) int queueDepth;
@property (readonly) uint32_t perfStatsMask;
@property (readonly) id mpsConstants;

_ANERequest

// Initialization
- (instancetype)initWithInputs:(NSArray*)inputs
                  inputIndices:(NSArray*)inputIndices
                       outputs:(NSArray*)outputs
                 outputIndices:(NSArray*)outputIndices
                 weightsBuffer:(id)weights
                     perfStats:(id)stats
                procedureIndex:(int)procIdx
                  sharedEvents:(id)events
             transactionHandle:(uint64_t)handle;

// Properties
@property (readonly) NSArray *inputArray;
@property (readonly) NSArray *inputIndexArray;
@property (readonly) NSArray *outputArray;
@property (readonly) NSArray *outputIndexArray;
@property (readonly) id weightsBuffer;
@property (readonly) int procedureIndex;
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@property (copy) void (^completionHandler)(BOOL, NSError*);
@property (readonly) id sharedEvents;
@property (readonly) uint64_t transactionHandle;
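The `_ANERequest` initializer takes inputs and outputs together with separate index arrays. The sketch below assumes that these are parallel arrays, where `inputIndices[i]` names the model slot filled by `inputs[i]`. That reading follows from the parameter names but is not confirmed. The helper checks the pairing before a request is built and is purely illustrative; `pair_io` is a hypothetical name.

```python
from typing import Any, Sequence


def pair_io(buffers: Sequence[Any], indices: Sequence[int]) -> list[tuple[int, Any]]:
    """Pair each buffer with its slot index, sorted by index.

    Assumption: buffers and indices are parallel arrays, as the
    inputs/inputIndices and outputs/outputIndices parameters suggest.
    """
    if len(buffers) != len(indices):
        raise ValueError(f"{len(buffers)} buffers but {len(indices)} indices")
    if len(set(indices)) != len(indices):
        raise ValueError("duplicate index")
    return sorted(zip(indices, buffers), key=lambda pair: pair[0])
```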

Running Tests

# Run all tests
pytest tests/test_ane.py -v

# Run specific test class
pytest tests/test_ane.py::TestANECompiler -v

# Run with coverage
pytest tests/test_ane.py --cov=ane --cov-report=term-missing

Test categories:

TestANEStructs - Data structure serialization
TestANECompiler - Framework loading and initialization
TestANEHelpers - Utility functions
TestANESample - Graph building
TestANELayerSizes - Probed struct sizes
TestEspressoDiscovery - ObjC class introspection
TestEspressoFormat - Model file parsing
TestPBZE - Compression/decompression
TestANEXPC - XPC protocol discovery
TestAPITree - Knowledge base API tree

HWX File Format & Execution Research

Working Path: CoreML API

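The simplest way to run models on the ANE is through CoreML's public API. CoreML decides which device runs each operation; `MLComputeUnitsAll` permits ANE execution but does not force it: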

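// Objective-C
#import <CoreML/CoreML.h>

MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsAll;  // Lets CoreML use CPU, GPU, and ANE

NSError *error = nil;
// modelURL must point to a compiled model (.mlmodelc)
MLModel *model = [MLModel modelWithContentsOfURL:modelURL
                                   configuration:config
                                           error:&error];
if (model == nil) NSLog(@"Failed to load model: %@", error);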

# Python with coremltools
import coremltools as ct
model = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.ALL)

HWX Binary Format

Pre-compiled ANE binaries (.hwx files) have a Mach-O-like structure:

Offset	Value	Description
0x00	`0xBEEFFACE`	Magic number
0x04	varies	Header info
...	`__PAGEZERO`	Zero page segment
...	`__DATA`	Data segment
...	`__FVMLIB`	ANE instructions
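A quick sanity check for an .hwx file is to read the magic number and locate the segment names listed in the table. The on-disk byte order of the magic is not stated above, so this sketch accepts either order and reports which one matched. It finds segment names by plain byte search rather than assuming a Mach-O load-command layout, which has not been verified for HWX.

```python
import struct

HWX_MAGIC = 0xBEEFFACE
# Segment names taken from the table above.
SEGMENT_NAMES = (b"__PAGEZERO", b"__DATA", b"__FVMLIB")


def check_hwx_magic(data: bytes) -> str:
    """Return the byte order ('little' or 'big') in which the first
    4 bytes read as 0xBEEFFACE; raise ValueError if neither matches."""
    if len(data) < 4:
        raise ValueError("file too short")
    if struct.unpack_from("<I", data, 0)[0] == HWX_MAGIC:
        return "little"
    if struct.unpack_from(">I", data, 0)[0] == HWX_MAGIC:
        return "big"
    raise ValueError("not an HWX file")


def find_segment_names(data: bytes) -> dict[str, int]:
    """Offset of the first occurrence of each known segment name (-1 if absent)."""
    return {name.decode(): data.find(name) for name in SEGMENT_NAMES}
```

Typical use would be `check_hwx_magic(Path("model.H14.espresso.hwx").read_bytes())`, with `Path` from `pathlib`.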

Key insight: HWX files cannot be loaded on their own; they require a companion .espresso.net file that describes the network structure.

Espresso Model Bundle Structure

A complete Espresso model bundle contains:

File	Description
`model.espresso.net`	Network description (JSON or PBZE)
`model.espresso.weights`	Binary weights data
`model.espresso.shape`	Shape information
`model.H14.espresso.hwx`	Pre-compiled ANE binary (chip-specific)
`model.H14.espresso.precompilation_info`	Compiler metadata (JSON)
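Before trying to load a bundle, it helps to confirm which of these files are present. The sketch below groups a bundle directory's files by the suffixes in the table. It rejects a bundle that has an .hwx but no .espresso.net, since the HWX cannot be loaded without it. The role names are illustrative labels, not Espresso terminology.

```python
from pathlib import Path

# Suffixes from the bundle table above. The .hwx and .precompilation_info
# files carry a chip tag (e.g. "H14") before ".espresso", so match on the tail.
ROLES = {
    ".espresso.net": "network",
    ".espresso.weights": "weights",
    ".espresso.shape": "shape",
    ".espresso.hwx": "hwx",
    ".espresso.precompilation_info": "precompilation_info",
}


def inventory_bundle(directory: Path) -> dict[str, list[str]]:
    """Group the files in an Espresso bundle directory by role."""
    found: dict[str, list[str]] = {role: [] for role in ROLES.values()}
    for path in sorted(directory.iterdir()):
        for suffix, role in ROLES.items():
            if path.name.endswith(suffix):
                found[role].append(path.name)
    if found["hwx"] and not found["network"]:
        # An .hwx on its own cannot be loaded; it needs the .espresso.net.
        raise ValueError("hwx present without companion .espresso.net")
    return found
```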

Different .hwx files exist for different ANE generations:

.H13.espresso.hwx - A14/M1 generation
.H14.espresso.hwx - A15/M2 generation
.H15.espresso.hwx - A16/M3 generation
.H16.espresso.hwx - A17/M4 generation
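Because each chip generation has its own .hwx, a loader has to pick the file that matches the target tag. The sketch below reads the tag from a filename and maps it to a generation using the list above. It treats the target tag as an input, because how to query the running machine's tag is outside this section.

```python
import re
from typing import Iterable, Optional

# Chip tag to generation, copied from the list above.
GENERATIONS = {
    "H13": "A14/M1",
    "H14": "A15/M2",
    "H15": "A16/M3",
    "H16": "A17/M4",
}

HWX_NAME_RE = re.compile(r"\.(H\d+)\.espresso\.hwx$")


def hwx_generation(filename: str) -> tuple[str, str]:
    """Return (chip tag, generation) for a name like 'model.H14.espresso.hwx'."""
    match = HWX_NAME_RE.search(filename)
    if match is None:
        raise ValueError(f"not a tagged .hwx file: {filename}")
    tag = match.group(1)
    return tag, GENERATIONS.get(tag, "unknown")


def pick_hwx(filenames: Iterable[str], tag: str) -> Optional[str]:
    """Return the .hwx file built for the given chip tag, or None."""
    for name in filenames:
        if HWX_NAME_RE.search(name) and hwx_generation(name)[0] == tag:
            return name
    return None
```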

API Layer Summary

Layer	Status	Notes
CoreML	✅ Working	Use `MLComputeUnitsAll`, system handles everything
XPC to aned	✅ Working	`_ANEClient.sharedConnection` works
ANEServices	⚠️ Limited	Model loading needs `.espresso.net`
Espresso	⚠️ Limited	Platform 2 (ANE) context crashes
IOKit Direct	❌ Blocked	Requires `com.apple.ane.iokit-user-access`

MLComputePlan Device Masks

When inspecting MLComputePlan.computeDevicesBySupportedComputeUnits, devices are reported as a bitmask (CPU = 1, GPU = 2, Neural Engine = 4):

Mask	Devices
1	CPU only
2	GPU only
3	CPU + GPU
4	Neural Engine only
5	CPU + Neural Engine
6	GPU + Neural Engine
7	CPU + GPU + Neural Engine (all)
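The table is consistent with a three-bit field: CPU is 1, GPU is 2, and Neural Engine is 4. The sketch below decodes a mask on that basis and checks whether the ANE is included. The bit values are inferred from the table rather than taken from Apple documentation.

```python
CPU, GPU, NEURAL_ENGINE = 1, 2, 4  # bit values implied by the table above


def decode_mask(mask: int) -> list[str]:
    """Expand a device mask (1..7) into device names, in bit order."""
    if not 1 <= mask <= 7:
        raise ValueError(f"unexpected mask {mask}")
    names = []
    if mask & CPU:
        names.append("CPU")
    if mask & GPU:
        names.append("GPU")
    if mask & NEURAL_ENGINE:
        names.append("Neural Engine")
    return names


def uses_ane(mask: int) -> bool:
    """True if the Neural Engine bit is set."""
    return bool(mask & NEURAL_ENGINE)
```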

Hardware Detection

// Get ANE device info
Class deviceClass = NSClassFromString(@"MLNeuralEngineComputeDevice");
id device = [deviceClass performSelector:@selector(physicalDevice)];
NSInteger cores = [[device valueForKey:@"totalCoreCount"] integerValue];
// Returns 16 on M3 Pro

License

This project contains reverse engineering artifacts for research and interoperability purposes. Use responsibly.

Acknowledgments

Apple's private framework interfaces, recovered with class-dump and dyld_info
The tinygrad community for ANE exploration inspiration
