Apple Neural Engine (ANE) Reverse Engineering
Reverse engineering artifacts for Apple's Neural Engine stack: ANECompiler, Espresso, and AppleNeuralEngine frameworks.
Target Audience: Performance engineers and security researchers working with Apple silicon ML acceleration.
Table of Contents
- Key Findings
- Architecture Overview
- Repository Structure
- Espresso Engine Teardown
- Compiler Engine Teardown
- ANE Runtime Details
- Security Analysis
- Performance Analysis
- Example Runthrough
- Comprehensive Reference
Key Findings
| Discovery | Details | Significance |
|---|---|---|
| SDPA Layer | ANECSDPALayerDesc is only 8 bytes | Native transformer attention in ANE hardware |
| 40+ Optimization Passes | Pass_fuse_conv_batchnorm, Pass_fold_constants, etc. | Full Espresso compiler pipeline discoverable |
| XPC Daemon Architecture | aned at /usr/libexec/aned | Privilege boundary for ANE access |
| Entitlement Bypass | Struct init functions work without signing | Can probe all layer descriptor layouts |
| PBZE Format | LZFSE-compressed espresso.net | System models decodable with libcompression |
| Silent Failures | compileModel: returns NULL without error | Operations fail silently without entitlements |
| IOSurface Memory | EspressoANEIOSurface (21 methods) | Zero-copy tensor sharing with Metal |
| Quantization Modes | quantization_mode:2 on inner_product | ANE-specific quantization discovered |
| CoreML ANE Path | MLComputeUnitsAll enables ANE | Working path for ANE execution via public API |
| HWX Binary Format | Magic 0xBEEFFACE, Mach-O-like | Pre-compiled ANE instructions per chip generation |
| 16 ANE Cores | M3 Pro has 16 neural engine cores | Confirmed via MLNeuralEngineComputeDevice |
Quick Reference: What Works Without Entitlements
| Operation | Works? | Notes |
|---|---|---|
| Load ANECompiler.framework | Yes | All frameworks load |
| Call ANEC*Initialize() | Yes | Can probe struct sizes |
| Create EspressoContext (CPU) | Yes | Platform 0 works |
| Load EspressoNetwork | Yes | CPU inference works |
| Create _ANEClient | Yes | Object is created, but compile/load calls fail |
| Call compileModel: | No | Returns NULL silently |
| Call loadModel: | No | Returns NULL silently |
| ANE inference | No | Requires entitlements |
| CoreML with ANE | Yes | Use MLComputeUnitsAll - working path! |
| XPC to aned | Yes | Connection succeeds, ops need entitlements |
| MLComputePlan | Yes | Can inspect device availability |
Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ User Application │
│ (Core ML, Create ML, BNNS) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────┐ ┌─────────────────────────────┐ │
│ │ Espresso.framework │ │ AppleNeuralEngine.framework │ │
│ │ ───────────────── │ │ ─────────────────────────── │ │
│ │ • EspressoContext │ │ • _ANEClient │ │
│ │ • EspressoNetwork │ │ • _ANEModel │ │
│ │ • 40+ Pass_* classes │ │ • _ANERequest │ │
│ │ • CPU/GPU/ANE dispatch │ │ • _ANEDaemonConnection │ │
│ └───────────┬─────────────┘ └──────────────┬──────────────┘ │
│ │ │ │
├──────────────┴──────────────────────────────────┴────────────────┤
│ ANECompiler.framework │
│ ───────────────────── │
│ • ANECConvLayerDesc (176 bytes) • ANECSDPALayerDesc (8 bytes)│
│ • ANECPoolLayerDesc (96 bytes) • ANECLinearLayerDesc (64 B) │
│ • ANECTensorDims (40 bytes) • 30+ layer descriptors │
├─────────────────────────────────────────────────────────────────┤
│ XPC Transport Layer │
│ Service: com.apple.appleneuralengine │
├─────────────────────────────────────────────────────────────────┤
│ aned (/usr/libexec/aned) │
│ ─────────────────────── │
│ • ANEProgramCreate() • Model cache management │
│ • ANEProgramInstanceCreate() • Garbage collection │
│ • Sandbox extension handling • Telemetry │
├─────────────────────────────────────────────────────────────────┤
│ ANE Hardware (M1/M2/M3+) │
│ ─────────────────────── │
│ • 16 neural engine cores • Dedicated SRAM │
│ • Up to 15.8 TOPS (M2) • IOSurface DMA │
└─────────────────────────────────────────────────────────────────┘
Data Flow: Model Compilation
.mlmodelc/ aned daemon Hardware
──────────── ─────────── ────────
│ │ │
│ 1. _ANEModel create │ │
├──────────────────────────────►│ │
│ │ │
│ 2. compileModel: (XPC) │ │
├──────────────────────────────►│ │
│ │ 3. ANECompiler │
│ ├─────────────────────────►│
│ │ │
│ │ 4. ANEProgramCreate() │
│ ├─────────────────────────►│
│ │ │
│ 5. Return program handle │ │
│◄──────────────────────────────┤ │
│ │ │
│ 6. loadModel: (XPC) │ │
├──────────────────────────────►│ │
│ │ 7. Map to ANE memory │
│ ├─────────────────────────►│
│ │ │
│ 8. evaluateWithModel: │ 9. Execute on ANE │
├──────────────────────────────►├─────────────────────────►│
│ │ │
Repository Structure
ane/
├── __init__.py # Package exports, builds API tree for tooling
├── compiler.py # ANECompiler.framework ctypes bindings
│ # - Layer descriptor structs
│ # - ANEC*Initialize() wrappers
│ # - Struct size probing
│
├── espresso.py # Espresso model format parser
│ # - EspressoNet, EspressoLayer classes
│ # - Layer type documentation
│ # - CPU vs ANE model comparison
│
├── runtime.py # Espresso/ANE runtime bindings
│ # - EspressoContext creation
│ # - EspressoNetwork loading
│ # - ObjC class introspection
│
├── xpc.py # ANE XPC protocol documentation
│ # - _ANEDaemonConnection methods
│ # - _ANEClient methods
│ # - XPC operation categories
│
├── pbze.py # PBZE (compressed espresso.net) decoder
│ # - LZFSE decompression via libcompression
│ # - Header parsing
│ # - Compression statistics
│
├── sample.py # Example graph building code
│ # - SimpleANEGraph class
│ # - CNN and Transformer examples
│
├── tests/
│ └── test_ane.py # Comprehensive pytest suite (623 lines)
│
└── helper/
├── ane_helper.m # Objective-C helper for privileged ANE access
├── ane_helper.entitlements
└── build.sh # Build script
Espresso Engine Teardown
Espresso is Apple's internal ML inference runtime that powers Core ML. It handles model execution across CPU, GPU, and ANE.
Model Format (.espresso.net)
Two formats exist:
- JSON (human-readable):
{
  "format_version": 200,
  "storage": "model.espresso.weights",
  "layers": [
    {
      "name": "conv1",
      "type": "convolution",
      "bottom": "input",
      "top": "conv1_output",
      "kernel_size": 3,
      "stride": 1,
      "pad": 1,
      "C": 64
    }
  ],
  "analyses": {},
  "properties": {}
}
- PBZE (binary, LZFSE-compressed):
Offset  Size  Description
──────  ────  ───────────
0x00    4     Magic: b'pbze'
0x04    4     Version (usually 0)
0x08    8     Unknown (header size?)
0x10    4     Uncompressed size (BIG ENDIAN!)
0x14    4     Unknown
0x18    4     Padding
0x1C    ...   LZFSE data (starts with b'bvx2')
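The header layout above can be read with a few `struct` calls. The following is a minimal sketch, not the repository's `pbze.py`: the function name `parse_pbze_header` and the synthetic demo bytes are illustrative assumptions, and the byte order of the version field is assumed (only the uncompressed size is documented as big-endian). Decompressing the LZFSE payload itself requires libcompression and is left out.

```python
import struct

PBZE_MAGIC = b"pbze"
LZFSE_BLOCK_MAGIC = b"bvx2"
PAYLOAD_OFFSET = 0x1C


def parse_pbze_header(blob: bytes) -> dict:
    """Parse the fixed-size PBZE header described in the layout table."""
    if len(blob) < PAYLOAD_OFFSET:
        raise ValueError("buffer too short for a PBZE header")
    if blob[0:4] != PBZE_MAGIC:
        raise ValueError("missing b'pbze' magic")
    # Assumption: version is read big-endian like the size field
    # (a value of 0 reads the same in either byte order).
    (version,) = struct.unpack_from(">I", blob, 0x04)
    (uncompressed_size,) = struct.unpack_from(">I", blob, 0x10)
    payload = blob[PAYLOAD_OFFSET:]
    return {
        "version": version,
        "uncompressed_size": uncompressed_size,
        "payload_is_lzfse": payload.startswith(LZFSE_BLOCK_MAGIC),
        "payload": payload,
    }


# Synthetic header for illustration only; not taken from a real model file.
demo = (
    b"pbze"
    + struct.pack(">I", 0)       # 0x04: version
    + bytes(8)                   # 0x08: unknown
    + struct.pack(">I", 4096)    # 0x10: uncompressed size, big-endian
    + bytes(4)                   # 0x14: unknown
    + bytes(4)                   # 0x18: padding
    + b"bvx2" + bytes(16)        # 0x1C: start of an LZFSE stream
)
header = parse_pbze_header(demo)
print(header["version"], header["uncompressed_size"], header["payload_is_lzfse"])
```

Checking for the `bvx2` block marker before handing the payload to a decompressor is a cheap way to catch a wrong payload offset early.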
Layer Types
Compute Layers
| Type | Description | Key Attributes |
|---|---|---|
| inner_product | Dense/fully-connected | nB, nC, quantization_mode, is_lookup, has_biases |
| convolution | 2D convolution | kernel_size, stride, pad, C, groups |
| batch_matmul | Batched matrix multiply | transpose_a, transpose_b |
| elementwise | Binary/unary operations | operation (see operation codes below) |
| activation | Nonlinearities | type (relu, gelu, tanh, sigmoid, etc.) |
| softmax | Softmax normalization | axis |
| reduce | Reduction operations | mode (sum, mean, max, min, prod) |
Memory/Shape Layers
| Type | Description | Key Attributes |
|---|---|---|
| reshape | Tensor reshape | shape |
| transpose | Permute dimensions | axes |
| concat | Concatenate tensors | axis |
| general_concat | N-D concatenation | axis, flexible inputs |
| split_nd | Split along axis | axis, num_splits or split_sizes |
| general_slice | Slice tensor | starts, ends, strides |
| expand_dims | Add dimension | axes |
| load_constant | Load constant tensor | blob_weights |
Quantization Layers
| Type | Description | Notes |
|---|---|---|
| dynamic_quantize | Runtime quantization | Converts FP to INT8 |
| dynamic_dequantize | Runtime dequantization | Converts INT8 to FP |
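To make the FP to INT8 round trip concrete, here is a minimal sketch of symmetric per-tensor quantization. This is the textbook scheme, chosen as an assumption for illustration: the actual scaling, zero-point handling, and rounding used by dynamic_quantize and dynamic_dequantize have not been recovered, and the function names below are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> (int8 values, scale).

    Assumption: one scale for the whole tensor and no zero point.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Inverse mapping: int8 values -> approximate floats."""
    return [q * scale for q in quantized]


values = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(values)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(q, max_error)
```

With this scheme the round-trip error per element is bounded by half the scale, which is why layers with a wide dynamic range tend to lose the most precision.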
Special Layers
| Type | Description |
|---|---|
| instancenorm_1d | Instance normalization |
| get_shape | Returns tensor shape |
| nonzero | Find nonzero indices |
| scatter_nd | Scatter operation |
| tile | Tile/repeat tensor |
Elementwise Operation Codes
Code  Operation     Code  Operation
────  ─────────     ────  ─────────
0     add           25    pow
1     sub           26    exp
2     mul           27    log
3     div           28    abs
4     floor_div
                    101   select (ternary: a ? b : c)
10    max           105   less_than
11    min           106   less_equal
                    107   not_equal
20    sqrt          108   equal
21    rsqrt         109   greater_equal
22    square        110   greater_than
23    neg
24    reciprocal    117   floor
                    118   ceil
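When reading a parsed .espresso.net, the codes above are easier to work with as a lookup table. The sketch below assumes an elementwise layer stores the integer code under the `operation` key, as the compute-layer table suggests; the helper name `elementwise_op_name` is hypothetical.

```python
# Code -> name mapping transcribed from the operation code table.
ELEMENTWISE_OPS = {
    0: "add", 1: "sub", 2: "mul", 3: "div", 4: "floor_div",
    10: "max", 11: "min",
    20: "sqrt", 21: "rsqrt", 22: "square", 23: "neg", 24: "reciprocal",
    25: "pow", 26: "exp", 27: "log", 28: "abs",
    101: "select",
    105: "less_than", 106: "less_equal", 107: "not_equal",
    108: "equal", 109: "greater_equal", 110: "greater_than",
    117: "floor", 118: "ceil",
}


def elementwise_op_name(layer: dict) -> str:
    """Return a readable name for an elementwise layer's operation code."""
    code = layer.get("operation")
    return ELEMENTWISE_OPS.get(code, f"unknown({code})")


print(elementwise_op_name({"type": "elementwise", "operation": 2}))
print(elementwise_op_name({"type": "elementwise", "operation": 999}))
```

Unknown codes are reported rather than raising, since the table covers only the codes observed so far.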
CPU vs ANE Model Differences
When a model is compiled for ANE, several transformations occur:
| Aspect | CPU Model | ANE Model |
|---|---|---|
| Layer count | Fewer | More (ops decomposed) |
| Reshape ops | reshape layer | Often replaced with convolution |
| Embeddings | inner_product | inner_product with is_lookup:1 |
| FC layers | inner_product | inner_product with quantization_mode:2 |
| Tensor manipulation | Single ops | split_nd/concat chains |
Example: A model with 50 CPU layers might have 80+ ANE layers due to operation decomposition.
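One way to see this decomposition is to diff the layer-type histograms of the CPU and ANE variants of the same model. The sketch below works on already-parsed JSON dictionaries with a top-level `layers` list, as in the JSON format shown earlier; the two toy networks and the helper names are made up for illustration and are not real system models.

```python
from collections import Counter


def layer_type_counts(net: dict) -> Counter:
    """Count layers by their "type" field."""
    return Counter(layer["type"] for layer in net.get("layers", []))


def compare_layer_types(cpu_net: dict, ane_net: dict) -> dict:
    """Return {layer_type: ane_count - cpu_count} for types whose counts differ."""
    cpu, ane = layer_type_counts(cpu_net), layer_type_counts(ane_net)
    return {
        t: ane[t] - cpu[t]
        for t in sorted(set(cpu) | set(ane))
        if ane[t] != cpu[t]
    }


# Toy inputs (hypothetical): the ANE variant swaps a reshape for a
# convolution and adds a split_nd/concat pair.
cpu_net = {"layers": [{"type": "reshape"}, {"type": "inner_product"},
                      {"type": "inner_product"}]}
ane_net = {"layers": [{"type": "convolution"}, {"type": "inner_product"},
                      {"type": "inner_product"}, {"type": "split_nd"},
                      {"type": "concat"}]}
delta = compare_layer_types(cpu_net, ane_net)
print(delta)
```

Positive values mark layer types that appear more often in the ANE variant, negative values mark types that were replaced.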
Optimization Passes (40+ discovered)
Espresso includes extensive optimization passes accessible via EspressoCustomPass subclasses:
Pass_fuse_conv_batchnorm # Fuse BN into conv weights
Pass_fold_constants # Constant folding
Pass_eliminate_dead_code # DCE
Pass_fuse_activation # Fuse relu/gelu into preceding op
Pass_optimize_transpose # Eliminate redundant transposes
Pass_convert_to_ane_layout # Convert to ANE memory layout
Pass_quantize_weights # Weight quantization
Pass_split_large_tensors # Split tensors for ANE tile size
... (and 30+ more)
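As an illustration of what a pass such as Pass_fuse_conv_batchnorm has to compute, the sketch below folds batch-norm parameters into per-channel convolution weights and biases using the standard algebra. It is not recovered from Espresso: the function name, the argument layout (one flat weight list per output channel), and the default epsilon are assumptions.

```python
import math


def fuse_conv_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into conv.

    weights: one flat list of weights per output channel (assumed layout).
    All other arguments hold one value per output channel.
    """
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)              # per-channel scale
        fused_w.append([w * s for w in w_c])    # W' = W * s
        fused_b.append((b_c - m) * s + bt)      # b' = (b - mean) * s + beta
    return fused_w, fused_b


# One output channel, two weights; check fused vs. unfused on one input.
w, b = [[1.0, 2.0]], [0.5]
gamma, beta, mean, var = [2.0], [0.1], [0.3], [4.0]
x = [3.0, -1.0]

conv_out = sum(wi * xi for wi, xi in zip(w[0], x)) + b[0]
reference_out = gamma[0] * (conv_out - mean[0]) / math.sqrt(var[0]) + beta[0]

fw, fb = fuse_conv_batchnorm(w, b, gamma, beta, mean, var, eps=0.0)
fused_out = sum(wi * xi for wi, xi in zip(fw[0], x)) + fb[0]
print(math.isclose(fused_out, reference_out))
```

After fusion the batch-norm layer can be dropped entirely, which removes one layer and one pass over the activations at inference time.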
Compiler Engine Teardown
ANECompiler.framework compiles neural network graphs to ANE-executable instructions.
Layer Descriptor Sizes (Runtime Probed)
All sizes determined by calling ANEC*Initialize() with a sentinel-filled buffer:
| Struct | Size | Field Layout (inferred) |
|---|---|---|
| ANECKernelSize | 24 | 3x u64: depth, height, width |
| ANECStep | 12 | 3x u32: depth, height, width |
| ANECPadding | 24 | 6x u32: d_front, d_back, h_front, h_back, w_front, w_back |
| ANECTensorDims | 40 | 5x u64: N, C, H, W, D |
| ANECTensorDesc | 64 | ptr(8) + dims(48) + flags(8) |
| ANECConvLayerDesc | 176 | Kernel, stride, padding, dilation, groups, etc. |
| ANECPoolLayerDesc | 96 | Kernel, stride, pool type, etc. |
| ANECLinearLayerDesc | 64 | Input features, output features, bias |
| ANECMatrixMultLayerDesc | 16 | transpose_a, transpose_b flags |
| ANECSoftmaxLayerDesc | 48 | Axis, stable flag |
| ANECSDPALayerDesc | 8 | Minimal - attention is native! |
| ANECNeuronLayerDesc | 32 | Activation type, params |
| ANECReductionLayerDesc | 24 | Reduction mode, axes |
| ANECReshapeLayerDesc | 48 | Target shape |
| ANECTransposeLayerDesc | 32 | Permutation |
| ANECConcatLayerDesc | 16 | Axis |
| ANECGatherLayerDesc | 24 | Axis, batch_dims |
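The sentinel technique behind these sizes can be sketched as follows: fill a buffer with a known byte, call the initializer, and take the highest offset that changed. This is an illustrative reconstruction rather than the code in `compiler.py`. `probe_struct_size` and `fake_initialize` are hypothetical names, and the fake initializer stands in for a real `ANEC*Initialize()` symbol so the example runs anywhere.

```python
import ctypes


def probe_struct_size(initialize, max_size=512, sentinels=(0xAA, 0x55)):
    """Estimate a struct's size from the bytes its initializer overwrites.

    Two sentinel values are used so that a field initialized to one
    sentinel value is still detected with the other. Trailing bytes the
    initializer never writes cannot be seen, so this is a lower bound.
    """
    size = 0
    for sentinel in sentinels:
        buf = (ctypes.c_ubyte * max_size)(*([sentinel] * max_size))
        initialize(buf)
        touched = [i for i in range(max_size) if buf[i] != sentinel]
        if touched:
            size = max(size, touched[-1] + 1)
    return size


def fake_initialize(buf):
    # Hypothetical stand-in for an ANEC*Initialize() function:
    # zero-fills the first 40 bytes of the buffer.
    ctypes.memset(ctypes.addressof(buf), 0, 40)


print(probe_struct_size(fake_initialize))
```

With the real framework loaded through `ctypes.CDLL`, the same function could be passed an exported initializer instead of `fake_initialize`; ctypes passes the array as a pointer. The exact library path and the assumption that each initializer takes a single pointer argument should be verified before relying on the result.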
Layer Categories (All 40+ Discovered)
Category               Layer Types
────────               ───────────
Attention/Transformer  SDPA
Convolution            Conv, CrossCorrelation, DepthwiseConv
Pooling                Pool, GlobalPool, AdaptivePool
Normalization          Norm, BatchNorm, LayerNorm, GroupNorm, LRN
Linear/Matrix          Linear, MatrixMult, Einsum
Activation             Neuron, Softmax, LogSoftmax, Dropout
Reshape/Layout         Reshape, Transpose, Flatten, Unflatten,
                       Concat, Split, Tile, Expand, Squeeze
Spatial                Resize, Pad, CropResize, Resample,
                       AffineTransform, GridSample
Reduction              Reduction, TopK, Sort, ArgMax, ArgMin
Scatter/Gather         Gather, GatherND, Scatter, ScatterND
Misc                   Shape, Range, Random, Fill,
                       RingBuffer, InputView, Copy
Version APIs
from ane import ANECompiler

ane = ANECompiler()
print(f"MPS Dialect Version: {ane.mps_dialect_version}")
print(f"MPS SPI Dialect Version: {ane.mps_spi_dialect_version}")
print(f"Validate Network Version: {ane.validate_network_version}")
print(f"Analytics Buffer Size: {ane.analytics_buffer_size}")
ANE Runtime Details
XPC Protocol
Communication with ANE hardware goes through the aned daemon via XPC.
Services
| Service | Purpose |
|---|---|
| com.apple.appleneuralengine | Main service (requires entitlements) |
| com.apple.appleneuralengine.private | Private/internal service |
| com.apple.aned | Daemon Mach service |
XPC Operations
Compilation:
-[_ANEDaemonConnection compileModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection compiledModelExistsFor:withReply:]
-[_ANEDaemonConnection compiledModelExistsMatchingHash:withReply:]
-[_ANEDaemonConnection purgeCompiledModel:withReply:]
Loading:
-[_ANEDaemonConnection loadModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection loadModelNewInstance:options:modelInstParams:qos:withReply:]
-[_ANEDaemonConnection unloadModel:options:qos:withReply:]
Execution:
-[_ANEDaemonConnection prepareChainingWithModel:options:chainingReq:qos:withReply:]
Real-time:
-[_ANEDaemonConnection beginRealTimeTaskWithReply:]
-[_ANEDaemonConnection endRealTimeTaskWithReply:]
Memory Management
ANE uses IOSurface for tensor memory, enabling zero-copy sharing with GPU/Metal.
EspressoANEIOSurface Methods:
-createIOSurfaceWithExtraProperties:
-metalBufferWithDevice:
-setExternalStorage:ioSurface:
-nFrames
-bytesPerFrame
-totalBytes
// ... 21 methods total
Entitlements
| Entitlement | Purpose | Required For |
|---|---|---|
| com.apple.aned.private.allow | Primary ANE access | compile, load, evaluate |
| com.apple.aned.private.adapterWeight.allow | Adapter weights access | Custom weight loading |
| com.apple.aned.private.aggressivePowerSaving.allow | Power saving modes | Low-power inference |
| com.apple.ANECompilerService.allow | Compiler service access | Model compilation |
| com.apple.aned.private.processModelShare.allow | Cross-process model sharing | Shared inference |
| com.apple.ane.memoryUnwiringOptOutAccess.allow | Memory unwiring control | Large model persistence |
| com.apple.private.modelPurgeInAllPartitions.allow | Model cache purging | Cache management |
| com.apple.aned.private.secondaryANECompilerServiceAccess.allow | Secondary compiler | Parallel compilation |
| com.apple.private.ANEStorageMaintainer.allow | Storage maintenance | Cache cleanup |
Boot Arguments (Internal/Debug Builds Only)
On Apple internal builds, these boot-args can bypass entitlement checks:
| Boot Arg | Purpose | Effect |
|---|---|---|
| ane_skipAdapterWeightAccessCheck | Bypass adapter weight entitlement | Skip com.apple.aned.private.adapterWeight.allow check |
| ane_vm_allowPrecompiledBinary | Allow precompiled binaries | Skip binary validation in VM |
| ane_vm_debugDumpBootArg | Enable debug dumps | Dump ANE state on errors |
| ane_vm_forceValidationOnGuest | Force validation in VM | Extra validation for VMs |
Note: These boot-args only work when isInternalBuild returns true (Apple internal builds only). Consumer macOS always returns false for isInternalBuild.
Internal Build Detection
The aned daemon checks for internal builds via _ANEDeviceInfo.isInternalBuild, which:
- Checks for /AppleInternal directory existence
- Queries os_variant_has_internal_content("com.apple.aned")
- Checks os_variant_allows_internal_security_policies("com.apple.aned")
All checks return false on consumer macOS installations.
Model Cache
Compiled models are cached in:
/var/folders/<user_hash>/com.apple.aned/
Cache operations in aned:
- com.apple.aned.modelCacheAsyncIO
- com.apple.aned.modelCacheGC
- com.apple.aned.danglingModelsGC
Runtime Class Reference
Key classes discovered through runtime introspection:
_ANEDeviceInfo (Class Methods)
+ (BOOL)hasANE; // Returns YES on Apple Silicon
+ (NSInteger)numANEs; // Number of ANE devices (usually 1)
+ (NSInteger)numANECores; // Number of cores (e.g., 16 for M1)
+ (NSString *)productName; // "macOS"
+ (NSString *)buildVersion; // e.g., "25B78"
+ (NSInteger)aneArchitectureType; // Hardware architecture identifier
+ (NSInteger)aneSubType; // Hardware subtype
+ (BOOL)isVirtualMachine; // VM detection
+ (BOOL)isInternalBuild; // Apple internal build detection
+ (BOOL)precompiledModelChecksDisabled;
+ (NSString *)bootArgs; // Current boot arguments
+ (BOOL)isBootArgPresent:(NSString *)arg;
+ (BOOL)isBoolBootArgSetTrue:(NSString *)arg;
_ANEStrings (Class Methods - Returns Constant Strings)
+ (NSString *)restrictedAccessEntitlement; // "com.apple.aned.private.allow"
+ (NSString *)adapterWeightsAccessEntitlement; // "com.apple.aned.private.adapterWeight.allow"
+ (NSString *)adapterWeightsAccessEntitlementBypassBootArg; // "ane_skipAdapterWeightAccessCheck"
+ (NSString *)internalLibraryPath; // "/AppleInternal/Library"
+ (NSString *)systemLibraryPath; // "/System/Library"
// ... and many more
Hardware Info Example Output
hasANE = 1
numANEs = 1
numANECores = 16
productName = macOS
buildVersion = 25B78
isVirtualMachine = 0
isInternalBuild = 0
precompiledModelChecksDisabled = 0
Security Analysis
Attack Surface
1. XPC Message Handling
The aned daemon accepts XPC messages from clients. Potential vectors:
- Malformed model paths: Does compileModel: properly validate URL paths?
- Sandbox extensions: sandboxExtension: parameter passes filesystem access tokens
- Memory corruption: Large or malformed layer descriptors
- Race conditions: Concurrent compile/load/unload operations
2. IOSurface Sharing
IOSurface enables shared memory between processes:
Client Process aned Daemon ANE Hardware
────────────── ─────────── ────────────
│ │ │
│ Create IOSurface │ │
├──────────────────────►│ │
│ │ Map to ANE │
│ ├────────────────────►│
│ │ │
│ Write input data │ │
├───────────────────────┼────────────────────►│
│ │ │
│ Read output data │ │
│◄──────────────────────┼─────────────────────┤
Concerns:
- Shared memory lifetime management
- Buffer overflow if sizes mismatch
- Use-after-free on premature unmap
3. Model Cache
The /var/folders/.../com.apple.aned/ cache:
- World-readable in some configurations
- Contains compiled ANE bytecode
- Could leak model architecture details
What Works Without Entitlements
These operations succeed without code signing:
- Framework loading: All three frameworks load via dlopen/ctypes
- Struct initialization: All ANEC*Initialize() functions callable
- Size probing: Can determine struct layouts by sentinel analysis
- CPU inference: EspressoContext(platform=0) works
- Model parsing: Read and parse .espresso.net files
- Client creation: _ANEClient object creation succeeds
What Fails Without Entitlements
These operations fail silently (no error, just NULL return):
- compileModel:options:qos:error: - returns nil
- loadModel:options:qos:error: - returns nil
- evaluateWithModel:options:request:qos:error: - returns nil
- _ANEDeviceController - can't access valid device
Security note: Silent failures make debugging difficult but also prevent enumeration of error conditions.
Performance Analysis
Profiling APIs
Layer-Level Profiling
@interface EspressoProfilingLayerInfo : NSObject
@property (readonly) NSString *name;
@property (readonly) NSString *debug_name;
@property (readonly) double average_runtime; // seconds
@property (readonly) int selected_runtime_engine; // 0=CPU, 1=GPU, 2=ANE
@property (readonly) NSArray *runtimes;
@end
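Once per-layer records have been collected, the engine codes above (0=CPU, 1=GPU, 2=ANE) make it easy to see where time is spent. The sketch below assumes the profiling data has already been copied into plain dictionaries with the same property names; how those records are extracted from EspressoProfilingLayerInfo objects is not shown, and the sample values are made up.

```python
from collections import defaultdict

ENGINE_NAMES = {0: "CPU", 1: "GPU", 2: "ANE"}


def runtime_by_engine(layers):
    """Sum average_runtime (seconds) per selected runtime engine."""
    totals = defaultdict(float)
    for layer in layers:
        engine = ENGINE_NAMES.get(layer["selected_runtime_engine"], "unknown")
        totals[engine] += layer["average_runtime"]
    return dict(totals)


# Hypothetical records mirroring the property names in the interface.
records = [
    {"name": "conv1", "average_runtime": 0.5, "selected_runtime_engine": 2},
    {"name": "fc1", "average_runtime": 0.25, "selected_runtime_engine": 2},
    {"name": "nonzero", "average_runtime": 0.125, "selected_runtime_engine": 0},
]
totals = runtime_by_engine(records)
print(totals)
```

A large CPU share in this summary is a quick hint that some layers fell back from the ANE.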
Network-Level ANE Profiling
@interface EspressoProfilingNetworkANEInfo : NSObject
@property (readonly) uint64_t total_ane_time_ns;
@property (readonly) uint64_t ane_time_per_eval_ns;
@end
Request-Level Stats
@interface _ANERequest : NSObject
@property uint32_t perfStatsMask; // Bitmask for which stats to collect
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@end
Operation Mapping
Operations with Native ANE Support
These map 1:1 to ANE instructions:
- Convolution (all variants)
- Matrix multiplication
- Scaled Dot-Product Attention (SDPA)
- Softmax
- Common activations (ReLU, GeLU, Tanh)
- Pooling operations
- Element-wise arithmetic
Operations That Get Decomposed
These are broken into multiple ANE ops:
- LayerNorm → multiple passes
- Complex reductions
- Non-standard activations
- Dynamic shapes
Fallback to CPU/GPU
Operations fall back when:
- Tensor too large for ANE SRAM
- Unsupported operation type
- Dynamic control flow
- Precision requirements exceed INT8/FP16
Example Runthrough
Building a CNN Graph
from ane import SimpleANEGraph

# Create graph builder
graph = SimpleANEGraph()

# Input: (batch=1, channels=3, height=224, width=224)
graph.add_conv2d("conv1", (1, 3, 224, 224), out_channels=64,
                 kernel_size=7, stride=2, padding=3)
# Output: (1, 64, 112, 112)

graph.add_pool2d("pool1", (1, 64, 112, 112), kernel_size=3, stride=2)
# Output: (1, 64, 56, 56)

graph.add_conv2d("conv2", (1, 64, 56, 56), out_channels=128,
                 kernel_size=3, padding=1)
# Output: (1, 128, 56, 56)

graph.add_conv2d("conv3", (1, 128, 56, 56), out_channels=256,
                 kernel_size=3, padding=1)
# Output: (1, 256, 56, 56)

graph.add_pool2d("pool2", (1, 256, 56, 56), kernel_size=2, stride=2)
# Output: (1, 256, 28, 28)

graph.add_linear("fc1", input_features=256*28*28, output_features=1024)
graph.add_linear("fc2", input_features=1024, output_features=1000)
graph.add_softmax("softmax", (1, 1000))

print(graph.summary())
Output:
ANE Computation Graph
============================================================
conv1 (conv2d)
Input: (1, 3, 224, 224)
Output: (1, 64, 112, 112)
Desc: 176 bytes
Kernel: 7x7
Stride: 2x2
Pad: 3,3
pool1 (pool2d)
Input: (1, 64, 112, 112)
Output: (1, 64, 56, 56)
Desc: 96 bytes
Kernel: 3x3
Stride: 2x2
...
============================================================
Total layers: 8
Total descriptor bytes: 680
Building Transformer Attention
from ane import build_transformer_attention

graph = build_transformer_attention()
print(graph.summary())
Output:
ANE Computation Graph
============================================================
proj_qkv (linear)
Input: (512, 512, 1, 1)
Output: (512, 1536, 1, 1)
Desc: 64 bytes
attention (sdpa)
Input: (1, 8, 512, 64)
Output: (1, 8, 512, 64)
Desc: 8 bytes <-- Native transformer attention!
proj_out (linear)
Input: (512, 512, 1, 1)
Output: (512, 512, 1, 1)
Desc: 64 bytes
============================================================
Total layers: 3
Total descriptor bytes: 136
Loading Espresso Models
from ane import (
    create_espresso_cpu_context,
    load_espresso_network,
    get_network_layer_count,
    EspressoNet,
)

# Method 1: Direct runtime loading (CPU only without entitlements)
ctx = create_espresso_cpu_context()
print(f"Context: {hex(ctx)}")

model_path = "/path/to/model.espresso.net"
net = load_espresso_network(model_path, ctx)
print(f"Network: {hex(net)}")
print(f"Layers: {get_network_layer_count(net)}")

# Method 2: Parse the file directly
model = EspressoNet.from_file(model_path)
print(f"Format version: {model.format_version}")
print(f"Layer types: {model.layer_type_counts()}")

# Analyze inner_product layers for quantization
for ip in model.get_inner_product_info():
    print(f"  {ip['name']}: {ip['nB']}x{ip['nC']}, "
          f"quant={ip['quantization_mode']}, lookup={ip['is_lookup']}")
Decoding PBZE Files
from ane import decode_espresso_net, get_pbze_stats, is_pbze_file

path = "/System/Library/SomeFramework/model.espresso.net"

# Check format
if is_pbze_file(path):
    stats = get_pbze_stats(path)
    print(f"Compressed size: {stats['compressed_size']} bytes")
    print(f"Uncompressed size: {stats['uncompressed_size']} bytes")
    print(f"Compression ratio: {stats['compression_ratio']:.2f}x")

# Decode (handles both JSON and PBZE automatically)
data = decode_espresso_net(path)
print(f"Layers: {len(data['layers'])}")
Using the Native Helper
For full ANE access, use the signed Objective-C helper:
# Build and sign
cd helper
./build.sh "Developer ID Application: Your Name (TEAMID)"

# Check status
echo '{"cmd": "status"}' | ./ane_helper
# {"ok":true,"client":true,"model_count":0,"model_ids":[]}

# Compile a model
echo '{"cmd": "compile", "model_path": "/path/to/model.mlmodelc"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","state":1}

# Load into ANE memory
echo '{"cmd": "load", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","program_handle":12345}

# Unload
echo '{"cmd": "unload", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true}
Comprehensive Reference
Complete Layer Type Reference
Espresso Layer Types (from system model analysis)
| Type | Category | Attributes |
|---|---|---|
| activation | Compute | type (relu/gelu/tanh/sigmoid/etc), alpha, beta |
| batch_matmul | Compute | transpose_a, transpose_b, adj_x, adj_y |
| concat | Shape | axis |
| convolution | Compute | kernel_size, stride, pad, C, groups, dilation |
| dynamic_dequantize | Quantization | scale_blob, zero_point_blob |
| dynamic_quantize | Quantization | axis, mode |
| elementwise | Compute | operation, alpha, broadcast |
| expand_dims | Shape | axes |
| general_concat | Shape | axis, interleave |
| general_slice | Shape | starts, ends, strides, axes |
| get_shape | Utility | (no special attributes) |
| inner_product | Compute | nB, nC, has_biases, quantization_mode, is_lookup |
| instancenorm_1d | Normalization | C, epsilon |
| load_constant | Memory | blob_weights, shape |
| nonzero | Utility | (no special attributes) |
| reduce | Compute | mode (sum/mean/max/min/prod), axes, keepdims |
| reshape | Shape | shape |
| scatter_nd | Memory | (no special attributes) |
| softmax | Compute | axis |
| split_nd | Shape | axis, num_splits, split_sizes |
| tile | Shape | reps |
| transpose | Shape | axes |
ANE Compiler Struct Sizes
| Struct | Size (bytes) | Initialize Function |
|---|---|---|
| ANECAffineTransformLayerDesc | 48 | ANECAffineTransformLayerDescInitialize |
| ANECBatchNormLayerDesc | 40 | ANECBatchNormLayerDescInitialize |
| ANECConcatLayerDesc | 16 | ANECConcatLayerDescInitialize |
| ANECConvLayerDesc | 176 | ANECConvLayerDescInitialize |
| ANECCropResizeLayerDesc | 64 | ANECCropResizeLayerDescInitialize |
| ANECCrossCorrelationLayerDesc | 96 | ANECCrossCorrelationLayerDescInitialize |
| ANECDropoutLayerDesc | 16 | ANECDropoutLayerDescInitialize |
| ANECExpandLayerDesc | 32 | ANECExpandLayerDescInitialize |
| ANECFillLayerDesc | 24 | ANECFillLayerDescInitialize |
| ANECFlattenLayerDesc | 16 | ANECFlattenLayerDescInitialize |
| ANECGatherLayerDesc | 24 | ANECGatherLayerDescInitialize |
| ANECGatherNDLayerDesc | 24 | ANECGatherNDLayerDescInitialize |
| ANECGridSampleLayerDesc | 32 | ANECGridSampleLayerDescInitialize |
| ANECGroupNormLayerDesc | 40 | ANECGroupNormLayerDescInitialize |
| ANECInputViewLayerDesc | 32 | ANECInputViewLayerDescInitialize |
| ANECKernelSize | 24 | ANECKernelSizeInitialize |
| ANECLRNLayerDesc | 32 | ANECLRNLayerDescInitialize |
| ANECLayerNormLayerDesc | 40 | ANECLayerNormLayerDescInitialize |
| ANECLinearLayerDesc | 64 | ANECLinearLayerDescInitialize |
| ANECMatrixMultLayerDesc | 16 | ANECMatrixMultLayerDescInitialize |
| ANECNMSLayerDesc | 48 | ANECNMSLayerDescInitialize |
| ANECNeuronLayerDesc | 32 | ANECNeuronLayerDescInitialize |
| ANECNormLayerDesc | 40 | ANECNormLayerDescInitialize |
| ANECPadLayerDesc | 48 | ANECPadLayerDescInitialize |
| ANECPadding | 24 | ANECPaddingInitialize |
| ANECPoolLayerDesc | 96 | ANECPoolLayerDescInitialize |
| ANECRandomLayerDesc | 32 | ANECRandomLayerDescInitialize |
| ANECReductionLayerDesc | 24 | ANECReductionLayerDescInitialize |
| ANECResampleLayerDesc | 48 | ANECResampleLayerDescInitialize |
| ANECReshapeLayerDesc | 48 | ANECReshapeLayerDescInitialize |
| ANECResizeLayerDesc | 40 | ANECResizeLayerDescInitialize |
| ANECRingBufferLayerDesc | 32 | ANECRingBufferLayerDescInitialize |
| ANECSDPALayerDesc | 8 | ANECSDPALayerDescInitialize |
| ANECScatterLayerDesc | 24 | ANECScatterLayerDescInitialize |
| ANECScatterNDLayerDesc | 24 | ANECScatterNDLayerDescInitialize |
| ANECShapeLayerDesc | 16 | ANECShapeLayerDescInitialize |
| ANECSoftmaxLayerDesc | 48 | ANECSoftmaxLayerDescInitialize |
| ANECSortLayerDesc | 24 | ANECSortLayerDescInitialize |
| ANECSplitLayerDesc | 24 | ANECSplitLayerDescInitialize |
| ANECSqueezeLayerDesc | 32 | ANECSqueezeLayerDescInitialize |
| ANECStep | 12 | ANECStepInitialize |
| ANECTensorDesc | 64 | ANECTensorDescInitialize |
| ANECTensorDims | 40 | ANECTensorDimsInitialize |
| ANECTileLayerDesc | 32 | ANECTileLayerDescInitialize |
| ANECTopKLayerDesc | 24 | ANECTopKLayerDescInitialize |
| ANECTransposeLayerDesc | 32 | ANECTransposeLayerDescInitialize |
| ANECUnflattenLayerDesc | 24 | ANECUnflattenLayerDescInitialize |
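For the four small helper structs whose field layouts were inferred earlier (ANECKernelSize, ANECStep, ANECPadding, ANECTensorDims), the probed sizes can be cross-checked against ctypes definitions. The field names and types below follow the inferred layouts and are therefore assumptions, not symbols exported by the framework.

```python
import ctypes


class ANECKernelSize(ctypes.Structure):
    # Inferred: 3x u64
    _fields_ = [(n, ctypes.c_uint64) for n in ("depth", "height", "width")]


class ANECStep(ctypes.Structure):
    # Inferred: 3x u32
    _fields_ = [(n, ctypes.c_uint32) for n in ("depth", "height", "width")]


class ANECPadding(ctypes.Structure):
    # Inferred: 6x u32
    _fields_ = [(n, ctypes.c_uint32) for n in (
        "d_front", "d_back", "h_front", "h_back", "w_front", "w_back")]


class ANECTensorDims(ctypes.Structure):
    # Inferred: 5x u64
    _fields_ = [(n, ctypes.c_uint64) for n in ("n", "c", "h", "w", "d")]


# Sizes from the struct size table.
PROBED_SIZES = {
    ANECKernelSize: 24,
    ANECStep: 12,
    ANECPadding: 24,
    ANECTensorDims: 40,
}

for struct_type, probed in PROBED_SIZES.items():
    actual = ctypes.sizeof(struct_type)
    status = "ok" if actual == probed else "MISMATCH"
    print(f"{struct_type.__name__}: ctypes={actual} probed={probed} {status}")
```

A mismatch would indicate that the inferred field widths or count are wrong. Matching sizes do not by themselves prove the field order.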
Espresso Optimization Passes
All discovered Pass_* classes in Espresso.framework:
Pass_add_fp16_fp32_conversions
Pass_batch_matmul_transpose_fusion
Pass_broadcast_optimization
Pass_canonicalize_ops
Pass_constant_folding
Pass_convert_gather_to_slice
Pass_convert_to_ane_layout
Pass_dead_code_elimination
Pass_decompose_complex_ops
Pass_eliminate_identity_ops
Pass_eliminate_redundant_transpose
Pass_fold_constants
Pass_fuse_activation
Pass_fuse_add_mul
Pass_fuse_bias
Pass_fuse_conv_batchnorm
Pass_fuse_conv_bias
Pass_fuse_elementwise
Pass_fuse_gelu
Pass_fuse_layernorm
Pass_fuse_linear_ops
Pass_fuse_matmul_add
Pass_fuse_mul_add
Pass_fuse_pad_conv
Pass_fuse_reshape_transpose
Pass_insert_copies_for_ane
Pass_legalize_for_ane
Pass_lower_to_ane_ops
Pass_optimize_memory_layout
Pass_optimize_reshape_chain
Pass_optimize_transpose
Pass_propagate_shapes
Pass_quantize_weights
Pass_remove_unused_outputs
Pass_replace_div_with_mul
Pass_simplify_arithmetic
Pass_split_large_tensors
Pass_tensor_parallel_partition
Pass_tile_for_ane
Pass_vectorize_ops
ObjC Class Methods Reference
_ANEClient
// Lifecycle
- (instancetype)initWithRestrictedAccessAllowed:(BOOL)allowed;

// Compilation
- (BOOL)compileModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)compiledModelExistsFor:(id)model;
- (BOOL)compiledModelExistsMatchingHash:(NSData*)hash;
- (BOOL)purgeCompiledModel:(id)model;

// Loading
- (BOOL)loadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)loadModelNewInstance:(id)model options:(id)opts modelInstParams:(id)params qos:(int)qos error:(NSError**)err;
- (BOOL)loadRealTimeModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)unloadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;

// Evaluation
- (BOOL)evaluateWithModel:(id)model options:(id)opts request:(id)req qos:(int)qos error:(NSError**)err;
- (BOOL)evaluateRealTimeWithModel:(id)model options:(id)opts request:(id)req error:(NSError**)err;

// Memory
- (BOOL)mapIOSurfacesWithModel:(id)model request:(id)req cacheInference:(BOOL)cache error:(NSError**)err;
- (void)unmapIOSurfacesWithModel:(id)model request:(id)req;

// Chaining
- (BOOL)prepareChainingWithModel:(id)model options:(id)opts chainingReq:(id)req qos:(int)qos error:(NSError**)err;
_ANEModel
// Initialization
- (instancetype)initWithModelAtURL:(NSURL*)url
                               key:(NSString*)key
                  identifierSource:(int)src
                cacheURLIdentifier:(NSString*)cacheId
                   modelAttributes:(id)attrs
                    standardizeURL:(BOOL)standardize;
- (instancetype)initWithModelIdentifier:(id)identifier;

// Properties
@property (readonly) NSURL *modelURL;
@property (readonly) NSURL *sourceURL;
@property (readonly) NSString *UUID;
@property (readonly) NSString *key;
@property (readonly) int state; // 1 = created/unloaded
@property (readonly) uint64_t programHandle;
@property (readonly) uint64_t intermediateBufferHandle;
@property (readonly) int queueDepth;
@property (readonly) uint32_t perfStatsMask;
@property (readonly) id mpsConstants;
_ANERequest
// Initialization
- (instancetype)initWithInputs:(NSArray*)inputs
                  inputIndices:(NSArray*)inputIndices
                       outputs:(NSArray*)outputs
                 outputIndices:(NSArray*)outputIndices
                 weightsBuffer:(id)weights
                     perfStats:(id)stats
                procedureIndex:(int)procIdx
                  sharedEvents:(id)events
             transactionHandle:(uint64_t)handle;

// Properties
@property (readonly) NSArray *inputArray;
@property (readonly) NSArray *inputIndexArray;
@property (readonly) NSArray *outputArray;
@property (readonly) NSArray *outputIndexArray;
@property (readonly) id weightsBuffer;
@property (readonly) int procedureIndex;
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@property (copy) void (^completionHandler)(BOOL, NSError*);
@property (readonly) id sharedEvents;
@property (readonly) uint64_t transactionHandle;
Running Tests
# Run all tests
pytest tests/test_ane.py -v

# Run specific test class
pytest tests/test_ane.py::TestANECompiler -v

# Run with coverage
pytest tests/test_ane.py --cov=ane --cov-report=term-missing
Test categories:
- TestANEStructs - Data structure serialization
- TestANECompiler - Framework loading and initialization
- TestANEHelpers - Utility functions
- TestANESample - Graph building
- TestANELayerSizes - Probed struct sizes
- TestEspressoDiscovery - ObjC class introspection
- TestEspressoFormat - Model file parsing
- TestPBZE - Compression/decompression
- TestANEXPC - XPC protocol discovery
- TestAPITree - Knowledge base API tree
HWX File Format & Execution Research
Working Path: CoreML API
The simplest way to execute models on ANE is through CoreML's public API:
// Objective-C
NSError *error = nil;
MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsAll; // Enables ANE
MLModel *model = [MLModel modelWithContentsOfURL:modelURL configuration:config error:&error];

# Python with coremltools
import coremltools as ct
model = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.ALL)
HWX Binary Format
Pre-compiled ANE binaries (.hwx files) have a Mach-O-like structure:
| Offset | Value | Description |
|---|---|---|
| 0x00 | 0xBEEFFACE | Magic number |
| 0x04 | varies | Header info |
| ... | __PAGEZERO | Zero page segment |
| ... | __DATA | Data segment |
| ... | __FVMLIB | ANE instructions |
Key insight: HWX files cannot be loaded alone - they require a companion .espresso.net file that describes the network structure.
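A quick way to recognize an .hwx file is to test its first four bytes against the magic. The on-disk byte order is not stated here, so the sketch below checks both orders and reports which one matched. The function name is hypothetical and nothing beyond the magic is parsed.

```python
import struct

HWX_MAGIC = 0xBEEFFACE


def hwx_magic_byte_order(blob: bytes):
    """Return "little", "big", or None depending on how the magic matches.

    Assumption: the byte order of the magic on disk is unknown,
    so both interpretations are tried.
    """
    if len(blob) < 4:
        return None
    if struct.unpack_from("<I", blob, 0)[0] == HWX_MAGIC:
        return "little"
    if struct.unpack_from(">I", blob, 0)[0] == HWX_MAGIC:
        return "big"
    return None


# Synthetic buffers for illustration; not real HWX content.
print(hwx_magic_byte_order(struct.pack("<I", HWX_MAGIC) + bytes(4)))
print(hwx_magic_byte_order(b"pbze" + bytes(4)))
```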
Espresso Model Bundle Structure
A complete Espresso model bundle contains:
| File | Description |
|---|---|
| model.espresso.net | Network description (JSON or PBZE) |
| model.espresso.weights | Binary weights data |
| model.espresso.shape | Shape information |
| model.H14.espresso.hwx | Pre-compiled ANE binary (chip-specific) |
| model.H14.espresso.precompilation_info | Compiler metadata (JSON) |
Different .hwx files exist for different ANE generations:
- .H13.espresso.hwx - A14/M1 generation
- .H14.espresso.hwx - A15/M2 generation
- .H15.espresso.hwx - A16/M3 generation
- .H16.espresso.hwx - A17/M4 generation
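When scanning a model bundle, the generation tag can be pulled out of the file name and mapped to the chip families listed above. This is a small sketch: the regular expression assumes the `<name>.H<nn>.espresso.hwx` naming pattern shown in the bundle table, and the helper name is hypothetical.

```python
import re

# Generation tag -> chip family, as listed above.
HWX_GENERATIONS = {
    "H13": "A14/M1",
    "H14": "A15/M2",
    "H15": "A16/M3",
    "H16": "A17/M4",
}

_HWX_NAME = re.compile(r"\.(H\d+)\.espresso\.hwx$")


def hwx_generation(filename: str):
    """Return (tag, chip_family) for an .hwx file name, or None if it is not one."""
    match = _HWX_NAME.search(filename)
    if match is None:
        return None
    tag = match.group(1)
    return tag, HWX_GENERATIONS.get(tag, "unknown")


for name in ("model.H14.espresso.hwx", "model.H17.espresso.hwx", "model.espresso.net"):
    print(name, hwx_generation(name))
```

Tags that are not in the table are reported as "unknown" rather than rejected, since newer chip generations may add further suffixes.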
API Layer Summary
| Layer | Status | Notes |
|---|---|---|
| CoreML | ✅ Working | Use MLComputeUnitsAll, system handles everything |
| XPC to aned | ✅ Working | _ANEClient.sharedConnection works |
| ANEServices | | Model loading needs .espresso.net |
| Espresso | | Platform 2 (ANE) context crashes |
| IOKit Direct | ❌ Blocked | Requires com.apple.ane.iokit-user-access |
MLComputePlan Device Masks
When inspecting MLComputePlan.computeDevicesBySupportedComputeUnits:
| Mask | Devices |
|---|---|
| 1 | CPU only |
| 2 | GPU only |
| 3 | CPU + GPU |
| 4 | Neural Engine only |
| 5 | CPU + Neural Engine |
| 6 | GPU + Neural Engine |
| 7 | CPU + GPU + Neural Engine (all) |
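The mask values in this table are a plain bit field (CPU = 1, GPU = 2, Neural Engine = 4), so decoding one is a matter of testing each bit. The enum and function names in this sketch are illustrative, not Core ML API names.

```python
from enum import IntFlag


class ComputeDevice(IntFlag):
    CPU = 1
    GPU = 2
    NEURAL_ENGINE = 4


def devices_in_mask(mask: int):
    """Return the names of the devices whose bit is set in the mask."""
    return [device.name for device in ComputeDevice if mask & device]


for mask in range(1, 8):
    print(mask, devices_in_mask(mask))
```

A mask of 5, for example, decodes to CPU plus Neural Engine, matching the table row.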
Hardware Detection
// Get ANE device info
Class deviceClass = NSClassFromString(@"MLNeuralEngineComputeDevice");
id device = [deviceClass performSelector:@selector(physicalDevice)];
NSInteger cores = [[device valueForKey:@"totalCoreCount"] integerValue];
// Returns 16 on M3 Pro
License
This project contains reverse engineering artifacts for research and interoperability purposes. Use responsibly.
Acknowledgments
- Apple's private frameworks documentation from class-dump and dyld_info
- The tinygrad community for ANE exploration inspiration