GitHub - mdaiter/ane: Reverse engineered the Apple Neural Engine, with working Python and Objective C samples

mdaiter

Apple Neural Engine (ANE) Reverse Engineering

Reverse engineering artifacts for Apple's Neural Engine stack: ANECompiler, Espresso, and AppleNeuralEngine frameworks.

Target Audience: Performance engineers and security researchers working with Apple silicon ML acceleration.

Key Findings

Discovery                Details                                              Significance
─────────                ───────                                              ────────────
SDPA Layer               ANECSDPALayerDesc is only 8 bytes                    Native transformer attention in ANE hardware
40+ Optimization Passes  Pass_fuse_conv_batchnorm, Pass_fold_constants, etc.  Full Espresso compiler pipeline discoverable
XPC Daemon Architecture  aned at /usr/libexec/aned                            Privilege boundary for ANE access
Entitlement Bypass       Struct init functions work without signing           Can probe all layer descriptor layouts
PBZE Format              LZFSE-compressed espresso.net                        System models decodable with libcompression
Silent Failures          compileModel: returns NULL without error             Operations fail silently without entitlements
IOSurface Memory         EspressoANEIOSurface (21 methods)                    Zero-copy tensor sharing with Metal
Quantization Modes       quantization_mode:2 on inner_product                 ANE-specific quantization discovered
CoreML ANE Path          MLComputeUnitsAll enables ANE                        Working path for ANE execution via public API
HWX Binary Format        Magic 0xBEEFFACE, Mach-O-like                        Pre-compiled ANE instructions per chip generation
16 ANE Cores             M3 Pro has 16 neural engine cores                    Confirmed via MLNeuralEngineComputeDevice
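
The HWX entry above gives only the magic value. As a minimal sketch for spotting candidate HWX blobs, the check below assumes the magic occupies the first four bytes; the on-disk byte order is not stated here, so both readings are accepted. `looks_like_hwx` is a hypothetical helper, not part of the `ane` package.

```python
import struct

HWX_MAGIC = 0xBEEFFACE  # magic value reported in the Key Findings table


def looks_like_hwx(blob: bytes) -> bool:
    """Return True if the first four bytes match the HWX magic.

    Assumptions: the magic sits at offset 0, and the byte order is
    unknown, so little- and big-endian readings are both accepted.
    """
    if len(blob) < 4:
        return False
    (little,) = struct.unpack_from("<I", blob, 0)
    (big,) = struct.unpack_from(">I", blob, 0)
    return HWX_MAGIC in (little, big)
```

A match only says the file is worth a closer look; it does not validate the rest of the Mach-O-like structure.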

Quick Reference: What Works Without Entitlements

Operation                     Works?  Notes
─────────                     ──────  ─────
Load ANECompiler.framework    Yes     All frameworks load
Call ANEC*Initialize()        Yes     Can probe struct sizes
Create EspressoContext (CPU)  Yes     Platform 0 works
Load EspressoNetwork          Yes     CPU inference works
Create _ANEClient             Yes     Object is created, but compile/load calls still fail
Call compileModel:            No      Returns NULL silently
Call loadModel:               No      Returns NULL silently
ANE inference                 No      Requires entitlements
CoreML with ANE               Yes     Use MLComputeUnitsAll - working path!
XPC to aned                   Yes     Connection succeeds, ops need entitlements
MLComputePlan                 Yes     Can inspect device availability

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                       User Application                           │
│                  (Core ML, Create ML, BNNS)                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────┐    ┌─────────────────────────────┐ │
│  │   Espresso.framework    │    │ AppleNeuralEngine.framework │ │
│  │   ─────────────────     │    │ ─────────────────────────── │ │
│  │  • EspressoContext      │    │  • _ANEClient               │ │
│  │  • EspressoNetwork      │    │  • _ANEModel                │ │
│  │  • 40+ Pass_* classes   │    │  • _ANERequest              │ │
│  │  • CPU/GPU/ANE dispatch │    │  • _ANEDaemonConnection     │ │
│  └───────────┬─────────────┘    └──────────────┬──────────────┘ │
│              │                                  │                │
├──────────────┴──────────────────────────────────┴────────────────┤
│                     ANECompiler.framework                        │
│                     ─────────────────────                        │
│  • ANECConvLayerDesc (176 bytes)    • ANECSDPALayerDesc (8 bytes)│
│  • ANECPoolLayerDesc (96 bytes)     • ANECLinearLayerDesc (64 B) │
│  • ANECTensorDims (40 bytes)        • 30+ layer descriptors      │
├─────────────────────────────────────────────────────────────────┤
│                    XPC Transport Layer                           │
│            Service: com.apple.appleneuralengine                  │
├─────────────────────────────────────────────────────────────────┤
│                    aned (/usr/libexec/aned)                      │
│                    ───────────────────────                       │
│  • ANEProgramCreate()          • Model cache management          │
│  • ANEProgramInstanceCreate()  • Garbage collection              │
│  • Sandbox extension handling  • Telemetry                       │
├─────────────────────────────────────────────────────────────────┤
│                    ANE Hardware (M1/M2/M3+)                      │
│                    ───────────────────────                       │
│  • 16 neural engine cores      • Dedicated SRAM                  │
│  • Up to 15.8 TOPS (M2)        • IOSurface DMA                   │
└─────────────────────────────────────────────────────────────────┘

Data Flow: Model Compilation

.mlmodelc/                      aned daemon                 Hardware
────────────                    ───────────                 ────────
     │                               │                          │
     │  1. _ANEModel create          │                          │
     ├──────────────────────────────►│                          │
     │                               │                          │
     │  2. compileModel: (XPC)       │                          │
     ├──────────────────────────────►│                          │
     │                               │  3. ANECompiler          │
     │                               ├─────────────────────────►│
     │                               │                          │
     │                               │  4. ANEProgramCreate()   │
     │                               ├─────────────────────────►│
     │                               │                          │
     │  5. Return program handle     │                          │
     │◄──────────────────────────────┤                          │
     │                               │                          │
     │  6. loadModel: (XPC)          │                          │
     ├──────────────────────────────►│                          │
     │                               │  7. Map to ANE memory    │
     │                               ├─────────────────────────►│
     │                               │                          │
     │  8. evaluateWithModel:        │  9. Execute on ANE       │
     ├──────────────────────────────►├─────────────────────────►│
     │                               │                          │

Repository Structure

ane/
├── __init__.py          # Package exports, builds API tree for tooling
├── compiler.py          # ANECompiler.framework ctypes bindings
│                        #   - Layer descriptor structs
│                        #   - ANEC*Initialize() wrappers
│                        #   - Struct size probing
│
├── espresso.py          # Espresso model format parser
│                        #   - EspressoNet, EspressoLayer classes
│                        #   - Layer type documentation
│                        #   - CPU vs ANE model comparison
│
├── runtime.py           # Espresso/ANE runtime bindings
│                        #   - EspressoContext creation
│                        #   - EspressoNetwork loading
│                        #   - ObjC class introspection
│
├── xpc.py               # ANE XPC protocol documentation
│                        #   - _ANEDaemonConnection methods
│                        #   - _ANEClient methods
│                        #   - XPC operation categories
│
├── pbze.py              # PBZE (compressed espresso.net) decoder
│                        #   - LZFSE decompression via libcompression
│                        #   - Header parsing
│                        #   - Compression statistics
│
├── sample.py            # Example graph building code
│                        #   - SimpleANEGraph class
│                        #   - CNN and Transformer examples
│
├── tests/
│   └── test_ane.py      # Comprehensive pytest suite (623 lines)
│
└── helper/
    ├── ane_helper.m     # Objective-C helper for privileged ANE access
    ├── ane_helper.entitlements
    └── build.sh         # Build script

Espresso Engine Teardown

Espresso is Apple's internal ML inference runtime that powers Core ML. It handles model execution across CPU, GPU, and ANE.

Model Format (.espresso.net)

Two formats exist:

  1. JSON (human-readable):
{
  "format_version": 200,
  "storage": "model.espresso.weights",
  "layers": [
    {
      "name": "conv1",
      "type": "convolution",
      "bottom": "input",
      "top": "conv1_output",
      "kernel_size": 3,
      "stride": 1,
      "pad": 1,
      "C": 64
    }
  ],
  "analyses": {},
  "properties": {}
}
  2. PBZE (binary, LZFSE-compressed):
Offset  Size  Description
──────  ────  ───────────
0x00    4     Magic: b'pbze'
0x04    4     Version (usually 0)
0x08    8     Unknown (header size?)
0x10    4     Uncompressed size (BIG ENDIAN!)
0x14    4     Unknown
0x18    4     Padding
0x1C    ...   LZFSE data (starts with b'bvx2')
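
The PBZE layout above can be read with a fixed-size `struct` format. This is a sketch under stated assumptions, not the repository's `pbze.py`: only the uncompressed size is documented as big-endian, so decoding the version and the unknown fields as big-endian too is a guess, and `parse_pbze_header` is a hypothetical name. LZFSE decompression itself is not shown.

```python
import struct

# Layout from the table above: 4-byte magic, u32 version, 8 unknown bytes,
# u32 uncompressed size, u32 unknown, u32 padding = 28 bytes (0x1C).
# Assumption: every integer field is big-endian, like the size field.
PBZE_HEADER = struct.Struct(">4sI8sIII")
LZFSE_BLOCK_MAGIC = b"bvx2"


def parse_pbze_header(blob: bytes) -> dict:
    """Split the fixed PBZE header into named fields."""
    if len(blob) < PBZE_HEADER.size:
        raise ValueError("input is shorter than the 0x1C-byte PBZE header")
    magic, version, unknown_08, size, unknown_14, _pad = PBZE_HEADER.unpack_from(blob)
    if magic != b"pbze":
        raise ValueError(f"not a PBZE file (magic={magic!r})")
    start = PBZE_HEADER.size
    return {
        "version": version,
        "unknown_08": unknown_08,          # raw bytes, meaning not known
        "uncompressed_size": size,
        "unknown_14": unknown_14,
        "payload_offset": start,
        # True when the payload begins with the LZFSE block marker.
        "payload_is_lzfse": blob[start:start + 4] == LZFSE_BLOCK_MAGIC,
    }
```

Comparing `uncompressed_size` with the length of the decompressed payload is a cheap sanity check on the byte-order guess.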

Layer Types

Compute Layers

Type           Description              Key Attributes
────           ───────────              ──────────────
inner_product  Dense/fully-connected    nB, nC, quantization_mode, is_lookup, has_biases
convolution    2D convolution           kernel_size, stride, pad, C, groups
batch_matmul   Batched matrix multiply  transpose_a, transpose_b
elementwise    Binary/unary operations  operation (see operation codes below)
activation     Nonlinearities           type (relu, gelu, tanh, sigmoid, etc.)
softmax        Softmax normalization    axis
reduce         Reduction operations     mode (sum, mean, max, min, prod)

Memory/Shape Layers

Type            Description           Key Attributes
────            ───────────           ──────────────
reshape         Tensor reshape        shape
transpose       Permute dimensions    axes
concat          Concatenate tensors   axis
general_concat  N-D concatenation     axis, flexible inputs
split_nd        Split along axis      axis, num_splits or split_sizes
general_slice   Slice tensor          starts, ends, strides
expand_dims     Add dimension         axes
load_constant   Load constant tensor  blob_weights

Quantization Layers

Type                Description             Notes
────                ───────────             ─────
dynamic_quantize    Runtime quantization    Converts FP to INT8
dynamic_dequantize  Runtime dequantization  Converts INT8 to FP

Special Layers

Type             Description
────             ───────────
instancenorm_1d  Instance normalization
get_shape        Returns tensor shape
nonzero          Find nonzero indices
scatter_nd       Scatter operation
tile             Tile/repeat tensor

Elementwise Operation Codes

Code   Operation          Code   Operation
────   ─────────          ────   ─────────
0      add                25     pow
1      sub                26     exp
2      mul                27     log
3      div                28     abs
4      floor_div          
                          101    select (ternary: a ? b : c)
10     max                105    less_than
11     min                106    less_equal
                          107    not_equal
20     sqrt               108    equal
21     rsqrt              109    greater_equal
22     square             110    greater_than
23     neg                
24     reciprocal         117    floor
                          118    ceil
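
For tooling, the code table above translates directly into a lookup. The sketch below assumes the `operation` attribute of an `elementwise` layer holds the integer code; `describe_elementwise` is a hypothetical helper, not part of the `ane` package.

```python
# Codes copied from the table above; gaps in the numbering are left unmapped.
ELEMENTWISE_OPS = {
    0: "add", 1: "sub", 2: "mul", 3: "div", 4: "floor_div",
    10: "max", 11: "min",
    20: "sqrt", 21: "rsqrt", 22: "square", 23: "neg", 24: "reciprocal",
    25: "pow", 26: "exp", 27: "log", 28: "abs",
    101: "select",
    105: "less_than", 106: "less_equal", 107: "not_equal", 108: "equal",
    109: "greater_equal", 110: "greater_than",
    117: "floor", 118: "ceil",
}


def describe_elementwise(layer: dict) -> str:
    """Name the operation of an elementwise layer, or flag an unknown code."""
    code = layer.get("operation")
    return ELEMENTWISE_OPS.get(code, f"unknown({code})")
```

Returning `unknown(<code>)` instead of raising keeps a model dump readable when it contains codes that are not in the table.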

CPU vs ANE Model Differences

When a model is compiled for ANE, several transformations occur:

Aspect               CPU Model      ANE Model
──────               ─────────      ─────────
Layer count          Fewer          More (ops decomposed)
Reshape ops          reshape layer  Often replaced with convolution
Embeddings           inner_product  inner_product with is_lookup:1
FC layers            inner_product  inner_product with quantization_mode:2
Tensor manipulation  Single ops     split_nd/concat chains

Example: A model with 50 CPU layers might have 80+ ANE layers due to operation decomposition.
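
One way to see this decomposition on JSON-format files is to tally layer types in the CPU and ANE variants and diff the counts. This is a standalone sketch that relies only on the `layers` and `type` keys shown in the JSON example; the function names are hypothetical and separate from the package's `EspressoNet` class.

```python
import json
from collections import Counter


def load_json_net(path: str) -> dict:
    """Load a JSON-format .espresso.net file (PBZE files must be decoded first)."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def layer_type_counts(net: dict) -> Counter:
    """Count layers by their 'type' attribute."""
    return Counter(layer.get("type", "<missing>") for layer in net.get("layers", []))


def diff_layer_counts(cpu_net: dict, ane_net: dict) -> dict:
    """Return {type: ane_count - cpu_count} for every type whose count changed."""
    cpu, ane = layer_type_counts(cpu_net), layer_type_counts(ane_net)
    return {t: ane[t] - cpu[t] for t in sorted(set(cpu) | set(ane)) if ane[t] != cpu[t]}
```

Positive values mark layer types the ANE variant gained (for example `split_nd` or `concat` chains); negative values mark types that were replaced.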

Optimization Passes (40+ discovered)

Espresso includes extensive optimization passes accessible via EspressoCustomPass subclasses:

Pass_fuse_conv_batchnorm          # Fuse BN into conv weights
Pass_fold_constants               # Constant folding
Pass_eliminate_dead_code          # DCE
Pass_fuse_activation              # Fuse relu/gelu into preceding op
Pass_optimize_transpose           # Eliminate redundant transposes
Pass_convert_to_ane_layout        # Convert to ANE memory layout
Pass_quantize_weights             # Weight quantization
Pass_split_large_tensors          # Split tensors for ANE tile size
... (and 30+ more)

Compiler Engine Teardown

ANECompiler.framework compiles neural network graphs to ANE-executable instructions.

Layer Descriptor Sizes (Runtime Probed)

All sizes were determined by calling ANEC*Initialize() on a sentinel-filled buffer and checking which bytes were overwritten:

Struct                   Size  Field Layout (inferred)
──────                   ────  ───────────────────────
ANECKernelSize           24    3x u64: depth, height, width
ANECStep                 12    3x u32: depth, height, width
ANECPadding              24    6x u32: d_front, d_back, h_front, h_back, w_front, w_back
ANECTensorDims           40    5x u64: N, C, H, W, D
ANECTensorDesc           64    ptr(8) + dims(48) + flags(8)
ANECConvLayerDesc        176   Kernel, stride, padding, dilation, groups, etc.
ANECPoolLayerDesc        96    Kernel, stride, pool type, etc.
ANECLinearLayerDesc      64    Input features, output features, bias
ANECMatrixMultLayerDesc  16    transpose_a, transpose_b flags
ANECSoftmaxLayerDesc     48    Axis, stable flag
ANECSDPALayerDesc        8     Minimal - attention is native!
ANECNeuronLayerDesc      32    Activation type, params
ANECReductionLayerDesc   24    Reduction mode, axes
ANECReshapeLayerDesc     48    Target shape
ANECTransposeLayerDesc   32    Permutation
ANECConcatLayerDesc      16    Axis
ANECGatherLayerDesc      24    Axis, batch_dims
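
The sentinel technique behind these sizes can be sketched without the framework. The example below is an assumption-laden illustration, not the code in `compiler.py`: it fills a buffer with a known byte, runs an initializer, and reports the extent of the bytes that changed. `fake_initialize` is a stand-in for a real `ANEC*Initialize()` symbol, which is assumed here to take a single pointer argument.

```python
import ctypes


def probe_written_size(init_fn, capacity: int = 512) -> int:
    """Estimate how many leading bytes init_fn writes into a buffer.

    Two different sentinels are used so that an initializer which happens
    to write one of the sentinel values is still detected.
    """
    extents = []
    for sentinel in (0xAA, 0x55):
        buf = (ctypes.c_ubyte * capacity)()
        ctypes.memset(buf, sentinel, capacity)
        init_fn(buf)
        raw = bytes(buf)
        # Index of the last byte that no longer holds the sentinel.
        last = max((i for i, b in enumerate(raw) if b != sentinel), default=-1)
        extents.append(last + 1)
    return max(extents)


def fake_initialize(buf) -> None:
    """Stand-in initializer: zeroes the first 8 bytes, like an 8-byte struct."""
    ctypes.memset(buf, 0, 8)


print(probe_written_size(fake_initialize))
```

The result is a lower bound: trailing padding that the initializer never touches is invisible to this method, so probed sizes are best cross-checked against disassembly.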

Layer Categories (All 40+ Discovered)

Category              Layer Types
────────              ───────────
Attention/Transformer SDPA
Convolution           Conv, CrossCorrelation, DepthwiseConv
Pooling               Pool, GlobalPool, AdaptivePool
Normalization         Norm, BatchNorm, LayerNorm, GroupNorm, LRN
Linear/Matrix         Linear, MatrixMult, Einsum
Activation            Neuron, Softmax, LogSoftmax, Dropout
Reshape/Layout        Reshape, Transpose, Flatten, Unflatten, 
                      Concat, Split, Tile, Expand, Squeeze
Spatial               Resize, Pad, CropResize, Resample, 
                      AffineTransform, GridSample
Reduction             Reduction, TopK, Sort, ArgMax, ArgMin
Scatter/Gather        Gather, GatherND, Scatter, ScatterND
Misc                  Shape, Range, Random, Fill, 
                      RingBuffer, InputView, Copy

Version APIs

from ane import ANECompiler

ane = ANECompiler()
print(f"MPS Dialect Version: {ane.mps_dialect_version}")
print(f"MPS SPI Dialect Version: {ane.mps_spi_dialect_version}")
print(f"Validate Network Version: {ane.validate_network_version}")
print(f"Analytics Buffer Size: {ane.analytics_buffer_size}")

ANE Runtime Details

XPC Protocol

Communication with ANE hardware goes through the aned daemon via XPC.

Services

Service                              Purpose
───────                              ───────
com.apple.appleneuralengine          Main service (requires entitlements)
com.apple.appleneuralengine.private  Private/internal service
com.apple.aned                       Daemon Mach service

XPC Operations

Compilation:

-[_ANEDaemonConnection compileModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection compiledModelExistsFor:withReply:]
-[_ANEDaemonConnection compiledModelExistsMatchingHash:withReply:]
-[_ANEDaemonConnection purgeCompiledModel:withReply:]

Loading:

-[_ANEDaemonConnection loadModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection loadModelNewInstance:options:modelInstParams:qos:withReply:]
-[_ANEDaemonConnection unloadModel:options:qos:withReply:]

Execution:

-[_ANEDaemonConnection prepareChainingWithModel:options:chainingReq:qos:withReply:]

Real-time:

-[_ANEDaemonConnection beginRealTimeTaskWithReply:]
-[_ANEDaemonConnection endRealTimeTaskWithReply:]

Memory Management

ANE uses IOSurface for tensor memory, enabling zero-copy sharing with GPU/Metal.

EspressoANEIOSurface Methods:

-createIOSurfaceWithExtraProperties:
-metalBufferWithDevice:
-setExternalStorage:ioSurface:
-nFrames
-bytesPerFrame
-totalBytes
// ... 21 methods total

Entitlements

Entitlement                                                     Purpose                      Required For
───────────                                                     ───────                      ────────────
com.apple.aned.private.allow                                    Primary ANE access           compile, load, evaluate
com.apple.aned.private.adapterWeight.allow                      Adapter weights access       Custom weight loading
com.apple.aned.private.aggressivePowerSaving.allow              Power saving modes           Low-power inference
com.apple.ANECompilerService.allow                              Compiler service access      Model compilation
com.apple.aned.private.processModelShare.allow                  Cross-process model sharing  Shared inference
com.apple.ane.memoryUnwiringOptOutAccess.allow                  Memory unwiring control      Large model persistence
com.apple.private.modelPurgeInAllPartitions.allow               Model cache purging          Cache management
com.apple.aned.private.secondaryANECompilerServiceAccess.allow  Secondary compiler           Parallel compilation
com.apple.private.ANEStorageMaintainer.allow                    Storage maintenance          Cache cleanup

Boot Arguments (Internal/Debug Builds Only)

On Apple internal builds, these boot-args can bypass entitlement checks:

Boot Arg                          Purpose                            Effect
────────                          ───────                            ──────
ane_skipAdapterWeightAccessCheck  Bypass adapter weight entitlement  Skip com.apple.aned.private.adapterWeight.allow check
ane_vm_allowPrecompiledBinary     Allow precompiled binaries         Skip binary validation in VM
ane_vm_debugDumpBootArg           Enable debug dumps                 Dump ANE state on errors
ane_vm_forceValidationOnGuest     Force validation in VM             Extra validation for VMs

Note: These boot-args only take effect when isInternalBuild returns true, which happens only on Apple internal builds. On consumer macOS, isInternalBuild always returns false.

Internal Build Detection

The aned daemon checks for internal builds via _ANEDeviceInfo.isInternalBuild, which:

  1. Checks for /AppleInternal directory existence
  2. Queries os_variant_has_internal_content("com.apple.aned")
  3. Checks os_variant_allows_internal_security_policies("com.apple.aned")

All checks return false on consumer macOS installations.

Model Cache

Compiled models are cached in:

/var/folders/<user_hash>/com.apple.aned/

Cache operations in aned:

  • com.apple.aned.modelCacheAsyncIO
  • com.apple.aned.modelCacheGC
  • com.apple.aned.danglingModelsGC

Runtime Class Reference

Key classes discovered through runtime introspection:

_ANEDeviceInfo (Class Methods)

+ (BOOL)hasANE;                    // Returns YES on Apple Silicon
+ (NSInteger)numANEs;              // Number of ANE devices (usually 1)
+ (NSInteger)numANECores;          // Number of cores (e.g., 16 for M1)
+ (NSString *)productName;         // "macOS"
+ (NSString *)buildVersion;        // e.g., "25B78"
+ (NSInteger)aneArchitectureType;  // Hardware architecture identifier
+ (NSInteger)aneSubType;           // Hardware subtype
+ (BOOL)isVirtualMachine;          // VM detection
+ (BOOL)isInternalBuild;           // Apple internal build detection
+ (BOOL)precompiledModelChecksDisabled;
+ (NSString *)bootArgs;            // Current boot arguments
+ (BOOL)isBootArgPresent:(NSString *)arg;
+ (BOOL)isBoolBootArgSetTrue:(NSString *)arg;

_ANEStrings (Class Methods - Returns Constant Strings)

+ (NSString *)restrictedAccessEntitlement;      // "com.apple.aned.private.allow"
+ (NSString *)adapterWeightsAccessEntitlement;  // "com.apple.aned.private.adapterWeight.allow"
+ (NSString *)adapterWeightsAccessEntitlementBypassBootArg;  // "ane_skipAdapterWeightAccessCheck"
+ (NSString *)internalLibraryPath;              // "/AppleInternal/Library"
+ (NSString *)systemLibraryPath;                // "/System/Library"
// ... and many more

Hardware Info Example Output

hasANE = 1
numANEs = 1
numANECores = 16
productName = macOS
buildVersion = 25B78
isVirtualMachine = 0
isInternalBuild = 0
precompiledModelChecksDisabled = 0

Security Analysis

Attack Surface

1. XPC Message Handling

The aned daemon accepts XPC messages from clients. Potential vectors:

  • Malformed model paths: Does compileModel: properly validate URL paths?
  • Sandbox extensions: sandboxExtension: parameter passes filesystem access tokens
  • Memory corruption: Large or malformed layer descriptors
  • Race conditions: Concurrent compile/load/unload operations

2. IOSurface Sharing

IOSurface enables shared memory between processes:

Client Process          aned Daemon           ANE Hardware
──────────────          ───────────           ────────────
     │                       │                     │
     │ Create IOSurface      │                     │
     ├──────────────────────►│                     │
     │                       │ Map to ANE          │
     │                       ├────────────────────►│
     │                       │                     │
     │ Write input data      │                     │
     ├───────────────────────┼────────────────────►│
     │                       │                     │
     │ Read output data      │                     │
     │◄──────────────────────┼─────────────────────┤

Concerns:

  • Shared memory lifetime management
  • Buffer overflow if sizes mismatch
  • Use-after-free on premature unmap

3. Model Cache

The /var/folders/.../com.apple.aned/ cache:

  • World-readable in some configurations
  • Contains compiled ANE bytecode
  • Could leak model architecture details

What Works Without Entitlements

These operations succeed without code signing:

  1. Framework loading: All three frameworks load via dlopen/ctypes
  2. Struct initialization: All ANEC*Initialize() functions callable
  3. Size probing: Can determine struct layouts by sentinel analysis
  4. CPU inference: EspressoContext(platform=0) works
  5. Model parsing: Read and parse .espresso.net files
  6. Client creation: _ANEClient object creation succeeds

What Fails Without Entitlements

These operations fail silently (a nil or NO result, with no NSError populated):

  1. compileModel:options:qos:error: - returns nil
  2. loadModel:options:qos:error: - returns nil
  3. evaluateWithModel:options:request:qos:error: - returns nil
  4. _ANEDeviceController - can't access valid device

Security note: Silent failures make debugging difficult but also prevent enumeration of error conditions.


Performance Analysis

Profiling APIs

Layer-Level Profiling

@interface EspressoProfilingLayerInfo : NSObject
@property (readonly) NSString *name;
@property (readonly) NSString *debug_name;
@property (readonly) double average_runtime;        // seconds
@property (readonly) int selected_runtime_engine;   // 0=CPU, 1=GPU, 2=ANE
@property (readonly) NSArray *runtimes;
@end
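
Once per-layer values have been read out of EspressoProfilingLayerInfo objects, a small aggregation shows how runtime splits across engines. The sketch below assumes the values were already copied into plain dicts keyed by the property names above; how they are extracted is not shown, and `runtime_by_engine` is a hypothetical helper.

```python
from collections import defaultdict

# Engine codes as documented on selected_runtime_engine.
ENGINE_NAMES = {0: "CPU", 1: "GPU", 2: "ANE"}


def runtime_by_engine(layers) -> dict:
    """Sum average_runtime (seconds) per selected engine."""
    totals = defaultdict(float)
    for layer in layers:
        engine = ENGINE_NAMES.get(layer["selected_runtime_engine"], "unknown")
        totals[engine] += layer["average_runtime"]
    return dict(totals)


# Illustrative values only, not measurements.
sample = [
    {"name": "conv1", "average_runtime": 0.5, "selected_runtime_engine": 2},
    {"name": "conv2", "average_runtime": 0.25, "selected_runtime_engine": 2},
    {"name": "nonzero", "average_runtime": 0.125, "selected_runtime_engine": 0},
]
print(runtime_by_engine(sample))
```

A large CPU share in this breakdown is a quick hint that some layers fell back from the ANE.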

Network-Level ANE Profiling

@interface EspressoProfilingNetworkANEInfo : NSObject
@property (readonly) uint64_t total_ane_time_ns;
@property (readonly) uint64_t ane_time_per_eval_ns;
@end

Request-Level Stats

@interface _ANERequest : NSObject
@property uint32_t perfStatsMask;    // Bitmask for which stats to collect
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@end

Operation Mapping

Operations with Native ANE Support

These map 1:1 to ANE instructions:

  • Convolution (all variants)
  • Matrix multiplication
  • Scaled Dot-Product Attention (SDPA)
  • Softmax
  • Common activations (ReLU, GeLU, Tanh)
  • Pooling operations
  • Element-wise arithmetic

Operations That Get Decomposed

These are broken into multiple ANE ops:

  • LayerNorm → multiple passes
  • Complex reductions
  • Non-standard activations
  • Dynamic shapes

Fallback to CPU/GPU

Operations fall back when:

  • Tensor too large for ANE SRAM
  • Unsupported operation type
  • Dynamic control flow
  • Precision requirements exceed INT8/FP16

Example Runthrough

Building a CNN Graph

from ane import SimpleANEGraph

# Create graph builder
graph = SimpleANEGraph()

# Input: (batch=1, channels=3, height=224, width=224)
graph.add_conv2d("conv1", (1, 3, 224, 224), 
                 out_channels=64, kernel_size=7, stride=2, padding=3)
# Output: (1, 64, 112, 112)

graph.add_pool2d("pool1", (1, 64, 112, 112), kernel_size=3, stride=2)
# Output: (1, 64, 56, 56)

graph.add_conv2d("conv2", (1, 64, 56, 56), 
                 out_channels=128, kernel_size=3, padding=1)
# Output: (1, 128, 56, 56)

graph.add_conv2d("conv3", (1, 128, 56, 56), 
                 out_channels=256, kernel_size=3, padding=1)
# Output: (1, 256, 56, 56)

graph.add_pool2d("pool2", (1, 256, 56, 56), kernel_size=2, stride=2)
# Output: (1, 256, 28, 28)

graph.add_linear("fc1", input_features=256*28*28, output_features=1024)
graph.add_linear("fc2", input_features=1024, output_features=1000)
graph.add_softmax("softmax", (1, 1000))

print(graph.summary())

Output:

ANE Computation Graph
============================================================

conv1 (conv2d)
  Input:  (1, 3, 224, 224)
  Output: (1, 64, 112, 112)
  Desc:   176 bytes
  Kernel: 7x7
  Stride: 2x2
  Pad:    3,3

pool1 (pool2d)
  Input:  (1, 64, 112, 112)
  Output: (1, 64, 56, 56)
  Desc:   96 bytes
  Kernel: 3x3
  Stride: 2x2

...

============================================================
Total layers: 8
Total descriptor bytes: 896
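
The shape comments in the CNN example follow the usual convolution/pooling size formula. The sketch below uses the standard floor-mode formula, which is an assumption about how `SimpleANEGraph` computes shapes rather than something confirmed from its source.

```python
def out_dim(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard floor-mode output size for one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


conv1 = out_dim(224, kernel=7, stride=2, padding=3)       # 112
pool1_padded = out_dim(conv1, kernel=3, stride=2, padding=1)  # 56
pool1_unpadded = out_dim(conv1, kernel=3, stride=2)       # 55
conv2 = out_dim(56, kernel=3, padding=1)                  # 56
pool2 = out_dim(56, kernel=2, stride=2)                   # 28
fc1_inputs = 256 * pool2 * pool2                          # 200704
```

Note that `pool1` only reaches 56 with a padding of 1 (or ceil-mode rounding); the unpadded floor-mode result is 55, so the example output presumably relies on one of those conventions.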

Building Transformer Attention

from ane import build_transformer_attention

graph = build_transformer_attention()
print(graph.summary())

Output:

ANE Computation Graph
============================================================

proj_qkv (linear)
  Input:  (512, 512, 1, 1)
  Output: (512, 1536, 1, 1)
  Desc:   64 bytes

attention (sdpa)
  Input:  (1, 8, 512, 64)
  Output: (1, 8, 512, 64)
  Desc:   8 bytes              <-- Native transformer attention!

proj_out (linear)
  Input:  (512, 512, 1, 1)
  Output: (512, 512, 1, 1)
  Desc:   64 bytes

============================================================
Total layers: 3
Total descriptor bytes: 136
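
For reference, this is what a scaled dot-product attention layer computes for one head, written with the textbook formula softmax(Q·Kᵀ / sqrt(d))·V. It is a plain-Python illustration of the math only; masking, scaling conventions, and the exact ANE behavior are not confirmed here. In the graph above, 1536 = 3 x 512 covers Q, K and V, and 512 = 8 heads x 64.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q, k, v):
    """Single-head attention: q is [Lq][d], k is [Lk][d], v is [Lk][dv]."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        # Scaled dot product of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out
```

None of these shapes appear in the 8-byte descriptor, which fits the reading that tensor dimensions travel separately from the layer descriptor.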

Loading Espresso Models

from ane import (
    create_espresso_cpu_context,
    load_espresso_network,
    get_network_layer_count,
    EspressoNet,
)

# Method 1: Direct runtime loading (CPU only without entitlements)
ctx = create_espresso_cpu_context()
print(f"Context: {hex(ctx)}")

model_path = "/path/to/model.espresso.net"
net = load_espresso_network(model_path, ctx)
print(f"Network: {hex(net)}")
print(f"Layers: {get_network_layer_count(net)}")

# Method 2: Parse the file directly
model = EspressoNet.from_file(model_path)
print(f"Format version: {model.format_version}")
print(f"Layer types: {model.layer_type_counts()}")

# Analyze inner_product layers for quantization
for ip in model.get_inner_product_info():
    print(f"  {ip['name']}: {ip['nB']}x{ip['nC']}, "
          f"quant={ip['quantization_mode']}, lookup={ip['is_lookup']}")

Decoding PBZE Files

from ane import decode_espresso_net, get_pbze_stats, is_pbze_file

path = "/System/Library/SomeFramework/model.espresso.net"

# Check format
if is_pbze_file(path):
    stats = get_pbze_stats(path)
    print(f"Compressed size: {stats['compressed_size']} bytes")
    print(f"Uncompressed size: {stats['uncompressed_size']} bytes")
    print(f"Compression ratio: {stats['compression_ratio']:.2f}x")

# Decode (handles both JSON and PBZE automatically)
data = decode_espresso_net(path)
print(f"Layers: {len(data['layers'])}")

Using the Native Helper

For full ANE access, use the signed Objective-C helper:

# Build and sign
cd helper
./build.sh "Developer ID Application: Your Name (TEAMID)"

# Check status
echo '{"cmd": "status"}' | ./ane_helper
# {"ok":true,"client":true,"model_count":0,"model_ids":[]}

# Compile a model
echo '{"cmd": "compile", "model_path": "/path/to/model.mlmodelc"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","state":1}

# Load into ANE memory
echo '{"cmd": "load", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","program_handle":12345}

# Unload
echo '{"cmd": "unload", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true}
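
The same JSON-over-stdin exchange can be driven from Python. This sketch assumes one command per process invocation, a single JSON reply on the last line of stdout, and that failures are reported as `"ok": false`; none of that is confirmed by the examples above, and whether the helper keeps model state between invocations is not stated. The function names are hypothetical.

```python
import json
import subprocess


def build_command(cmd: str, **fields) -> str:
    """Serialize a helper command such as {"cmd": "load", "model_id": ...}."""
    return json.dumps({"cmd": cmd, **fields})


def parse_reply(line: str) -> dict:
    """Decode a helper reply and raise if it does not report success."""
    reply = json.loads(line)
    if not reply.get("ok"):
        raise RuntimeError(f"helper reported failure: {reply}")
    return reply


def call_helper(cmd: str, helper_path: str = "./ane_helper", **fields) -> dict:
    """Pipe one command into the helper binary and return its parsed reply."""
    proc = subprocess.run(
        [helper_path],
        input=build_command(cmd, **fields) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_reply(proc.stdout.strip().splitlines()[-1])
```

For example, `call_helper("compile", model_path="/path/to/model.mlmodelc")` mirrors the compile command shown above.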

Comprehensive Reference

Complete Layer Type Reference

Espresso Layer Types (from system model analysis)

Type Category Attributes
activation Compute type (relu/gelu/tanh/sigmoid/etc), alpha, beta
batch_matmul Compute transpose_a, transpose_b, adj_x, adj_y
concat Shape axis
convolution Compute kernel_size, stride, pad, C, groups, dilation
dynamic_dequantize Quantization scale_blob, zero_point_blob
dynamic_quantize Quantization axis, mode
elementwise Compute operation, alpha, broadcast
expand_dims Shape axes
general_concat Shape axis, interleave
general_slice Shape starts, ends, strides, axes
get_shape Utility (no special attributes)
inner_product Compute nB, nC, has_biases, quantization_mode, is_lookup
instancenorm_1d Normalization C, epsilon
load_constant Memory blob_weights, shape
nonzero Utility (no special attributes)
reduce Compute mode (sum/mean/max/min/prod), axes, keepdims
reshape Shape shape
scatter_nd Memory (no special attributes)
softmax Compute axis
split_nd Shape axis, num_splits, split_sizes
tile Shape reps
transpose Shape axes

ANE Compiler Struct Sizes

Struct Size (bytes) Initialize Function
ANECAffineTransformLayerDesc 48 ANECAffineTransformLayerDescInitialize
ANECBatchNormLayerDesc 40 ANECBatchNormLayerDescInitialize
ANECConcatLayerDesc 16 ANECConcatLayerDescInitialize
ANECConvLayerDesc 176 ANECConvLayerDescInitialize
ANECCropResizeLayerDesc 64 ANECCropResizeLayerDescInitialize
ANECCrossCorrelationLayerDesc 96 ANECCrossCorrelationLayerDescInitialize
ANECDropoutLayerDesc 16 ANECDropoutLayerDescInitialize
ANECExpandLayerDesc 32 ANECExpandLayerDescInitialize
ANECFillLayerDesc 24 ANECFillLayerDescInitialize
ANECFlattenLayerDesc 16 ANECFlattenLayerDescInitialize
ANECGatherLayerDesc 24 ANECGatherLayerDescInitialize
ANECGatherNDLayerDesc 24 ANECGatherNDLayerDescInitialize
ANECGridSampleLayerDesc 32 ANECGridSampleLayerDescInitialize
ANECGroupNormLayerDesc 40 ANECGroupNormLayerDescInitialize
ANECInputViewLayerDesc 32 ANECInputViewLayerDescInitialize
ANECKernelSize 24 ANECKernelSizeInitialize
ANECLRNLayerDesc 32 ANECLRNLayerDescInitialize
ANECLayerNormLayerDesc 40 ANECLayerNormLayerDescInitialize
ANECLinearLayerDesc 64 ANECLinearLayerDescInitialize
ANECMatrixMultLayerDesc 16 ANECMatrixMultLayerDescInitialize
ANECNMSLayerDesc 48 ANECNMSLayerDescInitialize
ANECNeuronLayerDesc 32 ANECNeuronLayerDescInitialize
ANECNormLayerDesc 40 ANECNormLayerDescInitialize
ANECPadLayerDesc 48 ANECPadLayerDescInitialize
ANECPadding 24 ANECPaddingInitialize
ANECPoolLayerDesc 96 ANECPoolLayerDescInitialize
ANECRandomLayerDesc 32 ANECRandomLayerDescInitialize
ANECReductionLayerDesc 24 ANECReductionLayerDescInitialize
ANECResampleLayerDesc 48 ANECResampleLayerDescInitialize
ANECReshapeLayerDesc 48 ANECReshapeLayerDescInitialize
ANECResizeLayerDesc 40 ANECResizeLayerDescInitialize
ANECRingBufferLayerDesc 32 ANECRingBufferLayerDescInitialize
ANECSDPALayerDesc 8 ANECSDPALayerDescInitialize
ANECScatterLayerDesc 24 ANECScatterLayerDescInitialize
ANECScatterNDLayerDesc 24 ANECScatterNDLayerDescInitialize
ANECShapeLayerDesc 16 ANECShapeLayerDescInitialize
ANECSoftmaxLayerDesc 48 ANECSoftmaxLayerDescInitialize
ANECSortLayerDesc 24 ANECSortLayerDescInitialize
ANECSplitLayerDesc 24 ANECSplitLayerDescInitialize
ANECSqueezeLayerDesc 32 ANECSqueezeLayerDescInitialize
ANECStep 12 ANECStepInitialize
ANECTensorDesc 64 ANECTensorDescInitialize
ANECTensorDims 40 ANECTensorDimsInitialize
ANECTileLayerDesc 32 ANECTileLayerDescInitialize
ANECTopKLayerDesc 24 ANECTopKLayerDescInitialize
ANECTransposeLayerDesc 32 ANECTransposeLayerDescInitialize
ANECUnflattenLayerDesc 24 ANECUnflattenLayerDescInitialize

Espresso Optimization Passes

All discovered Pass_* classes in Espresso.framework:

Pass_add_fp16_fp32_conversions
Pass_batch_matmul_transpose_fusion
Pass_broadcast_optimization
Pass_canonicalize_ops
Pass_constant_folding
Pass_convert_gather_to_slice
Pass_convert_to_ane_layout
Pass_dead_code_elimination
Pass_decompose_complex_ops
Pass_eliminate_identity_ops
Pass_eliminate_redundant_transpose
Pass_fold_constants
Pass_fuse_activation
Pass_fuse_add_mul
Pass_fuse_bias
Pass_fuse_conv_batchnorm
Pass_fuse_conv_bias
Pass_fuse_elementwise
Pass_fuse_gelu
Pass_fuse_layernorm
Pass_fuse_linear_ops
Pass_fuse_matmul_add
Pass_fuse_mul_add
Pass_fuse_pad_conv
Pass_fuse_reshape_transpose
Pass_insert_copies_for_ane
Pass_legalize_for_ane
Pass_lower_to_ane_ops
Pass_optimize_memory_layout
Pass_optimize_reshape_chain
Pass_optimize_transpose
Pass_propagate_shapes
Pass_quantize_weights
Pass_remove_unused_outputs
Pass_replace_div_with_mul
Pass_simplify_arithmetic
Pass_split_large_tensors
Pass_tensor_parallel_partition
Pass_tile_for_ane
Pass_vectorize_ops

ObjC Class Methods Reference

_ANEClient

// Lifecycle
- (instancetype)initWithRestrictedAccessAllowed:(BOOL)allowed;

// Compilation
- (BOOL)compileModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)compiledModelExistsFor:(id)model;
- (BOOL)compiledModelExistsMatchingHash:(NSData*)hash;
- (BOOL)purgeCompiledModel:(id)model;

// Loading
- (BOOL)loadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)loadModelNewInstance:(id)model options:(id)opts modelInstParams:(id)params qos:(int)qos error:(NSError**)err;
- (BOOL)loadRealTimeModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)unloadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;

// Evaluation
- (BOOL)evaluateWithModel:(id)model options:(id)opts request:(id)req qos:(int)qos error:(NSError**)err;
- (BOOL)evaluateRealTimeWithModel:(id)model options:(id)opts request:(id)req error:(NSError**)err;

// Memory
- (BOOL)mapIOSurfacesWithModel:(id)model request:(id)req cacheInference:(BOOL)cache error:(NSError**)err;
- (void)unmapIOSurfacesWithModel:(id)model request:(id)req;

// Chaining
- (BOOL)prepareChainingWithModel:(id)model options:(id)opts chainingReq:(id)req qos:(int)qos error:(NSError**)err;

_ANEModel

// Initialization
- (instancetype)initWithModelAtURL:(NSURL*)url
                               key:(NSString*)key
                  identifierSource:(int)src
                cacheURLIdentifier:(NSString*)cacheId
                   modelAttributes:(id)attrs
                    standardizeURL:(BOOL)standardize;
- (instancetype)initWithModelIdentifier:(id)identifier;

// Properties
@property (readonly) NSURL *modelURL;
@property (readonly) NSURL *sourceURL;
@property (readonly) NSString *UUID;
@property (readonly) NSString *key;
@property (readonly) int state;  // 1 = created/unloaded
@property (readonly) uint64_t programHandle;
@property (readonly) uint64_t intermediateBufferHandle;
@property (readonly) int queueDepth;
@property (readonly) uint32_t perfStatsMask;
@property (readonly) id mpsConstants;

_ANERequest

// Initialization
- (instancetype)initWithInputs:(NSArray*)inputs
                  inputIndices:(NSArray*)inputIndices
                       outputs:(NSArray*)outputs
                 outputIndices:(NSArray*)outputIndices
                 weightsBuffer:(id)weights
                     perfStats:(id)stats
                procedureIndex:(int)procIdx
                  sharedEvents:(id)events
             transactionHandle:(uint64_t)handle;

// Properties
@property (readonly) NSArray *inputArray;
@property (readonly) NSArray *inputIndexArray;
@property (readonly) NSArray *outputArray;
@property (readonly) NSArray *outputIndexArray;
@property (readonly) id weightsBuffer;
@property (readonly) int procedureIndex;
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@property (copy) void (^completionHandler)(BOOL, NSError*);
@property (readonly) id sharedEvents;
@property (readonly) uint64_t transactionHandle;

Running Tests

# Run all tests
pytest tests/test_ane.py -v

# Run specific test class
pytest tests/test_ane.py::TestANECompiler -v

# Run with coverage
pytest tests/test_ane.py --cov=ane --cov-report=term-missing

Test categories:

  • TestANEStructs - Data structure serialization
  • TestANECompiler - Framework loading and initialization
  • TestANEHelpers - Utility functions
  • TestANESample - Graph building
  • TestANELayerSizes - Probed struct sizes
  • TestEspressoDiscovery - ObjC class introspection
  • TestEspressoFormat - Model file parsing
  • TestPBZE - Compression/decompression
  • TestANEXPC - XPC protocol discovery
  • TestAPITree - Knowledge base API tree

HWX File Format & Execution Research

Working Path: CoreML API

The simplest way to execute models on ANE is through CoreML's public API:

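// Objective-C
#import <CoreML/CoreML.h>

NSError *error = nil;
MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsAll;  // Allows CPU, GPU, and ANE; Core ML chooses per layer

// modelURL must point to a compiled model (.mlmodelc)
MLModel *model = [MLModel modelWithContentsOfURL:modelURL
                                   configuration:config
                                           error:&error];
if (!model) {
    NSLog(@"Failed to load model: %@", error);
}

# Python with coremltools
import coremltools as ct
model = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.ALL)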

HWX Binary Format

Pre-compiled ANE binaries (.hwx files) have a Mach-O-like structure:

| Offset | Value | Description |
| --- | --- | --- |
| 0x00 | 0xBEEFFACE | Magic number |
| 0x04 | varies | Header info |
| ... | __PAGEZERO | Zero page segment |
| ... | __DATA | Data segment |
| ... | __FVMLIB | ANE instructions |

Key insight: HWX files cannot be loaded on their own; they require a companion .espresso.net file that describes the network structure.
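
As a quick sanity check before further analysis, the magic number from the header table can be tested in a few lines of Python. This is a minimal sketch, not part of the repository: the helper names `looks_like_hwx` and `file_looks_like_hwx` are hypothetical, and because the on-disk byte order is not stated here, the check accepts both little- and big-endian encodings of 0xBEEFFACE.

```python
import struct

HWX_MAGIC = 0xBEEFFACE  # magic number listed at offset 0x00


def looks_like_hwx(header: bytes) -> bool:
    """Return True if the first four bytes match the HWX magic.

    Assumption: byte order is not confirmed, so both little- and
    big-endian encodings of the magic are accepted.
    """
    if len(header) < 4:
        return False
    (little,) = struct.unpack_from("<I", header, 0)
    (big,) = struct.unpack_from(">I", header, 0)
    return HWX_MAGIC in (little, big)


def file_looks_like_hwx(path: str) -> bool:
    """Read only the first four bytes of `path` and check the magic."""
    with open(path, "rb") as f:
        return looks_like_hwx(f.read(4))
```

A match only indicates that the file starts with the expected magic; it says nothing about whether the segments that follow are well-formed.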

Espresso Model Bundle Structure

A complete Espresso model bundle contains:

| File | Description |
| --- | --- |
| model.espresso.net | Network description (JSON or PBZE) |
| model.espresso.weights | Binary weights data |
| model.espresso.shape | Shape information |
| model.H14.espresso.hwx | Pre-compiled ANE binary (chip-specific) |
| model.H14.espresso.precompilation_info | Compiler metadata (JSON) |

Each ANE generation has its own .hwx file, identified by the H-number in the filename:

  • .H13.espresso.hwx - A14/M1 generation
  • .H14.espresso.hwx - A15/M2 generation
  • .H15.espresso.hwx - A16/M3 generation
  • .H16.espresso.hwx - A17/M4 generation
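
The naming convention above makes it straightforward to pick the right binary out of a bundle. The following is a minimal sketch under the assumption that every pre-compiled binary follows the `<stem>.H<NN>.espresso.hwx` pattern shown in the list; the helper names `hwx_generation` and `pick_hwx` are hypothetical and operate on bare filenames, not full paths.

```python
import re
from typing import Iterable, Optional

# Generation tags and chip families exactly as listed above.
ANE_GENERATIONS = {
    "H13": "A14/M1",
    "H14": "A15/M2",
    "H15": "A16/M3",
    "H16": "A17/M4",
}

# Matches e.g. "model.H14.espresso.hwx" and captures "model" and "H14".
_HWX_NAME = re.compile(r"^(?P<stem>.+)\.(?P<gen>H\d+)\.espresso\.hwx$")


def hwx_generation(filename: str) -> Optional[str]:
    """Return the generation tag (e.g. "H14") or None if the name does not match."""
    match = _HWX_NAME.match(filename)
    return match.group("gen") if match else None


def pick_hwx(filenames: Iterable[str], generation: str) -> Optional[str]:
    """Return the first .hwx filename built for `generation`, or None."""
    for name in filenames:
        if hwx_generation(name) == generation:
            return name
    return None
```

For example, `pick_hwx(["model.espresso.net", "model.H14.espresso.hwx"], "H14")` selects the H14 binary, while a generation with no matching file yields `None`.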

API Layer Summary

| Layer | Status | Notes |
| --- | --- | --- |
| CoreML | ✅ Working | Use MLComputeUnitsAll, system handles everything |
| XPC to aned | ✅ Working | _ANEClient.sharedConnection works |
| ANEServices | ⚠️ Limited | Model loading needs .espresso.net |
| Espresso | ⚠️ Limited | Platform 2 (ANE) context crashes |
| IOKit Direct | ❌ Blocked | Requires com.apple.ane.iokit-user-access |

MLComputePlan Device Masks

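When inspecting MLComputePlan.computeDevicesBySupportedComputeUnits, the mask values map to devices as follows. The values combine as bit flags (CPU = 1, GPU = 2, Neural Engine = 4):

| Mask | Devices |
| --- | --- |
| 1 | CPU only |
| 2 | GPU only |
| 3 | CPU + GPU |
| 4 | Neural Engine only |
| 5 | CPU + Neural Engine |
| 6 | GPU + Neural Engine |
| 7 | CPU + GPU + Neural Engine (all) |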
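
The mask values above can be decoded mechanically. The sketch below is illustrative only: `decode_device_mask` is a hypothetical helper, and the bit assignments are inferred from the table rather than taken from any Apple header.

```python
from typing import List

# Bit assignments inferred from the mask table (an assumption, not an Apple constant).
CPU = 1
GPU = 2
NEURAL_ENGINE = 4

_DEVICE_BITS = ((CPU, "CPU"), (GPU, "GPU"), (NEURAL_ENGINE, "Neural Engine"))


def decode_device_mask(mask: int) -> List[str]:
    """Return the device names whose bits are set in `mask` (valid range 1..7)."""
    if not 1 <= mask <= 7:
        raise ValueError(f"mask out of range: {mask}")
    return [name for bit, name in _DEVICE_BITS if mask & bit]


def uses_neural_engine(mask: int) -> bool:
    """Return True if the Neural Engine bit is set in `mask`."""
    return bool(mask & NEURAL_ENGINE)
```

Checking `uses_neural_engine(mask)` is a compact way to filter for entries that include the ANE (masks 4 through 7).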

Hardware Detection

// Get ANE device info
Class deviceClass = NSClassFromString(@"MLNeuralEngineComputeDevice");
id device = [deviceClass performSelector:@selector(physicalDevice)];
NSInteger cores = [[device valueForKey:@"totalCoreCount"] integerValue];
// Returns 16 on M3 Pro

License

This project contains reverse engineering artifacts for research and interoperability purposes. Use responsibly.

Acknowledgments

  • Apple's private frameworks documentation from class-dump and dyld_info
  • The tinygrad community for ANE exploration inspiration