NPU (Intel NPU 5, OpenVINO): blocked at the software-stack level, twice over:

qwen3_5_moe: a hybrid Gated-DeltaNet (linear-attention/SSM) × 256-expert sparse MoE vision-language model. The NPU is only reachable through OpenVINO, and optimum-intel/OpenVINO cannot export this architecture: OpenVINO GenAI's VLM pipeline supports only dense Qwen3-VL; no MoE-VLM, no DeltaNet ops.

iGPU (Xe3, llama.cpp Vulkan): three independent, measured failure layers:
vk::DeviceLostError inside clip_image_batch_encode → GPU reset. Driver/kernel immaturity on brand-new Xe3 silicon (mesa anv).

Full offload (-ngl 99) yields deterministic wrong tokens at temp 0; isolated to the 248,320-vocab lm_head matmul: all 40 transformer blocks alone are numerically correct (-ngl 40). A silent Vulkan kernel bug (likely workgroup/dimension overflow on the huge vocab matrix);
narrower repro than upstream issue #21888.

Proposed solution (concise).

Now, on this box: stay on CPU and recover the broken KV prefix-cache. Restructure the agent's message history so screenshot eviction stops invalidating the prefix (measured ~25K-token re-prefills at ~60 tok/s dominate step latency; est. 2–3× faster steps), plus a loop-detector that early-FAILs runaway episodes.

Upstream: file the two minimal repros (lm_head-only corruption; clip DeviceLost); either fix unlocks vision-prefill offload to the iGPU.

Right long-term fix: heterogeneous placement once OpenVINO gains MoE-VLM support (vision tower on NPU, compute-bound prefill on iGPU, bandwidth-bound expert decode on CPU), or edge silicon with real bandwidth headroom (≥200 GB/s class) where the GPU path pays off.
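
The two CPU-side measures proposed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the agent's actual code. All names here (`common_prefix_len`, `sliding_window`, `BatchEvictor`, `LoopDetector`, `simulate`) and all thresholds are hypothetical. Batched eviction with high/low water marks is just one possible way to restructure the history. The simulation counts whole messages, whereas a real KV prefix cache matches token by token. From the quoted figures alone, one full re-prefill costs about 25,000 / 60 ≈ 417 s, which is why keeping the prefix stable matters.

```python
from collections import deque

REPREFILL_TOKENS = 25_000   # re-prefill size quoted in the text
PREFILL_TOK_PER_S = 60      # CPU prefill rate quoted in the text


def common_prefix_len(a, b):
    """Count leading items shared by two sequences (the reusable cached prefix)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def sliding_window(history, keep=3):
    """Per-step eviction: keep only the newest `keep` screenshots.
    Once full, every step removes an old screenshot and so changes the prefix."""
    shots = [m for m in history if m.startswith("img")]
    drop = set(shots[:-keep]) if len(shots) > keep else set()
    return [m for m in history if m not in drop]


class BatchEvictor:
    """Hypothetical alternative: evict only when more than `high` screenshots
    are live, then cut down to `low`. Between evictions the prompt is append-only."""

    def __init__(self, high=6, low=3):
        self.high, self.low = high, low
        self.dropped = set()

    def render(self, history):
        live = [m for m in history
                if m.startswith("img") and m not in self.dropped]
        if len(live) > self.high:
            self.dropped.update(live[:-self.low])
        return [m for m in history if m not in self.dropped]


def simulate(render, steps=20):
    """Total messages that must be re-prefilled over an episode (toy units)."""
    history, prev, redone = ["sys"], [], 0
    for t in range(steps):
        history += [f"img{t}", f"act{t}"]   # one screenshot + one action per step
        prompt = render(history)
        redone += len(prompt) - common_prefix_len(prev, prompt)
        prev = prompt
    return redone


class LoopDetector:
    """Signal early failure when one action repeats `limit` times
    within the last `window` steps."""

    def __init__(self, window=8, limit=4):
        self.recent = deque(maxlen=window)
        self.limit = limit

    def should_fail(self, action):
        self.recent.append(action)
        return self.recent.count(action) >= self.limit


if __name__ == "__main__":
    print("full re-prefill, seconds:", round(REPREFILL_TOKENS / PREFILL_TOK_PER_S))
    print("sliding window, messages redone:", simulate(sliding_window))
    print("batched eviction, messages redone:", simulate(BatchEvictor().render))
```

In this toy model, batched eviction pays for a large re-prefill only on the steps where it actually evicts, while the sliding window pays on every step once it is full. The real gain depends on how many tokens each screenshot occupies and on how the inference server matches cached prefixes, so the 2–3× estimate above should be checked by measurement rather than read off this sketch.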