NPU
Related Articles from SNS
FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location
arXiv:2606.04415v2. Abstract: Modern AI serving increasingly relies on NPUs for conventional inference and large language model serving. However, current NPU deployments commonly expose physical devices directly to applications, which limits runtime control over scheduling and makes it difficult to adapt execution to phase-level workload behavior. This limitation is particularly evident in LLM serving, where the prefill phase is compute-intensive while the decode phase...
LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load
arXiv:2603.23640v2. Abstract: Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B (4-bit quantised) across four platforms: a Raspberry Pi 5 with a Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop NVIDIA RTX 4050 GPU. Using a fixed 258-token prompt over 20 warm-condition iterations per device, we...
AcOrch: Accelerating Sampling-based GNN Training under CPU-NPU Heterogeneous Environments
arXiv:2606.01161v1. Abstract: Graph Neural Networks (GNNs) have achieved remarkable success in various applications. Sampling-based GNN training, which conducts mini-batch training on sampled subgraphs, has become a promising solution for large-scale graphs. Given the resource-intensive nature of sampling-based GNN training, Neural Processing Units (NPUs), such as the Ascend AI processor, offer a promising alternative due to their high throughput and energy efficiency,...
Implementation and Optimization of HQC Decoding on NPU-Integrated Devices
arXiv:2606.01968v1. Abstract: Hamming Quasi-Cyclic (HQC) has been selected by NIST for standardization as an additional code-based key-encapsulation mechanism, providing algorithmic diversity alongside lattice-based post-quantum cryptography. Efficient deployment of HQC on mobile and embedded platforms, however, requires careful optimization of its decoding procedure, whose Reed-Muller and Reed-Solomon components dominate the computational cost.
ASUS's ExpertBook B5 Flip G2 is a 2.9 pound 360 touchscreen laptop
The company also revealed the Zenbook 14 with three different processor options. Along with the upcoming ProArt P-series laptops that will run NVIDIA's new RTX Spark processor, ASUS has unveiled several other Windows laptops at Computex. Those include a new ExpertBook convertible model and an ExpertBook business laptop, along with three new Zenbook 14 models running Intel, AMD and Qualcomm Snapdragon processors.
NVIDIA's RTX Spark is an AI "superchip" that will power Windows laptops and desktops
The company claims it offers 1 petaflop of AI computing power. It was only a matter of time before NVIDIA released a powerful system-on-a-chip (SoC) to take on AMD's Ryzen AI Max and Qualcomm's latest Snapdragon X2 chips. At Computex today, NVIDIA unveiled the RTX Spark, a "superchip" meant to give both laptops and small desktops fast AI and graphics performance.
Gigabyte packs 40 Intel Lunar Lake PCs in a pizza box
Gigabyte showed off a high-density server platform at Computex this week that crams 40 low-power compute nodes into a pizza box. Amid a sea of nearly identical MGX and NVL blades, the R1C7-KOA-AS1 was one of the more unusual systems on this year's show floor. Rather than using Intel's or AMD's datacenter-class Xeon or Epyc processors, the machine is powered by dozens of notebook processors.