
NPU


Related Articles from SNS

FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location

arXiv:2606.04415v2 Announce Type: replace Abstract: Modern AI serving increasingly relies on NPUs for conventional inference and large language model serving. However, current NPU deployments commonly expose physical devices directly to applications, which limits runtime control over scheduling and makes it difficult to adapt execution to phase-level workload behavior. This limitation is particularly evident in LLM serving, where the prefill phase is compute-intensive while the decode phase...

arXiv CS 1d ago

FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location

Announce Type: new Abstract: Modern AI serving increasingly relies on NPUs for conventional inference and large language model serving. However, current NPU deployments commonly expose physical devices directly to applications, which limits runtime control over scheduling and makes it difficult to adapt execution to phase-level workload behavior. This limitation is particularly evident in LLM serving, where the prefill phase is compute-intensive while the decode phase is often constrained by...

arXiv CS 6d ago

LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load

arXiv:2603.23640v2 Announce Type: replace Abstract: Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B (4-bit quantised) across four platforms: a Raspberry Pi 5 with Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop NVIDIA RTX 4050 GPU. Using a fixed 258-token prompt over 20 warm-condition iterations per device, we...

arXiv CS 1d ago

AcOrch: Accelerating Sampling-based GNN Training under CPU-NPU Heterogeneous Environments

arXiv:2606.01161v1 Announce Type: new Abstract: Graph Neural Networks (GNNs) have achieved remarkable success in various applications. Sampling-based GNN training, which conducts mini-batch training on sampled subgraphs, has become a promising solution for large-scale graphs. Given the resource-intensive nature of sampling-based GNN training, Neural Processing Units (NPUs), such as the Ascend AI processor, offer a promising alternative due to their high throughput and energy efficiency,...

arXiv CS 8d ago

Implementation and Optimization of HQC Decoding on NPU-Integrated Devices

arXiv:2606.01968v1 Announce Type: new Abstract: Hamming Quasi-Cyclic (HQC) has been selected by NIST for standardization as an additional code-based key-encapsulation mechanism, providing algorithmic diversity alongside lattice-based post-quantum cryptography. Efficient deployment of HQC on mobile and embedded platforms, however, requires careful optimization of its decoding procedure, whose Reed-Muller and Reed-Solomon components dominate the computational cost.

arXiv CS 8d ago

ASUS's ExpertBook B5 Flip G2 is a 2.9 pound 360 touchscreen laptop

The company also revealed the Zenbook 14 with three different processor options. Along with the upcoming ProArt P-series laptops that will run NVIDIA's new RTX Spark processor, ASUS has unveiled several other Windows laptops at Computex. Those include a new ExpertBook convertible and an ExpertBook business laptop, along with three new Zenbook 14 models running Intel, AMD and Qualcomm Snapdragon processors.

Engadget 8d ago

NVIDIA's RTX Spark is an AI "superchip" that will power Windows laptops and desktops

The company claims it offers 1 petaflop of AI computing power. It was only a matter of time before NVIDIA released a powerful system-on-a-chip (SoC) to take on AMD's Ryzen AI Max and Qualcomm's latest Snapdragon X2 chips. At Computex today, NVIDIA unveiled the RTX Spark, a "superchip" meant to give both laptops and small desktops fast AI and graphics performance.

Engadget 9d ago

Gigabyte packs 40 Intel Lunar Lake PCs in a pizza box

Gigabyte showed off a high-density server platform at Computex this week that crams 40 low-power compute nodes into a pizza box. Amid a sea of nearly identical MGX and NVL blades, the R1C7-KOA-AS1 was one of the more unusual systems on this year's show floor. Rather than using Intel's or AMD's datacenter-class Xeon or Epyc chips, the machine is powered by dozens of notebook processors.

The Register 4d ago