Profile Decoding
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Fast-dLLM++: Fr\'{e}chet Profile Decoding for Faster Diffusion LLM Inference
Announce Type: new Abstract: Diffusion large language models promise parallel token generation, yet inference remains bottlenecked by deciding which masked tokens can be safely committed together. Fast-dLLM addressed this with KV caching and confidence-guided parallel decoding, but its decoding theory uses a homogeneous high-confidence assumption that effectively reduces each candidate set to its weakest selected token. We argue that this leaves speed on the table because real decoding steps...
Heterogeneous Mapping for Analog In-Memory Computing Accelerators: A Unified Workflow
arXiv:2606.02672v1 Announce Type: new Abstract: Analog In-Memory Computing (AIMC) accelerators execute matrix-vector multiplications directly within memory arrays, reducing data movement and improving DNN inference efficiency. Their limited effective precision motivates heterogeneous architectures that combine analog compute tiles with digital processing units. This letter classifies existing methods for partitioning DNN workloads across these resources by mapping granularity, optimization...
Pyro Caml Continuous Profiler for OCaml
The core SAST engine of Semgrep is written in OCaml. There are a lot of good technical and historical reasons for this that I’ll leave for another time. An important consequence of using a language with a (relatively) small ecosystem like OCaml is that there aren’t a lot of libraries for things like observability, which are critical for running industrial software like Semgrep on hundreds of thousands of code repositories, and keeping it both reliable and performant.
Don't be so Stief! Learning KV Cache low-rank approximation over the Stiefel manifold
Announce Type: replace Abstract: Key-value (KV) caching enables fast autoregressive decoding but at long contexts becomes a dominant bottleneck in High Bandwidth Memory (HBM) capacity and bandwidth. A common mitigation is to compress cached keys and values by projecting per-head matrices to a lower rank, storing only the projections in the HBM. However, existing post-training approaches typically fit these projections using SVD-style proxy objectives, which may poorly reflect end-to-end...
Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was
Intel offered new insights into its next-gen datacenter GPU codenamed Crescent Island. Alongside supporting enterprise AI deployments, the GPU could fill the void left by Nvidia's Rubin CPX GPUs, which were seemingly shelved late last year following its acquisition of Groq. As datacenter GPUs go, Intel's Crescent Island is certainly an odd duck.
How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing
arXiv:2603.13259v2 Announce Type: replace Abstract: When a decoder-only transformer is forced to process matched correct and incorrect single-token continuations of a factual query, the two pathways through hidden-state space diverge in a specific way: displacement vectors from the query-only representation maintain approximately equal magnitude but rotate apart in direction. The angular separation grows through mid-depth, and late layers resolve the asymmetric outcome -a logit-lens...
KVarN: Native vLLM backend for KV-cache quantization by Huawei
⚡️ Built for agentic and long-context workloads. 💡 KVarN delivers 3-5x more KV-cache capacity and up to ~1.3x the throughput of FP16, so you fit far longer contexts and serve more concurrent requests, with FP16-level accuracy. 🔌 Calibration-free, plug-and-play with vLLM.
BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting
Announce Type: replace Abstract: Early battery degradation trajectory forecasting (BDTF), which predicts the full-life state-of-health trajectory from early operational data, is critical for battery optimization, manufacturing, and deployment. Battery degradation data exhibit two key characteristics. First, degradation data present a multi-level structure, including regularities shared within aging conditions and trajectory patterns shared across batteries.
A 1000-hour EEG-EMG-audio dataset of Japanese speech production
Announce Type: cross Abstract: We present a multimodal dataset of 1020 hours of simultaneously recorded scalp electroencephalography (EEG), facial electromyography (EMG), and speech audio from three healthy native Japanese speakers during open-vocabulary overt speech. Recordings were acquired with three EEG systems-an ultra-high-density system (g.Pangolin) and two cap-type systems (g.SCARABEO and eegosports), spanning 62-128 channels-across many sessions over several months. Each session...
Single-cell spatial transcriptomics and snRNA-seq decoding the organizational principles of functional modules in the mouse amygdala
The amygdala is a functionally heterogeneous nuclear complex comprising multiple subnuclei that orchestrate diverse behaviors, making it essential for survival and reproduction. However, the precise neural mechanisms underlying these heterogeneous functions remain elusive, primarily due to limited knowledge of the amygdala's cellular heterogeneity, developmental origins, spatial organization, and gene expression profiles. Here, we integrate single-cell-resolution spatial transcriptomics with...