LUT
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
OASIS: Outlier-Aware LUT-Based GEMM with Dual-Side Quantization for LLM Inference Acceleration
arXiv:2507.23035v4 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated impressive capabilities across a wide range of applications, but demand substantial memory and compute resources during inference. Existing quantization methods expose a trade-off between efficiency and accuracy: weight-only quantization (WOQ) incurs costly dequantization overheads, while integer weight-and-activation quantization (INT-WAQ) reduces precision and degrades model quality....
Ablation Study of Block Size, Weight Precision, and Scale Precision in NVFP4 Inference for Low-Power Edge-Efficient Neural Networks
Announce Type: new Abstract: Energy-efficient edge inference requires reducing arithmetic cost, memory traffic, and hardware overhead. This paper presents an ablation-focused study of NVFP4 LUT-based inference for edge-efficient neural networks. The proposed NVLUT framework combines 4-bit NVFP4 activations, two-level scaling, LUT-based mantissa computation, voltage-scaled storage, and selective ECC protection.
PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference
arXiv:2606.08891v1 Announce Type: new Abstract: Large language models are increasingly deployed on edge devices with tight power and area budgets. While mixed-precision GEMM reduces arithmetic complexity, quantized inference is often dominated by dequantization and nonlinear operators. Lookup Table (LUT)-based method mitigates these costs by precomputing outputs and replacing repeated arithmetic with table lookups, but existing designs incur significant capacity and lookup-latency overheads.
Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks
Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks This post is a high-level explainer for my Master’s thesis, which involves designing hardware architectures for ultrafast inference and online learning using the Kolmogorov-Arnold Network (KAN) architecture. I’ll assume familiarity with standard machine learning concepts, as well as some understanding of hardware and digital circuits; read my previous post here for the latter. Please read the two papers below for more...
DaVinci Resolve 21
DaVinci Resolve 21 introduces the Photo page, bringing Hollywood's most advanced color tools to still photography! A new generation of AI tools let you search media by content, read slate data, perform de-aging, blemish removal and more. The Edit and Cut pages have improved keyframing and greater graphic format support.
Flexible FTN-Aided OTFS Modulation for High-Mobility LEO Satellite-to-Ground Communications
arXiv:2601.22526v2 Announce Type: replace Abstract: In low Earth orbit (LEO) satellite communications, the link quality fluctuates drastically during a satellite pass, exhibiting a wide dynamic range from the horizon to the zenith. Moreover, the high relative velocity induces severe Doppler shifts. While orthogonal time frequency space (OTFS) modulation effectively resolves the doubly-selective fading, its spectral efficiency is fundamentally bounded by the Nyquist limit.
Energy-Efficient Implementation of Spiking Recurrent Cells on FPGA
arXiv:2605.10679v3 Announce Type: replace Abstract: Spiking Neural Networks (SNNs) can reduce energy consumption compared to conventional Artificial Neural Networks (ANNs) when spiking activity is sparse and the neuron model is hardware-friendly. However, biologically faithful models are often too costly to implement on FPGAs, whereas very simple models (e.g., IR/LIF) sacrifice part of the neuronal dynamics. In this work, we present an FPGA accelerator for an SNN using Spiking Recurrent Cell...
AMS-HD: Hyperdimensional Computing for Real-Time and Energy-Efficient Acute Mountain Sickness Detection
arXiv:2602.08916v3 Announce Type: replace Abstract: Objective: Acute mountain sickness (AMS) is the most prevalent altitude illness, affecting unacclimatized individuals ascending above 2,500 m and potentially escalating to life threatening cerebral or pulmonary edema. Conventional machine learning (ML) methods for AMS detection from wearable physiological signals often fail to meet real-time hardware efficiency requirements of continuous monitoring. Methods: We present AMS-HD, the first...
IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference
arXiv:2511.21513v2 Announce Type: replace Abstract: Deploying Transformer models on edge devices is limited by latency and energy budgets. While INT8 quantization effectively accelerates the primary matrix multiplications, it exposes the softmax-related path as the dominant bottleneck. This stage incurs a costly dequantize -> softmax -> requantize detour, which can account for up to 65% of total attention latency and disrupts the end-to-end integer dataflow critical for edge hardware efficiency.
Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems
Announce Type: replace Abstract: 1D-CNNs play a crucial role for time-series analysis on tiny smart sensor systems, e.g. for biosignal analysis, predictive maintenance, or structural health monitoring. LUTbased precomputation has emerged as an interesting optimization technique to implement such neural networks on FPGAs. The core idea is to precompute all possible outputs of a neural network layer and store them directly in the lookup tables of the FPGAs.