LUT

Related Articles from SNS

OASIS: Outlier-Aware LUT-Based GEMM with Dual-Side Quantization for LLM Inference Acceleration

arXiv:2507.23035v4 Announce Type: replace Abstract: Large language models (LLMs) have demonstrated impressive capabilities across a wide range of applications, but demand substantial memory and compute resources during inference. Existing quantization methods expose a trade-off between efficiency and accuracy: weight-only quantization (WOQ) incurs costly dequantization overheads, while integer weight-and-activation quantization (INT-WAQ) reduces precision and degrades model quality....

arXiv CS 7d ago

Ablation Study of Block Size, Weight Precision, and Scale Precision in NVFP4 Inference for Low-Power Edge-Efficient Neural Networks

Announce Type: new Abstract: Energy-efficient edge inference requires reducing arithmetic cost, memory traffic, and hardware overhead. This paper presents an ablation-focused study of NVFP4 LUT-based inference for edge-efficient neural networks. The proposed NVLUT framework combines 4-bit NVFP4 activations, two-level scaling, LUT-based mantissa computation, voltage-scaled storage, and selective ECC protection.

arXiv CS 2d ago

PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference

arXiv:2606.08891v1 Announce Type: new Abstract: Large language models are increasingly deployed on edge devices with tight power and area budgets. While mixed-precision GEMM reduces arithmetic complexity, quantized inference is often dominated by dequantization and nonlinear operators. Lookup table (LUT)-based methods mitigate these costs by precomputing outputs and replacing repeated arithmetic with table lookups, but existing designs incur significant capacity and lookup-latency overheads.

arXiv CS 1d ago
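
The precompute-then-look-up idea mentioned in the PALUTE abstract above can be illustrated with a small sketch. This is not the paper's design: the 4-bit signed operands, the flat table layout, and the names `build_product_lut` and `lut_dot` are all assumptions chosen for illustration. It shows how every product of two low-bit integers can be computed once, after which a dot product needs only table reads and additions.

```python
def build_product_lut(bits=4):
    """Precompute every product of two signed `bits`-bit integers.

    Hypothetical layout: the table is indexed by the two operands'
    unsigned bit patterns, packed as (weight_code << bits) | activation_code.
    """
    size = 1 << bits
    mask = size - 1
    lo, hi = -(size >> 1), size >> 1
    lut = [0] * (size * size)
    for w in range(lo, hi):
        for a in range(lo, hi):
            lut[((w & mask) << bits) | (a & mask)] = w * a
    return lut


def lut_dot(weights, activations, lut, bits=4):
    """Dot product that replaces each multiplication with one table read."""
    mask = (1 << bits) - 1
    return sum(
        lut[((w & mask) << bits) | (a & mask)]
        for w, a in zip(weights, activations)
    )


if __name__ == "__main__":
    lut = build_product_lut()
    weights = [-8, 7, 3, -1]
    activations = [5, -8, 0, 7]
    print(lut_dot(weights, activations, lut))
```

The table here has 256 entries. Its size grows exponentially with operand width, which is one way to see the capacity overhead the abstract refers to.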

Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks

This post is a high-level explainer for my Master’s thesis, which involves designing hardware architectures for ultrafast inference and online learning using the Kolmogorov-Arnold Network (KAN) architecture. I’ll assume familiarity with standard machine learning concepts, as well as some understanding of hardware and digital circuits; read my previous post here for the latter. Please read the two papers below for more...

Hacker News 20h ago

DaVinci Resolve 21

DaVinci Resolve 21 introduces the Photo page, bringing Hollywood's most advanced color tools to still photography! A new generation of AI tools lets you search media by content, read slate data, and perform de-aging, blemish removal, and more. The Edit and Cut pages have improved keyframing and greater graphic format support.

Hacker News 7d ago

Flexible FTN-Aided OTFS Modulation for High-Mobility LEO Satellite-to-Ground Communications

arXiv:2601.22526v2 Announce Type: replace Abstract: In low Earth orbit (LEO) satellite communications, the link quality fluctuates drastically during a satellite pass, exhibiting a wide dynamic range from the horizon to the zenith. Moreover, the high relative velocity induces severe Doppler shifts. While orthogonal time frequency space (OTFS) modulation effectively resolves the doubly-selective fading, its spectral efficiency is fundamentally bounded by the Nyquist limit.

arXiv CS 1d ago

Energy-Efficient Implementation of Spiking Recurrent Cells on FPGA

arXiv:2605.10679v3 Announce Type: replace Abstract: Spiking Neural Networks (SNNs) can reduce energy consumption compared to conventional Artificial Neural Networks (ANNs) when spiking activity is sparse and the neuron model is hardware-friendly. However, biologically faithful models are often too costly to implement on FPGAs, whereas very simple models (e.g., IR/LIF) sacrifice part of the neuronal dynamics. In this work, we present an FPGA accelerator for an SNN using Spiking Recurrent Cell...

arXiv CS 6d ago

AMS-HD: Hyperdimensional Computing for Real-Time and Energy-Efficient Acute Mountain Sickness Detection

arXiv:2602.08916v3 Announce Type: replace Abstract: Objective: Acute mountain sickness (AMS) is the most prevalent altitude illness, affecting unacclimatized individuals ascending above 2,500 m and potentially escalating to life-threatening cerebral or pulmonary edema. Conventional machine learning (ML) methods for AMS detection from wearable physiological signals often fail to meet the real-time hardware efficiency requirements of continuous monitoring. Methods: We present AMS-HD, the first...

arXiv CS 1d ago

IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference

arXiv:2511.21513v2 Announce Type: replace Abstract: Deploying Transformer models on edge devices is limited by latency and energy budgets. While INT8 quantization effectively accelerates the primary matrix multiplications, it exposes the softmax-related path as the dominant bottleneck. This stage incurs a costly dequantize -> softmax -> requantize detour, which can account for up to 65% of total attention latency and disrupts the end-to-end integer dataflow critical for edge hardware efficiency.

arXiv CS 9d ago

Precomputed 1D-CNNs for Atrial Fibrillation Detection on Tiny Smart Sensor Systems

Announce Type: replace Abstract: 1D-CNNs play a crucial role in time-series analysis on tiny smart sensor systems, e.g., for biosignal analysis, predictive maintenance, or structural health monitoring. LUT-based precomputation has emerged as an interesting optimization technique to implement such neural networks on FPGAs. The core idea is to precompute all possible outputs of a neural network layer and store them directly in the lookup tables of the FPGAs.

arXiv CS 9d ago
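
The core idea in the abstract above, precomputing every possible output of a layer, can be sketched in software for a single 1D convolution kernel. This is only an illustration under stated assumptions, not the paper's method: the inputs are assumed to be quantized to four levels, the kernel has three taps with a ReLU activation, and the names `precompute_kernel` and `conv1d_lookup` are hypothetical.

```python
from itertools import product


def precompute_kernel(weights, bias, levels):
    """Map every possible input window to its ReLU-activated output."""
    table = {}
    for window in product(levels, repeat=len(weights)):
        acc = bias + sum(w * x for w, x in zip(weights, window))
        table[window] = max(acc, 0)  # ReLU
    return table


def conv1d_lookup(signal, table, kernel_size):
    """Slide over the signal and read each output from the table."""
    return [
        table[tuple(signal[i:i + kernel_size])]
        for i in range(len(signal) - kernel_size + 1)
    ]


if __name__ == "__main__":
    levels = (0, 1, 2, 3)  # assumed 2-bit quantized inputs
    weights = [1, -2, 1]   # illustrative kernel, not from the paper
    table = precompute_kernel(weights, bias=1, levels=levels)
    print(len(table))      # number of stored windows: 4 ** 3
    print(conv1d_lookup([0, 3, 1, 2, 2, 0], table, kernel_size=3))
```

At inference time no multiplications remain. The cost moves into table storage, which grows as the number of levels raised to the kernel size, so the approach suits small kernels and coarse quantization.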