Programming Domain-Specific FPGA Hardblocks from HLS: An RTL Blackbox Approach

arXiv CS | Tuesday 09 June 2026, 04:00 UTC | By Ruthwik Reddy Sunketa, Jeevesh Choudhury, Aman Arora

arXiv:2606.08380v1

Abstract: Domain-specific Field Programmable Gate Array (FPGA) architectures increasingly integrate specialized hardblocks, such as Tensor Slices, to accelerate artificial intelligence and machine learning workloads. Despite their efficiency benefits, these architectures remain difficult to program because designers typically rely on manual Register-Transfer Level (RTL) integration to access these hardblocks. This paper presents a compiler-agnostic methodology that enables high-level synthesis (HLS) tools to target custom FPGA hardblocks directly from C/C++ code. Architectural hardblocks are exposed as schedulable C-level operators using an RTL blackbox abstraction with explicit latency and initiation-interval contracts, allowing the HLS scheduler to optimize around specialized hardware without manual RTL orchestration. Unlike traditional uses of HLS blackboxes for external IP integration, our approach treats blackboxes as architectural abstractions, enabling scalable composition of C-level operators that target custom FPGA hardblocks without compiler modification. We evaluate the proposed flow using a Tensor Slice-based FPGA architecture with AMD Vitis HLS and the Verilog-to-Routing (VTR) toolchain. Across multiple matrix sizes, designs generated using the proposed C-Blackbox flow achieve lower area-delay product than behavioral HLS baselines while providing substantially higher productivity-adjusted efficiency than handwritten RTL implementations. These results demonstrate that domain-specific FPGA architectures can be made accessible through HLS while maintaining competitive hardware efficiency.
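To make the idea of a "schedulable C-level operator" concrete, here is a minimal C++ sketch of what the C side of such a flow might look like. It is not taken from the paper. The operator name `ts_dot4`, its 4-lane int8 interface, the 8x8 matrix size, and the latency and initiation-interval numbers are all hypothetical. In a blackbox flow the operator's C body would serve only as the functional model for C simulation, while the HLS tool would bind each call to an RTL wrapper around the hardblock and take the timing contract from a separate descriptor, which is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// All three constants are assumptions for illustration, not values from the paper.
constexpr int kLanes = 4;      // assumed operator width
constexpr int kOpLatency = 3;  // assumed latency contract, in cycles
constexpr int kOpII = 1;       // assumed initiation-interval contract, in cycles

using Lane = std::array<int8_t, kLanes>;

// Hypothetical C-level operator: 4-lane int8 dot product with a 32-bit
// accumulator input. This body is only the functional (simulation) model.
int32_t ts_dot4(const Lane& a, const Lane& b, int32_t acc) {
    for (int i = 0; i < kLanes; ++i) {
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    }
    return acc;
}

constexpr int kN = 8;  // assumed matrix dimension
static_assert(kN % kLanes == 0, "matrix dimension must be a multiple of the lane count");

using MatIn = std::array<std::array<int8_t, kN>, kN>;
using MatOut = std::array<std::array<int32_t, kN>, kN>;

// Matrix multiply composed from the operator. The second input is passed
// transposed (b_t[j][k] == B[k][j]) so both operands are read row-wise.
void matmul_ts(const MatIn& a, const MatIn& b_t, MatOut& c) {
    for (int i = 0; i < kN; ++i) {
        for (int j = 0; j < kN; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < kN; k += kLanes) {
                Lane va{}, vb{};
                for (int l = 0; l < kLanes; ++l) {
                    va[l] = a[i][k + l];
                    vb[l] = b_t[j][k + l];
                }
                acc = ts_dot4(va, vb, acc);  // one operator call per lane group
            }
            c[i][j] = acc;
        }
    }
}

// Standard pipelining estimate for a stream of independent operator calls:
// the first result appears after `latency` cycles, then one more every `ii`.
long pipelined_cycles(long calls, int latency, int ii) {
    assert(calls > 0 && latency > 0 && ii > 0);
    return latency + (calls - 1) * static_cast<long>(ii);
}
```

With these assumed numbers, the 8x8 multiply issues 8 * 8 * 2 = 128 operator calls, so `pipelined_cycles(128, kOpLatency, kOpII)` gives a lower-bound estimate for the ideal case where the calls are independent. The accumulator chain inside each output element is a dependence that a real scheduler would also have to respect.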

Originally published by arXiv CS
