Tensor Slices
Related Articles from SNS
Programming Domain-Specific FPGA Hardblocks from HLS: An RTL Blackbox Approach
Abstract: Domain-specific Field Programmable Gate Array (FPGA) architectures increasingly integrate specialized hardblocks, such as Tensor Slices, to accelerate artificial intelligence and machine learning workloads. Despite their efficiency benefits, these architectures remain difficult to program because designers typically rely on manual Register-Transfer Level (RTL) integration to access these hardblocks. This paper presents a compiler-agnostic methodology that enables...
Parallelizing Large-Scale Tensor Network Contraction on Multiple GPUs
arXiv:2606.01852v1. Abstract: Exact tensor network contraction underpins quantum circuit simulation, quantum error correction, combinatorial optimization, and many-body dynamics. The dominant parallelization strategy, slicing, scales exponentially and incurs redundant computation. We present a multi-GPU framework that instead distributes intermediate tensors across devices with explicit communication, converting a fixed contraction path into a communication-efficient...
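As a rough illustration of the slicing strategy the abstract refers to (not the paper's multi-GPU framework), the sketch below fixes one contracted index to each of its values, contracts every slice independently, and sums the partial results. The three-tensor ring network and the function names `contract_full`, `contract_slice`, and `contract_sliced` are assumptions made for this example only.

```python
from itertools import product


def contract_full(A, B, C):
    """Contract sum over i, j, k of A[i][j] * B[j][k] * C[k][i] in one pass."""
    n_i, n_j, n_k = len(A), len(B), len(C)
    return sum(
        A[i][j] * B[j][k] * C[k][i]
        for i, j, k in product(range(n_i), range(n_j), range(n_k))
    )


def contract_slice(A, B, C, j):
    """Contract the same network with the index j fixed to a single value."""
    n_i, n_k = len(A), len(C)
    return sum(
        A[i][j] * B[j][k] * C[k][i]
        for i, k in product(range(n_i), range(n_k))
    )


def contract_sliced(A, B, C):
    """Sum the independent slices; each one could run on a separate device."""
    n_j = len(B)
    return sum(contract_slice(A, B, C, j) for j in range(n_j))


# Small integer tensors so the two results can be compared exactly.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

full_result = contract_full(A, B, C)
sliced_result = contract_sliced(A, B, C)
```

Each slice needs no communication with the others, which is what makes the approach easy to parallelize. The number of slices is the product of the sliced index dimensions, so it multiplies with every additional sliced index.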
Asymptotic tensor rank is characterized by polynomials
arXiv:2411.15789v2. Abstract: Asymptotic tensor rank is notoriously difficult to determine. Indeed, determining its value for the $2\times 2$ matrix multiplication tensor would determine the matrix multiplication exponent, a long-standing open problem. On the other hand, Strassen's asymptotic rank conjecture makes the bold claim that asymptotic tensor rank equals the largest dimension of the tensor and is thus as easy to compute as matrix rank.
Factorizing binary tensors into quantics tensor trains
Abstract: The conversion of functions to quantics tensor trains is a well-established procedure and can either be done analytically or numerically. Numerical conversion schemes are based on singular value decompositions, where access to the full tensor is necessary, or on cross interpolations, which only depend on sampling a function. When dealing with large binary tensors, the first approach becomes prohibitively expensive while the second approach might fail to converge due to the...
A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)
The previous post covered getting Gemma 4’s MTP drafters quantized and paired with a verifier. This one is about running the result on a machine that has no business running it. I have a recycled server.
Bringing Up DeepSeek-V4-Flash on AMD MI300X
At Doubleword we are building an inference cloud designed for volume. To do that we have to reckon with the enveloping compute shortage. AMD’s MI300X launched at AMD’s “Advancing AI” event on 6 December 2023.
Graph Traversal on Tensor Cores: A BFS Framework for Modern GPUs
arXiv:2606.05081v1. Abstract: Modern GPUs have Tensor Cores (TCs) capable of extremely high-throughput matrix operations, yet graph algorithms remain difficult to accelerate because of their irregular and data-dependent execution patterns. This work presents BLEST, a TC-accelerated framework that reformulates Breadth-First Search (BFS) as a bit-level sparse matrix-vector computation while addressing the load imbalance, memory inefficiency, and synchronization overheads that...
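The general idea of treating BFS as a bit-level matrix-vector computation can be sketched on the CPU as follows. This is not BLEST and uses no Tensor Cores. It is a minimal sketch that assumes the graph is stored as one bitmask of incoming neighbors per vertex, and the helper names `build_in_masks` and `bfs_levels` are hypothetical.

```python
def build_in_masks(n, edges):
    """Return one integer bitmask per vertex; bit u of masks[v] means edge u -> v."""
    masks = [0] * n
    for u, v in edges:
        masks[v] |= 1 << u
    return masks


def bfs_levels(in_masks, source):
    """Level-synchronous BFS as repeated Boolean matrix-vector products.

    The frontier is a bit vector. A vertex v joins the next frontier when
    (in_masks[v] AND frontier) is nonzero and v has not been visited yet.
    Unreachable vertices keep level -1.
    """
    n = len(in_masks)
    level = [-1] * n
    level[source] = 0
    frontier = 1 << source
    visited = frontier
    depth = 0
    while frontier:
        depth += 1
        nxt = 0
        for v in range(n):
            unvisited = not (visited >> v) & 1
            if unvisited and (in_masks[v] & frontier):
                nxt |= 1 << v
                level[v] = depth
        visited |= nxt
        frontier = nxt
    return level


# Directed graph: 0 -> 1, 0 -> 2, 1 -> 3; vertex 4 is isolated.
masks = build_in_masks(5, [(0, 1), (0, 2), (1, 3)])
levels = bfs_levels(masks, 0)
```

Each iteration of the `while` loop is one Boolean product of the adjacency structure with the frontier vector, masked by the set of unvisited vertices. Hardware approaches batch many such bitwise row operations into matrix instructions.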
Finite-Iteration Local Dynamics and Warm Starts for Alternating Power Iteration in Spiked Tensor PCA
Abstract: We study simultaneous alternating power iteration for fixed-order asymmetric rank-one spiked tensor models. Our main contribution is a finite-iteration local theory that is independent of any particular initialization. Once the iterates enter a sufficiently small neighborhood of the planted rank-one direction, their error decomposes into a geometrically decaying transient and an intrinsic noise floor caused by fixed orthogonal noise contractions at the planted...