Tensor Slice

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Programming Domain-Specific FPGA Hardblocks from HLS: An RTL Blackbox Approach

Abstract: Domain-specific Field Programmable Gate Array (FPGA) architectures increasingly integrate specialized hardblocks, such as Tensor Slices, to accelerate artificial intelligence and machine learning workloads. Despite their efficiency benefits, these architectures remain difficult to program because designers typically rely on manual Register-Transfer Level (RTL) integration to access these hardblocks. This paper presents a compiler-agnostic methodology that enables...

arXiv CS 1d ago

Parallelizing Large-Scale Tensor Network Contraction on Multiple GPUs

arXiv:2606.01852v1. Abstract: Exact tensor network contraction underpins quantum circuit simulation, quantum error correction, combinatorial optimization, and many-body dynamics. The dominant parallelization strategy, slicing, scales exponentially and incurs redundant computation. We present a multi-GPU framework that instead distributes intermediate tensors across devices with explicit communication, converting a fixed contraction path into a communication-efficient...

arXiv CS 8d ago

Asymptotic tensor rank is characterized by polynomials

arXiv:2411.15789v2. Abstract: Asymptotic tensor rank is notoriously difficult to determine. Indeed, determining its value for the $2\times 2$ matrix multiplication tensor would determine the matrix multiplication exponent, a long-standing open problem. On the other hand, Strassen's asymptotic rank conjecture makes the bold claim that asymptotic tensor rank equals the largest dimension of the tensor and is thus as easy to compute as matrix rank.

arXiv CS 1d ago

Factorizing binary tensors into quantics tensor trains

Abstract: The conversion of functions to quantics tensor trains is a well-established procedure and can either be done analytically or numerically. Numerical conversion schemes are based on singular value decompositions, where access to the full tensor is necessary, or on cross interpolations, which only depend on sampling a function. When dealing with large binary tensors, the first approach becomes prohibitively expensive while the second approach might fail to converge due to the...

arXiv Physics 6d ago

A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

The previous post covered getting Gemma 4’s MTP drafters quantized and paired with a verifier. This one is about running the result on a machine that has no business running it. I have a recycled server.

Hacker News 9d ago

Bringing Up DeepSeek-V4-Flash on AMD MI300X

At Doubleword we are building an inference cloud designed for volume. To do that we have to reckon with the enveloping compute shortage. AMD’s MI300X launched in December 2023 (at AMD’s “Advancing AI” event, 6 December 2023).

Hacker News 8d ago

Graph Traversal on Tensor Cores: A BFS Framework for Modern GPUs

arXiv:2606.05081v1. Abstract: Modern GPUs have Tensor Cores (TCs) capable of extremely high-throughput matrix operations, yet graph algorithms remain difficult to accelerate because of their irregular and data-dependent execution patterns. This work presents BLEST, a TC-accelerated framework that reformulates Breadth-First Search (BFS) as a bit-level sparse matrix-vector computation while addressing the load imbalance, memory inefficiency, and synchronization overheads that...

arXiv CS 6d ago

Finite-Iteration Local Dynamics and Warm Starts for Alternating Power Iteration in Spiked Tensor PCA

Abstract: We study simultaneous alternating power iteration for fixed-order asymmetric rank-one spiked tensor models. Our main contribution is a finite-iteration local theory that is independent of any particular initialization. Once the iterates enter a sufficiently small neighborhood of the planted rank-one direction, their error decomposes into a geometrically decaying transient and an intrinsic noise floor caused by fixed orthogonal noise contractions at the planted...

arXiv CS 6d ago