Compute and Communication Co-design
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
CUCo: An Agentic Framework for Compute and Communication Co-design
Announce Type: replace Abstract: Computation and communication in distributed LLM training and inference are traditionally optimized in isolation; expert-crafted systems such as DeepEP, FLUX, and TokenWeave show the potential of co-design but require deep systems expertise and hardware-specific tuning; CUCo is an agentic framework that automates compute-communication co-design of CUDA kernels by combining a structured design-space formalization with a correctness-first fast-path agent for...
Building user-driven climate adaptation products
Abstract Climate adaptation products have traditionally been developed using a supply-driven model reliant on available climate information, leading to usability gaps1,2,3,4. To better meet user needs, the climate services field has recognized a need to shift towards a demand-driven model emphasizing co-production, that is, user-driven, scientifically informed products created through shared knowledge practices1,2,3,4,5. However, co-production can be challenging, especially for researchers...
An Asynchronous Two-Speed Kalman Filter for Real-Time UUV Cooperative Navigation Under Acoustic Delays
arXiv:2604.02878v2 Announce Type: replace Abstract: In Global Navigation Satellite System (GNSS)-denied underwater environments, individual unmanned underwater vehicles (UUVs) suffer from unbounded dead-reckoning drift, making collaborative navigation (CN) crucial for accurate state estimation. However, the severe communication delay inherent in underwater acoustic channels poses serious challenges to real-time state estimation. Traditional filters, such as Extended Kalman Filters (EKFs) or...
In-Memory Computing Enabled Deep MIMO Detection to Support Ultra-Low-Latency Communications
arXiv:2508.17820v2 Announce Type: replace Abstract: The development of sixth-generation (6G) mobile networks imposes unprecedented latency and reliability demands on multiple-input multiple-output (MIMO) communication systems, a key enabler of high-speed radio access. Recently, deep unfolding-based detectors, which map iterative algorithms onto neural network architectures, have emerged as a promising approach, combining the strengths of model-driven and data-driven methods to achieve high...
HE^2: A Communication-Light Heterogeneous Architecture for Efficient Fully Homomorphic Encryption
arXiv:2605.31004v1 Announce Type: new Abstract: CKKS, an emerging fully homomorphic encryption (FHE) scheme, has been promising in privacy-preserving applications by enabling SIMD fixed-point computations on ciphertexts. Despite its strong security guarantees, CKKS involves both compute-intensive operators (ComOps) with high computational cost and memory-intensive operators (MemOps) with large memory footprints, making existing ASIC-based or NMP-based acceleration approaches suffer from high...
Microsoft’s AI chief says superintelligence is near, but won’t take your job
Today I’m talking with Mustafa Suleyman, the CEO of Microsoft AI. And I’m actually going to keep today’s intro short — I’m working from my wife’s family farm this week, as you’ll see in the video, but also this is a real burner of an episode. We covered everything from Mustafa’s approach to training new models to his criticisms of Anthropic talking about Claude as though it is conscious.
UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing
arXiv:2606.04101v1 Announce Type: new Abstract: Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first...
UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing
arXiv:2606.04101v2 Announce Type: replace Abstract: Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first...
Ahoy, DECmate II the little PDP-8 that could
Now, that's a lot of word processing. But under the hood it's still at least PDP-8 adjacent, even considering its oddities and incompatibilities, and you can make it do many of the things a full-size Eight can. We'll take this basic unit, convert the floppy drives to solid state, tap the video output, and put it through its paces.
Nvidia's entrance into the PC market gives investors another reason to own the stock
Nvidia has added another leg to its investment case, planted far away from the data center. It's on your desk at the office and at home. At the influential Computex conference in Taiwan, CEO Jensen Huang focused the first half of his keynote address on the data center and the wonders of Nvidia's Vera computing platform for agentic AI workloads.