Data Compression
Related Articles from SNS
Data Compression with Stochastic Codes
arXiv:2602.07635v2. Machine learning has had a major impact on data compression over the last decade and opened up many new theoretical and applied fields of inquiry. This paper describes one such direction, relative entropy coding, which focuses on constructing stochastic codes, mainly as an alternative to quantisation and entropy coding in lossy source coding. Our primary aim is to provide a broad overview of the topic, with an emphasis on the...
When Entropy Is Not Enough: Multi-Modal Classification of Encrypted and Compressed Data Fragments
arXiv:2605.31337v1. Reliable identification of encrypted data fragments is essential in cybersecurity, with applications to ransomware detection, digital forensics, and large-scale data analysis. Distinguishing encrypted from compressed fragments is particularly challenging, as short fragments lack structural data and exhibit low statistical redundancy. Traditional statistical methods based on byte-level distributions show limited effectiveness on this task.
Residual Modeling for High-Fidelity Learned Compression of Scientific Data
arXiv:2606.05389v1. Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met.
Tensor Network Lattice Boltzmann Method for Data-Compressed Fluid Simulations
Resolving unsteady transport phenomena in geometrically complex domains is traditionally constrained by polynomial scaling of computational cost with spatial resolution. While methods based on tensor-network data representations or matrix-product states (MPS) data encodings have emerged as a technique to systematically reduce degrees of freedom, existing formulations do not extend to complex geometries and complex flow physics. Both capabilities are offered...
A Geometric Lens on Physics-Aligned Data Compression
arXiv:2606.03279v1. In AI for Science, physics-informed losses are increasingly used to train learned compressors for scientific data, but their rate-distortion implications remain poorly understood. At fixed bitrate, these objectives often improve preservation of a target physical observable while degrading standard reconstruction fidelity. We develop a local geometric theory showing that this tradeoff is governed by the interaction of latent-space sensitivities...
Float8@2bits: Entropy Coding Enables Data-Free Model Compression
arXiv:2601.22787v2. Post-training compression is currently divided into two contrasting regimes. On the one hand, fast, data-free, and model-agnostic methods (e.g., NF4 or HQQ) offer maximum accessibility but suffer from functional collapse at extreme bit-rates below 4 bits. On the other hand, techniques leveraging calibration data or extensive recovery training achieve superior fidelity but impose high computational constraints and face uncertain robustness...
Kore: Binary File Format Optimized for Modern Data Systems (Open Source)
The fastest, most compressed columnar format for big data | v0.1.0
KORE is a high-performance binary file format optimized for analytical workloads. It provides:
- 38% compression ratio (vs 63% for Parquet)
- 131x query speedup with column pruning & predicate pushdown
- Zero data loss verification (400K+ cells tested)
- Native Spark integration: read/write with PySpark
Add this crate as a dependency (when published) or include from path:
use kore_fileformat::*; // Write data...
RAVQ-HoloNet: Rate-Adaptive Vector-Quantized Hologram Compression
arXiv:2511.21035v2. Holography offers significant potential for AR/VR applications. However, its adoption is limited by the high demand for data compression. Existing deep learning approaches generally lack rate adaptivity within a single network and often require multiple models to cover different bandwidth requirements.
TextEconomizer: Enhancing Lossy Text Compression with Denoising Transformers and Entropy Coding
Lossy text compression reduces data size while preserving core meaning, making it well-suited for summarization, automated analysis, and digital archives. Despite the dominance of transformer-based models in language modeling, integrating context vectors and entropy coding into Sequence-to-Sequence (Seq2Seq) generation remains underexplored. A key challenge lies in identifying the most informative context vectors from encoder output and incorporating entropy...
Leveraging Soft Distributions of SSL-Derived Discrete Speech Tokens for Downstream Inference
arXiv:2606.06806v1. Discrete speech tokens obtained from self-supervised learning (SSL) models provide efficient data compression while maintaining strong performance, and have been widely used as intermediate representations in various tasks. However, discretization inevitably causes information loss, leading to degraded performance compared with continuous SSL features. In this work, we propose to apply soft token assignment only during downstream inference.