Modern Data Systems

Related Articles from SNS

Kore: Binary File Format Optimized for Modern Data Systems (Open Source)

The fastest, most compressed columnar format for big data | v0.1.0

KORE is a high-performance binary file format optimized for analytical workloads. It provides:

- 38% compression ratio (vs 63% for Parquet)
- 131x query speedup with column pruning & predicate pushdown
- Zero data loss verification (400K+ cells tested)
- Native Spark integration: read/write with PySpark

Add this crate as a dependency (when published) or include from path: use kore_fileformat::*; // Write data...

Hacker News 10d ago

Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems

Announce Type: replace Abstract: Modern AI systems are typically developed through multiple stages (pretraining, fine-tuning rounds, and subsequent adaptation or alignment), where each stage builds on the previous ones and updates the model in distinct ways. This raises a critical question of accountability: when a deployed model succeeds or fails, which stage is responsible, and to what extent? We pose the accountability attribution problem for tracing model behavior back to specific stages...

arXiv CS 9d ago

Architectural Evolution and Selection Framework for Database Systems in AI-Ready Data Platforms

arXiv:2606.08317v1 Announce Type: new Abstract: The rise of polyglot data management and AI-ready database architectures has created a complex design space across diverse database paradigms. However, architecture selection in modern enterprise environments continues to rely heavily on ad-hoc engineering intuition, with limited systematic frameworks to guide decision-making across heterogeneous database systems.

arXiv CS 1d ago

TeeDAO: A Decentralized Autonomous Organization for Heterogeneous TEEs

Announce Type: new Abstract: Trusted Execution Environments (TEEs) have emerged as a critical technology for safeguarding sensitive data and ensuring code integrity in modern computing systems. However, relying on a single TEE implementation makes systems vulnerable to a central point of attack. Building distributed-trust systems leveraging heterogeneous TEEs helps disperse trust but still faces threats from centralized management and adaptive mobile adversaries.

arXiv CS 6d ago

S3TS: Stochastic Scenario-Structured Tree Search for Advanced Planning Under Uncertainty

arXiv:2606.02151v1 Announce Type: new Abstract: Effective scheduling in the energy sector is essential to ensure the reliable operation of electrical grids and their connected assets by, for instance, optimizing the dispatch of generation units and storage systems. An effective planning strategy must (a) accommodate advanced and potentially non-linear system models -- exploiting the increasing data availability of modern grids, and (b) explicitly handle uncertainties arising, for instance,...

arXiv CS 8d ago

Scalable Temporal Anomaly Causality Discovery in Large Systems: Achieving Computational Efficiency with Binary Anomaly Flag Data

arXiv:2412.11800v4 Announce Type: replace Abstract: Extracting anomaly causality facilitates diagnostics once monitoring systems detect system faults. Identifying anomaly causes in large systems involves investigating a broader set of monitoring variables across multiple subsystems. However, learning graphical causal models (GCMs) comes with a significant computational burden that restrains the applicability of most existing methods in real-time and large-scale deployments.

arXiv CS 5d ago

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

Announce Type: new Abstract: Retrieval systems underpin modern AI applications -- spanning visual search, recommendation engines, and multi-modal question answering. Modern multi-stage retrieval systems require the joint optimization of highly coupled parameters, yet traditional hyperparameter optimization (HPO) methods -- including Tree-structured Parzen Estimators (TPE) and Gaussian Process Bayesian Optimization -- rely on an independence assumption that fundamentally prevents them from...

arXiv CS 5d ago

A Geometric View of Counterfactual Behavior: Interaction of Boundary Proximity and Local Support

arXiv:2606.04209v1 Announce Type: new Abstract: Counterfactual explanations seek small, semantically meaningful changes to an input that alter a model's prediction, and are widely used to interpret and audit machine learning systems. In modern vision, language, and multimodal systems, pretrained encoders map inputs to representation spaces, and downstream classifier heads impose decision boundaries within those spaces. As a result, the feasibility and distance of nearby counterfactuals...

arXiv CS 6d ago

RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases

arXiv:2606.03040v1 Announce Type: new Abstract: Relational databases underpin modern enterprise, scientific, and healthcare systems, yet predictive machine learning on such data remains challenging due to their multi-table, heterogeneous, and temporal structure. Relational Deep Learning (RDL) addresses this by representing databases as heterogeneous graphs and applying graph neural networks (GNNs) directly. RelBench v2 recently introduced autocomplete tasks -- a practically motivated task...

arXiv CS 7d ago

Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination

Announce Type: new Abstract: Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility.

arXiv CS 6d ago