Kore: Binary File Format Optimized for Modern Data Systems (Open Source)

Key Points

The fastest, most compressed columnar format for big data | v0.1.0

KORE is a high-performance binary file format optimized for analytical workloads. It provides:

- 38% compression ratio (vs 63% for Parquet)
- 131x query speedup with column pruning & predicate pushdown
- Zero data loss verification (400K+ cells tested)
- Native Spark integration: read/write with PySpark

Add this crate as a dependency (when published) or include from path:

    use kore_fileformat::*;

    // Write data
    kore_write_simple("output.kore", schema_json, data_json)?;

    // Read data
    let data = kore_read_simple("output.kore")?;

    // Read specific column
    let col = kore_read_col_simple("output.kore", "column_name")?;

    // Get file info
    let info = kore_info_simple("output.kore")?;

From PySpark:

    from pyspark.sql import SparkSession
    from kore import KoreDataFrameReader, KoreDataFrameWriter

    spark = SparkSession.builder.appName("KoreExample").getOrCreate()

    # Read Kore file
    df = KoreDataFrameReader(spark).load("data.kore")

    # Write to Kore (38% compression!)
    KoreDataFrameWriter(df).mode("overwrite").save("output.kore")

    # Spark SQL support (3.5+)
    spark.read.format("kore").load("file.kore").show()

See python/README.md for full PySpark documentation.

Publishing checklist

- Ensure Cargo.toml metadata is correct (authors, repository, keywords).
- Add LICENSE file if required (MIT by default here).
- Replace any unimplemented!() stubs with full implementations if you need runtime functionality.
- Run cargo build --release and cargo test to verify compilation and tests.
- Optionally add CI configuration (GitHub Actions) for cargo test and cargo clippy.

Notes

This workspace contains copies of the original KORE source files. Some long implementations were stubbed out in this initial export; if you want the full original source code included verbatim, I can replace the stubs with the complete implementations from the upstream project files.
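The feature list attributes the query speedup to column pruning and predicate pushdown. As a rough illustration of those two general techniques, here is a minimal pure-Python sketch. It does not reflect KORE's actual on-disk layout or API: `ColumnChunk`, `build_row_groups`, and `scan` are hypothetical names invented for this example, and the per-row-group min/max statistics are an assumption based on how columnar formats commonly work.

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """One column's values for one row group, plus min/max statistics."""
    values: list
    min_value: object
    max_value: object


def build_row_groups(rows, columns, group_size):
    """Split row dicts into row groups, storing each column separately."""
    groups = []
    for start in range(0, len(rows), group_size):
        batch = rows[start:start + group_size]
        group = {}
        for name in columns:
            vals = [row[name] for row in batch]
            group[name] = ColumnChunk(vals, min(vals), max(vals))
        groups.append(group)
    return groups


def scan(groups, select, filter_column, lower_bound):
    """Return the `select` columns for rows where filter_column >= lower_bound.

    Also returns the number of column chunks touched, a rough proxy for I/O.
    """
    chunks_read = 0
    result = []
    for group in groups:
        stats = group[filter_column]
        # Predicate pushdown: the statistics alone rule out this row group.
        if stats.max_value < lower_bound:
            continue
        chunks_read += 1  # the filter column itself
        keep = [i for i, v in enumerate(stats.values) if v >= lower_bound]
        # Column pruning: only the requested columns are touched.
        needed = [c for c in select if c != filter_column]
        chunks_read += len(needed)
        for i in keep:
            result.append({c: group[c].values[i] for c in select})
    return result, chunks_read


# Toy data (hypothetical): 1000 rows, 3 columns, 10 row groups of 100 rows.
rows = [{"id": i, "price": i * 10, "note": f"row-{i}"} for i in range(1000)]
groups = build_row_groups(rows, ["id", "price", "note"], group_size=100)

matches, chunks_read = scan(
    groups, select=["id"], filter_column="price", lower_bound=9500
)
total_chunks = len(groups) * 3
print(len(matches), chunks_read, total_chunks)  # prints: 50 2 30
```

In this toy run only 2 of 30 column chunks are touched: nine row groups are skipped from their statistics alone, and the unused `note` column is never read. The benefit depends heavily on how the data is sorted and how selective the filter is, so this says nothing about the specific 131x figure quoted above.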
Originally published by Hacker News