Home Knowledge Base the Model Openness Framework

the Model Openness Framework

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

Announce Type: replace Abstract: While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To address this, we introduce a comprehensive benchmark called VLA-Arena. We propose a novel structured task design framework to quantify difficulty across three orthogonal axes: (1) Task Structure, (2) Language Command, and (3) Visual Observation.

arXiv CS 7d ago

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models

arXiv:2606.04739v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong potential for automated software vulnerability detection, particularly in retrieval-augmented generation (RAG) settings. However, for approaches relying on proprietary models and APIs, reproducibility and replicability remain largely unexplored, raising the question of whether reported results generalize or depend primarily on specific model choices. In this work, we present a reproducibility study...

arXiv CS 6d ago

Anthropic's open-source framework for AI-powered vulnerability discovery

A reference implementation for autonomous vulnerability discovery and remediation with Claude, based on our learnings from partnering with security teams at several organizations since launching Claude Mythos Preview. For a write up of these learnings along with best practices, see the accompanying blog post (also available in blog-post.md ). For a lightweight SDK-only walkthrough of the same recon → find → triage → report → patch loop, see the companion cookbook.

Hacker News 5d ago

Magenta RealTime 2: Open and Local Live Music Models

We’re excited to share Magenta RealTime 2 (MRT2), a state-of-the-art open model and efficient real-time inference engine that enables you to build and play AI musical instruments on your laptop! To get started, download the apps on your MacBook (requires Apple Silicon). Unlike other large generative music models that work offline to turn a prompt into a track, MRT2 is a live, interactive model that you can control with MIDI and audio, in addition to text.

Hacker News 5d ago

Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents

arXiv:2606.06242v1 Announce Type: new Abstract: Institutional documents contain substantial amounts of operational and analytical information embedded within figures and tables. Current approaches for extracting visual content from documents are largely built around generic document layout analysis, where figures and tables are treated as uniformly relevant document objects rather than semantically meaningful analytical artifacts. In this work, we introduce a benchmark dataset and evaluation...

arXiv CS 5d ago

A Judge-Aware Ranking Framework for Evaluating Large Language Models without Ground Truth

arXiv:2601.21817v2 Announce Type: replace-cross Abstract: Evaluating large language models (LLMs) on open-ended tasks without ground-truth labels is increasingly done via the LLM-as-a-judge paradigm. A critical but under-modeled issue is that judge LLMs differ substantially in reliability; treating all judges equally can yield biased leaderboards and misleading uncertainty estimates. More data can make evaluation more confidently wrong under misspecified aggregation.

arXiv CS 5d ago

ROTS 2.0: A reproducibility-driven framework for robust statistical modeling across diverse high-throughput omics study designs

Reproducibility is fundamental to reliable scientific discoveries. The reproducibility-optimized test statistic (ROTS) is a robust framework designed to identify reproducible features (e.g. genes or proteins) in high-dimensional differential expression analyses such as transcriptomics and proteomics. This is achieved by optimizing the reproducibility of feature rankings under resampling.

bioRxiv 7d ago

A Latent Variable Framework for Scaling Laws in Large Language Models

Announce Type: replace-cross Abstract: We propose a statistical framework built on latent variable modeling for scaling laws of large language models (LLMs). Our work is motivated by the rapid emergence of numerous new LLM families with distinct architectures and training strategies, evaluated on an increasing number of benchmarks. This heterogeneity makes a single global scaling curve inadequate for capturing how performance varies across families and benchmarks.

arXiv CS 6d ago

A Theoretical Framework for Statistical Evaluability of Generative Models

arXiv:2604.05324v2 Announce Type: replace Abstract: Statistical evaluation aims to estimate the generalization performance of a model using held-out i.i.d. test data sampled from the ground-truth distribution. In supervised learning settings such as classification, performance metrics such as error rate are well-defined, and test error reliably approximates population error given sufficiently large datasets.

arXiv CS 8d ago

Ultra-Fast Implementation of Multivariate GWAS in Genomic SEM Using Flexible Analytic Estimation

Many medical, physiological, and psychiatric traits and disorders are highly polygenic and exhibit complex patterns of genetic sharing and differentiation. In 2018, we introduced Genomic Structural Equation Modelling (Genomic SEM) as a formal framework and free, open source, R-based software for modelling the multivariate genetic architecture of both continuous and binary Genome-Wide Association Study (GWAS) phenotypes, interrogating their joint and distinct functional genomic pathways, and...

bioRxiv 6d ago