The Model Openness Framework
Related Articles from SNS
VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models
Abstract: While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To address this, we introduce a comprehensive benchmark called VLA-Arena. We propose a novel structured task design framework to quantify difficulty across three orthogonal axes: (1) Task Structure, (2) Language Command, and (3) Visual Observation.
Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models
arXiv:2606.04739v1. Abstract: Large language models (LLMs) have shown strong potential for automated software vulnerability detection, particularly in retrieval-augmented generation (RAG) settings. However, for approaches relying on proprietary models and APIs, reproducibility and replicability remain largely unexplored, raising the question of whether reported results generalize or depend primarily on specific model choices. In this work, we present a reproducibility study...
Anthropic's open-source framework for AI-powered vulnerability discovery
A reference implementation for autonomous vulnerability discovery and remediation with Claude, based on our learnings from partnering with security teams at several organizations since launching Claude Mythos Preview. For a write-up of these learnings along with best practices, see the accompanying blog post (also available in blog-post.md). For a lightweight SDK-only walkthrough of the same recon → find → triage → report → patch loop, see the companion cookbook.
Magenta RealTime 2: Open and Local Live Music Models
We’re excited to share Magenta RealTime 2 (MRT2), a state-of-the-art open model and efficient real-time inference engine that enables you to build and play AI musical instruments on your laptop! To get started, download the apps on your MacBook (requires Apple Silicon). Unlike other large generative music models that work offline to turn a prompt into a track, MRT2 is a live, interactive model that you can control with MIDI and audio, in addition to text.
Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents
arXiv:2606.06242v1. Abstract: Institutional documents contain substantial amounts of operational and analytical information embedded within figures and tables. Current approaches for extracting visual content from documents are largely built around generic document layout analysis, where figures and tables are treated as uniformly relevant document objects rather than semantically meaningful analytical artifacts. In this work, we introduce a benchmark dataset and evaluation...
A Judge-Aware Ranking Framework for Evaluating Large Language Models without Ground Truth
arXiv:2601.21817v2. Abstract: Evaluating large language models (LLMs) on open-ended tasks without ground-truth labels is increasingly done via the LLM-as-a-judge paradigm. A critical but under-modeled issue is that judge LLMs differ substantially in reliability; treating all judges equally can yield biased leaderboards and misleading uncertainty estimates. More data can make evaluation more confidently wrong under misspecified aggregation.
ROTS 2.0: A reproducibility-driven framework for robust statistical modeling across diverse high-throughput omics study designs
Reproducibility is fundamental to reliable scientific discoveries. The reproducibility-optimized test statistic (ROTS) is a robust framework designed to identify reproducible features (e.g. genes or proteins) in high-dimensional differential expression analyses such as transcriptomics and proteomics. This is achieved by optimizing the reproducibility of feature rankings under resampling.
A Latent Variable Framework for Scaling Laws in Large Language Models
Abstract: We propose a statistical framework built on latent variable modeling for scaling laws of large language models (LLMs). Our work is motivated by the rapid emergence of numerous new LLM families with distinct architectures and training strategies, evaluated on an increasing number of benchmarks. This heterogeneity makes a single global scaling curve inadequate for capturing how performance varies across families and benchmarks.
A Theoretical Framework for Statistical Evaluability of Generative Models
arXiv:2604.05324v2. Abstract: Statistical evaluation aims to estimate the generalization performance of a model using held-out i.i.d. test data sampled from the ground-truth distribution. In supervised learning settings such as classification, performance metrics such as error rate are well-defined, and test error reliably approximates population error given sufficiently large datasets.
Ultra-Fast Implementation of Multivariate GWAS in Genomic SEM Using Flexible Analytic Estimation
Many medical, physiological, and psychiatric traits and disorders are highly polygenic and exhibit complex patterns of genetic sharing and differentiation. In 2018, we introduced Genomic Structural Equation Modelling (Genomic SEM) as a formal framework and free, open-source, R-based software for modelling the multivariate genetic architecture of both continuous and binary Genome-Wide Association Study (GWAS) phenotypes, interrogating their joint and distinct functional genomic pathways, and...