Model Families
Related Articles from SNS
Esoteric Language Models: A Family of Any-Order Diffusion LLMs
arXiv:2506.01928v4 Announce Type: replace Abstract: Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Within this family, Masked Diffusion Models (MDMs) currently perform best but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. We introduce Eso-LMs, a new family of models that fuses AR and MDM paradigms, smoothly interpolating...
Breaking the Tokenizer Barrier: On-Policy Distillation across Model Families
arXiv:2606.09456v1 Announce Type: new Abstract: On-Policy Distillation (OPD) has become a core technique in the post-training of Large Language Models (LLMs) for transferring knowledge from domain experts to student models. However, existing OPD methods require teacher and student models to share the same tokenizer, restricting the applicability of OPD to models within the same series. Current mainstream practice typically employs Supervised Fine-Tuning (SFT) on teacher-generated...
Identifying unique developers in OSS projects: A family of models
arXiv:2606.08096v1 Announce Type: new Abstract: Organizational and logical coupling metrics require reliable identification of unique developers. In OSS, commit metadata is limited to names and emails, and the same developer may appear under multiple aliases, which can distort coupling measurements if de-duplication is missing. We aim to build a scalable and accurate pipeline for OSS developer de-duplication and to provide guidance on choosing a model based on precision vs. computational effort.
Identifiable Markov Switching Models with Instantaneous Effects and Exponential Families
arXiv:2606.02231v1 Announce Type: cross Abstract: Temporal systems often exhibit non-stationary behaviour, such as seasonal climate variation or glucose fluctuations in patients with type-1 diabetes. One way to model non-stationarity is through discrete latent regimes, i.e., stationary segments of time. Such systems induce a Markov Switching Model (MSM), a class of Hidden Markov Models with autoregressive dependencies among latent regimes and observed variables.
ProtGPT3: an Open-source family of Promptable and Aligned Protein Language Models
Generative protein language models (pLMs) enable exploration of vast sequence spaces for protein design, but reliably controlling generation toward desired functional families remains challenging. While protein generation has broadly followed trends in NLP, two directions remain underexplored: alignment methods that optimize model behavior toward design objectives, and prompting-based control at inference time without fine-tuning. We introduce ProtGPT3, an open-source family of protein...
Probabilistic Data-Driven Modelling of Astrophysical Transients: The Neural Process Family for Ultrafast and Class-Agnostic Light Curve Reconstruction with NightLANP
Announce Type: replace-cross Abstract: Astrophysical observations from Earth are subject to weather, environmental, and scientific constraints that lead to sparse, irregular light curves. On the eve of the Vera C. Rubin Observatory Legacy Survey of Space and Time, its dataset offers unprecedented opportunities for transient science. Yet a key challenge remains its cadence, sparse and irregular across six bands, limiting inference.
Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
arXiv:2605.30789v1 Announce Type: new Abstract: We identify a new dimension for enhancing rollout diversity in Group Relative Policy Optimization (GRPO) for LLMs. While GRPO relies on diverse rollouts, prevailing strategies primarily increase diversity by injecting more token-level randomness, which may introduce step-wise noise and lead to incoherent trajectories. We uncover that smaller models within the same model family inherently exhibit higher policy-level diversity, indicated by their...
NutriMLLM: Multimodal Large Language Models for Dietary Micronutrient Analysis
arXiv:2606.08948v1 Announce Type: new Abstract: Comprehensive estimation of dietary micronutrients from food images could improve clinical nutrition care, but training such models requires large multimodal datasets linking diverse foods to complete nutrient profiles. We first show that existing multimodal large language models (MLLMs), including leading proprietary models, are unreliable for this task. Across five model families and four independent evaluation benchmarks (ASA24, SNAPMe,...
ELF: A Family of Encoder-Free ECG-Language Models
arXiv:2601.18798v2 Announce Type: replace Abstract: ECG-Language Models (ELMs) extend recent advances in Multimodal Large Language Models (MLLMs) to automated ECG interpretation. However, most existing ELMs inherit Vision-Language Model (VLM) design choices and rely on pretrained ECG encoders, introducing substantial architectural and training complexity. Inspired by encoder-free VLMs, we introduce ELF, a family of three encoder-free ELM architectures that remain competitive with, and often...