
LMM

Related Articles from SNS

LMM-IR: Large-Scale Netlist-Aware Multimodal Framework for Static IR-Drop Prediction

arXiv:2511.12581v2. Abstract: Static IR drop analysis is a fundamental and critical task in chip design. Nevertheless, this process can be quite time-consuming, potentially requiring several hours. Moreover, addressing IR drop violations frequently demands iterative analysis, which adds to the computational burden.

arXiv CS 6d ago

Multimodal Function Vectors for Visual Relations

arXiv:2510.02528v2. Abstract: Large Multimodal Models (LMMs) demonstrate impressive in-context learning abilities from few multimodal demonstrations, yet the internal mechanisms supporting such task learning remain opaque. Building on prior work on Large Language Models, we show that a small subset of attention heads in Large Multimodal Models is responsible for transmitting representations of visual relations. The activations of these attention heads, termed function...

arXiv CS 8d ago

Video Reasoning without Training

arXiv:2510.17045v2. Abstract: Video reasoning using Large Multimodal Models (LMMs) relies on costly reinforcement learning (RL) and verbose chain-of-thought, resulting in substantial computational overhead during both training and inference. Moreover, the mechanisms that control the thinking process in these reasoning models are very limited. In this paper, we use the entropy of the model's output distribution as a signal to study and guide reasoning behavior.

arXiv CS 8d ago

Repeatability and Heritability of UAV-Derived Canopy Traits in a Cassava Breeding Population Using Time-Series Data from Two Consecutive Growing Seasons

Cassava is a major staple crop in tropical regions, particularly in Sub-Saharan Africa, yet its productivity remains constrained by genetic and agronomic limitations. A major bottleneck in cassava breeding is the difficulty of accurately phenotyping agronomic traits under field conditions using conventional, labor-intensive methods. Here, we evaluated the potential of uncrewed aerial vehicle (UAV)-based phenotyping to quantify canopy growth traits and assess their genetic relevance under...

bioRxiv 4d ago

Latent Implicit Visual Reasoning

arXiv:2512.21218v2. Abstract: While Large Multimodal Models (LMMs) have made significant progress, they remain largely text-centric, relying on language as their core reasoning modality. As a result, they are limited in their ability to handle reasoning tasks that are predominantly visual. Recent approaches have sought to address this by supervising intermediate visual steps with helper images, depth maps, or image crops.

arXiv CS 5d ago

Beyond Generative Decoding: Discriminative Hidden-State Readout from a Native Omni-Modal LLM for Multimodal Sentiment Analysis

arXiv:2606.05713v1. Abstract: Multimodal sentiment analysis (MSA) infers human affect from language, acoustic, and visual signals. Recent methods increasingly adapt large multimodal models (LMMs) via generative readout: prompting the model to emit a sentiment score as a text string. While convenient, this ties continuous regression to discrete autoregressive decoding, incurring unmeasured costs.

arXiv CS 5d ago

Whole-genome duplication shaped cell-type evolution in the vertebrate brain

Abstract: The complex brains of vertebrates have more cell types than those of their closest relatives. Whole-genome duplications (WGDs) occurred during early vertebrate evolution, but it is unclear whether the duplicated genes (ohnologues) facilitated cell-type evolution. Here using brain single-cell transcriptomes from five chordates (human, mouse, lizard, lamprey and amphioxus), we report that many cell-type families with conserved core transcription factors in vertebrates do not show...

Nature 18h ago

SRUG: Shadow-Guided Relightable Urban Scene with Generation Model

arXiv:2605.24700v3. Abstract: Creating relightable urban scenes from images or videos is widely useful but highly ill-posed. Urban environments are typically unbounded and extend beyond the visible regions. As a result, many portions of the scene remain unobserved, yet these invisible regions can cast shadows onto visible areas.

arXiv CS 9d ago