LLM/VLM

Related Articles from SNS

DAST: A VLM-LLM Framework for Cross-Interface Anomaly Detection in O-RAN

Announce Type: new Abstract: O-RAN enables a disaggregated baseband stack with programmable functions that communicate over standardized open interfaces. The same openness that enables multi-vendor composition also expands the attack surface across logically decoupled tiers that make up the compute continuum. Among these threats, Denial-of-Service and performance-degradation attacks, which account for the majority of catalogued O-RAN threats, are particularly difficult to detect.

arXiv CS 5d ago

Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

Announce Type: replace Abstract: Large language models (LLM) and vision-language models (VLM) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper bound the change of network loss via layer-wise activation-based compression errors, filling a theoretical gap in the literature.

arXiv CS 5d ago

Function2Scene: 3D Indoor Scene Layout from Functional Specifications

arXiv:2605.30819v1 Announce Type: new Abstract: Most text-driven 3D indoor scene synthesis methods generate rooms from object-centric prompts, asking what furniture should be placed rather than how the space is used. Yet in real interior design, a layout is judged by how well it supports its occupants, e.g., their activities and physical needs.

arXiv CS 9d ago

An Analysis Focused on Women's Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset?

arXiv:2605.25806v2 Announce Type: replace Abstract: Women's safety and security are paramount for a modern society. Crimes against women occur in daylight as well as in low-light conditions. Often, such events are captured through real-world surveillance cameras that operate at lower resolutions.

arXiv CS 2d ago

Deep Interest Mining for Intent-Enriched Semantic IDs in Multimodal Generative Recommendation

Announce Type: replace Abstract: Semantic IDs (SIDs) provide the discrete item vocabulary used by generative recommendation, but their quality depends on what item evidence is preserved before quantization. In product recommendation, surface metadata often misses latent usage intent, visual evidence may be only weakly reflected in text, and downstream policy learning provides sparse feedback about whether a generated SID corresponds to a semantically useful item. We introduce...

arXiv CS 8d ago

CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks

Announce Type: replace Abstract: Multi-Agent Reinforcement Learning (MARL) provides a powerful framework for learning coordination in multi-agent systems. However, applying MARL to robotics remains challenging due to their high-dimensional continuous joint action spaces, complex reward design, and non-stationarity from concurrently learning agents. On the other hand, humans often learn complex coordination with the help of coaches, who guide learning through carefully designed curricula and...

arXiv CS 2d ago

"Înțelegi Românește?" A Recipe for Romanian Vision-Language Models

arXiv:2605.31401v2 Announce Type: replace Abstract: Vision-Language Models (VLMs) largely follow the text-only LLM trajectory, excelling on English benchmarks but sharply degrading on low-resource languages, where neither large-scale image-text corpora nor culturally grounded evaluations exist. We present a systematic study of building a language-specific VLM for Romanian, covering the full pipeline from data construction to architectural choices. We translate established English VLM...

arXiv CS 8d ago

"Înțelegi Românește?" A Recipe for Romanian Vision-Language Models

arXiv:2605.31401v1 Announce Type: new Abstract: Vision-Language Models (VLMs) largely follow the text-only LLM trajectory, excelling on English benchmarks but sharply degrading on low-resource languages, where neither large-scale image-text corpora nor culturally grounded evaluations exist. We present a systematic study of building a language-specific VLM for Romanian, covering the full pipeline from data construction to architectural choices. We translate established English VLM training...

arXiv CS 9d ago

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents

arXiv:2605.20306v2 Announce Type: replace Abstract: We introduce WildRoadBench, a wild aerial road-damage grounding benchmark that couples direct visual grounding by vision-language models with autonomous research-and-engineering by LLM-driven agents on a single professionally annotated UAV corpus. The same image set and the same per-class AP_50 metric are evaluated under two protocols. The VLM Track measures whether a fixed VLM can localise domain-specific damage from one image and one...

arXiv CS 7d ago

Enginuity: A Dataset and Benchmark for Vision-Language Understanding of Engineering Diagrams

Announce Type: new Abstract: Engineering diagrams pose a distinct challenge for vision-language models: unlike natural images or general documents, they encode information through dense spatial layouts, domain-specific symbols, and cross-references between visual callouts and structured parts tables. Despite their centrality to service, repair, and design workflows, there is no public benchmark for measuring VLM capabilities in this domain; existing datasets primarily focus on flowcharts,...

arXiv CS 7d ago