Visual Prompt Engineering

Related Articles from SNS

Imagine Before You Draw: Visual Prompt Engineering for Image Generation

arXiv:2606.04457v1 Announce Type: new Abstract: Incorporating visual semantic representations as an intermediate step before image generation can reduce the modeling difficulty between text and images, thereby improving generation quality. Recent works such as X-Omni and BLIP3o-Next have explored this direction, but they typically use a two-stage external pipeline: a separate autoregressive model first generates semantic tokens, which are then fed as conditioning to an independent diffusion...

arXiv CS 6d ago

Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models

arXiv:2503.08434v5 Announce Type: replace Abstract: Recent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts; however, while traditional photography offers precise control over camera settings to shape visual aesthetics - such as depth-of-field via aperture - current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and...

arXiv CS 1d ago

LLM-Guided Evolution for Medical Decision Pipelines

Announce Type: new Abstract: Adapting large language models (LLMs) to clinical workflows often requires costly fine-tuning or manual prompt and pipeline engineering. We study LLM-guided MAP-Elites evolution as an inference-time alternative for discovering medical decision strategies and provide an implementation repository at https://github.com/univanxx/llm_guided_evo_medical. We formulate urgency triage, interactive consultation, and medical image classification as evolutionary searches...

arXiv CS 2d ago

Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation

arXiv:2605.25402v3 Announce Type: replace Abstract: Self-supervised pre-training paradigm has gained increasing prominence for learning transferable representations in medical imaging, yet existing methods for ultrasound (US) images operate at the image or frame level, overlooking the anatomical context for clinical-aligned representation learning. In this work, we propose an anatomy-anchored ultrasound self-supervision framework ANAUS that shifts representation learning from generic visual...

arXiv CS 6d ago

Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation

arXiv:2605.25402v2 Announce Type: replace Abstract: Self-supervised pre-training paradigm has gained increasing prominence for learning transferable representations in medical imaging, yet existing methods for ultrasound (US) images operate at the image or frame level, overlooking the anatomical context for clinical-aligned representation learning. In this work, we propose an anatomy-anchored ultrasound self-supervision framework ANAUS that shifts representation learning from generic visual...

arXiv CS 7d ago

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents

arXiv:2605.20306v2 Announce Type: replace Abstract: We introduce WildRoadBench, a wild aerial road-damage grounding benchmark that couples direct visual grounding by vision-language models with autonomous research-and-engineering by LLM-driven agents on a single professionally annotated UAV corpus. The same image set and the same per-class AP_50 metric are evaluated under two protocols. The VLM Track measures whether a fixed VLM can localise domain-specific damage from one image and one...

arXiv CS 7d ago

Did Claude increase bugs in rsync?

A simple distributional analysis of every rsync release with bug data. Nothing complicated, answers only one question: are the Claude-assisted releases unusually buggy? In order to avoid accusations of this "just being Claude defending Claude," "AI slop," "probably all hallucinations," etc., I've decided it's probably worth explaining a few key points about how this report was created: In late May 2026, rsync blew up.

Hacker News 5d ago

Beyond Generative Decoding: Discriminative Hidden-State Readout from a Native Omni-Modal LLM for Multimodal Sentiment Analysis

arXiv:2606.05713v1 Announce Type: new Abstract: Multimodal sentiment analysis (MSA) infers human affect from language, acoustic, and visual signals. Recent methods increasingly adapt large multimodal models (LMMs) via generative readout: prompting the model to emit a sentiment score as a text string. While convenient, this ties continuous regression to discrete autoregressive decoding, incurring unmeasured costs.

arXiv CS 5d ago

Agentic AI arrives for Delphi and C++ Builder

Embarcadero has released Kai, an agentic AI assistant for RAD Studio, an IDE (integrated development environment) for Delphi and C++ Builder. Kai is offered as an extension, which means that by default RAD Studio lacks AI capabilities. The extension provides chat, code completion, and an MCP (model context protocol) server to enable other AI agents to communicate with the IDE.

The Register 9d ago

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

arXiv:2606.02575v1 Announce Type: new Abstract: Autoregressive world models have emerged as a powerful paradigm for interactive video generation, allowing users to navigate dynamically generated environments through actions. These models are typically conditioned on a text prompt and/or a single reference frame, from which the entire world is generated. Yet the moment the user navigates beyond what is visible in that frame, the unseen regions are populated by the base model's priors, with no...

arXiv CS 8d ago