Home › Knowledge Base › GPT OSS

GPT OSS

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

LLM Consortium for Software Design Refinement: A Controlled Experiment on Multi-Agent Collaboration Topologies

Announce Type: new Abstract: We present a controlled experiment evaluating 12 multi-agent LLM collaboration topologies for software architecture design. Using a $2\times2\times2$ factorial design (Authority $\times$ Roles $\times$ Dynamics), we conducted 520 experimental runs across 8 design tasks of varying complexity, with 5 repetitions each. Designs were evaluated on a 12-dimensional rubric by three independent automated evaluators (GPT-OSS 120B, Claude Opus 4.6, Claude Sonnet 4.6).

arXiv CS 8d ago

Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks

Announce Type: new Abstract: Standard AI red teaming evaluations reduce adversarial campaigns to a single binary outcome, attack success rate (ASR), not taking into account the sequential structure of how models resist or yield to attacks. We propose applying process mining, a discipline for discovering and analyzing process models from event logs, to red teaming traces. We conduct a controlled experiment pitting 60 HarmBench prompts against two LLMs, GPT-OSS 120B and Llama 3.3 70B, using 10...

arXiv CS 1d ago

LLM-Based Porting of Optimized C++ to CUDA Through Deoptimization and Reoptimization

arXiv:2606.06063v1 Announce Type: new Abstract: When porting high-performance computing (HPC) code from CPU to GPU, CPU-oriented optimizations may obstruct LLM-based CUDA translation. We design and evaluate a Deopt-Reopt workflow that first simplifies the input C++ code and then retranslates and reoptimizes it for CUDA, comparing it against direct translation (Direct) on twelve HPC kernels with two LLMs (gpt-oss-120b (O120) and qwen-3-235b-a22b-instruct-2507 (Q235)) in Single-shot (one pass)...

arXiv CS 5d ago

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

arXiv:2603.24511v2 Announce Type: replace Abstract: We show that AI agents are capable of discovering novel algorithms for adversarial attacks against LLMs, advancing the state of the art on white-box jailbreaking and prompt injection evaluations. We deploy frontier agents, such as Claude Code and Codex, in an autoresearch loop with access to a library of 30+ prior methods and an evaluation script with a fixed compute budget. We show this pipeline to be effective in jailbreaking OpenAI's...

arXiv CS 8d ago

Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation

arXiv:2606.05988v1 Announce Type: new Abstract: Reasoning models produce long chain-of-thought traces that are costly to distill and encourage verbose student outputs. We study post-hoc compression of such traces before knowledge distillation.

arXiv CS 5d ago

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

arXiv:2605.28829v2 Announce Type: replace Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on common reasoning benchmarks, yet they remain difficult to deploy at scale, where millions of student doubts demand domain-specific, consistently structured problem solving. We introduce Aryabhata...

arXiv CS 6d ago

How the hell is Groq raising more money?

How the hell is Groq raising more money? Somehow, Palpatine returned. Axios just dropped a bizarre scoop.

Hacker News 8d ago

Steering LLM Viewpoints through Fabricated Evidence Injection

arXiv:2606.06244v1 Announce Type: new Abstract: As chatbots increasingly influence daily decision-making, their potential to produce misleading responses poses substantial risks to users. This paper investigates a critical cognitive vulnerability in LLMs: their tendency to uncritically trust external context when presented with fabricated evidence bearing markers of credibility. We introduce Ghostwriter, a two-phase attack framework that first repackages misleading statements with fabricated...

arXiv CS 5d ago

Supervision versus Demonstration-Based In-Context Learning for Multiword Expression Classification

arXiv:2606.07479v1 Announce Type: new Abstract: Turkish idiomatic light verb constructions (LVCs) are challenging for multiword expression processing because they often share the same surface form as fully literal verb-object combinations while functioning as a single, partially idiomatic predicate. We frame Turkish LVC detection as a binary classification task (literal meaning vs. idiomatic meaning) and evaluate on a manually created controlled set (N=147) with matched negatives:...

arXiv CS 2d ago

Argonne flexes spare supercompute to build private AI inference service

The Register 14d ago