Home Knowledge Base Survival Reinforcement Learning

Survival Reinforcement Learning

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

arXiv:2604.17551v2 Announce Type: replace Abstract: Standard approaches to goal-conditioned reinforcement learning (GCRL) that rely on temporal-difference learning can be unstable and sample-inefficient due to bootstrapping. While recent work has explored contrastive and supervised formulations to improve stability, we present a probabilistic alternative, called survival value learning (SVL), that reframes GCRL as a survival learning problem by modeling the time-to-goal from each state as a...

arXiv CS 9d ago

Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

arXiv:2605.31273v1 Announce Type: new Abstract: While self-supervised Contrastive Reinforcement Learning (CRL) has shown remarkable depth-scaling capabilities, successfully using networks over 64 layers, scaled CRL still struggles with long-horizon goal-conditioned planning due to the uniformity-tolerance dilemma inherent in contrastive losses. We introduce Survival Reinforcement Learning (SRL), an online classification-based alternative that extends the survival value learning framework by...

arXiv CS 9d ago

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

arXiv:2606.07796v1 Announce Type: new Abstract: The Internet of Vehicles (IoV) faces a dynamic, adversarial security environment where attackers adapt to defenses. Existing intrusion detection systems rely on static classifiers that fail to capture sequential decision-making, attacker adaptation, and uncertainty. We formulate IoV security as a sequential attacker-defender interaction and model defense as a reinforcement learning problem under partial observability.

arXiv CS 1d ago

Learning to See via Epiretinal Implant Stimulation in silico with Model-Based Deep Reinforcement Learning

arXiv:2606.03118v1 Announce Type: new Abstract: Objective: Diseases such as age-related macular degeneration and retinitis pigmentosa cause the degradation of the photoreceptor layer. One approach to restore vision is to electrically stimulate the surviving retinal ganglion cells with a microelectrode array such as epiretinal implants.

arXiv CS 7d ago

SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation

Announce Type: new Abstract: Large code language models (CodeLLMs) can generate and rewrite programs, enabling functionality-preserving code mutation that may be used to create diverse malware variants and evade signature-based detection. A key security question is whether this mutation capability survives model compression, which would make deployment feasible under limited hardware budgets. We propose SecRL-Prune, a structured pruning framework for CodeLLMs that operates on feed-forward...

arXiv CS 5d ago

Human-Like Neural Nets by Catapulting

Human-like Neural Nets by Catapulting Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...

Hacker News 3d ago

VentAgent: When LLMs Learn to Breathe -- Multi-Objective Arbitration for ARDS Ventilation

Announce Type: new Abstract: Mechanical ventilation for Acute Respiratory Distress Syndrome (ARDS) requires balancing competing physiological goals, including oxygenation, lung protection, and acid-base homeostasis. However, current data-driven methods, especially those imitating retrospective Electronic Health Records (EHR), often suffer from imitation bias. They may capture superficial correlations from inconsistent clinical demonstrations, such as associating passive ventilator settings...

arXiv CS 6d ago

OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction

Announce Type: replace Abstract: Predicting cancer treatment outcomes requires models that are both accurate and interpretable, particularly in the presence of heterogeneous clinical data. While large language models (LLMs) have shown strong performance in biomedical NLP, they often lack structured reasoning capabilities critical for high-stakes decision support. We present a unified, multi-task learning framework that aligns autoregressive LLMs with clinical reasoning for outcome prediction...

arXiv CS 8d ago

Ev-Trust: An Evolutionarily Stable Trust Mechanism for Decentralized LLM-Based Multi-Agent Service Economies

arXiv:2512.16167v3 Announce Type: replace Abstract: Decentralized LLM-based multi-agent service economies face three vulnerabilities that undermine traditional trust mechanisms: reduced cost of fraud, difficulty in evaluating service quality, and instability of service content. These compounding vulnerabilities can trigger population-level trust collapse and the proliferation of short-sighted strategies. We propose Ev-Trust, an evolutionarily stable trust mechanism that addresses these...

arXiv CS 8d ago

Supporting complex care with confidence and compassion

Supporting complex care with confidence and compassion After years of working with children and youths with intellectual disabilities, senior psychologist Melody Tan Hui Shan sought deeper clinical training through NIE’s Master of Arts (Applied Psychology) to deliver more tailored care. For close to 15 years, Ms Melody Tan Hui Shan worked with young people aged seven to 18 years old at MINDS Fernvale Gardens School, providing interventions, caregiver guidance and school-based services for...

Channel News Asia 21h ago