The Long-Term Effects of Data Selection
Related Articles from SNS
The Long-Term Effects of Data Selection in LLM Fine-Tuning
arXiv:2605.30537v1 Announce Type: new Abstract: Data selection is increasingly used to reduce the cost of large language model (LLM) fine-tuning, with recent methods prioritizing samples by current utility, diversity, quality, or influence. This paper studies a different question: when fine-tuning occurs over multiple stages, can selection strategies that look optimal now make the model less adaptable later? We introduce a long-horizon view of LLM data selection in which a selector is...
BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining
Announce Type: replace Abstract: Effective data selection is essential for pretraining large language models (LLMs), enhancing efficiency and improving generalization to downstream tasks. However, existing approaches often require leveraging external pretrained models, making it difficult to disentangle the effects of data selection from those of the external pretrained models. In addition, they often overlook the long-term impact of selected data if the model is trained to convergence,...
Photooxidative ageing of 3D printed polymers PLA, ABS, PET, HIPS and PC induced by long-term UV radiation
arXiv:2606.00068v1 Announce Type: new Abstract: This article focuses on the influence of long-term UV radiation exposure on the mechanical and structural properties of selected polymeric materials (PLA, ABS, PC, PETG, HIPS) prepared using the 3D-printing-based Fused Filament Fabrication (FFF) method. Existing research on polymer weathering has so far focused mostly on combined effects, and on time scales not exceeding a few months. However, it is important to separate...
A prognostic human brain network for diffuse midline glioma
Abstract Diffuse midline gliomas (DMGs) are near-universally lethal tumours of the childhood central nervous system [1,2]. In animal models, DMGs form brain-wide integrated networks through neuron-to-glioma synapses [3,4,5,6] and glioma-to-glioma gap junctional coupling [3]. This extensive connectivity robustly promotes the growth and invasion of DMG [3,4,5,6,7,8,9] and other glial malignancies [10,11,12] through paracrine mechanisms and direct neuron-to-glioma synapses.
Deep learning four decades of human migration
Abstract Human migration is a fundamental driver of global demographic change, shaping population structure, labour markets and social policy across countries [1,2,3]. Although long-term migration patterns are often linked to economic development [4], they can shift rapidly in response to shocks such as conflict, environmental crises and political change [5]. Despite its importance, migration remains difficult to measure consistently: existing data are sparse, concentrated in high-income settings and...
Amplified Arctic iceberg traffic reshapes benthic biodiversity
Abstract The Arctic is undergoing rapid warming, resulting in retreating sea ice and glaciers [1], yet how cryospheric changes propagate into the deep ocean remains poorly understood [2]. Here we identify a climate-driven mechanism linking accelerating glacier disintegration to an increase in deep-sea hard-bottom habitats far beyond calving fronts. Seafloor observations in Fram Strait show a localized increase in the density and patchiness of dropstones delivered by debris-laden icebergs.
VikingMem: A Memory Base Management System for Stateful LLM-based Applications
arXiv:2605.29640v2 Announce Type: replace Abstract: Large Language Models have revolutionized interactive applications; however, their finite context windows pose a critical data management challenge for maintaining stateful, long-term interactions. Existing memory approaches often rely on simplistic extraction methods that lead to incomplete memories or use rigid, single-purpose memory extraction prompts tailored to a single use case, such as chatbots. Consequently, they lack...
ITR filing FY 2025-26: Top 10 points to check before submitting tax return
It is that time of the year again, when focus shifts to Income Tax Return (‘ITR’) filing. The tax filing deadline for the Financial Year (‘FY’) 2025–26 is approaching, and taxpayers have begun gathering documents such as Form No. 16, other tax deduction statements, investment proofs, bank interest statements, capital gains reports, and other necessary financial information for preparing and filing their tax returns. Besides compliance, ITR also serves as a vital document frequently reviewed...
What Trump's Green Card changes mean for millions of Indians seeking permanent residency in US
"An alien who is in the US temporarily and wants a Green Card" must return to their home country to apply, except in extraordinary circumstances, the US Citizenship and Immigration Services (USCIS) announced last week. The word "alien" in the announcement may have sounded routine in immigration law, but for hundreds of thousands of immigrants waiting for permanent residency, the message triggered immediate panic. For decades, one of the biggest attractions of the US immigration system...
Human-Like Neural Nets by Catapulting
Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...