FineWeb

Related Articles from SNS

Attention Calibration for Position-Fair Dense Information Retrieval

arXiv:2606.02737v1 Announce Type: new Abstract: Dense retrieval models exhibit positional bias: retrieval effectiveness degrades when relevant information appears later in a passage (Zeng et al., 2025). We ask whether this bias can be reduced at inference time, without retraining and without sacrificing overall retrieval effectiveness. To this end, we adapt inference-time attention calibration (Schuhmacher et al., 2026) to downstream retrieval and extend it with a strength coefficient lambda...

arXiv CS 7d ago

Child-directed speech facilitates production, not comprehension, in BabyLMs

arXiv:2606.01045v1 Announce Type: new Abstract: Recent studies suggest that child-directed speech is not conducive to language learning in BabyLMs. However, current evaluations focus predominantly on comprehension and not production, which is central to usage-based theories of language acquisition which argue how CDS facilitates early language use through constructional "frames" (frequent lexical patterns with open slots). We introduce a novel generation-based evaluation inspired by such...

arXiv CS 8d ago

q0: Primitives for Hyper-Epoch Pretraining

arXiv:2606.03938v2 Announce Type: replace Abstract: Multi-epoch training is becoming the standard now that compute is growing faster than the supply of high-quality text. But pretraining a single model saturates within a few passes, long before the compute budget is exhausted. We argue this calls for a conceptual shift from training a single model toward exploring a population of models and aggregating their predictions.

arXiv CS 6d ago

Model Parallelism With Subnetwork Data Parallelism

Announce Type: replace Abstract: Pre-training large neural networks at scale imposes heavy memory demands on accelerators and often requires costly communication. We introduce Subnetwork Data Parallelism (SDP), a distributed training framework that partitions a model into structured subnetworks trained across workers without exchanging activations. We study two complementary masking regimes: backward masking, which applies sparsity only in the backward step to retain unbiased gradients, and...

arXiv CS 8d ago

WebKnoGraph: GNN-Powered Internal Linking

arXiv:2606.06106v1 Announce Type: new Abstract: Internal link optimization is a recurring task in search engine optimization, yet many production workflows rely on manual judgment, fixed page templates, or generic tool recommendations. Practitioners need ways to evaluate candidate links before deployment because link changes can redistribute authority and affect semantic coherence in ways that are difficult to isolate after release. We present WebKnoGraph, an open-source framework for...

arXiv CS 5d ago
