$k$-mer

Related Articles from SNS

KDM: embedding DNA/RNA motifs and sequences in a shared k-mer space for unified discovery, analysis and binding prediction

Motif discovery and binding-site prediction in DNA and RNA sequences are central tasks in regulatory genomics, yet the methodological landscape is split between interpretable but rigid position weight matrices (PWMs) and high-performing but opaque machine-learning models. We present KDM, a unifying framework in which both motifs and sequences are represented as probability distributions over a shared k-mer dictionary, embedded via the Hellinger transformation. This common geometry enables...

bioRxiv 3d ago
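
The KDM abstract above describes sequences as probability distributions over a shared k-mer dictionary, embedded via the Hellinger transformation. The following is a minimal sketch of that idea for the sequence side only; the function names, the DNA alphabet, and the choice of overlapping k-mer counts are illustrative assumptions and are not taken from the KDM paper. The Hellinger transformation used here is the standard element-wise square root of a probability vector, which places every distribution on the unit sphere.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_distribution(seq, k, alphabet="ACGT"):
    """Relative frequency of each k-mer over a fixed dictionary.

    Hypothetical helper: k-mers containing symbols outside `alphabet`
    (for example N) are ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in vocab)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[w] / total for w in vocab]


def hellinger_embed(p):
    """Hellinger transformation: element-wise square root of a distribution."""
    return [sqrt(x) for x in p]


def hellinger_distance(p, q):
    """Euclidean distance between the embeddings, scaled to lie in [0, 1]."""
    u, v = hellinger_embed(p), hellinger_embed(q)
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / 2)
```

Because the embedded vectors have unit Euclidean norm, ordinary Euclidean geometry on them corresponds to the Hellinger distance between the underlying distributions, which is one plausible reading of the "common geometry" the abstract mentions.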

The anti-lexicographic SUS-anchor: a near-optimal k=1 sampling scheme

In recent years, there has been a renewed interest in the search for low density minimizer schemes. These schemes take a window of $w$ consecutive $k$-mers, and sample one of them: the smallest under some specific order. Schemes such as the mod-minimizer provide a low density (fraction of sampled $k$-mers) when $k \gg w$, while schemes such as the greedy minimizer work well for explicit small parameters roughly in the regime $k \leq 2w$, for $k$ and $w$ up to...

arXiv CS 8d ago
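
The window-based sampling described in the abstract above can be sketched as follows. This is a generic minimizer scheme under plain lexicographic order with leftmost tie-breaking, chosen here only for illustration; it is not the SUS-anchor, mod-minimizer, or greedy minimizer, and the names `minimizer_positions`, `density`, and `order` are hypothetical.

```python
def minimizer_positions(seq, k, w, order=None):
    """Return the start positions of the sampled k-mers.

    Each window of w consecutive k-mers contributes its smallest k-mer
    under `order`; ties go to the leftmost position (an assumption).
    """
    if order is None:
        order = lambda kmer: kmer  # lexicographic order, for illustration
    n_kmers = len(seq) - k + 1
    sampled = set()
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        sampled.add(min(window, key=lambda i: (order(seq[i:i + k]), i)))
    return sampled


def density(seq, k, w, order=None):
    """Fraction of k-mers in `seq` that are sampled by the scheme."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        raise ValueError("sequence is shorter than k")
    return len(minimizer_positions(seq, k, w, order)) / n_kmers
```

For the sequence `ACGTACGT` with `k=2` and `w=3`, the seven k-mers are AC, CG, GT, TA, AC, CG, GT; the five windows select positions 0, 1, 4, 4, 4, so three of seven k-mers are sampled. Passing a different `order` function (for example a hash of the k-mer) changes which k-mers are selected and therefore the density.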

The anti-lexicographic SUS-anchor: a near-optimal k=1 sampling scheme

arXiv:2606.01190v2. In recent years, there has been a renewed interest in the search for low density minimizer schemes. These schemes take a window of $w$ consecutive $k$-mers, and sample one of them: the smallest under some specific order. Schemes such as the mod-minimizer provide a low density (fraction of sampled $k$-mers) when $k \gg w$, while schemes such as the greedy minimizer work well for explicit small parameters roughly in the regime $k \leq 2w$,...

arXiv CS 7d ago

$p$-adic Bi-Filtrations for Topological Machine Learning on Genomic Sequences

arXiv:2606.06117v1. We introduce pVR, a topological machine learning framework for alignment-free genomic sequence classification that combines $p$-adic numbers with topological data analysis. Each DNA sequence is encoded along two complementary axes: a $p$-adic distance on $k$-mer prefixes, which captures hierarchical positional structure, and a compositional $L_1$ distance on $k$-mer frequencies, which captures local sequence content. The two distances jointly...

arXiv CS 5d ago
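
Of the two axes named in the pVR abstract above, the compositional $L_1$ distance on $k$-mer frequencies is simple enough to sketch directly. The version below assumes overlapping k-mers and relative frequencies, which the truncated abstract does not confirm; the function names are hypothetical. The $p$-adic distance on $k$-mer prefixes is not sketched, because the abstract does not specify how sequences are encoded as $p$-adic numbers.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer observed in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: c / total for kmer, c in counts.items()}


def l1_distance(seq_a, seq_b, k):
    """Sum of absolute differences between the two k-mer frequency profiles."""
    fa, fb = kmer_frequencies(seq_a, k), kmer_frequencies(seq_b, k)
    return sum(abs(fa.get(x, 0.0) - fb.get(x, 0.0)) for x in fa.keys() | fb.keys())
```

With relative frequencies the distance ranges from 0 for identical profiles to 2 for sequences that share no k-mers.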

LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling

Genomic foundation models increasingly adopt large language model architectures, yet almost universally rely on fixed tokenization schemes such as $k$-mers, BPE, or single nucleotides, which impose arbitrary sequence boundaries that may obscure biologically relevant structure. We present LDARNet, a 120M-parameter hierarchical genomic foundation model that adapts H-Net-style dynamic chunking from autoregressive generation to masked language modeling, combining...

arXiv CS 6d ago