Science
Imputed graph-genotyped structural variants identify regulatory haplotypes associated with gene expression in Atlantic salmon
Key Points
Structural variants (SVs) can affect gene regulation, but they are difficult to include in expression genetic studies when large RNA-seq cohorts lack whole-genome sequencing. This is common in non-human and non-model systems, where whole-genome sequencing at population scale remains costly. As a result, expression quantitative trait locus (eQTL) studies often rely on single nucleotide polymorphism (SNP) markers. These analyses can identify expression-associated regions, but often provide limited biological interpretation of the underlying regulatory mechanisms. Here, we used Atlantic salmon as a study system to test whether graph-genotyped SVs can be imputed into a SNP-array-genotyped RNA-seq cohort and used to interpret regulatory haplotypes. SVs were discovered from two long-read-sequenced individuals, supplemented with short-read SV and SNP calls from a 112-individual whole-genome-sequenced reference panel, graph-genotyped, jointly phased with SNPs, and imputed into 906 offspring with gill RNA-seq and SNP-array genotypes. After size filtering, the imputed SV catalogue contained 100,269 variants and showed nonuniform genomic distributions associated with sex-specific recombination landscapes. Association testing identified 51 SV-eQTL candidates, including 35 cis and 16 trans associations. These candidates were enriched for short-read-derived variants, indicating that short-read supplementation can recover regulatory variants missed by small-scale long-read discovery. SV-eQTL candidates were generally more strongly tagged by nearby SNPs than non-associated variants were, but individual SNP lead markers often failed to capture the same eQTL signals in conditional regression. Candidates retained after the conditional analysis included target-gene-overlapping deletions, nearby local variants without target-gene overlap, trans associations, and short insertions with opposite effects on gene expression.
These results show that imputed graph-genotyped SVs can add biological interpretation to possible regulatory haplotypes.
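The conditional regression mentioned above can be illustrated with a minimal sketch: expression is regressed on the SV dosage with the lead SNP dosage included as a covariate, and the SV is considered retained if its effect remains clearly non-zero. The abstract does not specify the model, covariates, or software used in the study, so everything here is an assumption for illustration: the function name `conditional_ols`, the simulated dosages, the simulated effect size, and the use of plain ordinary least squares.

```python
import numpy as np

def conditional_ols(expression, sv_dosage, snp_dosage):
    """Fit expression ~ intercept + SNP + SV by ordinary least squares.

    Returns the SV effect estimate and its t-statistic, i.e. the SV
    signal that remains after conditioning on the lead SNP.
    (Hypothetical helper; not the study's actual pipeline.)
    """
    y = np.asarray(expression, dtype=float)
    X = np.column_stack([np.ones_like(y), snp_dosage, sv_dosage]).astype(float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                 # residual degrees of freedom
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return beta[2], beta[2] / se[2]

# Simulated example (all values are assumptions, not data from the study).
rng = np.random.default_rng(0)
n = 906                                       # cohort size quoted in the abstract
sv = rng.binomial(2, 0.3, size=n)             # SV dosage: 0, 1, or 2 copies
# The SNP tags the SV imperfectly: about 20% of samples are re-drawn.
snp = np.where(rng.random(n) < 0.8, sv, rng.binomial(2, 0.3, size=n))
expr = 1.0 * sv + rng.normal(0.0, 1.0, size=n)  # expression driven by the SV

beta_sv, t_sv = conditional_ols(expr, sv, snp)
print(f"SV effect conditional on lead SNP: {beta_sv:.2f} (t = {t_sv:.1f})")
```

In this simulation the SV keeps a strong conditional signal even though the SNP is correlated with it, which is the pattern the abstract describes for candidates retained after the conditional analysis. A real analysis would typically add further covariates (for example family structure or expression factors) and a multiple-testing correction.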