Multivariate integration of histological images and gene expression data: a comparative review
Key Points
Integrating histological images with gene expression data offers a promising approach for linking tissue morphologies to molecular signatures and improving disease subtyping. However, such integration remains challenging due to the high dimensionality of these datasets, cross-modal heterogeneity, and limited interpretability. Multivariate methods such as Sparse Canonical Correlation Analysis (Sparse CCA), Joint Nonnegative Matrix Factorisation (Joint NMF), and Angle-based Joint and Individual Variation Explained (AJIVE) have been used to address these challenges by reducing dimensionality while identifying features associated with latent factors, thereby enhancing biological interpretability. Despite their increasing application in imaging-omics research, systematic comparisons of their methodological properties remain limited. Consequently, users often lack guidance on how to select among these methods in practice, and the approaches are frequently treated as interchangeable despite their differing modelling assumptions. Here, we use paired H&E images and gene expression data from breast cancer as a representative case study to examine the methodological characteristics, interpretability, and complementary properties of these integration approaches. Our results show that each method captures distinct yet complementary aspects of the underlying information. Although the biological findings are derived from the TCGA-BRCA datasets, the methodological insights identified here extend more broadly to imaging-omics integration studies. Overall, this comparative review highlights the strengths and limitations of each approach and outlines considerations for future methodological development.
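To make the idea of a shared latent factor concrete, the following is a minimal sketch of one of the three method families, Joint NMF, on synthetic data. It is an illustration under stated assumptions, not the pipeline used in this review: the function name `joint_nmf`, the matrix names `X_img` and `X_expr`, the rank `k`, and the plain multiplicative-update solver are all hypothetical choices made for the example. Both blocks are assumed to be nonnegative, to share the same samples as rows, and to be weighted equally.

```python
import numpy as np

def joint_nmf(X_img, X_expr, k=3, n_iter=200, seed=0, eps=1e-9):
    """Sketch of Joint NMF: X_img ~ W @ H_img and X_expr ~ W @ H_expr.

    W (samples x k) is shared across both blocks; H_img and H_expr hold
    the block-specific feature loadings. Assumes nonnegative inputs with
    matching rows. Names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # With equal block weights, sharing W is equivalent to factorising
    # the column-wise concatenation of the two blocks.
    X = np.hstack([X_img, X_expr])
    n, p = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))
    p_img = X_img.shape[1]
    return W, H[:, :p_img], H[:, p_img:], errors

# Synthetic stand-ins for image features and gene expression
# (40 samples, 20 image features, 50 genes, 3 shared factors).
rng = np.random.default_rng(1)
W_true = rng.random((40, 3))
X_img = W_true @ rng.random((3, 20))
X_expr = W_true @ rng.random((3, 50))

W, H_img, H_expr, errors = joint_nmf(X_img, X_expr, k=3)
# Features with the largest loading on the first factor in each block.
top_img_features = np.argsort(H_img[0])[::-1][:5]
top_genes = np.argsort(H_expr[0])[::-1][:5]
```

In this sketch, interpretability comes from reading the largest entries of each row of `H_img` and `H_expr` as the image features and genes associated with a shared factor. Stacking the blocks without rescaling lets the larger or higher-variance block dominate the fit, so in practice some per-block normalisation or weighting would likely be needed. Sparse CCA and AJIVE rest on different assumptions (correlation maximisation and a joint/individual subspace decomposition, respectively) and are not covered by this example.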