[Paper Note] Multi-omics Integration

Title: Paper title here (e.g., “Deep multi-omics integration for cancer subtyping”)
Authors: First Author et al.
Venue / Year: NeurIPS 2024 or Nature 2023
Link: Paper · Code
Topic: multi-omics integration, representation learning, cancer, pathology

1. Overview

This note summarizes a representative paper on multi-omics integration for cancer, where histology images are combined with bulk RNA-seq and other molecular profiles. The core idea is to learn a shared latent space that captures complementary information from each modality while controlling for batch effects and clinical confounders.

I focus less on every implementation detail and more on: (i) how they formalize the integration problem, (ii) how they design the objective, and (iii) what lessons might transfer to computational pathology + spatial / single-cell data.

Key idea (my own words)
Learn a shared latent representation z that aligns image and multi-omics views, is predictive of clinical outcomes, and disentangles biological signal from batch / technical noise.

2. Data & Modalities

2.1 Cohort

The study uses a cohort of several hundred cancer patients, each with:

  • FFPE or frozen H&E whole-slide images (WSIs)
  • Bulk RNA-seq (TPM / counts) and basic clinical variables
  • Optional: copy number profiles or mutation data (used in a subset of analyses)

2.2 Preprocessing

WSIs are tiled into patches and fed into a pretrained histology encoder (e.g., a ResNet or a pathology foundation model). Patch features are aggregated into slide-level representations (attention pooling / simple mean pooling).
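
The attention-pooling step is essentially weighted MIL aggregation. A minimal sketch in PyTorch (my own simplification, not the paper's exact architecture; the feature and hidden dimensions are assumptions):

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Aggregate patch features into one slide-level embedding (gating omitted for brevity)."""
    def __init__(self, feat_dim: int = 1024, hidden_dim: int = 256):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (num_patches, feat_dim) from a pretrained histology encoder
        weights = torch.softmax(self.attn(patch_feats), dim=0)  # (num_patches, 1)
        return (weights * patch_feats).sum(dim=0)               # (feat_dim,)

# e.g. 500 patches embedded by a frozen encoder -> one slide-level vector
slide_vec = AttentionPooling()(torch.randn(500, 1024))
```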

For RNA, the authors use log-transformed expression of selected genes (either highly variable genes or a curated panel). All omics features are z-score normalized across samples.
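
A rough reconstruction of that RNA pipeline (my own sketch; the HVG count and the use of log1p are assumptions):

```python
import numpy as np

def preprocess_rna(expr: np.ndarray, n_hvg: int = 2000) -> np.ndarray:
    """expr: (n_samples, n_genes) TPM or counts -> z-scored log expression of HVGs."""
    log_expr = np.log1p(expr)                             # log-transform
    hvg_idx = np.argsort(log_expr.var(axis=0))[-n_hvg:]   # top highly variable genes
    x = log_expr[:, hvg_idx]
    # z-score each gene across samples
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

x_omics = preprocess_rna(np.random.rand(300, 20000) * 100)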

Figure: cohort and data modalities overview (WSI, bulk RNA-seq, clinical); my own redraw of the paper's figure.

3. Method & Objective

3.1 Latent space

The method learns a shared latent vector z for each patient. Two encoders map image features and omics features into this common space:

  • f_img(x_img) → z
  • f_omics(x_omics) → z

The goal is that z captures biology that is consistent across modalities, while also being informative for downstream tasks such as subtype classification or prognosis.

My simplified formulation
Learn encoders f_img and f_omics and a task head g such that z_img = f_img(x_img) and z_omics = f_omics(x_omics); encourage z_img ≈ z_omics while g(z) predicts the labels y (subtype / survival).
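
Written out in my own notation (the weight λ and the exact form of each term are assumptions, not the paper's objective):

```latex
\min_{f_{\mathrm{img}},\, f_{\mathrm{omics}},\, g}\;
\mathcal{L}_{\mathrm{align}}\!\left(z_{\mathrm{img}},\, z_{\mathrm{omics}}\right)
+ \lambda\, \mathcal{L}_{\mathrm{task}}\!\left(g(z),\, y\right),
\qquad
z_{\mathrm{img}} = f_{\mathrm{img}}(x_{\mathrm{img}}),\;
z_{\mathrm{omics}} = f_{\mathrm{omics}}(x_{\mathrm{omics}})
```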

3.2 Loss design

The training objective typically includes:

  • Alignment loss (e.g., contrastive loss, cosine similarity, or CCA-style loss) to bring image- and omics-derived embeddings of the same patient close in the latent space.
  • Reconstruction or prediction loss, e.g., predicting gene expression from the image embedding (or vice versa), to encourage cross-modal predictability.
  • Task-specific loss (e.g., cross-entropy for subtype labels, Cox loss for survival) so that the latent space is clinically meaningful.
  • Optional regularization terms to control batch effects or known confounders (age, site, technical batch).

Conceptually, this is close to a supervised or semi-supervised multi-view representation learning framework.
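
To make the combination concrete, a minimal PyTorch sketch of such a joint objective (my own reconstruction; InfoNCE for alignment, the loss weights, and a ties-free Cox partial likelihood are all assumptions, not the paper's exact choices):

```python
import torch
import torch.nn.functional as F

def info_nce(z_img, z_omics, temperature=0.1):
    """Contrastive alignment: the matched (image, omics) pair of each patient is the positive."""
    z_img = F.normalize(z_img, dim=-1)
    z_omics = F.normalize(z_omics, dim=-1)
    logits = z_img @ z_omics.T / temperature                    # (B, B) cosine similarities
    targets = torch.arange(z_img.size(0), device=z_img.device)  # diagonal = same patient
    return F.cross_entropy(logits, targets)

def neg_cox_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood; event is 1.0 if observed, 0.0 if censored (ties ignored)."""
    order = torch.argsort(time, descending=True)       # descending time -> cumulative risk sets
    risk, event = risk[order], event[order]
    log_risk_set = torch.logcumsumexp(risk, dim=0)     # log sum over {j : t_j >= t_i} of exp(risk_j)
    return -((risk - log_risk_set) * event).sum() / event.sum().clamp(min=1.0)

def joint_loss(z_img, z_omics, subtype_logits, y, risk, time, event,
               w_align=1.0, w_ce=1.0, w_cox=1.0):
    """Weighted sum of alignment, subtype, and survival terms."""
    return (w_align * info_nce(z_img, z_omics)
            + w_ce * F.cross_entropy(subtype_logits, y)
            + w_cox * neg_cox_partial_log_likelihood(risk, time, event))
```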

| Model | Alignment loss | Task loss | Notes |
|---|---|---|---|
| Image-only | – | CE / Cox | Baseline WSI model |
| Omics-only | – | CE / Cox | Baseline transcriptomics model |
| Joint (paper) | Contrastive | CE + Cox | Multi-omics integrated latent space |

4. Key Results

  • The integrated latent representations outperform unimodal baselines (image-only or RNA-only) on classification of molecular subtypes.
  • Survival models built on the integrated space show improved risk stratification compared to clinical covariates alone.
  • When visualizing the latent space (t-SNE / UMAP), clusters often align with both morphology patterns and expression-defined subgroups.
  • Cross-modal prediction (e.g., predicting RNA from image embeddings) is not perfect but recovers major axes such as immune vs. stromal vs. tumor signals.
Figure: latent space (UMAP) colored by subtype; from the paper or my own re-plot.
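
For a re-plot like this, a short sketch with umap-learn and matplotlib is enough (all data below is placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt
import umap  # pip install umap-learn

z = np.random.randn(300, 64)                           # placeholder for learned patient embeddings
subtypes = np.random.choice(["A", "B", "C"], size=300) # placeholder subtype labels

emb = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(z)
for s in np.unique(subtypes):
    m = subtypes == s
    plt.scatter(emb[m, 0], emb[m, 1], s=8, label=s)
plt.legend(title="Subtype")
plt.title("Latent space (UMAP) by subtype")
plt.show()
```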

5. My Notes & Takeaways

  • The framework is flexible: in principle, additional modalities (ATAC, methylation, spatial transcriptomics) could be added as more encoders.
  • The choice of loss terms is crucial. A stronger alignment loss can oversmooth real modality-specific signals; a weak one can fail to remove technical noise.
  • For pathology, patch-level signals are heterogeneous. Slide-level aggregation might hide local patterns that are important for prognosis. This suggests combining multi-omics integration with multiple instance learning or region-of-interest modeling.
  • It is still unclear how much of the survival gain comes from “true biology” vs. better regularization and feature compression.
Connection to my work
Good reference for: (1) how to combine latent alignment + task loss; (2) how to design the evaluation protocol (subtype + survival); (3) potential extension to spatial / single-cell by replacing bulk encoder with graph / cell-level encoders.

6. Open Questions for My Future Work

  1. How to extend this framework to spatial and single-cell data, where each slide contains many cells/spots with partially matched omics?
  2. Can we design a latent space that explicitly separates shared vs. modality-specific components (e.g., via structured VAEs or flow matching)?
  3. For foundation models in pathology, is it better to: (a) first pretrain a strong image-only encoder, then align with omics; or (b) train a multimodal foundation model from scratch?
  4. How to evaluate whether the learned space truly captures causal biology, instead of correlational patterns tied to the cohort?

I would like to revisit this paper when designing my own framework for multi-omics + histology integration, especially the loss design and the way they handle confounders.