Phenocoder

A machine-learning framework that combines conditional variational autoencoders with spatial graph analysis to learn unsupervised phenotypic embeddings of complex tissue architectures from microscopy images.

Phenocoder learns compressed morphological representations of cells/nuclei directly from microscopy images using (conditional) convolutional variational autoencoders, then analyzes how those representations are organized in space. It is built around the SpatialData ecosystem: images, segmentation labels and per-object tables live in a single SpatialData object, and the Phenocoder class drives the full workflow on top of it.

The workflow at a glance

generate_dataset() — extract image patches centered on each segmented object and write them (plus per-channel intensity statistics) to disk.
initialize_model() — build a CVAE or conditional CondCVAE and the train/validation data generators.
train() — fit the model with early stopping, learning-rate scheduling and TensorBoard logging.
encode() — embed every object into the learned latent space, optionally smoothing the latents over each object’s spatial neighborhood (message passing).
spatialgraph_stats() — compute spatial neighborhood-graph statistics per sample (or per spatial subunit) from clustered latents.
spatialgraph_embedding() — embed the per-sample/per-subunit statistics (PCA + UMAP, with optional batch correction) for sample-level comparison.
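
To make two of the steps above more concrete, here is a minimal NumPy sketch of the ideas behind patch extraction and spatial smoothing of latents. It is not Phenocoder's implementation and does not call its API: the helper names (`object_centroids`, `extract_patches`, `smooth_latents`), the zero-padding at image borders, and the plain mean over a fixed-radius neighborhood are all assumptions made for illustration.

```python
import numpy as np


def object_centroids(labels):
    """Return {label_id: (row, col)} for a 2D integer label image (0 = background)."""
    centroids = {}
    for label_id in np.unique(labels):
        if label_id == 0:
            continue
        rows, cols = np.nonzero(labels == label_id)
        centroids[int(label_id)] = (rows.mean(), cols.mean())
    return centroids


def extract_patches(image, labels, patch_size):
    """Cut one (C, patch_size, patch_size) patch around each object centroid.

    `image` has shape (C, H, W). The image is zero-padded (an assumption)
    so that objects close to the border still yield full-size patches.
    """
    half = patch_size // 2
    padded = np.pad(image, ((0, 0), (half, half), (half, half)))
    ids, patches = [], []
    for label_id, (r, c) in object_centroids(labels).items():
        # After padding by `half`, the patch's top-left corner sits at the
        # rounded centroid coordinates of the original image.
        r0, c0 = int(round(r)), int(round(c))
        patches.append(padded[:, r0:r0 + patch_size, c0:c0 + patch_size])
        ids.append(label_id)
    return np.array(ids), np.stack(patches)


def smooth_latents(latents, coords, radius):
    """Replace each latent by the mean over all objects within `radius` (self included)."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    adjacency = (dist <= radius).astype(float)  # diagonal is 1 because dist == 0
    weights = adjacency / adjacency.sum(axis=1, keepdims=True)
    return weights @ latents


# Toy data: a 2-channel 8x8 image with two segmented objects.
labels = np.zeros((8, 8), dtype=int)
labels[1:3, 1:3] = 1
labels[5:8, 4:7] = 2
image = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)

ids, patches = extract_patches(image, labels, patch_size=5)
print(ids, patches.shape)

# Toy latents: the first two objects are neighbors, the third is isolated.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
latents = np.array([[0.0], [2.0], [10.0]])
print(smooth_latents(latents, coords, radius=1.5))
```

The dense pairwise distance matrix keeps the sketch short but scales quadratically with the number of objects; a spatial index would be the usual choice for whole-slide data.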

Features

Convolutional VAE (CVAE): encode multi-channel image patches into a compact latent space.
Conditional VAE (CondCVAE): condition the encoder/decoder on one or more metadata columns (e.g. dataset, z-slice, donor) via one-hot encoding.
SpatialData-native: works directly on SpatialData images, labels and tables.
Flexible patch extraction: configurable patch size, 2D or per-z-slice 3D sampling, and global or per-sample intensity normalization.
Spatial message passing: aggregate latents over a physical-distance neighborhood graph.
Spatial graph analysis: interaction matrices, Moran’s I, centrality, connectivity and convex-hull statistics at sample or subunit resolution.
Beta-VAE support: tune the KL weight (beta) for more disentangled representations.
Built on Keras 3 with the TensorFlow backend.
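
Three of the features listed above rest on standard formulas, sketched here in plain NumPy for orientation. This is an illustrative sketch, not Phenocoder's code: the function names are hypothetical, the squared-error reconstruction term is an assumption (the source does not say which reconstruction loss is used), and Moran's I is written in its textbook form with a user-supplied weight matrix.

```python
import numpy as np


def one_hot_conditions(columns):
    """One-hot encode each metadata column and concatenate the blocks column-wise."""
    blocks = []
    for values in columns:
        categories, codes = np.unique(np.asarray(values), return_inverse=True)
        blocks.append(np.eye(len(categories))[codes.reshape(-1)])
    return np.concatenate(blocks, axis=1)


def kl_divergence(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, I)) per sample, summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)


def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Mean over the batch of: squared-error reconstruction + beta * KL."""
    recon = np.sum((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return float(np.mean(recon + beta * kl_divergence(mu, log_var)))


def morans_i(values, weights):
    """Moran's I: (N / sum(w)) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    z = values - values.mean()
    return float((len(values) / weights.sum()) * (z @ weights @ z) / (z @ z))


# Two metadata columns (e.g. dataset and z-slice) become one condition vector per object.
conditions = one_hot_conditions([["a", "b", "a"], [0, 1, 2]])
print(conditions.shape)

# A larger beta puts more weight on the KL term relative to reconstruction.
x = np.zeros((1, 2))
x_recon = np.ones((1, 2))
mu = np.array([[1.0, 0.0]])
log_var = np.zeros((1, 2))
print(beta_vae_loss(x, x_recon, mu, log_var, beta=4.0))

# Four objects on a chain graph 0-1-2-3 with symmetric binary weights.
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
print(morans_i(np.array([1.0, 1.0, 0.0, 0.0]), chain))  # clustered values: positive
print(morans_i(np.array([1.0, 0.0, 1.0, 0.0]), chain))  # alternating values: negative
```

In a conditional VAE the one-hot condition vector is typically concatenated to the encoder and decoder inputs; how Phenocoder wires this in is documented in the API reference.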

Reference

API reference