Data generation
Patch extraction and the Keras data generators / dataset loaders used for training and encoding.
- class phenocoder.generator.PatchGenerator(sdata, image_key, spatial_key, table_key, sample_key, scale, patch_size=(128, 128), metadata_keys=None, scale_percentile=1, scale_per_sample=True)
Bases: object
Generator for image patches and image patch datasets from spatial data.
This class handles the extraction of image patches and statistics from spatial data objects, primarily for use in deep learning workflows.
- Parameters:
- init_patches()
Initialize patch positions from spatial coordinates.
Extracts spatial coordinates from the data, filters positions that would result in patches extending beyond image boundaries, and assigns batch IDs.
- Return type:
None
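The boundary filter described for init_patches can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `filter_in_bounds`, the (x, y) coordinate order, and the use of consecutive integers as IDs are all assumptions made for the example.

```python
def filter_in_bounds(coords, image_shape, patch_size=(128, 128)):
    """Keep (x, y) centers whose patch fits entirely inside the image.

    coords: iterable of (x, y) patch centers (assumed order).
    image_shape: (height, width) of the image.
    patch_size: (height, width) of the patch.
    """
    height, width = image_shape
    half_h, half_w = patch_size[0] // 2, patch_size[1] // 2
    kept = [
        (x, y)
        for x, y in coords
        if x - half_w >= 0 and x + half_w <= width
        and y - half_h >= 0 and y + half_h <= height
    ]
    # Assumed convention: consecutive integer IDs for the surviving positions.
    ids = list(range(len(kept)))
    return kept, ids


coords = [(10, 10), (100, 100), (250, 200), (128, 64)]
kept, ids = filter_in_bounds(coords, image_shape=(256, 256))
```

With a 256 x 256 image and 128 x 128 patches, the centers at (10, 10) and (250, 200) are dropped because their patches would cross the image border.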
- extract_patch(img, id)
Extract a patch from an image centered on specified coordinates.
- Parameters:
img (ndarray) – Input image array
id (int) – Batch ID corresponding to patch position
- Returns:
Extracted image patch
- Return type:
ndarray
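A centered crop followed by percentile scaling might look like the sketch below. It assumes a (channels, height, width) array layout and takes the center coordinates directly instead of looking them up by ID; the helper names `extract_center_patch` and `rescale`, and the clip-to-[0, 1] formula, are illustrative assumptions and not the documented behaviour.

```python
import numpy as np


def extract_center_patch(img, center_xy, patch_size=(128, 128)):
    """Crop a (height, width) patch centered on (x, y).

    Assumes img is laid out as (channels, height, width).
    """
    x, y = center_xy
    half_h, half_w = patch_size[0] // 2, patch_size[1] // 2
    return img[:, y - half_h:y + half_h, x - half_w:x + half_w]


def rescale(patch, low, high):
    """Map [low, high] per channel onto [0, 1], clipping values outside it."""
    low = np.asarray(low, dtype=float)[:, None, None]
    high = np.asarray(high, dtype=float)[:, None, None]
    return np.clip((patch - low) / (high - low), 0.0, 1.0)


# Two-channel 8 x 8 toy image with increasing pixel values.
img = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
patch = extract_center_patch(img, center_xy=(4, 4), patch_size=(4, 4))
scaled = rescale(patch, low=[0.0, 64.0], high=[63.0, 127.0])
```

Filtering positions beforehand, as init_patches does, is what keeps the slice bounds non-negative and inside the image.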
- get_scaling_percentiles()
Extract and set scaling percentiles from computed statistics.
Aggregates the per-slice percentile_low / percentile_high values in df_stats into a conservative range (minimum of the lows, i.e. darkest, and maximum of the highs, i.e. brightest) that is used to normalize patches in extract_patch. The grouping depends on scale_per_sample:
- scale_per_sample=True (default): aggregate per (sample, channel), so each sample is scaled to its own intensity range. Stored in sample_percentiles_low / sample_percentiles_high, keyed by sample; select_patches activates the right one per sample.
- scale_per_sample=False: aggregate per channel across all samples/slices (the original global behaviour). Stored directly in percentiles_low / percentiles_high.
- Raises:
ValueError – If statistics have not been computed yet (df_stats is None or empty)
- Return type:
None
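The two aggregation modes can be illustrated with plain Python. The rows below are hypothetical stand-ins for df_stats (the real object is a table, and its column names other than percentile_low and percentile_high are assumed here), and `aggregate` is an illustrative helper, not part of the library.

```python
from collections import defaultdict

# Hypothetical per-slice statistics, one row per (sample, slice, channel).
stats = [
    {"sample": "A", "channel": 0, "percentile_low": 5.0, "percentile_high": 200.0},
    {"sample": "A", "channel": 0, "percentile_low": 3.0, "percentile_high": 180.0},
    {"sample": "B", "channel": 0, "percentile_low": 10.0, "percentile_high": 250.0},
    {"sample": "B", "channel": 0, "percentile_low": 8.0, "percentile_high": 240.0},
]


def aggregate(rows, keys):
    """Return {group: (min of lows, max of highs)} for rows grouped by keys."""
    groups = defaultdict(lambda: [float("inf"), float("-inf")])
    for row in rows:
        group = tuple(row[k] for k in keys)
        groups[group][0] = min(groups[group][0], row["percentile_low"])
        groups[group][1] = max(groups[group][1], row["percentile_high"])
    return {group: tuple(bounds) for group, bounds in groups.items()}


per_sample = aggregate(stats, keys=("sample", "channel"))  # scale_per_sample=True
global_range = aggregate(stats, keys=("channel",))         # scale_per_sample=False
```

Per-sample grouping gives sample A the range (3.0, 200.0) and sample B the range (8.0, 250.0), while the global grouping yields the single wider range (3.0, 250.0) for the channel.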
- generate_dataset(dataset, dir_output, n_samples=None, n_patches=None)
Generate complete dataset with patches and statistics.
- Parameters:
dataset (str) – Name/identifier for the dataset being generated
dir_output (str) – Directory path for storing the generated dataset
n_samples (int, optional) – Number of samples to randomly select for processing. If None, processes all samples.
n_patches (int, optional) – Number of patches to randomly sample from all available patches. If None, uses all patches.
- Return type:
None
- class phenocoder.generator.SequenceGenerator(*args, **kwargs)
Bases: Sequence
Keras Sequence generator for loading image patches from disk during training.
This generator loads patches from disk and applies optional data augmentation and normalization for training deep learning models.
- Parameters:
- class phenocoder.generator.DatasetLoader(datasets, dir_datasets, sample_key)
Bases: object
Utility class for merging multiple datasets and their statistics.
This class combines statistics from multiple dataset directories and provides unified access to files and scaling parameters.
- load_datasets()
Load and merge statistics from all specified datasets.
Combines stats.csv files from each dataset directory and creates unified dataframes with file paths.
- Return type:
None
- set_train_val_split(batch_size=64, split=0.8)
Assign each patch to a train or validation split.
Splits are made at the sample level (grouped by sample_key and dataset) so that all patches of a sample land in the same split; each split is then truncated to a whole number of batches. Adds split and file_path columns to self.patches.
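A sample-level split with batch truncation could be sketched as below. The helper `split_by_sample`, the seeded shuffle of groups, and the rounding of the train fraction are assumptions for illustration; the source does not state how groups are ordered or assigned.

```python
import random


def split_by_sample(patch_groups, batch_size=64, split=0.8, seed=0):
    """Return {"train": [...], "val": [...]} lists of patch indices.

    patch_groups: one (dataset, sample) key per patch.
    """
    groups = sorted(set(patch_groups))
    random.Random(seed).shuffle(groups)  # assumed: random assignment of groups
    n_train = int(round(len(groups) * split))
    train_groups = set(groups[:n_train])

    indices = {"train": [], "val": []}
    for i, key in enumerate(patch_groups):
        indices["train" if key in train_groups else "val"].append(i)

    # Truncate each split to a whole number of batches.
    for name, idx in indices.items():
        indices[name] = idx[: (len(idx) // batch_size) * batch_size]
    return indices


# 10 samples with 10 patches each, all from one dataset.
patch_groups = [("ds1", f"s{i}") for i in range(10) for _ in range(10)]
result = split_by_sample(patch_groups, batch_size=8, split=0.8)
```

Splitting by group keeps patches from one sample out of both splits at once, so validation patches never come from a sample seen in training. Here 8 samples (80 patches) go to training and the 20 validation patches are cut to 16, the largest multiple of the batch size.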
- get_generators(conditions, batch_size=64, dim=(128, 128), n_channels=4, shuffle=True, flip=False, n_workers=1)
Build the training and validation Keras Sequence generators.
Requires set_train_val_split to have been called (patches must have split and file_path columns).
- Parameters:
conditions (list of str) – obs/patch columns to one-hot encode and feed as conditions. If empty, plain (non-conditional) generators are returned
batch_size (int) – Number of patches per batch. Defaults to 64.
dim (tuple) – Spatial (height, width) of patches. Defaults to (128, 128).
n_channels (int) – Number of image channels. Defaults to 4.
shuffle (bool) – Whether to shuffle patch order each epoch. Defaults to True.
n_workers (int) – Number of worker processes for the Keras Sequence. Defaults to 1.
flip (bool)
- Returns:
(train_generator, val_generator, one_hot_encoder) if conditions is non-empty, otherwise (train_generator, val_generator)
- Return type:
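To make the conditions argument concrete, the sketch below one-hot encodes two hypothetical patch columns in plain Python. The column names, values, and helper functions are invented for illustration; the encoder object actually returned by get_generators is not described in the source, and this is not a reproduction of it.

```python
def fit_categories(rows, conditions):
    """Collect the sorted unique values of each condition column."""
    return {c: sorted({row[c] for row in rows}) for c in conditions}


def one_hot(row, categories):
    """Concatenate one one-hot vector per condition column."""
    vector = []
    for column, values in categories.items():
        vector.extend(1.0 if row[column] == v else 0.0 for v in values)
    return vector


# Hypothetical patch metadata with two condition columns.
patches = [
    {"treatment": "ctrl", "batch": "b1"},
    {"treatment": "drug", "batch": "b2"},
    {"treatment": "drug", "batch": "b1"},
]
categories = fit_categories(patches, conditions=["treatment", "batch"])
encoded = [one_hot(p, categories) for p in patches]
```

Each patch ends up with a fixed-length vector, four entries in this case, that a conditional model can take alongside the image.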