Sampling

Spatial subunit sampling partitions large or 3D samples into cubes so that statistics can be computed per subunit.

class phenocoder.sampling.SpatialSubunitSampler(adata, dim_subunit, min_obs, spatial_key, verbose=False)

Bases: object

Parameters:

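adata (ad.AnnData) – Annotated data object; spatial coordinates are read from adata.obsm[spatial_key].
dim_subunit (tuple[int]) – Edge length of the cubic subunits created by partition().
min_obs (int) – Minimum number of observations a subunit must contain to be kept by filter().
spatial_key (str) – Key in adata.obsm under which the spatial coordinates are stored.
verbose (bool) – Defaults to False.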

partition()

Partition the observations into a uniform grid of cubic spatial subunits.

Divides the bounding box of the spatial coordinates (adata.obsm[spatial_key]) into cubes of edge length dim_subunit and assigns each observation to the cube it falls in. The result is stored on self.subunits as a dict keyed by the integer grid index of each cube, where each value holds the member observation indices, their spatial coordinates, a bounding box and an integer subunit id.

Returns: None
Return type: None
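The grid assignment described above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the function name `partition_into_cubes`, the scalar `edge` argument, and the choice to store only member indices per cube are assumptions made for illustration.

```python
import numpy as np


def partition_into_cubes(coords, edge):
    """Group point indices by the grid cube they fall in (illustrative only)."""
    coords = np.asarray(coords, dtype=float)
    # Lower corner of the bounding box serves as the grid origin.
    origin = coords.min(axis=0)
    # Integer grid index of each point along every axis.
    grid_idx = np.floor((coords - origin) / edge).astype(int)

    subunits = {}
    for obs, row in enumerate(grid_idx):
        key = tuple(int(i) for i in row)  # plain-int tuple used as the dict key
        subunits.setdefault(key, []).append(obs)
    return subunits


# Four hypothetical 3D points, binned into cubes with edge length 10.
coords = np.array(
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [12.0, 0.0, 0.0], [25.0, 25.0, 25.0]]
)
subunits = partition_into_cubes(coords, edge=10.0)
```

In this sketch the first two points share the cube with grid index (0, 0, 0), while the other two each land in a cube of their own.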

filter()

Filter subunits based on minimum number of observations.

Drops any subunit with fewer than self.min_obs observations (the threshold set at construction).

Returns: None
Return type: None
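The thresholding rule can be sketched in a few lines of plain Python. The helper `filter_subunits` and the simplified dict of member lists are hypothetical stand-ins; the actual method works in place on self.subunits and returns None.

```python
def filter_subunits(subunits, min_obs):
    """Keep only subunits holding at least min_obs observations (illustrative only)."""
    return {
        key: members
        for key, members in subunits.items()
        if len(members) >= min_obs
    }


# Hypothetical subunits keyed by grid index, each holding observation indices.
example = {(0, 0, 0): [0, 1, 2], (1, 0, 0): [3], (0, 1, 0): [4, 5]}
kept = filter_subunits(example, min_obs=2)
```

A subunit with exactly min_obs observations is kept, since only subunits with fewer than min_obs are dropped.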

sample(max_obs)

Sample observations within each subunit based on max_obs threshold.

Randomly subsamples observations in subunits that exceed the max_obs threshold. Subunits with max_obs or fewer observations are left unchanged.

Parameters: max_obs (int) – Maximum number of observations per subunit. Subunits exceeding this threshold will be randomly subsampled to this size.
Returns: None
Return type: None
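A minimal sketch of per-subunit capping follows, assuming sampling without replacement. The function `subsample_subunits` and its `seed` argument are hypothetical; the documented sample() method takes only max_obs, and how it seeds its random draw is not stated here.

```python
import numpy as np


def subsample_subunits(subunits, max_obs, seed=0):
    """Cap every subunit at max_obs members by random subsampling (illustrative only)."""
    rng = np.random.default_rng(seed)  # seed is an assumption, added for reproducibility
    sampled = {}
    for key, members in subunits.items():
        if len(members) > max_obs:
            # Draw max_obs distinct members without replacement.
            chosen = rng.choice(members, size=max_obs, replace=False)
            sampled[key] = sorted(int(i) for i in chosen)
        else:
            # Subunits at or below the threshold are copied unchanged.
            sampled[key] = list(members)
    return sampled


# Hypothetical subunits: one above the threshold, one below it.
example = {(0, 0, 0): list(range(10)), (1, 0, 0): [10, 11]}
sampled = subsample_subunits(example, max_obs=3)
```

Only the first subunit is reduced; the second already satisfies the limit and keeps both of its observations.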

to_df()

Build a per-observation table mapping each observation to its spatial subunit.

Flattens self.subunits into one row per observation, with the subunit assignment as a column and the observation index as the (string) index.

Returns:

One row per observation, with a subunit_id column and the observation index (obs_index) as the DataFrame index.

Return type:

pd.DataFrame
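The flattening step can be sketched with pandas as below. The helper `subunits_to_df` is hypothetical, and assigning subunit ids by enumeration order is an assumption; in the class itself each subunit already carries its own integer id.

```python
import pandas as pd


def subunits_to_df(subunits):
    """Flatten a subunit dict into one row per observation (illustrative only)."""
    rows = []
    # Assumption: subunit ids are assigned here by enumeration order.
    for subunit_id, members in enumerate(subunits.values()):
        for obs in members:
            rows.append({"obs_index": str(obs), "subunit_id": subunit_id})
    # Explicit columns keep the frame well-formed even when there are no rows.
    df = pd.DataFrame(rows, columns=["obs_index", "subunit_id"])
    return df.set_index("obs_index")


# Hypothetical subunits keyed by grid index, each holding observation indices.
example = {(0, 0, 0): [0, 1], (1, 0, 0): [2]}
df = subunits_to_df(example)
```

With a string index of this shape, the table could be joined back onto a per-observation table that uses the same string labels.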