Sampling

Spatial subunit sampling partitions large or 3D samples into cubes so that statistics can be computed per subunit.

class phenocoder.sampling.SpatialSubunitSampler(adata, dim_subunit, min_obs, spatial_key, verbose=False)[source]

Bases: object

Parameters:
  • adata (ad.AnnData)

  • dim_subunit (tuple[int])

  • min_obs (int)

  • spatial_key (str)

  • verbose (bool)

partition()[source]

Partition the observations into a uniform grid of cubic spatial subunits.

Divides the bounding box of the spatial coordinates (adata.obsm[spatial_key]) into grid cells whose size is set by dim_subunit and assigns each observation to the cell it falls in. The result is stored on self.subunits as a dict keyed by the integer grid index of each cell. Each value holds the member observation indices, their spatial coordinates, a bounding box, and an integer subunit id.

Returns:

None

Return type:

None
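The grid assignment described above can be illustrated with a minimal NumPy sketch. It is not phenocoder's implementation: the function name partition_sketch, the dictionary field names, and the assumption that dim_subunit holds one edge length per spatial axis are all hypothetical.

```python
import numpy as np

def partition_sketch(coords, dim_subunit):
    """Hypothetical stand-in for partition(); not phenocoder's own code."""
    coords = np.asarray(coords, dtype=float)
    edges = np.asarray(dim_subunit, dtype=float)  # assumed: one edge length per axis
    origin = coords.min(axis=0)                   # lower corner of the bounding box
    # Integer grid index of the cell each observation falls in.
    grid_idx = np.floor((coords - origin) / edges).astype(int)

    members_by_cell = {}
    for obs_i, key in enumerate(map(tuple, grid_idx.tolist())):
        members_by_cell.setdefault(key, []).append(obs_i)

    subunits = {}
    for subunit_id, (key, members) in enumerate(sorted(members_by_cell.items())):
        lower = origin + np.array(key) * edges
        subunits[key] = {
            "obs_idx": np.array(members),     # hypothetical field names
            "coords": coords[members],
            "bbox": (lower, lower + edges),
            "subunit_id": subunit_id,
        }
    return subunits

points = [[0, 0, 0], [1, 1, 1], [12, 0, 0], [25, 25, 25]]
cells = partition_sketch(points, dim_subunit=(10, 10, 10))
```

Only occupied cells appear in the resulting dict, so sparse samples do not produce empty subunits in this sketch.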

filter()[source]

Filter subunits based on minimum number of observations.

Drops any subunit with fewer than self.min_obs observations (the threshold set at construction).

Returns:

None

Return type:

None
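As a rough illustration of the thresholding rule, the sketch below filters a simplified mapping from subunit key to member observation indices. The function name filter_sketch and the simplified data layout are assumptions made for the example, not the library's internals.

```python
def filter_sketch(subunits, min_obs):
    """Hypothetical stand-in for filter(): keep subunits with at least min_obs members."""
    return {
        key: members
        for key, members in subunits.items()
        if len(members) >= min_obs  # subunits with fewer than min_obs are dropped
    }

toy_subunits = {(0, 0, 0): [0, 1, 2], (1, 0, 0): [3], (2, 2, 2): [4, 5]}
kept = filter_sketch(toy_subunits, min_obs=2)
```

A subunit with exactly min_obs observations is kept, matching the "fewer than" wording above.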

sample(max_obs)[source]

Sample observations within each subunit based on max_obs threshold.

Randomly subsamples observations in subunits that exceed the max_obs threshold. Subunits with max_obs or fewer observations are left unchanged.

Parameters:

max_obs (int) – Maximum number of observations per subunit. Subunits exceeding this threshold will be randomly subsampled to this size.

Returns:

None

Return type:

None
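The capping behavior can be sketched with NumPy's random generator. This is an illustration under stated assumptions: the function name sample_sketch, the seed argument, and the simplified subunit layout are hypothetical, and the documentation does not say how phenocoder seeds or draws its samples.

```python
import numpy as np

def sample_sketch(subunits, max_obs, seed=0):
    """Hypothetical stand-in for sample(): cap each subunit at max_obs members."""
    rng = np.random.default_rng(seed)  # seed is an assumption, added for reproducibility
    sampled = {}
    for key, members in subunits.items():
        members = np.asarray(members)
        if members.size > max_obs:
            # Draw max_obs distinct members without replacement.
            members = np.sort(rng.choice(members, size=max_obs, replace=False))
        sampled[key] = members
    return sampled

toy_subunits = {0: list(range(10)), 1: [10, 11]}
capped = sample_sketch(toy_subunits, max_obs=3)
```

Sampling without replacement ensures that no observation appears twice within a subunit.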

to_df()[source]

Build a per-observation table mapping each observation to its spatial subunit.

Flattens self.subunits into one row per observation, with the subunit assignment as a column and the observation index as the (string) index.

Returns:

One row per observation, with a subunit_id column and the observation index (obs_index) as the DataFrame index.

Return type:

pd.DataFrame
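The flattening step can be sketched with pandas as follows. The function name to_df_sketch and the simplified input (a mapping from subunit id to observation names) are assumptions for illustration; only the subunit_id column and the obs_index index name come from the description above.

```python
import pandas as pd

def to_df_sketch(subunits):
    """Hypothetical stand-in for to_df(): one row per observation."""
    rows = [
        (str(obs), subunit_id)  # observation index stored as a string
        for subunit_id, members in subunits.items()
        for obs in members
    ]
    table = pd.DataFrame(rows, columns=["obs_index", "subunit_id"])
    return table.set_index("obs_index")

toy_subunits = {0: ["cell_a", "cell_b"], 1: ["cell_c"]}
df = to_df_sketch(toy_subunits)
```

A table in this shape can be joined back onto per-observation metadata by index, for example to group observations by subunit_id.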