Sampling
Spatial subunit sampling: partitions large or 3D samples into cubic subunits for subunit-level statistics.
- class phenocoder.sampling.SpatialSubunitSampler(adata, dim_subunit, min_obs, spatial_key, verbose=False)
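Bases: object
- Parameters:
adata – Annotated data object; spatial coordinates are read from adata.obsm[spatial_key].
dim_subunit – Edge length of each cubic subunit.
min_obs – Minimum number of observations a subunit needs in order to be kept by filter().
spatial_key – Key of the spatial coordinate array in adata.obsm.
verbose – Defaults to False.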
- partition()
Partition the observations into a uniform grid of cubic spatial subunits.
Divides the bounding box of the spatial coordinates (adata.obsm[spatial_key]) into cubes of edge length dim_subunit and assigns each observation to the cube it falls in. The result is stored on self.subunits as a dict keyed by the integer grid index of each cube, where each value holds the member observation indices, their spatial coordinates, a bounding box and an integer subunit id.
- Returns:
None
- Return type:
None
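The grid assignment described above can be sketched with NumPy as follows. This is a minimal standalone illustration, not phenocoder's implementation: the helper name `partition_into_cubes`, the per-subunit keys (`obs_idx`, `coords`, `bbox`, `subunit_id`) and the choice of cube corners as the bounding box are all assumptions. The cube containing a point is found by floor-dividing its offset from the bounding-box minimum by `dim_subunit`.

```python
import numpy as np


def partition_into_cubes(coords, dim_subunit):
    """Hypothetical helper: bin points into cubes of edge length dim_subunit.

    Returns a dict keyed by the integer grid index (a tuple) of each occupied cube.
    """
    coords = np.asarray(coords, dtype=float)
    origin = coords.min(axis=0)  # lower corner of the bounding box
    # Integer grid index of the cube that each point falls in.
    grid_idx = np.floor((coords - origin) / dim_subunit).astype(int)

    members = {}
    for obs_i, row in enumerate(grid_idx):
        key = tuple(int(v) for v in row)
        members.setdefault(key, []).append(obs_i)

    subunits = {}
    # Assign integer subunit ids in sorted grid-index order.
    for subunit_id, key in enumerate(sorted(members)):
        idx = np.array(members[key])
        lower = origin + np.array(key) * dim_subunit  # lower corner of this cube
        subunits[key] = {
            "obs_idx": idx,
            "coords": coords[idx],
            "bbox": (lower, lower + dim_subunit),  # assumed: cube corners
            "subunit_id": subunit_id,
        }
    return subunits


pts = [[0.0, 0.0, 0.0], [0.5, 0.2, 0.9], [1.5, 0.0, 0.0], [2.1, 2.2, 0.3]]
cubes = partition_into_cubes(pts, dim_subunit=1.0)
for key, unit in cubes.items():
    print(key, unit["subunit_id"], unit["obs_idx"].tolist())
```

Only occupied cubes appear as keys, so empty regions of the bounding box cost nothing. In this sketch, a point lying exactly on the upper face of the bounding box gets its own grid index; the real class may treat that edge case differently.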
- filter()
Filter subunits based on the minimum number of observations.
Drops any subunit with fewer than self.min_obs observations (the threshold set at construction).
- Returns:
None
- Return type:
None
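The filtering rule amounts to a dict comprehension over the subunits. The sketch below is illustrative only; `filter_subunits` is a hypothetical helper and the toy subunit layout is assumed. Because subunits are dropped only when they have *fewer than* `min_obs` members, a subunit with exactly `min_obs` observations is kept. Whether the real class renumbers subunit ids after filtering is not documented; this sketch leaves them unchanged.

```python
def filter_subunits(subunits, min_obs):
    """Hypothetical helper: keep only subunits with at least min_obs observations."""
    return {
        key: unit
        for key, unit in subunits.items()
        if len(unit["obs_idx"]) >= min_obs
    }


# Toy subunits keyed by grid index; only the member indices matter here.
subunits = {
    (0, 0, 0): {"obs_idx": [0, 1, 2], "subunit_id": 0},
    (1, 0, 0): {"obs_idx": [3], "subunit_id": 1},
    (0, 1, 0): {"obs_idx": [4, 5], "subunit_id": 2},
}
kept = filter_subunits(subunits, min_obs=2)
print(sorted(kept))  # [(0, 0, 0), (0, 1, 0)]
```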
- sample(max_obs)
Subsample observations within each subunit using the max_obs threshold.
Randomly subsamples each subunit that has more than max_obs observations down to max_obs observations. Subunits with at most max_obs observations are left unchanged.
- Parameters:
max_obs (int) – Maximum number of observations per subunit. Subunits exceeding this threshold will be randomly subsampled to this size.
- Returns:
None
- Return type:
None
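Per-subunit random subsampling without replacement can be sketched with NumPy's `Generator.choice`. This is an assumption-laden illustration, not the library's code: `subsample_subunits`, its `seed` argument and the subunit layout are hypothetical (the documented `sample(max_obs)` takes no seed). Member positions are drawn once and applied to both `obs_idx` and, if present, `coords`, so the two stay aligned.

```python
import numpy as np


def subsample_subunits(subunits, max_obs, seed=None):
    """Hypothetical helper: randomly cap each subunit at max_obs observations."""
    rng = np.random.default_rng(seed)
    out = {}
    for key, unit in subunits.items():
        unit = dict(unit)  # shallow copy so the input dict is not modified
        idx = np.asarray(unit["obs_idx"])
        if idx.size > max_obs:
            # Pick max_obs member positions uniformly at random, without replacement.
            keep = np.sort(rng.choice(idx.size, size=max_obs, replace=False))
            idx = idx[keep]
            if "coords" in unit:
                unit["coords"] = np.asarray(unit["coords"])[keep]  # keep coords aligned
        unit["obs_idx"] = idx
        out[key] = unit
    return out


subunits = {
    # Coordinates equal to the observation index make alignment easy to check.
    (0, 0, 0): {"obs_idx": list(range(10)),
                "coords": np.arange(10, dtype=float).reshape(-1, 1),
                "subunit_id": 0},
    (1, 0, 0): {"obs_idx": [10, 11], "subunit_id": 1},
}
sampled = subsample_subunits(subunits, max_obs=4, seed=0)
print({key: unit["obs_idx"].tolist() for key, unit in sampled.items()})
```

Sorting the drawn positions keeps observations in their original order within each subunit, which makes downstream tables easier to compare across runs.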
- to_df()
Build a per-observation table mapping each object to its spatial subunit.
Flattens self.subunits into one row per observation, with the subunit assignment as a column and the observation index as the (string) index.
- Returns:
- One row per observation, with a subunit_id column and the observation index (obs_index) as the DataFrame index.
- Return type:
pd.DataFrame
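The flattening step can be sketched with pandas. The helper `subunits_to_df` and the input layout are hypothetical; only the output shape (a `subunit_id` column indexed by a string `obs_index`) follows the description above.

```python
import pandas as pd


def subunits_to_df(subunits):
    """Hypothetical helper: flatten subunits into one row per observation."""
    # One (observation index as string, subunit id) pair per member observation.
    rows = [
        (str(obs), unit["subunit_id"])
        for unit in subunits.values()
        for obs in unit["obs_idx"]
    ]
    df = pd.DataFrame(rows, columns=["obs_index", "subunit_id"])
    return df.set_index("obs_index")


subunits = {
    (0, 0, 0): {"obs_idx": [0, 1], "subunit_id": 0},
    (1, 0, 0): {"obs_idx": [2], "subunit_id": 1},
}
df = subunits_to_df(subunits)
print(df)
```

A string index of this kind can be joined against a table whose index also holds observation names as strings (for example `df.join(other)`), which is presumably why the index is stored as strings.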