Reference

`experiment.py`

class isogroup.base.experiment.Experiment(dataset: DataFrame, tracer: str, ppm_tol: float, rt_tol: float, max_atoms: int | None = None, database: DataFrame | None = None)[source]

Bases: object

Represents a mass spectrometry experiment with experimental features.

initialize_experimental_features()[source]: Initialize Feature objects from the dataset and organize them by sample. Each feature is created with its retention time, m/z, tracer, intensity, and sample name.

property ppm_tol: float: Returns the m/z tolerance (in ppm) used for feature annotation.

property rt_tol: float: Returns the retention time tolerance used for feature annotation.

property tracer: str: Returns the tracer used for the experiment.

property tracer_element: str: Returns the tracer element used in the experiment.

property tracer_idx: int: Returns the tracer index used in the experiment.

`targeted_experiment.py`

class isogroup.base.targeted_experiment.TargetedExperiment(dataset: DataFrame, tracer: str, ppm_tol: float, rt_tol: float, database: DataFrame)[source]

Bases: Experiment

Represents a targeted mass spectrometry experiment. Used to group and annotate detected features from an experimental dataset using a reference database with isotopic tracer information.

annotate_features()[source]: Annotate experimental features by matching them with the database features within specified m/z and retention time tolerances.
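
As an illustration of what matching "within specified m/z and retention time tolerances" can mean, the following is a minimal sketch, not IsoGroup's implementation. The helper names (`ppm_error`, `matches`), the dictionary layout, and the numeric values are all assumptions made for this example.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed m/z error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def matches(feature: dict, reference: dict, ppm_tol: float, rt_tol: float) -> bool:
    """True when both the m/z error and the RT difference are within tolerance."""
    mz_ok = abs(ppm_error(feature["mz"], reference["mz"])) <= ppm_tol
    rt_ok = abs(feature["rt"] - reference["rt"]) <= rt_tol
    return mz_ok and rt_ok


# Hypothetical values for illustration only.
reference = {"mz": 200.0000, "rt": 300.0}
close_feature = {"mz": 200.0010, "rt": 302.0}  # about 5 ppm away, 2 RT units later
far_feature = {"mz": 200.0100, "rt": 302.0}    # about 50 ppm away

print(matches(close_feature, reference, ppm_tol=10.0, rt_tol=5.0))
print(matches(far_feature, reference, ppm_tol=10.0, rt_tol=5.0))
```

The ppm tolerance scales with m/z, so the same `ppm_tol` allows a wider absolute window for heavier ions, whereas `rt_tol` is an absolute window in whatever unit the dataset uses for retention time.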

clusterize()[source]: Group features by metabolite names within each sample and assign a unique cluster ID to each group. Populates self.clusters as a dictionary of the form: {sample_name: {cluster_id: Cluster object}}
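
The nested `{sample_name: {cluster_id: ...}}` layout can be pictured with plain dictionaries. This sketch only mimics the shape described above: the tuples, the `C<n>` ID format, and the inner dictionaries standing in for Cluster objects are assumptions, not IsoGroup's own data structures.

```python
import itertools
from collections import defaultdict

# Hypothetical annotated features: (sample, metabolite name, isotopologue index).
annotated = [
    ("sample_A", "citrate", 0),
    ("sample_A", "citrate", 1),
    ("sample_A", "malate", 0),
    ("sample_B", "citrate", 0),
]

# Group by sample first, then by metabolite name within each sample.
by_sample = defaultdict(lambda: defaultdict(list))
for sample, metabolite, iso in annotated:
    by_sample[sample][metabolite].append(iso)

# Give every group its own ID (the "C<n>" format is illustrative).
next_id = itertools.count()
clusters = {
    sample: {
        f"C{next(next_id)}": {"name": name, "isotopologues": isos}
        for name, isos in groups.items()
    }
    for sample, groups in by_sample.items()
}

print(clusters)
```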

create_clusters_df()[source]: Create and store a dataframe containing all clusters.

create_features_df()[source]: Create and store a dataframe containing all features.

get_clusters_from_name(name, sample_name: str)[source]

Retrieve a cluster from the experiment by its name, restricted to a given sample if one is provided.

Parameters:

name – Name of the cluster to retrieve
sample_name – Name of the sample to retrieve the cluster from

Returns:

Cluster object if found, None otherwise

get_features_from_name(name: str, sample_name: str)[source]

Retrieve all features in a given sample that are annotated with a specific metabolite name.

Parameters:

name – Name of the metabolite for which to retrieve features
sample_name – Name of the sample from which to retrieve features

Returns:

List of Feature objects that match the metabolite name in the specified sample

run_targeted_pipeline()[source]

Run the full targeted annotation pipeline for the experiment.

This includes:

- Initializing Feature objects from the dataset.
- Matching experimental features to the database within specified tolerances.
- Clustering features by metabolite names.

`untargeted_experiment.py`

class isogroup.base.untargeted_experiment.UntargetedExperiment(dataset: DataFrame, tracer: str, ppm_tol: float, rt_tol: float, max_atoms: int | None = None, keep: str | None = None)[source]

Bases: Experiment

Represents an untargeted mass spectrometry experiment. An untargeted experiment involves grouping features into potential isotopologue clusters based on retention time proximity and m/z differences.

build_clusters(rt_tol: float, ppm_tol: float, max_atoms: int | None = None)[source]

Group features into potential isotopologue clusters based on retention time proximity and m/z differences.

Parameters:

rt_tol – Retention time window for clustering.
ppm_tol – m/z tolerance in parts per million for clustering.
max_atoms – Maximum number of tracer atoms to consider for isotopologues. If None, IsoGroup automatically estimates the maximum number of isotopologues based on the feature m/z and tracer element.
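
To make the grouping criterion concrete, here is a simplified sketch of collecting isotopologue candidates around one base feature. It is not IsoGroup's algorithm: it assumes singly charged ions and a 13C tracer, takes the first matching feature for each isotopologue, and the function name `candidate_cluster` and the feature dictionaries are hypothetical.

```python
C13_SHIFT = 1.0033548  # mass difference between 13C and 12C, in Da


def candidate_cluster(base, features, rt_tol, ppm_tol, max_atoms, shift=C13_SHIFT):
    """Collect features that look like isotopologues M+1..M+max_atoms of `base`."""
    members = {0: base}
    for n in range(1, max_atoms + 1):
        expected_mz = base["mz"] + n * shift
        for f in features:
            # Candidates must co-elute with the base feature.
            if abs(f["rt"] - base["rt"]) > rt_tol:
                continue
            # Candidates must sit at the expected m/z within the ppm tolerance.
            if abs(f["mz"] - expected_mz) / expected_mz * 1e6 <= ppm_tol:
                members[n] = f  # first match wins in this sketch
                break
    return members


# Hypothetical features for illustration only.
features = [
    {"id": "F0", "mz": 181.0668, "rt": 150.0},  # right m/z, wrong retention time
    {"id": "F1", "mz": 180.0634, "rt": 100.0},  # base feature (M+0)
    {"id": "F2", "mz": 181.0668, "rt": 100.5},
    {"id": "F3", "mz": 182.0701, "rt": 99.8},
    {"id": "F4", "mz": 250.0000, "rt": 100.0},  # unrelated m/z
]

members = candidate_cluster(features[1], features, rt_tol=2.0, ppm_tol=5.0, max_atoms=3)
print({n: f["id"] for n, f in members.items()})
```

Because every feature can serve as a base in turn, a procedure like this naturally produces overlapping and nested clusters, which is why a separate deduplication step is useful.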

create_clusters_df()[source]: Create and store a dataframe containing all clusters.

create_features_df()[source]: Create and store a dataframe containing all features.

deduplicate_clusters(keep: str | None = None)[source]

Clean up and deduplicate clusters by:

- Merging clusters with identical feature compositions.
- Removing clusters that are subsets of larger clusters (if keep is “longest”).
- Keeping only the best candidate feature for each isotopologue (if keep is “closest_mz”).
- Updating each feature’s cluster memberships, isotopologue numbers, and also_in lists.

Parameters:

keep – Strategy for deduplication. Options are “longest” to keep the largest cluster, “closest_mz” to retain only the feature with the highest intensity for each isotopologue within a cluster, or “both” to apply both strategies. By default, all clusters are kept (“all”).
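
The first two clean-up steps (merging identical clusters and removing subsets) can be sketched with set operations. This is an illustration under simplifying assumptions, not the library's code: clusters are reduced to lists of feature IDs, and the function name `drop_duplicates_and_subsets` is hypothetical.

```python
def drop_duplicates_and_subsets(clusters: dict) -> dict:
    """Keep one copy of identical clusters and drop strict subsets of larger ones."""
    # Merge clusters with identical feature compositions (the first ID wins).
    unique = {}
    for cid, members in clusters.items():
        key = frozenset(members)
        if key not in unique.values():
            unique[cid] = key
    # Remove clusters that are strict subsets of another remaining cluster.
    kept = {
        cid: members
        for cid, members in unique.items()
        if not any(members < other for other in unique.values())
    }
    return {cid: sorted(members) for cid, members in kept.items()}


# Hypothetical clusters, given as lists of feature IDs.
clusters = {
    "C0": ["F1", "F2", "F3"],
    "C1": ["F1", "F2"],        # strict subset of C0
    "C2": ["F3", "F2", "F1"],  # same composition as C0
    "C3": ["F7", "F8"],
}

deduplicated = drop_duplicates_and_subsets(clusters)
print(deduplicated)
```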

fully_labeled_enhancer(clusters_df, sample_name)[source]

Refine the untargeted pipeline annotations using fully labeled data.

Parameters:

clusters_df – DataFrame containing all clusters generated by IsoGroup’s untargeted mode.
sample_name – Name of the fully labeled sample used for enhancement.

run_untargeted_pipeline(enhancing_mode=None, sample_name=None)[source]

Complete pipeline to build and deduplicate clusters from the dataset with logging and timing.

Parameters:

enhancing_mode – Mode used to enhance the dataset. Accepted values are “unlabeled” or “fully labeled”. If None, no enhancement is applied. Defaults to None.
sample_name – Name of the sample file to use for enhancement. Required if enhancing_mode is specified.

unlabeled_enhancer(clusters_df, sample_name)[source]

Refine the untargeted pipeline annotations using unlabeled data.

Parameters:

clusters_df – DataFrame containing all clusters generated by IsoGroup’s untargeted mode.
sample_name – Name of the unlabeled sample used for enhancement.

`feature.py`

class isogroup.base.feature.Feature(rt: float, mz: float, tracer: str, intensity: float, feature_id: str | None = None, tracer_element=None, formula: list | None = None, sample: str | None = None, chemical: list | None = None, metabolite: list | None = None, mz_error: list | None = None, rt_error: list | None = None, **extra_dims: dict)[source]

Bases: object

Represents a mass spectrometry feature in the dataset. A feature is characterized by its retention time (RT), mass-to-charge ratio (m/z), and intensity. It can also have associated chemical information, isotopologues, and other metadata.

`database.py`

class isogroup.base.database.Database(dataset: DataFrame, tracer: str, tracer_element: str)[source]

Bases: object

Represents a database of theoretical features for a specific tracer.

initialize_theoretical_features()[source]: Creates labelled chemical objects from the dataset and initializes theoretical features. For each chemical, it generates features with isotopologues based on the tracer.

theoretical_database()[source]: Summarize theoretical features into a DataFrame and export it to a TSV file.

`cluster.py`

class isogroup.base.cluster.Cluster(features: list, cluster_id: str, name: str | None = None)[source]

Bases: object

Represents a cluster of mass spectrometry features. A cluster is a group of mass features originating from the same molecule, sharing the same elemental composition but different isotopic compositions. Clusters are used to group features related to the same metabolite or chemical compound.

property chemical: Returns the list of chemical objects for the cluster. Based on the metabolite name matching to the cluster name.

property duplicated_isotopologues: List[int]: Returns a list of duplicated isotopologues in the cluster.

property element_number: int: Returns the number of tracer elements in the cluster.

property expected_isotopologues_in_cluster: List[int]: Returns the list of expected isotopologues in the cluster, based on the number of tracer atoms in its formula.

property formula: str: Returns the formula of the cluster, found by matching metabolite names to the cluster name.

property highest_mz: float: Returns the highest mass-to-charge ratio (m/z) of the features in the cluster.

property highest_rt: float: Returns the highest retention time (RT) of the features in the cluster.

property is_adduct: tuple[bool, str]

property is_complete: bool: Returns True if the cluster is complete (i.e., contains all expected isotopologues).

property is_corrupted: bool: Returns True if the cluster is corrupted (i.e., contains isotopologues that are not expected).

property is_duplicated: bool: Returns True if the cluster contains duplicated isotopologues.

property is_incomplete: bool: Returns True if the cluster is incomplete (i.e., contains fewer isotopologues than expected).

property isotopologues: List[int]: Returns the list of isotopologues in the cluster, found by matching metabolite names to the cluster name.

property lowest_mz: float: Returns the lowest mass-to-charge ratio (m/z) of the features in the cluster.

property lowest_rt: float: Returns the lowest retention time (RT) of the features in the cluster.

property mean_mz: float: Returns the mean mass-to-charge ratio (m/z) of the features in the cluster.

property mean_rt: float: Returns the mean retention time (RT) of the features in the cluster.

property metabolite: Returns the list of metabolite annotations for features in the cluster.

property missing_isotopologues: List[int]: Returns a list of missing isotopologues in the annotated cluster. Based on the expected isotopologues in the cluster.

property status: str: Returns the status of the cluster based on its completeness, incompleteness, and duplication.

property summary: dict: Returns a summary of the cluster.
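
The isotopologue-related properties above (expected, missing, duplicated, and unexpected isotopologues) are related in a simple way, sketched below. The sketch assumes that a molecule with n tracer atoms is expected to show isotopologues M+0 through M+n. The function `describe` and its return keys are hypothetical and do not mirror the Cluster class.

```python
from collections import Counter


def describe(observed: list, n_tracer_atoms: int) -> dict:
    """Compare observed isotopologue indices with the expected range 0..n."""
    expected = list(range(n_tracer_atoms + 1))  # M+0 .. M+n
    counts = Counter(observed)
    return {
        "expected": expected,
        "missing": [i for i in expected if i not in counts],
        "duplicated": sorted(i for i, c in counts.items() if c > 1),
        "unexpected": sorted(i for i in counts if i not in expected),
    }


# Hypothetical cluster with three tracer atoms: M+1 observed twice, M+2 absent.
report = describe([0, 1, 1, 3], n_tracer_atoms=3)
print(report)
```

In these terms, a complete cluster has an empty "missing" list, a duplicated cluster has a non-empty "duplicated" list, and a corrupted cluster has a non-empty "unexpected" list.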

`misc.py`

class isogroup.base.misc.Misc[source]

Bases: object

Miscellaneous utility functions for isotope labelling analysis.

calculate_isotopologue_index(base_mz: float, mzshift_tracer: float) → int[source]

Calculate the theoretical isotopologue index based on m/z values.

Parameters:

candidate_mz – m/z of the candidate isotopologue.
base_mz – m/z of the base (unlabeled) feature.
mzshift_tracer – m/z shift corresponding to the tracer.
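
One plausible reading of this calculation is to divide the observed m/z difference by the tracer shift and round to the nearest integer. The sketch below assumes exactly that, plus singly charged ions. It uses the three documented parameter names, but the body is an assumption rather than the library's code.

```python
def isotopologue_index(candidate_mz: float, base_mz: float, mzshift_tracer: float) -> int:
    """Nearest whole number of tracer shifts separating candidate and base."""
    return round((candidate_mz - base_mz) / mzshift_tracer)


# Hypothetical values: a candidate about three 13C shifts above the base feature.
index = isotopologue_index(candidate_mz=183.0735, base_mz=180.0634, mzshift_tracer=1.0033548)
print(index)
```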

static calculate_mzshift(tracer: str) → float[source]

Calculate the m/z shift for a given tracer (e.g. “13C”).

Parameters:

tracer – Tracer code (e.g. “13C”).
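
For a singly charged ion, the shift for a tracer is the mass difference between the heavy isotope and the most abundant light isotope of the same element. The following sketch illustrates that idea with a small hand-written mass table. The table, the parsing regex, and the function name `mz_shift` are assumptions for this example, and IsoGroup may obtain its masses differently.

```python
import re

# Isotope masses in Da for a few common tracers (illustrative subset).
ISOTOPE_MASSES = {
    ("C", 12): 12.0,
    ("C", 13): 13.00335484,
    ("N", 14): 14.00307400,
    ("N", 15): 15.00010890,
}
# Mass number of the most abundant (light) isotope of each element.
LIGHT_ISOTOPE = {"C": 12, "N": 14}


def mz_shift(tracer: str) -> float:
    """Mass difference between the tracer isotope and the light isotope."""
    match = re.fullmatch(r"(\d+)([A-Z][a-z]?)", tracer)
    if match is None:
        raise ValueError(f"Unrecognized tracer code: {tracer!r}")
    mass_number, element = int(match.group(1)), match.group(2)
    heavy = ISOTOPE_MASSES[(element, mass_number)]
    light = ISOTOPE_MASSES[(element, LIGHT_ISOTOPE[element])]
    return heavy - light


print(mz_shift("13C"))
print(mz_shift("15N"))
```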

static get_atomic_mass(element: str) → float | None[source]

Returns the atomic mass of the given element.

Parameters:

element – Chemical element symbol (e.g. “C”, “H”, “N”, “O”).

static get_max_isotopologues_for_mz(mz: float, tracer_element: str) → int[source]

Returns the maximum number of isotopologues to consider based on the m/z value. This is a placeholder function and should be replaced with actual logic as needed.

Parameters:

mz – Mass-to-charge ratio of the feature.
tracer_element – Tracer element symbol (e.g. “C”, “N”).

`io.py`

class isogroup.base.io.IoHandler[source]

Bases: object

Handles input and output operations.

clusters_summary(clusters_to_summarize: dict)[source]

Export a TSV file with a summary of the clusters.

Parameters:

clusters_to_summarize – dict containing clusters to summarize

Returns:

pd.DataFrame with the summary of the clusters

create_output_directory(outputs_path)[source]

Create an output directory for saving results.

Parameters:

outputs_path – Path to the output directory.

export_clusters(dataframe_to_export: DataFrame)[source]

Convert the clusters into a pandas DataFrame for easier analysis and export (Untargeted case).

Parameters:

dataframe_to_export – DataFrame containing the clusters to export

export_features(dataframe_to_export: DataFrame)[source]

Export all features to a TSV file.

Parameters:

dataframe_to_export – DataFrame containing the features to export

export_theoretical_database(database: DataFrame)[source]

Summarize theoretical features into a DataFrame and export it to a TSV file.

Parameters:

database – Database object containing theoretical features.

read_database(database)[source]

Reads the database from the specified file path and loads it into a pandas DataFrame.

Parameters:

database – Path to the database file.

read_dataset(dataset)[source]

Reads the dataset from the specified file path and loads it into a pandas DataFrame.

Parameters:

dataset – Path to the dataset file.
