Reference
experiment.py
- class isogroup.base.experiment.Experiment(dataset: DataFrame, tracer: str, ppm_tol: float, rt_tol: float, max_atoms: int | None = None, database: DataFrame | None = None)[source]
Bases: object
Represents a mass spectrometry experiment with experimental features.
- initialize_experimental_features()[source]
Initialize Feature objects from the dataset and organize them by sample. Each feature is created with its retention time, m/z, tracer, intensity, and sample name.
- property ppm_tol: float
Returns the m/z tolerance (in ppm) used for feature annotation.
- property rt_tol: float
Returns the retention time tolerance used for feature annotation.
- property tracer: str
Returns the tracer used for the experiment.
- property tracer_element: str
Returns the tracer element used in the experiment.
- property tracer_idx: int
Returns the tracer index used in the experiment.
targeted_experiment.py
- class isogroup.base.targeted_experiment.TargetedExperiment(dataset: DataFrame, tracer: str, ppm_tol: float, rt_tol: float, database: DataFrame)[source]
Bases: Experiment
Represents a targeted mass spectrometry experiment. Used to group and annotate detected features from an experimental dataset using a reference database with isotopic tracer information.
- annotate_features()[source]
Annotate experimental features by matching them with the database features within specified m/z and retention time tolerances.
- clusterize()[source]
Group features by metabolite names within each sample and assign a unique cluster ID to each group. Populates self.clusters as a dictionary of the form: {sample_name: {cluster_id: Cluster object}}
- get_clusters_from_name(name, sample_name: str)[source]
Get a cluster from the experiment by its name, in a given sample if provided
- Parameters:
name – Name of the cluster to retrieve
sample_name – Name of the sample to retrieve the cluster from
- Returns:
Cluster object if found, None otherwise
- get_features_from_name(name: str, sample_name: str)[source]
Retrieve all features in a given sample that are annotated with a specific metabolite name.
- Parameters:
name – Name of the metabolite for which to retrieve features
sample_name – Name of the sample from which to retrieve features
- Returns:
List of Feature objects that match the metabolite name in the specified sample
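The tolerance matching described for annotate_features() can be illustrated with a minimal sketch. It does not reproduce IsoGroup's implementation: the helper names (ppm_error, matches), the dictionary layout, and the numeric values are all hypothetical, and only the idea of combining a ppm window on m/z with an absolute window on retention time comes from the reference above.

```python
def ppm_error(mz_observed: float, mz_theoretical: float) -> float:
    """Signed m/z error in parts per million, relative to the theoretical m/z."""
    return (mz_observed - mz_theoretical) / mz_theoretical * 1e6


def matches(feature: dict, reference: dict, ppm_tol: float, rt_tol: float) -> bool:
    """True if the feature lies within both the m/z (ppm) and RT tolerances."""
    mz_ok = abs(ppm_error(feature["mz"], reference["mz"])) <= ppm_tol
    rt_ok = abs(feature["rt"] - reference["rt"]) <= rt_tol
    return mz_ok and rt_ok


# Hypothetical database entries and one experimental feature (illustrative values).
references = [
    {"name": "M0", "mz": 200.0000, "rt": 100.0},
    {"name": "M1", "mz": 201.0034, "rt": 100.0},
]
feature = {"mz": 200.0010, "rt": 101.5}

# Names of the reference entries that the feature could be annotated with.
hits = [ref["name"] for ref in references
        if matches(feature, ref, ppm_tol=10.0, rt_tol=5.0)]
```

Here the feature is about 5 ppm and 1.5 RT units away from the first entry, so only that entry is retained. Tightening ppm_tol below 5 would leave the feature unannotated.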
untargeted_experiment.py
- class isogroup.base.untargeted_experiment.UntargetedExperiment(dataset: DataFrame, tracer: str, ppm_tol: float, rt_tol: float, max_atoms: int | None = None, keep: str | None = None)[source]
Bases: Experiment
Represents an untargeted mass spectrometry experiment. An untargeted experiment involves grouping features into potential isotopologue clusters based on retention time proximity and m/z differences.
- build_clusters(rt_tol: float, ppm_tol: float, max_atoms: int | None = None)[source]
Group features into potential isotopologue clusters based on retention time proximity and m/z differences.
- Parameters:
rt_tol – Retention time window for clustering.
ppm_tol – m/z tolerance in parts per million for clustering.
max_atoms – Maximum number of tracer atoms to consider for isotopologues. If None, IsoGroup automatically estimates the maximum number of isotopologues based on the feature m/z and tracer element.
- deduplicate_clusters(keep: str | None = None)[source]
Clean up and deduplicate clusters by: merging clusters with identical feature compositions; removing clusters that are subsets of larger clusters (if keep is “longest”); keeping only the best candidate feature for each isotopologue (if keep is “closest_mz”); and updating each feature’s cluster memberships, isotopologue numbers, and also_in lists.
- Parameters:
keep – Strategy for deduplication. Options are “longest” to keep the largest cluster, “closest_mz” to retain only the feature with the highest intensity for each isotopologue within a cluster, or “both” to apply both strategies. By default, all clusters are kept (“all”).
- fully_labeled_enhancer(clusters_df, sample_name)[source]
Refine the untargeted pipeline annotations using fully labeled data.
- Parameters:
clusters_df – DataFrame containing all clusters generated by the IsoGroup’s untargeted mode.
sample_name – Name of the fully labeled sample used for enhancement.
- run_untargeted_pipeline(enhancing_mode=None, sample_name=None)[source]
Complete pipeline to build and deduplicate clusters from the dataset with logging and timing.
- Parameters:
enhancing_mode – Mode used to enhance the dataset. Accepted values are “unlabeled” or “fully labeled”. If None, no enhancement is applied. Defaults to None.
sample_name – Name of the sample file to use for enhancement. Required if enhancing_mode is specified.
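The grouping and deduplication ideas behind build_clusters() and deduplicate_clusters() can be sketched as follows. This is an illustration under stated assumptions, not the library's algorithm: the function names (candidate_cluster, drop_subsets), the dictionary-based features, the choice of one feature as the unlabeled base, singly charged ions, and the approximate 13C shift constant are all assumptions made for the example.

```python
C13_SHIFT = 1.0033548  # approximate 13C - 12C mass difference in Da (assumes charge 1)


def candidate_cluster(base, features, rt_tol, ppm_tol, max_atoms, shift=C13_SHIFT):
    """Collect features that could be isotopologues of `base`.

    Returns a dict mapping isotopologue index -> list of candidate features.
    """
    cluster = {0: [base]}
    for feat in features:
        # Candidates must co-elute with the base feature.
        if feat is base or abs(feat["rt"] - base["rt"]) > rt_tol:
            continue
        # Nearest whole number of tracer atoms explaining the m/z difference.
        n = round((feat["mz"] - base["mz"]) / shift)
        if not 1 <= n <= max_atoms:
            continue
        expected_mz = base["mz"] + n * shift
        if abs(feat["mz"] - expected_mz) / expected_mz * 1e6 <= ppm_tol:
            cluster.setdefault(n, []).append(feat)
    return cluster


def drop_subsets(clusters):
    """Merge identical clusters and drop those strictly contained in a larger one."""
    unique = {frozenset(c) for c in clusters}
    kept = [c for c in unique if not any(c < other for other in unique)]
    return sorted(kept, key=sorted)  # deterministic order for display


# Illustrative features: F4 elutes too late, F5 is not a whole shift away from F1.
features = [
    {"id": "F1", "mz": 180.0634, "rt": 60.0},
    {"id": "F2", "mz": 181.0668, "rt": 60.4},
    {"id": "F3", "mz": 182.0701, "rt": 59.8},
    {"id": "F4", "mz": 181.0668, "rt": 75.0},
    {"id": "F5", "mz": 180.5650, "rt": 60.1},
]
cluster = candidate_cluster(features[0], features, rt_tol=1.0, ppm_tol=5.0, max_atoms=6)
indices = {n: [f["id"] for f in feats] for n, feats in cluster.items()}

deduplicated = drop_subsets([{"F1", "F2", "F3"}, {"F1", "F2"}, {"F1", "F2", "F3"}, {"F9"}])
```

In this sketch, drop_subsets corresponds loosely to the “longest” strategy: identical clusters collapse into one and any cluster fully contained in a larger one is discarded.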
feature.py
- class isogroup.base.feature.Feature(rt: float, mz: float, tracer: str, intensity: float, feature_id: str | None = None, tracer_element=None, formula: list | None = None, sample: str | None = None, chemical: list | None = None, metabolite: list | None = None, mz_error: list | None = None, rt_error: list | None = None, **extra_dims: dict)[source]
Bases: object
Represents a mass spectrometry feature in the dataset. A feature is characterized by its retention time (RT), mass-to-charge ratio (m/z), and intensity. It can also have associated chemical information, isotopologues, and other metadata.
database.py
- class isogroup.base.database.Database(dataset: DataFrame, tracer: str, tracer_element: str)[source]
Bases: object
Represents a database of theoretical features for a specific tracer.
cluster.py
- class isogroup.base.cluster.Cluster(features: list, cluster_id: str, name: str | None = None)[source]
Bases: object
Represents a cluster of mass spectrometry features. A cluster is a group of mass features originating from the same molecule, sharing the same elemental composition but different isotopic compositions. Clusters are used to group features related to the same metabolite or chemical compound.
- property chemical
Returns the list of chemical objects for the cluster. Based on the metabolite name matching to the cluster name.
- property duplicated_isotopologues: List[int]
Returns a list of duplicated isotopologues in the cluster.
- property element_number: int
Returns the number of tracer elements in the cluster.
- property expected_isotopologues_in_cluster: List[int]
Returns the list of expected isotopologues in the cluster. Based on the number of tracer-element atoms in its formula.
- property formula: str
Returns the formula of the cluster. Based on the metabolite name matching to the cluster name.
- property highest_mz: float
Returns the highest mass-to-charge ratio (m/z) of the features in the cluster.
- property highest_rt: float
Returns the highest retention time (RT) of the features in the cluster.
- property is_adduct: tuple[bool, str]
- property is_complete: bool
Returns True if the cluster is complete (i.e., contains all expected isotopologues).
- property is_corrupted: bool
Returns True if the cluster is corrupted, i.e., overfilled: it contains isotopologues that are not expected.
- property is_duplicated: bool
Returns True if the cluster contains duplicated isotopologues.
- property is_incomplete: bool
Returns True if the cluster is incomplete (i.e., contains fewer isotopologues than expected).
- property isotopologues: List[int]
Returns the list of isotopologues in the cluster. Based on the metabolite name matching to the cluster name.
- property lowest_mz: float
Returns the lowest mass-to-charge ratio (m/z) of the features in the cluster.
- property lowest_rt: float
Returns the lowest retention time (RT) of the features in the cluster.
- property mean_mz: float
Returns the mean mass-to-charge ratio (m/z) of the features in the cluster.
- property mean_rt: float
Returns the mean retention time (RT) of the features in the cluster.
- property metabolite
Returns the list of metabolite annotations for features in the cluster.
- property missing_isotopologues: List[int]
Returns a list of missing isotopologues in the annotated cluster. Based on the expected isotopologues in the cluster.
- property status: str
Returns the status of the cluster based on its completeness, incompleteness, and duplication.
- property summary: dict
Returns a summary of the cluster.
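The relationship between the expected, missing, and duplicated isotopologue properties above can be illustrated with a small standalone sketch. It assumes that a molecule with n tracer atoms is expected to show isotopologues M0 through Mn. The function name cluster_report and the returned keys are hypothetical and only mirror the property names for readability; they are not the Cluster implementation.

```python
from collections import Counter


def cluster_report(observed: list, n_tracer_atoms: int) -> dict:
    """Compare observed isotopologue indices with the expected range M0..Mn."""
    expected = list(range(n_tracer_atoms + 1))
    counts = Counter(observed)
    missing = [i for i in expected if i not in counts]
    duplicated = sorted(i for i, c in counts.items() if c > 1)
    unexpected = sorted(i for i in counts if i not in expected)
    return {
        "expected_isotopologues": expected,
        "missing_isotopologues": missing,
        "duplicated_isotopologues": duplicated,
        "is_complete": not missing,
        "is_incomplete": bool(missing),
        "is_duplicated": bool(duplicated),
        "is_corrupted": bool(unexpected),  # indices outside M0..Mn
    }


# A three-carbon compound where M1 was detected twice and M2 is absent.
partial = cluster_report([0, 1, 1, 3], n_tracer_atoms=3)

# The same compound with an extra, unexpected M4 feature.
overfilled = cluster_report([0, 1, 2, 3, 4], n_tracer_atoms=3)
```

Note that in this sketch a cluster can be both complete and corrupted at once, as in the second call. How the status property combines these flags in IsoGroup is not specified here.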
misc.py
- class isogroup.base.misc.Misc[source]
Bases: object
Miscellaneous utility functions for isotope labelling analysis.
- calculate_isotopologue_index(base_mz: float, mzshift_tracer: float) int[source]
Calculate the theoretical isotopologue index based on m/z values.
- Parameters:
candidate_mz – m/z of the candidate isotopologue.
base_mz – m/z of the base (unlabeled) feature.
mzshift_tracer – m/z shift corresponding to the tracer.
- static calculate_mzshift(tracer: str) float[source]
Calculate the m/z shift for a given tracer (e.g. “13C”).
- Parameters:
tracer – Tracer code (e.g. “13C”).
- static get_atomic_mass(element: str) float | None[source]
Returns the atomic mass of the given element.
- Parameters:
element – Chemical element symbol (e.g. “C”, “H”, “N”, “O”).
- static get_max_isotopologues_for_mz(mz: float, tracer_element: str) int[source]
Returns the maximum number of isotopologues to consider based on the m/z value. This is a placeholder function and should be replaced with actual logic as needed.
- Parameters:
mz – Mass-to-charge ratio of the feature.
tracer_element – Tracer element symbol (e.g. “C”, “N”).
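The arithmetic behind an m/z shift and an isotopologue index can be sketched independently of the Misc class. The following is a minimal illustration, assuming singly charged ions and a small hard-coded table of approximate isotope masses. The names ISOTOPE_MASSES, LIGHT_ISOTOPE, split_tracer, mz_shift, and isotopologue_index are hypothetical and do not reflect how IsoGroup stores or computes these values.

```python
import re

# Approximate monoisotopic masses in Da (rounded; illustrative subset only).
ISOTOPE_MASSES = {"12C": 12.000000, "13C": 13.003355,
                  "14N": 14.003074, "15N": 15.000109}
LIGHT_ISOTOPE = {"C": "12C", "N": "14N"}


def split_tracer(tracer: str) -> tuple:
    """Split a tracer code such as "13C" into (mass number, element symbol)."""
    match = re.fullmatch(r"(\d+)([A-Z][a-z]?)", tracer)
    if match is None:
        raise ValueError(f"Unrecognized tracer code: {tracer!r}")
    return int(match.group(1)), match.group(2)


def mz_shift(tracer: str) -> float:
    """Mass difference between the tracer isotope and its light counterpart."""
    _, element = split_tracer(tracer)
    return ISOTOPE_MASSES[tracer] - ISOTOPE_MASSES[LIGHT_ISOTOPE[element]]


def isotopologue_index(candidate_mz: float, base_mz: float, shift: float) -> int:
    """Nearest whole number of tracer atoms separating candidate and base m/z."""
    return round((candidate_mz - base_mz) / shift)


shift_13c = mz_shift("13C")
index = isotopologue_index(183.0735, 180.0634, shift_13c)
```

For an ion of charge z, the observed spacing between isotopologues would be the shift divided by z, so a sketch like this one would need the charge as an extra input for multiply charged species.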
io.py
- class isogroup.base.io.IoHandler[source]
Bases: object
Handles input and output operations.
- clusters_summary(clusters_to_summarize: dict)[source]
Export a TSV file with a summary of the clusters.
- Parameters:
clusters_to_summarize – dict containing clusters to summarize
- Returns:
pd.DataFrame with the summary of the clusters
- create_output_directory(outputs_path)[source]
Create an output directory for saving results.
- Parameters:
outputs_path – Path to the output directory.
- export_clusters(dataframe_to_export: DataFrame)[source]
Convert the clusters into a pandas DataFrame for easier analysis and export (Untargeted case).
- Parameters:
dataframe_to_export – DataFrame containing the clusters to export
- export_features(dataframe_to_export: DataFrame)[source]
Export all features to a TSV file.
- Parameters:
dataframe_to_export – DataFrame containing the features to export
- export_theoretical_database(database: DataFrame)[source]
Summarize theoretical features into a DataFrame and export it to a TSV file.
- Parameters:
database – Database object containing theoretical features.