Showcase: isogroup Python package
Targeted processing to annotate isotopic clusters
The aim is to annotate a dataset from an untargeted labeling experiment using an in-house database
[ ]:
from isocor.base import LabelledChemical
from isogroup.base.feature import Feature
from isogroup.base.sample import Sample
from isogroup.base.cluster import Cluster
from isogroup.base.database import Database
from isogroup.base.targeted_experiment import Experiment
import pandas as pd
Instantiation of an isotopic database
A database object is built from a peak table which contains the following information:
“metabolite” : name of the metabolite in your database
“rt” : retention time for this compound in your analytical method
“formula” : formula of the metabolite
“charge” : charge of the metabolite ion
Give the path to your database file and run the cell
[ ]:
db_data = pd.read_csv("data/database_test.csv", sep=";")
# Displays the first lines of the database for inspection
db_data.head()
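Before building the database, it can help to confirm that the file carries the four columns listed above. The sketch below is a minimal check using only the standard library; the helper name `missing_columns` and the two small example tables are hypothetical, and it assumes the header names are spelled exactly as in the list (“metabolite”, “rt”, “formula”, “charge”).

```python
import csv
import io

# Columns the text lists as required in the database peak table.
REQUIRED_COLUMNS = {"metabolite", "rt", "formula", "charge"}


def missing_columns(csv_text, sep=";"):
    """Return the required columns absent from the header of a delimited table."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=sep)
    header = next(reader)
    return sorted(REQUIRED_COLUMNS - {name.strip() for name in header})


# Hypothetical tables, for illustration only (not the content of database_test.csv).
example = "metabolite;rt;formula;charge\nGlucose;120;C6H12O6;-1\nAlanine;85;C3H7NO2;-1\n"
incomplete = "metabolite;rt;formula\nGlucose;120;C6H12O6\n"

print(missing_columns(example))     # -> []
print(missing_columns(incomplete))  # -> ['charge']
```

With the `db_data` frame already loaded, the equivalent check is `REQUIRED_COLUMNS - set(db_data.columns)`.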
Define the tracer of your experiment (e.g. “13C”, “15N”…)
Run the cell to create your database. It returns all isotopologues (and their masses) of the metabolites in your database.
[ ]:
database = Database(dataset=db_data, tracer="13C")
# Print the theoretical features
for feature in database.features:
    print(feature)
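To give an idea of what the theoretical features represent, the sketch below lists the 13C isotopologues of a formula with their neutral monoisotopic masses. It is not the isogroup implementation: the helper names are hypothetical, only a few elements are covered, and the charge and any adduct are deliberately ignored, so the values are neutral masses rather than the m/z values stored in the database.

```python
import re

# Monoisotopic masses in unified atomic mass units (u), limited to common elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}
# Mass difference between 13C and 12C (u).
DELTA_13C = 1.0033548


def element_counts(formula):
    """Parse a simple formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def carbon_isotopologues(formula):
    """Return [(label, neutral mass)] for M+0 up to M+n, n being the carbon count."""
    counts = element_counts(formula)
    mono = sum(MONOISOTOPIC[element] * n for element, n in counts.items())
    return [(f"M+{k}", mono + k * DELTA_13C) for k in range(counts.get("C", 0) + 1)]


# Glucose is used as an illustrative formula; it has 6 carbons, hence 7 isotopologues.
for label, mass in carbon_isotopologues("C6H12O6"):
    print(label, round(mass, 4))
```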
Instantiation of the targeted experiment
Open the dataset
The dataset with the experimental features must contain the following columns:
the identification of features (id)
the mass-charge ratio (m/z)
the retention time (rt)
samples with intensities
Give the path to your dataset file (e.g. the output file of XCMS, MZmine, …)
Run the cell
[ ]:
data = pd.read_csv("data/dataset_test.txt", sep="\t")
data = data.set_index(["mz", "rt", "id"])
# Displays the first lines of your dataset for inspection
data.head()
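The column layout described above (id, m/z, rt, then one intensity column per sample) can be read as one feature per row and per sample. The sketch below illustrates that reading with the standard library only; the function name `per_sample_features` and the numeric values are hypothetical, and the package may organise its own objects differently.

```python
import csv
import io

# Columns that identify a feature; every other column is treated as a sample.
ID_COLUMNS = ("id", "mz", "rt")


def per_sample_features(table_text, sep="\t"):
    """Turn a wide peak table into {sample name: [feature dict, ...]}."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=sep)
    samples = {}
    for row in reader:
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            samples.setdefault(column, []).append(
                {
                    "id": row["id"],
                    "mz": float(row["mz"]),
                    "rt": float(row["rt"]),
                    "intensity": float(value),
                }
            )
    return samples


# Hypothetical two-feature table with two samples, for illustration only.
table = (
    "id\tmz\trt\tC12_TP_1\tC13_TP_1\n"
    "F1\t179.0561\t120.0\t1500\t30\n"
    "F2\t180.0595\t120.5\t90\t200\n"
)
features = per_sample_features(table)
for sample_name, sample_features in features.items():
    print(sample_name, len(sample_features))
```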
Create the experiment
It proceeds in the following steps:
Initialization of the experimental features from the dataset. A feature is defined from the mass data as a set of (mz, rt, intensity) and is specific to each sample in the dataset.
Annotation of the experimental features using your database, within the given tolerances (mz and rt).
Creation of clusters from the annotated features. Each cluster carries additional information stating whether it is complete (all isotopologues retrieved) or not.
Export of the results as a dataframe.
Give your database, your dataset and the tracer of your experiment
Run the cell: it returns your experiment object
[ ]:
experiment = Experiment(dataset=data, database=database, tracer="13C")
Annotate the dataset
Give the mz tolerance (in ppm) and the rt tolerance (in seconds) you allow
Run the cell. It returns the experimental features with their potential annotations and the mass and rt errors relative to your database.
[ ]:
experiment.annotate_experiment(mz_tol=5, rt_tol=10)
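The two tolerances can be pictured as follows: a feature is a candidate for a database entry when its m/z error, expressed in ppm, and its rt error, in seconds, both fall within the given limits. The sketch below shows this test under the assumption that both bounds are inclusive and symmetric; the function names, dictionary keys and values are hypothetical and do not reproduce the internals of `annotate_experiment`.

```python
def ppm_error(mz_measured, mz_theoretical):
    """Relative m/z error in parts per million."""
    return (mz_measured - mz_theoretical) / mz_theoretical * 1e6


def annotate(feature, database, mz_tol=5, rt_tol=10):
    """Return the database entries matching a feature within both tolerances."""
    matches = []
    for entry in database:
        mz_err = ppm_error(feature["mz"], entry["mz"])
        rt_err = feature["rt"] - entry["rt"]
        if abs(mz_err) <= mz_tol and abs(rt_err) <= rt_tol:
            matches.append(
                {
                    "metabolite": entry["metabolite"],
                    "isotopologue": entry["isotopologue"],
                    "mz_error_ppm": mz_err,
                    "rt_error_s": rt_err,
                }
            )
    return matches


# Hypothetical theoretical features (two isotopologues of one metabolite).
database = [
    {"metabolite": "Glucose", "isotopologue": 0, "mz": 179.0561, "rt": 120.0},
    {"metabolite": "Glucose", "isotopologue": 1, "mz": 180.0595, "rt": 120.0},
]
feature = {"mz": 179.0565, "rt": 123.0}

for match in annotate(feature, database):
    print(match["metabolite"], match["isotopologue"], round(match["mz_error_ppm"], 2))
```

Here the feature lies about 2.2 ppm and 3 s away from the M+0 entry, so it is kept; the M+1 entry is thousands of ppm away and is rejected.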
Optional: display the samples of your experiment
[ ]:
# Display the samples of your experiment
for sample in experiment.samples:
    print(sample)
print()
# Display the annotated features for each sample
#for sample, feature in experiment.samples.items():
#    print(sample, feature, end="\n\n")
Get annotated clusters
[ ]:
experiment.clusterize()
[ ]:
# Display the annotated clusters for a specific sample
print(experiment.clusters['C12_TP_1'], end="\n\n")
# Display the annotated clusters for all samples
print(experiment.clusters, end="\n\n")
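A cluster is considered complete when every isotopologue of the metabolite has been retrieved. As a rough illustration, assuming isotopologues are numbered from 0 to the number of tracer atoms, the check can be sketched as below; `cluster_status` is a hypothetical helper, and the “ok” and “incomplete” labels are those used in the cluster export.

```python
def cluster_status(found_isotopologues, n_tracer_atoms):
    """Return (status, missing) for a cluster of annotated isotopologues.

    found_isotopologues: isotopologue indices annotated in the sample (0 for M+0, ...).
    n_tracer_atoms: number of atoms of the tracer element in the metabolite.
    """
    expected = set(range(n_tracer_atoms + 1))
    missing = sorted(expected - set(found_isotopologues))
    status = "ok" if not missing else "incomplete"
    return status, missing


# A 3-carbon metabolite has 4 expected isotopologues: M+0 to M+3.
print(cluster_status([0, 1, 2, 3], 3))  # -> ('ok', [])
print(cluster_status([0, 2], 3))        # -> ('incomplete', [1, 3])
```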
Create dataframe and export tables
For features export
Export a dataframe containing all the features of your dataset with their potential annotations (metabolite and isotopologue) and the calculated errors (mz and rt)
[ ]:
df = experiment.export_features()
# Print the head of your dataframe for inspection
df.head()
[ ]:
# Export the dataframe
experiment.export_features("data/df_feature.tsv")
[ ]:
# Export the dataframe for a specific sample
experiment.export_features("data/df_feature_sample.tsv", sample_name="C13_TP_1")
For clusters export
Each cluster is given a status:
ok if the cluster is complete
incomplete if some isotopologues are missing
[ ]:
df_cluster = experiment.export_clusters()
# Print the head of your dataframe for inspection
df_cluster.head()
[ ]:
# Export the cluster summary
experiment.export_clusters(filename="data/df_cluster.tsv")
[ ]:
# Export the cluster summary for a specific sample
experiment.export_clusters("data/cluster_summary_sample.tsv", sample_name="C13_TP_1")
For clusters summary
[ ]:
experiment.clusters_summary(filename="data/test_cluster_summary.tsv")