{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "e96eb373-a281-4308-8cc6-91b5653dd62e", "metadata": {}, "source": [ "# Showcase: `isogroup` Python package" ] }, { "cell_type": "markdown", "id": "3c18fb5e-9963-4aad-bc73-518f1dfa7d6d", "metadata": {}, "source": [ "## Targeted processing to annotate isotopic clusters\n", "The aim is to annotate a dataset from an untargeted labeling experiment using an in-house database" ] }, { "cell_type": "code", "execution_count": null, "id": "21feeb4c-9a64-4e42-a31e-c4c3809d18ce", "metadata": {}, "outputs": [], "source": [ "from isocor.base import LabelledChemical\n", "from isogroup.base.feature import Feature\n", "from isogroup.base.sample import Sample\n", "from isogroup.base.cluster import Cluster\n", "from isogroup.base.database import Database\n", "from isogroup.base.targeted_experiment import Experiment\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "7f68cdf2-e950-4967-b716-e727129a830a", "metadata": {}, "source": [ "## Instanciation of an *isotopic database*" ] }, { "cell_type": "markdown", "id": "1f82e390-2d2a-46b1-b0e9-c358cabe82c4", "metadata": {}, "source": [ "**A database object** is build from a peaktable which contains the following information:\n", "- \"metabolite\" : name of the metabolite in your database\n", "- \"rt\" : retention time for this compound in your analytical method\n", "- \"formula\" : formula of the metabolite\n", "- \"charge\"" ] }, { "cell_type": "markdown", "id": "9f392d7a-1423-42f3-8f26-a39adefa0382", "metadata": { "tags": [] }, "source": [ "i) Give path to open your database file and run the cell" ] }, { "cell_type": "code", "execution_count": null, "id": "039440c9-b101-4479-80b1-ccc321642ac2", "metadata": {}, "outputs": [], "source": [ "db_data = pd.read_csv(\"data/database_test.csv\", sep=\";\")\n", "\n", "# Displays the first lines of the database for inspection\n", "db_data.head()" ] }, { "cell_type": "markdown", "id": "819cc4c2-4b97-46e9-a867-01cc622a8a14", 
"metadata": {}, "source": [ "ii) Define the tracer of your experiment (e.g: \"13C\", \"15N\"...) \\\n", "iii) Run the cell to create your database. It returns all isotopologues (and masses) of the metabolite in your database." ] }, { "cell_type": "code", "execution_count": null, "id": "059bc5d6-a4a1-48ec-a64c-5eb4a5e164dc", "metadata": { "scrolled": true }, "outputs": [], "source": [ "database = Database(dataset=db_data, tracer=\"13C\")\n", "\n", "# Print the theoretical features\n", "for feature in database.features:\n", " print(feature)" ] }, { "cell_type": "markdown", "id": "7673aab3-e1fb-4ab1-9dfc-e0b704eb912f", "metadata": {}, "source": [ "## Instanciation of the *targeted experiment*" ] }, { "cell_type": "markdown", "id": "bb1d9689-1d98-402c-b393-df57c71b0d1c", "metadata": {}, "source": [ "### Open the dataset\n", "The **dataset** with the experimental features must contain the following columns : \n", "- the identification of features (id)\n", "- the mass-charge ratio (m/z)\n", "- the retention time (rt)\n", "- samples with intensities" ] }, { "cell_type": "markdown", "id": "82b82468-4f52-4efb-98ea-d7fc0317e7df", "metadata": {}, "source": [ "i) Give path to open your dataset file (e.g: output file of XCMS, MZMine, ... ) \\\n", "ii) Run the cell" ] }, { "cell_type": "code", "execution_count": null, "id": "cffa2bfa-9acc-4934-99be-8d091db2ddeb", "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv(\"data/dataset_test.txt\", sep=\"\\t\")\n", "data = data.set_index([\"mz\", \"rt\", \"id\"])\n", "\n", "# Displays the first lines of your dataset for inspection\n", "data.head()" ] }, { "cell_type": "markdown", "id": "175bca31-4177-4ffc-8219-7b96e318a09a", "metadata": {}, "source": [ "### Create the experiment\n", "\n", "It proceeds in the following steps:\n", "\n", "1. Initialization of experimental features from the dataset. A **feature** is defined from mass data as a set of (mz, rt, intensity) and is individual for each sample in the dataset\n", "2. 
Annotation of experimental features using your database, within the given tolerances (mz & rt).\n", "3. Creation of clusters from the annotated features. Each cluster carries supplementary information indicating whether it is complete (i.e. whether all isotopologues were retrieved).\n", "4. Export of the dataframes" ] }, { "cell_type": "markdown", "id": "b347101d-343e-49f0-a686-772c649b54fc", "metadata": {}, "source": [ "i) Give your database, your dataset and the tracer of your experiment \\\n", "ii) Run the cell: it creates your experiment object" ] }, { "cell_type": "code", "execution_count": null, "id": "a4e8171c-9aa1-416d-a079-5d6411efd3e1", "metadata": {}, "outputs": [], "source": [ "experiment = Experiment(dataset=data, database=database, tracer=\"13C\")" ] }, { "cell_type": "markdown", "id": "4e078757-8d21-4359-9832-affec60a165b", "metadata": {}, "source": [ "### Annotate the dataset \n", "\n", "i) Give the mz tolerance (in ppm) and the rt tolerance (in seconds) you allow \\\n", "ii) Run the cell. Each experimental feature is assigned its potential annotations, together with the exact mass and rt errors relative to your database. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "be2334a8-bfc3-4ee4-a0d6-69c65552f78b", "metadata": {}, "outputs": [], "source": [ "experiment.annotate_experiment(mz_tol=5, rt_tol=10)" ] }, { "cell_type": "markdown", "id": "da393af9-c183-4d4d-a2d8-529835b2ae1f", "metadata": {}, "source": [ "Optional : Display the samples of your experiment" ] }, { "cell_type": "code", "execution_count": null, "id": "7c77aa34-7ed4-46ff-96b5-9c8caac7d272", "metadata": {}, "outputs": [], "source": [ "# Display the samples of your experiment\n", "for sample in experiment.samples:\n", " print(sample)\n", "print()\n", "\n", "# Display the annotated features for each samples\n", "#for sample, feature in experiment.samples.items():\n", "# print(sample, feature, end=\"\\n\\n\")\n" ] }, { "cell_type": "markdown", "id": "34c26c30-1fd8-4246-848d-ad630d209b76", "metadata": {}, "source": [ "### Get annotated clusters\n", "\n", "\n", "A **cluster** is composed of a list of features\\\n", "The annotated clusters are obtained by grouping features according to their annotation" ] }, { "cell_type": "code", "execution_count": null, "id": "df7098df-f149-4687-a710-42481d60c7f9", "metadata": {}, "outputs": [], "source": [ "experiment.clusterize()" ] }, { "cell_type": "markdown", "id": "82d731d7-8713-4527-88b8-29826ad52d49", "metadata": {}, "source": [ "If you want to display the annotated cluster for a specif sample: \\\n", "change the name of the sample and run the cell below" ] }, { "cell_type": "code", "execution_count": null, "id": "fbb5ea4a-bd47-424d-9d7a-62a05e10b909", "metadata": {}, "outputs": [], "source": [ "# Display the annotated cluster for a specific sample\n", "experiment.clusters['C12_TP_1']\n", "\n", "print(experiment.clusters, end=\"\\n\\n\")" ] }, { "cell_type": "markdown", "id": "6afd6342-2868-4340-9cb2-c36ac31237d1", "metadata": {}, "source": [ "### Create dataframe and export tables" ] }, { "cell_type": "markdown", "id": "3bca6436-f913-4f50-aa06-736c1c251d12", 
"metadata": {}, "source": [ "#### For features export\n", "\n", "Export a dataframe containing all the features of your dataset with potential annotation (metabolite & isotopologues) and the calculated errors (mz & rt)" ] }, { "cell_type": "code", "execution_count": null, "id": "715924cd-1633-4bc7-9659-22ebe5b19698", "metadata": {}, "outputs": [], "source": [ "df = experiment.export_features()\n", "\n", "# Print the head of your dataframe for inspection\n", "df.head()" ] }, { "cell_type": "markdown", "id": "4a1966a7-f6de-4a14-9c6f-5fd950304eea", "metadata": {}, "source": [ "If you want to export a tsv file : provide a path and a filename.\\\n", "Run the cell" ] }, { "cell_type": "code", "execution_count": null, "id": "8c219b41-2f45-4b59-9776-d95aa71a05f9", "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Export the dataframe\n", "experiment.export_features(\"data/df_feature.tsv\")" ] }, { "cell_type": "markdown", "id": "43e8438e-67bc-481b-8be8-bcab457f735e", "metadata": {}, "source": [ "I you want to export a tsv file for a specific sample : provide a path, a filename and a sample name\\\n", "Run the cell" ] }, { "cell_type": "code", "execution_count": null, "id": "2ce964c3-d461-4ff1-99cc-96ad5abd7651", "metadata": {}, "outputs": [], "source": [ "# Export the dataframe for a specific sample\n", "experiment.export_features(\"data/df_feature_sample.tsv\", sample_name = \"C13_TP_1\")" ] }, { "cell_type": "markdown", "id": "bf24e1a0-1b66-4f48-8136-e00fcb7d02e8", "metadata": {}, "source": [ "#### For clusters export\n", "\n", "Export a dataframe containing all the annotated clusters build from annotated features.\\\n", "The dataframe contains supplementary information on the clusters like its status :\n", "- ok if the cluster is complete\n", "- incomplete if there is missing isotopologues\n", "- ....." 
] }, { "cell_type": "code", "execution_count": null, "id": "8c93c0f3-6392-41fe-930e-6d65e2da3d51", "metadata": {}, "outputs": [], "source": [ "df_cluster = experiment.export_clusters()\n", "\n", "# Show the head of your dataframe for inspection\n", "df_cluster.head()" ] }, { "cell_type": "markdown", "id": "2d273d8e-a9ca-4fcd-b3f5-8c87485dbfce", "metadata": {}, "source": [ "If you want to export a tsv file: provide a path and a filename.\\\n", "Run the cell" ] }, { "cell_type": "code", "execution_count": null, "id": "413fc4f6-0f0e-461b-8253-f92bef5334ec", "metadata": {}, "outputs": [], "source": [ "# Export the clusters dataframe\n", "experiment.export_clusters(filename=\"data/df_cluster.tsv\")" ] }, { "cell_type": "markdown", "id": "777c9f16-5368-4d35-8612-24815aba4bf7", "metadata": {}, "source": [ "If you want to export a tsv file for a specific sample: provide a path, a filename and a sample name.\\\n", "Run the cell" ] }, { "cell_type": "code", "execution_count": null, "id": "e6e6d02b-39b4-4816-ba4e-1b27b7588dc0", "metadata": {}, "outputs": [], "source": [ "# Export the clusters dataframe for a specific sample\n", "experiment.export_clusters(\"data/cluster_summary_sample.tsv\", sample_name=\"C13_TP_1\")" ] }, { "cell_type": "markdown", "id": "b0a24b10-24b2-4d6f-9a06-5923523ec384", "metadata": {}, "source": [ "#### For clusters summary" ] }, { "cell_type": "markdown", "id": "ef00ce66-492c-4b37-8422-951df0ad52b2", "metadata": {}, "source": [ "If you want to export a summary of the properties of each cluster (e.g. id, name, features, isotopologues, status...):\\\n", "Give a path and a filename\\\n", "Run the cell" ] }, { "cell_type": "code", "execution_count": null, "id": "fa8c33c9-6a92-49fd-8d66-f58ed6cb1cfb", "metadata": {}, "outputs": [], "source": [ "experiment.clusters_summary(filename=\"data/test_cluster_summary.tsv\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { 
"codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.2" } }, "nbformat": 4, "nbformat_minor": 5 }