
JiangLab2020/DM-P450

DM-P450: Developing a Multimodal Deep Learning Model for P450 Mining

Overview

Given the complex interactions between cytochrome P450 enzyme catalytic pockets and their substrates, we developed a multimodal deep learning model (named DM-P450) to predict whether a given P450 enzyme can catalyze a specific substrate molecule.

Installation

To set up the environment:

conda env create -f environment.yml
conda activate DMP450
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install -U "boltz[cuda]"

Because some components have conflicting dependencies, installation proceeds step by step: first the conda environment, then DGL, then Boltz.
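Because the packages are installed in separate steps, it can be worth confirming that each one is importable in the DMP450 environment before the first run. The sketch below is not part of DM-P450. It assumes the import names `torch`, `dgl`, and `boltz`, and the helpers `check_modules` and `check_environment` are hypothetical.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def check_environment(names=("torch", "dgl", "boltz")):
    """Print which packages are importable; return True only if all are found."""
    status = check_modules(names)
    for name, found in status.items():
        print(f"{name:<8}{'found' if found else 'MISSING'}")
    if status.get("torch"):
        import torch  # imported only after we know it is installed

        # The DGL wheel index above targets torch 2.4 built for CUDA 12.4.
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    return all(status.values())
```

Call `check_environment()` inside the activated environment. It returns False if any of the listed packages is missing.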

For docking, download the dataset from zenodo.org and place it in the P450_docking/P450_db folder.

The software also loads pretrained Uni-Mol and Boltz model weights and downloads them during execution if they are not already available. Install and run it on a machine with a stable, high-bandwidth network connection, or prepare the Uni-Mol and Boltz pretrained weights in advance to avoid download failures.

Usage

  1. Prepare data

     Place the protein sequences to be screened (FASTA format) and the substrates (SDF format) in the P450_docking folder. Their file names are passed as inputs to the inference commands.

  2. Activate the environment

     conda activate DMP450

  3. Clean logs and cache (optional)

     rm -rf ./logs/* DM_P450_model/data/cache/*

  4. Set up environment variables

     export PYTHONPATH=$PWD:$PYTHONPATH

  5. Run inference. Choose one of the three models, DM-P450, Pocket-P450, or Seq-P450 (passed to -model as DM-P450, Pocket-Only, or Seq-Only in the commands below), and provide the input file names. Only the file names are required; the program locates the files in the designated directory (the P450_docking folder from step 1).
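Before running inference, a quick check can catch misplaced or empty inputs. This is a minimal sketch, assuming the layout described above (inputs in P450_docking, the Zenodo dataset in P450_docking/P450_db) and the example file names test.fasta and AGI.sdf; `read_fasta`, `count_sdf_records`, and `check_inputs` are hypothetical helpers, not part of the repository. It counts FASTA records and SDF records, each of which ends with the standard `$$$$` delimiter line.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into {record_id: sequence}; the ID is the first word of the header."""
    records, current = {}, None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = (line[1:].split() or [""])[0]
            records[current] = []  # a repeated ID restarts its record
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def count_sdf_records(path):
    """Count molecules in an SDF file: each record ends with a line containing '$$$$'."""
    return sum(1 for line in Path(path).read_text().splitlines() if line.strip() == "$$$$")


def check_inputs(fasta_name, sdf_name, root="P450_docking"):
    """Check that the step-1 inputs exist and are non-empty; return a small summary."""
    root = Path(root)
    fasta, sdf = root / fasta_name, root / sdf_name
    for path in (fasta, sdf):
        if not path.is_file():
            raise FileNotFoundError(f"expected input not found: {path}")
    sequences = read_fasta(fasta)
    molecules = count_sdf_records(sdf)
    if not sequences:
        raise ValueError(f"no FASTA records in {fasta}")
    if molecules == 0:
        raise ValueError(f"no '$$$$'-terminated records in {sdf}")
    db = root / "P450_db"  # docking dataset location from the installation notes
    return {
        "sequences": len(sequences),
        "molecules": molecules,
        "docking_db_present": db.is_dir() and any(db.iterdir()),
    }
```

For example, `check_inputs("test.fasta", "AGI.sdf")` run from the repository root raises an error if either file is missing or empty, and otherwise reports the record counts and whether P450_db is populated.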

The framework screens multiple candidate P450 enzymes to identify those that can catalyze a given target substrate. When the DM-P450 or Pocket-P450 model is selected, molecular docking is first performed to generate enzyme–substrate complexes.

After docking is completed, the interface will prompt:

Please check the modeling results and input the reaction site residue number in uppercase (e.g., C21).

Inspect the docking output visually and enter the target active-site residue accordingly. Residue numbers can be read from the docked structure files (PDBQT format) at ./P450_docking/*.pdbqt. Once the active site is confirmed, the program automatically runs the deep learning prediction and outputs the final results.
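To shortlist candidate residues before answering the prompt, one option is to list the protein residues close to the docked substrate. This is a sketch under assumptions: the .pdbqt file holds both receptor atoms (ATOM records) and ligand atoms (HETATM records) in the standard PDB fixed columns, the ligand residue name (for example the hypothetical "UNL") is known, and the 5 Å cutoff is an arbitrary default. It does not replace the visual inspection the program asks for, and whether the prompt expects a one-letter residue code plus residue number (as in C21) should be checked against the program itself.

```python
import math


def parse_atoms(lines):
    """Yield (record, resname, chain, resseq, (x, y, z)) from PDB/PDBQT ATOM/HETATM lines."""
    for line in lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip TER, MODEL, REMARK, ROOT, etc.
        yield (
            record,
            line[17:20].strip(),  # residue name, columns 18-20
            line[21:22].strip(),  # chain ID, column 22
            int(line[22:26]),  # residue sequence number, columns 23-26
            (float(line[30:38]), float(line[38:46]), float(line[46:54])),  # x, y, z
        )


def residues_near_ligand(lines, ligand_resname, cutoff=5.0):
    """Return sorted (chain, resname, resseq) of ATOM residues within `cutoff` Å of the ligand."""
    atoms = list(parse_atoms(lines))
    ligand = [xyz for rec, res, _, _, xyz in atoms if rec == "HETATM" and res == ligand_resname]
    near = set()
    for rec, res, chain, seq, xyz in atoms:
        if rec != "ATOM":
            continue
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand):
            near.add((chain, res, seq))
    return sorted(near, key=lambda r: (r[0], r[2]))
```

For example, `residues_near_ligand(open("P450_docking/complex.pdbqt").read().splitlines(), "UNL")`, where complex.pdbqt is a placeholder file name.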

python scripts/infer.py -model DM-P450     -inputFA test.fasta -substrate AGI.sdf | tee logs/infer.log

python scripts/infer.py -model Seq-Only    -inputFA test.fasta -substrate AGI.sdf | tee logs/infer.log

python scripts/infer.py -model Pocket-Only -inputFA test.fasta -substrate AGI.sdf | tee logs/infer.log

Or simply run:

conda activate DMP450
sh scripts/infer.sh

Results are written as CSV files to the DM_P450_model/data/output directory.

Each row has the columns Enzyme_ID, Substrate_ID, Predicted_Probability, where Predicted_Probability ranges from 0 to 1 and a higher value means the enzyme is more likely to catalyze the substrate.
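Since the goal is to pick, from many candidates, the enzymes most likely to act on the substrate, the CSV output can be filtered and ranked. A minimal sketch, assuming comma-separated rows in the column order above with an optional header row; the 0.5 cutoff is an illustrative choice, not a threshold recommended by the authors, and `rank_candidates` and `rank_directory` are hypothetical helpers.

```python
import csv
from pathlib import Path


def rank_candidates(path, threshold=0.5):
    """Return (enzyme, substrate, probability) rows at or above `threshold`,
    sorted from highest to lowest probability.

    Rows whose third field is not a number (e.g. a header) are skipped.
    """
    hits = []
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if len(row) < 3:
                continue
            try:
                prob = float(row[2])
            except ValueError:
                continue  # header or malformed row
            if prob >= threshold:
                hits.append((row[0].strip(), row[1].strip(), prob))
    return sorted(hits, key=lambda h: h[2], reverse=True)


def rank_directory(out_dir="DM_P450_model/data/output", threshold=0.5):
    """Apply rank_candidates to every CSV file in the output directory."""
    return {p.name: rank_candidates(p, threshold) for p in sorted(Path(out_dir).glob("*.csv"))}
```

Lowering `threshold` widens the shortlist; setting it to 0 simply sorts all candidates by predicted probability.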

Citation

Discovery of Cytochrome P450 Enzymes via Multimodal Deep Learning
