- DM-P450: Developing a Multimodal Deep Learning Model for P450 Mining
- Overview
- Installation
- Usage
- Citation
Given the complex interactions between cytochrome P450 enzyme catalytic pockets and their substrates, we developed a multimodal deep learning model (named DM-P450) to predict whether a given P450 enzyme can catalyze a specific substrate molecule.
To set up the environment:
conda env create -f environment.yml
conda activate DMP450
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
pip install boltz[cuda] -U
Due to dependency conflicts between different software components, a step-by-step installation strategy is adopted.
In addition, you can download the dataset from zenodo.org and place it in the P450_docking/P450_db folder for use in docking.
Furthermore, the software involves initialization of Uni-Mol and Boltz model weights. It is therefore recommended to install and run the program in an environment with a stable and high-bandwidth network connection, or alternatively, prepare the Uni-Mol and Boltz pretrained weights in advance to avoid download issues during execution.
- Prepare data
Please place the protein sequences you wish to predict (in FASTA format) and the substrates (in SDF format) into the P450_docking folder, and provide them as inputs in the subsequent command-line instructions.
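Before starting a run, it can help to confirm that both input files exist and parse as expected. The following is a minimal sketch, not part of DM-P450 itself: the helper names `fasta_ids`, `count_sdf_records`, and `check_inputs` are hypothetical, and the checks rely only on the standard FASTA convention (records start with `>`) and the standard SDF convention (each molecule ends with a `$$$$` line).

```python
from pathlib import Path


def fasta_ids(text: str) -> list[str]:
    """Return the identifier (first word after '>') of every FASTA record."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def count_sdf_records(text: str) -> int:
    """Count molecules in SDF text; each record is terminated by a '$$$$' line."""
    return sum(1 for line in text.splitlines() if line.strip() == "$$$$")


def check_inputs(folder: str, fasta_name: str, sdf_name: str) -> tuple[list[str], int]:
    """Hypothetical pre-flight check: both files exist and are non-empty."""
    base = Path(folder)
    fasta_path, sdf_path = base / fasta_name, base / sdf_name
    for path in (fasta_path, sdf_path):
        if not path.is_file():
            raise FileNotFoundError(f"missing input file: {path}")
    ids = fasta_ids(fasta_path.read_text())
    n_molecules = count_sdf_records(sdf_path.read_text())
    if not ids:
        raise ValueError(f"no FASTA records found in {fasta_path}")
    if n_molecules == 0:
        raise ValueError(f"no SDF records found in {sdf_path}")
    return ids, n_molecules
```

With the example file names used in this README, the check would be called as `check_inputs("P450_docking", "test.fasta", "AGI.sdf")`, which returns the enzyme identifiers and the number of substrate molecules found.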
- Activate the environment
conda activate DMP450
- Clean logs and cache (optional)
rm -rf ./logs/* DM_P450_model/data/cache/*
- Set up environment variables
export PYTHONPATH=$PWD:$PYTHONPATH
- Run inference
Choose the model you want to use from DM-P450, Pocket-P450, or Seq-P450, and provide the corresponding file name as input. (Only the file name is required; the program will automatically locate the file in the designated directory.)
The objective of this framework is to identify, from multiple candidate P450 enzymes, those capable of catalyzing a given target substrate. When the DM or Pocket model is selected, molecular docking is performed to generate enzyme–substrate complexes.
After docking is completed, the interface will prompt:
Please check the modeling results and input the reaction site residue number in uppercase (e.g., C21).
Users should visually inspect the docking output and specify the target active-site residue accordingly. The residue number can be found in the output PDBQT files located at ./P450_docking/*.pdbqt.
Once the active site is confirmed, the program will automatically proceed with the deep learning prediction and output the final results.
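To look up identifiers in a docking output without opening a viewer, the atom and residue fields can be read directly from the file. The sketch below is an assumption-laden helper, not part of DM-P450: `list_atoms` is a hypothetical name, and it assumes the PDBQT files follow the standard PDB fixed-column layout for ATOM/HETATM records (atom name in columns 13-16, residue name in 18-20, residue number in 23-26). It only lists what is in the file; it does not decide which site is catalytically relevant or which identifier the prompt expects.

```python
def list_atoms(pdbqt_text: str) -> list[tuple[str, str, int]]:
    """Return (atom_name, residue_name, residue_number) for each ATOM/HETATM line.

    Assumes PDB-style fixed columns, which PDBQT files normally preserve.
    Lines too short to contain a residue number are skipped.
    """
    atoms = []
    for line in pdbqt_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if len(line) < 26:
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        try:
            res_num = int(line[22:26])
        except ValueError:
            # Residue number field is blank or malformed; skip this line.
            continue
        atoms.append((atom_name, res_name, res_num))
    return atoms
```

For example, iterating over `Path("P450_docking").glob("*.pdbqt")` and calling `list_atoms(path.read_text())` on each file gives a quick table to compare against the visual inspection.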
python scripts/infer.py -model DM-P450 -inputFA test.fasta -substrate AGI.sdf | tee logs/infer.log
python scripts/infer.py -model Seq-Only -inputFA test.fasta -substrate AGI.sdf | tee logs/infer.log
python scripts/infer.py -model Pocket-Only -inputFA test.fasta -substrate AGI.sdf | tee logs/infer.log
Or simply run:
conda activate DMP450
sh scripts/infer.sh
The output results will appear in the DM_P450_model/data/output directory and will be stored in CSV format.
The CSV file contains data in the format: Enzyme_ID, Substrate_ID, Predicted_Probability, where the maximum probability value is 1.
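Since the goal is to pick the most promising enzymes for a substrate, a natural next step is to sort the predictions by probability. The following is a minimal sketch using only the Python standard library; `rank_predictions` is a hypothetical helper, and whether the output file starts with a header row is an assumption that can be toggled with `has_header`.

```python
import csv
import io


def rank_predictions(csv_text: str, has_header: bool = True) -> list[tuple[str, str, float]]:
    """Parse rows of Enzyme_ID, Substrate_ID, Predicted_Probability and
    return them sorted from highest to lowest probability.

    has_header is an assumption about the file layout; set it to False
    if the first row already contains data.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if has_header:
        rows = rows[1:]
    parsed = [(r[0].strip(), r[1].strip(), float(r[2])) for r in rows]
    return sorted(parsed, key=lambda r: r[2], reverse=True)
```

Reading a result file from DM_P450_model/data/output with `Path(...).read_text()` and passing the text to `rank_predictions` puts the highest-scoring enzyme first.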
Discovery of Cytochrome P450 Enzymes via Multimodal Deep Learning