diff --git a/requirements.txt b/requirements.txt index 154b806..b8d9305 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,3 @@ myst_parser -deepmodeling_sphinx >= 0.0.11 +deepmodeling_sphinx >= 0.1.2 sphinx_rtd_theme diff --git a/source/CaseStudies/Examples/methane.md b/source/CaseStudies/Examples/methane.md deleted file mode 100644 index 50138e9..0000000 --- a/source/CaseStudies/Examples/methane.md +++ /dev/null @@ -1,75 +0,0 @@ -# Simulation of the oxidation of methane - -Jinzhe Zeng, Liqun Cao, and Tong Zhu - -This tutorial was adapted from: Jinzhe Zeng, Liqun Cao, Tong Zhu (2022), Neural network potentials, Pavlo O. Dral (Eds.), _Quantum Chemistry in the Age of Machine Learning_, Elsevier. Please cite the above chapter if you follow the tutorial. - ----- - -In this tutorial, we will take the simulation of methane combustion as an example and introduce the procedure of DP-based MD simulation. All files needed in this section can be downloaded from [tongzhugroup/Chapter13-tutorial](https://github.com/tongzhugroup/Chapter13-tutorial). Besides DeePMD-kit (with LAMMPS), [ReacNetGenerator](https://github.com/tongzhugroup/reacnetgenerator) should also be installed. - -## Step 1: Preparing the reference dataset - -In the reference dataset preparation process, one also has to consider the expect accuracy of the final model, or at what QM level one should label the data. In [this paper](https://doi.org/10.1038/s41467-020-19497-z), the [Gaussian](https://gaussian.com) software was used to calculate the potential energy and atomic forces of the reference data at the MN15/6-31G\*\* level. The MN15 functional was employed because it has good accuracy for both multi-reference and single-reference systems, which is essential for our system as we have to deal with a lot of radicals and their reactions. 
Here we assume that the dataset is prepared in advance, which can be downloaded from [tongzhugroup/Chapter13-tutorial](https://github.com/tongzhugroup/Chapter13-tutorial). - -## Step 2. Training the Deep Potential (DP) - -Before the training process, we need to prepare an input file called `methane_param.json` which contains the control parameters. The training can be done by the following command: - -```sh -$ $deepmd_root/bin/dp train methane_param.json -``` - -There are several parameters we need to define in the `methane_param.json` file. The type_map refers to the type of elements included in the training, and the option of rcut is the cut-off radius which controls the description of the environment around the center atom. The type of descriptor is `se_a` in this example, which represents the DeepPot-SE model. The descriptor will decay smoothly from rcut_smth (R_on) to the cut-off radius rcut (R_off). Here rcut_smth and rcut are set to 1.0 Å and 6.0 Å respectively. The sel defines the maximum possible number of neighbors for the corresponding element within the cut-off radius. The options neuron in descriptor and fitting_net is used to determine the shape of the embedding neural network and the fitting network, which are set to (25, 50, 100) and (240, 240, 240) respectively. The value of `axis_neuron` represents the size of the embedding matrix, which was set to 12. - -## Step 3: Freeze the model - -This step is to extract the trained neural network model. To freeze the model, the following command will be executed: - -```sh -$ $deepmd_root/bin/dp freeze -o graph.pb -``` - -A file called `graph.pb` can be found in the training folder. Then the frozen model can be compressed: - -```sh -$ $deepmd_root/bin/dp compress -i graph.pb -o graph_compressed.pb -t methane_param.json -``` - -## Step 4: Running MD simulation based on the DP - -The frozen model can be used to run reactive MD simulations to explore the detailed reaction mechanism of methane combustion. 
The MD engine is provided by the [LAMMPS](https://github.com/lammps/lammps) software. Here we use the same system from [our previous work](https://doi.org/10.1038/s41467-020-19497-z), which contains 100 methane and 200 oxygen molecules. The MD will be performed under the NVT ensemble at 3000 K for 1 ns. The LAMMPS program can be invoked by the following command: -```sh -$ $deepmd_root/bin/lmp -i input.lammps -``` -The `input.lammps` is the input file that controls the MD simulation in detail, technique details can be found in [the manual of LAMMPS](https://docs.lammps.org/). To use the DP, the pair_style option in this input should be specified as follows: -``` -pair_style deepmd graph_compressed.pb -pair_coeff * * -``` - -## Step 5: Analysis of the trajectory - -After the simulation is done, we can use the [ReacNetGenerator](https://github.com/tongzhugroup/reacnetgenerator) software which was developed in our previous study to extract the reaction network from the trajectory. All species and reactions in the trajectory will be put on an interactive web page where we can analyze them by mouse clicks. Eventually we should be able to obtain reaction networks that consistent with the following figure. -```sh -$ reacnetgenerator -i methane.lammpstrj -a C H O --dump -``` - -![The initial stage of combustion](https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41467-020-19497-z/MediaObjects/41467_2020_19497_Fig2_HTML.png?as=webp) - -Fig: The initial stage of combustion. The figure is taken from [this paper](https://doi.org/10.1038/s41467-020-19497-z) and more results can be found there. - ----- -## Acknowledge - -This work was supported by the National Natural Science Foundation of China (Grants No. 22173032, 21933010). J.Z. was supported in part by the National Institutes of Health (GM107485) under the direction of Darrin M. York. We also thank the ECNU Multifunctional Platform for Innovation (No. 
001) and the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant ACI-1548562.56 (specifically, the resources EXPANSE at SDSC through allocation TG-CHE190067), for providing supercomputer time. - -## References - -1. Jinzhe Zeng, Liqun Cao, Tong Zhu (2022), Neural network potentials, Pavlo O. Dral (Eds.), _Quantum Chemistry in the Age of Machine Learning_, Elsevier. -2. Jinzhe Zeng, Liqun Cao, Mingyuan Xu, Tong Zhu, John Z. H. Zhang, Complex reaction processes in combustion unraveled by neural network-based molecular dynamics simulation, Nature Communications, 2020, 11, 5713. -3. Frisch, M.; Trucks, G.; Schlegel, H.; Scuseria, G.; Robb, M.; Cheeseman, J.; Scalmani, G.; Barone, V.; Petersson, G.; Nakatsuji, H., Gaussian 16, revision A. 03. Gaussian Inc., Wallingford CT 2016. -4. Han Wang, Linfeng Zhang, Jiequn Han, Weinan E, DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics, Computer Physics Communications, 2018, 228, 178-184. -5. Aidan P. Thompson, H. Metin Aktulga, Richard Berger, Dan S. Bolintineanu, W. Michael Brown, Paul S. Crozier, Pieter J. in 't Veld, Axel Kohlmeyer, Stan G. Moore, Trung Dac Nguyen, Ray Shan, Mark J. Stevens, Julien Tranchida, Christian Trott, Steven J. Plimpton, LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Computer Physics Communications, 2022, 271, 108171. -6. Denghui Lu, Wanrun Jiang, Yixiao Chen, Linfeng Zhang, Weile Jia, Han Wang, Mohan Chen, DP Train, then DP Compress: Model Compression in Deep Potential Molecular Dynamics, 2021. -7. Jinzhe Zeng, Liqun Cao, Chih-Hao Chin, Haisheng Ren, John Z. H. Zhang, Tong Zhu, ReacNetGenerator: an automatic reaction network generator for reactive molecular dynamics simulations, Phys. Chem. Chem. Phys., 2020, 22 (2), 683–691. 
diff --git a/source/Tutorials/DP-GEN/learnDoc/DP-GEN_handson.md b/source/Tutorials/DP-GEN/learnDoc/DP-GEN_handson.md index d04eead..eb49bae 100644 --- a/source/Tutorials/DP-GEN/learnDoc/DP-GEN_handson.md +++ b/source/Tutorials/DP-GEN/learnDoc/DP-GEN_handson.md @@ -1,5 +1,5 @@ -# Hands-on tutorial for DP-GEN (v0.10.3) +# Hands-on tutorial for DP-GEN (v0.10.6) ## Workflow of the DP-GEN DeeP Potential GENerator (DP-GEN) is a package that implements a concurrent learning scheme to generate reliable DP models. Typically, the DP-GEN workflow contains three processes: init, run, and autotest. @@ -49,7 +49,7 @@ For surface systems, execute ```sh $ dpgen init_surf param.json machine.json ``` -A detailed description for preparing initial data in the standard way can be found at ‘Init’ Section of the [DP-GEN's documentation](https://github.com/deepmodeling/dpgen/blob/master/README.md). +A detailed description for preparing initial data in the standard way can be found at ‘Init’ Section of the [DP-GEN's documentation](https://docs.deepmodeling.com/projects/dpgen/en/latest/). **Initial data of this tutorial** @@ -689,10 +689,28 @@ $ cat cat dpgen.log | grep system ``` It can be found that 3010 structures are generated in `iter.000001`, in which no structure is collected for first-principle calculations. Therefore, the final models are not updated in iter.000002/00.train. +## Simplify +When you have a dataset containing lots of repeated data, this step will help you simplify your dataset. Since `dpgen simplify` is performed on a large dataset, only a simple demo is provided in this part. 
-### Auto-test +To learn more about simplify, you can refer to [DPGEN's Document](https://docs.deepmodeling.com/projects/dpgen/en/latest/) +[Document of dpgen simplify parameters](https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-jdata.html) +[Document of dpgen simplify machine parameters](https://docs.deepmodeling.com/projects/dpgen/en/latest/simplify/simplify-mdata.html) -To verify the accuracy of the DP model, users can calculate a simple set of properties and compare the results with those of a DFT or traditional empirical force field. DPGEN's autotest module supports the calculation of a variety of properties, such as +This demo can be downloaded from dpgen/examples/simplify-MAPbI3-scan-lebesgue. You can find more examples in [dpgen.examples](https://github.com/deepmodeling/dpgen/tree/master/examples) + +In the example, `data` contains a small data set based on the MAPbI3-scan case. It has been greatly reduced and serves only as a demo. +`simplify_example` is the work path, which contains `INCAR` and templates for `simplify.json` and `machine.json`. You can use the command `nohup dpgen simplify simplify.json machine.json 1>log 2>err &` here to test whether `dpgen simplify` runs normally. + +Kind reminders: +1. `machine.json` is supported by `dpdispatcher 0.4.15`; please check https://docs.deepmodeling.com/projects/dpdispatcher/en/latest/ to update the parameters according to your `dpdispatcher` version. +2. `POTCAR` should be prepared by the user. +3. Please check the paths and file names and make sure they are correct. + +Simplify can be used in transfer learning; see [CaseStudies: Transfer-learning](../../../CaseStudies/Transfer-learning/index.html) + +## Auto-test + +The `auto-test` function is intended only for alloy materials. To verify the accuracy of their DP model, users can calculate a simple set of properties and compare the results with those of a DFT or traditional empirical force field. 
DPGEN's autotest module supports the calculation of a variety of properties, such as - 00.equi:(default task) the equilibrium state; @@ -706,6 +724,320 @@ To verify the accuracy of the DP model, users can calculate a simple set of pro - 05.surf: the surface formation energy. +In this part, the Al-Mg-Cu DP potential is used to illustrate how to automatically test the DP potential of alloy materials. Each `auto-test` task includes three stages: +- `make` prepares all required calculation files and input scripts automatically; +- `run` submits calculation tasks to remote calculation platforms and collects the results automatically when the tasks are completed; +- `post` returns calculation results to the local root automatically. + +### structure relaxation + +#### step1-`make` +Prepare the following files in a separate folder. +```sh +├── machine.json +├── relaxation.json +├── confs +│ ├── mp-3034 +``` +**IMPORTANT!** The ID number, mp-3034, matches the Materials Project ID for Al-Mg-Cu. + +To let `pymatgen` and the Materials Project generate the files for calculation tasks automatically from the mp-ID, add your Materials Project API key to `.bashrc`. + +You can do that by running this command. +```bash +vim .bashrc +# add this line to the file: export MAPI_KEY="your-api-key-for-material-projects" +``` +If you are not familiar with the Materials Project API key, please refer to this [link](https://materialsproject.org/api#:~:text=API%20Key,-Your%20API%20Key&text=To%20make%20any%20request%20to,anyone%20you%20do%20not%20trust.). + +- machine.json is the same as the one used in `init` and `run`. For more information about it, please check this [link](https://bohrium-doc.dp.tech/#/docs/DP-GEN?id=步骤3:准备计算文件). 
+- relaxation.json + +```json +{ + "structures": ["confs/mp-3034"],//in this folder, confs/mp-3034, required files and scripts will be generated automatically by `dpgen autotest make relaxation.json` + "interaction": { + "type": "deepmd", + "model": "graph.pb", + "in_lammps": "lammps_input/in.lammps", + "type_map": {"Mg":0,"Al": 1,"Cu":2} //if you calculate other materials, remember to modify element types here. + }, + "relaxation": { + "cal_setting":{"etol": 1e-12, + "ftol": 1e-6, + "maxiter": 5000, + "maximal": 500000, + "relax_shape": true, + "relax_vol": true} + } +} +``` + +Run this command, +```bash +dpgen autotest make relaxation.json +``` +and the corresponding files and scripts used for the calculation will be generated automatically. + +#### step2-`run` +```bash +nohup dpgen autotest run relaxation.json machine.json & +``` +After running this command, structures will be relaxed. + +#### step3-`post` +```bash +dpgen autotest post relaxation.json +``` +### property calculation +#### step1-`make` +The parameters used for property calculations are in property.json. 
+ +```json +{ + "structures": ["confs/mp-3034"], + "interaction": { + "type": "deepmd", + "model": "graph.pb", + "deepmd_version":"2.1.0", + "type_map": {"Mg":0,"Al": 1,"Cu":2} + }, + "properties": [ + { + "type": "eos", + "vol_start": 0.9, + "vol_end": 1.1, + "vol_step": 0.01 + }, + { + "type": "elastic", + "norm_deform": 2e-2, + "shear_deform": 5e-2 + }, + { + "type": "vacancy", + "supercell": [3, 3, 3], + "start_confs_path": "confs" + }, + { + "type": "interstitial", + "supercell": [3, 3, 3], + "insert_ele": ["Mg","Al","Cu"], + "conf_filters":{"min_dist": 1.5}, + "cal_setting": {"input_prop": "lammps_input/lammps_high"} + }, + { + "type": "surface", + "min_slab_size": 10, + "min_vacuum_size":11, + "max_miller": 2, + "cal_type": "static" + } + ] +} +``` +Run this command +```bash +dpgen autotest make property.json +``` +#### step2-`run` +Run this command +```bash +nohup dpgen autotest run property.json machine.json & +``` +#### step3-`post` +```bash +dpgen autotest post property.json +``` +In the folder, you can use the command `tree . -L 1` and then you can check results. + +``` +(base) ➜ mp-3034 tree . -L 1 +. 
+├── dpdispatcher.log +├── dpgen.log +├── elastic_00 +├── eos_00 +├── eos_00.bk000 +├── eos_00.bk001 +├── eos_00.bk002 +├── eos_00.bk003 +├── eos_00.bk004 +├── eos_00.bk005 +├── graph_new.pb +├── interstitial_00 +├── POSCAR +├── relaxation +├── surface_00 +└── vacancy_00 +``` + +- 01.eos: the equation of state; +```bash +(base) ➜ mp-3034 tree eos_00 -L 1 +eos_00 +├── 99c07439f6f14399e7785dc783ca5a9047e768a8_flag_if_job_task_fail +├── 99c07439f6f14399e7785dc783ca5a9047e768a8_job_tag_finished +├── 99c07439f6f14399e7785dc783ca5a9047e768a8.sub +├── backup +├── graph.pb -> ../../../graph.pb +├── result.json +├── result.out +├── run_1660558797.sh +├── task.000000 +├── task.000001 +├── task.000002 +├── task.000003 +├── task.000004 +├── task.000005 +├── task.000006 +├── task.000007 +├── task.000008 +├── task.000009 +├── task.000010 +├── task.000011 +├── task.000012 +├── task.000013 +├── task.000014 +├── task.000015 +├── task.000016 +├── task.000017 +├── task.000018 +├── task.000019 +└── tmp_log +``` + +The `EOS` calculation results are shown in the `eos_00/result.out` file: +```bash +(base) ➜ eos_00 cat result.out +conf_dir: /root/1/confs/mp-3034/eos_00 + VpA(A^3) EpA(eV) + 15.075 -3.2727 + 15.242 -3.2838 + 15.410 -3.2935 + 15.577 -3.3019 + 15.745 -3.3090 + 15.912 -3.3148 + 16.080 -3.3195 + 16.247 -3.3230 + 16.415 -3.3254 + 16.582 -3.3268 + 16.750 -3.3273 + 16.917 -3.3268 + 17.085 -3.3256 + 17.252 -3.3236 + 17.420 -3.3208 + 17.587 -3.3174 + 17.755 -3.3134 + 17.922 -3.3087 + 18.090 -3.3034 + 18.257 -3.2977 +``` +- 02.elastic: elastic properties such as Young's modulus; +The `elastic` calculation results are shown in the `elastic_00/result.out` file: +```bash +(base) ➜ elastic_00 cat result.out +/root/1/confs/mp-3034/elastic_00 + 124.32 55.52 60.56 0.00 0.00 1.09 + 55.40 125.82 75.02 0.00 0.00 -0.17 + 60.41 75.04 132.07 0.00 0.00 7.51 + 0.00 0.00 0.00 53.17 8.44 0.00 + 0.00 0.00 0.00 8.34 37.17 0.00 + 1.06 -1.35 7.51 0.00 0.00 34.43 +# Bulk Modulus BV = 84.91 GPa +# Shear Modulus GV = 37.69 
GPa +# Youngs Modulus EV = 98.51 GPa +# Poission Ratio uV = 0.31 +``` +- 03.vacancy: the vacancy formation energy; +The `vacancy` calculation results are shown in the `vacancy_00/result.out` file: +```bash +(base) ➜ vacancy_00 cat result.out +/root/1/confs/mp-3034/vacancy_00 +Structure: Vac_E(eV) E(eV) equi_E(eV) +[3, 3, 3]-task.000000: -10.489 -715.867 -705.378 +[3, 3, 3]-task.000001: 4.791 -713.896 -718.687 +[3, 3, 3]-task.000002: 4.623 -714.064 -718.687 +``` +- 04.interstitial: the interstitial formation energy; +The `interstitial` calculation results are shown in the `interstitial_00/result.out` file: +```bash +(base) ➜ vacancy_00 cat result.out +/root/1/confs/mp-3034/vacancy_00 +Structure: Vac_E(eV) E(eV) equi_E(eV) +[3, 3, 3]-task.000000: -10.489 -715.867 -705.378 +[3, 3, 3]-task.000001: 4.791 -713.896 -718.687 +[3, 3, 3]-task.000002: 4.623 -714.064 -718.687 +``` +- 05.surf: the surface formation energy. +The `surface` calculation results are shown in the `surface_00/result.out` file: +```bash +(base) ➜ surface_00 cat result.out +/root/1/confs/mp-3034/surface_00 +Miller_Indices: Surf_E(J/m^2) EpA(eV) equi_EpA(eV) +[1, 1, 1]-task.000000: 1.230 -3.102 -3.327 +[1, 1, 1]-task.000001: 1.148 -3.117 -3.327 +[2, 2, 1]-task.000002: 1.160 -3.120 -3.327 +[2, 2, 1]-task.000003: 1.118 -3.127 -3.327 +[1, 1, 0]-task.000004: 1.066 -3.138 -3.327 +[2, 1, 2]-task.000005: 1.223 -3.118 -3.327 +[2, 1, 2]-task.000006: 1.146 -3.131 -3.327 +[2, 1, 1]-task.000007: 1.204 -3.081 -3.327 +[2, 1, 1]-task.000008: 1.152 -3.092 -3.327 +[2, 1, 1]-task.000009: 1.144 -3.093 -3.327 +[2, 1, 1]-task.000010: 1.147 -3.093 -3.327 +[2, 1, 0]-task.000011: 1.114 -3.103 -3.327 +[2, 1, 0]-task.000012: 1.165 -3.093 -3.327 +[2, 1, 0]-task.000013: 1.137 -3.098 -3.327 +[2, 1, 0]-task.000014: 1.129 -3.100 -3.327 +[1, 0, 1]-task.000015: 1.262 -3.124 -3.327 +[1, 0, 1]-task.000016: 1.135 -3.144 -3.327 +[1, 0, 1]-task.000017: 1.113 -3.148 -3.327 +[1, 0, 1]-task.000018: 1.119 -3.147 -3.327 +[1, 0, 1]-task.000019: 1.193 -3.135 
-3.327 +[2, 0, 1]-task.000020: 1.201 -3.089 -3.327 +[2, 0, 1]-task.000021: 1.189 -3.092 -3.327 +[2, 0, 1]-task.000022: 1.175 -3.094 -3.327 +[1, 0, 0]-task.000023: 1.180 -3.100 -3.327 +[1, 0, 0]-task.000024: 1.139 -3.108 -3.327 +[1, 0, 0]-task.000025: 1.278 -3.081 -3.327 +[1, 0, 0]-task.000026: 1.195 -3.097 -3.327 +[2, -1, 2]-task.000027: 1.201 -3.121 -3.327 +[2, -1, 2]-task.000028: 1.121 -3.135 -3.327 +[2, -1, 2]-task.000029: 1.048 -3.147 -3.327 +[2, -1, 2]-task.000030: 1.220 -3.118 -3.327 +[2, -1, 1]-task.000031: 1.047 -3.169 -3.327 +[2, -1, 1]-task.000032: 1.308 -3.130 -3.327 +[2, -1, 1]-task.000033: 1.042 -3.170 -3.327 +[2, -1, 0]-task.000034: 1.212 -3.154 -3.327 +[2, -1, 0]-task.000035: 1.137 -3.165 -3.327 +[2, -1, 0]-task.000036: 0.943 -3.192 -3.327 +[2, -1, 0]-task.000037: 1.278 -3.144 -3.327 +[1, -1, 1]-task.000038: 1.180 -3.118 -3.327 +[1, -1, 1]-task.000039: 1.252 -3.105 -3.327 +[1, -1, 1]-task.000040: 1.111 -3.130 -3.327 +[1, -1, 1]-task.000041: 1.032 -3.144 -3.327 +[1, -1, 1]-task.000042: 1.177 -3.118 -3.327 +[2, -2, 1]-task.000043: 1.130 -3.150 -3.327 +[2, -2, 1]-task.000044: 1.221 -3.135 -3.327 +[2, -2, 1]-task.000045: 1.001 -3.170 -3.327 +[1, -1, 0]-task.000046: 0.911 -3.191 -3.327 +[1, -1, 0]-task.000047: 1.062 -3.168 -3.327 +[1, -1, 0]-task.000048: 1.435 -3.112 -3.327 +[1, -1, 0]-task.000049: 1.233 -3.143 -3.327 +[1, 1, 2]-task.000050: 1.296 -3.066 -3.327 +[1, 1, 2]-task.000051: 1.146 -3.097 -3.327 +[1, 0, 2]-task.000052: 1.192 -3.085 -3.327 +[1, 0, 2]-task.000053: 1.363 -3.050 -3.327 +[1, 0, 2]-task.000054: 0.962 -3.132 -3.327 +[1, -1, 2]-task.000055: 1.288 -3.093 -3.327 +[1, -1, 2]-task.000056: 1.238 -3.102 -3.327 +[1, -1, 2]-task.000057: 1.129 -3.122 -3.327 +[1, -1, 2]-task.000058: 1.170 -3.115 -3.327 +[0, 0, 1]-task.000059: 1.205 -3.155 -3.327 +[0, 0, 1]-task.000060: 1.188 -3.158 -3.327 +``` + ## Summary Now, users have learned the basic usage of the DP-GEN. For further information, please refer to the recommended links. 
diff --git a/source/Tutorials/DeePMD-kit/learnDoc/DeePMD-kit(v2.0.3).md b/source/Tutorials/DeePMD-kit/learnDoc/DeePMD-kit(v2.0.3).md deleted file mode 100644 index 4313a4d..0000000 --- a/source/Tutorials/DeePMD-kit/learnDoc/DeePMD-kit(v2.0.3).md +++ /dev/null @@ -1,270 +0,0 @@ -# Hands-on tutorial for DeePMD-kit(v2.0.3) -This tutorial will introduce you to the basic usage of the DeePMD-kit, including data preparation, training/freezing/compressing, testing, and running molecular dynamics simulations with LAMMPS. -Typically the DeePMD-kit workflow contains three parts: data preparation, training/freezing/compressing/testing, and molecular dynamics. -The folder structure of this tutorial is like this: - - $ ls - 00.data 01.train 02.lmp - -where the folder 00.data contains the data, the folder 01.train contains an example input script to train a model with DeePMD-kit, and the folder 02.lmp contains LAMMPS example script for molecular dynamics simulation. - -## Data preparation -The training data of the DeePMD-kit contains the atom type, the simulation box, the atom coordinate, the atom force, the system energy, and the virial. A snapshot of a molecular system that has this information is called a frame. A system of data includes many frames that share the same number of atoms and atom types. For example, a molecular dynamics trajectory can be converted into a system of data, with each time step corresponding to a frame in the system. - -The DeePMD-kit adopts a compressed data format. All training data should first be converted into this format and can then be used by DeePMD-kit. The data format is explained in detail in the DeePMD-kit manual that can be found in the DeePMD-kit official Github site http://www.github.com/deepmodeling/deepmd-kit. - -We provide a convenient tool named dpdata for converting the data produced by VASP, Gaussian, Quantum-Espresso, ABACUS, and LAMMPS into the compressed format of DeePMD-kit. 
It needs to be noted that dpdata only works with python 3.x. - -As an example, go to the data folder: - - $ cd data - $ ls - OUTCAR - -The OUTCAR was produced by an ab-initio molecular dynamics (AIMD) simulation of a gas phase methane molecule using VASP. Now start an interactive python environment, for example - - $ python - -then execute the following commands: - - import dpdata - import numpy as np - sys = dpdata.LabeledSystem('OUTCAR', fmt = 'vasp/outcar') - print('# the system contains %d frames' % sys.get_nframes()) - sys.to_deepmd_npy('00.data', set_size = 40, prec = np.float32) - -The commands import a system of data from the OUTCAR (with format vasp/outcar ), and then dump it into the compressed format (numpy compressed arrays). The output folder 00.data. - - $ ls 00.data - set.000 set.001 set.002 set.003 set.004 type.raw type_map.raw - -The data system that has 200 frames is split into 5 sets, each of which has 40 frames. The parameter set_size specifies the set size. The parameter prec specifies the precision of the floating point number. - - $ cat 00.data/type.raw - H C - -Since all frames in the system have the same atom types and atom numbers, we only need to specify the type information once for the whole system - - $ cat 00.data/type_map.raw - 0 0 0 0 1 - -where atom H is given type 0, and atom C is given type 1. - -## Training -### Prepare input script -Once the data preparation is done, we can go on with training. Now go to the training directory - - $ cd ../01.train - $ ls - input.json - -where input.json gives you an example training script. The options are explained in detail in the DeePMD-kit manual, so they are not comprehensively explained. - -In the model section, the parameters of embedding and fitting networks are specified. 
- - "model":{ - "type_map": ["H", "C"], # the name of each type of atom - "descriptor":{ - "type": "se_e2_a", # full relative coordinates are used - "rcut": 6.00, # cut-off radius - "rcut_smth": 0.50, # where the smoothing starts - "sel": [4, 1], # the maximum number of type i atoms in the cut-off radius - "neuron": [10, 20, 40], # size of the embedding neural network - "resnet_dt": false, - "axis_neuron": 4, # the size of the submatrix of G (embedding matrix) - "seed": 1, - "_comment": "that's all" - }, - "fitting_net":{ - "neuron": [100, 100, 100], # size of the fitting neural network - "resnet_dt": true, - "seed": 1, - "_comment": "that's all" - }, - "_comment": "that's all"' - }, - -The se\_e2\_a descriptor is used to train the DP model. The item neurons set the size of the embedding and fitting network to [10, 20, 40] and [100, 100, 100], respectively. The components in $\tilde{\mathcal{R}}^{i}$ to smoothly go to zero from 0.5 to 6 Å. - -The following are the parameters that specify the learning rate and loss function. - - "learning_rate" :{ - "type": "exp", - "decay_steps": 5000, - "start_lr": 0.001, - "stop_lr": 3.51e-8, - "_comment": "that's all" - }, - "loss" :{ - "type": "ener", - "start_pref_e": 0.02, - "limit_pref_e": 1, - "start_pref_f": 1000, - "limit_pref_f": 1, - "start_pref_v": 0, - "limit_pref_v": 0, - "_comment": " that's all" - }, - -In the loss function, pref\_e increases from 0.02 to 1 $\mathrm{eV}^{-2}$, and pref\_f decreases from 1000 to 1 Å$^{2}$ $\mathrm{eV}^{-2}$ progressively, which means that the force term dominates at the beginning, while energy and virial terms become important at the end. This strategy is very effective and reduces the total training time. pref_v is set to 0 $\mathrm{eV}^{-2}$, indicating that no virial data are included in the training process. The starting learning rate, stop learning rate, and decay steps are set to 0.001, 3.51e-8, and 5000, respectively. 
-The training parameters are given in the following - - "training" : { - "training_data": { - "systems": ["../00.data/data_0/", # location of the training data - "../00.data/data_1/", - "../00.data/data_2/"], - "batch_size": "auto", - "_comment": "that's all" - }, - "validation_data":{ - "systems": ["../00.data/data_3"], - "batch_size": "auto", # automatically determined - "numb_btch": 1, - "_comment": "that's all" - }, - "numb_steps": 100000, # Number of training batch - "seed": 10, - "disp_file": "lcurve.out", - "disp_freq": 1000, - "save_freq": 10000, - }, - -We reshaped the structure of the data, splitting them into a training data and a validation data. The training data has 3 systems and validation data has 1 system. The model is trained for $10^6$ steps. - -### Train a model -After the training script is prepared, we can start the training with DeePMD-kit by simply running - - $ dp train input.json - -On the screen, you see the information of the data system(s) - - DEEPMD INFO ---------------------------------------------------------------------------------------------------- - DEEPMD INFO ---Summary of DataSystem: training ------------------------------------------------------------- - DEEPMD INFO found 3 system(s): - DEEPMD INFO system natoms bch_sz n_bch prob pbc - DEEPMD INFO ../00.data/data_0/ 5 7 5 0.250 T - DEEPMD INFO ../00.data/data_1/ 5 7 10 0.500 T - DEEPMD INFO ../00.data/data_2/ 5 7 5 0.250 T - DEEPMD INFO ----------------------------------------------------------------------------------------------------- - DEEPMD INFO ---Summary of DataSystem: validation -------------------------------------------------------------- - DEEPMD INFO found 1 system(s): - DEEPMD INFO system natoms bch_sz n_bch prob pbc - DEEPMD INFO ../00.data/data_3/ 5 7 5 1.000 T - -and the starting and final learning rate of this training - - DEEPMD INFO start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.950006, final lr will be 3.51e-08 - -If everything 
works fine, you will see, on the screen, information printed every 1000 steps, like - - DEEPMD INFO batch 1000 training time 7.61 s, testing time 0.01 s - DEEPMD INFO batch 2000 training time 6.46 s, testing time 0.01 s - DEEPMD INFO batch 3000 training time 6.50 s, testing time 0.01 s - DEEPMD INFO batch 4000 training time 6.44 s, testing time 0.01 s - DEEPMD INFO batch 5000 training time 6.49 s, testing time 0.01 s - DEEPMD INFO batch 6000 training time 6.46 s, testing time 0.01 s - DEEPMD INFO batch 7000 training time 6.24 s, testing time 0.01 s - DEEPMD INFO batch 8000 training time 6.39 s, testing time 0.01 s - DEEPMD INFO batch 9000 training time 6.72 s, testing time 0.01 s - DEEPMD INFO batch 10000 training time 6.41 s, testing time 0.01 s - DEEPMD INFO saved checkpoint model.ckpt - -They present the training and testing time counts. At the end of the 10000th batch, the model is saved in Tensorflow's checkpoint file model.ckpt. At the same time, the training and testing errors are presented in file lcurve.out. - - $ head -n 2 lcurve.out - #step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr - 0 1.34e+01 1.47e+01 7.05e-01 7.05e-01 4.22e-01 4.65e-01 1.00e-03 - -and - - $ tail -n 2 lcurve.out - 999000 1.24e-01 1.12e-01 5.93e-04 8.15e-04 1.22e-01 1.10e-01 3.7e-08 - 1000000 1.31e-01 1.04e-01 3.52e-04 7.74e-04 1.29e-01 1.02e-01 3.5e-08 - -Volumes 4, 5 and 6, 7 present energy and force training and testing errors, respectively. It is demonstrated that after 140,000 steps of training, the energy testing error is less than 1 meV and the force testing error is around 120 meV/Å. It is also observed that the force testing error is systematically (but slightly) larger than the training error, which implies a slight over-fitting to the rather small dataset. 
- -When the training process is stopped abnormally, we can restart the training from the provided checkpoint by simply running - - $ dp train --restart model.ckpt input.json - -In the lcurve.out, you can see the training and testing errors, like - - 538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06 - 538000 3.12e-01 2.16e-01 6.84e-04 7.52e-04 1.38e-01 9.52e-02 4.1e-06 - 539000 3.37e-01 2.61e-01 7.08e-04 3.38e-04 1.49e-01 1.15e-01 4.1e-06 - #step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr - 530000 2.89e-01 2.15e-01 6.36e-04 5.18e-04 1.25e-01 9.31e-02 4.4e-06 - 531000 3.46e-01 3.26e-01 4.62e-04 6.73e-04 1.49e-01 1.41e-01 4.4e-06 - -Note that input.json needs to be consistent with the previous one. - -### Freeze and Compress a model -At the end of the training, the model parameters saved in TensorFlow's checkpoint file should be frozen as a model file that is usually ended with extension .pb. Simply execute - - $ dp freeze -o graph.pb - DEEPMD INFO Restoring parameters from ./model.ckpt-1000000 - DEEPMD INFO 1264 ops in the final graph - -and it will output a model file named graph.pb in the current directory. The graph.pb can be compressed in the following way: - - $ dp compress -i graph.pb -o graph-compress.pb - DEEPMD INFO stage 1: compress the model - DEEPMD INFO built lr - DEEPMD INFO built network - DEEPMD INFO built training - DEEPMD INFO initialize model from scratch - DEEPMD INFO finished compressing - DEEPMD INFO - DEEPMD INFO stage 2: freeze the model - DEEPMD INFO Restoring parameters from model-compression/model.ckpt - DEEPMD INFO 840 ops in the final graph - -and it will output a model file named graph-compress.pb. - -### Test a model -We can check the quality of the trained model by running - - $ dp test -m graph-compress.pb -s ../00.data/data_3 -n 40 - -On the screen you see the information of the prediction errors of validation data. 
- - DEEPMD INFO # number of test data : 40 - DEEPMD INFO Energy RMSE : 3.168050e-03 eV - DEEPMD INFO Energy RMSE/Natoms : 6.336099e-04 eV - DEEPMD INFO Force RMSE : 1.267645e-01 eV/A - DEEPMD INFO Virial RMSE : 2.494163e-01 eV - DEEPMD INFO Virial RMSE/Natoms : 4.988326e-02 eV - DEEPMD INFO # ----------------------------------------------- - -## Run MD with LAMMPS - -Now let's switch to the lammps directory to check the necessary input files for running DeePMD with LAMMPS. - - $ cd ../02.lmp - -Firstly, we soft-link the output model in the training directory to the current directory - - $ ln -s ../01.train/graph-compress.pb - -Then we have three files - - $ ls - conf.lmp graph-compress.pb in.lammps - -where conf.lmp gives the initial configuration of a gas phase methane MD simulation, and the file in.lammps is the lammps input script. One may check in.lammps and finds that it is a rather standard LAMMPS input file for a MD simulation, with only two exception lines: - - pair_style graph-compress.pb - pair_coeff * * - -where the pair style deepmd is invoked and the model file graph-compress.pb is provided, which means the atomic interaction will be computed by the DP model that is stored in the file graph-compress.pb. - -One may execute lammps in the standard way - - $ lmp -i in.lammps - -After waiting for a while, the MD simulation finishes, and the log.lammps and ch4.dump files are generated. They store thermodynamic information and the trajectory of the molecule, respectively. One may want to visualize the trajectory by, e.g. ovito - - $ ovito ch4.dump - -to check the evolution of the molecular configuration.