CREST Adapter#807
Conversation
d6a83b4 to
c231a43
Compare
3e88c36 to
7674f5f
Compare
9c25f3d to
51368be
Compare
9be935e to
eab7647
Compare
KinBot is no longer installed into arc_env. The installer now creates a dedicated kinbot_env conda environment (Python 3.12) and installs current upstream KinBot (2.3.0, which requires Python >= 3.11) via pip, as recommended upstream, along with PyYAML for ARC's worker script IO.
The worker script runs inside kinbot_env and imports only KinBot, PyYAML, and the standard library. It reads a YAML input file (wells with SMILES, KinBot-format structures, charge, multiplicity, and families), reproduces the TS guess generation logic previously inlined in kinbot_ts.py (ReactionFinder/ReactionGenerator + get_constraints + modify_coordinates), and writes the resulting guesses to a YAML output file, mirroring gcn_script.py's conventions. Adaptations to KinBot 2.3.0: the 'dimer' and 'homolytic_scissions' parameters no longer exist and are not set (an explicit families list is always passed, so homolytic scissions cannot be triggered). KINBOT_PYTHON resolves the kinbot_env python analogously to TS_GCN_PYTHON.
The KinBot TS search adapter no longer imports kinbot into arc_env. It now writes a YAML input file describing the unimolecular wells, invokes arc/job/adapters/scripts/kinbot_script.py inside kinbot_env via run_in_conda_env (the same pattern as the GCN and AutoTST adapters), and parses the worker's YAML output into TSGuess objects exactly as before (collision check, deduplication, save_geo, method indexing). Availability is now determined by the existence of the KINBOT_PYTHON executable rather than by importing kinbot into arc_env.
The tests no longer require KinBot in arc_env (they were previously skipped silently unless kinbot was importable). run_in_conda_env is mocked to assert the input file contents and to emulate the worker's output file, covering: the happy path, a crashed subprocess, result parsing (duplicates merged, failures kept as unsuccessful guesses, unknown directions skipped), the family map, and the missing KINBOT_PYTHON availability check. Project directory cleanup uses addCleanup.
Since KinBot 2.2, Parameters validates its values at construction time and exits unless a barrier threshold is set, so the worker now supplies its parameter overrides (including a nominal barrier_threshold, unused in ARC's flow since no KinBot QC jobs are spawned) through a temporary JSON input file instead of mutating params.par after construction. Verified end-to-end against a locally built kinbot_env (KinBot 2.3.0): the adapter produces the same TS guesses through the subprocess boundary as the previous in-arc_env integration did. Also point the TS search docs at the kinbot_env environment.
The GCN TS-search integration was broken end to end: - gcn_ts.py gated on a module-level 'from inference import inference' (HAS_GCN), which can never succeed inside arc_env; the flag was also never consumed. Remove it: availability is now determined by the subprocess boundary being usable (gcn_available(): TS_GCN_PYTHON resolves to an existing interpreter and gcn_script.py exists), and execute_gcn() logs a clear warning and skips GCN TS guesses instead of crashing the scheduler when the ts_gcn env is missing (previously run_in_conda_env(None, ...) raised a TypeError). - run_subprocess_locally() passed the reactant SDF to --p_sdf_path and the product SDF to --r_sdf_path (inverting both search directions), and its failure log always claimed the reverse direction. - gcn_script.py only accepted --yml_in_path, so every incore invocation exited 2 with an argparse error. It now supports both the direct per-direction mode (--r_sdf_path/--p_sdf_path/--ts_xyz_path) used by incore execution and the YAML batch mode used by queue execution, defers the TS-GCN 'inference' import to call time with a sys.path fallback ($TSGCN_ROOT, then the TS-GCN clone next to ARC) since 'conda run' sources neither ~/.bashrc nor login shells, and replaces a bare except that swallowed inference failures while exiting 0. - The queue submit templates activated arc_env (instead of ts_gcn) and invoked arc/job/adapters/ts/scripts/gcn_runner.py, a path that no longer exists since the scripts folder relocation.
- install_gcn.sh: the conda activation hook was written with a quoted heredoc delimiter, so $repo was never expanded and TSGCN_ROOT ended up empty at activation time; the PYTHONPATH case pattern also lacked a trailing wildcard. Expand install-time values, escape runtime ones. - gcn_environment.yml (used by install_gcn_cpu.sh) contained no torch / torch-geometric at all, so the ts_gcn env it created could not run TS-GCN inference. Pin python=3.8 and add the same CPU wheel pins install_gcn.sh already uses.
Mock run_in_conda_env at the env boundary and assert the exact argv (interpreter, gcn_script.py, --r_sdf_path/--p_sdf_path/--ts_xyz_path per direction), the SDF input files written, and the parsing of a fake TS xyz output into TSGuess objects. Cover the failing-returncode path and the graceful skip when the ts_gcn env is unavailable (TS_GCN_PYTHON=None must not spawn a subprocess or raise). Add plumbing tests for the standalone gcn_script.py (CLI modes, batch YAML output, informative ImportError, no arc imports). Use per-test directories with addCleanup since the suite runs under pytest-xdist (worksteal), where class-scoped shared directories race across workers.
Step-semantics fix: previously (both in this integration and in the pre-existing in-arc_env one) get_constraints() was called once with step=20, which exceeds every KinBot family's max_step, so the 'change' constraint list was always empty and the returned TS guesses were unmodified well geometries. The worker now drives each reaction instance exactly like KinBot's reac_family.carry_out_reaction() does between the constrained-optimization jobs it submits: step from 0 (or 12 for the short-instance skip shortcut) through max_step - 1, applying each step's 'change' constraints (including the 'L'-prefixed entry conversion) to the evolving geometry via modify_coordinates. There is no step magic number anymore; the 'step' input key and KINBOT_CONSTRAINT_STEP were removed. Verified end-to-end against the real kinbot_env: CC[O] <=> [CH2]CO now yields 10 distinct TS guesses displaced 0.5-3.7 A from their wells (all previously equaled the well geometries). A kinbot_env-gated regression test asserts the guess geometries differ from the wells. Test project directories are now unique per test with per-directory cleanup, so parallel pytest-xdist workers cannot delete each other's directories. UMA refinement (opt-in, off by default): the worker can refine each successful template guess with a Sella saddle-point search on Meta's UMA machine-learned potential, mirroring KinBot 2.3.0's ase_fc_ts_end template. fairchem is lazily imported inside the refinement function only, and any refinement failure (missing fairchem, gated/missing checkpoint, non-convergence) logs the reason and falls back to the unrefined guess. Controlled by the new 'kinbot_uma_settings' dict in settings.py, passed through input.yml; refined guesses are labeled method 'KinBot-UMA' (still containing 'kinbot', keeping dedup and success-count logic intact) and output entries carry an explicit 'uma_refined' flag for benchmarking.
install_kinbot.sh --uma installs KinBot with its 'fc' pip extra (fairchem-core, for the opt-in UMA refinement of TS guesses) and sanity-checks the fairchem import. The script header documents that the UMA checkpoints themselves are license-gated on HuggingFace (they require a HuggingFace login and accepting Meta's UMA license, or a locally downloaded checkpoint file), which is why the default install stays lean.
…pters TSGuess now records the level of theory at which the guess-generating adapter ran its electronic structure calculations, as a plain YAML-safe dict that round-trips through as_dict()/from_dict(): - orca_neb: the adapter's NEB level (from job args or orca_neb_settings) - xtb_gsm: the job level if set, else GFN2-xTB (the ograd script runs plain 'xtb --grad', i.e., the xtb default) - crest: GFN2-xTB (CREST is run without an explicit method flag) Pure ML/template adapters (heuristics, GCN, AutoTST, KinBot, linear) leave it None. Adds a plain_level_dict() helper to arc.level for the shared Level-to-plain-dict serialization.
….yml
The Scheduler now appends a lightweight record (job name, type, adapter,
server, cpu cores, run time in seconds, status) for every job that
reaches end_job. The records are persisted in the restart file
('completed_job_records') so restarted runs keep their cost history,
and are handed to write_output_yml at the end of the run.
Note: pipe-mode tasks bypass the JobAdapter path and carry no per-task
timing, so they are not individually recorded (only jobs ejected back
to the Scheduler are).
Each transition state entry now carries a 'ts_guesses' list with one
record per TSGuess: method, method_sources (equivalent-guess clusters),
method_index/direction, success, relative energy (kJ/mol), execution
time (seconds), the guess-generation level of theory, the run-relative
guess log path, and whether the guess was chosen for optimization.
A run-level 'cost_metrics' section aggregates the Scheduler's per-job
records: total job count, per-ESS {job_count, summed execution time,
core-hours}, plus explicit wall_time_hrs. Jobs with missing run time or
core count are counted (jobs_missing_time / jobs_missing_cores) rather
than silently dropped, so benchmark analyses know the data coverage.
Documents the new fields in docs/output_yml_schema.md.
Freq calculations now carry a HessianPayload so TCKDB's calculation_hessian table is populated on upload. - arc/parser/adapters/gaussian.py, orca.py: parse_cartesian_hessian_lower_triangle() returns the packed lower triangle (with diagonal, row-major) in NATIVE Hartree/Bohr^2 — the same blocks Arkane reads (Gaussian "Force constants in Cartesian coordinates:" needs iop(7/33=1), already set on ARC freq jobs; Orca sibling input.hess $hessian block), minus Arkane's J/m^2 conversion. Returns None (never raises) when absent / monatomic / malformed. - arc/tckdb/adapter.py: attaches hessian to the freq calculation, geometry-bound to the freq input geometry, source=parsed_log/parsed_hess; wrapped so it can never break payload construction. - Tests: parser fixtures (Gaussian NH/CHO, new Orca H2O .hess), triangle-length invariant, native-unit sanity, absent-path, and HessianPayload validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| from arc.common import ARC_TESTING_PATH | ||
| from arc.job.adapters.ts.kinbot_ts import KinBotAdapter, HAS_KINBOT | ||
| from arc.common import ARC_TESTING_PATH, read_yaml_file, save_yaml_file | ||
| import arc.job.adapters.ts.kinbot_ts as kinbot_ts |
Check notice
Code scanning / CodeQL
Module is imported with 'import' and 'import from' Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
Use only the module import and stop using from ... import KinBotAdapter.
Per the recommendation, keep:
import arc.job.adapters.ts.kinbot_ts as kinbot_ts
Then replace direct KinBotAdapter references with kinbot_ts.KinBotAdapter:
- In the return type hint of
get_adapter - In the constructor call inside
get_adapter
This keeps behavior unchanged and removes the duplicate import style in arc/job/adapters/ts/kinbot_test.py.
| @@ -14,7 +14,6 @@ | ||
|
|
||
| from arc.common import ARC_TESTING_PATH, read_yaml_file, save_yaml_file | ||
| import arc.job.adapters.ts.kinbot_ts as kinbot_ts | ||
| from arc.job.adapters.ts.kinbot_ts import KinBotAdapter | ||
| from arc.reaction import ARCReaction | ||
| from arc.species import ARCSpecies | ||
|
|
||
| @@ -58,16 +57,16 @@ | ||
| except OSError: | ||
| pass | ||
|
|
||
| def get_adapter(self, dir_name: str) -> KinBotAdapter: | ||
| def get_adapter(self, dir_name: str) -> kinbot_ts.KinBotAdapter: | ||
| """A helper function to instantiate a KinBotAdapter instance.""" | ||
| project_directory = os.path.join(self.project_dir, dir_name) | ||
| self.addCleanup(self._remove_test_dir, project_directory) | ||
| return KinBotAdapter(job_type='tsg', | ||
| reactions=[self.rxn_1], | ||
| testing=True, | ||
| project='test', | ||
| project_directory=project_directory, | ||
| ) | ||
| return kinbot_ts.KinBotAdapter(job_type='tsg', | ||
| reactions=[self.rxn_1], | ||
| testing=True, | ||
| project='test', | ||
| project_directory=project_directory, | ||
| ) | ||
|
|
||
| def test_family_map(self): | ||
| """Test that the KinBot family map supports the expected RMG families.""" |
| shutil.rmtree(path, ignore_errors=True) | ||
| try: | ||
| os.rmdir(self.project_dir) | ||
| except OSError: |
Check notice
Code scanning / CodeQL
Empty except Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
Use explicit, documented handling for expected cleanup races instead of an empty except.
Best fix here (without changing functionality): in arc/job/adapters/ts/kinbot_test.py, update _remove_test_dir so that:
- It catches
OSError as e. - It ignores only expected benign cases (
errno.ENOTEMPTY,errno.ENOENT), which correspond to “parent still has content” and “already removed”. - It re-raises unexpected errors.
- Add
import errnoat the top.
This preserves current cleanup semantics while improving debuggability and satisfying the static analysis rule.
| @@ -5,6 +5,7 @@ | ||
| This module contains unit tests of the arc.job.adapters.ts.kinbot_ts module | ||
| """ | ||
|
|
||
| import errno | ||
| import math | ||
| import os | ||
| import shutil | ||
| @@ -55,8 +56,11 @@ | ||
| shutil.rmtree(path, ignore_errors=True) | ||
| try: | ||
| os.rmdir(self.project_dir) | ||
| except OSError: | ||
| pass | ||
| except OSError as e: | ||
| # Benign during parallel tests: parent may still contain other test dirs | ||
| # or may already have been removed by another worker. | ||
| if e.errno not in (errno.ENOTEMPTY, errno.ENOENT): | ||
| raise | ||
|
|
||
| def get_adapter(self, dir_name: str) -> KinBotAdapter: | ||
| """A helper function to instantiate a KinBotAdapter instance.""" |
| # either a local UMA checkpoint ('model_path') or a HuggingFace login with | ||
| # access to the (license-gated) UMA models for downloading 'model_name'. | ||
| # On any refinement failure ARC falls back to the unrefined template guess. | ||
| kinbot_uma_settings = { |
Check notice
Code scanning / CodeQL
Unused global variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
To fix this without changing functionality, explicitly declare kinbot_uma_settings as part of the module’s public API via __all__. This preserves the variable name and semantics, keeps the settings available for imports/overrides, and documents intent clearly.
Best single change in arc/settings/settings.py:
- Add or update a module-level
__all__definition to include'kinbot_uma_settings'. - Since we can only edit shown regions and no existing
__all__is shown, add a minimal__all__right after thekinbot_uma_settingsblock (before subsequent constants).
No new imports, methods, or dependencies are needed.
| @@ -356,6 +356,8 @@ | ||
| 'steps': 250, # maximal number of Sella steps | ||
| } | ||
|
|
||
| __all__ = ['kinbot_uma_settings'] | ||
|
|
||
| # ARC families folder path | ||
| ARC_FAMILIES_PATH = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))), 'data', 'families') | ||
|
|
| OB_PYTHON = find_executable('ob_env') | ||
| TS_GCN_PYTHON = find_executable('ts_gcn') | ||
| AUTOTST_PYTHON = find_executable('tst_env') | ||
| KINBOT_PYTHON = find_executable('kinbot_env') |
Check notice
Code scanning / CodeQL
Unused global variable Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 1 day ago
To fix this, keep the right-hand-side call (so executable discovery behavior remains unchanged), but rename only the unused global to indicate intentional non-use.
Best minimal fix in arc/settings/settings.py:
- In the module-level default tuple, rename
KINBOT_PYTHONto_unused_KINBOT_PYTHON. - In the later assignment, rename
KINBOT_PYTHON = find_executable('kinbot_env')to_unused_KINBOT_PYTHON = find_executable('kinbot_env').
This follows CodeQL’s accepted naming convention for intentionally unused globals and avoids altering functionality.
| @@ -360,7 +360,7 @@ | ||
| ARC_FAMILIES_PATH = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))), 'data', 'families') | ||
|
|
||
| # Default environment names for sister repos | ||
| TS_GCN_PYTHON, TANI_PYTHON, AUTOTST_PYTHON, KINBOT_PYTHON, ARC_PYTHON, XTB, XTB_PYTHON, OB_PYTHON, RMG_PYTHON, RMG_PATH, RMG_DB_PATH = \ | ||
| TS_GCN_PYTHON, TANI_PYTHON, AUTOTST_PYTHON, _unused_KINBOT_PYTHON, ARC_PYTHON, XTB, XTB_PYTHON, OB_PYTHON, RMG_PYTHON, RMG_PATH, RMG_DB_PATH = \ | ||
| None, None, None, None, None, None, None, None, None, None, None | ||
|
|
||
| home = os.getenv("HOME") or os.path.expanduser("~") | ||
| @@ -402,7 +402,7 @@ | ||
| OB_PYTHON = find_executable('ob_env') | ||
| TS_GCN_PYTHON = find_executable('ts_gcn') | ||
| AUTOTST_PYTHON = find_executable('tst_env') | ||
| KINBOT_PYTHON = find_executable('kinbot_env') | ||
| _unused_KINBOT_PYTHON = find_executable('kinbot_env') | ||
| ARC_PYTHON = find_executable('arc_env') | ||
| XTB_PYTHON = find_executable('xtb_env') | ||
| RMG_ENV_NAME = 'rmg_env' |
A transition state's statmech harmonic_frequencies_cm1 lists only the real modes; the imaginary (reaction-coordinate) frequency is carried separately in imag_freq_cm1. The adapter set is_imaginary = freq < 0 over the real-only list, so it emitted n_imag=1 with zero imaginary modes — which TCKDB's FreqResultPayload validator (count(is_imaginary) must equal n_imag) rejects. Re-insert imag_freq_cm1 as a negative imaginary mode so modes are internally consistent with n_imag; if they can't be reconciled (e.g. a higher-order saddle with a single stored imag_freq_cm1), omit modes rather than emit a payload the backend rejects. Surfaced by an end-to-end upload of a real ARC project against a live TCKDB instance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addition of CREST Adapter that complements the heuristic adapter.
This pull request adds support for the CREST conformer and transition state (TS) search method to the ARC project, along with several related improvements and code cleanups. The most important changes include integrating CREST as a TS search adapter, updating configuration and constants, and enhancing the heuristics TS search logic for better provenance tracking and code clarity.
CREST Integration:
JobEnum(arc/job/adapter.py), included CREST in the list of adapters and RMG family mapping, and registered it as a default incore adapter (arc/job/adapters/common.py,arc/job/adapters/ts/__init__.py). [1] [2] [3] [4]arc/job/adapters/ts/crest_test.py).Makefile). [1] [2]Constants and Configuration:
angstrom_to_bohrconversion constant to both Cython and Python constants modules (arc/constants.pxd,arc/constants.py). [1] [2] [3]Heuristics TS Search Enhancements and Refactoring:
arc/job/adapters/ts/heuristics.py). [1] [2] [3] [4]arc/job/adapters/ts/heuristics.py). [1] [2] [3] [4] [5] [6] [7].