supNMT pre-train problem with multi gpus #177

@Andrewlesson

The pre-training script from supNMT only runs on a single GPU. When I use multiple GPUs to pre-train supNMT, I get the error below. Has anyone encountered the same problem?

Traceback (most recent call last):
  File "/search/odin/txguo/anaconda3/envs/mass/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/fairseq_cli/train.py", line 298, in cli_main
    nprocs=args.distributed_world_size,
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 167, in spawn
    while not spawn_context.join():
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 103, in join
    (error_index, name)
Exception: process 0 terminated with signal SIGKILL
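
For anyone reading the last line of the traceback: "terminated with signal SIGKILL" means the worker process was killed from outside rather than failing with a Python exception, which is why no Python error from the worker itself is shown. One common cause is the system running out of host memory when several workers start at once, but that is an assumption and is not confirmed by this log. The sketch below uses only the standard library (not fairseq or torch) to show how a parent process sees such a kill. The names `worker` and `describe_exit` are hypothetical and exist only for this illustration, and it assumes a Unix-like system.

```python
import multiprocessing as mp
import os
import signal


def worker():
    # Hypothetical stand-in for a training worker that is killed from outside.
    # Here the process sends SIGKILL to itself to simulate that situation.
    os.kill(os.getpid(), signal.SIGKILL)


def describe_exit(exitcode):
    """Translate a multiprocessing exit code into a readable description."""
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        # A negative exit code -N means the child was terminated by signal N.
        return "terminated with signal " + signal.Signals(-exitcode).name
    return "exited with code %d" % exitcode


if __name__ == "__main__":
    proc = mp.Process(target=worker)
    proc.start()
    proc.join()
    # The parent only learns which signal ended the child, not why it was sent.
    print(describe_exit(proc.exitcode))
```

Because the parent only sees the signal, the reason for the kill has to come from outside the Python log, for example from the kernel log on the training machine.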
