docker run --gpus 'all' fails with an error; is multi-GPU not supported? #53

@guiniao

On a server with two A40 GPUs, I deploy the service on GPU with the following command:
docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
--env LOG_LEVEL="info,text_generation_router=debug" \
ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
--model-id /data/CodeShell-7B-Chat --num-shard 1 \
--max-total-tokens 5000 --max-input-length 4096 \
--max-stop-sequences 12 --trust-remote-code

It fails with the following error:
2024-01-19T08:15:44.995533Z ERROR warmup{max_input_length=4096 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Error: Warmup(Generation("Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!"))
2024-01-19T08:15:45.052858Z ERROR text_generation_launcher: Webserver Crashed
2024-01-19T08:15:45.052873Z INFO text_generation_launcher: Shutting down shards
2024-01-19T08:15:45.395141Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Error: WebserverFailed
