Running the example program in a ModelScope notebook fails with "Engine core initialization failed"

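I ran the example program in a ModelScope notebook, and vLLM initialization fails with:
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
My notebook instance is a GPU environment: 8 cores, 32 GB RAM, NVIDIA A10 with 24 GB of GPU memory. vLLM version: 0.11.0.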

My code:
from vllm import LLM

# model_path is the local checkpoint directory shown in the log below
try:
    llm = LLM(
        model=model_path,
        tokenizer=model_path,
        limit_mm_per_prompt={"image": 1},
        gpu_memory_utilization=0.8,
        trust_remote_code=True,
        # max_model_len=4096,
        # max_num_batched_tokens=4096,
        max_num_seqs=1,
        skip_tokenizer_init=False,
        enable_prefix_caching=False,
        enforce_eager=True,
        dtype="bfloat16",
    )
except Exception as e:
    print("❌ LLM 初始化失败:", str(e))  # message reads "LLM initialization failed"
    raise
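
One variant I could try is to re-enable the two commented-out limits and tell vLLM not to budget for video inputs, since the log mentions profiling with a video item. The sketch below only builds the keyword arguments so it can run without a GPU. `build_engine_kwargs` is a hypothetical helper, the value 4096 comes from the commented-out lines above, and `"video": 0` is an assumption about how `limit_mm_per_prompt` behaves in this vLLM version. Whether this avoids the failure is not confirmed.

```python
def build_engine_kwargs(model_path: str) -> dict:
    """Return LLM(...) keyword arguments with explicit length limits.

    Hypothetical helper for illustration only; it does not import vllm.
    """
    return {
        "model": model_path,
        "tokenizer": model_path,
        # Assumption: a limit of 0 disables the video modality, so the
        # profiling run does not reserve memory for a maximum-size video.
        "limit_mm_per_prompt": {"image": 1, "video": 0},
        "gpu_memory_utilization": 0.8,
        "trust_remote_code": True,
        # Re-enabled from the commented-out lines in the original call.
        "max_model_len": 4096,
        "max_num_batched_tokens": 4096,
        "max_num_seqs": 1,
        "skip_tokenizer_init": False,
        "enable_prefix_caching": False,
        "enforce_eager": True,
        "dtype": "bfloat16",
    }


engine_kwargs = build_engine_kwargs("/path/to/model")  # placeholder path
# Usage (requires vllm and a GPU):
#   from vllm import LLM
#   llm = LLM(**engine_kwargs)
```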

The error output is as follows:
(EngineCore_DP0 pid=1987) INFO 11-14 18:00:45 [gpu_model_runner.py:2602] Starting to load model /mnt/workspace/.cache/modelscope/models/tongyi_dianjin/DianJin-OCR-R1/seal_sft...
(EngineCore_DP0 pid=1987) INFO 11-14 18:00:46 [gpu_model_runner.py:2634] Loading model from scratch...
(EngineCore_DP0 pid=1987) INFO 11-14 18:00:46 [__init__.py:381] Cudagraph is disabled under eager mode
(EngineCore_DP0 pid=1987) INFO 11-14 18:00:46 [cuda.py:366] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:59<02:57, 59.22s/it]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [01:55<01:55, 57.73s/it]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [02:51<00:56, 56.83s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [03:10<00:00, 41.80s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [03:10<00:00, 47.61s/it]
(EngineCore_DP0 pid=1987) 
(EngineCore_DP0 pid=1987) INFO 11-14 18:03:56 [default_loader.py:267] Loading weights took 190.45 seconds
(EngineCore_DP0 pid=1987) INFO 11-14 18:03:57 [gpu_model_runner.py:2653] Model loading took 15.6269 GiB and 190.719423 seconds
(EngineCore_DP0 pid=1987) INFO 11-14 18:03:57 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 114688 tokens, and profiled with 1 video items of the maximum feature size.
❌ LLM 初始化失败: Engine core initialization failed. See root cause above. Failed core proc(s): {}
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
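
A rough memory estimate from the numbers above, in case it helps narrow things down. This is only back-of-the-envelope arithmetic: it assumes the full nominal 24 GiB is visible to vLLM (the usable amount on an A10 is likely somewhat lower) and that `gpu_memory_utilization` is applied to total device memory. `headroom_after_weights` is a hypothetical helper name.

```python
def headroom_after_weights(total_gib: float, utilization: float, weights_gib: float) -> float:
    """Memory left inside the vLLM budget once the model weights are loaded."""
    budget_gib = total_gib * utilization
    return budget_gib - weights_gib


total_gib = 24.0        # assumption: nominal A10 memory, not the measured value
utilization = 0.8       # gpu_memory_utilization from my LLM(...) call
weights_gib = 15.6269   # "Model loading took 15.6269 GiB" from the log

headroom = headroom_after_weights(total_gib, utilization, weights_gib)
print(f"budget: {total_gib * utilization:.2f} GiB, headroom after weights: {headroom:.2f} GiB")
```

Under these assumptions about 3.57 GiB remain for the profiling run, the encoder cache, and the KV cache, so the failure right after the encoder cache message may be memory related, though the log does not confirm that.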

No more detailed error information is shown.
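
To try to surface the root cause, a minimal sketch of what I could set before importing vLLM. It assumes that the `VLLM_LOGGING_LEVEL` and `VLLM_ENABLE_V1_MULTIPROCESSING` environment variables are honored by this vLLM version, and that running the engine core in-process lets the underlying exception reach the notebook cell. `enable_verbose_vllm` is a hypothetical helper name.

```python
import os


def enable_verbose_vllm() -> None:
    """Set debugging-related environment variables; call before importing vllm."""
    # Assumption: raises vLLM's log verbosity to DEBUG.
    os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"
    # Assumption: runs the V1 engine core in the main process, so the
    # original traceback is raised here rather than lost in a child process.
    os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"


enable_verbose_vllm()
# from vllm import LLM  # import only after the variables are set
```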

