[Bug] Non-streaming mode causes backend disconnection after ~2 minutes of long inference #8

@shaoyu12138

Description

In non-streaming mode, if model inference takes longer than about 2 minutes, the backend connection is dropped and the request returns a 500 error. This does not happen in streaming mode. Could you consider adding a keep-alive mechanism (some periodic data exchange on the open connection) to prevent long-running requests from timing out?
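
A minimal sketch of the kind of keep-alive meant here, not this project's actual code: while the inference is still running, the response body emits a single space at a fixed interval, then the final JSON. Leading whitespace is legal in JSON, so a non-streaming client can still parse the whole body as one document. The names `with_keepalive`, `fake_inference`, and `collect`, and the 15-second default interval, are assumptions made up for illustration.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable


async def with_keepalive(
    result: Awaitable[dict], interval: float = 15.0
) -> AsyncIterator[bytes]:
    """Yield a space every `interval` seconds until `result` is ready,
    then yield the JSON-encoded result as the last chunk."""
    task = asyncio.ensure_future(result)
    while True:
        # asyncio.wait() returns on timeout without cancelling the task.
        done, _ = await asyncio.wait({task}, timeout=interval)
        if done:
            break
        yield b" "  # heartbeat; JSON parsers skip leading whitespace
    yield json.dumps(task.result()).encode("utf-8")


async def fake_inference(delay: float) -> dict:
    # Hypothetical stand-in for a slow, non-streaming model call.
    await asyncio.sleep(delay)
    return {"text": "done"}


async def collect(delay: float, interval: float) -> bytes:
    # Simulates a client that reads the full body before parsing it.
    chunks = [c async for c in with_keepalive(fake_inference(delay), interval)]
    return b"".join(chunks)
```

One trade-off of this approach: the HTTP status line has to be sent before the first heartbeat, so a failure that happens later in inference can no longer be reported as a 500 and would need to be carried inside the JSON body instead.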

Also, I am a developer from China, here solely for fair and respectful technical collaboration.
While some business users from China may focus only on profit, many of us are genuine engineers who value open-source contribution and technical exchange. Please do not let the actions of a few create bias against Chinese developers.
