In non-streaming mode, if model inference takes longer than about 2 minutes, the backend connection is interrupted and the request fails with a 500 error. The issue does not occur in streaming mode. Could you consider adding a periodic data exchange (e.g., keep-alive) mechanism to prevent long-lived connections from timing out?
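In case it helps, here is a minimal sketch of one possible approach, assuming a FastAPI backend and a hypothetical `run_inference()` coroutine (both are illustrative, not the project's actual code): wrap the non-streaming endpoint in a chunked response that emits whitespace heartbeat bytes while inference runs, then sends the final JSON. Since JSON parsers ignore leading whitespace, clients can still parse the response as a single JSON document.

```python
# A minimal sketch, assuming a FastAPI backend; run_inference() and the
# /generate route are hypothetical placeholders, not the real code.
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

HEARTBEAT_INTERVAL = 15  # seconds; well below typical proxy/LB timeouts


async def run_inference(prompt: str) -> dict:
    """Placeholder for the real (long-running) model call."""
    await asyncio.sleep(180)  # simulate > 2 minutes of inference
    return {"output": f"result for {prompt!r}"}


@app.post("/generate")
async def generate(payload: dict):
    async def body():
        task = asyncio.create_task(run_inference(payload.get("prompt", "")))
        while not task.done():
            # Leading whitespace is legal before a JSON document, so these
            # heartbeat bytes keep the connection alive without changing
            # how the final (non-streaming) response is parsed.
            yield b"\n"
            await asyncio.wait({task}, timeout=HEARTBEAT_INTERVAL)
        # Inference finished; emit the actual JSON result as the last chunk.
        yield json.dumps(task.result()).encode()

    return StreamingResponse(body(), media_type="application/json")
```

Alternatives would be enabling TCP keep-alive on the server socket or raising the proxy's idle timeout, but an application-level heartbeat like the above tends to survive intermediate proxies more reliably.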
On a separate note: I am a developer from China, and I am here purely for fair and respectful technical collaboration.
While some business users from China may focus only on profit, many of us are genuine engineers who value open-source contribution and technical exchange. Please do not let the actions of a few create bias against Chinese developers.