Performance regression between 0.7.4 and 0.9.0 #1479

@jeffmaury

Issue Description

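While working on a Podman AI Lab workflow that automates updating the ramalama version used by Podman AI Lab, I noticed a large performance drop in my validation tests when switching from 0.7.4 to 0.9.0. Prompt evaluation fell from 20.93 to 0.53 tokens per second (roughly 40x slower), while token generation speed stayed about the same.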

Here are the server logs for the two configurations:

0.7.4:

prompt eval time = 3058.24 ms / 64 tokens ( 47.78 ms per token, 20.93 tokens per second)
eval time = 83033.60 ms / 455 tokens ( 182.49 ms per token, 5.48 tokens per second)
total time = 86091.84 ms / 519 tokens

0.9.0:

prompt eval time = 121635.23 ms / 64 tokens ( 1900.55 ms per token, 0.53 tokens per second)
eval time = 15152.22 ms / 93 tokens ( 162.93 ms per token, 6.14 tokens per second)
total time = 136787.44 ms / 157 tokens
srv update_slots: all slots are idle
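
To make the comparison explicit, here is a small Python sketch that parses the timing lines from both logs and computes a per-phase throughput ratio. It is not part of ramalama or Podman AI Lab. The helper names `parse_timings`, `tokens_per_second` and `compare` are hypothetical, and the sketch assumes the server keeps printing lines in exactly the `<phase> time = <ms> ms / <n> tokens` shape shown above.

```python
import re

# Assumed log shape (copied from the logs above), e.g.
# "prompt eval time = 3058.24 ms / 64 tokens ( 47.78 ms per token, 20.93 tokens per second)"
TIMING_RE = re.compile(
    r"^\s*(?P<phase>prompt eval|eval|total) time\s*=\s*"
    r"(?P<ms>[\d.]+)\s*ms\s*/\s*(?P<tokens>\d+)\s*tokens"
)

OLD_LOG = """prompt eval time = 3058.24 ms / 64 tokens ( 47.78 ms per token, 20.93 tokens per second)
eval time = 83033.60 ms / 455 tokens ( 182.49 ms per token, 5.48 tokens per second)
total time = 86091.84 ms / 519 tokens"""

NEW_LOG = """prompt eval time = 121635.23 ms / 64 tokens ( 1900.55 ms per token, 0.53 tokens per second)
eval time = 15152.22 ms / 93 tokens ( 162.93 ms per token, 6.14 tokens per second)
total time = 136787.44 ms / 157 tokens
srv update_slots: all slots are idle"""


def parse_timings(log: str) -> dict[str, tuple[float, int]]:
    """Return {phase: (elapsed_ms, token_count)}; non-timing lines are ignored."""
    timings: dict[str, tuple[float, int]] = {}
    for line in log.splitlines():
        m = TIMING_RE.match(line)
        if m:
            timings[m.group("phase")] = (float(m.group("ms")), int(m.group("tokens")))
    return timings


def tokens_per_second(elapsed_ms: float, tokens: int) -> float:
    """Throughput in tokens per second from elapsed milliseconds."""
    return tokens / (elapsed_ms / 1000.0)


def compare(old_log: str, new_log: str) -> dict[str, float]:
    """Old/new throughput ratio per phase: a value > 1 means the new version is slower."""
    old, new = parse_timings(old_log), parse_timings(new_log)
    return {
        phase: tokens_per_second(*old[phase]) / tokens_per_second(*new[phase])
        for phase in ("prompt eval", "eval")
    }


if __name__ == "__main__":
    for phase, ratio in compare(OLD_LOG, NEW_LOG).items():
        print(f"{phase}: {ratio:.2f}x")
```

With the numbers above this yields a prompt-eval ratio of about 39.77 (0.9.0 is much slower) and an eval ratio of about 0.89 (generation is slightly faster in 0.9.0). The total times are not directly comparable, because the two runs generated different numbers of tokens (455 vs 93).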

Steps to reproduce the issue

Run a development build of Podman AI Lab and change the image reference in the code from 0.7.4 to 0.9.0.

Describe the results you received

See the logs above.

Describe the results you expected

Performance similar to 0.7.4.

ramalama info output

N/A

Upstream Latest Release

Yes

Additional environment details

No response

Additional information

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
