Issue Description
While working on a Podman AI Lab workflow that automates updating the ramalama version used in Podman AI Lab, I noticed a big performance drop in my validation tests when switching from 0.7.4 to 0.9.0.
Here are the server logs from the two configurations:
0.7.4:
prompt eval time = 3058.24 ms / 64 tokens ( 47.78 ms per token, 20.93 tokens per second)
eval time = 83033.60 ms / 455 tokens ( 182.49 ms per token, 5.48 tokens per second)
total time = 86091.84 ms / 519 tokens
0.9.0:
prompt eval time = 121635.23 ms / 64 tokens ( 1900.55 ms per token, 0.53 tokens per second)
eval time = 15152.22 ms / 93 tokens ( 162.93 ms per token, 6.14 tokens per second)
total time = 136787.44 ms / 157 tokens
srv update_slots: all slots are idle
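To put a number on the drop, here is a minimal Python sketch that recomputes the per-token cost from the timing lines above and compares the two versions. The regular expression and the `ms_per_token` helper are my own illustration, not part of ramalama or llama.cpp, and it assumes the log lines keep the format shown here.

```python
import re

# Timing lines copied verbatim from the two server logs above.
LOGS = {
    "0.7.4": (
        "prompt eval time = 3058.24 ms / 64 tokens ( 47.78 ms per token, 20.93 tokens per second)\n"
        "eval time = 83033.60 ms / 455 tokens ( 182.49 ms per token, 5.48 tokens per second)\n"
    ),
    "0.9.0": (
        "prompt eval time = 121635.23 ms / 64 tokens ( 1900.55 ms per token, 0.53 tokens per second)\n"
        "eval time = 15152.22 ms / 93 tokens ( 162.93 ms per token, 6.14 tokens per second)\n"
    ),
}

# Hypothetical pattern: captures the phase name, elapsed ms and token count.
LINE_RE = re.compile(
    r"^(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+) tokens",
    re.MULTILINE,
)


def ms_per_token(log: str) -> dict[str, float]:
    """Return {phase: milliseconds per token} for one server log."""
    return {
        phase: float(ms) / int(tokens)
        for phase, ms, tokens in LINE_RE.findall(log)
    }


old = ms_per_token(LOGS["0.7.4"])
new = ms_per_token(LOGS["0.9.0"])

# Ratio > 1 means 0.9.0 spends more time per token than 0.7.4.
slowdown = {phase: new[phase] / old[phase] for phase in old}

for phase, ratio in slowdown.items():
    print(f"{phase}: {old[phase]:.2f} -> {new[phase]:.2f} ms/token ({ratio:.2f}x)")
```

With these numbers, prompt evaluation is roughly 40x slower per token in 0.9.0, while generation (eval) is about the same or slightly faster, so the regression appears to be confined to prompt processing.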
Steps to reproduce the issue
This requires a Podman AI Lab development setup: modify the ramalama image reference in the code to switch from 0.7.4 to 0.9.0.
Describe the results you received
See above
Describe the results you expected
Performance numbers similar to those of 0.7.4
ramalama info output
N/A
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
No response