Streaming returns the full text in the last chunk. #618

@HendrikStrobelt

In the code below, I expect the output to consist only of the successive substrings of the full answer. Instead, I get all the substrings and, in addition, the full generated text as the last chunk. This looks like a bug.

Code:

import asyncio
import sys

from mellea.backends import ModelOption
from mellea.backends.ollama import OllamaModelBackend
from mellea.core import CBlock
from mellea.stdlib.context import SimpleContext


async def main():
    prompt = " ".join(sys.argv[1:]) or "Explain what a neural network is in 13 sentences."

    backend = OllamaModelBackend(model_options={ModelOption.STREAM: True})
    instruction = CBlock(prompt)
    ctx = SimpleContext()

    thunk, ctx = await backend.generate_from_context(instruction, ctx)

    while not thunk.is_computed():
        text = await thunk.astream()
        print(f"\r{text}\n=========\n", end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
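
To make the expected and observed behavior concrete, here is a small sketch that does not use mellea at all. `FakeThunk` is a hypothetical stand-in that only mimics the `is_computed()` / `astream()` loop from the script above, and the chunk values are made up. It assumes that each `astream()` call is meant to return only newly generated text, which is how I read the intended behavior.

```python
import asyncio


class FakeThunk:
    """Hypothetical stand-in for a streaming result; NOT the mellea API."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._pos = 0

    def is_computed(self):
        # Done once every prepared chunk has been handed out.
        return self._pos >= len(self._chunks)

    async def astream(self):
        # Return the next prepared chunk.
        chunk = self._chunks[self._pos]
        self._pos += 1
        return chunk


async def collect(thunk):
    """Drain the thunk with the same loop shape as the reproduction script."""
    received = []
    while not thunk.is_computed():
        received.append(await thunk.astream())
    return received


# Made-up chunks for illustration only.
deltas = ["A neural ", "network is ", "a model."]
full_text = "".join(deltas)

# Expected: only the incremental pieces.
expected = asyncio.run(collect(FakeThunk(deltas)))
# Observed (as reported): the same pieces, then the full text once more.
observed = asyncio.run(collect(FakeThunk(deltas + [full_text])))

print("".join(expected) == full_text)  # True
print("".join(observed) == full_text)  # False: the full text appears twice
```

With the extra final chunk, any consumer that concatenates or prints the chunks as they arrive ends up with the answer duplicated.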
