Streaming returns the full text in the last chunk.

In the code below, I expect the output to consist only of successive substrings (chunks) of the full answer. Instead, I get all of the substrings and, in addition, the full generated text as the last chunk. That looks like a bug.

Code:
```
import asyncio
import sys

from mellea.backends import ModelOption
from mellea.backends.ollama import OllamaModelBackend
from mellea.core import CBlock
from mellea.stdlib.context import SimpleContext


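async def main():
    # Use the command-line arguments as the prompt, or fall back to a default.
    prompt = " ".join(sys.argv[1:]) or "Explain what a neural network is in 13 sentences."

    # Enable streaming on the Ollama backend.
    backend = OllamaModelBackend(model_options={ModelOption.STREAM: True})
    instruction = CBlock(prompt)
    ctx = SimpleContext()

    thunk, ctx = await backend.generate_from_context(instruction, ctx)

    # Print every value returned by astream(), followed by a separator,
    # until the thunk reports that generation has finished.
    while not thunk.is_computed():
        text = await thunk.astream()
        print(f"\r{text}\n=========\n", end="", flush=True)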


if __name__ == "__main__":
    asyncio.run(main())
```
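To make the expected vs. observed behaviour concrete, here is a minimal sketch that does not touch mellea at all. The chunk list is made-up sample data, and `drop_trailing_full_text` is a hypothetical helper of my own, not part of any library. It only illustrates the pattern I'm seeing (N chunks followed by one extra chunk equal to their concatenation) and a possible caller-side workaround until the behaviour is fixed.

```python
def drop_trailing_full_text(chunks):
    """Remove a final chunk that merely repeats all earlier chunks joined together.

    Hypothetical helper for illustration only; not a mellea API.
    """
    chunks = list(chunks)
    if len(chunks) >= 2 and chunks[-1] == "".join(chunks[:-1]):
        return chunks[:-1]
    return chunks


# Made-up sample mirroring the observed output: three real chunks,
# then the full text repeated as a fourth "chunk".
observed = ["A neural ", "network is ", "a model.", "A neural network is a model."]

# What I would expect the stream to contain instead.
expected = drop_trailing_full_text(observed)
print(expected)
```

Note that this check is only a heuristic: it would also drop a legitimate final chunk that happens to equal the concatenation of everything before it, so it is a stopgap and not a fix.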


Streaming returns the full text in the last chunk. #618
