In the code below, I expect the output to consist only of the successive chunks (substrings) of the full answer. Instead, I get all of the chunks and then, as an additional last chunk, the full generated text. This looks like a bug.
Code:
import asyncio
import sys

from mellea.backends import ModelOption
from mellea.backends.ollama import OllamaModelBackend
from mellea.core import CBlock
from mellea.stdlib.context import SimpleContext


async def main():
    prompt = " ".join(sys.argv[1:]) or "Explain what a neural network is in 13 sentences."
    backend = OllamaModelBackend(model_options={ModelOption.STREAM: True})
    instruction = CBlock(prompt)
    ctx = SimpleContext()
    thunk, ctx = await backend.generate_from_context(instruction, ctx)
    while not thunk.is_computed():
        text = await thunk.astream()
        print(f"\r{text}\n=========\n", end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
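
To make the difference between the expected and the observed output concrete, here is a minimal, self-contained sketch that does not use mellea at all. `FakeThunk`, `collect`, `drop_trailing_full_text`, and `run_demo` are hypothetical names made up for illustration. The stand-in only imitates the behavior described in this report; it says nothing about how the real thunk is implemented.

```python
import asyncio


class FakeThunk:
    """Hypothetical stand-in for a streaming result. This is NOT the mellea API."""

    def __init__(self, chunks, repeat_full_text):
        self._pending = list(chunks)
        self._full = "".join(chunks)
        self._repeat_full_text = repeat_full_text
        self._done = False

    def is_computed(self):
        return self._done

    async def astream(self):
        await asyncio.sleep(0)  # yield to the event loop, like a real stream would
        if self._pending:
            chunk = self._pending.pop(0)
            # Expected behavior: the stream is finished right after the last chunk.
            if not self._pending and not self._repeat_full_text:
                self._done = True
            return chunk
        # Observed behavior: one extra call that returns the whole text again.
        self._done = True
        return self._full


async def collect(thunk):
    """Same loop shape as in the report, but collecting instead of printing."""
    received = []
    while not thunk.is_computed():
        received.append(await thunk.astream())
    return received


def drop_trailing_full_text(received):
    """Drop the last item if it only repeats the concatenation of the earlier ones."""
    if len(received) > 1 and received[-1] == "".join(received[:-1]):
        return received[:-1]
    return received


def run_demo():
    chunks = ["A neural ", "network is ", "a model."]
    expected = asyncio.run(collect(FakeThunk(chunks, repeat_full_text=False)))
    observed = asyncio.run(collect(FakeThunk(chunks, repeat_full_text=True)))
    return expected, observed, drop_trailing_full_text(observed)


if __name__ == "__main__":
    for label, items in zip(("expected", "observed", "cleaned"), run_demo()):
        print(label, items)
```

The `drop_trailing_full_text` helper is only a possible client-side workaround under the assumption that the extra item is exactly the concatenation of the earlier chunks. It would not help if the earlier items were cumulative rather than incremental.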