Open
Description
Requests to api/v2/generate (websocket_api.py) can fail to detect the stop sequence in the generated response and keep generating well past it. The problem is more apparent when max_new_tokens is greater than 1.
I have a change I can commit that works around this issue: it appends the latest delta to the tail of the previous deltas and scans the result for the stop sequence, then stops and returns a truncated delta if the stop sequence is found.
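To make the idea concrete, here is a minimal sketch of that kind of check. It is not the actual change and does not use anything from websocket_api.py; the names `truncate_at_stop` and `stream_until_stop` are hypothetical, and it assumes deltas arrive as plain strings rather than token IDs.

```python
from typing import Iterable, Iterator, List, Tuple


def truncate_at_stop(previous_text: str, delta: str,
                     stop_sequences: List[str]) -> Tuple[str, bool]:
    """Return (delta_to_emit, stopped). Hypothetical helper, not project code."""
    longest = max((len(s) for s in stop_sequences if s), default=0)
    if longest == 0:
        return delta, False
    # A stop sequence that ends inside the new delta can begin at most
    # longest - 1 characters back in the text that was already emitted.
    tail = previous_text[-(longest - 1):] if longest > 1 else ""
    window = tail + delta
    cut = None
    for stop in stop_sequences:
        if not stop:
            continue
        idx = window.find(stop)
        if idx != -1 and (cut is None or idx < cut):
            cut = idx
    if cut is None:
        return delta, False
    # Map the match position back into the delta. If the match began in the
    # tail, that part was already sent, so nothing from this delta is kept.
    keep = max(cut - len(tail), 0)
    return delta[:keep], True


def stream_until_stop(deltas: Iterable[str],
                      stop_sequences: List[str]) -> Iterator[str]:
    """Yield deltas until a stop sequence appears, even across delta boundaries."""
    emitted = ""
    for delta in deltas:
        piece, stopped = truncate_at_stop(emitted, delta, stop_sequences)
        if piece:
            yield piece
        emitted += piece
        if stopped:
            break
```

One limitation of this sketch: when the stop sequence starts in an earlier delta, the characters already sent to the client cannot be retracted, so only the new delta is truncated. Holding back the last few characters of each delta would avoid that at the cost of a little latency.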
If you'd like a PR for this, it will also require merging #31.