Merged
@@ -152,14 +152,14 @@
# We need a dict where if we need to override we can
# NOTE: These are in *descending* order of priority. e.g. if you see 'mammoth-coder'
# you'll use that override and not listen to the 'llama-2' override
-_VLLM_MODEL_LENGTH_OVERRIDES: Dict[str, Dict[str, int]] = {
+_VLLM_MODEL_LENGTH_OVERRIDES: Dict[str, Dict[str, Optional[int]]] = {
"mammoth-coder": {"max_model_len": 16384, "max_num_batched_tokens": 16384},
# Based on config here: https://huggingface.co/TIGER-Lab/MAmmoTH-Coder-7B/blob/main/config.json#L12
# Can also see 13B, 34B there too
"code-llama": {"max_model_len": 16384, "max_num_batched_tokens": 16384},
# Based on config here: https://huggingface.co/codellama/CodeLlama-7b-hf/blob/main/config.json#L12
# Can also see 13B, 34B there too
-"llama-2": {"max_model_len": 4096, "max_num_batched_tokens": 4096},
+"llama-2": {"max_model_len": None, "max_num_batched_tokens": 4096},
Contributor comment: With vllm-project/vllm#1198, I think we can remove setting max_num_batched_tokens; that change is merged in 0.2.0.
"mistral": {"max_model_len": 8000, "max_num_batched_tokens": 8000},
}

@@ -534,7 +534,7 @@ async def create_vllm_bundle(
):
command = []

-    max_num_batched_tokens: int = 2560  # vLLM's default
+    max_num_batched_tokens: Optional[int] = 2560  # vLLM's default
max_model_len: Optional[int] = None

for key, value in _VLLM_MODEL_LENGTH_OVERRIDES.items():
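The "descending order of priority" note and the loop over `_VLLM_MODEL_LENGTH_OVERRIDES.items()` can be sketched as below. This is a minimal illustration, not the PR's actual code: `resolve_overrides` is a hypothetical helper, and the loop body assumes a substring match that stops at the first hit, relying on Python dicts preserving insertion order.

```python
from typing import Dict, Optional

# Descending priority: the first substring match wins, so a "mammoth-coder"
# model name uses that entry and never falls through to "llama-2".
# Values mirror the post-merge diff (llama-2's max_model_len is now None).
_VLLM_MODEL_LENGTH_OVERRIDES: Dict[str, Dict[str, Optional[int]]] = {
    "mammoth-coder": {"max_model_len": 16384, "max_num_batched_tokens": 16384},
    "code-llama": {"max_model_len": 16384, "max_num_batched_tokens": 16384},
    "llama-2": {"max_model_len": None, "max_num_batched_tokens": 4096},
    "mistral": {"max_model_len": 8000, "max_num_batched_tokens": 8000},
}


def resolve_overrides(model_name: str) -> Dict[str, Optional[int]]:
    """Hypothetical helper: start from the defaults shown in the diff,
    then apply the first (highest-priority) matching override."""
    max_num_batched_tokens: Optional[int] = 2560  # vLLM's default
    max_model_len: Optional[int] = None

    for key, overrides in _VLLM_MODEL_LENGTH_OVERRIDES.items():
        if key in model_name:
            max_model_len = overrides["max_model_len"]
            max_num_batched_tokens = overrides["max_num_batched_tokens"]
            break  # first match wins; later (lower-priority) keys are ignored

    return {
        "max_model_len": max_model_len,
        "max_num_batched_tokens": max_num_batched_tokens,
    }
```

Under this sketch, a name containing "mammoth-coder" resolves to 16384/16384 even though it also contains "llama"-like substrings, a plain "llama-2" model keeps `max_model_len=None` (deferring to the model's own config), and an unmatched name keeps the 2560 default.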