Given you have a V100 GPU at your disposal, I'm just curious what different folks here would use to run inference on Llama-based 7B and 13B models. Also, would you use FastChat along with vLLM for the conversation template?
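For context on what the conversation-template part of the question means: FastChat's templates just turn a chat history into the flat prompt string the base model expects, which you then hand to vLLM's `generate()`. Here's a minimal sketch of that formatting done by hand for the Llama-2 chat format; the function name and the example messages are my own, and this is an illustration, not a replacement for FastChat's tested templates.

```python
# Sketch: hand-rolling the Llama-2-chat conversation template, i.e. the job
# FastChat's conversation templates do before the prompt reaches vLLM.
# The [INST] / <<SYS>> markers follow the published Llama-2 chat format.

def build_llama2_chat_prompt(system, turns, user_msg):
    """Format a conversation into a single Llama-2-chat prompt string.

    system:   system instruction placed in the <<SYS>> block.
    turns:    list of (user, assistant) pairs from earlier in the chat.
    user_msg: the new user message awaiting a reply.
    """
    prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    for user, assistant in turns:
        prompt += f"{user} [/INST] {assistant} </s><s>[INST] "
    prompt += f"{user_msg} [/INST]"
    return prompt

prompt = build_llama2_chat_prompt(
    "You are a helpful assistant.",
    [("Hi", "Hello! How can I help?")],
    "What fits on a single V100: a 7B or a 13B model?",
)
print(prompt)
# With vLLM you would then do roughly (not run here, needs a GPU):
#   from vllm import LLM, SamplingParams
#   llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
#   outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
```

Whether you use FastChat or format by hand, the key thing is that the template matches what the model was fine-tuned on, otherwise chat quality degrades noticeably.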