Inference Speed When Running Local Models

Frequent-Let231@alien.top · 1 year ago

Inference Speed When Running Local Models

ttkciar@alien.top · 1 year ago

orca-mini-3b is good at fast summarization, but it hallucinates a lot, so YMMV.

a_beautiful_rhind@alien.top · 1 year ago

You can try going down to a 7B model; it will be slightly faster.

FlishFlashman@alien.top · 1 year ago

Please get specific. What’s “quite slow”? What’s “extremely quickly”? Give numbers, with units that include a unit of time (tokens per second, for example).
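If you don't have a number handy, a rough way to get one is to time a few generations and divide tokens produced by seconds elapsed. This is only a sketch: `tokens_per_second` and `fake_generate` are made-up names for illustration, and `fake_generate` is a stand-in you would replace with a call into whatever runtime you actually use.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Average generation speed over several runs.

    `generate` is a hypothetical callable: it should run one completion
    and return the number of tokens it produced.
    """
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds


def fake_generate() -> int:
    """Placeholder for a real model call: pretends to emit 10 tokens."""
    time.sleep(0.05)
    return 10


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_generate):.1f} tokens/s")
```

Reporting the result alongside the model name, quantization, and prompt length makes the number much easier for others to compare.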

What hardware are you running on? Without changing hardware, your best bet is a smaller model (fewer parameters), a more aggressive (lower-bit) quantization of a 13B model, or both.
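To see why both options help, here is a back-of-the-envelope estimate of how much memory the weights alone take: parameters times bits per weight, divided by 8 to get bytes. It is a sketch under simplifying assumptions: it ignores the KV cache and runtime overhead, and real quantization formats usually store a bit more than their nominal bits per weight. `weight_size_gb` is a made-up helper name.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Weights only: excludes KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for params in (13, 7, 3):
        for bits in (16, 8, 4):
            size = weight_size_gb(params, bits)
            print(f"{params}B at {bits}-bit: about {size:.1f} GB")
```

By this estimate a 13B model drops from about 26 GB at 16-bit to about 6.5 GB at 4-bit, and a 7B model at 4-bit is about 3.5 GB. Less data to read per token generally means faster generation on the same hardware, usually at some cost in output quality.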