Self-hosting LLMs

GreenSofaBed@lemmy.zip · 6 months ago

The Hobbyist@lemmy.zip · 6 months ago

I didn’t say it can’t, but I’m not sure how well it is optimized for it. In my initial testing it queued queries and submitted them to the model one after another. I have not seen it batch-compute the queries, though that may be a setup issue on my side. vLLM, on the other hand, is designed specifically for the concurrent multi-user use case and has multiple optimizations for it.
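
For anyone who wants to check this on their own setup, here is a rough sketch of the kind of test I mean: fire several requests at once and compare the total wall-clock time against a single request. The two backends below are hypothetical stand-ins (plain `time.sleep` calls, not real server calls), and the "batched" one is an idealized model where concurrent requests overlap perfectly, which real batching will not quite achieve. Swap in your own function that sends a prompt to your server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def wall_time(send, n_requests):
    """Fire n_requests concurrently and return total wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_requests) as pool:
        # list() forces all calls to finish before we stop the clock.
        list(pool.map(send, range(n_requests)))
    return time.perf_counter() - start


# Hypothetical stand-in: a server that handles one query at a time.
_one_at_a_time = threading.Lock()


def queued_backend(_request_id):
    with _one_at_a_time:
        time.sleep(0.05)  # pretend each query takes 50 ms


# Hypothetical stand-in: a server that processes concurrent queries together.
def batched_backend(_request_id):
    time.sleep(0.05)  # idealized: all requests overlap fully


if __name__ == "__main__":
    n = 8
    print(f"queued:  {wall_time(queued_backend, n):.2f}s for {n} requests")
    print(f"batched: {wall_time(batched_backend, n):.2f}s for {n} requests")
```

If the total time grows roughly linearly with the number of concurrent requests, the server is most likely working through them one after another. If it stays close to the single-request time, something is being computed in parallel.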

Avid Amoeba · 6 months ago

I see. Makes sense.