alessandro to PC Gaming · 10 hours ago
Intel CEO sees 'less need for discrete graphics' and now we're really worried about its upcoming Battlemage gaming GPU and the rest of Intel's graphics roadmap
www.pcgamer.com · 14 comments
Nomecks · 2 hours ago
People running LLMs aren’t the target. The target is people who use things like ChatGPT and Copilot on low-power PCs and who may benefit from edge inference acceleration. Every major LLM provider dreams of offloading compute onto end users. It saves them tons of money.
brucethemoose@lemmy.world · 2 hours ago
One can’t offload “usable” LLMs without tons of memory bandwidth and plenty of RAM. It’s just not physically possible.
You can run small models like Phi pretty quickly, but I don’t think people will be satisfied with that for Copilot, even as basic autocomplete.
About 2x faster than Intel’s current IGPs is the threshold where the offloading can happen, IMO. And that’s exactly what AMD/Apple are producing.
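For anyone who wants to put rough numbers on the bandwidth point, here is a back-of-envelope sketch. It assumes token generation is memory-bound and that every weight is read once per generated token, so the speed ceiling is roughly memory bandwidth divided by model size. The example figures (dual-channel DDR5-5600, a 7B-parameter model at 4 bits per weight) and the function names are illustrative assumptions, not numbers from this thread, and real throughput will land below the ceiling.

```python
def peak_bandwidth_gbs(transfers_mt_s: float, bus_bytes: int, channels: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (MT/s * bytes per transfer * channels)."""
    return transfers_mt_s * bus_bytes * channels / 1000


def decode_tokens_per_s_ceiling(bandwidth_gbs: float,
                                params_billion: float,
                                bits_per_weight: float) -> float:
    """Upper bound on tokens/s, assuming all weights are streamed once per token."""
    weights_gb = params_billion * bits_per_weight / 8  # model size in GB
    return bandwidth_gbs / weights_gb


if __name__ == "__main__":
    # Assumed example: dual-channel DDR5-5600 with 64-bit (8-byte) channels.
    bw = peak_bandwidth_gbs(5600, 8, 2)
    # Assumed example: 7B parameters quantized to 4 bits per weight.
    ceiling = decode_tokens_per_s_ceiling(bw, 7, 4)
    print(f"bandwidth: {bw:.1f} GB/s, ceiling: {ceiling:.1f} tokens/s")
    # Under this model, doubling the bandwidth doubles the ceiling.
    print(f"at 2x bandwidth: {decode_tokens_per_s_ceiling(2 * bw, 7, 4):.1f} tokens/s")
```

Under these assumptions the ceiling scales linearly with bandwidth and inversely with model size, which is why small models run quickly on today's integrated graphics while larger ones need much wider memory systems.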