It all depends on the size of the model you are running: if it cannot fit in GPU memory, it has to be shuttled back and forth between the host (CPU memory, or even disk) and the GPU, which is extremely slow. This is why some people run LLMs on Macs, as they can have a large amount of memory shared between the GPU and CPU, making it viable to fit some larger models entirely in memory.
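As a rough back-of-the-envelope sketch (all figures here are illustrative assumptions, and this ignores the KV cache and runtime overhead), weight memory is roughly parameters × bytes per parameter:

```python
def model_size_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB: 1B params at 1 byte/param ~ 1 GB."""
    return params_billions * bytes_per_param

vram_gb = 24  # hypothetical consumer GPU (assumption)

for params, precision, bpp in [(7, "fp16", 2), (13, "fp16", 2),
                               (70, "fp16", 2), (70, "4-bit", 0.5)]:
    need = model_size_gb(params, bpp)
    verdict = "fits in VRAM" if need <= vram_gb else "spills to CPU/disk (slow)"
    print(f"{params}B @ {precision}: ~{need:.0f} GB -> {verdict}")
```

So on a 24 GB card, a 7B model at fp16 fits comfortably, while a 70B model spills even at 4-bit quantization.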
Thanks for sharing, I find it hard to discover new “Lemmy spaces” on here.
I thought nftables was replacing iptables?
I tried it once with my buddy and it seemed to work fine on the Element client. Not sure if this was placebo, but it felt like the unencrypted video call had better quality; the difference wasn’t very noticeable, though. It might be a bigger problem with many people in the call, but I haven’t tested that.
Matrix?
Matrix? I think you can set up text channels and also do voice/video/screen sharing in the channels if you’re using Element. I haven’t been able to convince my friends to jump ship yet, though, so I don’t know how it compares to Discord.
Oh, I thought you could get 128 GB of RAM or more, but I can see it doesn’t make sense with the <24 GB… sorry for spreading misinformation, I guess. In that case, a GPU with the same amount of RAM would probably be better.
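For a rough sense of why the GPU wins at equal capacity: token generation is mostly memory-bandwidth bound, since each generated token has to stream all the weights once. A hedged sketch, where the bandwidth figures and the 35 GB weight size are assumptions for illustration, not measured values:

```python
# Rule of thumb: decode speed is capped at bandwidth / weight size,
# because every generated token reads the full set of weights once.

def tokens_per_sec(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed if each token streams all weights once."""
    return bandwidth_gb_s / weights_gb

weights_gb = 35.0  # e.g. a 70B model at 4-bit quantization (assumption)

for name, bw in [("discrete GPU, ~1000 GB/s", 1000),
                 ("Mac unified memory, ~400 GB/s", 400),
                 ("desktop DDR5, ~60 GB/s", 60)]:
    print(f"{name}: ~{tokens_per_sec(weights_gb, bw):.0f} tokens/s")
```

So even when the model fits in ordinary CPU RAM, the much lower bandwidth is what makes it crawl compared to VRAM or unified memory.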