Selfhosted

How to use GPUs over multiple computers for local AI? (lemmy.dbzer0.com)

submitted 4 days ago by marauding_gibberish142@lemmy.dbzer0.com to c/selfhosted@lemmy.world


The problem is simple: consumer motherboards don't have that many PCIe slots, and consumer CPUs don't have enough lanes to run 3+ GPUs at full PCIe gen 3 or gen 4 speeds.

My idea was to buy 3-4 computers for cheap, slot a GPU into each of them, and use them in tandem. I imagine this will require some sort of agent running on each node, with the nodes connected through a 10GbE network. I can get a 10GbE network running for this project.

Does Ollama or any other local AI project support this? Getting a server motherboard with CPU is going to get expensive very quickly, but this would be a great alternative.

Thanks


[–] Xanza@lemm.ee 9 points 3 days ago (1 children)

> consumer motherboards don’t have that many PCIe slots

The number of PCIe slots isn't the most limiting factor when it comes to consumer motherboards. It's the number of PCIe lanes that your CPU provides and that the motherboard actually routes to its slots.

It's difficult to find non-server hardware that can do something like this, because you need a significant number of PCIe lanes to run several GPUs at full speed on top of everything else the CPU has to connect. Using an M.2 SSD? Even more difficult, since it takes lanes of its own.
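To put rough numbers on the lane problem, here is a minimal sketch of the arithmetic. The figures in it are assumptions for illustration: x16 per GPU for "full speed", x4 per M.2 NVMe drive, and a hypothetical consumer CPU with 24 lanes. Check your own CPU and board specs for the real values.

```python
# Rough PCIe lane budget. All lane counts here are illustrative assumptions,
# not specs for any particular CPU or motherboard.

def lanes_required(num_gpus: int, lanes_per_gpu: int = 16,
                   nvme_drives: int = 0, lanes_per_nvme: int = 4) -> int:
    """Total lanes needed to run every device at its full link width."""
    return num_gpus * lanes_per_gpu + nvme_drives * lanes_per_nvme


def fits(cpu_lanes: int, num_gpus: int, nvme_drives: int = 0) -> bool:
    """True if the CPU's lane count covers all devices at full width."""
    return lanes_required(num_gpus, nvme_drives=nvme_drives) <= cpu_lanes


if __name__ == "__main__":
    ASSUMED_CONSUMER_CPU_LANES = 24  # hypothetical figure, check your CPU
    for gpus in (1, 2, 3, 4):
        need = lanes_required(gpus, nvme_drives=1)
        ok = fits(ASSUMED_CONSUMER_CPU_LANES, gpus, nvme_drives=1)
        print(f"{gpus} GPU(s) + 1 NVMe: need {need} lanes, fits: {ok}")
```

Under those assumptions, three GPUs plus one NVMe drive want 52 lanes, more than double what the hypothetical 24-lane CPU offers, which is why boards drop extra slots down to x8 or x4.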

Your one-GPU-per-machine idea is a decent approach. Using a Kubernetes cluster with device plugins is likely the best way to accomplish what you want here. It would involve setting up your cluster and installing the drivers for your GPU on each node, which then exposes the device to the system and, through the device plugin, to the cluster. Then, when you create your Ollama container, request the GPU in the pod spec so the container runtime's prestart hook exposes it to the container.
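As a rough sketch of what "exposing the GPU to the container" could look like, the snippet below builds a pod manifest that requests one GPU. It assumes NVIDIA cards with the NVIDIA device plugin installed, which is what makes the `nvidia.com/gpu` resource name available; the thread doesn't say which GPUs are involved. The node name `gpu-node-1` and the helper `ollama_gpu_pod` are hypothetical.

```python
import json

# Sketch of a pod manifest that asks the scheduler for one GPU.
# Assumes NVIDIA GPUs with the NVIDIA device plugin running on each node.


def ollama_gpu_pod(node_name: str, gpus: int = 1) -> dict:
    """Build a pod manifest pinned to one node, requesting `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"ollama-{node_name}"},
        "spec": {
            "nodeName": node_name,  # pin the pod to a specific GPU machine
            "containers": [
                {
                    "name": "ollama",
                    "image": "ollama/ollama",
                    "ports": [{"containerPort": 11434}],  # Ollama's default port
                    # GPUs are requested as an extended resource under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this output can be piped to
    # `kubectl apply -f -`.
    print(json.dumps(ollama_gpu_pod("gpu-node-1"), indent=2))
```

Note that this gives you one Ollama instance per GPU node. The manifest by itself does nothing to split a single model across machines.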

The issue with doing this is that 10GbE is very slow compared to your GPU's PCIe link: about 1.25 GB/s, versus roughly 16 GB/s for a PCIe 3.0 x16 slot. You're networking all these GPUs to do some cool stuff, but then you're severely bottlenecking yourself with your network. All in all, it's not a very good plan.
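For a sense of scale, here is a back-of-the-envelope comparison of raw link bandwidth. It uses the published per-lane transfer rates for PCIe 3.0 and 4.0 with 128b/130b encoding and ignores all other protocol overhead on both sides, so treat the results as upper bounds rather than measured throughput.

```python
# Raw link bandwidth: 10GbE vs. a PCIe x16 slot (theoretical upper bounds).

PCIE_GT_PER_S = {3: 8.0, 4: 16.0}  # per-lane transfer rate by generation


def ethernet_gb_per_s(gigabits: float) -> float:
    """Convert an Ethernet line rate in Gbit/s to GB/s."""
    return gigabits / 8


def pcie_gb_per_s(gen: int, lanes: int = 16) -> float:
    """Usable GB/s for a PCIe 3.0/4.0 link after 128b/130b encoding."""
    return PCIE_GT_PER_S[gen] * (128 / 130) / 8 * lanes


if __name__ == "__main__":
    net = ethernet_gb_per_s(10)
    print(f"10GbE:        {net:.2f} GB/s")
    for gen in (3, 4):
        bus = pcie_gb_per_s(gen)
        print(f"PCIe {gen}.0 x16: {bus:.2f} GB/s ({bus / net:.1f}x faster)")
```

By this estimate a PCIe 3.0 x16 link is roughly 12.6 times faster than 10GbE, and PCIe 4.0 x16 roughly 25 times faster.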

[–] marauding_gibberish142@lemmy.dbzer0.com 2 points 3 days ago

I agree with your assessment. I was indeed going to run k8s, just hadn't figured out what you told me. Thanks for that.

And yes, I realised that 10GbE is just not enough for this stuff. But another commenter told me to look for used Threadripper and EPYC boards (which are extremely expensive for me), which gave me the idea to look for older Intel CPU + motherboard combos. Maybe I'll have some luck there. I was going to use Talos in a VM with all the GPUs passed through to it.