Has anyone here built a Beowulf Cluster?

plenipotentprotogod@lemmy.world · 10 months ago

Has anyone here built a Beowulf Cluster?

Kangie@lemmy.srcfiles.zip · 10 months ago

Yes. I’m actually doing so right now at work, and run multiple Beowulf clusters for a research institution. You don’t need or want this.

In a real cluster you would use software like Slurm or PBS to submit jobs to the cluster. The scheduler then runs them on your compute nodes as resources become available, which keeps utilisation high.
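To give a rough idea of what "submitting a job" looks like with Slurm, here is a minimal Python sketch that only builds the text of a batch script. The helper name `build_batch_script`, the job name, the resource values and the `./my_model input.dat` command are all made up for illustration; the `#SBATCH` directives themselves (`--job-name`, `--ntasks`, `--time`, `--mem`) are standard Slurm options.

```python
def build_batch_script(job_name, command, n_tasks=1, time_limit="01:00:00", mem="4G"):
    """Return the text of a Slurm batch script. Nothing is submitted here."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",   # name shown in the queue
        f"#SBATCH --ntasks={n_tasks}",      # number of tasks requested
        f"#SBATCH --time={time_limit}",     # wall-clock limit, HH:MM:SS
        f"#SBATCH --mem={mem}",             # memory per node
        command,                            # the actual work (hypothetical program)
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: the program and input file names are placeholders.
script = build_batch_script("demo", "srun ./my_model input.dat")
print(script)
```

On a real cluster you would save that text to a file and hand it to `sbatch`; the scheduler decides when and where it runs based on the resources requested.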

It makes no sense for the home environment unless you’re trying to run some serious computations, and if you need to do that for work or study then you probably have access to a real HPC system.

It might be interesting and fun, but not particularly useful. Maybe a fun HCI setup would be more appropriate, letting you scale VMs across hosts and get some redundancy.

plenipotentprotogod@lemmy.world · 10 months ago

Out of curiosity, what software is normally being run on your clusters? Based on my reading, it seems like some companies run clusters for business purposes. E.g. an engineering company might use one for structural analysis of their designs, or a pharmaceutical company might simulate the interactions of new drugs. I assume in those cases they’ve bought a license for some kind of high-end software that’s been specifically written to run in a distributed environment. I also found references to some software libraries that are meant to support writing programs for this environment. I assume those are used more by academics who have a very specific question they want to answer (and may not have funding for commercial software), so they write their own code that’s hyper-focused on their area of study.

Is that basically how it works, or have I misunderstood?

Kangie@lemmy.srcfiles.zip · edit-2 10 months ago

Overall you’re not too far off, but what you’ll tend to find is that it’s a lot of doing similar calculations over and over.

For example, climate scientists may, for certain experiments, read a ton of data from storage for, say, different locations and dates/times across a bunch of jobs, but each job is doing basically the same thing. You might submit 100,000 permutations, or have an updated model that you want to crunch the existing dataset with.

The data from each job is then output and analysed (often with follow-up batch jobs).
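As a sketch of that "same calculation, many permutations" pattern: with Slurm you would typically submit an array job (`sbatch --array=0-5 ...`), and each task reads its index from the `SLURM_ARRAY_TASK_ID` environment variable to pick its own inputs. The locations, years and the `run_one` function below are placeholders made up for illustration, not anything from a real climate model.

```python
import itertools
import os

# Hypothetical sweep: every (location, year) pair becomes one job.
LOCATIONS = ["site_a", "site_b", "site_c"]
YEARS = [2001, 2002]
COMBINATIONS = list(itertools.product(LOCATIONS, YEARS))


def params_for_task(task_id):
    """Map an array-task index to the parameters that task should process."""
    return COMBINATIONS[task_id]


def run_one(location, year):
    """Stand-in for the real computation; every task runs this same code."""
    return f"{location}-{year}: processed"


if __name__ == "__main__":
    # Slurm sets SLURM_ARRAY_TASK_ID for each task of an array job;
    # fall back to 0 so the script also runs on a laptop.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(run_one(*params_for_task(task_id)))
```

The point is that the code is identical across jobs and only the index changes, which is why this kind of workload scales so easily across compute nodes.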

Edit: here’s an example of a model that I have some real-world experience building to run on one of my clusters: https://www.nrel.colostate.edu/projects/century/

Swin have some decent, public docs. I think mine are pretty good, but they’re not public so…

https://supercomputing.swin.edu.au/docs/2-ozstar/oz-partition.html

There will typically be some interactive nodes in a cluster as well that enable users to log in and perform interactive tasks, like validating that the software will run or, more commonly, to submit jobs to the queue manager.

