Given the recent focus of the coding club sessions on all things boab and HPC, we thought a post on what-the-H is HPC would do us all some good. So, read on if you’re curious, confused or confident. There’s something in it for everyone.
Do you dread the word big data? Does parallel computing make your palms sweaty? For those of you trying to convince yourselves you’re braver than that, let’s see how you roll with…high performance computing, servers, Slurm! Yikes!
You are not alone. Almost everyone in QAECO and the School of Biosciences is grappling with these very fears.
So, what is so scary? We’re all scientists after all, trained to take on the new and beat it into submission with experiments and theory. Well, for a start, we don’t understand the terminology used in this space and find it difficult to articulate what we want to do with it. “It” is the great unknown, which makes it scary.
Why has this happened? Because the means to create data is quickly surpassing our ability to learn how to talk about, deal with and analyse large volumes of data. Think of big rasters, gene sequencing data, multiple model runs and global sensitivity analyses that generate heaps of output that needs to be stored and sifted through. Suddenly, we are staring in the face of what feels like a giant data-monster, and we need all the help we can get to tame the beast!
Which tool should I choose?
What is a server?
A server is a computer designed to process requests and deliver data to another computer over the internet or a local network. Simply put, a “server” is a computer that you don’t sit in front of and type, and a computer you use directly is a “client”. Here is an excellent description (you only need to read the first answer): https://www.quora.com/What-is-a-computer-server
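In practice, the usual way to interact with a server is over SSH from a terminal on your own machine (the client). A minimal sketch, with a made-up hostname and username — use the details your system administrators give you:

```shell
# Log in to a server (hostname and username here are hypothetical)
ssh jbloggs@boab.example.unimelb.edu.au

# Copy a file from your laptop (the client) up to the server with scp
scp results.csv jbloggs@boab.example.unimelb.edu.au:~/projects/
```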
What are supercomputers and HPC?
The University of Melbourne’s resident HPC wiz, Lev Lafayette, explains: “Supercomputer means any single computer system that has exceptional processing power for its time. High Performance Computer (HPC) is any computer system whose architecture allows for above average performance. High Throughput Computing (HTC) is an architecture for maximum job completion; capability vs capacity computing.”
Here are a few things to consider when trying to understand HPCs:
- Storage: what type of storage do you need and where can you get it? There are two kinds: the storage needed to run the analysis (sometimes called scratch or temporary storage) and the permanent storage needed for input and output data, which you might access through cloud storage like Dropbox or Google Drive, or external hard drives.
- Interconnect: how close are the storage and compute nodes? How fast is the interconnect? This is usually the bottleneck when it comes to analyses involving big data. Ideally you want the two to be as close as possible.
- Computing power (cores available): the number of nodes, and cores per node, available to you; more nodes and cores can scale up compute power and reduce computation times considerably.
Another distinction that needs to be made early on, when deciding what type of HPC system you need, is computing capacity versus computing capability.
- Computing capacity refers to instances where we might want to solve a small number of somewhat large problems or a large number of small problems. This typically involves running the same task/procedure in parallel across many datasets.
- Computing capability refers to instances when we might want to solve a single large problem in the shortest amount of time, using maximum computing power. This involves running tasks in parallel with communication between them, such that the inputs of one task depend on the outputs of another. This is where a user might want to explore efficient code and parallelisation procedures such as those offered by most university HPC teams, e.g. through Research Platform Services at the University of Melbourne for Spartan.
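To make the capacity idea concrete, here is a minimal sketch in Python: the same task applied independently, in parallel, across several datasets. The datasets and the `analyse` function are stand-ins for your real data and model.

```python
from multiprocessing import Pool

def analyse(dataset):
    # Placeholder for a real per-dataset analysis (e.g. a model fit);
    # here we just sum the values
    return sum(dataset)

if __name__ == "__main__":
    # Three toy "datasets"; in practice these might be file paths or rasters
    datasets = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    # Two worker processes; scale this to the cores you have available
    with Pool(processes=2) as pool:
        results = pool.map(analyse, datasets)
    print(results)  # each dataset is analysed independently, in parallel
```

Because the datasets don’t depend on each other, adding more cores (more workers) speeds things up almost linearly — that’s capacity computing in a nutshell.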
What options are available?
Many. Let’s start off this section by saying there isn’t one best solution to your problem. There are a number of choices and you will have to try them out to see which one fits best. There are a few specifications to keep in mind though, so for each option we’ll note the default allocation of storage and computing power available to a user. Note that in most cases the default allocation can be increased on request. As a reference, your laptop or desktop, say a 3.1 GHz Intel Core i5 MacBook Pro, has 4 cores.
Boab
This is the in-house QAEco server. Nick G says, “It is our own dedicated machine for the group. Uni IT helps with occasional hardware stuff but it’s separate from the university’s HPC hardware.” In theory each user gets up to 72 cores, though those same 72 cores are shared across lots of people, so how many are free depends on who else is using it. It is advisable not to call more than 32 cores at any given time, so as not to overwhelm the system. By default, a user has access to 32GB of storage but this can be increased on a case-by-case basis by “the admins”. See: https://github.com/qaecology/wiki_private/wiki/Computing-and-data-storage
Melbourne Research Cloud
The University of Melbourne has several cloud computing options available (for free) to staff and students. If you’re interested, have a look here: https://docs.cloud.unimelb.edu.au/. The “Getting Started” and “Guides” tabs have good information on the different services available and how to access them. One guide that is probably relevant to many of us is the intro to RStudio server: https://docs.cloud.unimelb.edu.au/guides/application_rstudio/.
Thanks to Jian for pointing this out at coding club and sharing the relevant links. You can trust coding club to always point you in the right direction!
Melbourne University HPC – Spartan
Spartan is the University of Melbourne’s High Performance Computing system. It is freely available to researchers (staff and students) at the university who find that their own computer just can’t cope anymore. Spartan is a combination of “bare metal” physical compute nodes located on campus and cloud compute nodes made available to the university through Nectar’s cloud resources. There is a slight learning curve to Spartan as it uses a command line interface and a new job-submission language, but it offers far more computing resources than any of the other systems here. The cloud compute nodes have access to 100GB of RAM and 12 CPUs, physical nodes have a minimum of 250GB of RAM and 12 CPUs, while specialist nodes can go all the way up to 72 CPUs and 1540GB of RAM! Spartan users submit jobs to a ‘queue’ (a priority order of jobs, or waitlist of sorts) and request a set amount of computing resources. The job starts when it is your turn, as per the queue, and the requested resources become available.
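Job submission on Spartan goes through the Slurm scheduler. A minimal sketch of a submission script — the job name, partition, module name and script filename are all assumptions, so check Spartan’s own documentation for the real values:

```shell
#!/bin/bash
# Request resources from the Slurm scheduler (values are examples only)
#SBATCH --job-name=my_analysis
#SBATCH --partition=cloud        # assumed partition name; check Spartan's docs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Load the software the job needs (module names vary between systems)
module load r

# Run the analysis (hypothetical script name)
Rscript my_analysis.R
```

You would submit this with `sbatch my_job.slurm` and check your place in the queue with `squeue -u your_username`.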
Spartan is maintained by Research Platform Services here at the university and they have a website for their own guides/FAQs here: https://dashboard.hpc.unimelb.edu.au/.
Our in-house Spartan expert, David Wilkinson, has prepared an R user specific guide for the QAECO coding club that attempts to ease the learning curve: https://doi90.github.io/lodestar/introduction-to-spartan.html. Thanks David!
Ecocloud
Ecocloud, of the BCCVL, uses the Nectar cloud. By default, a user has access to 16GB of storage and 4 cores for computation (this can go up to 16). You will need to register to use their services. There are two key contacts for Ecocloud if you want to find out more: Chantal Huijbers firstname.lastname@example.org; Sarah Richmond email@example.com.
Ecocloud also has some additional functionality, such as an inbuilt data search engine (through the CSIRO Knowledge Network which is an online platform for exploring data). This feature makes it easier to search and download data to use within an analysis. Most importantly, it helps download and store data “close to” the compute nodes. This means that there is considerably less time spent in transferring data for analysis (think of data on dropbox or an external hard disk and the time it could take to copy this over to your local machine or boab to run an analysis). These transfer times are usually the bottlenecks when it comes to running analyses in big data projects. Ecocloud also has a GitHub repository with code snippets which can be quite useful when you’re writing up your own models.
Another option similar to ecocloud is CoESRA, which is available through the Terrestrial Ecosystem Research Network (TERN). It also provides a virtual desktop which can come pre-loaded with tools such as R, QGIS, etc. We’re not sure about the specifications here, but the information should be available on their website.
Spin up your own virtual machine!
Wait, what? Yes you can! And Will Morris says it is relatively simple. All you need to do is install the Docker software on your computer and then follow these steps: https://github.com/wkmor1/eco-dev
Feel free to ignore what exactly Docker or a container is. But if you’re curious, see this link: https://medium.com/@yannmjl/what-is-docker-in-simple-english-a24e8136b90b
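As an illustration (not Will’s exact setup), the community-maintained rocker/rstudio Docker image gives you an RStudio server in one command; the password below is an example you should change:

```shell
# Start RStudio Server in a container, then open http://localhost:8787
# in a browser and log in as user "rstudio" with the password set below
docker run --rm -p 8787:8787 -e PASSWORD=choose_a_password rocker/rstudio
```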
Swipe left or right: Ecocloud, Boab, Spartan, other?
Ecocloud is similar to boab in that it provides users with a virtual machine to run analyses using R or Python. It is also the easiest option to access without having to go through the painful, Linux-heavy style of high performance computing (HPC).
For us at QAEco, it is useful to note that boab outperforms ecocloud in terms of computing power (i.e. cores available) and storage. Ecocloud is geared towards increasing computing capacity rather than capability. It is a great resource nevertheless for academics wanting more storage and faster computation times for their research.
A good practice for us as a lab could be to use Ecocloud or the university’s research cloud for smaller jobs, so that we free up space for larger ones on boab.
Spartan surpasses the above in terms of computational capability, but the trade-off is that it is relatively difficult to understand and operate unless trained. So, think carefully before you decide that more is better! The university’s Research Platform Services offer training workshops in Spartan which fill up quickly. My experience with these has been that although they can be overwhelming and may not immediately equip you with the skills to run your analyses on Spartan independently, they do point you to the right resources and introduce you to all the terminology. This was tremendously helpful for me to start having “sensible” conversations with the HPC team when setting up my project on Spartan. If you know what you want to do, I would suggest getting in touch with the HPC team because they are keen to help academics with their projects and to explore options for improving the computational capability of proposed analyses.
But these aren’t the only options available, of course. There are many more, e.g. Pawsey, Nectar, MASSIVE, etc. Which one is right for you is a matter of deciding on one and giving it a go. If you’re lucky that option will work perfectly (unlikely); otherwise you fail and move on to the next available option (very likely). In my case, I decided on whatever I thought I’d be able to get help with. That, and accepting there is no “best” option.
Good luck all!
— by Payal Bal and David Wilkinson
If you weren’t satisfied with the bombardment of links above, here’s one more on ‘an intro to HPC’: https://learn.scientificprogramming.io/introduction-to-high-performance-computing-hpc-clusters-9189e9daba5a