What version of MPI and gcc should I use to compile? #6
Can you please provide some guidance on what version of MPI / gcc you've used to compile and run bison?
It shouldn't much matter. I've generally used OpenMPI and gcc 5.3 or 5.4. The Makefile will change a bit depending on the MPI implementation (OpenMPI, MPICH2, etc.), but that should be it.
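To rule out toolchain differences up front, it can help to record exactly what the build machine provides; a quick check using standard MPI/gcc tooling (nothing bison-specific):
# Report the compiler and MPI implementation the Makefile will pick up
gcc --version
mpicc --version     # the MPI compiler wrapper (OpenMPI, MPICH, etc.)
mpiexec --version   # prints the MPI implementation and its version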
Great. Can you give me the version number of OpenMPI too, please? I'm seeing some unexpected behavior and I want to rule out differences in the build environment.
It looks like I most recently built it against 1.10.2.
Thanks! I'll give it a go.
Feel free to post error messages if you run into any problems!
That didn't seem to help. Next question: is there a specific version of bowtie2 I should be using?
I would try the most recent version, though really anything that's come out in the past few years should work. bowtie2 isn't needed for the installation itself, so if you're getting some sort of error at that point then that's not the cause. Just make sure some version of bowtie2 is in the PATH.
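To confirm the aligner will actually be found at run time (the install path below is only an example):
# Check that bowtie2 is visible to the shell/job that runs bison
which bowtie2
bowtie2 --version
# If it isn't found, prepend its directory to PATH (example location)
export PATH=/opt/bowtie2-2.3.4.1:$PATH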
OK. I'm trying to run this on a Slurm cluster and I'm basically getting a different error every time I submit. I'm using data from your 2014 tutorial. Here is my script:
Here is my submission command:
And here are the output files from 10 different runs (with ~10 different errors). Can you see something obvious I'm doing wrong?
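For readers following along, a generic sketch of what such a Slurm/MPI submission can look like (the directives, paths, and file names here are illustrative, not the ones from the report above):
#!/bin/sh
#SBATCH --ntasks=5
#SBATCH --ntasks-per-node=1
module load bison
mpiexec -n 5 bison -g /path/to/bison-index -1 reads_R1.fq.gz -2 reads_R2.fq.gz
# submitted with: sbatch job.sh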
That's a new one. What happens if you set
Thanks for the suggestion. I ran 10 more jobs with the
I'll have to see if I can reproduce this locally then. Perhaps I tagged a bad release.
On the cluster in my office we're using Slurm with OpenMPI 1.10.2 and gcc 4.8.5. There, the following works:
With the command
What version of Linux are you using, and what glibc are you up to?
Our nodes are running CentOS 7, I think; I don't recall what version of glibc that uses. If push comes to shove, I can try sending you some binaries to see if they work.
We're using CentOS 6 with glibc version 2.12. I'm guessing this may be the problem. I'm going to try building it in a Singularity container using CentOS 7 and see if that fixes the bug. But it might take a little time because I don't have any experience with MPI in a container yet.
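Checking the glibc each system actually has is quick (both commands are standard on CentOS-style systems):
# Print the glibc version in use
ldd --version | head -n 1
rpm -q glibc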
Good luck with that. I just noticed that openmpi is in conda-forge, so perhaps I can put this in bioconda...
I think I have something working but I'd like to do a larger scale test before I declare victory. I'm not an omics person, and your tutorial example runs on a single node. Are you able to provide some test data and sample commands to run a larger scale job on 3 nodes? Any help would be much appreciated.
Hi Dave, you may use the example command I gave, which includes the location of the test dataset.
Perfect. Thanks @patidarr!
I think I'm making progress, but now I have a new error. I copied patidarr's data and then wrote a script based on his, like so:
#!/bin/sh
module load bison
cd /scratch/godlovedc/patidarr_bison
mpiexec -n 5 bison \
-g bison-index \
-1 Sample_CL0080_T1D_M_HYCKNBGXY_R1_val_1.fq.gz \
-2 Sample_CL0080_T1D_M_HYCKNBGXY_R2_val_2.fq.gz
Then I submitted the job like so:
It hummed along merrily for a while and then produced this error:
@GodloveD, perhaps your bison index is not initialized? /data/Clinomics/Ref/khanlab/Index/Bison/
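If the index really hasn't been built, bison ships an indexing tool that wraps bowtie2-build; a sketch using the directory named above (options left at their defaults, and assuming the FASTA reference lives in that directory):
# Build the bisulfite-converted bowtie2 indexes for the reference directory
bison_index /data/Clinomics/Ref/khanlab/Index/Bison/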
How many reads are in the fastq files?
Here is FastQC info on this library.
So it sounds to me like there is some uncertainty about whether this is a bug or whether there is something wrong with the data. Is that correct? @dpryan79, are you able to provide a known-good data set that produces a known output when processed using known commands, so that we can rule out a bug?
I'll have to put something larger together. Note that there's generally a bit of randomness in some of the results from aligners. I'll post a link when I've put together something a bit larger.
I've uploaded a tarball that contains a reference (GRCh38) and its index, as well as fastq files and the BAM file produced locally. I have additionally included the script I used to submit that with. I wouldn't do an md5sum to compare results, since there's always a bit of randomness in a few of the alignments, but the output text files should be quite similar (and the program shouldn't segfault for any reason).
@dpryan79 Thanks very much for going to this trouble. It is greatly appreciated. I downloaded and untarred that data and ran it on our cluster. I was able to replicate your slurm-*.out file and subset_1.txt files exactly, so it seems like this installation is working properly. The problem appears to have been the glibc mismatch. It may be useful to post a link to these data, along with the commands and output, in the documentation so that other users can verify their installations as well. Thanks again for putting this together.
For the record, if any other users run into trouble using bison on an older version of Linux and want to run it from within a Singularity container, here is the definition file that I used to build it.
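The definition file itself isn't reproduced here, but as a rough, hypothetical sketch of the approach described (a CentOS 7 base image with a matching OpenMPI, in Singularity 2.x bootstrap syntax) it would be along these lines:
Bootstrap: docker
From: centos:7

%post
    yum -y groupinstall "Development Tools"
    yum -y install openmpi-devel zlib-devel wget
    # ...fetch and build bowtie2 and bison against the MPI inside the image...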
And here is the wrapper script I use to drive it.
#!/bin/bash
SINGULARITY_BINDPATH="/gpfs,/gs2,/gs3,/gs4,/gs5,/gs6,/spin1,/data,/scratch,/fdb,/lscratch"
export SINGULARITY_BINDPATH
dir=$(dirname "$0")
cmd=$(basename "$0")
args="$@"
singularity exec $dir/bison.img $cmd $args
This enables you to create links to the wrapper script, which essentially become executables that run within the container, like so:
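The links themselves can be plain symlinks whose names match the in-container commands, since the wrapper dispatches on basename "$0" (the wrapper file name below is assumed, and bison.img is assumed to sit in the same directory as the links):
# One link per bison tool; running a link executes that tool inside bison.img
ln -s wrapper.sh bison
ln -s wrapper.sh bison_herd
./bison --help    # runs bison from inside the container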
Thanks again! Feel free to close!
@dpryan79 - Is there any chance your bioconda version of bison might be pushed soon? I've been trying to compile bison locally and have been running into a nightmare of errors trying to compile htslib with all of its dependencies (especially libcurl, libbz2, and liblzma).
@tantrev: That's been on the back burner. I'll see if I have time next week. I don't think the version of htslib that comes with bison needs libcurl or libbz2, though I'd be surprised if those weren't available on your system (they're pretty common).
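If in doubt, a quick way to see whether those libraries are present on the system at all:
# List any installed libcurl / libbz2 / liblzma shared libraries
ldconfig -p | grep -E 'libcurl|libbz2|liblzma'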
@dpryan79 - thank you for the tip on just using the built-in htslib! I was stupidly following the tutorial blindly without thinking about it (in trying to use the latest htslib). I've gotten everything to compile just fine, but I keep getting errors when trying to run bison. Here's the stdout and stderr for this Slurm submission script. Also, here's the top of the Makefile I used. Is there anything obvious that I'm doing wrong? Thanks again for your help.
@tantrev what is your OS?
When
@tantrev For some reason your system is segfaulting at a
BTW, I have a bioconda recipe that at least partially works. I just have to get the testing to work on the Travis CI cluster on both OSes.
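If it helps with the debugging, two standard ways to get a backtrace out of a crashing MPI rank (file names and rank count below are placeholders, not from this thread):
# Option 1: run each rank under gdb in its own terminal (needs X forwarding)
mpiexec -n 5 xterm -e gdb --args bison -g bison-index -1 r1.fq.gz -2 r2.fq.gz
# Option 2: let the crash write a core file, then inspect it
ulimit -c unlimited
gdb ./bison core.<pid>    # then type "bt" at the gdb prompt for the backtrace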
@dpryan79 So I had to re-compile bison to use the Intel MPI in order to get gdb to work (I couldn't figure out how to use it with OpenMPI), but here's the bt command output. The stdout and stderr of the regular non-gdb command also changed slightly.
@GodloveD I tried building your Singularity definition file on my Mac using Vagrant, then uploading it to CHPC, but when I try executing bison with this command
on the head node, it complains:
@tantrev It's kind of hard to find documentation on this, but you are actually supposed to call
If you want, have a look at the wrapper script and links I described above. If you set those things up properly you can dispense with all of the
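For what it's worth, the pattern the Singularity MPI documentation of that era (2.x) describes is to launch MPI on the host and have each rank exec into the container; with the image from the wrapper above, that is roughly (paths, file names, and rank count illustrative):
mpiexec -n 5 singularity exec /path/to/bison.img bison -g bison-index -1 r1.fq.gz -2 r2.fq.gz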
@tantrev: The
@GodloveD - thank you! I should've just followed your wrapper script from the beginning; I was able to get regular bison running that way. The only problem I ran into was when trying to run bison_herd, where I received the following error:
Also, I was reading through the Singularity documentation and it probably doesn't matter, but it might be worth considering using Open MPI 2.1, just in case someone doesn't have Open MPI 1.10.x natively available on their cluster.
@dpryan79 - thank you as well! Yes, sorry, I made a couple of idiot mistakes there. As you noticed, I forgot to fix my garbled genome path (a relic from earlier experimentation), and it turns out something had gone wrong with my initial index generation as well. bison_herd seems to be running smoothly right now with the Intel MPI implementation. Also, I was reading through Bison's paper and saw the note about how bison_herd's scaling can become "limited by the underlying MPI implementation and network architecture of the cluster". Since the paper's results seem to peak at around 9 nodes, but bison's documentation mentions frequently using more than 9 nodes, I'm just curious whether you have any general feeling for how to gauge the optimal potential of a given cluster (e.g. OpenMPI version, NIC speed, etc.)? For example, the cluster I'm running on right now has a QDR InfiniBand interconnect, but I'd ideally like to use as many simultaneous nodes as possible. :P Thanks again to both of you for your help.
Glad things are working now. I used to use up to 21 nodes on my old cluster, but that was hardware that's now >5 years old. I unfortunately don't have any great advice on gauging the optimal numbers, but if you find what's optimal for you then please relay it :)
@dpryan79 - sounds good, I'll have to do some tinkering. :) On that note, I unfortunately seem to have jinxed myself with my previous hopeful comment. bison_herd was running just fine for about 20 minutes, then exited with an error. Here's the stdout and stderr in case you have any ideas about what's going on. Sorry to keep bothering you!
bowtie2 died with a signal 9 (aka "kill"), which will most likely happen if the scheduler you're using doesn't like something (e.g., the default memory allocation is too low). For many schedulers there's a switch to report back how much memory was actually used, so if yours has such an option then give that a try.
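With Slurm, for example, the accounting database can report the peak memory of a finished job, and the request can then be raised in the batch script (the job ID below is a placeholder):
# Peak memory actually used by a finished job
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State
# In the submission script, request more memory, e.g.
#SBATCH --mem=64g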
Thank you! You were dead on: the memory limit was indeed exceeded - everything is now running on 256 GB nodes. :P
So I've been doing some tinkering with a larger-scale deployment and it seems that I'm only getting marginal speed-ups when using 13 nodes (20 Intel cores, 64 GB RAM per node) compared to 5 nodes. For example, 13 nodes gives about 76,000 reads/sec, while 5 nodes gives about 66,000 reads/sec. In both configurations, I allocate 7 threads to both the -mp and -@ arguments. Is it possible that other bison parameters, like queue size and throttling, are the culprit for this marginal performance improvement? If it makes any difference, each mate of the raw .fastq file pairs is about 100 GB.
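(For scale, taking the 5-node run as the baseline: 76,000 / 66,000 ≈ 1.15x the throughput for 13/5 = 2.6x the nodes, i.e. roughly 44% of ideal scaling, so something other than raw node count does look like the bottleneck.)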