Create build_tarballs.jl for norMD #11075
# `julia build_tarballs.jl --help` to see a usage message.
using BinaryBuilder, Pkg

name = "norMD"
I would suggest using a less terse name, since the tool is specific to one field (bioinformatics, MSA), e.g.,
- name = "norMD"
+ name = "AquaMSANorMD"
I even added MSA to disambiguate among all the "Aqua"s out there (Aqua.jl, the Aqua macOS GUI, JetBrains Aqua, ...; see https://github.com/search?q=aqua&type=repositories&s=stars&o=desc ).
Hi! Thanks, I understand :) Could it be NormalizedMeanDistance?
Coming from a background in Computer Science rather than Bioinformatics, I would be most comfortable with a name free of field-specific abbreviations, such as
AquaMultipleSequenceAlignmentNormalizedMeanDistance
That said, I'm OK with AquaMSA as the prefix if that is preferred. I just think there should be some prefix to disambiguate among other interpretations of "NorMD" or "NormalizedMeanDistance".
I do not think "NormalizedMeanDistance" makes sense only for MSA (with "sequence" referring to DNA/RNA sequences, I presume). For example, in Computer Vision it is common to compute the IoU (intersection over union) between object detections and then compute mAP (mean average precision) based on an IoU threshold. I would not expect the whole (Julia) world to understand, or even accept, Computer Vision-specific terms as universal ones.
I believe the package name should ideally match the tool's name, following the usual [convention](https://docs.binarybuilder.org/stable/building/#Name) for JLL packages.
After reconsidering: since the upstream AQUA project provides the norMD tool, I think it would make more sense to name the package AQUAnorMD. This keeps the name closely tied to the upstream project while keeping it simple and improving discoverability.
What do you think?
In the last commit, I changed the name from norMD to AQUAnorMD to better align with the upstream project and improve clarity.
Should the file also be renamed to AQUAnorMD and moved into the A/ folder to reflect the new name?
By "not necessarily" I mean that they're specifically not checked for naming rules.
Thanks for the feedback and the thoughtful discussion. I'd like to explain why I still think norMD is the most appropriate name for the JLL package:
1. There is no other standalone software named norMD, so there are currently no naming clashes, and such conflicts are unlikely in the future, especially since this casing (norMD) isn't standard in Julia.
2. I really like JLL packages being named after their actual software or suite, because it helps a lot with discoverability. For example, that's how I found useful tools like MAFFT_jll. I wouldn't have found them easily if the name had been something like MultipleAlignmentUsingFastFourierTransform_jll.
3. I've taken the implementation from AQUA because it's the only publicly accessible one (available via wget), but the norMD tool was published independently, before AQUA. In fact, to get other versions, you'd need to contact the paper's authors directly. So AQUA merely bundles norMD; it is not its origin, and future versions of the tool could appear outside AQUA too. I would therefore prefer not to tie the package name to AQUA.
Let me know what you think. I'm open to discussing further, but I believe keeping the norMD name fits the conventions and benefits users the most.
Regarding 1, when I search for "norMD", I find (in this order):
- https://rdrr.io/github/EvansLaboratory/OBIF/man/normD.html
- normD seems to have existed since sometime in the 1990s (or earlier): http://dbwww.essc.psu.edu/lasdoc/user/normd.html. It appears to be an executable written in C or Fortran.
(There is no trace of the AQUA/MSA normd.)
Regarding 2: in contrast, the first 10 Google hits for "mafft" all seem to be about the bioinformatics tool for multiple alignment (using FFT) of DNA/RNA sequences.
Regarding 3: It makes sense not to link the package to AQUA if they are merely bundling it.
Note that the mentioned normD and normd do not clash with norMD because of the different casing ;) Also, normD is an R subroutine, so it would not be a JLL package.
I would say that casing is too brittle to rely on for disambiguation.
And yes, I don't see the R package's function becoming a JLL. It was more to show that there are other (more common?) interpretations of what "normd" might be.
I decided to go back to `norMD`, as there is no other standalone software with that name; therefore, there are no clashes.
Co-authored-by: Mosè Giordano <[email protected]>
This code is so bad, it's a miracle it was ever possible to compile it.
Co-authored-by: Mosè Giordano <[email protected]>
Yes! I tried a similar setup early on without success, so at some point I gave up and chose to allow only the platforms for which I was able to compile it :/
Also the Makefile is complete rubbish and doesn't consistently use the
Description:
This PR adds a build recipe for norMD (normalized Mean Distance) version 1.3, a tool from the AQUA suite used to assess the quality of multiple sequence alignments (MSAs).
The build process compiles several executables; however, only the normd binary is installed.
References: