Create build_tarballs.jl for norMD #11075

Open
wants to merge 13 commits into
base: master
49 changes: 49 additions & 0 deletions N/norMD/build_tarballs.jl
@@ -0,0 +1,49 @@
# Note that this script can accept some limited command-line arguments, run
# `julia build_tarballs.jl --help` to see a usage message.
using BinaryBuilder, Pkg

name = "norMD"
Contributor

I would suggest using a less concise name, since the tool is specific to one field (bioinformatics, MSA), e.g.,

Suggested change
- name = "norMD"
+ name = "AquaMSANorMD"

I would even add MSA to disambiguate among all the other "Aqua"s out there (Aqua.jl, Aqua macOS GUI, JetBrains Aqua, ... https://github.com/search?q=aqua&type=repositories&s=stars&o=desc ).

Contributor Author

Hi! Thanks, I understand :) Can it be NormalizedMeanDistance?

Contributor

Coming from Computer Science rather than Bioinformatics, I would be most comfortable with a name that has no field-specific abbreviations at all, such as

AquaMultipleSequenceAlignmentNormalizedMeanDistance

I'm OK with AquaMSA as the prefix if that is preferred, but I think there should be some prefix to disambiguate from other interpretations of "NorMD" or "NormalizedMeanDistance".

I do not think NormalizedMeanDistance only makes sense for MSA (with sequence referring to DNA/RNA sequences, I presume...). E.g., in Computer Vision, it is common to look at the IoU (intersection over union) distance between object detections, and then compute mAP (mean average precision) based on an IoU threshold. I would not expect everyone in the whole world (Julia world) to understand, or even accept, computer vision-specific terms as universal terms.

Contributor Author

I believe the package name should ideally match the tool's name, following the usual [convention](https://docs.binarybuilder.org/stable/building/#Name) for JLL packages.
After reconsidering, since the upstream AQUA project provides the norMD tool, I think it would make more sense to name the package AQUAnorMD.
This keeps the name closely tied to the upstream project while being simpler and improving discoverability.
What do you think?

Contributor Author

@diegozea, Apr 28, 2025

In the last commit, I changed the name from norMD to AQUAnorMD to better align with the upstream project and improve clarity.
Should the file also be renamed to AQUAnorMD and moved into the A/ folder to reflect this new naming?

Member

By "not necessarily" I mean that they're specifically not checked for naming rules.

Contributor Author

@diegozea, Apr 30, 2025

Thanks for the feedback and the thoughtful discussion.

I'd like to explain why I still think norMD is the most appropriate name for the JLL package:

  1. There is no other standalone software named norMD. So, currently there are no naming clashes, and such conflicts are unlikely in the future — especially since the casing (norMD) isn’t standard in Julia.

  2. I really like the idea of JLL packages being named after their actual software or suite. It helps a lot with discoverability. For example, that’s how I found useful tools like MAFFT_jll. I wouldn’t have found them easily if the name had been something like MultipleAlignmentUsingFastFourierTransform_jll.

  3. While I’ve taken the implementation from AQUA because it’s the only publicly accessible one — available via wget — the norMD tool was published independently before AQUA. In fact, to get other versions, you'd need to contact the paper authors directly. So AQUA is just bundling norMD, not the origin of it. Future versions of the tool could appear outside AQUA too. So, I would prefer not to tie the package name to AQUA.

Let me know what you think — I’m open to discussing further, but I believe keeping the norMD name fits the conventions and benefits users the most.

Contributor

@stemann, Apr 30, 2025

Regarding 1, when I go looking for “norMD”, I find (in the following order):

(no trace of the aqua/msa normd)

Regarding 2: In contrast, all of the first 10 Google hits for mafft seem to be for the bioinformatics multiple-alignment FFT tool for DNA/RNA sequences.

Regarding 3: It makes sense not to link the package to AQUA if they are merely bundling it.

Contributor Author

Note that the mentioned normD and normd do not clash because of the different casing ;) Also, normD is an R subroutine, so it would not be a JLL package.

Contributor

I would say that casing is too brittle to rely on for disambiguation.

And yes, I don't see the R package command becoming a JLL; my point was rather that there are other (more common?) interpretations of what "normd" might be.
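
As a rough illustration of the casing concern, the sketch below folds the spellings mentioned in this thread to lowercase and counts how many distinct names remain. It is only a sketch: it assumes a POSIX shell with `tr`, `sort`, and `wc`, and it models case-insensitive matching (as in many search engines and some filesystems) by plain lowercasing.

```shell
# Spellings that appear in this thread; they differ only by case.
names="norMD normD normd NorMD"

# Fold every name to lowercase, then count the distinct results.
# $names is deliberately unquoted so the shell splits it into words.
distinct=$(printf '%s\n' $names | tr '[:upper:]' '[:lower:]' | sort -u | wc -l | tr -d ' ')

echo "distinct names after case folding: $distinct"
```

All four spellings collapse to a single name once case is ignored, which is the sense in which casing alone is a fragile way to tell them apart.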

version = v"1.3.0"

# This script installs norMD (normalized Mean Distance) version 1.3, provided by the AQUA suite.
# norMD is a statistical metric used to assess the quality of multiple sequence alignments (MSAs).
# The `normd` program calculates the overall norMD score for an entire multiple sequence alignment.
#
# Usage example:
# - `normd aln_file`: Calculates the norMD score for the specified alignment file.
#
# If you use this tool, please cite the following references:
# - Thompson, J. D., Plewniak, F., Ripp, R., Thierry, J. C., & Poch, O. (2001). Towards a reliable objective function for multiple sequence alignments. Journal of molecular biology, 314(4), 937-951.
# - Muller, J., Creevey, C. J., Thompson, J. D., Arendt, D., & Bork, P. (2010). AQUA: automated quality improvement for multiple sequence alignments. Bioinformatics, 26(2), 263-265.

# Collection of sources required to complete build
sources = [
    ArchiveSource("https://www.bork.embl.de/Docu/AQUA/latest/norMD1_3.tar.gz", "24ba32425640ae6288d59ca2bf5820dd85616132fe6a05337d849035184c660d"),
    FileSource("https://www.bork.embl.de/Docu/AQUA/latest/License.txt", "ddb9db7630752f8fdc6898f7c99a99eaeeac5213627ecb093df9c82f56175dc7")
]

# Bash recipe for building across all platforms
script = raw"""
cd $WORKSPACE/srcdir/normd_noexpat/
sed -i '/#include "score.h"/a#include <string.h>' init.c
make -j${nproc} CFLAGS="-c -O2 -std=c99 -Wno-implicit-function-declaration"
install -Dvm 755 normd "${bindir}/normd${exeext}"
"""
# NOTE: Only the normd executable is installed.
# The programs normd_subaln, normd_range, normd_sw, normd_aln, and normd_aln1 are built but not installed.

# These are the platforms we will build for by default, unless further
# platforms are passed in on the command line
platforms = supported_platforms()

# The products that we will ensure are always built
products = [
    ExecutableProduct("normd", :normd)
]

# Dependencies that must be installed before this package can be built
dependencies = Dependency[
]

# Build the tarballs, and possibly a `build.jl` as well.
build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies; julia_compat="1.6")
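
For readers unfamiliar with the `sed` line in the recipe above, here is a minimal sketch of what it does, run against a stand-in file rather than the real sources. The contents of the stand-in `init.c` are hypothetical; only the `sed` expression is taken from the recipe. The one-line `a` form and `-i` without a suffix assume GNU sed.

```shell
# Work in a throwaway directory so nothing real is modified.
tmpdir=$(mktemp -d)

# Stand-in for the upstream init.c (hypothetical contents).
printf '%s\n' \
    '#include <stdio.h>' \
    '#include "score.h"' \
    'int main(void) { return 0; }' > "$tmpdir/init.c"

# Same expression as in the recipe: append a new line after the
# line that matches '#include "score.h"'.
sed -i '/#include "score.h"/a#include <string.h>' "$tmpdir/init.c"

# Keep the patched text in a variable, show it, and clean up.
patched=$(cat "$tmpdir/init.c")
printf '%s\n' "$patched"
rm -rf "$tmpdir"
```

The patched file gains `#include <string.h>` directly below the `score.h` include and is otherwise unchanged. Patching with `sed` keeps the recipe short; a separate patch file would be an alternative if more changes were needed.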