strataG is a toolkit for summarizing haploid sequence and multilocus genetic data and for analyzing population structure. Specific individuals, loci, or strata can be selected using standard R `[` indexing. The package contains functions for summarizing haploid and diploid loci (e.g., allelic richness, heterozygosity, haplotypic diversity) and haploid sequences, by locus and by stratum, as well as functions for computing by-site base frequencies and identifying variable and fixed sites among strata. Both overall and pairwise standard tests of population structure are available, including PHIst, Fst, Gst, and Jost's D. If individuals are stratified according to multiple schemes, the stratification can be changed with the `stratify()` function, and summaries or tests can be re-run on the new object. The package also includes wrappers for several external programs such as fastsimcoal2, STRUCTURE, and mafft, along with multiple functions for converting data objects for other population genetics packages such as adegenet, pegas, and phangorn.
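To give a feel for what two of the structure statistics mentioned above measure, here is a minimal base-R sketch of the textbook, uncorrected forms of Nei's Gst and Jost's D for a single locus, computed from allele frequencies in each stratum. It weights strata equally and applies no sample-size correction, so it is not the estimator strataG uses internally; `gst_and_d()` is a hypothetical helper written only for illustration.

```r
# Uncorrected Nei's Gst and Jost's D for one locus.
# freqs: matrix of allele frequencies, rows = strata, columns = alleles.
# Strata are weighted equally; no sample-size correction is applied.
gst_and_d <- function(freqs) {
  stopifnot(is.matrix(freqs), all(abs(rowSums(freqs) - 1) < 1e-8))
  n <- nrow(freqs)                   # number of strata
  hs <- mean(1 - rowSums(freqs^2))   # mean within-stratum expected heterozygosity
  p_bar <- colMeans(freqs)           # pooled allele frequencies
  ht <- 1 - sum(p_bar^2)             # total expected heterozygosity
  c(Gst = (ht - hs) / ht,
    D   = ((ht - hs) / (1 - hs)) * (n / (n - 1)))
}

# Two hypothetical strata with strongly differing frequencies of allele a1
freqs <- rbind(strata.A = c(a1 = 0.9, a2 = 0.1),
               strata.B = c(a1 = 0.2, a2 = 0.8))
res <- gst_and_d(freqs)
res
```

Here Hs = 0.25 and Ht = 0.495, so Gst is about 0.49 while D is about 0.65: D is scaled by the within-stratum diversity (1 - Hs) rather than by Ht.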
To install the latest version from GitHub:

```r
# make sure you have devtools installed
if (!requireNamespace('devtools', quietly = TRUE)) install.packages('devtools')

# install strataG latest version
devtools::install_github('ericarcher/strataG', build_vignettes = TRUE)
```

When installing on a Mac or Linux system, you may get errors during the compilation phase that look like:
```
─ installing *source* package ‘strataG’ ...
** using staged installation
** libs
...
ld: warning: search path '/opt/gfortran/lib/gcc/aarch64-apple-darwin20.0/12.2.0' not found
ld: warning: search path '/opt/gfortran/lib' not found
ld: library 'gfortran' not found
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [strataG.so] Error 1
ERROR: compilation failed for package ‘strataG’
─ removing ‘/private/var/folders/rx/z5h877kx2rx_85q95ct3fdfr0000gn/T/RtmpgBBQQU/Rinstfc0912a21e41/strataG’
-----------------------------------
ERROR: package installation failed
Error: Failed to install 'strataG' from GitHub:
! System command 'R' failed
```
If you see this, follow these steps to update the compilation-related paths in your Makeconf file:
- Confirm that you have `gfortran` and `gcc` installed. Open Terminal and type:

  ```
  <me> ~ % which gfortran
  /usr/local/bin/gfortran
  <me> ~ % which gcc
  /usr/bin/gcc
  ```

  If you see "gfortran not found", then you need to install it. First, if you are on a Mac, make sure you have the Xcode Command Line Tools installed:

  ```
  xcode-select --install
  ```

  If `which gfortran` still returns "gfortran not found" and you have Homebrew, try:

  ```
  brew install gcc
  ```
- Locate your `Makeconf` file. On a Mac, check `/Library/Frameworks/R.framework/Resources/etc`:

  ```
  <me> ~ % ls /Library/Frameworks/R.framework/Resources/etc
  Makeconf Renviron javaconf ldpaths repositories
  ```
- Open `Makeconf` in a text editor and find the lines labeled `#Fortran`.
- Update the `FC` and `F77` paths to match the location given by `which gfortran`:

  ```
  FC = /usr/local/gfortran/bin/gfortran -arch arm64
  F77 = /usr/local/gfortran/bin/gfortran
  ```
- Search in the same `gfortran` directory for `gcc`, perhaps in `lib`. Update `FLIBS` to match the paths for the libraries, as below:

  ```
  FLIBS = -L/usr/local/gfortran/lib/gcc/aarch64-apple-darwin23/14.1.0 -L/usr/local/gfortran/lib -lgfortran -lemutls_w -lquadmath
  ```
For more help, also check out this thread on Stack Overflow.
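The locations used in the steps above can also be checked from within R before editing anything. This read-only sketch uses only base R (`R.home()`, `Sys.which()`, `readLines()`, `grep()`). It assumes `Makeconf` lives in `R.home("etc")`, which matches the macOS path shown above and typical Linux installs but may differ on other platforms.

```r
# Path to R's Makeconf (assumed to be in R.home("etc"))
makeconf <- file.path(R.home("etc"), "Makeconf")
file.exists(makeconf)

# Location of gfortran on the search PATH; "" means it was not found
gfortran <- Sys.which("gfortran")
gfortran

# Current FC, F77, and FLIBS settings, i.e. the lines the steps above edit
fortran_lines <- if (file.exists(makeconf)) {
  grep("^(FC|F77|FLIBS)[[:space:]]*=", readLines(makeconf), value = TRUE)
} else {
  character(0)
}
fortran_lines
```

Comparing `gfortran` with the `FC` line, and the `-L` directories in `FLIBS` with what actually exists on disk, usually shows which entries are stale.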
Vignettes are available on several topics:
- Creating and manipulating gtypes ("gtypes")
- Genotype and sequence summaries ("summaries")
- Working with sequences ("sequences")
- Tests of population structure ("population.structure")
- Installing external programs ("external.programs")
To see the list of all available vignettes:

```r
browseVignettes("strataG")
```

To open a specific vignette:

```r
vignette("gtypes", "strataG")
```

There is also a tutorial on running fastsimcoal2 through strataG, available through the function `fscTutorial()`.
The paper can be obtained here, and is cited as (preferred):
Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016), strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559
If desired, the current release version of the package can be cited as:
Archer, F. 2025. strataG: An R package for manipulating, summarizing and analysing population genetic data. R package version 1.0.6. Zenodo. http://doi.org/10.5281/zenodo.60416
- submit suggestions and bug-reports: https://github.com/ericarcher/strataG/issues
- send a pull request: https://github.com/ericarcher/strataG/
- e-mail: [email protected]
- IMPORTANT: There was an error in computing observed heterozygosity (in `heterozygosity`, `summarizeLoci()`, and `summarizeInds()`) where the number of genotypes with missing data was not being taken into account. This led to an underestimate of heterozygosity that was directly proportional to the amount of missing data.
- added `zygosity()`
- added `diagnosability()`
- added `microhaplot2rubias()`
- added `qual2prob()`
- made `rfPermute` and `rmetasim` packages Suggested
- removed `melt` from `structurePlot`
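To illustrate the observed-heterozygosity error described above, the base-R sketch below scores a toy diploid locus in which two of six genotypes are missing. Dividing the number of heterozygotes by all genotypes, rather than by the non-missing ones, multiplies the estimate by the fraction of non-missing genotypes, which is why the underestimate grew with the amount of missing data. The data are made up, and this mirrors the description of the bug, not strataG's actual code.

```r
# One diploid locus: allele1[i] and allele2[i] form individual i's genotype.
# NA marks a missing genotype.
allele1 <- c("A", "A", "B", "A", NA, NA)
allele2 <- c("B", "A", "B", "B", NA, NA)

het <- allele1 != allele2   # TRUE = heterozygote, NA = missing genotype

# Biased: denominator counts every genotype, including missing ones
ho_biased <- sum(het, na.rm = TRUE) / length(het)

# Corrected: denominator counts only non-missing genotypes
ho_correct <- sum(het, na.rm = TRUE) / sum(!is.na(het))

c(biased = ho_biased, correct = ho_correct)
```

With 2 heterozygotes among 4 scored genotypes, the corrected value is 0.5, while the biased value is 2/6, i.e. 0.5 scaled by the 4/6 of genotypes that are not missing.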
- fixed ldNe error when one individual is present
- fixed `mafft` error; mafft .fasta files are now written to a temporary file rather than the working directory
- fixed error with `readGenData()` not recognizing `NA`s
- fixed error with `fs2gtypes()` not formatting multi-block DNA sequence data as gtypes properly
- added `gtypesRF()`, `sequenceRF()`, and `gtypes2rfDF()`
- Deleted functions: `alleleFreqFormat`, `as.array.gtypes`
- Changed structure of `gtypes` object, making it no longer compatible with previous versions
- Fixed and enhanced `arlequinRead()` so that it will read and parse all .arp files. Added `arp2gtypes()` to create a `gtypes` object from parsed .arp files.
- Improved performance of several standard summary functions, most notably `dupGenotypes()`.
- Full rework of fastsimcoal2 wrapper.
- Removed `strataGUI()`.
- fixed error in ldNe when missing data are present
- added STANDARD marker type to fastsimcoal
- added `na.rm = TRUE` to calculation of mean locus summaries by strata in `summary.gtypes`. This avoids `NaN`s when there is a locus with genotypes missing for all samples.
- explicitly convert `x` to a `data.frame` in `df2gtypes` in case it is a `data.table` or `tibble`.
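The `na.rm = TRUE` change above can be illustrated in base R: a locus with genotypes missing for all samples gives 0/0 = `NaN`, and a single `NaN` makes an unguarded mean `NaN`. The counts below are hypothetical, and this is not the `summary.gtypes` code itself.

```r
# Hypothetical counts for three loci within one stratum;
# loc3 has no genotyped samples, so its heterozygosity is 0/0 = NaN.
n_het   <- c(loc1 = 6, loc2 = 3, loc3 = 0)
n_typed <- c(loc1 = 10, loc2 = 10, loc3 = 0)
ho <- n_het / n_typed

mean(ho)                 # NaN: the empty locus propagates through the mean
mean(ho, na.rm = TRUE)   # 0.45: is.na(NaN) is TRUE, so na.rm drops it
```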
- NOTE: In order to speed up indexing the data in large data sets, this version changes the underlying structure of the `gtypes` object by replacing the `@loci` data.frame slot with a `@data` data.table slot. The data.table has an `id` character column, a `strata` character column, and every column afterwards represents one locus. The `@strata` slot has been removed.
- The `loci` accessor has been removed.
- Added `as.array`, which returns a 3-dimensional array with dimensions of [id, locus, allele].
- The print (show) function for `gtypes` objects no longer shows a by-locus summary. The display was getting too slow for data sets with a large number of loci.
- The `summary` function now includes by-sample results.
- Fixed computational errors in population structure metrics due to incorrect sorting of stratification.
- Added `maf` to return minimum allele frequency for each locus.
- Added `ldNe` to calculate Ne.
- Added `expandHaplotypes` to expand the haplotypes in a `gtypes` object to one sequence per individual.
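As a rough illustration of a per-locus minimum allele frequency, the base-R sketch below computes the frequency of the rarest allele at each locus. The table loosely follows the `@data` layout described above (an `id` column, a `strata` column, then one column per locus), but storing one row per allele copy is an assumption made here, and `min_allele_freq()` is a hypothetical helper, not strataG's `maf`.

```r
# Hypothetical toy data: id, strata, then one column per locus,
# ASSUMING one row per allele copy (two rows per diploid individual).
g <- data.frame(
  id     = rep(c("s1", "s2", "s3"), each = 2),
  strata = rep(c("north", "north", "south"), each = 2),
  loc1   = c("A", "A", "A", "B", "B", "B"),
  loc2   = c("C", "C", "C", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# Frequency of the rarest observed allele at one locus, ignoring NAs
min_allele_freq <- function(x) {
  freqs <- table(x) / sum(!is.na(x))
  min(freqs)
}

# Apply to every locus column (drop the id and strata columns)
maf_by_locus <- sapply(g[-(1:2)], min_allele_freq)
maf_by_locus
```

At `loc1` the two alleles are equally common (0.5 each); at `loc2` allele D appears once in six copies, giving 1/6.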
- Added `read.arlequin` back. Fixed missing function error with `write.arlequin`.
- Added `summarizeSamples`
- Changed `evanno` from base graphics to ggplot2
- Updated logic in `labelHaplotypes` to assign haplotypes if possible alternative site combinations match a present haplotype
- Added Zenodo DOI
- Added shiny app (`strataGUI`) for creating gtypes objects, QA/QC, and population structure analyses
- Added `type` argument to `structurePlot` to select between area and bar charts
- Changed `haplotypeLikelihoods` to `sequenceLikelihoods`
- `neiDa` now creates haplotypes before calculating metric
- Fixed error in `writePhase` that was creating improper input files for PHASE
- Fixed error in dupGenotypes, propSharedLoci, and propSharedIDs where missing genotypes were not being properly counted.
- Added as.data.frame.gtypes.
- Removed gtypes2df.
- Added arguments to as.matrix.gtypes to include id and strata columns in output.
- Removed the jmodeltest function as this functionality is available in the modeltest function in the phangorn package.
- Added conversion functions gtypes2phyDat and phyDat2gtypes to facilitate interoperability with the phangorn package.
- Removed read.arlequin.
- Added alleleNames accessor for gtypes object, which returns list of allele names for each locus.
- New version with different gtypes format from previous versions. See vignettes for instructions and examples.