The following scientific computing software is available on our HPC cluster systems under the directory /share1.

* Where several versions are installed, an asterisk marks the default version.

Licensed/Open Source Software Applications

Bioinformatics
Chemistry and Molecular Modeling
Physics and Materials Science
Mathematics and Statistics
  • Software HPC2021
    MATLAB R2021a, R2021b, R2022b
    R 4.0.4*, 4.1.2, 4.2.1
    STATA 16.1, 17.0, 18.0*
Data Analysis and Machine Learning

Utilities and Libraries

Compilers and programming languages

Parallel Libraries

Math Libraries
Programming Utilities
  • Software HPC2021
    HDF5 1.10.7, 1.12.2*
    MPC 1.2.1
    MPFR 4.1.0
Visualization and Plotting
  • Software HPC2021
    Gnuplot 5.4.2
    ParaView 5.9.0, 5.10.0*
    VMD 1.9.3

List of available modules on HPC2021 system

Software Description Available versions Keywords
abaqus ABAQUS – Software suite for finite element analysis and computer-aided engineering.
abaqus/2020
abaqus/2021
abaqus/2022
abaqus/2023

(Default)

Finite Element Analysis, Computer-aided Engineering
abricate Mass screening of contigs for antimicrobial and virulence genes
abricate/1.0.0
Virus
ABySS ABySS is a de novo sequence assembler intended for short paired-end reads and genomes of all sizes
ABySS/2.3.3
Genome Assembler
adf ADF: Package that uses Density Functional Theory (DFT) to predict chemical structure and reactivity for electronic and molecular structure calculations.
adf/2019
Density Functional Theory, Spectroscopy, Transition Metal, Heavy Elements
AHRD Automated Assignment of Human Readable Descriptions (AHRD)
AHRD/3.3.3

 

Gene/Protein Annotation
alphafold AlphaFold: AI program developed by Google’s DeepMind that predicts protein structures
alphafold/2.1.0
alphafold/2.1.1
alphafold/2.1.2

(Default)

Structural Bioinformatics, Protein Structure Prediction, AI
anaconda Anaconda: Python Data Science Platform for Python 3
anaconda/py3.8

 

Data Science, Conda, Python, Jupyter
ancestry_hmm-s Inferring adaptive introgression from genomic data using hidden Markov models
ancestry_hmm-s/0.9.0.2

 

Population Genomics
AnnotSV AnnotSV: An integrated tool for Structural Variations annotation and ranking
AnnotSV/3.1

 

Annotation, SV, CNV, Target Prioritization
ANNOVAR ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes
ANNOVAR/2020-06-07
NGS, Annotation
aocc AOCC – AMD Optimizing C/C++ Compiler
aocc/3.1.0

 

aocc/3.2.0

(Default)

AMD, EPYC, Compiler
aocl/aocc AMD Optimizing CPU Libraries (AOCL)
aocl/aocc/3.0-6

 

aocl/aocc/3.1.0

(Default)

AMD, EPYC, Numerical Libraries
aocl/gcc AMD Optimizing CPU Libraries (AOCL)
aocl/gcc/3.0-6

 

aocl/gcc/3.1.0

(Default)

AMD, EPYC, Numerical Libraries
arlequin Arlequin: An Integrated Software for Population Genetics Data Analysis
arlequin/3.5.2.2

 

Population Genetics, Molecular Ecology
aspera IBM Aspera Command-Line Interface (the Aspera CLI) is a collection of Aspera tools for performing high-speed, secure data transfers from the command line.
aspera/3.9.6
Data Transfer
augustus AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences.
augustus/3.4.0

 

Eukaryotic gene prediction
automake Automake – Makefile builder, part of Autotools
automake/1.16.3

 

Makefile, Configure Tool
axel axel: Lightweight CLI download accelerator
axel/2.17.11

(Default)

Data Download
bamtools BamTools provides both a programmer’s API and an end-user’s toolkit for handling BAM files.
bamtools/2.5.2

 

NGS, Data Format, BAM
BASTA https://github.com/timkahlke/BASTA.
BASTA/1.4.1

 

Taxonomy Assignment
BBmap BBMap: Short read aligner for DNA and RNA-seq data. Capable of handling arbitrarily large genomes with millions of scaffolds
BBmap/38.93

 

NGS, Aligner, Short-read
bc bc is an arbitrary precision numeric processing language. https://www.gnu.org/software/bc
bc/1.07.1

 

Calculator
bcftools BCFtools is a program for variant calling and manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF.
bcftools/1.14

 

NGS, Data Format, VCF
bcl2fastq The Illumina bcl2fastq2 Conversion Software demultiplexes sequencing data and converts base call (BCL) files into FASTQ files.
bcl2fastq/2.19.0

 

NGS, Base-calling, Illumina
BEAGLE/4.0.0 BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages.
BEAGLE/4.0.0/amd
BEAGLE/4.0.0/gpu
BEAGLE/4.0.0/intel

(Default)

Phylogenetics
BEAST BEAST is a cross-platform program for Bayesian analysis of molecular sequences using MCMC. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models.
BEAST/1.10.4

 

Phylogenetics
BEAST2 BEAST 2 is a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. It estimates rooted, time-measured phylogenies using strict or relaxed molecular clock models.
BEAST2/2.6.7

 

Phylogenetics
bedtools bedtools – the swiss army knife for genome arithmetic
bedtools/2.30.0

 

NGS, Data Format, BAM, BED, GFF, GTF, VCF
berkeleydb Oracle Berkeley DB
berkeleydb/18.1.40

 

embedded key-value database
bismark Bismark is a tool to map bisulfite converted sequence reads and determine cytosine methylation states
bismark/0.23.1

 

NGS, Bisulfite Sequencing, Methylation Call
blast-plus BLAST finds regions of similarity between biological sequences.
blast-plus/2.13.0

 

Alignment, Sequence Query
boost/gcc Boost provides free peer-reviewed portable C++ source libraries, emphasizing libraries that work well with the C++ Standard Library.
boost/gcc/1.77.0

 

boost/gcc/1.80.0

(Default)

C++ Libraries
bowtie Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour.
bowtie/1.3.1

 

NGS, Aligner
bowtie2 Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
bowtie2/2.4.4

(Default)

NGS, Aligner
Bracken Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
Bracken/2.6.2

 

NGS, Metagenomics
BRAKER BRAKER2 is an extension of BRAKER1 which allows for fully automated training of the gene prediction tools GeneMark-EX R14, R15, R17, F1 and AUGUSTUS from RNA-Seq and/or protein homology information, and that integrates the extrinsic evidence from RNA-Seq and protein homology information into the prediction.
BRAKER/2.1.6

 

Gene structure annotation
bsmap BSMAP is a short reads mapping software for bisulfite sequencing reads.
bsmap/2.9.0

 

NGS, Bisulfite Sequencing, Genome Mapping
busco BUSCO – Benchmarking sets of Universal Single-Copy Orthologs
busco/5.3.2

 

Ortholog
Canu Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing.
Canu/2.2

 

NGS, Genome Assembler
CellPhoneDB CellPhoneDB is a publicly available repository of curated receptors, ligands and their interactions. Subunit architecture is included for both ligands and receptors, representing heteromeric complexes accurately
CellPhoneDB/2.1.7
Receptors / Ligands database
CellProfiler CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.
CellProfiler/4.2.1

 

Cell Imaging
CellRanger Cell Ranger is a set of analysis pipelines that process Chromium single-cell data to align reads, generate feature-barcode matrices, perform clustering and other secondary analysis, and more.
CellRanger/6.1.2

 

NGS, RNA-Seq, Single Cell
CheckM CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage
CheckM/1.1.3
Metagenomics, Quality Control
CheckV CheckV is a fully automated command-line pipeline for assessing the quality of single-contig viral genomes, including identification of host contamination for integrated proviruses, estimating completeness for genome fragments, and identification of closed genomes
CheckV/0.8.1
Metagenomics, viral genomes
cmake A cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.
cmake/3.19.7

 

Make, Configure Tool
CMSeq CMSeq is a set of commands to provide an interface to .bam files for coverage and sequence consensus
CMSeq/1.0.4

 

NGS, Data Format, BAM
CNVnator a tool for CNV discovery and genotyping from depth-of-coverage by mapped reads
CNVnator/0.4.1

 

NGS, Structural Variant, CNV
CNVpytor CNVpytor is a Python extension of CNVnator, a tool for CNV analysis from depth-of-coverage by mapped reads
CNVpytor/1.0

 

NGS, Structural Variant, CNV
comsol COMSOL.
comsol/6.0
comsol/6.1
conos/R-4.1.2 R package wires together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections.
conos/R-4.1.2/1.4.4

 

NGS, RNA-seq, Single Cell
cpmd CPMD – Car-Parrinello Molecular Dynamics simulations.
cpmd/4.1

 

cpmd/4.3-impi2020u4

 

cpmd/4.3

(Default)

Density Functional Theory, ab-initio molecular dynamics
cp2k CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal and biological systems.
cp2k/2023.1
Quantum Chemistry, Simulations, Atoms
CTFFIND CTFFIND4: Fast and accurate defocus estimation from electron micrographs
CTFFIND/4.1.14

 

Cryo-EM, Micrograph
cuda NVIDIA CUDA Toolkit – comprehensive development environment for C and C++ developers building GPU-accelerated applications
cuda/11.2

 

NVIDIA, CUDA, GPU
cuda-toolkit NVIDIA CUDA Toolkit – comprehensive development environment for C and C++ developers building GPU-accelerated applications
cuda-toolkit/11.7

 

NVIDIA, CUDA, GPU
cudnn NVIDIA CUDNN Library – CUDA-based Deep Neural Network library
cudnn/8.2.4-cuda11.4

 

NVIDIA, GPU, CUDA, cuDNN
cufflinks Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
cufflinks/2.2.1

 

NGS, RNA-seq
curl CURL is an open source command line tool and library for transferring data with URL syntax
curl/7.75.0

 

Downloader
cutadapt Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
cutadapt/3.4
cutadapt/3.5

(Default)

Bioinformatics, sequence trimming
cytoscape Cytoscape is an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
cytoscape/3.9.1

 

Network analysis
dadi dadi implements methods for demographic history and selection inference from genetic data, based on diffusion approximations to the allele frequency spectrum.
dadi/2.1.2

 

Demographic Inference
deepTools deepTools addresses the challenge of handling the large amounts of data that are now routinely generated from DNA sequencing centers. deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control)
deepTools/3.5.1

 

NGS, Quality Control, Visualization
delly DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
delly/0.9.1

 

NGS, Structural Variant
DensityMap DensityMap is a Perl tool for the visualization of feature density along chromosomes
DensityMap/1.0

 

Chromosomes, Visualizations
DESeq2/R-4.1.2 DESeq2: Differential gene expression analysis based on the negative binomial distribution.
DESeq2/R-4.1.2/1.34.0

 

NGS, RNA-seq
diamond DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.
diamond/2.0.9
diamond/2.0.13

(Default)

Aligner
dotnet-sdk .NET is a free and open-source, managed computer software framework for Windows, Linux, and macOS operating systems.
dotnet-sdk/3.1.100

 

.NET runtime
DoubletFinder/R-4.1.2 DoubletFinder is an R package that predicts doublets in single-cell RNA sequencing data.
DoubletFinder/R-4.1.2/2.0

 

NGS, RNA-seq, Single Cell
dRep dRep is a python program for rapidly comparing large numbers of genomes. dRep can also ‘de-replicate’ a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set.
dRep/3.2.2

 

Metagenomics, Microbial-genomics
DROP Detection of aberrant gene expression events in RNA sequencing data
DROP/1.1.1

 

NGS, RNA-Seq, Single Cell
Dsuite Dsuite: Fast calculation of Patterson’s D (ABBA-BABA) and the f4-ratio statistics across many populations/species
Dsuite/0.5_r44

 

Population Genetics, Molecular Ecology
EBSeq/R-4.1.2 EBSeq: An R package for gene and isoform differential expression analysis of RNA-seq data
EBSeq/R-4.1.2/1.34.0

 

NGS, RNA-seq, Single Cell
eigen Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
eigen/3.4.0

 

C++ template, Linear Algebra, Matrices, Vectors
eigensoft The EIGENSOFT package implements methods from the following 2 papers: Patterson et al. 2006 PLoS Genet 2:e190 [population structure], Price et al. 2006 Nat Genet 38:904-9 [EIGENSTRAT stratification correction]
eigensoft/7.2.1

 

Population Stratification
ensembl-vep Ensembl Variant Effect Predictor (VEP) determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions
ensembl-vep/103.1
ensembl-vep/104.3

(Default)

NGS, Variant Effect Annotator
entrez-direct Entrez Direct (EDirect) is an advanced method for accessing the NCBI’s set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window.
entrez-direct/16.2

 

Sequence Retrieval
EthSEQ/R-4.1.2 EthSEQ: Ethnicity Annotation from Whole Exome Sequencing Data.
EthSEQ/R-4.1.2/2.1.4

 

NGS, Ethnicity Analysis
evidencemodeler The EVidenceModeler (aka EVM) software combines ab initio gene predictions and protein and transcript alignments into weighted consensus gene structures.
evidencemodeler/1.1.1

 

Gene prediction
exonerate Exonerate is a generic tool for pairwise sequence comparison. It allows you to align sequences using many alignment models, with either exhaustive dynamic programming or a variety of heuristics
exonerate/2.4.0

 

Sequence alignment
FastANI FastANI is developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). ANI is defined as mean nucleotide identity of orthologous gene pairs shared between two microbial genomes
FastANI/1.32

 

Microbiology, Genome assembly comparison
fastp fastp is a tool designed to provide fast all-in-one preprocessing for FastQ files
fastp/0.23.2

 

NGS, Data Format, fastq
FastQC FastQC is a program designed to spot potential problems in high throughput sequencing datasets. It runs a set of analyses on one or more raw sequence files in fastq or bam format and produces a report which summarises the results.
FastQC/0.11.9

 

NGS, fastq, Quality Control
FastQScreen FastQ-Screen is used for detecting contamination in NGS data and multi-species analysis.
FastQScreen/0.15.2

 

NGS, fastq, Quality Control
FastSimCoal2 FastSimCoal2 – fast sequential Markov coalescent simulation of genomic data under complex evolutionary models.
FastSimCoal2/fsc27-binary

 

Evolutionary Model, Genome Simulation
FastTree FastTree infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences.
FastTree/2.1.10

 

Phylogenetics, 16S rRNA
FASTX-Toolkit The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
FASTX-Toolkit/0.0.14

 

NGS, Data Format, fastq
FEOS The equation of state package FEOS for high energy density matter
FEOS/20130701

 

Equation of State; High Energy Density matter
ffmpeg ffmpeg: Cross-platform solution to record, convert and stream audio and video.
ffmpeg/5.1.0

(Default)

Audio and Video conversion
fftw FFTW – Software library implementation of the Fast Fourier Transform (FFT) algorithm for computing the Discrete Fourier Transform (DFT), compiled with MPICH libraries
fftw/3.3.9-gcc10.2
fftw/3.3.9

(Default)

Fast Fourier Transform
fgbio fgbio is a set of tools to analyze genomic data with a focus on Next Generation Sequencing
fgbio/1.4.0

 

NGS
FRASER/R-4.1.2 Detection of rare aberrant splicing events in transcriptome profiles. The workflow aims to assist the diagnostics in the field of rare diseases where RNA-seq is performed to identify aberrant splicing defects.
FRASER/R-4.1.2/1.6.0

 

NGS, RNA-seq, Splicing
freebayes freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment
freebayes/1.3.5

 

NGS, SNP, Variant
FreeSurfer FreeSurfer is a software package for the analysis and visualization of structural and functional neuroimaging data from cross-sectional or longitudinal studies
FreeSurfer/7.3.2

 

Neuroimaging
fsl FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data
fsl/6.0.6.2

 

Neuroimaging
FusionCatcher FusionCatcher is a finder of Somatic Fusion Genes in RNA-seq data.
FusionCatcher/1.33

 

NGS, RNA-Seq
gatk GATK is a collection of command-line tools for analyzing high-throughput sequencing data with a primary focus on variant discovery
gatk/4.1.5.0
gatk/4.2.4.0

(Default)

NGS, Variant, CNV, Genome Mapping
gaussian Gaussian: A computational chemistry software of electronic structure modeling
gaussian/g09d01
gaussian/g16a03-avx2
gaussian/g16c01-avx2

(Default)

Computational Chemistry, Quantum Chemistry
gcc GCC – GNU Compiler Collection includes Fortran, C, C++ compilers and libraries for these languages
gcc/9.2
gcc/10.2

(Default)

Compiler, C, C++, Fortran
GCTF Gautomatch – Fully automatic, accurate, convenient and extremely fast particle picking for EM
GCTF/0.56

 

CryoEM
gdal GDAL is a translator library for raster and vector geospatial data formats that is released under an X/MIT style Open Source license by the Open Source Geospatial Foundation. As a library, it presents a single raster abstract data model and vector abstract data model to the calling application for all supported formats. It also comes with a variety of useful command line utilities for data translation and processing.
gdal/3.2.2
gdal/3.3.2
gdal/3.3.3

(Default)

Geospatial
GEMMA GEMMA is a software toolkit for fast application of linear mixed models (LMMs) and related models to genome-wide association studies (GWAS) and other large-scale data sets
GEMMA/0.98.3

 

Statistical Genetics, GWAS
GeneMark-ES GeneMark-ES algorithm identifies protein coding genes in eukaryotic genomes. This is the only eukaryotic gene finder that can perform gene prediction without curated training sets.
GeneMark-ES/4.68

 

Eukaryotic gene prediction
genomethreader GenomeThreader is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments.
genomethreader/1.7.1

 

Gene prediction
geos GEOS is a C/C++ library for spatial computational geometry of the sort generally used by “geographic information systems” software. GEOS is a core dependency of PostGIS, QGIS, GDAL, and Shapely.
geos/3.8.2
geos/3.9.1
geos/3.10.0

(Default)

Geometry Engine, Geospatial
GffCompare GffCompare is a tool to classify, merge, track and annotate GFF files by comparing to a reference annotation GFF
GffCompare/0.11.2

 

Genome Annotation
GISTIC GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers.
GISTIC/2.0.23

 

Oncology, Oncogenomics, Somatic Variant
glimmerhmm GlimmerHMM is a new gene finder based on a Generalized Hidden Markov Model (GHMM).
glimmerhmm/3.0.4

 

Eukaryotic gene prediction
gmp GMP – The GNU Multiple Precision Arithmetic Library
gmp/6.2.1

 

gnuparallel GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input
gnuparallel/20211222

(Default)

Utilities, Parallel
gnuplot Gnuplot is a portable command-line driven graphing utility
gnuplot/5.4.2

 

Visualization, Plotting
go The Go Programming Language
go/1.18.5
go/1.19.4

(Default)

Programming Language
googletest Google test framework for C++. Also called gtest.
Google, Testing Library
gpumd GPUMD – Graphics Processing Units Molecular Dynamics
gpumd/2.7

 

GPU, Molecular Dynamics
graphviz The Graphviz layout programs take descriptions of graphs in a simple text language, and make diagrams in several useful formats such as images and SVG for web pages, Postscript for inclusion in PDF or other documents; or display in an interactive graph browser.
graphviz/2.50.0

 

Plotting, Visualization
gromacs GROMACS is a molecular dynamics package mainly designed for simulations of proteins, lipids, and nucleic acids.
gromacs/2021.3

(Default)

Molecular Dynamics, Protein, Lipid, DNA, Nucleic Acid
gsl/gcc GSL – GNU Scientific Library
gsl/gcc/2.7

 

Numerical Library, C, C++
gsl/intel GSL – GNU Scientific Library
gsl/intel/2.7

 

Numerical Library, C, C++
harfbuzz An OpenType text shaping engine
harfbuzz/5.3.0

 

OpenType
harmony/R-4.1.2 harmony: Scalable integration of single cell RNAseq data for batch correction and meta analysis
harmony/R-4.1.2/0.1

 

NGS, RNA-seq, Single Cell
hdf5/gcc HDF5 – suite for managing extremely large and complex data collections.
hdf5/gcc/1.10.7-gcc8.3.1
hdf5/gcc/1.12.2-gcc8.3.1

(Default)

Hierarchical Data Format
hdf5/impi HDF5 – suite for managing extremely large and complex data collections.
hdf5/impi/1.10.7-impi2021
hdf5/impi/1.12.2-impi2022

(Default)

Hierarchical Data Format
HISAT2 HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome
HISAT2/2.2.1

 

NGS, aligner
HMMcopy/R-4.1.2 HMMcopy: Copy number prediction with correction for GC and mappability bias for HTS data
HMMcopy/R-4.1.2/1.36.0

 

NGS, Structural Variant
HMMER HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
HMMER/3.3.2

 

Sequence Analysis, Sequence Clustering
HOMER HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and ChIP-Seq analysis, primarily written as a de novo motif discovery algorithm that is well suited for finding 8-12 bp motifs in large scale genomics data.
HOMER/4.11

 

NGS, ChIP-seq
HTSeq HTSeq is a Python library to facilitate programmatic analysis of data from high-throughput sequencing (HTS) experiments.
HTSeq/1.99.2

 

NGS, RNA-seq
htslib HTSlib is an implementation of a unified C library for accessing common file formats, such as SAM, CRAM and VCF, used for high-throughput sequencing data, and is the core library used by samtools and bcftools.
htslib/1.14

 

NGS, Data Format, VCF
HUMAnN2 HUMAnN 2.0 is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).
HUMAnN2/2.8.1

 

Metagenomics, Microbial Profiling
HUMAnN3 HUMAnN 3.0 is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).
HUMAnN3/3.0.0

 

Metagenomics, Microbial Profiling
hyphy An open-source software package for comparative sequence analysis using stochastic evolutionary models.
hyphy/2.5.42
hyphy/2.5.51

(Default)

Comparative Genomics, Evolution
IGV The Integrative Genomics Viewer (IGV) is a high-performance, easy-to-use, interactive tool for the visual exploration of genomic data. It supports flexible integration of all the common types of genomic data and metadata, investigator-generated or publicly available, loaded from local or cloud sources.
IGV/2.11.4
IGV/2.15.4

(Default)

Genome, Visualization
IGV-snapshot-automator IGV Snapshot Automator is a script to automatically create and run IGV snapshot batchscripts. This script will first write an IGV batch script for the supplied input files, then load all supplied files for visualization (.bam, etc) in a headless IGV session and take snapshots at the locations defined in the regions.bed file.
IGV-snapshot-automator/20.11.1

 

Genome, Visualization
imagemagick Software suite to create, edit, compose, or convert bitmap images.
imagemagick/7.1.0.43

 

Graphics, Images
impi Intel C/C++/Fortran Compilers with Intel MPI Libraries and profiler tools.
impi/2019u4
impi/2020u4
impi/2021.1
impi/2021.4
impi/2022.1
impi/2022.2

(Default)

Intel, MPI, C, C++, Fortran, Compiler
IMPUTE2 IMPUTE version 2 (also known as IMPUTE2) is a genotype imputation and haplotype phasing program based on ideas from Howie et al. 2009
IMPUTE2/2.3.2

 

GWAS, Genotype Imputation
InferCNV/R-4.1.2 InferCNV: Inferring copy number alterations from tumor single cell RNA-Seq data
InferCNV/R-4.1.2/1.3.3

 

NGS, RNA-seq, Single Cell
Infernal Infernal (INFERence of RNA ALignment) is for searching DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs).
Infernal/1.1.4

 

Homolog Search, RNA Alignment
inStrain InStrain is a tool for analysis of co-occurring genome populations from metagenomes that allows highly accurate genome comparisons, analysis of coverage, microdiversity, and linkage, and sensitive SNP detection with gene localization and synonymous non-synonymous identification
inStrain/1.5.5

 

Metagenomics
intel Intel C/C++/Fortran Compilers with Intel MKL: Optimized compilers, math libraries with debug and tuning tools.
intel/2019u4
intel/2020u4
intel/2021.1
intel/2021.4
intel/2022.1
intel/2022.2

(Default)

Intel, Compiler, C, C++, Fortran
InterProScan InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites.
InterProScan/5.52_86.0
InterProScan/5.54_87.0
InterProScan/5.59_91.0

(Default)

Protein functional classifications
IQ-TREE IQ-TREE is a fast and effective stochastic algorithm to infer phylogenetic trees by maximum likelihood.
IQ-TREE/1.6.12

 

IQ-TREE/2.1.3

(Default)

Phylogenetics
JAGS JAGS: Just Another Gibbs Sampler
JAGS/4.3.0

 

MCMC simulation, Gibbs Sampler
jsonc A JSON implementation in C.
jsonc/0.13.1

 

jsonc/0.15

(Default)

json-parse/Perl-5.34.0 json-parse: A PERL module for parsing JSON
json-parse/Perl-5.34.0/0.61

 

Perl, JSON, Parser
julia Julia – a high-level programming language for numerical computing.
julia/1.6.1

(Default)

Numerical Computing
kaiju Kaiju is a program for the taxonomic classification of high-throughput sequencing reads, e.g., Illumina or Roche/454, from whole-genome sequencing of metagenomic DNA
kaiju/1.8.2

 

Metagenomics, Taxonomy Classification
kallisto kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.
kallisto/0.46.2

 

NGS, RNA-seq, Single Cell
kneaddata KneadData is a tool designed to perform quality control on metagenomic and metatranscriptomic sequencing data, especially data from microbiome experiments.
kneaddata/0.10.0

 

Metagenomics, Quality Control
kofamscan KofamKOALA assigns K numbers to the user’s sequence data by HMMER/HMMSEARCH against KOfam
kofamscan/1.3.0

 

Annotation, Pathway
KOMB KOMB: Taxonomy-oblivious characterization of metagenome dynamics
KOMB/1.0

 

Metagenomics, Functional Analysis
kraken2 Kraken is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database.
kraken2/2.1.2

 

Metagenomics, Taxonomy Classification
krona Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.
krona/2.8.1

 

Metagenomics, Taxonomy Classification
lammps LAMMPS stands for Large-scale Atomic/Molecular Massively Parallel Simulator. It is a classical molecular dynamics simulation code that models an ensemble of particles in a liquid, solid, or gaseous state.
lammps/20210929

 

lammps/20220803

(Default)

Molecular Dynamics, Simulations, Atoms
LASTZ LASTZ is a program for aligning DNA sequences, a pairwise aligner. Originally designed to handle sequences the size of human chromosomes and from different species.
LASTZ/1.04.15

 

NGS, DNA Aligner
LDhat LDhat: Estimate recombination rates from population genetic data
LDhat/2.2a

 

Population Genetics
LDhelmet Software package for estimating fine-scale recombination rate.
LDhelmet/1.9

 

Population Genetics
libgeotiff GeoTIFF represents an effort by over 160 different remote sensing, GIS, cartographic, and surveying related companies and organizations to establish a TIFF based interchange format for georeferenced raster imagery.
libgeotiff/1.6.0

 

libjpeg-turbo Libjpeg-turbo is a fork of the original IJG libjpeg which uses SIMD to accelerate baseline JPEG compression and decompression.
libjpeg-turbo/2.0.6

 

libpng Libpng is the official PNG reference library.
libpng/1.6.37

 

libtiff LibTIFF – Tag Image File Format (TIFF) Library and Utilities
libtiff/4.2.0

 

libxml2 Libxml2 is the XML C parser and toolkit developed for the Gnome project
libxml2/2.9.10

 

LIGGGHTS LIGGGHTS® is an Open Source Discrete Element Method Particle Simulation Software
LIGGGHTS/3.8.0

 

Molecular Dynamics, Simulations, Atoms
MAFFT MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc
MAFFT/7.490

 

Sequence Analysis, Sequence Clustering
maker MAKER is a portable and easily configurable genome annotation pipeline. Its purpose is to allow smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases.
maker/3.01.03

 

Genome Annotation
manta Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs
manta/1.6.0

 

NGS, Structural Variant
MaSuRCA The MaSuRCA (Maryland Super Read Cabog Assembler) genome assembly and analysis toolkit consists of the MaSuRCA genome assembler, the QuORUM error corrector for Illumina data, the POLCA genome polishing software, a chromosome scaffolder, the jellyfish mer counter, and the MUMmer aligner.
MaSuRCA/4.0.9

 

Genome Assembler
matlab MATLAB – High-level technical computing language for data analysis and numerical computation.
matlab/r2021a
matlab/r2021b
matlab/r2022b
matlab/r2023b

(Default)

Numerical Computing
maxquant MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. License restricted
maxquant/2.2.0

 

Proteomics, Mass Spectrometry, MS
mcl MCL, the Markov Cluster algorithm, also known as Markov Clustering, is a method and program for clustering weighted or simple networks, a.k.a. graphs.
mcl/14.137

 

Graph
MEGA MEGA: Software package for phylogenetic analysis with a graphical user interface. It allows viewing and editing of the aligned input sequence data and provides many tools for phylogenetic and statistical analysis of the alignments.
MEGA/11.0.10

 

Phylogenetics
MEGAHIT MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single genome assembly (small or mammalian size) and single-cell assembly.
MEGAHIT/1.2.9

 

Metagenomics, Genome Assembler
meme The MEME Suite is a collection of motif-based sequence analysis tools.
meme/5.4.1

(Default)

Motif Sequence Analysis
MetaBAT MetaBAT: A robust statistical framework for reconstructing genomes from metagenomic data
MetaBAT/2.15

 

Metagenomics, Taxonomy Classification
MetaPhlAn MetaPhlAn ‘Metagenomic Phylogenetic Analysis’ is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
MetaPhlAn/3.0.13
MetaPhlAn/4.0.3

(Default)

Metagenomics, Microbial Profiling
Metaxa2 Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data
Metaxa2/2.2

 

Metagenomics, Taxonomy Classification
miniconda/py39 Miniconda: an open source package management system and environment management system
miniconda/py39/4.10.3

 

Python, Conda, Installer, Package
minimap2 minimap2 is a versatile pairwise aligner for genomic and spliced nucleotide sequences
minimap2/2.23

(Default)

Aligner
mitofinder MitoFinder – efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data
mitofinder/1.4.1

 

Bioinformatics, Mitochondria, NGS
mity mity: A highly sensitive mitochondrial variant analysis pipeline for whole genome sequencing data
mity/0.3.0

 

Mitochondrial variant
mkl Intel Math Kernel Library (MKL)
mkl/2020u4
mkl/2021.1
mkl/2021.4
mkl/2022.1
mkl/2022.2

(Default)

Math Routine, BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms
mlst Scan contig files against traditional PubMLST typing schemes
mlst/2.22.1

 

Sequence Typing, Bacteria
momap MOMAP – Molecular Material Property Prediction Package, a suite of programs for predicting the properties of polyatomic molecules.
momap/2021A-mpich2

 

Molecular Material
mOTUs mOTUs is a tool for microbial abundance, activity and population genomic profiling
mOTUs/2.1.1

 

Metagenomics, Microbial Profiling
mpc MPC – The GNU Multiple Precision C Library
mpc/1.2.1

 

mpfr MPFR – The GNU Multiple Precision Floating-Point Library
mpfr/4.1.0

 

mpich/gcc Message Passing MPICH libraries with GNU Compiler for parallel and distributed computing.
mpich/gcc/3.4.2-gcc8.3.1

 

MPI, Parallel, Distributed
mpich/intel Message Passing MPICH libraries with Intel Compiler for parallel and distributed computing.
mpich/intel/3.4.2-intel2021

 

MPI, Parallel, Distributed
MSGFgui/R-4.1.2 MSGFgui: This package makes it possible to perform analyses using the MSGFplus package in a GUI environment.
MSGFgui/R-4.1.2/1.28.0

 

Mass Spectrometry
MultiQC MultiQC is a tool to create a single report with interactive plots for multiple bioinformatics analyses across many samples.
MultiQC/1.11

 

NGS, Quality Control
MUMMER MUMmer is a versatile alignment tool for DNA and protein sequences
MUMMER/3.23

 

Aligner
muscle MUSCLE: multiple sequence alignment with high accuracy and high throughput.
muscle/5.1

 

Multiple Sequence Alignment
nasm NASM (Netwide Assembler) is an 80x86 assembler designed for portability and modularity. It includes a disassembler as well.
nasm/2.15.05

 

x86 Assembly
NCL NCL (NCAR Command Language).
NCL/6.6.2

 

NeEstimator NeEstimator V2.1 estimates contemporary effective population size (Ne) using multi-locus diploid genotypes from population samples.
NeEstimator/2.1

 

Population Genetics, Molecular Ecology
NetLogo NetLogo is a multi-agent programmable modeling environment.
NetLogo/6.2.2

 

Modelling
nextflow A DSL for data-driven computational pipelines
nextflow/22.10.0

 

Pipeline, Workflow
NextGenMap https://github.com/philres/NextGenMap.
NextGenMap/0.5.5

 

Sequence Mapping
ngsRelate ngsRelate: Program for inferring relatedness and other summary statistics
ngsRelate/2022-09-26

 

Next generation sequencing
ngsTools ngsTools: Programs to analyse NGS data for population genetics purposes
ngsTools/2020-07-23

 

Population Genetics, Molecular Ecology
NIRVANA Nirvana provides clinical-grade annotation of genomic variants (SNVs, MNVs, insertions, deletions, indels, and SVs, including CNVs). It can be run as a stand-alone package or integrated into larger software tools that require variant annotation.
NIRVANA/3.17.0

 

NGS, Annotation
nvhpc NVIDIA HPC SDK includes compilers, libraries and software tools that support GPU-accelerated HPC applications
nvhpc/20.11
nvhpc/21.3
nvhpc/22.3
nvhpc/22.7

(Default)

NVIDIA, GPU, CUDA, Compiler
n2p2 n2p2 – A neural network potential package.
n2p2/2.2.0

 

Neural Network
openfoam OpenFOAM (for Open-source Field Operation And Manipulation) is a C++ toolbox for the development of customized numerical solvers, and pre-/post-processing utilities for the solution of continuum mechanics problems, most prominently including computational fluid dynamics (CFD).
openfoam/2206

 

Fluid Dynamics
openjdk OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE).
openjdk/11.0.9.1

 

java, jdk, openjdk, jar
openmpi/aocc An open source Message Passing Interface implementation.
openmpi/aocc/4.1.0-aocc3.1

 

MPI
openmpi/gcc An open source Message Passing Interface implementation.
openmpi/gcc/4.1.0-gcc8.3.1
openmpi/gcc/4.1.0-gcc10.2
openmpi/gcc/4.1.4-gcc10.2

(Default)

MPI
openmpi/intel An open source Message Passing Interface implementation.
openmpi/intel/4.1.0-intel2020u4

 

MPI
OptiType OptiType is a novel HLA genotyping algorithm based on integer linear programming, capable of producing accurate 4-digit HLA genotyping predictions from NGS data by simultaneously selecting all major and minor HLA Class I alleles
OptiType/1.3.5

 

NGS, HLA-Typing
orca ORCA – general purpose tool for quantum chemistry with specific emphasis on spectroscopic properties of open-shell molecules.
orca/5.0.0
orca/5.0.2
orca/5.0.3

(Default)

Quantum Chemistry
orthofinder Phylogenetic orthology inference for comparative genomics
orthofinder/2.5.4

 

Orthology
pandoc Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library.
pandoc/2.16.2

 

Doc Convert
ParallelFold ParallelFold: Modified version of AlphaFold that separates the CPU part (MSA and template searching) from the GPU part. This can accelerate AlphaFold when predicting multiple structures.
ParallelFold/2.1.2

 

Structural Bioinformatics, Protein Structure Prediction, AI
paraview ParaView is an open-source, multi-platform data analysis and visualization application based on Visualization Toolkit (VTK).
paraview/5.9.0-binary
paraview/5.10.0

(Default)

Visualization
pcre PCRE – Perl-Compatible Regular Expressions
pcre/8.45

 

pcre2 PCRE2 – Perl-Compatible Regular Expressions
pcre2/10.40

 

perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages.
perl/5.34.0

 

Programming Language, Script
perl-lib Perl-lib allows for user-installed Perl modules in home folder
perl-lib/5.34.0

 

Programming Language, Script
PGDSpider PGDSpider is a powerful automated data conversion tool for population genetic and genomics programs. It facilitates the data exchange possibilities between programs for a vast range of data types (e.g. DNA, RNA, NGS, microsatellite, SNP, RFLP, AFLP, multi-allelic data, allele frequency or genetic distances)
PGDSpider/2.1.1.5

 

Population Genetics, Molecular Ecology
phylip PHYLIP is a free package of programs for inferring phylogenies.
phylip/3.697

 

Phylogenetics
phyloseq/R-4.2.1 phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
phyloseq/R-4.2.1/1.42.0

 

Phylogenetics, Microbiome
PhyML PhyML is a software package that uses modern statistical approaches to analyse alignments of nucleotide or amino acid sequences in a phylogenetic framework.
PhyML/3.3.20220408

 

Phylogenetics
picard Picard is a set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
picard/2.26.6

(Default)

NGS, data formats
pigz Parallel implementation of gzip
pigz/2.6

 

File compression
PLINK PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
PLINK/1.90b6.24
PLINK/2.00a2.3
PLINK/2.00a3

(Default)

GWAS
pmi
pmi/pmix-x86_64

 

PosiGene PosiGene is a tool that (i) detects positively selected genes on genome-scale, (ii) allows analysis of specific evolutionary branches, (iii) can be used in arbitrary species contexts and (iv) offers visualization of the candidates.
PosiGene/0.1

 

Bioinformatics, Genome, Bacterial, Assembly, Short-read
postgresql PostgreSQL is a powerful, open source object-relational database system.
postgresql/13.2

 

Relational Database
ppanggolin Depicting microbial species diversity via a Partitioned PanGenome Graph
ppanggolin/1.2.74

 

Microbiome, Bacteria
prank PRANK is a probabilistic multiple alignment program for DNA, codon and amino-acid sequences.
prank/170427

 

Multiple Sequence Alignment
proj PROJ is generic coordinate transformation software that transforms geospatial coordinates from one coordinate reference system (CRS) to another. This includes cartographic projections as well as geodetic transformations.
proj/7.2.1
proj/8.0.0
proj/8.1.1

(Default)

Coordinate Transformation
prokka prokka: Rapid annotation of prokaryotic genomes
prokka/1.14.6

 

Prokaryote, Annotation
psi4 Open-Source Quantum Chemistry – an electronic structure package in C++ driven by Python
psi4/1.7+6ce35a5

 

Quantum Chemistry
pypopgen3 Tools for population genomic analysis for Python 3
pypopgen3/2021-11-23

 

Population Genetics
pyrho pyrho: Fast inference of fine-scale recombination rates based on fused-LASSO
pyrho/0.1

 

Population Genetics
python Python – A widely used high-level programming language.
python/3.9.2
python/3.9.7

(Default)

Programming Language, Data Science
QIIME2 QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
QIIME2/2021.11

 

Microbiome, Microbiology
QualiMap Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives, such as feature counts.
QualiMap/2.2.1

 

NGS, Quality Control
R R – computing language for statistical computation and graphics.
R/4.1.2-G
R/4.1.2-gcc
R/4.1.2-one
R/4.1.2
R/4.2.1
R/4.0.4

(Default)

Statistical Computing
RAxML-NG RAxML-NG is a phylogenetic tree inference tool which uses maximum-likelihood (ML) optimality criterion. Its search heuristic is based on iteratively performing a series of Subtree Pruning and Regrafting (SPR) moves.
RAxML-NG/1.1.0

 

Phylogenetics
RELION-cpu RELION (for REgularised LIkelihood OptimisatioN, pronounce rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
RELION-cpu/4.0b2

 

Cryo-EM
RELION-gpu RELION (for REgularised LIkelihood OptimisatioN, pronounce rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
RELION-gpu/4.0b2

 

Cryo-EM
repeatmasker RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.
repeatmasker/4.1.2.p1

 

DNA repeat
repeatmodeler RepeatModeler is a de-novo repeat family identification and modeling package.
repeatmodeler/2.0.2a

 

DNA repeat
rmats rMATS turbo is the C/Cython version of rMATS, a computational tool to detect differential alternative splicing events from RNA-Seq data.
rmats/4.1.1

 

NGS, RNA-Seq
roary Rapid large-scale prokaryote pan genome analysis
roary/3.13.0

 

Prokaryote
ROOT ROOT enables statistically sound scientific analyses and visualization of large amounts of data
ROOT/6.24.6

 

Data Science
RSEM RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation.
RSEM/1.3.3

 

NGS, RNA-Seq
RSeQC The RSeQC package provides a number of useful modules that can comprehensively evaluate high-throughput sequence data, especially RNA-seq data.
RSeQC/4.0.0

 

NGS, RNA-Seq, Quality Control
RStudio The RStudio IDE is a set of integrated tools designed to help you be more productive with R and Python. It includes a console, syntax-highlighting editor that supports direct code execution, and a variety of robust tools for plotting, viewing history, debugging and managing your workspace.
RStudio/1.4.1717
RStudio/2022.02.3
RStudio/2022.07.1
RStudio/2022.12.0
RStudio/2023.03.0

(Default)

Data Science, Statistics, R, Python, IDE
Ruby Ruby is an interpreted, high-level, general-purpose programming language
Ruby/2.7.2

 

Programming Language
rvtests Rvtests, which stands for Rare Variant tests, is a flexible software package for genetic association analysis for sequence datasets. Since its inception, rvtests was developed as a comprehensive tool to support genetic association analysis and meta-analysis.
NGS, Variant Caller, Rare Variant
salmon Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data.
salmon/1.6.0

 

NGS, RNA-Seq
sambamba Sambamba is a high-performance, highly parallel, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files.
sambamba/0.8.1

 

NGS, File Format
samtools samtools is a suite of programs for interacting with high-throughput sequencing data.
samtools/1.14

(Default)

NGS, Data Format, SAM
scanpy Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing.
scanpy/1.7.2

 

NGS, RNA-Seq, Single Cell
scikit-bio scikit-bio is an open-source, BSD-licensed Python 3 package providing data structures, algorithms and educational resources for bioinformatics.
scikit-bio/0.5.6

 

Bioinformatics, Data Science
scotch Static Mapping, Graph, Mesh and Hypergraph Partitioning, and Parallel and Sequential Sparse Matrix Ordering Package
scotch/6.0.9

 

Mesh, Mapping, Graph
scran/R-4.1.2 R packages with methods for Single-Cell RNA-Seq Data Analysis.
scran/R-4.1.2/1.23.1

 

NGS, RNA-seq, Single Cell
Seurat/R-4.1.2 Seurat is an R toolkit for single cell genomics
Seurat/R-4.1.2/4.0.5
Seurat/R-4.1.2/4.3.0

(Default)

NGS, RNA-seq, Single Cell
Seurat/R-4.2.1 Seurat is an R toolkit for single cell genomics
Seurat/R-4.2.1/4.3.0

 

NGS, RNA-seq, Single Cell
shapeit4 Segmented HAPlotype Estimation and Imputation Tools version 4
shapeit4/4.2.2

 

Population Genetics
Signac/R-4.1.2 Signac is a comprehensive R package for the analysis of single-cell chromatin data. Signac includes functions for quality control, normalization, dimension reduction, clustering, differential activity, and more.
Signac/R-4.1.2/1.4.0

 

NGS, RNA-seq, Single Cell Chromatin
simplejson simplejson is a simple, fast, complete, correct and extensible JSON <http://json.org> encoder and decoder for Python
simplejson/3.17.6

 

JSON Parser
singularity Singularity – Enables the use of containers in HPC environments.
singularity/3.8.0

 

Docker-alternative, Container, sif, simg
smcpp SMC++ is a program for estimating the size history of populations from whole genome sequence data.
smcpp/1.15.2

 

NGS, WGS, Population Genetics
snakemake The Snakemake workflow management system is a tool to create reproducible and scalable data analyses.
snakemake/6.12.1

 

Workflow
SNAP SNAP is a general purpose gene finding program suitable for both eukaryotic and prokaryotic genomes. SNAP is an acronym for Semi-HMM-based Nucleic Acid Parser.
SNAP/2013_11_29

 

Eukaryotic and Prokaryotic gene prediction
snpEff SnpEff is a genetic variant annotation and functional effect prediction toolbox. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
snpEff/5.0

 

Variant Functional Analysis
SNPhylo SNPhylo is a pipeline to generate a phylogenetic tree from huge SNP data.
SNPhylo/20180901

 

Phylogenetics
SortMeRNA SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.
SortMeRNA/4.3.4

 

NGS, Metagenomics, RNA-seq, Data Cleaning
sourmash Sourmash is a tool that quickly searches, compares, and analyzes genomic and metagenomic data sets.
sourmash/4.2.2

 

Metagenomics
SPAdes SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines.
SPAdes/3.15.3
SPAdes/3.15.4

(Default)

Genome Assembler
sqlite SQLite3 is an SQL database engine in a C library. Programs that link the SQLite3 library can have SQL database access without running a separate RDBMS process.
sqlite/3.35.2

 

Relational Database
squashfuse FUSE filesystem to mount squashfs archives
squashfuse/0.1.104

 

filesystem
sra-tools The SRA Toolkit provides a number of tools for downloading data from the Sequence Read Archive (SRA)
sra-tools/2.11.0

 

NCBI, Sequence Read Archive
srst2 Short Read Sequence Typing for Bacterial Pathogens
srst2/0.2.0

 

Sequence Typing, Bacteria
STAAR/R-4.1.2-gcc An R package for performing STAAR procedure in whole-genome sequencing studies
STAAR/R-4.1.2-gcc/0.9.6.2

 

NGS, WGS
stairway-plot The stairway plot is a method for inferring detailed population demographic history using the site frequency spectrum (SFS) from DNA sequence data
stairway-plot/2.1.1

 

Population Genetics, Molecular Ecology
STAR STAR: ultrafast universal RNA-seq aligner
STAR/2.7.9a

 

NGS, RNA-seq, Aligner
STAR-Fusion STAR-Fusion is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set.
STAR-Fusion/1.10.0

 

NGS, RNA-seq, Fusion Detection
stata STATA is a general-purpose statistical software package for data analysis, data management and graphics.
stata/16.1
stata/17.0
stata/18.0

(Default)

Statistical Computing
strelka Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs.
strelka/2.9.10

 

NGS, Variant Caller
StringTie Stringtie employs efficient algorithms for transcript structure recovery and abundance estimation from bulk RNA-Seq reads aligned to a reference genome.
StringTie/2.1.7

 

NGS, RNA-Seq, Expression Analysis
SvABA SvABA is a method for detecting structural variants in sequencing data using genome-wide local assembly
SvABA/1.1.0

 

NGS, Structural Variant
tcl The TCL programming language.
tcl/8.6.12

 

texlive texlive: An easy way to get up and running with the TeX document production system.
texlive/20220503

 

document
tk A dynamic programming language with GUI support. Bundles Tcl and Tk.
tk/8.6.12

 

GUI, Tk, Tcl
TrimGalore Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.
TrimGalore/0.6.7

 

NGS, Adaptor Trimming
trimmomatic trimmomatic: A flexible read trimming tool for Illumina NGS data
trimmomatic/0.39

(Default)

NGS, Sequence Trimmer
trinity Trinity assembles transcript sequences from Illumina RNA-Seq data.
trinity/2.13.2
trinity/2.14.0

(Default)

Illumina, RNA-Seq, Assembler
Trycycler Trycycler is a tool for generating consensus long-read assemblies for bacterial genomes
Trycycler/0.5.3

 

Bioinformatics, Genome, Bacterial, Assembly, Long-read
ucsc-kent UCSC Genome Browser source tree
ucsc-kent/2021-11-18

 

UCSC, Genome Browser
Unicycler Unicycler is an assembly pipeline for bacterial genomes
Unicycler/0.4.9
Unicycler/0.5.0

(Default)

Unicycler/0.5.0p

 

Bioinformatics, Genome, Bacterial, Assembly, Short-read
USEARCH USEARCH is a tool designed to enable high-throughput, sensitive search of very large sequence databases
USEARCH/11.0.667

 

Sequence Alignment, Sequence Clustering
VarDictJava VarDictJava is a variant discovery program written in Java and Perl. It is a Java port of VarDict variant caller.
VarDictJava/1.8.3

 

Variant Caller
VarScan VarScan is a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments.
VarScan/2.4.4

 

NGS, Mutation Caller
vasp5 VASP: Package for ab initio quantum-mechanical molecular dynamics simulation.
vasp5/5.4.4

 

Molecular Dynamics
vasp6 VASP: Package for ab initio quantum-mechanical molecular dynamics simulation.
vasp6/6.2.0
vasp6/6.2.1-impi2021
vasp6/6.2.1
vasp6/6.3.0
vasp6/6.3.1
vasp6/6.3.2

(Default)

Molecular Dynamics
vcftools VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
vcftools/0.1.17

 

Bioinformatics, Genome, Sequence, VCF
velvet Velvet is a short read de novo assembler using de Bruijn graphs
velvet/1.2.10

 

Genome Assembler
VerifyBamID VerifyBamID2: A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
VerifyBamID/2.0.1

 

NGS, Contamination Detection
VMD Visual Molecular Dynamics (VMD) is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting
VMD/1.9.3

 

Molecular Dynamics, Visualization
VSCode Visual Studio Code is a lightweight but powerful source code editor.
VSCode/1.68
VSCode/1.74

(Default)

Integrated Development Environment
VSEARCH VSEARCH supports de novo and reference-based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting.
VSEARCH/2.18

(Default)

Metagenomics
vtk The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, modeling, image processing, volume rendering, scientific visualization, and information visualization.
vtk/9.0.3

 

Visualization
XFuse XFuse: Super-resolved spatial transcriptomics by deep data fusion
XFuse/0.2.1

 

Spatial transcriptomics
xtb XTB is a Semiempirical Extended Tight-Binding Program Package
xtb/6.5.0

 

zlib A free, general-purpose, legally unencumbered lossless data-compression library.
zlib/1.2.11