title: Tools and libraries
description: Clojure tools and libraries for data and science
lead: Clojure tools and libraries for data and science
date: 2022-02-14
lastmod: 2024-12-12
draft: false
weight: 31
images:
contributors: daslu

To supplement our opinionated discussions of the ecosystem, here is a less-opinionated, plain list of relevant libraries written by Clojurians. Not all libraries mentioned here are affiliated with Scicloj, but we seek to be in dialogue with library authors as much as possible.

Do you know of anything relevant that is missing here? Let us talk!

For every library, we mark whether it is actively developed (act), and whether it is still experimental (exp). A star (⭐) means that we know the library to be actively used and useful.

We tag libraries with the field they are relevant to.

  • array - array programming
  • tensor - tensor programming
  • linalg - linear algebra
  • native - interop with native-optimized libraries
  • gpu - GPU support
  • vis - data visualization and visual art
  • vega - visualization using Vega/Vega-lite specifications
  • lit - literate programming
  • ui - building UIs for data exploration
  • geo - geographical and geometrical data processing
  • df - dataframe-like data structures and abstractions
  • data - general data processing
  • csv - CSV import/export
  • xl - interaction with Excel spreadsheets
  • json - JSON import/export
  • xform - transducers support
  • math - diverse math functions
  • stat - statistics
  • ts - time series analysis
  • rand - simulation and random sampling
  • prob - Bayesian computing and probabilistic programming
  • ml - machine learning
  • dnn - deep learning
  • opt - optimization
  • graph - graph algorithms and network analysis
  • interop - general libraries for interop
  • cljs - supports ClojureScript as well as Clojure
  • nlp - natural language processing
  • llm - large language models and related services

Other lists

These other lists of libraries are very relevant to the emerging Clojure data science stack:

  • Clojurelog ⭐ by the XTDB team - a comparison of various Clojure-Datalog databases
  • Clojure DSL resources ⭐ by Simon Gray - a curated list of mostly mature and/or actively developed Clojure resources relevant for dealing with domain-specific languages, in particular parsing and data transformation with/of DSLs.
  • Clojure graph resources ⭐ by Simon Gray - a curated list of mostly mature and/or actively developed Clojure resources relevant for dealing with graph-like data

Diverse toolsets

  • noj ⭐ (act): - a toolkit bundling the main relevant Scicloj libraries
  • fastmath ⭐ (act): math,ml,rand,stat - a collection of functions for mathematical and statistical computing, machine learning, etc., wrapping several JVM libraries
  • spork (act): df,graph,opt,rand,ui,vis - a toolbox for data science and operations research
  • Incanter: csv,df,rand,stat,vis - an R-like data-science platform built on top of the core.matrix abstractions
  • huri: df,stat,vis - a toolbox for data-science using plain sequences of maps

Optimization

  • matlib ⭐ (act): opt - optimisation and control theory tools and convenience functions based on Neanderthal.

Visual tools: literate programming and data visualization

  • Clay ⭐ (act): cljs,lit,vega,vis - a REPL-friendly tool for notebooks and datavis
  • Saite ⭐ (act): cljs,hiccup,lit,ui,vega,vis - data exploration, dashboards, and interactive documents
  • Clerk ⭐ (act): cljs,lit,vega,vis - local-first notebooks for Clojure
  • Oz (act): lit,vega,vis - data visualization using Vega/Vega-Lite and Hiccup, and a live-reload platform for literate programming
  • rmarkdown-clojure ⭐: lit,vis - rendering Clojure code in various formats using R Markdown
  • Pink-Gorilla/Goldly ⭐ (act): cljs,exp,lit,ui,vis - a port of the Gorilla REPL project using a Clojurescript (Reagent) frontend
  • Org-babel-clojure ⭐: lit - executing Clojure inside Emacs Org-mode documents
  • Devcards ⭐: cljs,lit - visual REPL experience for ClojureScript
  • Notespace: lit - notebook experience with Clojure namespaces edited in any editor
  • Reveal ⭐ (act): - desktop data navigation GUI
  • Portal ⭐ (act): - browser-based data navigation GUI
  • Gorilla-REPL: lit - a notebook application written in Clojure and Javascript
  • proto-repl-charts: vis - an Atom plugin for displaying tables and graphs
  • Maria (act): cljs,lit,vis - a Clojurescript coding environment for beginners
  • emmy-viewers ⭐ (act): cljs,vis - High-performance symbolic, 2D and 3D visual extensions to the Emmy computer algebra system

Vega rendering

In addition to a few of the tools mentioned above, here is a list of tools dedicated mainly to handling Vega/Vega-lite specifications. See this conversation for some discussion of the differences and tradeoffs across these tools.

  • darkstar ⭐: vega,vis - a minimal wrapper over Vega/Vega-lite as a single JVM-only Clojure library, using the GraalJS JavaScript runtime, which does not require the GraalVM runtime
  • xvsy: cljs,vega,vis - grammar of graphics over Vega
  • Vegan (act): vega,vis - a Node.js ClojureScript library designed to validate and render Vega and Vega-lite files, with support for a Docker-based setup
  • Vega-clj (act): vega,vis - a Clojure wrapper for the (Node-based) Vega-cli and Vega-lite standalone scripts
  • Optikon: vega,vis - a command line tool that wraps Vega and Vega-lite, using GraalVM polyglot programming
  • Vegafx: vega,vis - a static-site viewer using javafx that renders Vega specs
  • VL example gallery as EDN: - the Vega-lite examples in EDN format, ready to be copied and pasted into Clojure code

Data visualization libraries

  • Tableplot ⭐ (act): exp,vis - easy layered graphics with Hanami & Tablecloth
  • cljplot ⭐ (act): exp,vis - a data visualization platform written in Clojure and inspired by R's ggplot2 and lattice libraries
  • Hanami ⭐ (act): cljs,hiccup,ui,vega,vis - a template system for creating interactive data visualizations using Vega/Vega-lite, Reagent and Re-Com
  • viz.clj: exp,vega,vis - a data visualization library for beginners (WIP)
  • Clojure2D ⭐ (act): vis - Java2D wrapper + creative coding supporting functions (based on Processing and openFrameworks)
  • Quil ⭐: vis - a Clojure/ClojureScript wrapper for Processing
  • thi-ng/geom ⭐: cljs,vis - 2d/3d geometry toolkit
  • Gorilla-plot: vega,vis - plotting functions using Vega for Gorilla-REPL
  • gg4clj: r,vis - a clojure DSL for creating ggplot2 plots using R
  • gg4clj port: - by the Pink Gorilla project
  • Analemma: cljs,exp,vis - generating charts and SVG with a syntax similar to Incanter's and a visual theme similar to ggplot2.
  • emacs-Vega-view (act): vega,vis - an emacs mode to facilitate interactive data visualization using Vega from within emacs. Supports elisp, json and clojure notations

Data processing

  • ham-fisted ⭐ (act): data - data structures, functions, and macros for efficient functional programming in the JVM
  • Specter ⭐ (act): cljs,data - declarative navigation of nested data structures for selection and transformation in Clojure and Clojurescript
  • Meander ⭐ (act): cljs,data - transforming nested data structures by declaratively specifying the shape of the source and target data structures
  • xforms ⭐: cljs,data,xform - a collection of transducers and reducing functions
  • Odin: data - processing nested data structures by extensible logic programming
  • Charred ⭐ (act): csv,json - zero-dependency, efficient reading and writing of JSON and CSV data
  • Semantic Csv: cljs,csv - higher level csv parsing/processing

Geospatial processing

  • geo ⭐: geo - unifying several JVM libraries for geoprocessing with a Clojure API
  • ovid ⭐: exp,geo - protocols for geospatial concepts
  • aurelius ⭐: exp,geo,xform - transducible analysis of geospatial features
  • geo-clj ⭐: cljs,geo - encoding/decoding of geographic datatypes

Dataframe-like structures

  • tech.ml.dataset ⭐ (act): csv,df,stat,vis - abstractions for dataframe-like structures in clojure, based on dtype-next infrastructure
  • tablecloth ⭐ (act): csv,df - a dataframe grammar wrapping tech.ml.dataset, inspired by several R libraries
  • clojask ⭐ (act): df - a library for parallel computing of larger-than-memory datasets.
  • datajure (act): df - a domain-specific language for data processing wrapping libraries such as tech.ml.dataset, tablecloth, clojask, and geni
  • Panthera: df,py - a Clojure API wrapping Python's Pandas library
  • koala: csv,df,exp - Pandas-like data-processing for clojure with some I/O functionality (experimental)
  • dataframe: df - Pandas-like data processing for clojure
  • danzig (act): df,exp,xform - Pandas-like data processing using transducers (danzig was formerly named wombat)
  • bamboo: df - a minimal data processing library for Clojure, with some of the capabilities of pandas and numpy
  • see also geni ⭐ under the Spark sub category below

Spreadsheets

  • Docjure ⭐ (act): xl - making it easy to read and write Excel spreadsheets as Clojure data
  • kixi.large ⭐ (act): exp,xl - a tech.ml.dataset-friendly fork of Docjure, providing a clear entry point at the workbook and sheet levels and a way to insert images
  • Excel-clj ⭐ (act): xl - building Excel spreadsheets from Clojure data in various forms such as trees, tables, and more
  • fxl ⭐ (act): xl - manipulating spreadsheets with a versatile API for handling various spreadsheet constructs
  • Excel-templates: exp,xl - building Excel spreadsheets from Clojure data combined with an Excel sheet serving as a template
  • xl-parse-clj: exp,xl - converting an Excel workbook to Clojure code

Array programming, linear algebra

  • dtype-next ⭐ (act): array,native,stat,tensor - abstractions and foundations for working with array-like structures and sequential structures
  • Neanderthal ⭐ (act): array,gpu,linalg,native - matrix and linear algebra in Clojure
  • tvm-clj (act): array,exp,gpu,linalg,native - bindings to tvm
  • Geometric Algebra (act): linalg,math - A library to do geometric algebra
  • jutsu.matrix: array,gpu,linalg,native - bindings to ND4J
  • core.matrix: array,cljs,linalg,native - matrix abstractions, supporting different backends
  • denisovan: array,gpu,linalg,native - Neanderthal backend for core.matrix

Deep learning

  • Deep Diamond ⭐ (act): dnn,gpu,native,tensor - infrastructure for tensor computation and deep learning
  • clj-djl ⭐ (act): dnn,gpu,native,tensor - a wrapper for the Deep Java Library
  • MXNet: dnn - bindings to Apache MXNet, a part of the MXNet project
  • jutsu.ai: dnn - a wrapper for deeplearning4j
  • Cortex: dnn - a deep learning library written in Clojure
  • Flare: dnn - dynamic neural networks in Clojure

Statistics

  • kixi.stats ⭐ (act): rand,stat,xform - statistics and random sampling using transducers
  • fitdistr ⭐ (act): stat - fitting distributions

Time series analysis

  • tide: ts - STL and FastDTW algorithms

Bayesian computing & probabilistic programming

  • inferme ⭐ (act): prob,rand,vis - extensible probabilistic programming in Clojure itself (rather than a language variation), with support for visualizations
  • Gen.clj ⭐ (act): prob,rand - A stack for generative modeling and probabilistic inference.
  • cmdstan-clj ⭐ (act): exp - Using the Stan statistical modelling language from Clojure using the CmdStan CLI
  • clj-stan: - A library for calling the Stan language from Clojure (by shelling out to cmdstan).
  • bayadera: gpu,prob,rand,stat - Bayesian computing using the GPU
  • sampling: rand - support for random sampling of different kinds
  • distributions: prob,rand - random sampling and some basic Bayesian computing for certain families of distributions
  • metaprob: cljs,exp,prob,rand - an embedded language for probabilistic programming and metaprogramming
  • daphne: exp,prob - a probabilistic programming compiler from Clojure syntax to PyTorch
  • anglican: cljs,prob,rand - a probabilistic programming language written in clojure, that supports a subset of clojure

Random sampling and simulations

  • masonclj ⭐ (act): rand - a Clojure wrapper of MASON, which is a Java library for discrete-event multiagent simulation and agent-based modeling.
  • dsim.cljc ⭐ (act): cljs,rand - an event-driven engine for Clojure(script) heavily borrowing ideas from discrete-event simulation and hybrid dynamical systems
  • date-gen (act): rand - randomized date generation supporting CSV output
  • drand: rand - a client to the Drand randomness service

Science

  • emmy ⭐ (act): - (was SICMUtils) a library for algebra, calculus, differential geometry and physics based on the SICM book by Sussman & Wisdom
  • cljbox2d ⭐ (act): cljs - a Clojure/Clojurescript wrapper of the Box2D physics engine API

Machine learning

  • scicloj.ml ⭐ (act): ml - A machine learning platform supporting a large collection of algorithms and pipeline ergonomics
  • clj-ml: ml - machine learning based on wrapping libraries such as the Weka Java library
  • clj-boost: ml - a wrapper for XGBoost
  • propaganda: ml - an implementation of the propagator computational model
  • dvc: ml - a programming-language-independent tool for ML experiment tracking, with Clojure fully supported

Genetic programming

  • Propeller ⭐ (act): ml - "Yet another Push-based genetic programming system in Clojure"
  • Clojush (act): ml - an implementation of the Push programming language for genetic programming

Natural Language Processing

  • DataLinguist ⭐ (act): nlp - a Clojure wrapper for Stanford CoreNLP

Large Language Models and related services

Interop

  • clj-polyglot-app ⭐ (act): interop - A deps-new template to create a polyglot app in Clojure (Clojure, R, & Python)
  • Libpython-clj ⭐ (act): interop - interop with Python
  • clj-python-trampoline: interop - using libpython-clj from an already running python process, without needing any special python builds
  • Libjulia-clj ⭐ (act): interop - Julia bindings for Clojure
  • Wolframite ⭐ (act): interop - interop with Wolfram language
  • ClojisR ⭐ (act): interop - interop with R and Renjin (R on the JVM)
  • graalvm-interop: interop - interop with FastR (GraalVM's R)
  • rdata: - A Renjin (pure-JVM R) wrapper to allow Clojure programs to easily read R's RData file format
  • from-scala: interop - interop with Scala
  • Interop template project: interop - a project template which configures several interop libraries automatically using Docker
  • other R interop libraries

Parallel computing

  • claypoole ⭐ (act): for,future,pmap - threadpool-based parallel versions of Clojure functions such as pmap, future, and for
  • parallel ⭐: - parallel-enabled functions, additional transducers and supporting utilities
  • tesser ⭐: - a library for concurrent & commutative folds, including some statistical tasks and Hadoop support
  • tech.parallel ⭐ (act): - parallelization and threading primitives

Distributed computing

  • titanoboa ⭐ (act): - a fully distributed, highly scalable and fault-tolerant workflow orchestration platform
  • onyx ⭐: - a library for distributed computation in the cloud
  • overseer: - a library for building and running data pipelines

Hadoop

  • Parkour: - Hadoop MapReduce in idiomatic Clojure

Spark

Stream processing

Kafka

  • jackdaw ⭐ (act): - a wrapper for Kafka and Kafka Streams
  • kafka.clj ⭐ (act): - a wrapper for Kafka and Kafka Streams
  • ksml ⭐ (act): - representing kafka streams topologies as data
  • rp-jackdaw-clj: - various components for interacting with Kafka using Jackdaw

Datasets

  • hdfs-clj ⭐ (act): - Access to HuggingFace datasets via Clojure