{content.title}
+ {content.intro &&{content.intro}
} +- BioFile Finder helps users effortlessly interact with, extrapolate, and - share data. -
-- Explore the data without any coding through the standardized metadata, using - filters and hierarchies of folders to search for the data needed to answer a - specific question. Example... -
-- Find all images corresponding to a list of criteria provided by - annotation and tags. -
-- Visualize each file before uploading them ensuring you extracted the - right data for your research. -
-- Provide the URL address to your collaborator in just one click for - analysis and further exploration of your exact query. -
-- By storing the .csv file in a public cloud storage, you can include the - URL address in your publication for others to access your data in just - one click. -
-- BioFile Finder is a web application that allows users to interact with their - data in a more efficient and effective way. Users upload a .csv file with - metadata and a link to their data. The .csv file is used to create a query that - can be shared with others. The URL contains the query that can be shared with - others to view the same data and metadata that the user is currently looking at. - The receiver of the URL will need access to the datasource (.csv). -
-.csv File
-Biofile Finder
-Image Viewer
-- BioFile Finder is designed to be flexible and work with a wide range of data. - Currently, all you need to get started is a .csv file containing links to your - data. Add metadata to your .csv to make using BioFile Finder even more powerful! -
-- No! You can use BioFile Finder with private data. Just make sure to provide a - link to your data that is accessible to BioFile Finder. Similarly, whoever you - share your query with will need to have access to the data you have linked to or - at the very least the .csv file you have used to create the query. -
-- No! The data source file you upload is not stored by BioFile Finder, nor does - any of the data you query get sent to us. -
-- Sort of. We are working on a way to allow users to store their .csv with us to - make it public. In the meantime, email us at - - aics_software_support@alleninstitute.org - - to request your data be included with our own collection of open-source - datasets. Please note, your image data would need to be stored in a public - location like - - Image Data Registry - - or AWS. -
-- Check out our{" "} - - Open-source datasets - {" "} - for inspiration and examples of datasets. -
-- Please reach out to us at - - aics_software_support@alleninstitute.org - - . -
-Page not found.
+{content.intro}
} ++ BioFile Finder (BFF) is a web-based application + designed to enable researchers to explore and manage large-scale + biological imaging datasets and associated files in a consistent and + streamlined way. It enables users to query structured metadata and + seamlessly connect results to associated image assets. +
++ Built to handle complex, high-volume data, BioFile Finder supports + advanced search, filtering, and sorting—making it easier to access, + curate, collaborate on, and share datasets. The intuitive interface + requires no coding, allowing users to quickly preview data through + thumbnails, open files in common industry tools, or visualize them in + the companion web-based 3D volume viewer, Vol-E. +
+ {/* TODO: Add publication link once paper is available in Nature Methods */} + > + ), + }, + { + id: "who-is-bff-for", + heading: "Who is BFF for?", + body: ( + <> ++ BFF is designed for anyone who needs to explore and manage large + collections of biological files, especially those associated with + imaging datasets. It is particularly useful for: +
++ + Read detailed scenarios and use cases + +
+ > + ), + }, + { + id: "why-use-bff", + heading: "What makes BFF unique?", + body: ( + <> ++ A number of thoughtful features set BFF apart from other similar tools. + Key differentiators include: +
++ The following table highlights how BFF compares to similar tools in the + bioimaging data management ecosystem. +
+| Feature | +BioFile Finder (BFF) | +OMERO | +IDR | +SSBD | +Zarrcade | +BioImage Archive (BIA) | +Quilt | +Cytomine | +BisQue | +
|---|---|---|---|---|---|---|---|---|---|
| + Type + | +Desktop/web file browser | +Client-server image management | +Public image repository | +Public dynamics database | +OME-Zarr discovery tool | +Public image archive | +Data versioning platform | +Web collaborative analysis | +Web image management | +
| + Cost + | +Free/open-source | ++ Free / open-source (self-hosted infrastructure + costs) + | +Free (public resource) | +Free (public resource) | +Free / open-source | +Free (public resource) | +Free tier; paid plans for teams | +Free / open-source (self-hosted) | +Free / open-source (self-hosted) | +
| + Deployment + | +Local app or static web page | +Requires server + PostgreSQL + storage | +Hosted by EMBL-EBI | +Hosted by RIKEN | +Static web page | +Hosted by EMBL-EBI | +SaaS or self-hosted | +Requires server + DB | +Requires server + DB | +
| + File Format Support + | +Any file type; Parquet, CSV, JSON metadata | +150+ microscopy formats via Bio-Formats | +Same as OMERO | +OME-Zarr, BD5/HDF5 | +OME-Zarr only | +Any bioimaging format | +Any file type | +TIFF, whole-slide images | +Many image formats | +
| + Metadata Source + | +User-supplied files (Parquet/CSV) or URLs | +Stored in PostgreSQL; key-value annotations | +Curated study-level metadata | +Curated per-study | +Derived from Zarr metadata | +Study-level submissions | +Package-level metadata | +Project/annotation-based | +Tag/key-value on images | +
| + Dynamic Querying / Filtering + | ++ Yes — in-browser SQL via DuckDB; filter, sort, group + by any column + | +Yes — HQL/API queries; filter by tag/key-value | +Limited — browse by study, screen, gene | +Limited — browse by organism/study | +No — browse/list only | +Limited — search by study/accession | +Limited — search by package name | +Yes — ontology-based spatial queries | +Yes — tag-based queries | +
| + Annotation Hierarchy / Grouping + | ++ Yes — user-defined nested grouping by any annotation + | +Partial — tag groups, datasets, projects | +No | +No | +No | +No | +No | +Partial — project/folder hierarchy | +Partial — tag hierarchy | +
| + Shareable URLs / Copy-Paste Sharing + | +Yes — query state encoded in URL | ++ Partial — links to images/datasets, requires server + access + | +Yes — public stable URLs per study | +Yes — public DOI-based URLs | +Yes — public URLs to Zarr stores | +Yes — accession-based URLs | +Yes — versioned package URLs | +Partial — project links, requires login | +Partial — resource links, requires server | +
| + Works Without a Server + | ++ Yes — runs entirely in-browser or as desktop app + | +No — requires OMERO.server | +N/A (hosted service) | +N/A (hosted service) | +Yes — static site | +N/A (hosted service) | +No (SaaS) | +No — requires server | +No — requires server | +
| + Cloud / Remote Data + | +Yes — S3, HTTP/HTTPS URLs | +Yes — via OMERO.server with S3 backend | +N/A | +N/A | +Yes — any HTTP-hosted Zarr | +N/A | +Yes — S3-backed | +Limited | +Limited | +
| + Data Scale + | +Tested to 10M+ rows; limited by browser memory | +Millions of images (server-dependent) | +~50 TB across studies | +Moderate (curated datasets) | +Unlimited (just a catalog) | +Petabyte-scale archive | +Package-size dependent | +Large histopathology images | +Moderate | +
| + Image Viewing + | +Thumbnails; delegates to external viewers | ++ Built-in multi-dimensional viewer (OMERO.web, + OMERO.figure) + | +Built-in viewer (idr.openmicroscopy.org) | +Built-in 3D/4D viewer | +Links to external Zarr viewers (e.g. Vizarr) | +Links to BioStudies viewer | +Built-in preview for some types | +Built-in annotation/viewer | +Built-in multi-dim viewer | +
| + User Annotations / Editing + | +Yes — add/edit metadata columns in-browser | +Yes — key-value pairs, tags, ratings, ROIs | +No (read-only) | +No (read-only) | +No (read-only) | +No (submission-based) | +Yes — package metadata | +Yes — spatial annotations, ontology terms | +Yes — tags, gobjects | +
| + Programmatic API + | +DuckDB SQL in-browser; no REST API needed | ++ Full REST + JSON API; Python (omero-py), Java, CLI + | +REST API (same as OMERO) | +REST API | +None (static JSON) | +REST API (BioStudies) | +Python SDK, REST API | +REST API, Python client | +REST API, Python/MATLAB | +
| + Multi-User / Auth + | +No — single-user local tool | +Yes — LDAP, groups, permissions | +Public (no auth) | +Public (no auth) | +Public (no auth) | +Submission requires login | +Yes — teams, RBAC | +Yes — LDAP, project roles | +Yes — user/group permissions | +
| + Primary Use Case + | ++ Explore & filter large tabular file metadata; + share queries via URL + | ++ Manage, view & annotate microscopy data for a + lab/institute + | +Publish & browse reference image datasets | +Share quantitative bio-dynamics data | +Discover & link to OME-Zarr datasets | +Archive & publish bioimaging data | +Version & share data packages | ++ Collaborative image annotation (pathology, etc.) + | +Manage & analyze diverse bio-images | +
| + License + | +MIT | +AGPL v3 | +N/A (hosted) | +N/A (hosted) | +MIT | +N/A (hosted) | +Apache 2.0 | +Apache 2.0 | +BSD | +
+ BFF uses DuckDB — a high-performance analytical SQL + engine — to run queries entirely in your browser. No server, no backend, + no credentials required. Filter, sort, and search across millions of + rows of metadata instantly. +
++ Group files by any combination of metadata columns to create a navigable + folder-like hierarchy — without moving or reorganizing your actual + files. Switch grouping strategies instantly to explore different + dimensions of your dataset. +
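The grouping idea can be illustrated with a minimal sketch. It is not BFF's implementation: the `GroupNode` shape, the `groupRows` helper, and the third example row (`Ghi789.txt`) are hypothetical and exist only to show how the same rows can be regrouped by different columns without touching the files themselves.

```typescript
// Hypothetical row shape: one metadata record per file, keyed by column name.
type Row = Record<string, string>;

// Hypothetical folder-like node: how many files fall under it, plus child folders.
interface GroupNode {
  count: number;
  children: Record<string, GroupNode>;
}

// Recursively bucket rows by each grouping column in order.
function groupRows(rows: Row[], columns: string[]): GroupNode {
  const node: GroupNode = { count: rows.length, children: {} };
  if (columns.length === 0) return node;
  const first = columns[0];
  const rest = columns.slice(1);
  const buckets: Record<string, Row[]> = {};
  for (const row of rows) {
    const value = row[first];
    // Rows with a missing or empty value share one placeholder group.
    const label = value === undefined || value === "" ? "(No value)" : value;
    (buckets[label] = buckets[label] || []).push(row);
  }
  for (const label of Object.keys(buckets)) {
    node.children[label] = groupRows(buckets[label], rest);
  }
  return node;
}

const exampleRows: Row[] = [
  { "File Path": "Abc123.txt", Well: "B3", Gene: "CDH2" },
  { "File Path": "Def456.txt", Well: "G9", Gene: "VIM" },
  { "File Path": "Ghi789.txt", Well: "B3", Gene: "VIM" }, // invented for illustration
];

// Two different "folder" views over the same rows.
const byWellThenGene = groupRows(exampleRows, ["Well", "Gene"]);
const byGene = groupRows(exampleRows, ["Gene"]);
```

Switching the grouping strategy is just a second call with a different column list; the rows and the files they point to are unchanged.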
++ Sharing is one of BFF's most powerful and distinctive features. + Every filter, sort, grouping, and column layout you configure is encoded + directly into the URL. Copy the link and share it — anyone who opens it + sees exactly the same view of the data, without re-running any queries, + sending files, or setting anything up. +
++ This makes BFF uniquely suited for collaborative research and open + science: +
++ Most tools in this space either require server access to share data or + only link to a static dataset. BFF shares the exact filtered, sorted, + grouped view — making it a powerful tool for transparent and + reproducible science. +
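As a rough sketch of what "view state encoded in the URL" can look like, the snippet below serializes a view into a single query parameter and reads it back. The `ViewState` shape, the `view` parameter name, and the `example.org` base URL are assumptions made for illustration; the actual BFF URL schema is not shown here and may differ.

```typescript
// Hypothetical view state: filters, grouping, and sort column.
interface ViewState {
  filters: Record<string, string[]>;
  groupBy: string[];
  sortColumn?: string;
}

// Serialize the view into one query parameter (assumed name: "view").
function encodeView(baseUrl: string, view: ViewState): string {
  const params = new URLSearchParams();
  params.set("view", JSON.stringify(view));
  return baseUrl + "?" + params.toString();
}

// Recover the view from a shared link, or null if the link carries none.
function decodeView(url: string): ViewState | null {
  const queryStart = url.indexOf("?");
  const query = queryStart >= 0 ? url.slice(queryStart + 1) : "";
  const raw = new URLSearchParams(query).get("view");
  return raw === null ? null : (JSON.parse(raw) as ViewState);
}

const currentView: ViewState = {
  filters: { "Cell Line": ["iPSC"], Plate: ["007"] },
  groupBy: ["Plate", "Treatment"],
  sortColumn: "Phenotype Score",
};

// Placeholder base URL, not the real application address.
const sharedLink = encodeView("https://example.org/app", currentView);
const restoredView = decodeView(sharedLink);
```

Because the link carries the whole view, the recipient only needs access to the underlying data source to see the same result.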
++ BFF renders thumbnail previews for files in your dataset so you can + visually scan your data without opening each file individually. + Thumbnails appear inline in the file list and update dynamically as you + filter and group. +
+Thumbnail column
+ in your dataset — useful for large or complex files where
+ auto-generation isn't possible
+ + Content coming soon. +
+ ), + }, + { + id: "viewer-integrations", + heading: "Viewer integrations", + body: ( + <> ++ BFF connects directly to a variety of image viewers — web-based and + desktop. Select any file and open it in the viewer best suited for its + format and your workflow. +
++ + See the viewer comparison table + +
+ > + ), + }, + ], + }, +}; diff --git a/packages/web/src/components/UserGuide/content/app-information.tsx b/packages/web/src/components/UserGuide/content/app-information.tsx new file mode 100644 index 000000000..460154f68 --- /dev/null +++ b/packages/web/src/components/UserGuide/content/app-information.tsx @@ -0,0 +1,399 @@ +import { Icon } from "@fluentui/react"; +import * as React from "react"; + +import type { PageContent } from "./types"; + +export const APP_INFORMATION_CONTENT: Record+ BFF ingests metadata about biological files (a dataset), not the files + themselves. This metadata is intended to be tabular and can be stored in + one of the following formats: +
++ Information on file size limitations coming soon. +
++ Limitations around the files tracked within BFF are imposed by the + applications BFF links to for that given file. For example, FIJI will + only work with the files that it supports. BFF itself is agnostic to the + file types and sizes referenced in a dataset. +
+ > + ), + }, + { + id: "preferred-browsers", + heading: "Browser and device compatibility", + body: ( + <> ++ For best performance and compatibility, we recommend using the latest + versions of the following browsers: +
++ BFF is optimized for desktop use and is not currently designed for + mobile devices. +
+ > + ), + }, + { + id: "open-source", + heading: "Open source", + body: ( +
+ BioFile Finder is open-source and free to use. You can find the code, report
+ issues, and contribute on{" "}
+
+ GitHub
+ File format will heavily limit viewer options, but when multiple options + are feasible, the following information may help guide your decision. +
++ The following table offers comparisons between various supported + viewers. +
+| Feature | +Vol-E | +AGAVE | +FIJI / ImageJ | +Neuroglancer | +OME NGFF Validator | +Browser (web) | +Simularium | +VolView | +
|---|---|---|---|---|---|---|---|---|
| + Type + | +Web-based 3D volume viewer | +Desktop GPU-accelerated volume renderer | +Desktop image analysis suite | +Web-based volumetric viewer | +Web-based validation tool | +Native file preview | +Web-based simulation viewer | +Web-based 3D volume viewer | +
| + Platform + | +Web app (browser) | +Desktop (Windows, macOS, Linux) | +Desktop (Windows, macOS, Linux) | +Web app (browser) | +Web app (browser) | +Desktop (OS-native) | +Web app (browser) | +Web app (browser) | +
| + Installation required + | +No | +Yes (standalone app) | +Yes (Java-based) | +No | +No | +No (built into OS) | +No | +No | +
| + Cost + | +Free / open-source | +Free / open-source | +Free / open-source | +Free / open-source | +Free / open-source | +Free (bundled with OS) | +Free / open-source | +Free / open-source | +
| + Primary use case + | ++ Interactive 3D volume rendering of microscopy data + | ++ High-quality cinematic 3D rendering and path tracing + | ++ General-purpose image analysis, measurement, and + processing + | ++ Explore large-scale connectomics / volumetric neuro + datasets + | ++ Validate OME-Zarr/NGFF file structure and metadata + compliance + | +Quick preview of standard image/video files | ++ Visualize agent-based biological simulations over + time + | ++ Clinical and research DICOM/volume visualization + | +
| + Supported formats + | +OME-Zarr, TIFF, OME-TIFF | ++ OME-TIFF, TIFF, CZI, LIF, and other microscopy + formats + | +100+ formats via Bio-Formats | +Precomputed, N5, Zarr, NIFTI | +OME-Zarr (NGFF) only | +JPEG, PNG, TIFF, MP4, PDF (OS-dependent) | +Simularium, CytoSim, ReaDDy, Smoldyn | +DICOM, NIFTI, MHA, VTI, NRRD, Zarr | +
| + 3D volume rendering + | +Yes — real-time ray marching | +Yes — GPU path tracing, cinematic quality | +Limited — 3D Viewer plugin | +Yes — multi-scale, GPU-accelerated | +No | +No | +Yes — 3D agent trajectories and meshes | +Yes — GPU-accelerated ray casting | +
| + Multi-channel support + | +Yes | +Yes | +Yes | +Yes | +Validates channel metadata | +No | +N/A | +Yes | +
| + Time series / 4D + | +Yes | +Yes | +Yes | +Limited | +Validates time dimension metadata | +No | +Yes — primary feature | +Limited | +
| + Large data / streaming + | +Yes — streams OME-Zarr from cloud/HTTP | +No — loads full volume into GPU memory | +Limited | +Yes — designed for petascale | +Validates metadata only | +No | +Streams from URL | +Yes — progressive loading | +
| + Cloud / remote data + | +Yes — HTTP/S3 URLs | +No — local files only | +Limited | +Yes — GCS, S3, HTTP | +Yes — validates remote URLs | +No | +Yes | +Yes | +
| + Collaborative / sharing + | +Shareable URL with view state | +No | +No | +Yes — URL encodes full view state | +Shareable validation URL | +No | +Shareable URL | +Shareable URL via hosted instance | +
| + Best for + | ++ Quick interactive exploration of cloud-hosted + OME-Zarr volumes + | ++ High-quality figures and movies of 3D microscopy + data + | +Comprehensive image analysis and scripting | ++ Browsing terabyte+ volumetric datasets in the cloud + | +Checking OME-Zarr files before sharing | +Quickly previewing a standard image file | ++ Viewing and sharing spatiotemporal biological + simulations + | ++ Medical/research volumes with clinical-style tools + | +
| + Limitations + | +No analysis tools; limited format support | +Requires dedicated GPU; local files only | ++ Basic 3D rendering; struggles with very large + datasets + | ++ Steep learning curve; specific pre-tiled formats + only + | +Validation only; OME-Zarr only | +No scientific image capabilities | +Simulation data only | +Limited microscopy format support | +
+ Prepare a metadata table describing your files. Each row typically + represents a file, while columns contain metadata such as: +
++ See:{" "} + + Creating a dataset + + ,{" "} + + Metadata guidance + +
+ ++ Your dataset must include paths or URLs pointing to the files you want + BFF to access. Files can live: +
++ BFF is storage agnostic and does not require files to be moved into a + proprietary system. +
++ See:{" "} + + Storage options + + ,{" "} + + Viewer compatibility + +
+ +Open BFF and either:
++ Once loaded, BFF allows you to filter and search metadata, group files + dynamically, preview and open files in compatible viewers, and share + exact dataset views via URL. +
+ > + ), + }, + { + id: "minimum-requirements", + heading: "Minimum requirements", + body: ( + <> +To use BFF, you only need:
+No backend, database, or server infrastructure is required.
+ > + ), + }, + { + id: "common-workflows", + heading: "Common workflows", + body: ( +| Goal | +Typical setup | +
|---|---|
| Personal / local exploration | +Local dataset + local files | +
| Shared lab dataset | +Hosted dataset + shared storage | +
| Public publication companion | +Hosted dataset + public cloud storage | +
| Large-scale datasets | +Parquet + cloud storage | +
| Metadata validation / QC | +Dataset + metadata descriptor file | +
| File lineage / relationship tracking | +Dataset + provenance file | +
+ If data is intended to be publicly shared, such as in a publication, + store the dataset and files referenced in the dataset in{" "} + cloud storage{" "} + to enable readers to explore the dataset and its files via a shareable + BFF link (URL). +
++ Note: You can use BFF as a way to circumvent having to publish all files + by publishing only the dataset file and instructing readers to request + files directly. This allows viewers to see metadata about every file in + the dataset without you paying for full cloud storage of each file. + Building on this approach, you can host thumbnails of each file so + readers can get a preview without you paying for full-resolution images + to live in the cloud. +
+ ++ A BFF dataset is a tabular file where each row represents a file and + each column is a piece of metadata about that file. The format is + flexible — any columns beyond the required ones are yours to define + based on what matters to your workflow. +
++ + See App information + {" "} + for accepted file types and size limitations. +
+ > + ), + }, + { + id: "rows-columns", + heading: "Rows and columns", + body: ( + <> ++ Rows: Each row in the table should correspond to a + file, whether in the cloud, on a hard drive, or on network-attached storage. + However, you can have a row corresponding to multiple files, or + different rows corresponding to the same file. +
++ Columns: Columns can be anything, but there is one + required column and a few special optional columns described below. +
+ > + ), + }, + { + id: "required-columns", + heading: "Required columns", + body: ( ++ File Path — A reference to the file that BFF will attempt + to open with relevant applications. This column does not have to be unique.{" "} + + Information about file storage options + + . +
+ ), + }, + { + id: "optional-columns", + heading: "Optional special columns", + body: ( + <> ++ These columns are optional but enable specific features in BFF when + provided. +
++ Each row is a file. Columns can be anything meaningful to your workflow + — here a well position, gene target, and color channel. +
+| File Path | +Well | +Gene | +Color | +
|---|---|---|---|
| Abc123.txt | +B3 | +CDH2 | +Blue | +
| Def456.txt | +G9 | +VIM | +Green | +
+
+ Download this example as CSV{" "}
+
+ Browse open-source datasets +
+ > + ), + }, + ], + }, + + "getting-started/metadata-guidance": { + title: "Metadata guidance", + intro: + "Clear, consistent metadata is what turns microscopy data from a static file into something others can actually find, interpret, and reuse. This section outlines recommended metadata practices that support sharing datasets in a way that is both accessible and meaningful to a broad audience — from collaborators to future researchers. Rather than prescribing a rigid standard, the guidance focuses on capturing the essential context needed to understand how the data was generated, how it is structured, and how it can be used. Our hope is that by following these suggestions, you can make your data easier to explore, visualize, and integrate into downstream analyses, while reducing ambiguity and the need for follow-up clarification.", + sections: [ + { + id: "recommendations", + heading: "Recommendations", + body: ( + <> +
+ The following interpretation of the{" "}
+
+ FoundingGIDE{" "}
+
+ Fields included: Metadata Field, Study Description, Authors, + Organization, Publication, License, Release Date, Imaging Method, Cell + Line, Organism, Gene, Compound, Antibody, Channel — Content, Channel — + Biological Entity, Instrument, Dimension, Pixel/Voxel Size / Time + resolution, Study Unique ID, Dataset Unique ID, Pathology/Disease, + Phenotype, Organ, Analyzed Data. +
+
+
+ Download FoundingGIDE template CSV{" "}
+
Example descriptions for these fields are provided below.
+ > + ), + }, + { + id: "column-descriptions", + heading: "Providing column descriptions", + body: ( + <> ++ BFF can display tooltips that describe the columns in your dataset if + provided an additional file (referenced as a “metadata descriptor + file” in the app). This file must contain three columns: +
+Open file link, which tells BFF the column represents a
+ link that can be opened with the “Open with…”
+ button. This is useful for pointing to alternative viewers or
+ related resources — for example, a column containing a direct link
+ to open a file in a specific tool.
+ | Column Name | +Description | +Type | +
|---|---|---|
| Metadata Field | +Name of the metadata attribute being described | ++ |
| Study Description | +Summary of the study's purpose, design, and scope | ++ |
| Authors | +List of contributors to the dataset or study | ++ |
| Organization | +Institution or organization responsible for the dataset | ++ |
| Publication | +Associated publication or DOI describing the dataset | +Open file link | +
| License | +Usage license governing the dataset (e.g., CC-BY) | ++ |
| Release Date | +Date the dataset was made publicly available | ++ |
| Imaging Method | ++ Microscopy or imaging modality used (e.g., confocal, + light-sheet) + | ++ |
| Cell Line | +Cell line used in the experiment | ++ |
| Organism | +Species from which the sample was derived | ++ |
| Gene | +Gene(s) of interest or manipulated in the experiment | ++ |
| Compound | +Chemical compound or treatment applied | ++ |
| Antibody | +Antibody used for staining or detection | ++ |
| Channel — Content | ++ Imaging channel identifier or label (e.g., Channel 1, GFP) + | ++ |
| Channel — Biological Entity | ++ Biological structure or molecule represented in the channel + | ++ |
| Instrument | +Microscope or imaging instrument used | ++ |
| Dimension | ++ Dimensionality of the dataset (e.g., 2D, 3D, time series) + | ++ |
| Pixel/Voxel Size / Time resolution | +Spatial or temporal resolution of the imaging data | ++ |
| Study Unique ID | +Unique identifier for the overall study | ++ |
| Dataset Unique ID | ++ Unique identifier for a specific dataset within the study + | ++ |
| Pathology/Disease | +Disease or pathological condition represented | ++ |
| Phenotype | +Observed or computed phenotype from analysis | ++ |
| Organ | +Organ or tissue source of the sample | ++ |
| Analyzed Data | ++ Link to derived or processed data (e.g., segmentation, + features) + | +Open file link | +
+
+ Download this example as CSV{" "}
+
+ BFF supports describing relationships between files and metadata via a + provenance file.{" "} + + See the full provenance guide + + . +
+ ), + }, + ], + }, + + "getting-started/provenance": { + title: "File & metadata provenance", + intro: + 'Information about how files relate to each other or to different pieces of metadata can be provided via an additional file called a "Provenance file". Provenance in BioFile Finder (BFF) can describe relationships between files, between a file and a piece of metadata, and between two pieces of metadata.', + sections: [ + { + id: "provenance-where", + heading: "Where to provide the provenance file", + body: ( ++ In BFF, open the data source panel by clicking the dataset name at the top + of the app. At the bottom of that panel you will find an optional field + labeled Provenance file. Paste the URL or drag in the file + there to load it alongside your dataset. +
+ ), + }, + { + id: "provenance-format", + heading: "Provenance file format", + body: ( + <> +The provenance file should contain 6 columns:
+pointer, this should be the name of a dataset column
+ that encodes the relationship.
+ file if the child is a
+ file in the dataset; entity if it is metadata.
+ file if the parent is a
+ file; entity if it is metadata.
+ pointer if the
+ relationship is defined via a dataset column.
+ | Child | +Relationship | +Parent | +Child Type | +Parent Type | +Relationship Type | +
|---|---|---|---|---|---|
| WellID | +is well in | +PlateID | +entity | +entity | ++ |
| ColonyImage | +is image acquired from | +WellID | +file | +entity | ++ |
| SegmentationImage | +segmentation_algorithm_v1 | +ColonyImage | +file | +file | +pointer | +
+
+ Download this example{" "}
+
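To make the six-column format concrete, here is a small sketch that models the example rows from the table above and walks from a file back to its ancestors. The `ProvenanceRow` type and the `lineage` helper are hypothetical names introduced for illustration, not part of BFF.

```typescript
type NodeType = "file" | "entity";

// One row of a provenance file, mirroring the six columns described above.
interface ProvenanceRow {
  child: string;
  relationship: string;
  parent: string;
  childType: NodeType;
  parentType: NodeType;
  relationshipType: "" | "pointer";
}

// The three example rows from the table.
const provenanceRows: ProvenanceRow[] = [
  { child: "WellID", relationship: "is well in", parent: "PlateID",
    childType: "entity", parentType: "entity", relationshipType: "" },
  { child: "ColonyImage", relationship: "is image acquired from", parent: "WellID",
    childType: "file", parentType: "entity", relationshipType: "" },
  { child: "SegmentationImage", relationship: "segmentation_algorithm_v1", parent: "ColonyImage",
    childType: "file", parentType: "file", relationshipType: "pointer" },
];

// Follow child -> parent links until no further parent is declared.
// Assumes each child has at most one parent row; stops on cycles.
function lineage(start: string, rows: ProvenanceRow[]): string[] {
  const chain: string[] = [];
  const seen: Record<string, boolean> = {};
  let current: string | undefined = start;
  while (current !== undefined && !seen[current]) {
    seen[current] = true;
    chain.push(current);
    let parent: string | undefined = undefined;
    for (const row of rows) {
      if (row.child === current) {
        parent = row.parent;
        break;
      }
    }
    current = parent;
  }
  return chain;
}

const segmentationLineage = lineage("SegmentationImage", provenanceRows);
```

With the example rows, the chain runs from the segmentation image through the colony image and well up to the plate.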
+ Provenance is especially important in microscopy workflows that span + multiple levels of biological organization — such as plates, wells, and + individual image files. Without clear provenance linking each + segmentation file back to its original image, well, and plate context, + it becomes difficult to trace results back to the experimental setup. + Capturing these relationships ensures that derived data products remain + connected to their biological source, enabling validation, + troubleshooting, and reproducibility. +
++ In BFF, once a provenance file is loaded, each file row in the file list + will show a relationship indicator. Expanding a row reveals its linked + parent or child entities — for example, clicking a segmentation image + will show the colony image it was derived from and the well it + originated in. +
+
+
+ Download example
+ Provenance is also critical when a single publication draws on images + from multiple datasets. If the origin of each image is not clearly + documented — which dataset it came from, how it was selected, whether it + was processed consistently — readers and collaborators may struggle to + interpret how comparable those images truly are. By maintaining + provenance across datasets, researchers can clearly communicate how + figures were constructed and allow others to navigate back to the full + underlying data for verification or reuse. +
++ When provenance spans multiple datasets, BFF displays the dataset origin + of each file alongside its metadata. Filtering by dataset source allows + you to isolate images from a specific experiment, verify that processing + was applied consistently, and trace any figure back to its full source + dataset. +
+
+
+ Download example
+ BFF works with both private and public cloud storage. The only + requirement is that{" "} + CORS permissions are + configured on the bucket so the browser can access the files. +
+ > + ), + }, + { + id: "hard-drive", + heading: "Local and network storage", + body: ( + <> ++ BFF can load a dataset file from a local hard drive or network-attached + storage. However, because BFF runs in the browser, local paths are not + persisted — if you refresh the page or share the link, you will be + prompted to reload the dataset file. +
++ Files referenced in the dataset can also live locally or on a network + drive, but they can only be opened in desktop applications (e.g. FIJI). + Web-based viewers do not have access to the local file system. If the + drive is disconnected, BFF will still display the metadata but the + viewers will be unable to open the files. +
+ > + ), + }, + { + id: "cloud-examples", + heading: "Cloud storage examples", + body: ( + <> ++ A useful way to think about integrating the Image Data Resource (IDR), + BioImage Archive (BIA), and SSBD with BFF is that they occupy different + layers of the bioimaging data stack, and BFF can sit above them as a + unified discovery and navigation interface. +
++ BFF can sit above all three as a unified metadata and exploration layer: +
++ By sitting on top of these resources, BFF provides a single search and + discovery interface across raw (BIA), curated (IDR), and + quantitative/model-based (SSBD) datasets — allowing cross-repository + linking and normalized navigation from raw images through to derived + analysis. +
+You can use Google Sheets to publish your dataset publicly as a CSV:
++ In GitHub, you can link to the Raw version of a file in + a repository to share the dataset with anyone who has access to that + repository. This also provides implicit dataset versioning, which can be + very useful for collaboration. +
+
+ New to GitHub?{" "}
+
+ See GitHub documentation{" "}
+
+ Your organization may provide support for choosing the best option, but
+ AWS S3 is a commonly used cloud storage service compatible with BFF that
+ you may consider.{" "}
+
+ See AWS S3 documentation{" "}
+
+ CORS (Cross-Origin Resource Sharing) permissions tell your cloud storage + bucket which web origins are allowed to read its files. Without this, + browsers block BFF and web-based viewers from accessing your data. +
+
+ Example for AWS S3:{" "}
+
+ AWS CORS documentation{" "}
+
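For orientation, the sketch below builds a read-only CORS rule set in the JSON shape that S3 bucket CORS configuration uses. The `BFF_ORIGIN` value is a placeholder, not the real application origin, and the exposed headers and max age are illustrative choices; adjust them to your bucket and check the AWS documentation before applying.

```typescript
// Shape of one rule in an S3 bucket CORS configuration (JSON form).
interface S3CorsRule {
  AllowedHeaders: string[];
  AllowedMethods: string[];
  AllowedOrigins: string[];
  ExposeHeaders: string[];
  MaxAgeSeconds: number;
}

// Placeholder: replace with the origin your copy of BFF is served from.
const BFF_ORIGIN = "https://bff.example.org";

const corsRules: S3CorsRule[] = [
  {
    AllowedHeaders: ["*"],
    // Read-only access: the browser only needs to fetch files.
    AllowedMethods: ["GET", "HEAD"],
    AllowedOrigins: [BFF_ORIGIN],
    // Illustrative choice: headers that partial (range) reads commonly rely on.
    ExposeHeaders: ["Content-Length", "Content-Range", "ETag"],
    MaxAgeSeconds: 3000,
  },
];

// JSON text that could be pasted into the bucket's CORS settings.
const corsJson = JSON.stringify(corsRules, null, 2);
```

Listing a specific origin instead of `*` limits which sites may read the bucket from a browser. It does not make private files public, since normal bucket permissions still apply.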
This table is a summary of the in-depth use cases described below.
+| Use case | +Key BFF actions | +Time saved vs. manual | +
|---|---|---|
| + Explore screening results + | ++ Group by plate/treatment; filter by phenotype; share URL + | +Hours of scripting per query | +
| + Validate metadata + | ++ Filter for blanks/duplicates; group to check counts; export + errors + | +Days of spreadsheet auditing | +
| + Inspect image subsets + | ++ Multi-filter to exact subset; open in viewer; arrow-key + navigation + | +Hunting through folders by hand | +
| + Perform QC on datasets + | ++ Aggregate counts per group; filter for anomalies; + cross-validate columns + | +Custom scripts per dataset | +
| + Manage image inventory + | ++ Host metadata file; browse by any column; shareable filtered + URLs + | +Building and maintaining a web portal | +
| + Compare across experimental dimensions + | ++ Pivot/group across multiple metadata axes (e.g., cell line × + staining × condition); rapidly switch views + | +Rewriting analysis scripts per comparison | +
| + Collaborative data exploration + | ++ Share filtered views; maintain consistent dataset state + across users; parallel exploration + | +Back-and-forth file exchange and re-alignment | +
| + Publish interactive datasets + | ++ Share public BFF links tied to figures; enable readers to + explore full datasets in-browser + | +Building custom portals or static supplements | +
+ A high-content screening run produces tens of thousands of images across + hundreds of wells, multiple plates, and several time points. The + pipeline outputs a Parquet or CSV manifest linking each image file to + its well position, compound treatment, concentration, cell line, and + measured phenotype scores. +
++ Load the manifest into BFF and immediately group files by Plate > + Treatment > Concentration to see how many images exist at each + condition. Filter to a specific compound and sort by phenotype score to + surface the most interesting wells. Click into a well to see thumbnails + of every image at that position. Share the filtered view with a + colleague by copying the URL — they see exactly the same subset without + re-running any queries. +
++ A genomics core runs CRISPR screens and outputs per-guide results as a + CSV. Researchers load it into BFF to filter by gene target, sort by + effect size, and quickly identify which guides to follow up on — without + writing R or Python code. +
+ </> + ), + }, + { + id: "validate-metadata", + heading: "Validate metadata", + level: 3, + body: ( + <> ++ Before publishing a dataset or submitting to a repository, you need to + confirm that every file has complete, consistent metadata: no missing + cell lines, no mislabeled plates, no blank file paths. +
++ Load your metadata file and use BFF's filters to find gaps. Group + by "Cell Line" and look for a blank or "(No value)" + group — those are your missing entries. Sort by "File Path" to + spot duplicates or malformed paths. Filter for rows where + "Treatment" is empty to find unlabeled conditions. Use the + aggregate count at each folder level to verify expected file counts per + condition (e.g., "I should have 96 images per plate — any plate + with fewer has missing data"). Export the problematic subset as a + CSV for correction. +
++ A museum digitization team loads their specimen catalog CSV into BFF to + check for records missing accession numbers, blank taxonomic + classifications, or broken file paths to scans — catching errors before + ingesting into their collection management system. +
+ </> + ), + }, + { + id: "inspect-subsets", + heading: "Inspect subsets of images", + level: 3, + body: ( + <> ++ You don't want to look at all 50,000 images. You want to look at a + very specific slice: maybe failed QC images, or images from a + particular experimental condition, or everything captured on a specific + date. +
++ Apply filters to narrow down to exactly the subset you care about: + "Cell Line = iPSC" AND "Plate = 007" AND "QC + Status = Failed". The file list updates instantly to show only + matching files. Click any file to see its full metadata in the detail + panel. Open the image directly in your preferred viewer (FIJI, AGAVE, + Neuroglancer, or the browser) to visually inspect it. Navigate through + the filtered list with arrow keys to scan the subset one file at a + time. +
++ A pathology lab filters their slide inventory to all H&E-stained + tissue sections from a specific patient cohort and date range, then + opens each in their whole-slide viewer to confirm stain quality before + analysis. +
+ </> + ), + }, + { + id: "perform-qc", + heading: "Perform QC on datasets", + level: 3, + body: ( + <> ++ Quality control means systematically checking that your data meets + expectations: correct file counts, valid value ranges, no corrupted + entries, consistent naming. Doing this manually in Excel breaks down + past a few thousand rows. +
++ Load a 2-million-row Parquet manifest and immediately see the total file + count in the aggregate info bar. Group by "Experiment > Plate + > Well" and check that each well has the expected number of + images — any well showing a lower count is a red flag. Filter for files + where "File Size" is 0 to find corrupt or empty files. Sort by + "Date Acquired" to verify temporal consistency. Group by + "Instrument" to check that all files came from the expected + microscope. Apply multiple filters simultaneously to cross-validate: + "If Plate = Control, then Treatment should be DMSO" — filter + for Control plates with non-DMSO treatments to find mislabeled rows. +
++ A data engineer receives a new batch of sequencing metadata from a + collaborator and loads it into BFF to check for duplicate sample IDs, + spot file paths that do not match the expected S3 layout (by + sorting/filtering paths), and confirm that all expected runs are + represented before ingesting into the pipeline. +
+ </> + ), + }, + { + id: "manage-inventory", + heading: "Manage image inventory", + level: 3, + body: ( + <> ++ You or your team have accumulated a large collection of files over + months or years. They live across local drives, shared network storage, + or cloud buckets. You have metadata about them (maybe a database + export, maybe a painstakingly maintained spreadsheet), and you need an + easy way to browse, search, and share access to this inventory without + maintaining a server. +
++ Export your inventory as a Parquet file (or maintain it as a CSV) with + columns for file path, file name, and any annotations that matter to + your team (project, investigator, organism, imaging modality, date, + etc.). Host the file on a web server or S3 bucket, or just keep it local. + Point BFF at it. Your entire team can now browse the inventory by any + column, search for specific files, and open them directly. Add a source + metadata file to provide human-readable descriptions for each column. + When someone asks "do we have any confocal images of iPSC-derived + cardiomyocytes from 2024?", the answer is three clicks away instead + of a Slack thread. +
++ A natural history museum has 200,000 digitized specimen records in a CSV + exported from their collection database. They host a BFF instance on + their website so visiting researchers can browse specimens by taxonomy, + collection site, and date — filtering to exactly the subset relevant to + their study and downloading a manifest of matching file paths. +
+ </> + ), + }, + { + id: "real-world-scenarios", + heading: "Real-world scenarios", + body: ( + <> ++ + I have thousands of images and I just want to find the right ones. + +
++ You ran a plate screen last week and now need to find every image from + Well A3 treated with Drug X. Your files are scattered across folders, + drives, or cloud storage, with no easy way to search by experimental + conditions. BFF lets you load a spreadsheet of your file metadata and + instantly filter, sort, and group by any column—cell line, treatment, + plate, date, or anything else you need. No coding, no databases, no IT + tickets. Just drag, drop, and find your files. +
++ + I want to query millions of files without writing a pipeline to do + it. + +
++ You have a Parquet manifest with 10 million rows of imaging metadata. + You need to pull a specific subset for your next analysis run. BFF runs + full SQL queries in your browser via DuckDB—no server, no cluster, no + credentials. Filter by any combination of annotations, copy out the file + paths you need, and get back to your actual work. Share your exact query + with a collaborator by copying the URL. +
++ + I want to give users self-service data access without building a + portal. + +
++ Your team maintains the imaging pipeline. Scientists keep asking you to + "just pull all the files where..." and it turns into a JIRA + ticket every time. BFF is a zero-infrastructure frontend: point it at a + Parquet file on S3 or a CSV on a web server and your users can explore, + filter, and export on their own. No backend to deploy, no API to + maintain, no accounts to manage. Host a static web page and you're + done. +
++ I need to make my shared data actually usable. +
++ You run a core imaging facility or oversee a lab generating terabytes of + data. Your shared drive has 50,000 files and a naming convention that + made sense two years ago. BFF turns any metadata spreadsheet into a + searchable, filterable, shareable interface. Publish a dataset with a + BFF link and reviewers, collaborators, or new lab members can explore it + immediately—no software to install, no accounts to create. +
++ I want to make my collection metadata interactive. +
++ You have a CSV with 200,000 digitized specimens, each with accession + numbers, taxonomic classifications, collection dates, and file paths to + high-resolution scans. BFF turns that spreadsheet into a browsable, + filterable, groupable interface—right in the browser. Let researchers + explore your collection by species, date range, or geographic origin. + Share a filtered view as a URL. No web developer needed. +
+ </> + ), + }, + ], + }, + + "real-world-use-cases/example-aics": { + title: "The cell science accelerator at Allen Institute", + intro: + "BioFile Finder (BFF) was used in a publication by the cell science accelerator at Allen Institute.", + sections: [ + { + id: "publication", + heading: "", + body: ( + <> +
+
+ Open publication{" "}
+
+
+ View dataset in BFF{" "}
+
+ In this study on epithelial-to-mesenchymal transition (EMT), the authors + generated a large-scale microscopy dataset consisting of 3,538 3D + Z-stack datasets across 37 experimental conditions, 8 cell lines, and 9 + antibody stainings. BioFile Finder (BFF) was used to organize and + explore this complex dataset without relying on a fixed folder + hierarchy. Instead, BFF enabled dynamic filtering, grouping, and + navigation based on metadata, allowing users to analyze the data across + multiple dimensions (e.g., comparing stainings across cell lines) + without duplicating files. This approach improved collaboration between + experimental and computational researchers, supported parallel analysis + workflows, and reduced friction in large-scale data exploration. + Additionally, BFF was used to share the dataset publicly, enabling + readers to directly access figure-associated data, explore full 3D + timelapse datasets in the browser, and interact with the dataset using + the same flexible metadata-driven framework. +
+++ </> + ), + }, + ], + }, + + "real-world-use-cases/example-aibs": { + title: "The brain science accelerator at Allen Institute", + intro: + "BioFile Finder (BFF) was used in a publication by the brain science accelerator at Allen Institute.", + sections: [ + { + id: "publication", + heading: "", + body: ( + <> ++ “Every organizational choice comes at the cost of another. In + other words, every choice is a bad choice.” — Antoine + Borensztejn, author +
++ “BioFile Finder (BFF) allowed us to break away from this + constraint entirely.” — Antoine Borensztejn,{" "} + author +
++ “We believe this approach sets a new standard for FAIR data + sharing, and will significantly improve the accessibility, + transparency, and reuse of complex biological datasets.” + — Antoine Borensztejn, author +
+
+
+ Open publication{" "}
+
{" "}
+
+ View dataset in BFF{" "}
+
+ Yoav Ben-Simon from the Allen Institute for Brain Science describes + using BioFile Finder (BFF) as a flexible data management and sharing + platform for imaging datasets related to viral vector targeting in the + brain. BFF was used to organize datasets in a spreadsheet-like + interface, enabling intuitive querying, filtering, and restructuring of + data without requiring custom software development. The tool allowed + users to quickly create and curate datasets, organize them + hierarchically based on relevant features, and visualize grouped image + sets with thumbnails. This significantly lowered the barrier to entry + for data management and sharing, enabling non-engineers to deploy and + share datasets via simple links rather than building dedicated web + interfaces. Additionally, BFF facilitated collaboration by allowing + teams to interact with shared datasets dynamically and supported reuse + across different domains, extending from cell imaging to brain section + and genomic data visualization. +
+++ </> + ), + }, + { + id: "video", + heading: "", + body: <>{/* TODO: Add link to video once it is made public */}</>, + }, + ], + }, + + "real-world-use-cases/example-isas": { + title: "AMBIOM at ISAS", + sections: [ + { + id: "publication", + heading: "", + body: ( + <> + {/* TODO: Add publication link once available */} ++ “BioFile Finder is a data management tool… like a fancy + spreadsheet so that you can interact with it in multiple different + ways.” — Yoav Ben-Simon, author +
++ “I can create and curate data sets in two or three clicks of a + button.” +
++ “It doesn't require exchanging of files—it just + requires exchanging of links.” — Yoav Ben-Simon,{" "} + author +
++ “It was really easy for us to repurpose it… from + looking at individual cells to looking at images of brain sections + and genomic data.” — Yoav Ben-Simon, author +
+
+ Content coming soon. +
+ </> + ), + }, + ], + }, + + "real-world-use-cases/other-examples": { + title: "Other examples", + sections: [], + }, +}; diff --git a/packages/web/src/components/UserGuide/content/types.ts b/packages/web/src/components/UserGuide/content/types.ts new file mode 100644 index 000000000..2c5d10852 --- /dev/null +++ b/packages/web/src/components/UserGuide/content/types.ts @@ -0,0 +1,18 @@ +import * as React from "react"; + +export interface PageSection { + id: string; + heading?: string; + /** + * Heading level rendered for this section (h2, h3, or h4). + * Defaults to h2 if omitted. Use h3/h4 for subsections within a page. + */ + level?: 2 | 3 | 4; + body: React.ReactNode; +} + +export interface PageContent { + title: string; + intro?: string; + sections: PageSection[]; +} diff --git a/packages/web/src/components/UserGuide/index.tsx b/packages/web/src/components/UserGuide/index.tsx new file mode 100644 index 000000000..d05f155ff --- /dev/null +++ b/packages/web/src/components/UserGuide/index.tsx @@ -0,0 +1,75 @@ +import * as React from "react"; +import { Navigate, useLocation, useParams } from "react-router-dom"; + +import { PrimaryButton } from "../../../../core/components/Buttons"; +import DocPage from "./DocPage"; +import Sidebar from "./Sidebar"; +import styles from "./UserGuide.module.css"; + +export default function UserGuide() { + const { sectionSlug, pageSlug } = useParams<{ + sectionSlug: string; + pageSlug: string; + }>(); + const location = useLocation(); + const [menuOpen, setMenuOpen] = React.useState(false); + const contentRef = React.useRef