This repository proposes a possible next step in the evolution of free-text data processing originally implemented in CogStack-Pipeline, moving towards a more modular, Platform-as-a-Service (PaaS) approach.
CogStack-NiFi demonstrates how to use Apache NiFi as the central data workflow engine for clinical document processing, integrating services such as text extraction and natural language processing (NLP). Each component runs as a standalone service, with NiFi handling data routing between components and data sources/sinks.
All NLP services are expected to implement a uniform RESTful API, so any NLP application that follows it can be incorporated into the stack and integrated into existing pipelines.
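As a rough illustration of what a uniform API buys you, the sketch below sends a document to an NLP service using only the Python standard library. The endpoint path (`/api/process`), the payload shape (`{"content": {"text": ...}}`), and the helper names `http_post_json` and `annotate` are assumptions made for this example, not a specification of the services in this repository; check the official documentation for the actual contract.

```python
import json
from typing import Callable, Dict
from urllib import request

# Hypothetical transport signature: (url, payload) -> decoded JSON response.
Transport = Callable[[str, Dict], Dict]


def http_post_json(url: str, payload: Dict) -> Dict:
    """POST a JSON payload and decode the JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def annotate(base_url: str, text: str, transport: Transport = http_post_json) -> Dict:
    """Send one document to an NLP service and return its JSON response.

    ASSUMPTION: the endpoint path and payload shape below are illustrative only.
    """
    url = base_url.rstrip("/") + "/api/process"
    payload = {"content": {"text": text}}
    return transport(url, payload)
```

Because every service would accept the same request, switching from one NLP application to another only means changing `base_url` (for example, `annotate("http://localhost:5000", "Patient reports chest pain.")`, where the host and port are placeholders). The `transport` parameter exists so the call can be exercised without a running service.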
This project is under active development. New features or services may impact existing deployments. Please review the release notes and documentation before upgrading.
Need help? Feel free to:
- Open an issue on the GitHub Issue Tracker
- Start a discussion on our Discourse forum (actively monitored by the dev team)
| Folder | Description |
|---|---|
| nifi | Custom Apache NiFi Docker image with workflows, configs, drivers, and user resources. |
| security | Scripts for generating SSL certificates and other security-related tools. |
| services | NLP and auxiliary services, each with its own configs and resources. |
| deploy | Example deployment setup, combining NiFi and related services. |
| scripts | Helper scripts (e.g., setup tools, sample DB ingestion, Elasticsearch ingestion). |
| data | Place any test or ingested data here. |
Prerequisites:
- Docker (mandatory)
- Basic knowledge of Python and Linux/UNIX systems
📖 Official documentation: cogstack-nifi.readthedocs.io
🚀 New to the project? Start with the deployment guide for example setups and workflows.
🐞 For troubleshooting or bug reports, consult the Known Issues section before opening a ticket.
Check the IMPORTANT_NEWS section regularly for:
- Major changes to project structure or configuration
- Security advisories or vulnerabilities affecting deployments