Welcome to NLP server repo

Bibek Pandey edited this page Jul 31, 2023 · 2 revisions

System overview

The purpose of this system is to provide NLP services to platforms like DEEP. At the moment, DEEP is the major consumer of the services, but we expect more consumers in the future.

There are various NLP services deployed in AWS, such as text extraction, classification, and summarization, which make use of different AWS services such as Lambda and ECS. The NLP server provides a consistent REST wrapper for all of those services.

Besides exposing the NLP services, this system also monitors their performance, primarily that of the classification service. Data from DEEP is therefore periodically pulled and stored in the database so that the monitoring scripts can run against it.

More details on the infrastructure can be found here.

Core Entities

  • ToFetchProject: A project in DEEP whose data needs to be fetched and processed/monitored. Typically this means fetching the entries and leads for that project.
  • DeepDataFetchTracker: Tracks when the analysis frameworks and organizations were last fetched from DEEP.
  • Organization: Stores relevant information for an Organization in DEEP.
  • AFMapping: Stores relevant information for an Analysis Framework in DEEP, including tags and mappings.
  • Project: Corresponds to a Project in DEEP, but with a subset of the project info.
  • Lead: Corresponds to a Lead in DEEP, but with a subset of the lead info.
  • Entry: Corresponds to an Entry in DEEP, but with a subset of the entry info.
  • NLPRequest: Keeps track of each incoming request and its status.
  • FailedCallback: Keeps track of failed ECS tasks that were triggered from here.
  • ClassificationModel: Keeps track of classification models (name, URL, version) deployed in AWS.
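As an illustration of the entities above, a minimal sketch of the fields the ClassificationModel entity is described as tracking (name, URL, version); the field names and types here are assumptions, not the actual model definition:

```python
# Hypothetical sketch of the ClassificationModel entity; field names are
# inferred from the description above, not taken from the actual codebase.
from dataclasses import dataclass


@dataclass
class ClassificationModel:
    name: str     # model identifier, e.g. "main-classification-model"
    url: str      # endpoint of the deployed model in AWS
    version: str  # deployed model version


model = ClassificationModel(
    name="main-classification-model",
    url="https://example.com/classify",
    version="1.0.2",
)
print(model.version)  # 1.0.2
```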

Performance related Entities

  • ClasificationPredictions
  • ProjectWisePerfMetrics
  • TagWisePerfMetrics
  • AllProjectPerfMetrics
  • CategoryWiseMatchRatios
  • ProjectWiseMatchRatios
  • ComputedFeatureDrift

Development

Local Setup

  • Clone the repo
  • Set the DEEP database details in the .env file. Generally, you might need to set up an SSH proxy to the DEEP database if the data is being accessed from the alpha/prod server.
  • Run docker-compose up
  • Use poetry to add/remove packages.

Authentication

Every request to nlp-server should contain a valid token in the header as Authorization: Bearer <token>.
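For example, a client can attach the required header like this (the host, port, and path are placeholders, not actual nlp-server routes):

```python
# Hypothetical client-side sketch: building a request with the
# Authorization header that nlp-server expects. Host/path are placeholders.
import urllib.request


def build_request(host: str, path: str, token: str) -> urllib.request.Request:
    """Attach the Bearer token in the Authorization header."""
    return urllib.request.Request(
        f"{host}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_request("http://localhost:8000", "/api/v1/", "my-token")
print(req.get_header("Authorization"))  # Bearer my-token
```

Requests without a valid token should be rejected by the server.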

Token generation

  • From the admin panel (https://HOST:PORT/admin), create a user if one does not already exist, then create a token for that user.
  • Distribute the token to clients.

Periodic/Background Tasks

  • fetch_new_projects: This fetches newly added active projects and stores them as ToFetchProject records.
  • fetch_deep_data: This fetches leads, entries, analysis frameworks, and organizations from DEEP based on ToFetchProject.
  • calculate_model_metrics: This calculates and stores various performance metrics for NLP models.
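The fetch_new_projects flow can be sketched as follows. This is an illustrative, in-memory version only; the real task talks to the DEEP API and the database, so the function signature and field names here are assumptions:

```python
# Illustrative sketch of the fetch_new_projects task: pick up active DEEP
# projects that are not yet tracked and store them as ToFetchProject records.
# Field names ("id", "is_active", "deep_id", "status") are assumptions.

def fetch_new_projects(deep_projects, tracked_ids):
    """Return active DEEP projects not yet tracked as ToFetchProject."""
    return [
        {"deep_id": p["id"], "status": "pending"}
        for p in deep_projects
        if p.get("is_active") and p["id"] not in tracked_ids
    ]


deep_projects = [
    {"id": 1, "is_active": True},   # already tracked
    {"id": 2, "is_active": False},  # inactive, skipped
    {"id": 3, "is_active": True},   # new, will be tracked
]
new = fetch_new_projects(deep_projects, tracked_ids={1})
print(new)  # [{'deep_id': 3, 'status': 'pending'}]
```

fetch_deep_data would then iterate over these pending records to pull the leads, entries, and related data for each project.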