Welcome to NLP server repo

Bibek Pandey edited this page Jul 31, 2023 · 2 revisions

System overview

The purpose of this system is to provide NLP services to platforms like DEEP. At the moment, DEEP is the major consumer of the services, but we expect more consumers in the future.

There are various NLP services deployed in AWS, such as text extraction, classification, and summarization, which make use of different AWS services such as Lambda and ECS. The NLP server provides a consistent REST wrapper for all of those services.

Besides exposing the NLP services, this system also monitors their performance, primarily that of the classification service. Data from DEEP is therefore periodically pulled and stored in the database so that the monitoring scripts can run against it.

More details on the infrastructure can be found here.

Core Entities

  • ToFetchProject: A project in DEEP whose data needs to be fetched and processed/monitored. Typically this means fetching the entries and leads for that project.
  • DeepDataFetchTracker: Tracks when the analysis frameworks and organizations were last fetched from DEEP.
  • Organization: Stores relevant information for an Organization in DEEP.
  • AFMapping: Stores relevant information for an Analysis Framework in DEEP, including tags and mappings.
  • Project: Corresponds to a Project in DEEP, but with a subset of the project info.
  • Lead: Corresponds to a Lead in DEEP, but with a subset of the lead info.
  • Entry: Corresponds to an Entry in DEEP, but with a subset of the entry info.
  • NLPRequest: Keeps track of each incoming request and its status.
  • FailedCallback: Keeps track of failed ECS tasks that were triggered from here.
  • ClassificationModel: Keeps track of classification models (name, URL, version) deployed in AWS.
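As an illustration of the entities above, a minimal sketch of the fields the ClassificationModel entity is described as tracking (name, URL, version); the field names and types here are assumptions, not the actual model definition:

```python
# Hypothetical sketch of the ClassificationModel entity; field names are
# inferred from the description above, not taken from the actual codebase.
from dataclasses import dataclass


@dataclass
class ClassificationModel:
    name: str     # model identifier, e.g. "main-classification-model"
    url: str      # endpoint of the deployed model in AWS
    version: str  # deployed model version


model = ClassificationModel(
    name="main-classification-model",
    url="https://example.com/classify",
    version="1.0.2",
)
print(model.version)  # 1.0.2
```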

Performance related Entities

  • ClasificationPredictions
  • ProjectWisePerfMetrics
  • TagWisePerfMetrics
  • AllProjectPerfMetrics
  • CategoryWiseMatchRatios
  • ProjectWiseMatchRatios
  • ComputedFeatureDrift

Development

Local Setup

  • Clone the repo
  • Set the DEEP database details in the .env file. Generally, you might need to set up an SSH proxy to the DEEP database if the data is being accessed from the alpha/prod server.
  • Run docker-compose up
  • Use poetry to add/remove packages.

Authentication

Every request to nlp-server should contain a valid token in the header as Authorization: Bearer <token>.
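For example, a client can attach the required header like this (the host, port, and path are placeholders, not actual nlp-server routes):

```python
# Hypothetical client-side sketch: building a request with the
# Authorization header that nlp-server expects. Host/path are placeholders.
import urllib.request


def build_request(host: str, path: str, token: str) -> urllib.request.Request:
    """Attach the Bearer token in the Authorization header."""
    return urllib.request.Request(
        f"{host}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_request("http://localhost:8000", "/api/v1/", "my-token")
print(req.get_header("Authorization"))  # Bearer my-token
```

Requests without a valid token should be rejected by the server.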

Token generation

  • From the admin panel (https://HOST:PORT/admin), create a user if one does not already exist, then create a token for that user.
  • Distribute the token to clients.

Periodic/Background Tasks

  • fetch_new_projects: This fetches newly added active projects and stores them as ToFetchProject records.
  • fetch_deep_data: This fetches leads, entries, analysis frameworks, and organizations from DEEP based on ToFetchProject.
  • calculate_model_metrics: This calculates and stores various performance metrics for NLP models.
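The fetch_new_projects flow can be sketched as follows. This is an illustrative, in-memory version only; the real task talks to the DEEP API and the database, so the function signature and field names here are assumptions:

```python
# Illustrative sketch of the fetch_new_projects task: pick up active DEEP
# projects that are not yet tracked and store them as ToFetchProject records.
# Field names ("id", "is_active", "deep_id", "status") are assumptions.

def fetch_new_projects(deep_projects, tracked_ids):
    """Return active DEEP projects not yet tracked as ToFetchProject."""
    return [
        {"deep_id": p["id"], "status": "pending"}
        for p in deep_projects
        if p.get("is_active") and p["id"] not in tracked_ids
    ]


deep_projects = [
    {"id": 1, "is_active": True},   # already tracked
    {"id": 2, "is_active": False},  # inactive, skipped
    {"id": 3, "is_active": True},   # new, will be tracked
]
new = fetch_new_projects(deep_projects, tracked_ids={1})
print(new)  # [{'deep_id': 3, 'status': 'pending'}]
```

fetch_deep_data would then iterate over these pending records to pull the leads, entries, and related data for each project.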