[Feature] Implement Centralized Pydantic Data Validation and Normalization Pipeline for Crawler Services #128

### 🎯 Is your feature request related to a problem?
As more spiders are added to the crawler system (such as the recent Glassdoor and Internshala additions), data consistency becomes a challenge. Each job board structures its raw scraped payloads a little differently (e.g., varying date formats, inconsistent casing, or missing optional fields such as salary ranges or exact company locations).

Currently, pushing raw dictionaries directly down the pipeline can cause silent database insertion failures or structural inconsistencies in PostgreSQL.

### ✨ Describe the proposed solution
I propose introducing a strict data validation and normalization layer using **Pydantic v2** right before data is dispatched to Redis Streams or the Postgres database layer. 

By defining unified data models, we can:
1. Guarantee runtime type safety and fail-fast validation for all inbound scraped jobs/contacts.
2. Implement custom Pydantic validators (`@field_validator`) to normalize data on the fly (e.g., converting strings to standard datetime objects, stripping whitespace, and enforcing lowercased email fields).
3. Provide safe fallback defaults for non-mandatory missing attributes.
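
The three points above could be sketched roughly as follows. This is only an illustration, assuming Pydantic v2 is installed: the field names (`title`, `company`, `posted_at`, `contact_email`, `salary_range`, `location`), the accepted date formats, and the `"unknown"` default are hypothetical placeholders, not the project's actual schema.

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class JobIngestModel(BaseModel):
    # Strip leading/trailing whitespace from every string field.
    model_config = ConfigDict(str_strip_whitespace=True)

    # Hypothetical fields, chosen only for illustration.
    title: str
    company: str
    posted_at: datetime
    contact_email: Optional[str] = None  # fallback when the board omits it
    salary_range: Optional[str] = None   # fallback when the board omits it
    location: str = "unknown"            # fallback when the board omits it

    @field_validator("posted_at", mode="before")
    @classmethod
    def parse_posted_at(cls, value):
        # Try a couple of assumed board-specific formats first.
        if isinstance(value, str):
            for fmt in ("%d/%m/%Y", "%b %d, %Y"):
                try:
                    return datetime.strptime(value.strip(), fmt)
                except ValueError:
                    continue
        # Anything else (ISO 8601 strings, datetime objects) is left to
        # Pydantic's built-in datetime parsing.
        return value

    @field_validator("contact_email")
    @classmethod
    def lowercase_email(cls, value):
        # Runs after type validation, so value is None or a stripped str.
        return value.lower() if value is not None else value


job = JobIngestModel(
    title="  Backend Intern ",
    company="Acme",
    posted_at="05/03/2024",
    contact_email=" Jane@Example.COM ",
)
print(job.model_dump())
```

A payload that omits a required field such as `title` raises `ValidationError` at construction time, which is the fail-fast behavior described in point 1, while omitted optional fields fall back to their declared defaults.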

### 🛠️ Technical Implementation Steps
* **Define Schemas:** Create a centralized `schemas/` directory or update models in the backend to define `JobIngestModel` and `ContactIngestModel` using Pydantic.
* **Data Cleansing:** Add field validators to clean and sanitize text fields, format URLs, and enforce structural constraints.
* **Pipeline Integration:** Wrap the incoming message consumer or spider output pipeline in a validation block:
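  ```python
  from pydantic import ValidationError

  try:
      validated_job = JobIngestModel(**raw_scraped_data)
  except ValidationError as e:
      # Log the structured error report and drop the invalid payload.
      logger.error("Dropping invalid job payload: %s", e.json())
  ```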
