TakemFidel/datascience-stunning-engine

 
 

Spark Text Lab

Quick start

  1. Create a Python virtual environment and install the dependencies. The activation command shown is for Windows; on macOS or Linux, use source venv/bin/activate instead:

     python -m venv venv
     venv\Scripts\activate
     pip install -r requirements.txt
     python -m nltk.downloader punkt

  2. Start MongoDB locally, or use a remote URI. Either update MONGO_URI in the script or pass the URI with --mongo-uri.

  3. Run a quick test with the sample corpus:

     python spark_text_lab.py --corpus sample_corpus.txt --mongo-uri mongodb://localhost:27017
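
The two flags in the command above could be parsed along the following lines. This is a minimal sketch, not the code in spark_text_lab.py: the function name parse_args, the DEFAULT_MONGO_URI constant, and the fallback to a MONGO_URI environment variable are assumptions made for illustration.

```python
import argparse
import os

# Hypothetical default; matches the local URI used in the quick-start command.
DEFAULT_MONGO_URI = "mongodb://localhost:27017"


def parse_args(argv=None):
    """Parse --corpus and --mongo-uri (illustrative, not the script's own parser)."""
    parser = argparse.ArgumentParser(description="Spark Text Lab runner (sketch)")
    parser.add_argument(
        "--corpus",
        required=True,
        help="Path to the corpus text file, e.g. sample_corpus.txt",
    )
    parser.add_argument(
        "--mongo-uri",
        # Assumption: fall back to a MONGO_URI environment variable, then localhost.
        default=os.environ.get("MONGO_URI", DEFAULT_MONGO_URI),
        help="MongoDB connection URI",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"corpus={args.corpus} mongo_uri={args.mongo_uri}")
```

With this layout, a remote database can be selected per run without editing the script, while a plain local run only needs --corpus.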

Files

  • spark_text_lab.py: main script with implementations for Exercises 1-6.
  • sample_corpus.txt: small sample corpus for quick testing.

Notes

  • The notebook Spark_Text_Exercises.ipynb shows step-by-step usage.
  • The script uses PySpark in local mode; to run on a cluster, adjust the SparkSession builder settings (for example, the master URL).
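
As a rough illustration of switching between local mode and a cluster, the sketch below resolves the master URL from an argument or an environment variable before building the session. The helper names spark_settings and build_session, the SPARK_MASTER variable, and the application name are assumptions for this example; the actual script may configure its session differently.

```python
import os


def spark_settings(master=None, app_name="SparkTextLab"):
    """Return builder settings; master falls back to SPARK_MASTER, then local[*]."""
    # Assumption: SPARK_MASTER is a hypothetical variable chosen for this sketch.
    resolved = master or os.environ.get("SPARK_MASTER", "local[*]")
    return {"spark.master": resolved, "spark.app.name": app_name}


def build_session(master=None):
    """Create (or reuse) a SparkSession configured from spark_settings()."""
    # Imported lazily so spark_settings() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in spark_settings(master).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling build_session() with no arguments would keep the local-mode behaviour, while build_session("spark://host:7077") would point the same code at a standalone cluster (the host and port here are placeholders).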
