Mario5T/Sleep_Deeprived

Sleep_Deprived: Exploratory Data Analysis on Dua Lipa Lyrics & Zipf's Law

This project investigates whether Dua Lipa's lyrics exhibit characteristics consistent with Zipf's Law, a fundamental linguistic phenomenon.
We performed a comprehensive Exploratory Data Analysis (EDA) on 246 of her songs, exploring word frequencies, token distributions, and lyrical structures.


📌 Objective

  • Analyze Dua Lipa’s lyrical data for statistical patterns.
  • Test Zipf’s Law, which states that a word’s frequency is approximately inversely proportional to its frequency rank.
  • Explore lexical diversity, word distributions, and structural insights into her songwriting.
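
For reference, Zipf’s Law is commonly written as a power law. The symbols below ($f$ for frequency, $r$ for rank, $C$ for a constant, $s$ for the exponent) are notation introduced here for illustration, not taken from the project notebook:

```latex
f(r) = \frac{C}{r^{s}}
\quad\Longleftrightarrow\quad
\log f(r) = \log C - s \log r
```

The classic formulation has $s \approx 1$, so on a log-log plot of frequency against rank the points should fall near a straight line with slope $-s \approx -1$.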

🛠️ Tech Stack & Tools

  • Python 3
  • Jupyter Notebook
  • Libraries:
    • pandas – data preprocessing
    • numpy – numerical analysis
    • matplotlib / seaborn – visualization
    • nltk – natural language processing


🧹 Data Cleaning & Preprocessing

  • Handled missing values to ensure dataset integrity.
  • Normalized text: converted to lowercase, removed punctuation.
  • Tokenized lyrics into individual words.
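
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook’s actual code: the function names `normalize` and `tokenize` are hypothetical, whitespace splitting stands in for whichever nltk tokenizer the project used, and the sample string is made up.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text):
    """Lowercase the text and strip ASCII punctuation."""
    if text is None:  # treat a missing lyric as empty text
        return ""
    return text.lower().translate(_PUNCT_TABLE)

def tokenize(text):
    """Split normalized text into word tokens on whitespace."""
    return normalize(text).split()

sample = "Don't start now, don't START now!"  # illustrative input
print(tokenize(sample))
# → ['dont', 'start', 'now', 'dont', 'start', 'now']
```

Note that stripping all punctuation also removes apostrophes, so contractions such as "don't" collapse to "dont". Whether that is desirable depends on the analysis.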

🔍 Analysis Performed

1. Univariate Analysis

  • Distribution of word frequencies → revealed a long-tailed distribution.
  • Top 20 most frequent words included common terms like “you”, “I”, etc.
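
A rank-frequency table of the kind described above can be built with `collections.Counter`. This is a sketch under assumptions: `rank_frequencies` is a hypothetical helper name and the token list is invented for the example.

```python
from collections import Counter

def rank_frequencies(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokens)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]

tokens = "you and i you know you and me".split()  # illustrative tokens
for rank, word, count in rank_frequencies(tokens)[:3]:
    print(rank, word, count)
```

Slicing the result with `[:20]` would give a top-20 list like the one mentioned above.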

2. Zipf’s Law Validation

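  • The log-log plot of rank vs. frequency showed an approximately linear relationship.
  • Regression slope ≈ -1.57, steeper than the -1 of the classic Zipf formulation, so frequency falls off faster with rank than the ideal law predicts.
  • R² = 0.96, indicating that a power law fits the rank-frequency data well.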
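
The slope and R² reported above come from a straight-line fit in log-log space. The sketch below shows one way to compute such a fit with ordinary least squares using only the standard library. `fit_loglog` is a hypothetical name, the input is synthetic, and the notebook may well have used numpy or another library instead.

```python
import math

def fit_loglog(frequencies):
    """Least-squares fit of log10(frequency) against log10(rank).

    `frequencies` must hold at least two positive counts, sorted in
    descending order and not all equal. Returns (slope, intercept, r_squared).
    """
    xs = [math.log10(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log10(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

# Synthetic counts following an exact 1/rank law (illustrative data only).
ideal = [1000 / rank for rank in range(1, 51)]
slope, intercept, r_squared = fit_loglog(ideal)
print(round(slope, 3), round(r_squared, 3))
```

For the synthetic 1/rank data the fitted slope is -1 up to floating-point error. Real lyric counts would be passed in the same way, sorted from most to least frequent.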

3. Bivariate & Multivariate Analysis

  • Correlation heatmaps revealed relationships between word counts, unique words, and average word length.
  • Scatterplots/pairplots showed structural patterns in songwriting.
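
The per-song features named above (word count, unique words, average word length) and their pairwise correlations can be sketched as follows. This is a dependency-free illustration with hypothetical helper names and made-up token lists; with pandas, `DataFrame.corr()` would produce the same Pearson matrix that a heatmap visualizes.

```python
import math

def song_features(tokens):
    """Word count, unique-word count, and mean word length for one song."""
    total_chars = sum(len(token) for token in tokens)
    return {
        "word_count": len(tokens),
        "unique_words": len(set(tokens)),
        "avg_word_length": total_chars / len(tokens) if tokens else 0.0,
    }

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative token lists standing in for tokenized songs.
songs = [
    "dance all night dance all night".split(),
    "we keep moving on and on".split(),
    "you you you know".split(),
]
features = [song_features(tokens) for tokens in songs]
names = ["word_count", "unique_words", "avg_word_length"]
for a in names:
    row = [pearson([f[a] for f in features], [f[b] for f in features]) for b in names]
    print(a, [round(value, 2) for value in row])
```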

📊 Key Findings

  1. Zipf’s Law Adherence

    • Dua Lipa’s lyrics follow a Zipf-like power law (R² = 0.96), though with a steeper slope (≈ -1.57) than the classic -1.
  2. Lexical Diversity

    • Rich vocabulary with a few high-frequency words and many rare words.
  3. Structural Insights

    • Complexity in songwriting highlighted through relationships between unique word counts, word length, etc.
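
Lexical diversity is often summarized with the type-token ratio (unique words divided by total words) and the share of words that occur only once. The sketch below shows both measures. It is an assumption that these are suitable summaries here, since the README does not say which diversity metric the project computed, and `lexical_diversity` is a hypothetical name.

```python
from collections import Counter

def lexical_diversity(tokens):
    """Return (type_token_ratio, hapax_share) for a non-empty token list.

    type_token_ratio: unique words / total words.
    hapax_share: fraction of unique words that occur exactly once.
    """
    counts = Counter(tokens)
    type_token_ratio = len(counts) / len(tokens)
    hapax_count = sum(1 for count in counts.values() if count == 1)
    return type_token_ratio, hapax_count / len(counts)

ttr, hapax_share = lexical_diversity("a a a b c".split())  # illustrative tokens
print(ttr, round(hapax_share, 2))
```

A few high-frequency words lower the type-token ratio, while many rare words raise the hapax share, which matches the pattern described in the findings.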

🎯 Learning Outcomes

Through this project, we gained experience in:

  • Text preprocessing & NLP basics.
  • Applying statistical linguistics (Zipf’s Law).
  • Data visualization & correlation analysis.
  • Collaborative research and presentation of results.

🔮 Future Work

  • Extend analysis to other artists for comparison.
  • Perform sentiment analysis on lyrics.
  • Create an interactive dashboard (Streamlit/Dash).

👥 Team Members

  • Aditya Singh – Data collection & cleaning, EDA framework.
  • Pratik Kumar Pan – Multivariate analysis, heatmaps, visualizations.
  • Shreyas Sarkar – Zipf’s Law analysis & interpretation.
  • Priyank Gaur – Documentation & presentation design.

📜 License

This project is for educational purposes only.
The lyrics dataset is used under fair use for linguistic analysis.
