SaqAsh/World-Development-Analytics-Engine

World Development Analytics Engine

A sophisticated C++ data analysis platform for processing and analyzing World Bank Development Indicators, featuring multiple advanced data structures and algorithms for efficient data storage, retrieval, and relationship analysis between countries.

Project Overview

This system processes large-scale economic, social, and environmental datasets to uncover relationships between countries based on various development indicators. It supports complex queries, statistical analysis, and graph-based relationship modeling, making it ideal for economic research and policy analysis.

🏗️ Technical Architecture

Core Data Structures

Binary Search Tree (BST)

  • Purpose: Efficient country data organization and range queries
  • Implementation: Custom BST with multi-country nodes supporting deletions, insertions, and traversals
  • Algorithms:
    • In-order traversal for sorted data retrieval
    • Recursive insertion/deletion with tree balancing
    • Min/Max finding for statistical analysis
  • Time Complexity: O(log n) for search, insert, and delete while the tree stays balanced; O(n) in the worst case for a degenerate tree
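
The multi-country node idea can be illustrated with a minimal sketch. It assumes each node is keyed by a single numeric value (for example, a series mean) and that countries with equal keys share a node. The names `BstNode`, `insert`, and `inOrder` are hypothetical and are not taken from `BST.h`; balancing and deletion are left out.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: one key shared by every country that has that value.
struct BstNode {
    double key;
    std::vector<std::string> countries;
    std::unique_ptr<BstNode> left, right;

    BstNode(double k, std::string c) : key(k) { countries.push_back(std::move(c)); }
};

// Recursive insertion (no rebalancing in this sketch).
// A country whose key equals an existing node's key joins that node.
void insert(std::unique_ptr<BstNode>& node, double key, const std::string& country) {
    if (!node) {
        node = std::make_unique<BstNode>(key, country);
    } else if (key < node->key) {
        insert(node->left, key, country);
    } else if (key > node->key) {
        insert(node->right, key, country);
    } else {
        node->countries.push_back(country);
    }
}

// In-order traversal appends countries in ascending key order.
void inOrder(const BstNode* node, std::vector<std::string>& out) {
    if (!node) return;
    inOrder(node->left.get(), out);
    out.insert(out.end(), node->countries.begin(), node->countries.end());
    inOrder(node->right.get(), out);
}
```

Using `std::unique_ptr` for child links is a choice made for this sketch so that the tree frees itself; the project describes custom destructors instead.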

Graph with Adjacency Lists

  • Purpose: Country relationship modeling based on statistical thresholds
  • Implementation: Unordered map-based adjacency lists with custom Edge objects
  • Algorithms:
    • Breadth-First Search (BFS) for pathfinding between countries
    • Dynamic edge creation based on statistical comparisons
    • Connectivity analysis for relationship networks
  • Time Complexity: O(V + E) for BFS traversal
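
As a rough illustration of BFS over map-based adjacency lists, the sketch below finds a path with the fewest edges between two country codes. It is simplified on purpose: neighbours are plain strings rather than the project's `Edge` objects, and `AdjacencyList` and `bfsPath` are hypothetical names.

```cpp
#include <algorithm>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed representation: country code -> codes of directly related countries.
using AdjacencyList = std::unordered_map<std::string, std::vector<std::string>>;

// Returns a path with the fewest edges from src to dst, or an empty vector
// when either endpoint is unknown or no path exists.
std::vector<std::string> bfsPath(const AdjacencyList& graph,
                                 const std::string& src,
                                 const std::string& dst) {
    if (!graph.count(src) || !graph.count(dst)) return {};

    // parent doubles as the visited set; the source is its own parent.
    std::unordered_map<std::string, std::string> parent{{src, src}};
    std::queue<std::string> frontier;
    frontier.push(src);

    while (!frontier.empty()) {
        const std::string current = frontier.front();
        frontier.pop();
        if (current == dst) break;

        const auto it = graph.find(current);
        if (it == graph.end()) continue;
        for (const std::string& next : it->second) {
            // emplace succeeds only for nodes not seen before.
            if (parent.emplace(next, current).second) frontier.push(next);
        }
    }

    if (!parent.count(dst)) return {};

    // Walk parent links back from dst, then reverse into src -> dst order.
    std::vector<std::string> path;
    for (std::string at = dst; at != src; at = parent.at(at)) path.push_back(at);
    path.push_back(src);
    std::reverse(path.begin(), path.end());
    return path;
}
```

Each node is enqueued at most once and each adjacency list is scanned at most once, which is where the O(V + E) bound comes from.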

Hash Table with Double Hashing

  • Purpose: Average-case O(1) country lookup and data access
  • Implementation: Custom double hashing with primary and secondary hash functions
  • Features:
    • Collision resolution using secondary hashing
    • Dynamic load factor management
    • Efficient country code to data mapping
  • Time Complexity: O(1) average case for lookup, insert, delete
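
The probing scheme can be sketched as follows. This is an illustration under stated assumptions, not the code in `Countries.h`: the class name `CountryTable`, the use of `std::hash`, and the particular secondary hash are all made up for the example, the capacity is assumed to be a prime number of at least 2, and deletion and resizing are omitted.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Open-addressing table using double hashing: slot(i) = (h1 + i * h2) mod m.
class CountryTable {
public:
    // Assumption: capacity is prime and >= 2, so every step size h2 in
    // [1, m-1] is coprime with m and the probe sequence visits every slot.
    explicit CountryTable(std::size_t capacity) : slots_(capacity) {}

    // Inserts or overwrites; returns false only when the table is full.
    bool insert(const std::string& code, double value) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe(code, i)];
            if (!s.used || s.code == code) {
                s = Slot{true, code, value};
                return true;
            }
        }
        return false;
    }

    std::optional<double> find(const std::string& code) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const Slot& s = slots_[probe(code, i)];
            if (!s.used) return std::nullopt;  // an empty slot ends the probe chain
            if (s.code == code) return s.value;
        }
        return std::nullopt;
    }

private:
    struct Slot {
        bool used = false;
        std::string code;
        double value = 0.0;
    };

    std::size_t probe(const std::string& code, std::size_t i) const {
        const std::size_t m = slots_.size();
        const std::size_t h = std::hash<std::string>{}(code);
        const std::size_t h1 = h % m;                  // primary hash: start slot
        const std::size_t h2 = 1 + (h / m) % (m - 1);  // secondary hash: non-zero step
        return (h1 + i * h2) % m;
    }

    std::vector<Slot> slots_;
};
```

Supporting delete in an open-addressing table usually needs a "deleted" marker so that later lookups keep probing past removed entries; that is left out here to keep the sketch short.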

Linked List

  • Purpose: Time series data management for each country
  • Implementation: Custom linked list with TimeSeries nodes
  • Features: Dynamic memory management with proper cleanup

Dynamic Arrays

  • Purpose: Efficient time series data storage with automatic resizing
  • Implementation: Custom resizing algorithms for optimal memory usage
  • Features: Automatic capacity management and data validation
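
One common way to implement automatic resizing is to double the capacity whenever the array fills up, which keeps appends amortized O(1). The sketch below assumes that policy; the README does not state which growth rule the project uses, and `SeriesBuffer` is a hypothetical name.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Growable array of doubles with a capacity-doubling policy (an assumption).
class SeriesBuffer {
public:
    void push(double value) {
        if (size_ == capacity_) grow();
        data_[size_++] = value;
    }

    // No bounds checking in this sketch: the caller must ensure i < size().
    double at(std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    void grow() {
        const std::size_t newCapacity = capacity_ == 0 ? 2 : capacity_ * 2;
        auto bigger = std::make_unique<double[]>(newCapacity);
        for (std::size_t i = 0; i < size_; ++i) bigger[i] = data_[i];
        data_ = std::move(bigger);  // the old block is released here
        capacity_ = newCapacity;
    }

    std::unique_ptr<double[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```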

📊 Advanced Algorithms

Statistical Analysis

  • Mean Calculation: Efficient computation for large datasets
  • Monotonic Sequence Detection: Pattern recognition in time series
  • Linear Regression: Best-fit line calculation (y = mx + b)
  • Threshold-based Comparisons: Statistical relationship analysis
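
The three numeric routines above can be sketched in a few lines. The function names are hypothetical, "monotonic" is assumed to mean entirely non-decreasing or entirely non-increasing, and the regression uses the ordinary least-squares formulas for y = mx + b; the project's own definitions (for example, how missing values are handled) may differ.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Arithmetic mean; returns 0 for an empty series (a choice made for this sketch).
double mean(const std::vector<double>& values) {
    if (values.empty()) return 0.0;
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / static_cast<double>(values.size());
}

// True when the series never decreases or never increases.
bool isMonotonic(const std::vector<double>& values) {
    bool nonDecreasing = true;
    bool nonIncreasing = true;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] < values[i - 1]) nonDecreasing = false;
        if (values[i] > values[i - 1]) nonIncreasing = false;
    }
    return nonDecreasing || nonIncreasing;
}

// Least-squares line y = m*x + b; returns {m, b}.
// Mismatched or empty inputs yield {0, 0}; identical x values yield a flat line.
std::pair<double, double> bestFit(const std::vector<double>& x,
                                  const std::vector<double>& y) {
    const std::size_t count = x.size();
    if (count == 0 || count != y.size()) return {0.0, 0.0};

    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double n = static_cast<double>(count);
    const double denom = n * sxx - sx * sx;
    if (denom == 0.0) return {0.0, sy / n};

    const double m = (n * sxy - sx * sy) / denom;
    const double b = (sy - m * sx) / n;
    return {m, b};
}
```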

Graph Algorithms

  • Breadth-First Search: Finds a path with the fewest edges between two countries
  • Connectivity Analysis: Determining if countries are related through indicators
  • Dynamic Graph Construction: Real-time edge creation based on data analysis
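
Threshold-based edge creation might look like the sketch below. The rule used here is an assumption, since the README does not spell it out: two countries are connected when the mean of the chosen series satisfies the same comparison against the threshold for both. The names `Graph`, `satisfies`, and `updateEdges` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Undirected graph: country code -> set of related country codes.
using Graph = std::unordered_map<std::string, std::unordered_set<std::string>>;

// Supported relations in this sketch: '>', '<', '='. Anything else matches nothing.
bool satisfies(double value, double threshold, char relation) {
    switch (relation) {
        case '>': return value > threshold;
        case '<': return value < threshold;
        case '=': return value == threshold;
        default:  return false;
    }
}

// Connects every pair of countries whose mean satisfies the relation.
// Returns the number of edges that did not exist before the call.
std::size_t updateEdges(Graph& graph,
                        const std::vector<std::pair<std::string, double>>& means,
                        double threshold, char relation) {
    std::vector<std::string> matching;
    for (const auto& [code, value] : means) {
        if (satisfies(value, threshold, relation)) matching.push_back(code);
    }

    std::size_t added = 0;
    for (std::size_t i = 0; i < matching.size(); ++i) {
        for (std::size_t j = i + 1; j < matching.size(); ++j) {
            const bool fresh = graph[matching[i]].insert(matching[j]).second;
            graph[matching[j]].insert(matching[i]);
            if (fresh) ++added;
        }
    }
    return added;
}
```

The nested loop over matching countries is quadratic in the number of countries, in line with the O(V²) figure given for relationship building.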

Search & Retrieval

  • Range Queries: Efficient searching within statistical ranges using BST properties
  • Multi-criteria Filtering: Complex queries across multiple indicators
  • Pattern Matching: Country identification based on statistical patterns
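
A range query can exploit the BST ordering by skipping subtrees that cannot contain keys in the requested interval. The sketch below assumes one country per node and an inclusive range; `RangeNode`, `leaf`, and `rangeQuery` are hypothetical names used only for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node for this sketch: one key, one country.
struct RangeNode {
    double key = 0.0;
    std::string country;
    std::unique_ptr<RangeNode> left, right;
};

// Convenience helper for building small trees by hand.
std::unique_ptr<RangeNode> leaf(double key, std::string country) {
    auto node = std::make_unique<RangeNode>();
    node->key = key;
    node->country = std::move(country);
    return node;
}

// Collects countries with lo <= key <= hi in ascending key order.
void rangeQuery(const RangeNode* node, double lo, double hi,
                std::vector<std::string>& out) {
    if (!node) return;
    // Keys in the left subtree are smaller, so visit it only if lo is below this key.
    if (lo < node->key) rangeQuery(node->left.get(), lo, hi, out);
    if (lo <= node->key && node->key <= hi) out.push_back(node->country);
    // Keys in the right subtree are larger, so visit it only if hi is above this key.
    if (node->key < hi) rangeQuery(node->right.get(), lo, hi, out);
}
```

On a balanced tree this visits O(log n) nodes to reach the interval plus the k nodes that are reported.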

🔧 Key Features

Data Management

  • LOAD: Import World Bank CSV datasets
  • INSERT: Add new countries and time series data
  • UPDATE: Modify existing data points
  • DELETE: Remove countries or specific indicators
  • LOOKUP: Average-case O(1) country data retrieval

Analysis Operations

  • LIST: Display country information and available indicators
  • RANGE: Statistical range analysis across indicators
  • LIMITS: Find minimum and maximum values
  • FIND: Search countries meeting specific criteria
  • BUILD: Construct BST for efficient querying

Graph Operations

  • INITIALIZE: Build country relationship graph
  • UPDATE_EDGES: Create connections based on statistical thresholds
  • ADJACENT: Find countries with direct relationships
  • PATH: Determine connectivity between any two countries
  • RELATIONSHIPS: Analyze specific country pair connections

🎯 Real-World Applications

  • Economic Policy Analysis: Compare countries with similar economic indicators
  • Development Pattern Recognition: Identify trends across nations
  • Regional Analysis: Find countries with shared characteristics
  • Investment Decision Support: Analyze market relationships
  • Academic Research: Large-scale comparative studies

📁 Project Structure

├── InputHandler.cpp        # Main program and command processing
├── Graph.h/cpp             # Graph implementation with BFS algorithms
├── BST.h/cpp               # Binary Search Tree with advanced operations
├── Countries.h/cpp         # Hash table implementation for country management
├── Country.h/cpp           # Individual country data management
├── TimeSeries.h/cpp        # Statistical analysis and data processing
├── LinkedList.h/cpp        # Time series data organization
├── g_node.h/cpp            # Graph node implementation
├── Edge.h                  # Graph edge representation with operator overloading
├── BST_Node.h/cpp          # BST node implementation
├── Node.h/cpp              # Linked list node implementation
└── Makefile                # Build configuration

⚡ Performance Characteristics

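| Operation | Data Structure | Time Complexity | Space Complexity |
| --- | --- | --- | --- |
| Country Lookup | Hash Table | O(1) avg | O(n) |
| Range Query | BST | O(log n + k) | O(n) |
| Path Finding | Graph (BFS) | O(V + E) | O(V) |
| Statistical Analysis | Dynamic Array | O(n) | O(n) |
| Relationship Building | Graph | O(V²) | O(V + E) |

Here k is the number of results a range query returns, and V and E are the numbers of vertices and edges in the graph. The range-query bound assumes a balanced tree.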

🛠️ Installation & Usage

Prerequisites

  • C++17 compatible compiler
  • Make utility

Compilation

make

Running the Program

./a.out

Sample Commands

LOAD lab2_multidata.csv
INITIALIZE
UPDATE_EDGES EG.ELC.ACCS.ZS 95.0 >
ADJACENT CAN
PATH USA CHN
RELATIONSHIPS USA CAN
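
For illustration, a command line such as `UPDATE_EDGES EG.ELC.ACCS.ZS 95.0 >` could be split and validated as sketched below. The reading of the three arguments as series code, threshold, and relation is inferred from the sample above and is an assumption; `tokenize`, `UpdateEdgesArgs`, and `parseUpdateEdges` are hypothetical names, not functions from `InputHandler.cpp`.

```cpp
#include <exception>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits one input line into whitespace-separated tokens.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string token; in >> token;) tokens.push_back(token);
    return tokens;
}

// Assumed shape of "UPDATE_EDGES <series code> <threshold> <relation>".
struct UpdateEdgesArgs {
    std::string seriesCode;
    double threshold = 0.0;
    std::string relation;
};

// Returns false when the line is not a well-formed UPDATE_EDGES command.
bool parseUpdateEdges(const std::string& line, UpdateEdgesArgs& out) {
    const std::vector<std::string> tokens = tokenize(line);
    if (tokens.size() != 4 || tokens[0] != "UPDATE_EDGES") return false;
    try {
        out.threshold = std::stod(tokens[2]);  // throws on non-numeric input
    } catch (const std::exception&) {
        return false;
    }
    out.seriesCode = tokens[1];
    out.relation = tokens[3];
    return true;
}
```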

📈 Dataset Compatibility

The system processes World Bank Development Indicators in CSV format, supporting various metrics including:

  • Economic indicators (GDP, inflation, trade)
  • Social indicators (education, healthcare, demographics)
  • Environmental indicators (energy, emissions, resources)
  • Infrastructure indicators (electricity access, internet penetration)

🎓 Technical Highlights

  • Multi-paradigm Design: Object-oriented architecture with efficient memory management
  • Advanced Data Structures: Custom implementations of BST, Graph, Hash Table, and Linked List
  • Algorithm Optimization: BFS pathfinding, statistical analysis, and dynamic memory management
  • Real-world Application: Processing large-scale economic datasets for policy analysis
  • Performance Engineering: Average-case O(1) lookups, O(log n + k) range queries, and O(V + E) graph traversal
  • System Architecture: Modular design with proper abstraction and encapsulation

🔍 Code Quality Features

  • Memory Management: Custom destructors and proper cleanup for all data structures
  • Error Handling: Robust input validation and edge case management
  • Modular Design: Clean separation of concerns across multiple classes
  • Documentation: Comprehensive inline documentation and design decisions
  • Testing: Extensive validation with real-world datasets

This project demonstrates advanced C++ programming skills, algorithm implementation, and real-world data analysis capabilities suitable for roles in software engineering, data science, and quantitative analysis.
