This project demonstrates a real-time stock price analysis pipeline using Apache Flink, AWS Kinesis, S3, and Lambda functions. The pipeline computes technical indicators and detects anomalies in stock prices.
- Objective: Compute the CMGR (Compound Monthly Growth Rate) and the 10-day EMA (Exponential Moving Average), and detect anomalous price drops (≥8%) in real-time stock data.
- Data: Historical AMD stock prices (2021–2022) stored in CSV format.
- Pipeline Flow:
  - Historical CSV data is loaded into a Kinesis stream via `parse_csv_lambda`.
  - Flink application processes the stream to compute indicators and detect anomalies.
  - Results are stored in S3 (`part_a`, `part_b`, `part_c`) and can be accessed via `output_lambda`.
  - `input_lambda` can simulate real-time stock price updates for testing the pipeline.
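
For reference, the two indicators named in the objective can be written in a few lines of plain Python. This is only a sketch of the textbook formulas, not the code in the notebooks: the function names are hypothetical, and the EMA smoothing factor of 2 / (span + 1) and seeding with the first price are assumptions, since the notebooks compute the EMA in Flink SQL over a window.

```python
def cmgr(start_price: float, end_price: float, months: int) -> float:
    """Compound monthly growth rate: (end / start) ** (1 / months) - 1."""
    if start_price <= 0 or months <= 0:
        raise ValueError("start_price and months must be positive")
    return (end_price / start_price) ** (1.0 / months) - 1.0


def ema(prices: list[float], span: int = 10) -> list[float]:
    """Recursive EMA over a price series.

    Assumptions (not taken from the notebooks): smoothing factor
    alpha = 2 / (span + 1), and the series is seeded with the first price.
    """
    if not prices:
        return []
    alpha = 2.0 / (span + 1)
    smoothed = [prices[0]]
    for price in prices[1:]:
        # Blend the new price with the previous smoothed value.
        smoothed.append(alpha * price + (1.0 - alpha) * smoothed[-1])
    return smoothed
```

With these definitions, a price that goes from 100 to 121 over two months has a CMGR of about 10% per month.
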
       CSV (AMDprices2021-2022.csv)
                     │
                     ▼
      parse CSV lambda / input lambda
                     │
                     ▼
Kinesis Data Stream (mp10-new-data-stream)
                     │
                     ▼
             Flink Managed Job
┌─────────────┬──────────────┬─────────────┐
│ Exercise A  │  Exercise B  │ Exercise C  │
│    CMGR     │     EMA      │   Anomaly   │
└─────────────┴──────────────┴─────────────┘
       │             │              │
       ▼             ▼              ▼
   S3/part_a     S3/part_b      S3/part_c
       │             │              │
       └─────────────┼──────────────┘
                     │
                     ▼
               output lambda
project/
├─ notebooks/
│  ├─ Exercise_A.ipynb    # CMGR calculation
│  ├─ Exercise_B.ipynb    # 10-day EMA calculation
│  ├─ Exercise_C.ipynb    # Anomaly detection using MATCH_RECOGNIZE
│  └─ Exercise_D.ipynb    # Integrated Flink pipeline (A+B+C)
├─ lambda/
│  ├─ parse_csv_lambda.py # Load historical CSV into Kinesis
│  ├─ input_lambda.py     # Simulate real-time stock data into Kinesis
│  └─ output_lambda.py    # Read results from S3
└─ example_data/
   └─ AMDprices2021-2022_sample.csv # Sample data
- Implemented a CMGR UDF in PyFlink to compute compound monthly growth rates.
- Developed the 10-day EMA calculation in Flink SQL with a sliding window.
- Designed anomaly detection using `MATCH_RECOGNIZE` for sudden price drops.
- Integrated all analyses into a single Flink pipeline (`Exercise_D.ipynb`) writing results to S3.
- Configured Lambda functions to simulate real-time stock data and retrieve analysis results.
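
To make the anomaly rule concrete, here is a plain-Python analogue of what a two-row pattern could check. It is a sketch under assumptions: the actual `MATCH_RECOGNIZE` pattern in the notebook is not reproduced here, the function name is hypothetical, and measuring the drop against the previous close is one possible reading of "drop of at least 8%".

```python
def detect_drops(closes: list[float], threshold: float = 0.08) -> list[tuple[int, float]]:
    """Return (index, fractional_drop) for each close that is at least
    `threshold` below the close immediately before it.

    Assumption: the drop is measured against the previous close.
    """
    events = []
    for i in range(1, len(closes)):
        previous, current = closes[i - 1], closes[i]
        if previous <= 0:
            continue  # skip rows where a relative drop is undefined
        drop = (previous - current) / previous
        if drop >= threshold:
            events.append((i, drop))
    return events
```

For the series 100, 95, 87, 88 only the move from 95 to 87 (about 8.4%) is flagged; the 5% move before it is not.
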
- Upload the sample CSV to S3.
- Deploy the Lambda functions (`parse_csv_lambda`, `input_lambda`, `output_lambda`) with appropriate IAM permissions.
- Submit the Flink pipeline (`Exercise_D.ipynb`) to Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics).
- Use `input_lambda` to push real-time data for testing.
- Access results via `output_lambda` or the S3 JSON files.
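
Reading the results back could look roughly like the following. This is a sketch, not the contents of `output_lambda.py`: it assumes the result files are newline-delimited JSON, the function name is hypothetical, and the S3 client is passed in (in a Lambda it would typically be `boto3.client("s3")`). `list_objects_v2` returns at most 1000 keys per call, so pagination is left out for brevity.

```python
import json


def read_results(s3_client, bucket: str, prefix: str) -> list[dict]:
    """Collect JSON records (one object per line) from every S3 object
    under `prefix`, e.g. "part_a".

    Assumptions: newline-delimited JSON files, and fewer than 1000
    objects under the prefix (no pagination).
    """
    records = []
    listing = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    for obj in listing.get("Contents", []):
        body = s3_client.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        for line in body.decode("utf-8").splitlines():
            if line.strip():  # ignore blank lines between records
                records.append(json.loads(line))
    return records
```

Passing the client in as a parameter keeps the function easy to exercise locally with a stub before deploying it.
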
- Historical data is used to initialize the pipeline and compute baseline indicators.
- `input_lambda` simulates real-time data, allowing Flink to update indicators and detect anomalies dynamically.
- This project demonstrates stream processing, UDF creation, window functions, and anomaly detection with Apache Flink on AWS.
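
As an illustration of how price rows might be pushed into the stream, the sketch below turns CSV text into Kinesis records and sends them in batches. It is not the code in `parse_csv_lambda.py` or `input_lambda.py`: the function names and the `ticker` column are hypothetical, and the client is passed in (in a Lambda it would typically be `boto3.client("kinesis")`). Retrying records reported in the response's `FailedRecordCount` is omitted.

```python
import csv
import io
import json

MAX_BATCH = 500  # PutRecords accepts at most 500 records per request


def rows_to_records(csv_text: str, partition_key_field: str = "ticker") -> list[dict]:
    """Convert CSV text with a header row into Kinesis record dicts.

    Assumption: `partition_key_field` names a column in the CSV; when it is
    missing or empty, the key falls back to "AMD".
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "Data": json.dumps(row).encode("utf-8"),
            "PartitionKey": row.get(partition_key_field) or "AMD",
        })
    return records


def send_records(kinesis_client, stream_name: str, records: list[dict]) -> int:
    """Submit records in batches; returns the number submitted (not confirmed)."""
    submitted = 0
    for start in range(0, len(records), MAX_BATCH):
        batch = records[start:start + MAX_BATCH]
        kinesis_client.put_records(StreamName=stream_name, Records=batch)
        submitted += len(batch)
    return submitted
```

A call such as `send_records(client, "mp10-new-data-stream", rows_to_records(csv_text))` would then feed the stream shown in the diagram.
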