This repository contains infrastructure code and application code for a comprehensive factor mining platform on AWS.
The platform consists of several components:
- Data Storage: ClickHouse database and S3 buckets
- Data Collection: Lambda functions to collect market data, SEC filings, financial reports, and web search results
- Processing: AWS Step Functions and AWS Batch for factor mining
- Visualization: Streamlit application for visualizing results
- terraform/: Terraform code for provisioning AWS resources
  - 0-prepare/: Initial setup and prerequisites
  - 1-networking/: VPC, subnets, and networking components
  - 2-clickhouse/: ClickHouse database infrastructure
  - 3-jump-host/: Jump host for secure access to resources
  - 4-market-data/: Market data collection resources
  - 5-web-search/: Web search data collection resources
  - 6-sec-filing/: SEC filings data collection resources
  - 7-financial-report/: Financial report processing resources
  - 8-factor-mining/: Factor mining infrastructure
  - 9-visualization/: Visualization application infrastructure
  - modules/: Reusable Terraform modules
  - deployAll.sh: Script to deploy all modules
  - destroyAll.sh: Script to destroy all resources
- src/: Application source code
  - market-data/: Market data collection code
  - sec-filing/: SEC filings data collection code
  - web-search/: Web search data collection code
  - factor-modeling-model/: Factor modeling algorithms and models
  - step-function/: AWS Step Functions workflow definitions
  - visualization/: Streamlit visualization application
  - financial-report-processor/: Financial report processing code
The platform is designed to be deployed in modules, allowing customers to choose which components to deploy.
- AWS account
- Terraform 1.0+
- Python 3.12
- AWS CLI configured
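The command-line prerequisites above can be checked before deploying. The following is a minimal sketch, not part of this repository: it assumes the tools are installed under their usual executable names `terraform` and `aws`, treats Python 3.12 as a minimum version, and uses the hypothetical helper names `missing_tools` and `python_is_supported`. It does not verify that the AWS CLI is configured or that an AWS account is reachable.

```python
import shutil
import sys

# Assumed executable names for the Terraform and AWS CLI prerequisites.
REQUIRED_TOOLS = ["terraform", "aws"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_supported(minimum=(3, 12)):
    """Return True if the running interpreter is at least `minimum`.

    Treating 3.12 as a lower bound is an assumption; the README only
    lists "Python 3.12".
    """
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    if not python_is_supported():
        print("Python 3.12 is a prerequisite; found", sys.version.split()[0])
    if not missing and python_is_supported():
        print("Command-line prerequisites look OK.")
```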
The modules should be deployed in the following order to ensure dependencies are satisfied:
- Prepare Lambda layers
- Networking
- ClickHouse
- Jump host (requires networking and ClickHouse to be deployed first)
- Data collection modules: market data, web search, SEC filing, and financial report (can be deployed independently)
- Factor mining
- Visualization
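Because the platform can be deployed selectively, it can help to compute the order for just the modules you want. The sketch below is illustrative only: it relies on the numeric prefixes of the `terraform/` directories matching the order above, and it encodes only the one dependency this README states explicitly (the jump host). Other modules may have dependencies that are not captured here. The names `deployment_order` and `deploy_commands` are hypothetical.

```python
# Module directories under terraform/, as listed in the repository layout.
MODULES = [
    "0-prepare", "1-networking", "2-clickhouse", "3-jump-host",
    "4-market-data", "5-web-search", "6-sec-filing", "7-financial-report",
    "8-factor-mining", "9-visualization",
]

# Only the dependency stated explicitly in this README is encoded here.
REQUIRES = {"3-jump-host": ["1-networking", "2-clickhouse"]}


def deployment_order(selected):
    """Return the selected modules plus stated prerequisites, in deploy order."""
    unknown = set(selected) - set(MODULES)
    if unknown:
        raise ValueError(f"Unknown modules: {sorted(unknown)}")
    wanted = set(selected)
    for module in selected:
        wanted.update(REQUIRES.get(module, []))
    # The numeric directory prefix mirrors the documented order.
    return sorted(wanted, key=lambda name: int(name.split("-", 1)[0]))


def deploy_commands(selected):
    """Build the shell commands for a dry run; nothing is executed."""
    return [f"(cd terraform/{m} && ./deploy.sh)" for m in deployment_order(selected)]


if __name__ == "__main__":
    for command in deploy_commands(["9-visualization", "3-jump-host"]):
        print(command)
```

The commands are only printed, so the plan can be reviewed before anything is run from the repository root.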
Each command block below assumes you start from the repository root.

- Prepare the environment with Lambda layers:

      cd terraform/0-prepare
      ./deploy.sh

- Deploy base networking:

      cd terraform/1-networking
      ./deploy.sh

- Deploy ClickHouse:

      cd terraform/2-clickhouse
      ./deploy.sh

- Deploy the jump host for secure access to ClickHouse:

      cd terraform/3-jump-host
      ./deploy.sh

- Deploy the data collection modules that supply the data used to calculate factors:

      # Market data
      cd terraform/4-market-data
      ./deploy.sh

      # Web search
      cd terraform/5-web-search
      ./deploy.sh

      # SEC filing
      cd terraform/6-sec-filing
      ./deploy.sh

      # Financial report processor
      cd terraform/7-financial-report
      ./deploy.sh

- Deploy the factor mining process:

      cd terraform/8-factor-mining
      ./deploy.sh

- Deploy visualization:

      cd terraform/9-visualization
      ./deploy.sh
The visualization component provides critical insights into factor performance and statistical significance:
The T-stat measures the statistical significance of a factor, which is crucial in factor modeling for determining which factors reliably predict returns. The T-stat chart displays the distribution of T-statistics for each factor type after running the same factor over different data ranges on the DJIA 30.
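As an illustration of what a T-stat expresses, the sketch below computes the t-statistic of the slope in a single-factor ordinary least squares regression of returns on factor values. This is a common textbook definition and an assumption here; this README does not specify how the platform itself computes the statistic. The function name `slope_t_stat` and the sample numbers are hypothetical.

```python
import math


def slope_t_stat(factor, returns):
    """t-statistic of the OLS slope when regressing `returns` on `factor`."""
    n = len(factor)
    if n != len(returns) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(factor) / n
    mean_y = sum(returns) / n
    sxx = sum((x - mean_x) ** 2 for x in factor)
    if sxx == 0:
        raise ValueError("factor values are constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(factor, returns))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(factor, returns))
    if sse == 0:
        # A perfect fit has zero standard error; report a signed infinity.
        return math.copysign(math.inf, slope)
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return slope / standard_error


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    factor_values = [1.0, 2.0, 3.0, 4.0, 5.0]
    stock_returns = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(slope_t_stat(factor_values, stock_returns))
```

A larger absolute value means the estimated slope is large relative to its standard error; the threshold for calling a factor significant depends on the sample size and the chosen confidence level.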
R-squared measures the proportion of variance in stock returns explained by the factors. The R-squared chart shows R-squared distributions for each factor type after running the same factor over different data ranges on the DJIA 30, which helps identify the factor types that explain more of the variance in stock returns.
The bubble chart adds Beta, which measures a factor's sensitivity to market movements. R-squared is displayed as the bubble size, representing the factor's explanatory power.
To remove a single module, change to the directory you deployed it from and run:
terraform destroy
Alternatively, use the provided scripts to deploy or destroy all modules at once:
./terraform/deployAll.sh
or
./terraform/destroyAll.sh
- Use remote state for production deployments
- Use variables for customization
- Follow the principle of least privilege for IAM roles
- Tag all resources appropriately
- Use consistent naming conventions



