This repository contains a suite of Python scripts that automate the consolidation of academic publication data from four databases: ACM, IEEE, Springer, and Web of Science. The scripts merge, deduplicate, and clean the exported records to create unified research repositories for Computer Science and Marketing.
Filename: combine_cs_sec_files.py
This script targets Computer Science Security research. It reads CSV files prefixed with [cs_sec], standardizes the column headers, and tags each entry with its source database.
Example run:
user@terminal:~/project$ python combine_cs_sec_files.py
Starting the combination script for CS Security files...
Successfully processed: [cs_sec] acm.csv
Successfully processed: [cs_sec] IEEE.csv
Successfully processed: [cs_sec] springer.csv
Successfully processed: [cs_sec] WoS.csv
-------------------------------------------
Success! All CS Security files have been combined.
Total rows in the new file: 1,420
Output saved as: Combined_CS_Security_Publications.csv
-------------------------------------------
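The steps described for combine_cs_sec_files.py (read the prefixed CSV files, standardize the headers, tag each row with its source) could look roughly like the sketch below. This is not the script itself: the function name `combine_prefixed_csvs`, the lowercase-and-strip header convention, and the `source` column name are all assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def combine_prefixed_csvs(folder, prefix, encoding="utf-8"):
    """Read every CSV in `folder` whose name starts with `prefix` and stack them.

    Hypothetical helper: the real scripts may organize this differently.
    """
    frames = []
    # Match on the file name with startswith() rather than a glob pattern,
    # because "[" and "]" have a special meaning in glob syntax.
    for path in sorted(Path(folder).glob("*.csv")):
        if not path.name.startswith(prefix):
            continue
        df = pd.read_csv(path, encoding=encoding)
        # Standardize headers (assumed convention: trimmed and lowercase).
        df.columns = [str(col).strip().lower() for col in df.columns]
        # Tag each row with its source database, taken from the file name,
        # e.g. "[cs_sec] acm.csv" -> "acm".
        df["source"] = path.stem[len(prefix):].strip()
        frames.append(df)
        print(f"Successfully processed: {path.name}")
    if not frames:
        raise FileNotFoundError(f"No CSV files starting with {prefix!r} in {folder}")
    return pd.concat(frames, ignore_index=True)
```

Under these assumptions, calling `combine_prefixed_csvs(".", "[cs_sec]")` and writing the result with `to_csv("Combined_CS_Security_Publications.csv", index=False)` would produce a single combined file like the one named in the run above.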
Filename: combine_cs_files.py
This script is used for general Computer Science datasets. It handles the bulk merging of files labeled with the [cs] prefix and ensures that "Item Title" and "Item DOI" are converted into a unified format.
Example run:
user@terminal:~/project$ python combine_cs_files.py
Starting the combination script for CS files.....
Successfully processed: [cs] acm.csv
Successfully processed: [cs] IEEE.csv
Successfully processed: [cs] springer.csv
Successfully processed: [cs] WoS.csv
-------------------------------------------
Success! All CS files have been combined.
Total rows in the new file: 3,850
Output saved as: Combined_CS_Publications.csv
-------------------------------------------
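One way to bring headers such as "Item Title" and "Item DOI" into a unified format is a simple rename before concatenation. The sketch below is illustrative only: the target names "Title" and "DOI", the `COLUMN_ALIASES` mapping, and the `unify_headers` helper are assumptions, not taken from combine_cs_files.py.

```python
import pandas as pd

# Hypothetical mapping from source-specific headers to unified names.
# "Item Title" and "Item DOI" come from the description above; the target
# names ("Title", "DOI") are assumptions made for this sketch.
COLUMN_ALIASES = {"Item Title": "Title", "Item DOI": "DOI"}


def unify_headers(df):
    """Return a copy of `df` with known header variants renamed."""
    return df.rename(columns=COLUMN_ALIASES)


# Two small made-up frames: one uses the "Item ..." headers, one does not.
with_item_headers = pd.DataFrame({"Item Title": ["Paper A"], "Item DOI": ["10.1000/a"]})
with_plain_headers = pd.DataFrame({"Title": ["Paper B"], "DOI": ["10.1000/b"]})

combined = pd.concat(
    [unify_headers(with_item_headers), unify_headers(with_plain_headers)],
    ignore_index=True,
)
print(list(combined.columns))  # ['Title', 'DOI']
```

Renaming before `pd.concat` matters because concatenation aligns on column names. Without it, the two header variants would end up as separate, half-empty columns.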
Filename: combine_mk_files.py
This script is tailored to the Marketing research datasets ([mk]). It reads the files with latin1 encoding so that special characters often found in international marketing journals (such as accented letters or symbols) are read in without raising a decoding error and crashing the script.
Example run:
user@terminal:~/project$ python combine_mk_files.py
Starting the combination script for Marketing files... 🚀
Successfully processed: [mk] acm.csv
Successfully processed: [mk] IEEE.csv
Successfully processed: [mk] springer.csv
Successfully processed: [mk] WoS.csv
-------------------------------------------
Success! All Marketing files have been combined.
Total rows in the new file: 945
Output saved as: Combined_Marketing_Publications.csv
-------------------------------------------
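The effect of the latin1 setting can be reproduced with a small in-memory example. This is a sketch with made-up sample data, assuming the source file really is Latin-1 encoded. It is not an excerpt from combine_mk_files.py.

```python
import io

import pandas as pd

# Made-up sample encoded as Latin-1: "é" becomes the single byte 0xE9,
# which is not valid UTF-8 on its own.
raw = "author,title\nJosé,Café marketing\n".encode("latin1")

# Reading the same bytes as UTF-8 is expected to fail on the 0xE9 byte.
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 read failed:", exc.reason)

# Reading with latin1 succeeds and keeps the accented characters.
df = pd.read_csv(io.BytesIO(raw), encoding="latin1")
print(df.loc[0, "author"])
```

Note that latin1 maps every possible byte to some character, so the read never fails. If a file is actually UTF-8, however, its accented characters will come out as the wrong symbols rather than raising an error, so it is worth spot-checking a few rows of the combined output.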
Requirements: the scripts depend on pandas, which can be installed with:
pip install pandas