Details about the problem can be found at:
https://github.com/InsightDataScience/cc-example
The Python script (insight.py) has comments detailing the asymptotic bounds of each function. Specifically, it uses blist's sortedlist, which has O(log n) insert and access times, to maintain a sorted list.
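As a rough illustration of why a sorted container helps, the sketch below keeps values in order as they arrive and reads the median by index. It is not taken from insight.py: the running-median use case and the function name `running_medians` are assumptions made for the example, and the standard library's `bisect` stands in for blist's sortedlist so the snippet runs without extra packages. Note that `bisect.insort` on a plain list finds the position in O(log n) but shifts elements in O(n), which is the cost a structure like sortedlist is meant to avoid.

```python
import bisect


def running_medians(values):
    """Yield the median of all values seen so far, after each new value.

    Hypothetical helper for illustration only; not part of insight.py.
    """
    ordered = []  # kept sorted at all times
    for v in values:
        # Binary search for the slot (O(log n)), then insert.
        # On a plain list the insert itself shifts elements, so it is O(n).
        bisect.insort(ordered, v)
        n = len(ordered)
        mid = n // 2
        if n % 2:
            # Odd count: the middle element is the median.
            yield float(ordered[mid])
        else:
            # Even count: average the two middle elements.
            yield (ordered[mid - 1] + ordered[mid]) / 2.0


medians = list(running_medians([5, 2, 8, 1]))
print(medians)
```

Swapping the plain list for a container with O(log n) insertion would keep the same indexing logic while removing the linear shifting cost.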
Machine: 16 GB memory, 8 cores, SSD
Total input files: 4929
Total combined lines in the files: 2325714
Total time taken: 3m53.597s
Total size: 134 MB
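The figures above can be turned into approximate throughput numbers. The short calculation below is only a sketch: the variable names are made up for the example, and it assumes the reported size is in megabytes and that the reported time covers all input files.

```python
# Figures copied from the run reported above.
elapsed_s = 3 * 60 + 53.597   # 3m53.597s in seconds
total_lines = 2325714
total_files = 4929
total_size_mb = 134           # assumption: the reported size is in megabytes

lines_per_s = total_lines / elapsed_s
files_per_s = total_files / elapsed_s
mb_per_s = total_size_mb / elapsed_s

print(f"{lines_per_s:.0f} lines/s, {files_per_s:.1f} files/s, {mb_per_s:.2f} MB/s")
```

This works out to roughly ten thousand lines per second on the machine described.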