DEPRECATED: please see for updated workflow.
- Split files alphabetically 5 ways: 0-Dm, Dn-Dz, E-K, L-Nd, Ne-Z.
- Combine each of five files with corresponding file from previous load.
- Add additional columns (same as serials).
- Save files as UTF-8 CSV.
- Load each file into OpenRefine.
a. Uncheck "Parse cell text into...", "Store blank rows"
b. Set character encoding to UTF-8.
- Remove OSTI and ebrary titles by faceting on URL column.
- Create new column by combining values from Title, Resource, and URL columns. Name new column "Combined".
- Run duplicate facet on Combined.
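For reference, the alphabetical split and the duplicate check on Combined can be sketched outside OpenRefine. This is only an illustration, not the workflow itself: the function names bucket_for and flag_duplicates are hypothetical, the split is assumed to compare the first two characters of the title case-insensitively (leading articles and punctuation are not treated specially), and each row is assumed to be a dict with Title, Resource, and URL keys.

```python
from collections import Counter


def bucket_for(title):
    """Return the label of the alphabetical range a title falls into.

    Assumption: ranges are decided by the first two characters, ignoring case.
    """
    key = title.strip().lower()[:2]
    if key < "dn":
        return "0-Dm"
    if key < "e":
        return "Dn-Dz"
    if key < "l":
        return "E-K"
    if key < "ne":
        return "L-Nd"
    return "Ne-Z"


def flag_duplicates(rows):
    """Add 'Combined' and 'Duplicate' keys to each row (one dict per CSV row)."""
    for row in rows:
        # Same idea as the Combined column: Title, Resource, and URL joined together.
        row["Combined"] = row["Title"] + row["Resource"] + row["URL"]
    counts = Counter(row["Combined"] for row in rows)
    for row in rows:
        # A row is a duplicate when its Combined value occurs more than once.
        row["Duplicate"] = counts[row["Combined"]] > 1
    return rows
```

A row flagged as Duplicate here corresponds to a row that shows "true" in the duplicate facet.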
For Add/Change:
- In the duplicate facet, select "true" and delete all matching rows.
- Facet by Sheet Date and delete all "Earlier" rows.
- Remove the Row Number, Sheet Date, and Combined columns.
- Export spreadsheet as CSV.
For Delete:
- In the duplicate facet, select "true" and delete all matching rows.
- Facet by Sheet Date and delete all "Later" rows.
- Remove the Row Number, Sheet Date, and Combined columns.
- Export spreadsheet as CSV.
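The Add/Change and Delete filters above can be read as one comparison between the earlier and later loads. The sketch below is an illustration under assumptions, not the original procedure: split_changes and HELPER_COLUMNS are hypothetical names, rows are dicts with the column names used in these notes, and Sheet Date is assumed to hold only the values "Earlier" and "Later".

```python
from collections import Counter

# Working columns that are dropped before export.
HELPER_COLUMNS = ("Row Number", "Sheet Date", "Combined")


def split_changes(rows):
    """Return (add_change, delete) lists from the merged earlier/later rows."""

    def combined(row):
        # Mirrors the Combined column: Title, Resource, and URL joined together.
        return row["Title"] + row["Resource"] + row["URL"]

    def strip_helpers(row):
        return {k: v for k, v in row.items() if k not in HELPER_COLUMNS}

    counts = Counter(combined(r) for r in rows)
    # Rows whose Combined value occurs more than once are dropped,
    # like deleting the "true" rows in the duplicate facet.
    unique = [r for r in rows if counts[combined(r)] == 1]

    # Add/Change: delete "Earlier" rows, keep the rest.
    add_change = [strip_helpers(r) for r in unique if r["Sheet Date"] != "Earlier"]
    # Delete: delete "Later" rows, keep the rest.
    delete = [strip_helpers(r) for r in unique if r["Sheet Date"] != "Later"]
    return add_change, delete
```

Under this reading, a title that appears identically in both loads is dropped from both outputs, a title found only in the later load lands in Add/Change, and a title found only in the earlier load lands in Delete.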
After running all five spreadsheets through OpenRefine, combine the results into a single spreadsheet for all Add/Change rows and a single spreadsheet for all Delete rows.
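The final combine step amounts to concatenating CSV exports that share a header. A minimal sketch, assuming the exports are UTF-8 CSV files with identical column order; combine_csv is a hypothetical helper and the file names in the usage note are placeholders, not names from the actual workflow.

```python
import csv


def combine_csv(in_paths, out_path):
    """Concatenate CSV files with a shared header into one UTF-8 CSV.

    Returns the number of data rows written.
    """
    count = 0
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for path in in_paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                if reader.fieldnames is None:
                    continue  # skip completely empty files
                if writer is None:
                    # The first non-empty file defines the header.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                elif list(reader.fieldnames) != list(writer.fieldnames):
                    raise ValueError(f"Header mismatch in {path}")
                for row in reader:
                    writer.writerow(row)
                    count += 1
    return count
```

For example, calling combine_csv(["add_change_1.csv", "add_change_2.csv"], "add_change_all.csv") would be run once with the Add/Change exports and once with the Delete exports.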