
UCLA-BD2K/Aztec-Duplicate

Download a new set of data from Solr.
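The README does not say how the download is done. As a minimal sketch, assuming the data sits in a single Solr core reachable over HTTP, the standard `/select` handler can be paged through with the `start` and `rows` parameters. The base URL, core name, and `download_all` helper are hypothetical placeholders, not this repository's actual script.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base, core, start, rows, query="*:*"):
    """Build a standard Solr /select URL returning one page of JSON results."""
    params = urllib.parse.urlencode(
        {"q": query, "wt": "json", "start": start, "rows": rows}
    )
    return f"{base}/{core}/select?{params}"


def fetch_json(url):
    """Fetch a URL over HTTP and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def download_all(base, core, rows=500, fetch=fetch_json):
    """Page through every document in a core using start/rows pagination.

    `base` and `core` are hypothetical, e.g. "http://localhost:8983/solr" and
    the name of whatever core holds the tool metadata.
    """
    docs = []
    start = 0
    while True:
        page = fetch(build_select_url(base, core, start, rows))
        batch = page["response"]["docs"]
        docs.extend(batch)
        start += len(batch)
        # Stop on an empty page or once every matching document has been read.
        if not batch or start >= page["response"]["numFound"]:
            break
    return docs
```

The result could then be saved with `json.dump(docs, f)` for the later steps. For very large cores, Solr's `cursorMark` parameter is generally preferred over deep `start` offsets.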

Run Main.java to create the mappings.
- Change the id on line 355 in BaseCases.java to the updated id.
- Final results are written to all_name_combined_tools.json and all_name_mapping.csv.
- The Train folder should be filled with data.
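The mapping itself is produced by the Java code. Purely as an illustration of the idea, the Python sketch below groups records by name and renders the two kinds of output. The `id` and `name` fields, the normalization rule, and the CSV/JSON layouts are all assumptions; the real contents of all_name_mapping.csv and all_name_combined_tools.json may differ.

```python
import csv
import io
import json
from collections import defaultdict


def normalize_name(name):
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def group_by_name(records):
    """Collect record ids under their normalized name (assumes 'id'/'name' fields)."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_name(rec["name"])].append(rec["id"])
    return dict(groups)


def mapping_csv(groups):
    """Render one (id, combined_name) row per original record as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "combined_name"])
    for name in sorted(groups):
        for rec_id in groups[name]:
            writer.writerow([rec_id, name])
    return buf.getvalue()


def combined_json(groups):
    """Render one entry per combined name, listing the ids merged into it."""
    combined = [{"name": name, "ids": ids} for name, ids in sorted(groups.items())]
    return json.dumps(combined, indent=2)
```

To write the CSV to disk, open the file with `newline=""` so the `csv` module controls line endings.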

Run preprocess.py to prepare the data for doc2vec, then run doc2vec_train.py to train the doc2vec model and obtain the document vectors.
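As a rough sketch of what these two steps involve (not the contents of preprocess.py or doc2vec_train.py), each record's text can be tokenized and paired with a tag, then fed to gensim's `Doc2Vec`. The `description`/`id` field names, the tokenizer, and the hyperparameters are assumptions. gensim 4.x is imported inside the training function, so the preprocessing part runs without it.

```python
import re

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple scheme)."""
    return TOKEN_RE.findall(text.lower())


def to_tagged(records, text_field="description", tag_field="id"):
    """Turn records into (tokens, [tag]) pairs, the shape of gensim's TaggedDocument.

    The field names are hypothetical; records with no usable text are skipped.
    """
    docs = []
    for rec in records:
        tokens = tokenize(rec.get(text_field, ""))
        if tokens:
            docs.append((tokens, [str(rec[tag_field])]))
    return docs


def train_doc2vec(tagged, vector_size=100, epochs=20):
    """Train a gensim 4.x Doc2Vec model and return one vector per tag."""
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=w, tags=t) for w, t in tagged]
    model = Doc2Vec(corpus, vector_size=vector_size, min_count=1, epochs=epochs)
    # model.dv maps each tag to its learned document vector.
    return {tag: model.dv[tag] for _, tags in tagged for tag in tags}
```

Keeping tokenization separate from training makes it easy to inspect or cache the preprocessed corpus before the slower training step.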

Notes: If the metadata changes, the combination function should be updated accordingly. Records are currently matched on names; to use different matching criteria, update the code block at line 44 in Main.java.
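One way to picture why the combination function and the matching criteria are tied to the metadata: the Python sketch below (an illustration, not the logic in Main.java) merges records that share a key, with the key passed in as a function. Name matching is the default, mirroring the note above. The merge policy (union list fields, keep the first scalar value) and the field names are assumptions, and each field is assumed to have the same type across records.

```python
def name_key(rec):
    """Default criterion: match on the trimmed, lowercased name."""
    return rec["name"].strip().lower()


def combine(records, key=name_key):
    """Merge records sharing key(rec).

    List fields are unioned in order; scalar fields keep the first value seen.
    Assumes a given field is consistently a list or a scalar across records.
    """
    merged = {}
    for rec in records:
        k = key(rec)
        if k not in merged:
            # Copy lists so the input records are never mutated.
            merged[k] = {f: (list(v) if isinstance(v, list) else v) for f, v in rec.items()}
            continue
        target = merged[k]
        for field, value in rec.items():
            if isinstance(value, list):
                existing = target.setdefault(field, [])
                for item in value:
                    if item not in existing:
                        existing.append(item)
            else:
                target.setdefault(field, value)
    return list(merged.values())
```

Switching criteria then only means passing a different `key`, for example `combine(records, key=lambda r: (name_key(r), r.get("homepage", "")))`, where `homepage` is a hypothetical field.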
