erichpeterson/pgfim
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
IMPORTANT: Two pieces of software must be installed to compile PGFIM: 1. The high-precision static library ARPREC, which one can download and install from the following website: http://crd-legacy.lbl.gov/~dhbailey/mpdist/ 2. The mathematics library Boost, which one can download and install from the following website: http://www.boost.org/ COMPILING: To compile on a Linux-based machine, modify the Makefile file, to your environment and locations of the two software libraries installed above. One can find where the locations are set in the Makefile, next to the CXX_CFLAGS and CXX_LDFLAGS variables. Once your environment is set correctly, simply type 'make' at the command-line in the directory of the source code, and the executable PGFIM should be created. RUNNING: ./PGFIM -tau <TAU> -inputDB <INPUT_DB_FILENAME> -inputXML <INPUT_XML_FILENAME> [optional_args] Sample Execution: ./PGFIM -tau 0.9 -minsup 3000 -inputDB input.data -inputXML taxonomy.xml Command Line Options: -minsup minimum support threshold: [0.0-Size of DB] default: 1 -tau Tau frequent probability threshold: (0.0-1.0] (Required) -inputDB Input DB file name (Required) -inputXML Input XML taxonomy (Required) -calc Frequentness calculation method: {dynamic} default: dynamic -output Output file name -exec Execution stats file name -print Print found itemsets: {0 | 1} (0 = false, 1 = true) default 0 DATABASE FILE & TAXONOMY FORMAT: Attributes in the database file are to be space-delimited, and transactions/instances should be new-line delimited. Currently, only the natural numbers can be used as attribute names / items. Following each attribute name is a colon and then the probability of that attribute occurring. IMPORTANT: The last item and probability of each transaction must be followed by a space. Example database with 3 transactions: 1:0.9 2:0.23 3:0.6 2:0.2 4:0.8 1:0.4 2:0.54 An XML file is used to represent the taxonomy to be used. See a sample taxonomy in the datasets directory, as well as, example databases. CITATION: Erich A. Peterson and Peiyi Tang. Mining Probabilistic Generalized Frequent Itemsets in Uncertain Databases. In Proceedings of the 51tst ACM Southeast Conference (ACMSE), Savannah, GA, USA, April 2013.