The Analysis Tool is a simple command-line tool written in Java that computes several ranking evaluation metrics from a given Excel file. It was originally developed for analyzing result files exported from LimeSurvey and was then adapted to a specific structure used for analyzing and evaluating user competences. The input Excel files must follow this structure, which is described under [Excel structure](../Analysis#Excel-Structure). However, the tool can easily be adapted to other LimeSurvey structures. Supported metrics are Precision@k, Mean Average Precision (MAP), and normalized Discounted Cumulative Gain (nDCG).
The Analysis Tool requires Apache Ant and Apache Ivy and has been developed and tested with Java 8.
You can download the Analysis Tool, including its source code, from this GitHub repository.
The Analysis Tool comes with a data folder into which all Excel files must be placed. (@ToDo: add data folder with sample file!) In each Excel file, the first 8 columns are LimeSurvey-specific:
- Response ID
- Date submitted
- Last page
- Start language
- Date started
- Date last action
- IP address
- Referrer URL
These columns can be configured in LimeSurvey, but the Analysis Tool assumes that the full structure has been exported. From column 9 onward, the Excel file contains survey-specific columns. In our case, the survey had 50 question groups (competences), each consisting of 3 questions. The first row contains the column headers and the second row contains the actual user responses.
Original competence rank | What is your competency with respect to Algorithm? | Comment |
---|---|---|
14 | Research - a topic that you know well and you are/have been doing research on | |
Thus, in our example file each competence spans 3 columns: the first column contains the original rank, which is hidden from the user; the second column contains the user's rating for that competence; and the third column contains the user's comments. In our case, the first two columns are always filled (mandatory), whereas comments are optional. In total, the example structure comprises 50 competences, which results in 8 + 150 = 158 columns.
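The column layout above boils down to simple index arithmetic. The following Java sketch is not taken from the tool's source: the class `ColumnLayout` and its methods are hypothetical, column indices are assumed to be 0-based, and each rating response is assumed to start with the rating name followed by ` - `, as in the example row.

```java
import java.util.Locale;

/** Hypothetical helper: 8 LimeSurvey columns followed by 3 columns per competence. */
class ColumnLayout {
    // Leading LimeSurvey-specific columns (Response ID ... Referrer URL).
    static final int LIMESURVEY_COLUMNS = 8;
    // Each competence spans: original rank, user rating, comment.
    static final int COLUMNS_PER_COMPETENCE = 3;

    // 0-based column of the hidden original rank for competence i (0-based).
    static int rankColumn(int i) {
        return LIMESURVEY_COLUMNS + COLUMNS_PER_COMPETENCE * i;
    }

    // 0-based column of the user rating for competence i.
    static int ratingColumn(int i) {
        return rankColumn(i) + 1;
    }

    // 0-based column of the optional comment for competence i.
    static int commentColumn(int i) {
        return rankColumn(i) + 2;
    }

    // Total number of columns for a survey with n competences.
    static int totalColumns(int n) {
        return LIMESURVEY_COLUMNS + COLUMNS_PER_COMPETENCE * n;
    }

    // Maps e.g. "Research - a topic that ..." to 3, assuming the text before " - "
    // is one of the four rating names Irrelevant, General, Technical, Research.
    static int parseRating(String cell) {
        String label = cell.split(" - ", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (label) {
            case "irrelevant": return 0;
            case "general":    return 1;
            case "technical":  return 2;
            case "research":   return 3;
            default: throw new IllegalArgumentException("Unknown rating: " + cell);
        }
    }

    public static void main(String[] args) {
        System.out.println(totalColumns(50));  // 158
        System.out.println(rankColumn(0));     // 8, i.e. the 9th column
        System.out.println(parseRating(
            "Research - a topic that you know well and you are/have been doing research on")); // 3
    }
}
```

With 50 competences, the last competence's comment lands in column index 157, the final one of the 158 columns.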
From the folder containing build.xml, start the tool with the Ant task:

```
ant run
```

The program will ask for the threshold you want to use for the analysis. The threshold applies to the computation of Precision@k and Mean Average Precision and can range from 0 to 3, referring to the four possible competence ratings Irrelevant (0), General (1), Technical (2), and Research (3). Since Precision@k and MAP are binary metrics (relevant/non-relevant), a threshold is needed to define which ratings count as relevant. For instance, a threshold of 0 considers all ratings above 0 as relevant (all General, Technical, and Research ratings). For an even split of the rating scale, a threshold of 1 considers all ratings above 1 as relevant (Technical and Research).
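To illustrate how the threshold turns graded ratings into binary relevance, here is a minimal Java sketch of Precision@k, Average Precision, and MAP. It is not the tool's implementation, and the class and method names are hypothetical. It assumes three things: the ranking is the list of user ratings (0-3) ordered by original competence rank; a rating is relevant when it is strictly greater than the threshold; and MAP averages over several such rankings (for example, one per respondent).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the thresholded (binary) metrics. */
class BinaryMetrics {

    // A rating counts as relevant only if it lies strictly above the threshold.
    static boolean isRelevant(int rating, int threshold) {
        return rating > threshold;
    }

    // Precision@k: fraction of the top-k ranked competences that are relevant.
    static double precisionAtK(int[] ratings, int k, int threshold) {
        if (k <= 0) throw new IllegalArgumentException("k must be >= 1");
        int n = Math.min(k, ratings.length);
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (isRelevant(ratings[i], threshold)) hits++;
        }
        return (double) hits / k;
    }

    // Average precision: mean of Precision@i over every rank i holding a relevant item.
    static double averagePrecision(int[] ratings, int threshold) {
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ratings.length; i++) {
            if (isRelevant(ratings[i], threshold)) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return hits == 0 ? 0.0 : sum / hits;
    }

    // MAP: mean of the average precision over several rankings.
    static double meanAveragePrecision(List<int[]> rankings, int threshold) {
        if (rankings.isEmpty()) return 0.0;
        double sum = 0.0;
        for (int[] r : rankings) sum += averagePrecision(r, threshold);
        return sum / rankings.size();
    }

    public static void main(String[] args) {
        // Hypothetical ratings of five competences, listed in original rank order.
        int[] ratings = {3, 0, 2, 1, 2};
        int threshold = 1; // ratings 2 (Technical) and 3 (Research) are relevant
        System.out.println(precisionAtK(ratings, 3, threshold)); // 2 of the top 3 are relevant
        System.out.println(averagePrecision(ratings, threshold));
        System.out.println(meanAveragePrecision(Arrays.asList(ratings, new int[]{3, 3}), threshold));
    }
}
```

Raising the threshold shrinks the set of relevant ratings, so the same ranking generally yields lower Precision@k and different AP values.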
The tool creates a result folder and writes a new metrics file to it. It sorts the competences horizontally and then computes the metrics.
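nDCG does not need a threshold because it works on the graded ratings directly. The sketch below is a hypothetical illustration, not the tool's code. It assumes the ranking is the list of ratings ordered by original competence rank, with a linear gain equal to the rating and the usual log2(i + 1) discount for 1-based rank i. Some nDCG variants use an exponential gain of 2^rating - 1 instead, so numbers may differ from the tool's output.

```java
import java.util.Arrays;

/** Hypothetical nDCG@k sketch with linear gain and log2 discount. */
class Ndcg {

    // DCG@k: sum of rel_i / log2(i + 1) over 1-based ranks i = 1..k.
    static double dcgAtK(int[] ratings, int k) {
        double dcg = 0.0;
        int n = Math.min(k, ratings.length);
        for (int i = 0; i < n; i++) {
            // 0-based index i corresponds to 1-based rank i + 1, hence log2(i + 2).
            dcg += ratings[i] / (Math.log(i + 2) / Math.log(2));
        }
        return dcg;
    }

    // nDCG@k: DCG of the given order divided by DCG of the ideal (descending) order.
    static double ndcgAtK(int[] ratings, int k) {
        int[] ideal = ratings.clone();
        Arrays.sort(ideal); // ascending
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            int tmp = ideal[i]; // reverse in place to get descending order
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idcg = dcgAtK(ideal, k);
        return idcg == 0.0 ? 0.0 : dcgAtK(ratings, k) / idcg;
    }

    public static void main(String[] args) {
        // Hypothetical ratings of five competences, listed in original rank order.
        System.out.println(ndcgAtK(new int[]{3, 0, 2, 1, 2}, 5));
        System.out.println(ndcgAtK(new int[]{3, 2, 2, 1, 0}, 5)); // 1.0, already ideal
    }
}
```

A ranking that already lists competences from the highest to the lowest user rating scores exactly 1.0; any misplacement of highly rated competences lowers the score.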