The goal of the office-interoperability-tools package is to assess the interoperability of various office applications using different office document formats.
 
The package has three parts: 
1. Batch conversion and printing of test documents. This requires the bash shell and the office program to be tested; everything else is included in the package. The office program must provide a command line interface.
Conversion and printing were tested on Linux and Windows.

2. Evaluation and reporting. This requires the bash shell and Python with some additional modules installed. Tested on Linux.

3. Automated bisection of interoperability errors using the LibreOffice bibisection repositories. Tested on Linux using MSOffice 2010 running in Wine.

-------------------------
INSTALLATION
-------------------------

Download the package and unpack it somewhere in the file system, or clone it directly from git.
The instructions below assume that this folder is named OIT.

--------
On Linux 
--------
Set the environment variables that define the applications to be tested on the current system by adding the following lines to the $HOME/.bashrc file. Modify them to match your installation; comment out unwanted entries or add new ones:

# use full paths
export FTPATH="$HOME/OIT"
# to use Google Docs, you need your own key
export GOOGLEDOC_PK_FILE="$HOME/.ssh/google-docs-privatekey.p12"
export GDCONVERT="$FTPATH/gdconvert/gdconvert"
export LO52PROG="/opt/libreoffice5.2/program/soffice"
export LO51PROG="/opt/libreoffice5.1/program/soffice"
export LO50PROG="/opt/libreoffice5.0/program/soffice"
export LO44PROG="/opt/libreoffice4.4/program/soffice"
export LO43PROG="/opt/libreoffice4.3/program/soffice"
export LO42PROG="/opt/libreoffice4.2/program/soffice"
export LO41PROG="/opt/libreoffice4.1/program/soffice"
export LO40PROG="/opt/libreoffice4.0/program/soffice"
export LO36PROG="/opt/libreoffice3.6/program/soffice"
export LO35PROG="/opt/libreoffice3.5/program/soffice"
export LO33PROG="/opt/libreoffice3.3/program/soffice"
export WINEPROG="/usr/bin/wine"

# these are specific settings for OO/AO, which should be run in server mode
export OO33PORT=8133
export OO33PROG="$FTPATH/scripts/DocumentConverter.py"
export OO33PATH="/opt/openoffice.org33/program"
export AO34PORT=8134
export AO34PROG="$FTPATH/scripts/DocumentConverter.py"
export AO34PATH="/opt/openoffice.org3/program"
export AO40PORT=8140
export AO40PROG="$FTPATH/scripts/DocumentConverter.py"
export AO40PATH="/opt/openoffice40/program"
export AO41PORT=8141
export AO41PROG="$FTPATH/scripts/DocumentConverter.py"
export AO41PATH="/opt/openoffice41/program"


export AWORDPROG="/usr/bin/abiword"
export CWORDSPROG="/usr/bin/calligrawords"

# required only for bisection of LO interoperability bugs 
export LO_BISECT_PATH="$HOME/LOBisect"
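
Since a mistyped path only shows up later as a failed conversion, it can help to check the settings once after editing .bashrc. The following is a minimal sketch and not part of the package; the function name missing_programs is hypothetical, and it assumes that every variable whose name ends in PROG holds a file path, as in the lines above.

```python
import os


def missing_programs(env=None):
    """Return the sorted names of *PROG variables whose value is not an existing path.

    Assumption: every variable whose name ends in "PROG" holds a file path,
    as in the export lines above. *PORT and *PATH variables are not checked.
    """
    if env is None:
        env = os.environ
    return sorted(
        name for name, path in env.items()
        if name.endswith("PROG") and not os.path.exists(path)
    )
```

Calling missing_programs() without arguments from a shell that has read .bashrc returns the names of the entries that should be corrected or commented out.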

----------
On Windows
----------
Install bash from the Cygwin package.
Add the following (or similar) to the .bashrc file:

export FTPATH=/cygdrive/e/OIT
export MS13PROG=$FTPATH/OfficeConvert/OfficeConvert.exe

Replace MS13 with MS07, MS10 or another identifier, according to the installed version.
Only one MSOffice program can be used on one system.
The OfficeConvert.exe program was taken from http://code.officeshots.org/trac/officeshots/browser/trunk/OfficeConvert. It was recompiled, and support for one additional format was added.

Comment: On Windows, LO is installed under a path whose directory names contain spaces ("Program Files"). These scripts cannot work with such paths. Workaround: create a symbolic link within the Cygwin file system, for example in your home directory:
ln -s /cygdrive/c/Program\ Files\ \(x86\)/LibreOffice\ 4/program LibreOffice43
and in .bashrc add
export LO43WIN=/home/xxx/LibreOffice43/soffice.exe

Comment: If an application can print and convert files directly from the command line, its executable is listed directly (LO41PROG); if not, a helper application is listed instead (AO34PROG, MS13PROG).

Comment: In the tested setup, Windows was installed in a virtual machine with access to the Linux file system.
Thus, only one instance of the software and test files existed, with a different configuration file in each system.
This avoided the work and confusion that copying files between the systems would otherwise cause.

----------------------------------
Evaluation and creation of reports
----------------------------------
This part runs only on Linux (instructions for Ubuntu, Mint and similar):

1. Install the required tools:
sudo apt-get install python-pip python-setuptools python-numpy python-scipy python-opencv python-ipdb libpython2.7-dev libjpeg-dev libz-dev libtiff5-dev libfreetype6-dev exiftool
Some other packages may also be needed. Do not install python-tiffile.
2. Enter OIT/scripts and run
sudo python setup.py develop
This also installs a few additional Python packages.

-----
SETUP
-----
Properties of various office applications are defined in the officeconf.sh file.
Modification is needed only if you add a new application.

-------
Testing
-------

The tests require a special folder structure; see the 'config.sh' script in the archives in the 'roundtrip' folder.

To create a new test:
1. create a new subfolder in the roundtrip folder
2. copy config.sh from another subfolder
3. add test files
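
Steps 1 and 2 can be scripted. The following is a minimal sketch and not part of the package; the function name create_test and its arguments are hypothetical. Step 3 is left out on purpose, because where the test files belong depends on the folder structure described in config.sh.

```python
import os
import shutil


def create_test(roundtrip_dir, template_name, new_name):
    """Create roundtrip_dir/new_name and copy config.sh from an existing test.

    template_name is the name of an existing subfolder that already
    contains a config.sh. Returns the path of the new subfolder.
    """
    new_dir = os.path.join(roundtrip_dir, new_name)
    # Step 1: create the new subfolder; raises OSError if it already exists.
    os.mkdir(new_dir)
    # Step 2: copy config.sh from another subfolder.
    shutil.copy(os.path.join(roundtrip_dir, template_name, "config.sh"), new_dir)
    return new_dir
```

The copied config.sh still names the applications of the template test, so sourceapp and rtripapps should be reviewed before the first run.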

A set of applications (rtripapps in config.sh) can be tested at once with respect to one source application
(sourceapp in config.sh)

Names of the applications are specified in officeconf.sh

Instructions on how to run the necessary scripts can be found in the comments of the config.sh files.
The convall.sh script will run the 'rtripapps' applications, the printall.sh script will run the 'sourceapp' application.
Each should be run on the system where the corresponding applications are installed; convall.sh may have to be run on both systems, depending on which applications are tested.

WARNING: During conversion and printing, no other instance of LO or AOO may be running.

The remaining scripts should be run on Linux.

-------
Results
-------

The genods.py script creates a report spreadsheet in an ods file with two sheets, one for print tests and one for roundtrip tests. Each file and each application receives four grades. A grade ranges from 0 (pixel-identical result) to 5 (very different); 6 means that the created pdf was empty and 7 that the conversion failed. The grades are color coded on a green-to-red scale; if all grades are below 3, they are shown in blue.
The meaning of the grades can be found in the column headers.
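
The grade scale can be summarized in a few lines of Python. This is only an illustrative sketch of the rules stated above, not the code used by genods.py; the function names are hypothetical.

```python
def describe_grade(grade):
    """Translate a single grade (0 to 7) into the meaning described above."""
    if grade == 7:
        return "conversion failed"
    if grade == 6:
        return "created pdf was empty"
    if 0 <= grade <= 5:
        return "difference %d on the scale 0 (pixel identical) to 5 (very different)" % grade
    raise ValueError("grade out of range: %r" % (grade,))


def shown_in_blue(grades):
    """Return True if all grades of a file are below 3 (the blue case)."""
    return all(g < 3 for g in grades)
```

For example, shown_in_blue([0, 1, 2, 2]) is True, while a single grade of 3 or more in the four grades makes it False.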

The print test: The input file is printed to pdf by the tested application (LO), and this pdf is compared with the pdf printed by the source application (MSO).

The roundtrip test: The input file is loaded by the tested application (LO) and stored in the same format.
This file is then opened by the source application (MSO) and printed to pdf. This pdf is compared with the pdf printed by the source application from the original file.

Four different views are generated for each test:
- side-by-side view (files xxx-s.pdf)
- page overlay with no alignment (files xxx-p.pdf)
- page overlay with vertically aligned lines (files xxx-l.pdf)
- page overlay with vertically and horizontally aligned lines (files xxx-z.pdf)
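
The naming convention can be written down as a small lookup. This is an illustrative sketch only; it assumes that xxx in the list above stands for a common base name, and the function name view_files is hypothetical.

```python
# Suffix letter of each generated view, as listed above.
VIEW_SUFFIXES = {
    "s": "side-by-side view",
    "p": "page overlay with no alignment",
    "l": "page overlay with vertically aligned lines",
    "z": "page overlay with vertically and horizontally aligned lines",
}


def view_files(base_name):
    """Map each view description to the pdf file name expected for base_name."""
    return dict(
        (description, "%s-%s.pdf" % (base_name, suffix))
        for suffix, description in VIEW_SUFFIXES.items()
    )
```

With the base name xxx, view_files("xxx")["side-by-side view"] gives xxx-s.pdf.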

One can open these files directly from the spreadsheet by clicking on cells with the ">" character.


-------------
Document rank
-------------

The file roundtrip/gtagfreq.pickle contains information on how often individual tags occur in documents. This information was extracted from about 1600 docx documents downloaded from the internet.

It can be used by the genods.py script to add this information to the report:
1. Create a ranks.csv file listing the tags used in each tested document:
docxtags.py -r path_to/gtagfreq.pickle path_to/*.docx > ranks.csv
2. Use it to create the report:
../genods.py -i all.csv -o rslt.ods -r ranks.csv
The file rank (i.e. the frequency of the least frequent tag in the document) and the list of tags will be added at the end of each row. The tags are sorted by decreasing frequency of occurrence.
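
The definition of the file rank can be illustrated by a short sketch. This is not the code of docxtags.py; it assumes that the tag frequencies are available as a plain dictionary from tag name to frequency (the actual layout of gtagfreq.pickle may differ) and that a tag missing from the dictionary counts as frequency 0. The function name rank_document is hypothetical.

```python
def rank_document(tags, tag_freq):
    """Return (rank, sorted_tags) for the tags used in one document.

    rank is the frequency of the least frequent tag; sorted_tags lists the
    distinct tags by decreasing frequency (ties are broken alphabetically).
    Assumption: tags missing from tag_freq are treated as frequency 0.
    """
    distinct = set(tags)
    if not distinct:
        return None, []
    sorted_tags = sorted(distinct, key=lambda t: (-tag_freq.get(t, 0), t))
    rank = tag_freq.get(sorted_tags[-1], 0)
    return rank, sorted_tags
```

A document that uses only common tags thus gets a high rank, while a single rare tag is enough to give it a low one.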

 
-------------------
Automated bisection
-------------------

Instructions can be found in the file Readme.bibisecting
