infiro edited this page May 25, 2012 · 11 revisions

About

The Differ project generates the differences between two versions of an input file. The input can be any UTF-8 text file, but not a binary file. The Differ returns two results: a list of Insert objects and a list of Delete objects.
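As a rough illustration of the two result lists, the sketch below splits an ordered diff into insertions and deletions. The `(type, text)` tuples, the `split_diffs` name, and the recorded character offsets are assumptions made for this example; they are not the Differ's actual object model.

```python
def split_diffs(diffs):
    """Split an ordered diff into (offset, text) insert and delete lists.

    Hypothetical representation: each diff is a (type, text) pair with
    type "Equal", "Insert", or "Delete". Insert offsets index into the new
    version, delete offsets index into the old version.
    """
    inserts, deletes = [], []
    old_pos = new_pos = 0
    for kind, text in diffs:
        if kind == "Equal":
            # Equal text is present in both versions.
            old_pos += len(text)
            new_pos += len(text)
        elif kind == "Insert":
            # Inserted text exists only in the new version.
            inserts.append((new_pos, text))
            new_pos += len(text)
        elif kind == "Delete":
            # Deleted text exists only in the old version.
            deletes.append((old_pos, text))
            old_pos += len(text)
        else:
            raise ValueError(f"unknown diff type: {kind!r}")
    return inserts, deletes


# "Hello World!" -> "Hello there World"
example = [("Equal", "Hello "), ("Insert", "there "),
           ("Equal", "World"), ("Delete", "!")]
inserts, deletes = split_diffs(example)
```

Here `inserts` holds the text that exists only in the new version and `deletes` the text that exists only in the old one, each with the position where it starts.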

Details

Differ is a wrapper class around the Diff-match-patch library developed by Neil Fraser. The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text. The library implements Myers' diff algorithm, which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surrounds the diff algorithm, improving both performance and output quality.

In outline, the algorithm does the following:

  1. Compare the two texts (character by character or line by line).
  2. Create a list of diff objects, each of type Equal, Insert, or Delete.
  3. Clean up the diff objects by merging small fragments so that the output is human readable.
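The first two steps can be sketched with Python's standard `difflib` module. This is a stand-in for illustration only: `difflib` uses its own matching algorithm rather than Myers' diff, it performs no semantic cleanup, and the `char_diff` helper and the string type labels are assumptions, not part of Differ or Diff-match-patch.

```python
from difflib import SequenceMatcher

EQUAL, INSERT, DELETE = "Equal", "Insert", "Delete"


def char_diff(old, new):
    """Return an ordered list of (type, text) pairs turning old into new."""
    # autojunk=False disables difflib's popularity heuristic so that every
    # character is considered when matching.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            diffs.append((EQUAL, old[i1:i2]))
            continue
        # "delete", "insert" and "replace" opcodes: a replace is recorded
        # as a Delete followed by an Insert.
        if i2 > i1:
            diffs.append((DELETE, old[i1:i2]))
        if j2 > j1:
            diffs.append((INSERT, new[j1:j2]))
    return diffs


for kind, text in char_diff("The quick brown fox", "The quick red fox jumps"):
    print(f"{kind:6} {text!r}")
```

The printed fragments are typically short and cut across word boundaries, which is the kind of output the cleanup step is meant to tidy.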

Note that we compare character by character, not line by line, so the raw differ result can look like junk (not human readable). However, a character-level diff is easier to process later on and gives more precise output.

A useful property of the diff is that both the new version and the old version can be reconstructed from the diff objects, taken in order:

  • NEW VERSION == EQUAL + INSERT
  • OLD VERSION == EQUAL + DELETE
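The reconstruction property can be checked with a few lines of code. This is a minimal sketch that assumes the diff is available as an ordered list of `(type, text)` pairs; that representation and the `rebuild` helper are hypothetical, not the Differ's actual API.

```python
def rebuild(diffs, drop):
    """Concatenate the text of every diff whose type is not `drop`."""
    return "".join(text for kind, text in diffs if kind != drop)


# Hand-written diff for "Good dog" -> "Bad dog"
diffs = [("Delete", "Goo"), ("Insert", "Ba"), ("Equal", "d dog")]

new_version = rebuild(diffs, drop="Delete")  # Equal + Insert
old_version = rebuild(diffs, drop="Insert")  # Equal + Delete

print(old_version)  # Good dog
print(new_version)  # Bad dog
```

The order of the diff objects matters: the pieces must be concatenated in the sequence the diff produced them.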

Limitations

The differ results are sometimes inconsistent with Git diff results, probably because the two use different algorithms. As a result, code and tests should be written against the Diff-match-patch result, not the Git diff result.

The differ can only parse UTF-8 text files, not binary files. If the input is binary, the differ throws an exception.
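One way a caller could guard against binary input before diffing is to attempt a strict UTF-8 decode. How the Differ itself detects binary data is not documented here, so the `read_utf8_text` helper below is only an assumed approach.

```python
def read_utf8_text(data: bytes) -> str:
    """Decode bytes as UTF-8, rejecting anything that is not valid text."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Byte sequences that are invalid in UTF-8 usually mean binary input.
        raise ValueError("input is not valid UTF-8 text") from exc


text = read_utf8_text("héllo wörld".encode("utf-8"))  # decodes normally

try:
    read_utf8_text(b"\xff\xfe\x00\x01")  # 0xFF can never appear in UTF-8
except ValueError as err:
    print("rejected:", err)
```

A strict decode does not catch every binary file, since some binary data happens to be valid UTF-8, but it rejects most of it early.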

The AST parser provides the same character numbers for functions as Diff-match-patch does.
