HTMLDiff is a Ruby gem that generates HTML-formatted diffs between two text strings. It can be used in your app to highlight additions, deletions, and modifications of text using HTML and CSS.
- Simple and opinionated API—it just works™.
- Generates diffs of text using the LCS (Longest Common Subsequence) algorithm.
- Preserves whitespace, HTML tags, HTML entities, URLs, and email addresses in the diff output.
- Multi-language support (Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, Korean, etc.)
- Customizable output formatting (see examples below).
Related gems:
- diffy - Far more complex and feature-rich, but less opinionated.
- diff-lcs - The underlying gem used by HTMLDiff.
Add this line to your application's Gemfile:
gem 'htmldiff'
require 'htmldiff'
old_text = "The quick red fox jumped over the dog."
new_text = "The red fox hopped over the lazy dog."
diff = HTMLDiff.diff(old_text, new_text)
Output:
The <del class="diffdel">quick </del>red fox <del class="diffmod">jumped</del><ins class="diffmod">hopped</ins> over the <ins class="diffins">lazy</ins> dog.
HTMLDiff includes a highly customizable HtmlFormatter that gives you fine-grained control over the HTML output. This formatter allows you to specify different HTML tags and CSS classes for various diff elements.
old_text = "The quick red fox jumped over the dog."
new_text = "The red fox hopped over the lazy dog."
diff = HTMLDiff.diff(old_text, new_text, html_format: {
  tag: 'span',
  class_delete: 'highlight removed',
  class_insert: 'highlight added'
})
Output:
The <span class="highlight removed">quick </span>red fox <span class="highlight removed">jumped</span><span class="highlight added">hopped</span> over the <span class="highlight added">lazy</span> dog.
HTMLDiff.diff(html_format:) supports the following options:
| Option | Description |
|---|---|
| :tag | Base HTML tag to use for all change nodes (default: none) |
| :tag_delete | HTML tag for deleted content (overrides :tag, default: "del") |
| :tag_insert | HTML tag for inserted content (overrides :tag, default: "ins") |
| :tag_replace | HTML tag for replaced content (overrides :tag_delete, :tag_insert, :tag) |
| :tag_replace_delete | HTML tag for deleted content in replacements (overrides :tag_replace, :tag_delete, :tag) |
| :tag_replace_insert | HTML tag for inserted content in replacements (overrides :tag_replace, :tag_insert, :tag) |
| :tag_unchanged | HTML tag for unchanged content (optional) |
| :class | Base CSS class(es) for all change nodes |
| :class_delete | CSS class(es) for deleted content (overrides :class) |
| :class_insert | CSS class(es) for inserted content (overrides :class) |
| :class_replace | CSS class(es) for replaced content (overrides :class_delete, :class_insert, :class) |
| :class_replace_delete | CSS class(es) for deleted content in replacements (overrides :class_replace, :class_delete, :class) |
| :class_replace_insert | CSS class(es) for inserted content in replacements (overrides :class_replace, :class_insert, :class) |
| :class_unchanged | CSS class(es) for unchanged content (optional) |
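One way to read the "overrides" notes for the tag options is as a fallback chain: the most specific option that is set wins, then the next most specific, and finally the built-in default. The sketch below models that reading in plain Ruby. It is an illustration of the table, not HTMLDiff's own code; `TAG_FALLBACKS`, `DEFAULT_TAGS`, and `resolve_tag` are hypothetical names, and the placement of `:tag_insert` in the replace chain is inferred from the examples in this README.

```ruby
# Illustrative model of the tag override chain described in the table.
# These constants and the helper are hypothetical, not part of HTMLDiff.
TAG_FALLBACKS = {
  delete:         %i[tag_delete tag],
  insert:         %i[tag_insert tag],
  replace_delete: %i[tag_replace_delete tag_replace tag_delete tag],
  replace_insert: %i[tag_replace_insert tag_replace tag_insert tag]
}.freeze

# Assumed built-in defaults when no option in the chain is set.
DEFAULT_TAGS = {
  delete: 'del',
  insert: 'ins',
  replace_delete: 'del',
  replace_insert: 'ins'
}.freeze

# Returns the tag for a change kind: the first option in the chain that is
# set, otherwise the assumed default.
def resolve_tag(kind, options)
  key = TAG_FALLBACKS.fetch(kind).find { |name| options[name] }
  key ? options[key] : DEFAULT_TAGS.fetch(kind)
end

options = { tag_delete: 'span', tag_insert: 'div', tag_replace: 'mark' }
%i[delete insert replace_delete replace_insert].each do |kind|
  puts "#{kind}: #{resolve_tag(kind, options)}"
end
```

The same chain shape would apply to the class options, with `:class` as the last fallback.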
diff = HTMLDiff.diff(old_text, new_text, html_format: {
  tag_unchanged: 'span',
  class_unchanged: 'unchanged',
  tag: 'span',
  class_delete: 'deleted',
  class_insert: 'inserted'
})
Output:
<span class="unchanged">The </span><span class="deleted">quick </span><span class="unchanged">red fox </span><span class="deleted">jumped</span><span class="inserted">hopped</span><span class="unchanged"> over the </span><span class="inserted">lazy</span><span class="unchanged"> dog.</span>
diff = HTMLDiff.diff(old_text, new_text, html_format: {
  tag_delete: 'span',
  tag_insert: 'div',
  tag_replace: 'mark',
  class_delete: 'deleted',
  class_insert: 'inserted',
  class_replace_delete: 'replaced deleted',
  class_replace_insert: 'replaced inserted'
})
Output:
The <span class="deleted">quick </span>red fox <mark class="replaced deleted">jumped</mark><mark class="replaced inserted">hopped</mark> over the <div class="inserted">lazy</div> dog.
If the HTML formatting options above aren't sufficient for your use case, or if you'd like to output to an alternative format (e.g. XML, JSON, etc.), you can further customize the output by creating your own formatter.
Your formatter may be any object that responds to the #format method, and it can return whatever object type you'd like (typically a String).
module MyCustomFormatter
  def self.format(changes)
    changes.each_with_object(+'') do |(action, old_string, new_string), content|
      case action
      when '=' # equal
        content << new_string if new_string
      when '-' # remove
        content << %(<removed>#{old_string}</removed>) if old_string
      when '+' # add
        content << %(<added>#{new_string}</added>) if new_string
      when '!' # replace
        content << %(<removed>#{old_string}</removed>) if old_string
        content << %(<added>#{new_string}</added>) if new_string
      end
    end
  end
end
# Test your custom formatter
example_changes = [
  ['=', 'The ', 'The '],
  ['+', nil, 'quick '],
  ['=', 'red fox ', 'red fox '],
  ['!', 'jumped', 'hopped'],
  ['=', ' over the ', ' over the '],
  ['-', 'lazy ', nil],
  ['=', 'dog.', 'dog.']
]
MyCustomFormatter.format(example_changes)
#=> "The <added>quick </added>red fox <removed>jumped</removed>" \
# "<added>hopped</added> over the <removed>lazy </removed>dog."
# Use your custom formatter in the diff method
diff = HTMLDiff.diff(old_text, new_text, formatter: MyCustomFormatter)
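Since a formatter may return a non-HTML format, here is a minimal sketch of one that emits JSON. It assumes the changes arrive as `[action, old_string, new_string]` triples, as in the example above. The module name `JsonDiffFormatter` and the output field names are hypothetical choices made for illustration.

```ruby
require 'json'

# Hypothetical formatter that serializes changes as a JSON array.
# Assumes each change is an [action, old_string, new_string] triple.
module JsonDiffFormatter
  # Human-readable names for the action symbols used in the changes array.
  ACTION_NAMES = {
    '=' => 'equal',
    '-' => 'remove',
    '+' => 'add',
    '!' => 'replace'
  }.freeze

  def self.format(changes)
    records = changes.map do |action, old_string, new_string|
      {
        'action' => ACTION_NAMES.fetch(action, action), # unknown actions pass through
        'old' => old_string,                            # nil becomes JSON null
        'new' => new_string
      }
    end
    JSON.generate(records)
  end
end

sample_changes = [
  ['=', 'The ', 'The '],
  ['!', 'jumped', 'hopped'],
  ['-', 'lazy ', nil]
]
puts JsonDiffFormatter.format(sample_changes)
```

It would be passed to the diff method the same way as any other formatter, via the formatter: option.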
You can customize how text is split into tokens by creating your own tokenizer.
A tokenizer can be any object that responds to the #tokenize method and returns an Array of Strings (i.e. the tokens).
It is useful to think of tokens as the "unsplittable" units in your diff. For example, if you tokenize each word (["Hello", "beautiful", "world"]), the diff output will never split these mid-word. However, if you tokenize each character (["H", "e", "l", "l", "o"]), the diff output can split words between characters; for example, HTMLDiff.diff("Hello", "Help", tokenizer: ...) would return "Hel<del>lo</del><ins>p</ins>".
Your custom tokenizer's output array should include whitespace tokens, such that the output can be joined to match the original string.
module MyCustomTokenizer
  def self.tokenize(string)
    string.split(/(\b|\s)/).reject(&:empty?)
  end
end
# Check that your tokenizer output matches the original string when joined
test = MyCustomTokenizer.tokenize("Hello, world!") #=> ["Hello", ",", " ", "world", "!"]
test.join #=> "Hello, world!"
# Use your custom tokenizer in the diff method
diff = HTMLDiff.diff(old_text, new_text, tokenizer: MyCustomTokenizer)
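The character-level case described earlier can be sketched with a very small tokenizer. This is only an illustration: `CharTokenizer` is a hypothetical name, and `String#chars` splits by code point, so combined characters (for example, some emoji) may need a different approach.

```ruby
# Hypothetical character-level tokenizer; the module name is an assumption.
module CharTokenizer
  # Each character becomes one token, so whitespace characters are tokens too
  # and the tokens join back into the original string.
  def self.tokenize(string)
    string.chars
  end
end

tokens = CharTokenizer.tokenize('Hello')
puts tokens.inspect          # ["H", "e", "l", "l", "o"]
puts tokens.join == 'Hello'  # true

# To use it with the gem (requires htmldiff to be installed):
#   HTMLDiff.diff('Hello', 'Help', tokenizer: CharTokenizer)
```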
HTMLDiff uses a three-step process:
- Tokenization: The input strings are broken into an array of tokens by the HTMLDiff::Tokenizer module.
- Diff Generation: The HTMLDiff::Differ module uses the LCS (Longest Common Subsequence) algorithm to find the differences between the token arrays.
- Formatting: The differences are formatted into HTML by a formatter.
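The three steps above can be illustrated with a toy pipeline in plain Ruby. This is a sketch under stated assumptions, not HTMLDiff's implementation: `ToyDiff` and its methods are hypothetical names, the tokenizer is a simple whitespace split, the LCS table is hand-rolled rather than taken from diff-lcs, and it performs no HTML escaping and no merging of adjacent changes into replacements.

```ruby
# Toy pipeline: tokenize -> LCS diff -> format. Not HTMLDiff's own code.
module ToyDiff
  # Step 1: split into runs of whitespace and non-whitespace so that
  # tokens.join reproduces the input string.
  def self.tokenize(string)
    string.scan(/\s+|\S+/)
  end

  # Step 2: build an LCS length table, then walk it to emit
  # [action, token] pairs ('=' equal, '-' removed, '+' added).
  def self.diff(old_tokens, new_tokens)
    rows = old_tokens.length
    cols = new_tokens.length
    # lcs[i][j] is the LCS length of old_tokens[i..] and new_tokens[j..].
    lcs = Array.new(rows + 1) { Array.new(cols + 1, 0) }
    (rows - 1).downto(0) do |i|
      (cols - 1).downto(0) do |j|
        lcs[i][j] = if old_tokens[i] == new_tokens[j]
                      lcs[i + 1][j + 1] + 1
                    else
                      [lcs[i + 1][j], lcs[i][j + 1]].max
                    end
      end
    end

    changes = []
    i = 0
    j = 0
    while i < rows || j < cols
      if i < rows && j < cols && old_tokens[i] == new_tokens[j]
        changes << ['=', old_tokens[i]]
        i += 1
        j += 1
      elsif j < cols && (i == rows || lcs[i][j + 1] > lcs[i + 1][j])
        changes << ['+', new_tokens[j]]
        j += 1
      else
        changes << ['-', old_tokens[i]]
        i += 1
      end
    end
    changes
  end

  # Step 3: wrap removed and added tokens in tags; equal tokens pass through.
  def self.format(changes)
    changes.map do |action, token|
      case action
      when '-' then "<del>#{token}</del>"
      when '+' then "<ins>#{token}</ins>"
      else token
      end
    end.join
  end

  def self.call(old_string, new_string)
    format(diff(tokenize(old_string), tokenize(new_string)))
  end
end

puts ToyDiff.call('The quick red fox', 'The red fox')
```

Because each step only depends on the output of the previous one, swapping the tokenizer or the formatter leaves the diff step untouched, which is the same separation the tokenizer: and formatter: options rely on.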
HTMLDiff is maintained by the team at TableCheck based in Tokyo, Japan. We use HTMLDiff in our products to help our restaurant users visualize the edit history of their customer and reservation data. If you're seeking your next career adventure, we're hiring!
Original implementation by Nathan Herald, based on an unknown Wiki article.
HTMLDiff uses the fantastic diff-lcs gem under the hood.
This project is licensed under the MIT License.