Text Processing · April 29, 2025 · 9 min

Text Diff: How String Comparison Algorithms Work

Learn how text diff algorithms work, from Levenshtein distance to the Myers diff algorithm used in Git and modern editors.

What Is a Text Diff?

A text diff identifies the minimum set of changes needed to transform one string into another. It answers: what was added, removed, or kept the same? This is the foundation of version control systems, code review tools, and any editor that shows tracked changes.

Levenshtein Distance

The simplest measure of string similarity is edit distance — the minimum number of single-character edits (insertions, deletions, substitutions) needed to change one string into another.

Edit Distance Example

kitten → sitting
1. kitten → sitten  (substitute k→s)
2. sitten → sittin  (substitute e→i)
3. sittin → sitting (insert g)
Edit distance: 3

Levenshtein distance is computed with dynamic programming in O(m×n) time, where m and n are the lengths of the two strings. It is useful for spell checkers and fuzzy search, but the distance itself is a single number: it tells you how far apart two strings are, not where the changes are.
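
The dynamic-programming idea can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function name levenshtein is our own, and it keeps only two rows of the table at a time, so memory stays proportional to the length of the second string.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] = distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The nested loops visit every cell of the m×n table once, which is where the O(m×n) running time comes from.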

The Myers Diff Algorithm

Eugene Myers published his O(ND) diff algorithm in 1986. It is still Git's default diff algorithm and the basis of GNU diff and many code editors. Rather than reporting only a distance, it finds a longest common subsequence (LCS) of the two texts and reports everything outside it as an insertion or a deletion.

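The key insight: finding an LCS is equivalent to finding a shortest edit script. For inputs of lengths N and M with an LCS of length L, the shortest script contains exactly N + M − 2L insertions and deletions. Myers finds it by exploring an edit graph: a grid where horizontal moves represent deletions, vertical moves represent insertions, and diagonal moves represent matching elements (characters or lines), which cost nothing.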

Myers diff output (unified format)

- const greeting = "hello";
+ const greeting = "hello, world";
  console.log(greeting);
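
To make the edit-graph search concrete, here is a minimal sketch of the greedy forward pass, assuming only insertions and deletions are allowed. It returns the length D of the shortest edit script and does not reconstruct the script itself; production implementations also record the path and apply further optimizations. The name myers_distance is ours, chosen for illustration.

```python
def myers_distance(a, b):
    """Length of the shortest insert/delete script that turns a into b."""
    n, m = len(a), len(b)
    # furthest[k] = largest x reached so far on diagonal k, where k = x - y.
    # The seed entry lets the d = 0 round start at x = 0.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # vertical move: insert an element of b
            else:
                x = furthest[k - 1] + 1  # horizontal move: delete an element of a
            y = x - k
            # Follow the diagonal for free while elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m  # not reached: d = n + m always suffices


old_lines = ['const greeting = "hello";', 'console.log(greeting);']
new_lines = ['const greeting = "hello, world";', 'console.log(greeting);']
print(myers_distance(old_lines, new_lines))  # 2 (one deletion, one insertion)
print(myers_distance("ABCABBA", "CBABAC"))   # 5
```

The outer loop runs once per edit and the inner loop touches at most 2D + 1 diagonals, so inputs that differ only slightly finish after very few rounds, regardless of how long they are.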

Line-Level vs Character-Level Diffs

Most diff tools operate at the line level first — they split both texts into arrays of lines and run the diff algorithm on those arrays. This is fast and human-readable. Within a changed line, a second pass can highlight character-level changes (the red/green inline highlights you see in GitHub).

  • Line-level diff: Fast, great for code review
  • Word-level diff: Better for prose, documents
  • Character-level diff: Exact but noisy; usually shown only within changed lines
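
The two-pass approach can be sketched with Python's standard difflib module. Note that difflib.SequenceMatcher uses its own matching heuristic rather than Myers, so this illustrates the line-then-character structure, not any particular tool's output. The helper names inline_changes and two_pass_diff are ours.

```python
import difflib


def inline_changes(old_line, new_line):
    """Second pass: character-level changes inside one changed line."""
    sm = difflib.SequenceMatcher(a=old_line, b=new_line, autojunk=False)
    return [
        (tag, old_line[i1:i2], new_line[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


def two_pass_diff(old_text, new_text):
    """First pass on lines; refine equally sized replaced runs by character."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # Only pair lines up when the replaced runs have the same length.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for old, new in zip(old_lines[i1:i2], new_lines[j1:j2]):
                result.append((old, new, inline_changes(old, new)))
    return result


old_text = 'const greeting = "hello";\nconsole.log(greeting);'
new_text = 'const greeting = "hello, world";\nconsole.log(greeting);'
for old, new, changes in two_pass_diff(old_text, new_text):
    print(changes)  # [('insert', '', ', world')]
```

The line pass narrows the work to one changed line, and the character pass then pinpoints the inserted text that an inline highlight would mark.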

Unified vs Side-by-Side Views

There are two main ways to present a diff:

  • Unified (inline): Both old and new lines in a single column. Deletions are marked -, additions +. Compact and easy to read in terminals.
  • Side-by-side (split): Original on the left, new version on the right. Easier to follow large changes at a glance.
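
Both presentations can be produced from the same comparison. The sketch below uses difflib.unified_diff from the Python standard library for the unified view and a small hypothetical helper, side_by_side, for the split view. The file names a/app.js and b/app.js are placeholders.

```python
import difflib

old_lines = ['const greeting = "hello";', 'console.log(greeting);']
new_lines = ['const greeting = "hello, world";', 'console.log(greeting);']

# Unified view: one column, with "-" for deletions and "+" for additions.
unified = list(difflib.unified_diff(
    old_lines, new_lines,
    fromfile="a/app.js", tofile="b/app.js", lineterm="",
))
print("\n".join(unified))


def side_by_side(old, new):
    """Split view: a list of (left, right) rows, padded with empty strings."""
    rows = []
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        left, right = old[i1:i2], new[j1:j2]
        # Pad the shorter side so every row has both columns.
        height = max(len(left), len(right))
        left += [""] * (height - len(left))
        right += [""] * (height - len(right))
        rows.extend(zip(left, right))
    return rows


for left, right in side_by_side(old_lines, new_lines):
    print(f"{left:<34} | {right}")
```

The unified output also carries a file header and a hunk header (here @@ -1,2 +1,2 @@) that tells a patch tool which line ranges the hunk covers.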

Practical Use Cases

Version Control

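Git stores each commit as a snapshot of the tracked files, not as a chain of diffs. Packfiles apply delta compression, but that is a storage detail. Diffs are computed on demand: git diff compares your working tree with the index, using the Myers algorithm by default, and git log -p shows the patch each commit introduced.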

Code Review

Pull request UIs (GitHub, GitLab, Bitbucket) all run a diff between the base and head branches. Understanding diffs helps you write PR descriptions that match what reviewers will actually see.

Testing String Output

Snapshot testing tools like Jest show a diff when a snapshot changes. Reading these diffs quickly is a core developer skill — the changed lines are what you need to verify.

Document Collaboration

Google Docs, Notion, and CMS platforms show diffs for tracked changes. Legal and publishing workflows depend on accurate diff views to sign off on edits.

Whitespace and Normalization

Diff tools often offer flags for ignoring whitespace changes (git diff -w), trailing newlines, or case. These filters prevent trivial formatting changes from burying real content changes. When comparing API responses or config files, normalizing before diffing (sorting keys, formatting JSON) avoids false positives.
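
As a small illustration of normalizing before diffing, the sketch below parses two JSON payloads, re-serializes them with sorted keys and fixed indentation, and only then compares them. The payloads and the helper name normalized_lines are made up for the example.

```python
import difflib
import json


def normalized_lines(raw_json):
    """Parse JSON and re-serialize it with sorted keys and stable indentation."""
    return json.dumps(json.loads(raw_json), sort_keys=True, indent=2).splitlines()


response_a = '{"name": "api", "port": 8080}'
response_b = '{ "port":8080,\n  "name":"api" }'

# Diffing the raw text reports changes even though the data is identical.
raw_diff = list(difflib.unified_diff(
    response_a.splitlines(), response_b.splitlines(), lineterm=""))

# After normalization the two payloads produce the same lines.
clean_diff = list(difflib.unified_diff(
    normalized_lines(response_a), normalized_lines(response_b), lineterm=""))

print(len(raw_diff) > 0)  # True
print(clean_diff)         # []
```

The same idea applies to other formats: pick one canonical serialization, convert both sides to it, and diff the result.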

Common Pitfalls

  • Line-ending mismatches: when one version uses LF and the other CRLF, every line shows as changed; use .gitattributes to normalize line endings
  • Binary files: Standard diff algorithms don't work on binary data; use specialized tools for images or compiled artifacts
  • Large files: Myers runs in O(ND) time, where N is the combined length of both inputs and D is the number of differences, so heavily changed large files can be slow to diff
  • Moved blocks: Most diff algorithms treat a moved block as a delete + insert; some tools (like git diff --color-moved) detect moves separately
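
The line-ending pitfall from the list above is easy to reproduce. In this sketch (sample text and the helper name changed_lines are ours), the same three lines are compared once with their original terminators and once after converting CRLF to LF.

```python
import difflib


def changed_lines(a, b):
    """Count the lines that fall inside non-equal regions of the comparison."""
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    )


unix_text = "alpha\nbeta\ngamma\n"
windows_text = "alpha\r\nbeta\r\ngamma\r\n"

# Keeping the terminators makes "alpha\n" and "alpha\r\n" different lines.
raw = changed_lines(
    unix_text.splitlines(keepends=True),
    windows_text.splitlines(keepends=True),
)

# Converting CRLF to LF first removes the spurious differences.
normalized = changed_lines(
    unix_text.splitlines(keepends=True),
    windows_text.replace("\r\n", "\n").splitlines(keepends=True),
)

print(raw)         # 3
print(normalized)  # 0
```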

Try it yourself

Compare two texts side-by-side with our free browser-based String Diff tool.
