Text Diff: How String Comparison Algorithms Work
Learn how text diff algorithms work, from Levenshtein distance to the Myers diff algorithm used in Git and modern editors.
What Is a Text Diff?
A text diff identifies the minimum set of changes needed to transform one string into another. It answers: what was added, removed, or kept the same? This is the foundation of version control systems, code review tools, and any editor that shows tracked changes.
Levenshtein Distance
The simplest measure of string similarity is edit distance — the minimum number of single-character edits (insertions, deletions, substitutions) needed to change one string into another.
Edit Distance Example
kitten → sitting
1. kitten → sitten (substitute k→s)
2. sitten → sittin (substitute e→i)
3. sittin → sitting (insert g)
Edit distance: 3
Levenshtein distance is computed with dynamic programming in O(m×n) time, where m and n are the string lengths. It is useful for spell checkers and fuzzy search. The distance alone tells you the cost of the transformation but not where the changes are; recovering the individual edits requires keeping the full dynamic-programming table and backtracking through it.
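A minimal Python sketch of the dynamic-programming recurrence, assuming plain strings as input (the function name levenshtein is our own). It keeps only two rows of the table, so memory is O(n) instead of O(m×n), at the price of no longer being able to backtrack to the individual edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # Row 0: turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # curr[j] will hold the distance between a[:i] and b[:j];
        # curr[0] = i because turning a[:i] into "" takes i deletions.
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The result, 3, matches the kitten → sitting example.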
The Myers Diff Algorithm
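Eugene Myers published his O(ND) diff algorithm in 1986, where N is the combined length of the two inputs and D is the size of the shortest edit script. It remains the default in Git and GNU diff and underlies the diff views of many code editors. Rather than stopping at a cost, it recovers the edits themselves: it finds a longest common subsequence (LCS) of the two texts and reports everything outside it as a deletion or an insertion.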
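To make the LCS view concrete, here is a sketch of the classic O(m×n) LCS table turned into an edit script. This is not Myers's algorithm (it fills the whole table instead of searching the edit graph); lcs_diff and its " ", "-", "+" op codes are illustrative names chosen to mirror unified-diff prefixes.

```python
def lcs_diff(a, b):
    """Edit script as (op, item) pairs: ' ' keep, '-' delete from a, '+' insert from b."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of a longest common subsequence of a[i:] and b[j:].
    # The suffix formulation lets us walk forward when building the script.
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    script, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Matching heads can always be kept in some LCS.
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            # Dropping a[i] loses nothing from the LCS: report a deletion.
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    # Whatever remains on either side is outside the LCS.
    script.extend(("-", x) for x in a[i:])
    script.extend(("+", y) for y in b[j:])
    return script


print(lcs_diff(["a", "b", "c"], ["a", "c", "d"]))
```

For these inputs it returns [(" ", "a"), ("-", "b"), (" ", "c"), ("+", "d")]: the LCS is "a", "c", and everything else is reported as a change. The same function works on lists of lines.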
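The key insight: when only insertions and deletions are allowed, finding the LCS is equivalent to finding the shortest edit script. Every element outside the LCS must be either deleted from the old text or inserted from the new one, so for inputs of lengths m and n with an LCS of length L, the shortest script has D = m + n − 2L edits. Myers finds this script by searching an edit graph: a grid where horizontal moves represent deletions, vertical moves represent insertions, and diagonal moves represent matching elements (lines or characters). The search proceeds in rounds of increasing D and follows free diagonal runs as far as they go, so it finishes quickly when the two texts are similar.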
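A sketch of Myers's greedy forward search in Python, computing only D, the length of the shortest edit script. It follows the basic algorithm from the 1986 paper; recovering the script itself needs either saving each round's array for backtracking or the paper's linear-space refinement, both omitted here. The array v, indexed by diagonal k = x − y with an offset, is an implementation detail of this sketch.

```python
def myers_distance(a, b):
    """Length D of the shortest insert/delete edit script turning a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d  # shifts diagonal k into a non-negative list index
    # v[k + offset] = furthest x reached so far on diagonal k = x - y.
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1 + offset] < v[k + 1 + offset]):
                x = v[k + 1 + offset]      # vertical move: insert b[y]
            else:
                x = v[k - 1 + offset] + 1  # horizontal move: delete a[x]
            y = x - k
            # Follow the diagonal ("snake") while elements match; these moves are free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k + offset] = x
            if x >= n and y >= m:
                return d  # reached the bottom-right corner with d edits
    return max_d


print(myers_distance("ABCABBA", "CBABAC"))  # 5
print(myers_distance("kitten", "sitting"))  # 5
```

ABCABBA → CBABAC is the example from Myers's paper. For kitten → sitting the answer is 5, not the Levenshtein distance of 3: without substitutions, each substituted character costs a deletion plus an insertion, and with an LCS of "ittn" (L = 4) the formula gives 6 + 7 − 2·4 = 5.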
Myers diff output (unified format)
- const greeting = "hello";
+ const greeting = "hello, world";
  console.log(greeting);
Line-Level vs Character-Level Diffs
Most diff tools operate at the line level first — they split both texts into arrays of lines and run the diff algorithm on those arrays. This is fast and human-readable. Within a changed line, a second pass can highlight character-level changes (the red/green inline highlights you see in GitHub).
- Line-level diff: Fast, great for code review
- Word-level diff: Better for prose, documents
- Character-level diff: Exact but noisy; usually shown only within changed lines
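A sketch of the two-pass approach using Python's standard difflib module. Note that difflib.SequenceMatcher uses an algorithm related to Ratcliff and Obershelp's "gestalt pattern matching", not Myers, so its output can differ from Git's on some inputs; the structure (diff the lines, then diff characters inside changed line pairs) is the point here. The [-...-] and {+...+} markers and the "~ " prefix for an edited line are our own illustrative conventions.

```python
from difflib import SequenceMatcher


def inline_highlight(old: str, new: str) -> str:
    """Second pass: mark character-level changes inside one changed line."""
    out = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
            continue
        # 'replace' has text on both sides; 'delete'/'insert' leave one side empty.
        if i2 > i1:
            out.append("[-" + old[i1:i2] + "-]")
        if j2 > j1:
            out.append("{+" + new[j1:j2] + "+}")
    return "".join(out)


def two_pass_diff(old_text: str, new_text: str):
    """First pass over lines; second pass over characters of changed line pairs."""
    old_lines, new_lines = old_text.splitlines(), new_text.splitlines()
    result = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old_lines, new_lines).get_opcodes():
        if tag == "equal":
            result.extend("  " + line for line in old_lines[i1:i2])
        elif tag == "replace" and i2 - i1 == j2 - j1:
            # Same number of lines on each side: pair them up for inline highlighting.
            for old_line, new_line in zip(old_lines[i1:i2], new_lines[j1:j2]):
                result.append("~ " + inline_highlight(old_line, new_line))
        else:
            result.extend("- " + line for line in old_lines[i1:i2])
            result.extend("+ " + line for line in new_lines[j1:j2])
    return result


old = 'const greeting = "hello";\nconsole.log(greeting);'
new = 'const greeting = "hello, world";\nconsole.log(greeting);'
print("\n".join(two_pass_diff(old, new)))
```

The first line comes out as ~ const greeting = "hello{+, world+}"; and the second as unchanged context. Running the inner pass over whitespace-separated tokens instead of characters (for example, the list produced by re.split(r"(\s+)", line)) would turn this into a word-level diff.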
Unified vs Side-by-Side Views
There are two main ways to present a diff:
- Unified (inline): Both old and new lines in a single column. Deletions are marked with -, additions with +. Compact and easy to read in terminals.
- Side-by-side (split): Original on the left, new version on the right. Easier to follow large changes at a glance.
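Both views can be rendered from the same underlying diff. In this sketch, the unified output comes from difflib.unified_diff in Python's standard library, while the side-by-side layout is hand-rolled from SequenceMatcher opcodes; the column width and the |, <, > gutter markers are assumptions for illustration.

```python
import difflib

# Gutter marker per opcode tag (an illustrative convention).
MARKS = {"equal": " ", "replace": "|", "delete": "<", "insert": ">"}


def unified(old, new):
    """Unified view; lineterm="" because the input lines carry no newline."""
    return list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))


def side_by_side(old, new, width=34):
    """Split view: old lines padded to `width` on the left, new lines on the right."""
    rows = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        left, right = old[i1:i2], new[j1:j2]
        # A changed region can have different line counts per side; pad the shorter one.
        for k in range(max(len(left), len(right))):
            lhs = left[k] if k < len(left) else ""
            rhs = right[k] if k < len(right) else ""
            rows.append(f"{lhs:<{width}} {MARKS[tag]} {rhs}")
    return rows


OLD = ['const greeting = "hello";', "console.log(greeting);"]
NEW = ['const greeting = "hello, world";', "console.log(greeting);"]
print("\n".join(unified(OLD, NEW)))
print("\n".join(side_by_side(OLD, NEW)))
```

For the greeting example, unified() produces --- old and +++ new headers, the hunk header @@ -1,2 +1,2 @@, and then the same deletion, addition, and context lines as the unified output shown earlier (difflib writes the prefix with no space after it). The split view pads long-enough columns but does not truncate lines wider than width.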
Practical Use Cases
Version Control
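Git stores each commit as a snapshot of complete file contents, not as a chain of diffs; the diffs you see are computed on demand when you ask for them. (Packfiles do delta-compress similar objects to save space, but that is a storage optimization separate from the diffs between commits.) git diff compares your working tree against the index, using the Myers algorithm by default; --diff-algorithm selects myers, minimal, patience, or histogram instead. git log -p shows the diff for every commit.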
Code Review
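Pull request UIs (GitHub, GitLab, Bitbucket) all run a diff between the base and head branches, typically comparing the head branch against its merge base (the commit where it diverged from the base branch) rather than against the base branch's latest commit. Understanding diffs helps you write PR descriptions that match what reviewers will actually see.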
Testing String Output
Snapshot testing tools like Jest show a diff when a snapshot changes. Reading these diffs quickly is a core developer skill — the changed lines are what you need to verify.
Document Collaboration
Google Docs, Notion, and CMS platforms show diffs for tracked changes. Legal and publishing workflows depend on accurate diff views to sign off on edits.
Whitespace and Normalization
Diff tools often offer flags for ignoring whitespace changes (git diff -w), carriage returns at line ends (git diff --ignore-cr-at-eol), or case (GNU diff -i). These filters prevent trivial formatting changes from burying real content changes. When comparing API responses or config files, normalizing before diffing (sorting keys, formatting JSON consistently) avoids false positives.
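A sketch of normalizing before diffing, assuming the inputs are JSON documents: parsing and re-serializing with json.dumps(..., sort_keys=True, indent=2) removes key-order and formatting differences, so only real value changes remain. strip_whitespace is a hypothetical helper that loosely imitates git diff -w by removing all whitespace from each line before comparison.

```python
import difflib
import json
import re


def normalize_json(text):
    """Parse JSON and re-serialize it with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def strip_whitespace(line):
    """Hypothetical helper, loosely like git diff -w: drop all whitespace."""
    return re.sub(r"\s+", "", line)


def diff(old_lines, new_lines, ignore_whitespace=False):
    """Unified diff of two line lists, optionally ignoring whitespace."""
    if ignore_whitespace:
        # Simplification: the output then shows stripped lines, not the originals.
        old_lines = [strip_whitespace(line) for line in old_lines]
        new_lines = [strip_whitespace(line) for line in new_lines]
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


old_json = '{"name": "app", "port": 8080}'
new_json = '{\n  "port": 8080,\n  "name": "app"\n}'
print(diff(normalize_json(old_json), normalize_json(new_json)))  # []
```

Without normalization the two strings differ on every line; after it, the diff is empty. Changing the port to 9090 in one input would leave exactly one removed and one added line.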
Common Pitfalls
- Line-ending mismatches: LF vs CRLF line endings can make every line show as changed; use .gitattributes (for example, * text=auto) to normalize
- Binary files: Standard diff algorithms are designed for line-oriented text and don't produce meaningful output for binary data; use specialized tools for images or compiled artifacts
- Large files: Myers runs in O(ND) time, where N is the total input length and D is the number of differences; when D approaches N, as in heavily changed large files, the cost approaches O(N²) and diffing can be slow
- Moved blocks: Most diff algorithms treat a moved block as a delete + insert; some tools (like git diff --color-moved) detect moves separately
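A rough Python sketch of two of these pitfalls. It normalizes CRLF and lone CR to LF before splitting (a naive split("\n") on CRLF text would leave a trailing \r on every line, so no line would match its LF counterpart), then flags move candidates as lines that appear in both the deleted and the inserted sets. This is much simpler than what git diff --color-moved does; diff_with_moves and its return shape are hypothetical.

```python
import difflib


def normalize_newlines(text):
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def diff_with_moves(old_text, new_text):
    """Return (deleted, inserted, moved) line lists for two texts.

    `moved` holds lines deleted in one place and inserted in another:
    candidates for a move rather than a genuine change.
    """
    old = normalize_newlines(old_text).split("\n")
    new = normalize_newlines(new_text).split("\n")
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag in ("replace", "delete"):
            deleted.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            inserted.extend(new[j1:j2])
    moved = sorted(set(deleted) & set(inserted))
    return deleted, inserted, moved


# Old text uses Windows line endings; new text uses Unix ones and moves "x" to the end.
deleted, inserted, moved = diff_with_moves("x\r\ny\r\nz", "y\nz\nx")
print(moved)  # ['x']
```

Because the line endings are normalized first, "y" and "z" match across the two versions, and the only reported change, "x", is recognized as a move candidate.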
Try it yourself
Compare two texts side-by-side with our free browser-based String Diff tool.