diff-highlight
==============

Line-oriented diffs are great for reviewing code, because for most
hunks, you want to see the old and the new segments of code next to each
other. Sometimes, though, when an old line and a new line are very
similar, it's hard to immediately see the difference.

You can use "--color-words" to highlight only the changed portions of
lines. However, this can often be hard to read for code, as it loses
the line structure, and you end up with oddly formatted bits.

Instead, this script post-processes the line-oriented diff, finds pairs
of lines, and highlights the differing segments. It's currently very
simple and stupid about doing these tasks. In particular:

 1. It will only highlight hunks in which the number of removed and
    added lines is the same, and it will pair lines within the hunk by
    position (so the first removed line is compared to the first added
    line, and so forth). This is simple and tends to work well in
    practice. More complex changes don't highlight well, so we tend to
    exclude them due to the "same number of removed and added lines"
    restriction. Or even if we do try to highlight them, they end up
    not highlighting because of our "don't highlight if the whole line
    would be highlighted" rule.

 2. It will find the common prefix and suffix of two lines, and
    consider everything in the middle to be "different". It could
    instead do a real diff of the characters between the two lines and
    find common subsequences. However, the point of the highlight is to
    call attention to a certain area. Even if some small subset of the
    highlighted area actually didn't change, that's OK. In practice it
    ends up being more readable to just have a single blob on the line
    showing the interesting bit.

The goal of the script is therefore not to be exact about highlighting
changes, but to call attention to areas of interest without being
visually distracting. Non-diff lines and existing diff coloration are
preserved; the intent is that the output should look exactly the same as
the input, except for the occasional highlight.

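The two rules above (pairing by position, then common prefix and suffix)
can be sketched in a few lines of Python. This is only an illustration
under simplifying assumptions, not the script's own code: lines are
passed without their leading "-" or "+" and without color codes, the
function names are made up for this example, and highlights are written
as -{...} and +{...} instead of terminal colors.

```python
import os


def common_affixes(old, new):
    """Return (prefix_len, suffix_len) of the text shared by both lines."""
    prefix = len(os.path.commonprefix([old, new]))
    # Measure the suffix on what is left, so prefix and suffix never overlap.
    rest_old, rest_new = old[prefix:], new[prefix:]
    suffix = len(os.path.commonprefix([rest_old[::-1], rest_new[::-1]]))
    return prefix, suffix


def mark(line, prefix, suffix, sign):
    """Wrap the middle of `line` in sign{...}; leave it alone if empty."""
    end = len(line) - suffix
    if prefix >= end:
        return line
    return f"{line[:prefix]}{sign}{{{line[prefix:end]}}}{line[end:]}"


def highlight_hunk(removed, added):
    """Highlight a hunk given as two lists of lines (hypothetical helper)."""
    # Rule 1: only hunks with equally many removed and added lines.
    if len(removed) != len(added):
        return list(removed), list(added)
    out_old, out_new = [], []
    for old, new in zip(removed, added):  # pair lines by position
        prefix, suffix = common_affixes(old, new)
        if prefix == 0 and suffix == 0:
            # Nothing in common: the whole line would be highlighted, so skip.
            out_old.append(old)
            out_new.append(new)
        else:
            # Rule 2: everything between prefix and suffix is "different".
            out_old.append(mark(old, prefix, suffix, "-"))
            out_new.append(mark(new, prefix, suffix, "+"))
    return out_old, out_new
```

For example, highlight_hunk(["four"], ["five 5"]) pairs the two lines,
finds the shared prefix "f", and marks the rest of each line.
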
Use
---

You can try out the diff-highlight program with:

---------------------------------------------
git log -p --color | /path/to/diff-highlight
---------------------------------------------

If you want to use it all the time, drop it in your $PATH and put the
following in your git configuration:

---------------------------------------------
[pager]
	log = diff-highlight | less
	show = diff-highlight | less
	diff = diff-highlight | less
---------------------------------------------


Color Config
------------

You can configure the highlight colors and attributes using git's
config. The colors for "old" and "new" lines can be specified
independently. There are two "modes" of configuration:

 1. You can specify a "highlight" color and a matching "reset" color.
    This will retain any existing colors in the diff, and apply the
    "highlight" and "reset" colors before and after the highlighted
    portion.

 2. You can specify a "normal" color and a "highlight" color. In this
    case, existing colors are dropped from that line. The non-highlighted
    bits of the line get the "normal" color, and the highlights get the
    "highlight" color.

If no "new" colors are specified, they default to the "old" colors. If
no "old" colors are specified, the default is to reverse the foreground
and background for highlighted portions.

Examples:

---------------------------------------------
# Underline highlighted portions
[color "diff-highlight"]
oldHighlight = ul
oldReset = noul
---------------------------------------------

---------------------------------------------
# Varying background intensities
[color "diff-highlight"]
oldNormal = "black #f8cbcb"
oldHighlight = "black #ffaaaa"
newNormal = "black #cbeecb"
newHighlight = "black #aaffaa"
---------------------------------------------


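To make the difference between the two modes concrete, here is a rough
Python sketch of what each one does to a line that has already been
split into the text before, inside, and after the highlight. It is a
model of the behavior described above, not the script's code. The names
render() and theme() are invented for this example, the config is
assumed to be a plain dict that already holds terminal escape sequences
(real config values are git color names such as "ul"), and the
key-by-key fallback from "new" to "old" is an assumption.

```python
import re

COLOR = re.compile(r"\x1b\[[0-9;]*m")       # any SGR color escape
RESET = "\x1b[0m"                           # reset all attributes
REVERSE, NO_REVERSE = "\x1b[7m", "\x1b[27m"  # swap fg/bg on and off

DEFAULTS = {"highlight": REVERSE, "reset": NO_REVERSE, "normal": None}


def theme(config, side):
    """Pick colors for side 'old' or 'new' from keys like 'oldHighlight'."""
    picked = {}
    for slot, default in DEFAULTS.items():
        value = config.get(side + slot.capitalize())
        if value is None and side == "new":
            # Assumed: unset "new" colors fall back to the "old" ones.
            value = config.get("old" + slot.capitalize())
        picked[slot] = default if value is None else value
    return picked


def render(start, mid, end, highlight=REVERSE, reset=NO_REVERSE, normal=None):
    """Color one line; `mid` is the portion to highlight."""
    if normal is None:
        # Mode 1: keep existing colors, wrap only the highlighted part.
        return start + highlight + mid + reset + end
    # Mode 2: drop existing colors and repaint the whole line.
    start, mid, end = (COLOR.sub("", part) for part in (start, mid, end))
    return normal + start + RESET + highlight + mid + RESET + normal + end + RESET
```

With an empty config, render(start, mid, end, **theme({}, "old")) falls
through to mode 1 with the reverse-video default.

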
Using diff-highlight as a module
--------------------------------

If you want to pre- or post-process the highlighted lines as part of
another Perl script, you can use the DiffHighlight module. You can
either "require" it or just cat the module together with your script (to
avoid run-time dependencies).

Your script may set up one or more of the following variables:

 - $DiffHighlight::line_cb - this should point to a function which is
   called whenever DiffHighlight has lines (which may contain
   highlights) to output. The default function prints each line to
   stdout. Note that the function may be called with multiple lines.

 - $DiffHighlight::flush_cb - this should point to a function which
   flushes the output (because DiffHighlight believes it has completed
   processing a logical chunk of input). The default function flushes
   stdout.

The script may then feed lines, one at a time, to DiffHighlight::handle_line().
When lines are done processing, they will be fed to $line_cb. Note that
DiffHighlight may queue up many input lines (to analyze a whole hunk)
before calling $line_cb. After providing all lines, call
DiffHighlight::flush() to flush any unprocessed lines.

If you just want to process stdin, DiffHighlight::highlight_stdin()
is a convenience helper which will loop and flush for you.


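The calling convention above is easy to get wrong, mostly because of the
queueing. The following toy model, written in Python, is not the
DiffHighlight module and does no highlighting; the class and its method
names merely mirror the text. It only shows the shape of the contract: a
caller feeds single lines, the callback may receive several lines at
once, and a final flush() is needed to get the last queued hunk out.
Exactly when the flush callback fires is an assumption of this sketch.

```python
import sys


class ToyHighlighter:
    """Toy stand-in for the line_cb / flush_cb protocol (hypothetical)."""

    def __init__(self, line_cb=None, flush_cb=None):
        # Defaults mimic the documented ones: print lines, flush stdout.
        self.line_cb = line_cb or (lambda *lines: sys.stdout.writelines(lines))
        self.flush_cb = flush_cb or sys.stdout.flush
        self._queue = []

    def _emit_queue(self):
        if self._queue:
            # A real implementation would highlight the hunk here.
            self.line_cb(*self._queue)  # may pass multiple lines
            self._queue = []

    def handle_line(self, line):
        if line.startswith(("-", "+")):
            self._queue.append(line)    # hold until the hunk is complete
        else:
            self._emit_queue()
            self.line_cb(line)
            self.flush_cb()             # assumed: a logical chunk is done

    def flush(self):
        """Emit whatever is still queued after the last input line."""
        self._emit_queue()
```

A caller would loop over its input, pass each line to handle_line(), and
call flush() once at the end, which is the loop that highlight_stdin()
is described as doing for you.

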
Bugs
----

Because diff-highlight relies on heuristics to guess which parts of
changes are important, there are some cases where the highlighting is
more distracting than useful. Fortunately, these cases are rare in
practice, and when they do occur, the worst case is simply a little
extra highlighting. This section documents some cases known to be
sub-optimal, in case somebody feels like working on improving the
heuristics.

1. Two changes on the same line get highlighted in a blob. For example,
   highlighting:

----------------------------------------------
-foo(buf, size);
+foo(obj->buf, obj->size);
----------------------------------------------

   yields (where the inside of "+{}" would be highlighted):

----------------------------------------------
-foo(buf, size);
+foo(+{obj->buf, obj->}size);
----------------------------------------------

   whereas a more semantically meaningful output would be:

----------------------------------------------
-foo(buf, size);
+foo(+{obj->}buf, +{obj->}size);
----------------------------------------------

   Note that doing this right would probably involve a set of
   content-specific boundary patterns, similar to word-diff. Otherwise
   you get junk like:

-----------------------------------------------------
-this line has some -{i}nt-{ere}sti-{ng} text on it
+this line has some +{fa}nt+{a}sti+{c} text on it
-----------------------------------------------------

   which is less readable than the current output.

2. The multi-line matching assumes that lines in the pre- and post-image
   match by position. This is often the case, but can be fooled when a
   line is removed from the top and a new one added at the bottom (or
   vice versa). Unless the lines in the middle are also changed, diffs
   will show this as two hunks, and it will not get highlighted at all
   (which is good). But if the lines in the middle are changed, the
   highlighting can be misleading. Here's a pathological case:

-----------------------------------------------------
-one
-two
-three
-four
+two 2
+three 3
+four 4
+five 5
-----------------------------------------------------

   which gets highlighted as:

-----------------------------------------------------
-one
-t-{wo}
-three
-f-{our}
+two 2
+t+{hree 3}
+four 4
+f+{ive 5}
-----------------------------------------------------

   because it matches "two" to "three 3", and so forth. It would be
   nicer as:

-----------------------------------------------------
-one
-two
-three
-four
+two +{2}
+three +{3}
+four +{4}
+five 5
-----------------------------------------------------

   which would probably involve pre-matching the lines into pairs
   according to some heuristic.
209 nicer as:
210
211-----------------------------------------------------
212-one
213-two
214-three
215-four
216+two +{2}
217+three +{3}
218+four +{4}
219+five 5
220-----------------------------------------------------
221
222 which would probably involve pre-matching the lines into pairs
223 according to some heuristic.