DIFF - compare source files.

Syntax:

DIFF file1[,opts] file2[,opts] file3[,opts] ... [option]*
DIFF file1[,opts] cat [option]*

Global Options

(+|-)Alter (-)            (+|-)Blanks (+)
(+|-)Differences          (+|-)Escaped (-)
(+|-)Headings (-)         (+|-)LineNumbers (-)
(+|-)Matching (-)         (+|-)Nulllines (+)
(+|-)Respectcase (+)      (+|-)Verbose (+)
Bytes=N (256)             Columns=start-end;...
IGnore="chars"            Resynchronize=nlines (2)
SeParator=string          Stack=number (1000)

Local File Options

(+|-)lineNumbers (-)      +Original
Columns=start-end;...     Name=shortname

Examples:

diff /oldsource /newsource -bl
diff res=3 +verbose tempfile /permfile  >output
diff merge=file1 file2 file3 file4 
diff file1,+orig,+ln  file2,+ln  file3,name=XXX
diff file1 file2 file3 file4 col=1-72

Global Options

cat
is the name of a catalog containing a file with the same base name as "file1". DIFF compares "file1" to the file under "cat". For example,
diff user/cat1/abc user/cat2
compares the two files
user/cat1/abc
user/cat2/abc
+Alter
creates a set of alters which can be used in order to change "file1" into "file2". You cannot use this option with more than two input files. Use the SeParator= option if you want DIFF to display something other than an $ALTER card. +Alter may not be combined with +Matching or +Differences.
-Blanks
ignores multiple whitespace characters. Whitespace characters are blanks, carriage returns, tabs, vertical tabs, and rubouts. This is equivalent to an IGnore= option specifying the whitespace characters. For more information, see the "Comparison Algorithm" section below.
Bytes=N
controls the size of the major marker blocks used to divide input files into difference sections. The default is 256 bytes. For more information about resynchronization, see the section on "How DIFF Works" below.
Columns=start-end;start-end;...
only compares selected sections within each input line. Sections are specified by a list of start and end column positions, where the "start" column must always come before the "end" column. Columns are numbered beginning at 1. You may put several Columns= options on the command line, in which case the column definitions are combined into a single list. For example,
col=1-72 col=73-80
is equivalent to
col=1-72;73-80
For more information, see "Column Sections" below.
-Differences
does not display lines that differ from one file to the next. Typically, you would use this in conjunction with +Matching.
+Differences
displays lines that differ between files. This is the default unless you specify +Matching. +Differences automatically implies -Matching unless you explicitly specify +Matching.
+Escaped
displays output in a format that uses C escape sequences to show special characters. For example, tabs are written as "\t" and unprintable characters are written as "\nnn" (using octal digits). For more information, see the "Escaped Lines" section below.
+Headings
displays a title line. This is useful when you redirect the DIFF output to a file for printing.
IGnore="chars"
specifies a set of characters to be ignored during comparisons. For more details, see the "Comparison Algorithm" section below. IGnore= can be specified more than once on the command line, in which case all the given characters are ignored. For example, specifying both IG=";" and IG="%" is equivalent to specifying IG=";%". Also, the -Blanks option is treated as an IGnore= option for the whitespace characters.
+LineNumbers
says that the files are line numbered and that the line numbers should not enter into the comparison. A +LineNumbers option on the command line specifies a default for all the files being compared. You can override this default by putting the local -LineNumbers option on a specific file.
+Matching
displays lines that are the same, rather than lines that are different. If you want to display both lines that are the same and lines that are different, use
+Matching +Differences
+Matching automatically implies -Differences unless you explicitly specify +Differences.
-Nulllines
throws away null lines for the purpose of resynchronizing DIFF's comparisons. Null lines will still be displayed in output (as appropriate).
-Respectcase
ignores case distinctions. For example, "ABC" would be considered the same as "abc".
Resynchronize=nlines
affects how sets of differences are displayed in output. For more information, see the section on "How DIFF Works" below. The default is Resynchronize=2.
Stack=number
specifies the maximum number of lines in each file that DIFF will look at before being forced to close off a difference section at the largest available marker block. For an explanation of what this means, see the section on "How DIFF Works" below. The default stack size is 1000.
SeParator=string
uses a different header for separation rather than the usual dashes (or $ALTER card). If "string" is null, DIFF doesn't display separators or line numbers. If the option only appears once, the given string is used as the primary separator in DIFF output (and the secondary separator remains the default). If the option appears twice, as in
sep="**********" sep="%%%%%%%%%%"
the first string is used as the primary separator and the second as the secondary separator. For more on the use of separators, see the "Output Format" section below.
-Verbose
does not display any "compared equal" messages.

Local File Options

Local file options can be listed after any file name given on the command line. Options are separated with commas, as in

file1,+orig,name="ABC",col=1-16;73-80
The options are:
+lineNumbers
says that this file is line numbered and that the line numbers should not enter into the comparison.
-lineNumbers
says that this file is not line numbered. This is used when the command line contains the +LineNumbers option; the local -lineNumbers overrides the global default.
Columns=start-end;start-end;...
specifies column sections within this particular file. Columns are numbered beginning at 1. For more information, see "Column Sections" below.
Name="string"
specifies the name that should be used for this file in DIFF's output. The default is to use the file's full name. Name= is useful if you are using DIFF in conjunction with some other program that "hides" file names.
For example, suppose you are using a source control package that checks out files and places them in temporary files with different names. If you want to compare the temporary files, you can use Name= to associate the original file names with the temporary files, making the output easier to understand.
+Original
specifies this file as the basis for comparisons with all other files. The general purpose of +Original is to determine conflicts between changed versions of the same original file. Only one file on the command line can be marked with this option. To use +Original, there must be at least three files specified on the command line. For more information, see "The Effect of +Original" section below.

Description:

The default behavior of DIFF is to run through input files, comparing lines and displaying lines where files differ. This default behavior can be changed by various options, as described in the rest of this explain file.

If all files compare equal, DIFF normally displays a message to this effect on the terminal. If the files compare equal, DIFF also turns off bit 26 of the program switch word; if they differ, DIFF turns the bit on. In either case, bits 18-25 are turned off. Turning on bits 18-25 indicates an error.
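
As an illustration of the switch word convention just described, here is a minimal Python sketch of how a calling script might interpret the result. It assumes a 36-bit word whose bits are numbered from 0 at the leftmost position; that numbering, and the names bit_mask and diff_status, are assumptions made for this example and are not part of DIFF.

```python
WORD_BITS = 36  # assumption: 36-bit word, bit 0 is the leftmost bit

def bit_mask(bit):
    """Return a mask selecting one numbered bit, counting from the left."""
    return 1 << (WORD_BITS - 1 - bit)

# Bits 18-25 signal an error; bit 26 signals that the files differ.
ERROR_MASK = sum(bit_mask(b) for b in range(18, 26))
DIFFER_MASK = bit_mask(26)

def diff_status(switch_word):
    """Classify a switch word value as 'error', 'differ', or 'equal'."""
    if switch_word & ERROR_MASK:
        return "error"
    if switch_word & DIFFER_MASK:
        return "differ"
    return "equal"
```

The error bits are tested first, since a comparison that failed says nothing reliable about whether the files match.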

How DIFF Works

DIFF is intended to be used on files that are different versions of the same basic text. For example, you might use DIFF to compare different versions of a program's source code, or different versions of output data produced by a program.

Roughly speaking, DIFF assumes that each input file will have some sections that are the same as all the other files. The sections that are the same in all files allow DIFF to resynchronize its comparisons; in other words, DIFF looks for blocks of identical text in order to break the input files into corresponding sections, then compares the corresponding sections for differences.

For example, suppose that File1 and File2 are different versions of the same program source code. The marker blocks are made up of the lines that are the same in both files. In between the marker blocks are lines that are different in the two files. DIFF identifies the marker blocks first in order to "synch up" the structures of the files being compared. In this way, DIFF compares corresponding sections of the two files.

Ideally, DIFF would search out all possible marker blocks in the files being compared, then sift out the differences that occur in between those markers. In practice, however, this would require too much memory and processor time, especially with large files. Therefore, DIFF starts at the beginning of each file and looks for marker blocks that are "reasonably large". The sections of text between these large marker blocks are called "difference sections" and the large marker blocks are called "major" marker blocks.

Once DIFF has identified a difference section (bounded by major marker blocks), DIFF scans the section for progressively smaller marker blocks, until DIFF has divided the section into lines that are the same in all files and lines that are different.

The Bytes=N option controls the size of the major marker blocks used to divide the files into difference sections. Each major marker block will be a number of whole lines containing at least N bytes. A major marker block may be considerably larger than N bytes if there are a large number of lines that are the same in all files.

If you set the Bytes= size too small, you may get false resynchronizations. For example, suppose that you set Bytes=10. This means that DIFF only needs to see one or more lines with the same 10 bytes in all input files to think that it has come to the end of one set of differences. The result may be that the file comparisons get out of synch, and DIFF ends up comparing sections that don't actually correspond to each other. This creates a lot of excess output, since DIFF ends up comparing completely different sections that have nothing in common. If you find that DIFF seems to be "getting lost" in its comparisons, try increasing the size of Bytes= to make marker blocks bigger.
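
The size test for a major marker block can be sketched in Python as follows. This is only an illustration of the rule "whole lines containing at least N bytes". It assumes one byte per character and does not count line terminators, and is_major_marker is a hypothetical name.

```python
def is_major_marker(matching_lines, n_bytes=256):
    """True if a run of lines common to all files totals at least n_bytes.

    Assumption: one byte per character, line terminators not counted.
    """
    total = sum(len(line) for line in matching_lines)
    return total >= n_bytes

# A short run of common lines (11 bytes in total).
run = ["}", "", "int main()"]
small_setting = is_major_marker(run, n_bytes=10)   # accepted with Bytes=10
default_setting = is_major_marker(run)             # rejected with Bytes=256
```

A run this short is likely to occur in many unrelated places in a source file, which is why a small Bytes= value invites false resynchronization.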

The Resynchronize= option affects the output that DIFF produces when performing line-by-line comparisons within a difference section. As an example, suppose Resynchronize=3 and suppose that in a particular difference section, DIFF finds two lines that are different in the files being compared. If the differences are within three lines of each other (Resynch=3), DIFF creates a single output block containing the different lines and the lines between them; if the differences are farther apart than three lines, DIFF creates two separate output blocks, one for each difference.

In other words, the Resynchronize= option has the effect of combining small sets of differences into one large set, if the original small sets are close enough together. DIFF shows you intervening lines between differences, even if those lines are the same in all files. In this way, you can get a better view of the context surrounding differences.
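
The grouping effect of Resynchronize= can be sketched in Python as follows. The sketch assumes that "within N lines" means the line numbers of two successive differences differ by at most N; DIFF's exact rule may differ, and group_differences is a hypothetical name.

```python
def group_differences(diff_lines, resync=2):
    """Group differing line numbers into (first, last) output blocks.

    Two successive differences share a block when their line numbers
    are no more than `resync` apart (an assumed reading of the rule).
    """
    blocks = []
    for n in sorted(diff_lines):
        if blocks and n - blocks[-1][1] <= resync:
            blocks[-1][1] = n          # extend the current block
        else:
            blocks.append([n, n])      # start a new block
    return [tuple(b) for b in blocks]
```

With resync=3, differences on lines 10 and 12 fall into the single block (10, 12), which also takes in the unchanged line 11, while a difference on line 30 forms a block of its own.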

The Stack= option can be thought of as the maximum number of differing lines in a difference section (plus the major marker block that marks the end of the difference section). The default is 1000 lines. This means that if DIFF reads through 1000 lines from each input file, and still hasn't found a marker block that is at least N bytes long (Bytes=N), then DIFF chooses the largest possible marker block that has already been found and uses that to mark the end of the difference section.

This occurs when large input files have a lot of differences spread out throughout the files, without any reasonably large areas where all the files match each other. In this case, DIFF is forced to use smaller marker blocks; this increases the chance of false resynchronization (DIFF "getting lost" in trying to divide the files into corresponding sections). If you find DIFF getting lost, try increasing the Stack= size as well as Bytes=.
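
The fallback described above can be sketched in Python as follows. The representation of candidate marker blocks as (line offset, size in bytes) pairs, and the name choose_marker, are assumptions made for this illustration; the sketch does not reproduce DIFF's actual search.

```python
def choose_marker(candidates, n_bytes=256, stack=1000):
    """Pick the marker block that closes a difference section.

    candidates: (line_offset, size_in_bytes) pairs in scan order, where
    line_offset counts lines from the start of the difference section.
    Returns the first candidate of at least n_bytes found within `stack`
    lines, else the largest candidate seen, else None.
    """
    best = None
    for line_offset, size in candidates:
        if line_offset > stack:
            break                       # stack limit reached
        if size >= n_bytes:
            return (line_offset, size)  # a genuine major marker block
        if best is None or size > best[1]:
            best = (line_offset, size)
    return best
```

In this sketch, raising stack gives the scan more lines in which to find a block that meets the Bytes= size before it has to settle for a smaller one.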

Column Sections

The Columns= option lets you restrict comparisons to selected parts of an input line. For example,

Columns=1-72
only compares input based on the first 72 characters of each line. Anything past column 72 is ignored. As another example,
Columns=1-16;73-80
only compares input based on the first 16 columns and columns 73-80.

The Columns= option can be specified as a global option, or as a local file option. The global option specifies a default for all files being compared. Local file options override the default. For example,

diff col=1-72 file1 file2 file3,col=9-80
compares columns 1 to 72 in "file1" and "file2" but columns 9-80 in "file3".

There must be the same number of column sections in each file. For example,

diff file1,col=1-72  file2,col=1-16;17-72    **ERROR**
is invalid because there is only one column section given for "file1" but two for "file2".

If there are no IGnore= characters (or -Blanks), corresponding columns must be the same length. For example,

diff col=1-72 file1 file2 file3,col=9-100    **ERROR**
is invalid because the column section in "file3" is longer than the (default) column section in "file1" and "file2". However, if there are IGnore= characters, sections do not have to be the same length.
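
The two consistency rules above can be sketched in Python as follows. The function name check_columns and the error strings are hypothetical; DIFF's own diagnostics are not shown in this file.

```python
def check_columns(per_file_sections, ignoring=False):
    """Check column sections for consistency across files.

    per_file_sections: one list of (start, end) pairs per input file.
    ignoring: True if IGnore= characters (or -Blanks) are in effect.
    Returns None if the sections are consistent, else an error string.
    """
    if not per_file_sections:
        return None
    first = per_file_sections[0]
    for sections in per_file_sections[1:]:
        if len(sections) != len(first):
            return "different number of column sections"
        if not ignoring:
            for (s1, e1), (s2, e2) in zip(first, sections):
                if e1 - s1 != e2 - s2:
                    return "corresponding column sections differ in length"
    return None
```

For instance, col=1-72 against col=9-80 passes because both sections are 72 columns wide, while col=1-72 against col=9-100 fails unless characters are being ignored.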

The Columns= option can be given using various syntaxes. The following examples are all equivalent:

col=1-16;17-72;73-80
col=1:16;17:72;73:80
col=1-16/17-72/73-80
col=1:16/17:72/73:80
col=1-16,17-72,73-80
col=1:16,17:72,73:80
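
The equivalence of these syntaxes can be sketched with a small Python parser. It handles only the value after "col=", and parse_columns is a hypothetical name. The sketch rejects a section whose start comes after its end but accepts a single-column section such as 5-5, which is an assumption about DIFF's behavior.

```python
import re

def parse_columns(spec):
    """Parse a Columns= value into a list of (start, end) pairs.

    Sections may be separated by ';', '/' or ','; the start and end of
    a section may be separated by '-' or ':'. Columns begin at 1.
    """
    sections = []
    for part in re.split(r"[;/,]", spec):
        start, end = re.split(r"[-:]", part)
        start, end = int(start), int(end)
        if start < 1 or start > end:
            raise ValueError("bad column section: " + part)
        sections.append((start, end))
    return sections
```

Each of the six forms listed above parses to the same list: [(1, 16), (17, 72), (73, 80)].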

Column sections may overlap with each other, as in

col=1-20;16-40
However, this sort of arrangement is only useful in very limited cases.

Column sections do not have to be defined in ascending order. For example,

file1,col=1-20;21-40 file2,col=21-40;1-20
compares the first 20 characters of "file1" with the second 20 characters of "file2", and vice versa.

Comparison Algorithm

To compare a line from one file with a line from another, DIFF begins by extracting column sections from each line (as discussed in "Column Sections" above). Column sections are compared individually.

After a line has been broken into column sections, each section is reduced according to the IGnore= options. (Note that -Blanks is treated as an IGnore= option.) Every string of one or more IGnore= characters is replaced by a single special character that shows where characters have been ignored. Trailing strings of ignored characters are discarded.

As an example of how this works, suppose that you are using IGnore=" " to ignore blank characters. Then all of the following are equivalent:

"a b"
"a     b"
"a   b   "
All of these are reduced to "aXb", where X stands for the special character which shows where characters were ignored. Notice that the final result is NOT the same as "ab". IGnore= does not discard characters in the middle of strings; it simply reduces multiple occurrences to a single character.

Note that leading characters are not discarded. For example,

"   a b"
" a  b   "
both reduce to "XaXb" (where X stands for the special character). This is not the same as "aXb".
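
The reduction rule can be sketched in Python as follows. The letter "X" stands in for the special character, as in the examples above; a real implementation would need a marker that cannot occur in the input. The name reduce_section is hypothetical.

```python
def reduce_section(text, ignore=" ", marker="X"):
    """Reduce a column section according to a set of ignored characters.

    Every run of one or more ignored characters becomes a single marker;
    a trailing run is discarded; a leading run is kept.
    """
    out = []
    in_run = False
    for ch in text:
        if ch in ignore:
            if not in_run:
                out.append(marker)   # first ignored character of a run
                in_run = True
        else:
            out.append(ch)
            in_run = False
    if in_run:
        out.pop()                    # drop the marker for a trailing run
    return "".join(out)
```

Applied to the examples above, "a b", "a     b" and "a   b   " all reduce to "aXb", while "   a b" and " a  b   " both reduce to "XaXb".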

In comparing column sections, empty columns compare equal to one another. For example, suppose there is a column section running from columns 73-80; if DIFF is comparing two lines that are both shorter than 73 characters, the 73-80 column section is empty for both lines. Therefore, the lines are considered equal in that column section.

The same holds for partial column sections. For example, if the column section runs from 73-80, but both input lines end at column 74, DIFF only compares the characters in columns 73 and 74. If these characters are equal, the input lines are considered equal in that column section.
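
The treatment of empty and partial column sections can be sketched in Python as follows. The function names are hypothetical, and the sketch leaves out the IGnore= reduction step.

```python
def extract(line, sections):
    """Return the text of each (start, end) column section of a line.

    Columns are numbered from 1. A section lying wholly beyond the end
    of the line yields an empty string; one lying partly beyond it
    yields only the characters that exist.
    """
    return [line[start - 1:end] for start, end in sections]

def lines_equal(line_a, line_b, sections):
    """Compare two lines section by section."""
    return extract(line_a, sections) == extract(line_b, sections)
```

With sections 1-72 and 73-80, two copies of a three-character line compare equal because both have an empty 73-80 section. Two lines that end at column 74 are compared only on columns 73 and 74 within that section.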

Output Format

There are four possible output formats, depending on the settings of the Matching and Differences options.

If you specify both -Matching and -Differences, you just get a summary of the number of differences between the files involved.

If you specify +Matching and -Differences, you get the lines that are found in all the files being compared. Groups of matching lines are separated into blocks: the first group of lines that are common to all files, the next group of lines that are common to all files, and so on. Blocks are separated by the primary separator. By default, the separator is

-------------------------------------------------
but this can be changed with the SeParator= option. The line containing the primary separator also contains file names and line numbers to show where the block of matching lines appears in each file, as in
---------- file1,10 file2,14 file3,5

If you specify +Differences and -Matching, you get the lines where the files differ from each other. The output is divided into blocks. Each block shows one or more lines where the input files differed. Blocks are divided into subsections, with each subsection showing the contents of one or more files. Here is a typical example of a block:

---------------------- file1,line# file3,line#
 
some lines in file1
"a same line"
 
- - - - - - - - - - -  file2,line# file4,line#
 
the equivalent (changed) lines in file2
"a same line, as above"
 
 .....
  1. The first line in the block begins with the primary separator string. The default primary separator is "---------" but you can specify a different string with the SeParator= option.
  2. On the same line is a set of file names with line numbers. All of these files have the same data in this part of the file. In the example above, "file1" and "file3" have the same contents in this part of the file.
  3. Next come some lines taken from the file(s) named on the primary separator line. These lines are the same in all the given files.
  4. Next comes a line beginning with the secondary separator string. The default secondary separator is "- - - -" but you can specify a different string with the SeParator= option.
  5. On the same line is a set of file names with line numbers. All of these files have the same data in this part of the file.
  6. Next come some lines taken from the file(s) named on the secondary separator line. These lines are the same in all the given files.

In summary, the block begins with a primary separator line and consists of subsections which each begin with a secondary separator line. Each subsection within the block gives a set of lines that appear in one or more files. These subsections show where the groups of files differ.

If you specify +Matching and +Differences, you get a complete listing of where the files match and where they are different. Blocks of matching lines are marked (on the end) with primary separators, without file names. Blocks of differing lines are marked in the same way as in the previous output format.
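
The way the two options select one of the four formats, including the defaults given under "Global Options", can be sketched in Python as follows. The function name and the format labels are hypothetical.

```python
def output_format(matching=None, differences=None):
    """Pick the output format from the Matching and Differences options.

    Each argument is True for '+', False for '-', or None if the option
    was not given on the command line.
    """
    # Matching is off unless requested explicitly.
    m = matching if matching is not None else False
    # Differences is on by default, but +Matching alone turns it off.
    d = differences if differences is not None else not m
    if m and d:
        return "full listing"
    if m:
        return "matching lines"
    if d:
        return "differing lines"
    return "summary"
```

With no options the result is "differing lines", with +Matching alone it is "matching lines", and with -Differences alone it is "summary".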

Escaped Lines

If you specify the +Escaped option on the command line, lines are displayed in the following format:

"colsect1" "colsect2" ... : original input line
The line begins with the individual column section strings that were used in the comparison. Within this string, special characters are shown as C escape sequences (e.g. \t, \033, and so on). In other words, the output shows exactly what DIFF used in the comparison. The output ends with the full original input line. This format is useful when you want to know exactly what DIFF was looking at.
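
The escaped format can be sketched in Python as follows. The sketch writes tabs as \t and any other unprintable character as a three-digit octal escape; DIFF may use further named escapes that are not reproduced here. It also renders an empty column section as an unquoted dash, following the note below. The function names are hypothetical.

```python
def c_escape(text):
    """Show tabs as \\t and other unprintable characters as \\nnn (octal)."""
    out = []
    for ch in text:
        if ch == "\t":
            out.append("\\t")
        elif 32 <= ord(ch) < 127:
            out.append(ch)                    # printable ASCII, unchanged
        else:
            out.append("\\%03o" % ord(ch))    # e.g. ESC becomes \033
    return "".join(out)

def escaped_line(sections, original):
    """Build one output line: quoted column sections, ' : ', original."""
    shown = ['"%s"' % c_escape(s) if s else "-" for s in sections]
    return " ".join(shown) + " : " + original
```

For example, the sections "abc", "def" and an empty third section of the line abcdef come out as "abc" "def" - : abcdef.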

Note: In this format, empty column sections are shown as a single dash, not enclosed in quotes:

"abc" "def" - : abcdef

The Effect of +Original

The purpose of +Original is to compare different versions of the same original file. A typical application would be

diff file1,+orig file2 file3
In this example, "file1" is considered the original file, while "file2" and "file3" are both modified versions of "file1". Presumably, you want DIFF to detect conflicts between the changes made in "file2" and the changes made in "file3".

In precise terms, +Original has the following effect:

  1. When there are only two versions of a given difference section in all the files being compared, DIFF does not consider that the files conflict. In other words, DIFF does not consider this a difference that should be reported in output. If the +Matching option has been specified, DIFF outputs the modified version of the given difference section (that is, the version that was NOT in the file marked as +Original).
  2. When there are several versions of the same difference section in the files being compared, DIFF writes its normal output. This shows that there is a conflict between the original file and the versions that contain modifications. In this case, it will typically take human intervention to resolve the conflict.
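
The two rules above can be sketched in Python as follows. The sketch reads "several versions" as three or more distinct versions of a difference section, counting the original. The function name and the status labels are hypothetical.

```python
def classify_section(original, modified_versions):
    """Classify one difference section under +Original.

    original: the section's lines in the +Original file (a list).
    modified_versions: the same section in each of the other files.
    Returns (status, lines), where status is 'match', 'modified', or
    'conflict'; lines is None for a conflict.
    """
    versions = []                       # distinct versions, original first
    for v in [original] + list(modified_versions):
        if v not in versions:
            versions.append(v)
    if len(versions) == 1:
        return ("match", original)
    if len(versions) == 2:
        return ("modified", versions[1])  # the version NOT in the original
    return ("conflict", None)
```

If only one of the other files changes a section, the result is "modified" and carries the changed lines. If two files change the same section in different ways, the result is "conflict".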

The presence of +Original affects all four output formats:

-Matching -Differences
The summary of differences does NOT include differences where there are only two versions of a given line or set of lines. The summary only lists places where there are conflicts between different modifications of the original file.
+Matching -Differences
DIFF outputs all lines where the files match. Also, when there are only two versions of a given line or set of lines (the original version and the modified version), the output gives the modified version. In other words, DIFF considers this situation a "match".
+Differences -Matching
When there are only two versions of a given line or set of lines, the situation is NOT considered to be a conflict. Therefore, DIFF does not produce output showing this situation. DIFF only displays places where different modified files conflict with each other.
+Differences +Matching
This combines the output of the previous two possibilities.

Determining Conflicts Between Two Modified Files

One useful feature of DIFF is the ability to compare two sets of modifications to the same original file. For example, suppose two programmers begin with "file1" and make two different sets of changes: "file2" and "file3". The following command determines places where the two modified files conflict with each other:

diff file1,+orig file2 file3
The +Original option labels "file1" as the original. On lines where two of the three files match but the third does not, DIFF does not consider this a conflict; the difference is not reported. On lines where the three files all differ, DIFF reports the conflict.

You can also use DIFF to produce a single file that merges the changes found in the two modified files. The command line for doing this is

diff +match +diff file1,+orig file2 file3 >merged
The "merged" file contains the merged output. The +Matching and +Differences options tell DIFF to write both matching and different lines to the output. If there are conflicts between the changes in "file2" and "file3", the output file will contain blocks marked with primary separators, showing the places and ways in which the files differ.

Merge files can be created with a maximum of six input files. One and only one of the input files must be marked with +Original.

Notes:

The -Nulllines option has a number of side effects in producing output. For example, consider the situation where different files begin with different numbers of null lines, followed by a number of lines that are the same in all files. The files are considered to match each other because null lines are being ignored in the comparison. However, if you specify +Matching, DIFF must decide how many null lines to produce in the output (since the files do not agree on a specific number of null lines). In this case, DIFF follows a heuristic algorithm that may be subject to change.

When both +LineNumbers and -Nulllines are in effect, a line that only contains a line number is considered a null line (since line numbers are being ignored).

Normally, if the lines in a file contain line numbers, DIFF uses the given line numbers in its output rather than assigning line numbers of its own. If DIFF is ignoring line numbers because of +LineNumbers, and if -Nulllines is also in effect, DIFF cannot determine the line number associated with a null line. If DIFF must display information about such a null line in its output, it uses the line number found on the first non-null line after the given null line.

Copyright © 2000, Thinkage Ltd.