DIFF file1[,opts] file2[,opts] file3[,opts] ... [option]*
DIFF file1[,opts] cat [option]*
(+|-)Alter (-)
(+|-)Blanks (+)
(+|-)Differences
(+|-)Escaped (-)
(+|-)Headings (-)
(+|-)LineNumbers (-)
(+|-)Matching (-)
(+|-)Nulllines (+)
(+|-)Respectcase (+)
(+|-)Verbose (+)
Bytes=N (256)
Columns=start-end;...
IGnore="chars"
Resynchronize=nlines (2)
SeParator=string
Stack=number (1000)
(+|-)lineNumbers (-)
+Original
Columns=start-end;...
Name=shortname
diff /oldsource /newsource -bl
diff res=3 +verbose tempfile /permfile >output
diff merge=file1 file2 file3 file4
diff file1,+orig,+ln file2,+ln file3,name=XXX
diff file1 file2 file3 file4 col=1-72
    diff user/cat1/abc user/cat2
compares the two files
    user/cat1/abc
    user/cat2/abc
    col=1-72 col=73-80
is equivalent to
    col=1-72;73-80
For more information, see "Column Sections" below.
To obtain both matching and differing lines, specify
    +Matching +Differences
because +Matching automatically implies -Differences unless you explicitly specify +Differences.
If you specify SeParator= twice, as in
    sep="**********" sep="%%%%%%%%%%"
the first string is used as the primary separator and the second as the secondary separator. For more on the use of separators, see the "Output Format" section below.
Local file options can be listed after any file name given on the command line. Options are separated with commas, as in
    file1,+orig,name="ABC",col=1-16;73-80
The options are:
The default behavior of DIFF is to run through input files, comparing lines and displaying lines where files differ. This default behavior can be changed by various options, as described in the rest of this explain file.
If all files compare equal, DIFF normally displays a message to this effect on the terminal. DIFF also reports the result in the program switch word. It turns off bit 26 if the files compare equal and turns bit 26 on if they differ. In either case, bits 18-25 are turned off. DIFF turns on bits 18-25 to indicate an error.
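A program that checks DIFF's result could apply the rule above as in the following Python sketch. The sketch is only an illustration. `bit_mask` and `diff_status` are hypothetical names. The sketch also assumes that bits are numbered 0-35 starting at the leftmost (most significant) bit of the 36-bit switch word.

```python
# Hypothetical helper, not part of DIFF: interpret the result bits that DIFF
# leaves in a 36-bit program switch word.
# Assumption: bits are numbered 0-35 starting from the leftmost
# (most significant) bit.

WORD_BITS = 36

def bit_mask(n: int) -> int:
    """Return a mask selecting bit n, counted from the left of a 36-bit word."""
    return 1 << (WORD_BITS - 1 - n)

# Bits 18-25 signal an error; the masks are distinct, so summing them ORs them.
ERROR_BITS = sum(bit_mask(b) for b in range(18, 26))

def diff_status(switch_word: int) -> str:
    """Summarize DIFF's outcome from the switch word."""
    if switch_word & ERROR_BITS:
        return "error"
    if switch_word & bit_mask(26):  # bit 26 on: the files differ
        return "files differ"
    return "files compare equal"
```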
DIFF is intended to be used on files that are different versions of the same basic text. For example, you might use DIFF to compare different versions of a program's source code, or different versions of output data produced by a program.
Roughly speaking, DIFF assumes that each input file will have some sections that are the same as in all the other files. These shared sections allow DIFF to resynchronize its comparisons. DIFF looks for blocks of identical text to break the input files into corresponding sections, and then compares the corresponding sections for differences.
For example, suppose that File1 and File2 are different versions of the same program source code. Blocks of lines that are the same in both files serve as "marker blocks". Between the marker blocks are lines that differ between the two files. DIFF identifies the marker blocks first in order to "synch up" the structures of the files being compared. In this way, DIFF compares corresponding sections of the two files.
Ideally, DIFF would search out all possible marker blocks in the files being compared, then sift out the differences that occur between those markers. In practice, however, this would require too much memory and processor time, especially with large files. Therefore, DIFF starts at the beginning of each file and looks for marker blocks that are "reasonably large". The sections of text between these large marker blocks are called "difference sections", and the large marker blocks are called "major" marker blocks.
Once DIFF has identified a difference section (bounded by major marker blocks), DIFF scans the section for progressively smaller marker blocks, until DIFF has divided the section into lines that are the same in all files and lines that are different.
The Bytes=N option controls the size of the major marker blocks used to divide the files into difference sections. Each major marker block will be a number of whole lines containing at least N bytes. A major marker block may be considerably larger than N bytes if there are a large number of lines that are the same in all files.
If you set Bytes= too small, you may get false resynchronizations. For example, suppose that you set Bytes=10. DIFF then needs only one or more identical lines totaling 10 bytes in all input files to decide that it has reached the end of one set of differences. As a result, the file comparisons may get out of synch, and DIFF ends up comparing sections that don't actually correspond to each other. This produces a lot of excess output, because the sections being compared have nothing in common. If DIFF seems to be "getting lost" in its comparisons, try increasing Bytes= to make the marker blocks bigger.
The Resynchronize= option affects the output that DIFF produces when performing line-by-line comparisons within a difference section. As an example, suppose Resynchronize=3 and suppose that in a particular difference section, DIFF finds two lines that are different in the files being compared. If the differences are within three lines of each other (Resynch=3), DIFF creates a single output block containing the different lines and the lines between them; if the differences are farther apart than three lines, DIFF creates two separate output blocks, one for each difference.
In other words, the Resynchronize= option has the effect of combining small sets of differences into one large set, if the original small sets are close enough together. DIFF shows you intervening lines between differences, even if those lines are the same in all files. In this way, you can get a better view of the context surrounding differences.
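The following Python sketch shows this grouping rule. It assumes that the line numbers of the differing lines in one difference section are already known and sorted. The function name `group_differences` is hypothetical. The sketch reads "within nlines of each other" as "line numbers at most nlines apart", which is an interpretation and may not be DIFF's exact rule.

```python
def group_differences(diff_lines, resync=2):
    """Group sorted line numbers of differing lines into output blocks.

    Assumption: two differences share a block when their line numbers are
    at most `resync` apart; the block then also covers the identical lines
    between them.  Returns (first, last) line-number pairs.
    """
    blocks = []
    for n in diff_lines:
        if blocks and n - blocks[-1][1] <= resync:
            blocks[-1][1] = n          # close enough: extend the current block
        else:
            blocks.append([n, n])      # too far away: start a new block
    return [tuple(b) for b in blocks]

print(group_differences([10, 12, 20], resync=3))   # [(10, 12), (20, 20)]
```

Raising `resync` merges more differences into fewer, larger blocks. Each of those blocks also shows more unchanged context lines.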
The Stack= option can be thought of as the maximum number of differing lines in a difference section (plus the major marker block that marks the end of the difference section). The default is 1000 lines. This means that if DIFF reads through 1000 lines from each input file and still hasn't found a marker block that is at least N bytes long (Bytes=N), DIFF chooses the largest marker block it has found so far and uses it to mark the end of the difference section.
This happens when large input files have many differences spread throughout them, with no reasonably large areas where all the files match each other. In this case, DIFF is forced to use smaller marker blocks. This increases the chance of false resynchronization (DIFF "getting lost" while trying to divide the files into corresponding sections). If you find DIFF getting lost, try increasing Stack= as well as Bytes=.
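The interaction of Bytes= and Stack= is illustrated below with a simplified two-file sketch in Python. It is not DIFF's implementation. DIFF handles more than two files, and several choices here are assumptions:

* the nearest-first search order
* counting bytes as characters, with newlines not counted
* the hypothetical names `run_length` and `find_major_marker`

```python
def run_length(a, b, i, j):
    """Count consecutive identical lines starting at a[i] and b[j]."""
    k = 0
    while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
        k += 1
    return k

def find_major_marker(a, b, min_bytes=256, stack=1000):
    """Look for the marker block that ends a difference section.

    Returns (i, j, k), meaning a[i:i+k] == b[j:j+k], or None if the two
    files share no line within the search window.  The nearest run holding
    at least min_bytes characters wins (Bytes=N).  If no such run starts
    within the first `stack` lines of each file (Stack=), the largest run
    seen so far is used instead.
    """
    a_lim, b_lim = min(len(a), stack), min(len(b), stack)
    best = None                       # (size, i, j, k) of the largest run so far
    for total in range(a_lim + b_lim - 1):    # widen the search step by step
        for i in range(max(0, total - b_lim + 1), min(total, a_lim - 1) + 1):
            j = total - i
            k = run_length(a, b, i, j)
            if k == 0:
                continue
            size = sum(len(line) for line in a[i:i + k])
            if size >= min_bytes:
                return i, j, k
            if best is None or size > best[0]:
                best = (size, i, j, k)
    return None if best is None else best[1:]

a = ["a1", "same", "a2", "long common text here"]
b = ["b1", "same", "b2", "long common text here"]
print(find_major_marker(a, b, min_bytes=10))           # (3, 3, 1)
print(find_major_marker(a, b, min_bytes=10, stack=3))  # (1, 1, 1)
```

With `stack=3`, the sketch gives up on finding 10 bytes of common text and settles for the short "same" line. That is exactly the kind of small marker block that can cause false resynchronization.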
The Columns= option lets you restrict comparisons to selected parts of an input line. For example,
    Columns=1-72
compares only the first 72 characters of each line. Anything past column 72 is ignored. As another example,
    Columns=1-16;73-80
compares only columns 1-16 and columns 73-80.
The Columns= option can be specified as a global option, or as a local file option. The global option specifies a default for all files being compared. Local file options override the default. For example,
    diff col=1-72 file1 file2 file3,col=9-80
compares columns 1-72 in "file1" and "file2" but columns 9-80 in "file3".
There must be the same number of column sections in each file. For example,
    diff file1,col=1-72 file2,col=1-16;17-72     **ERROR**
is invalid because there is only one column section given for "file1" but two for "file2".
If there are no IGnore= characters (or -Blanks), corresponding columns must be the same length. For example,
    diff col=1-72 file1 file2 file3,col=9-100    **ERROR**
is invalid because the column section in "file3" is longer than the (default) column section in "file1" and "file2". However, if there are IGnore= characters, sections do not have to be the same length.
The Columns= option can be given using various syntaxes. The following examples are all equivalent:
    col=1-16;17-72;73-80
    col=1:16;17:72;73:80
    col=1-16/17-72/73-80
    col=1:16/17:72/73:80
    col=1-16,17-72,73-80
    col=1:16,17:72,73:80
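The syntax variants and the two consistency rules given above can be sketched in Python as follows. `parse_columns` and `check_compatible` are hypothetical names. The sketch accepts any mix of the separators shown, which may be more lenient than DIFF itself. It checks section lengths only when no IGnore= characters (or -Blanks) are in effect.

```python
import re

def parse_columns(spec):
    """Parse a Columns= value such as "1-16;17-72;73-80" into (start, end) pairs.

    Ranges may use '-' or ':'; sections may be separated by ';', '/' or ','.
    """
    sections = []
    for part in re.split(r"[;/,]", spec):
        start, end = re.split(r"[-:]", part)
        sections.append((int(start), int(end)))
    return sections

def check_compatible(per_file, ignoring=False):
    """Apply the consistency rules to each file's parsed column sections."""
    if len({len(s) for s in per_file}) != 1:
        raise ValueError("every file needs the same number of column sections")
    if not ignoring:
        for group in zip(*per_file):  # corresponding sections across the files
            if len({end - start for start, end in group}) != 1:
                raise ValueError("corresponding sections differ in length")

# col=1-72 for file1/file2 and col=9-80 for file3 are both 72 columns wide.
check_compatible([parse_columns("1-72"), parse_columns("1-72"), parse_columns("9-80")])
```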
Column sections may overlap with each other, as in
    col=1-20;16-40
However, this sort of arrangement is useful only in very limited cases.
Column sections do not have to be defined in ascending order. For example,
    file1,col=1-20;21-40 file2,col=21-40;1-20
compares the first 20 characters of "file1" with the second 20 characters of "file2", and vice versa.
To compare a line from one file with a line from another, DIFF begins by extracting column sections from each line (as discussed in "Column Sections" above). Column sections are compared individually.
After a line has been broken into column sections, each section is reduced according to the IGnore= options. (Note that -Blanks is treated as an IGnore= option.) Every string of one or more IGnore= characters is replaced by a single special character that shows where characters have been ignored. Trailing strings of ignored characters are discarded.
As an example of how this works, suppose that you are using IGnore=" " to ignore blank characters. Then all of the following are equivalent:
    "a b"
    "a  b"
    "a    b   "
All of these are reduced to "aXb", where X stands for the special character that shows where characters were ignored. Notice that the final result is NOT the same as "ab". IGnore= does not discard characters in the middle of strings. Instead, it reduces each run of them to a single special character.
Note that leading characters are not discarded. For example,
    " a b"
    " a  b "
both reduce to "XaXb" (where X stands for the special character). This is not the same as "aXb".
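The reduction rule can be sketched in Python. The sketch uses a NUL character as a stand-in for DIFF's special marker. That choice is an assumption, because the actual internal character is not documented here. `reduce_section` is a hypothetical name.

```python
SPECIAL = "\x00"  # assumed stand-in for DIFF's "characters were ignored here" marker

def reduce_section(text, ignore=" "):
    """Collapse each run of IGnore= characters to one SPECIAL marker.

    Leading runs are kept as a marker; a run at the very end is dropped.
    """
    out = []
    in_run = False
    for ch in text:
        if ch in ignore:
            in_run = True              # inside a run of ignored characters
            continue
        if in_run:
            out.append(SPECIAL)        # the run ended before a kept character
            in_run = False
        out.append(ch)
    return "".join(out)                # a run still open here is trailing: discard it

print(repr(reduce_section("a    b   ")))   # 'a\x00b'
```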
In comparing column sections, empty columns compare equal to one another. For example, suppose there is a column section running from columns 73-80; if DIFF is comparing two lines that are both shorter than 73 characters, the 73-80 column section is empty for both lines. Therefore, the lines are considered equal in that column section.
The same holds for partial column sections. For example, if the column section runs from 73-80, but both input lines end at column 74, DIFF only compares the characters in columns 73 and 74. If these characters are equal, the input lines are considered equal in that column section.
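Because the comparison works on column sections, empty and partial sections fall out naturally from string slicing. The following Python sketch compares two lines section by section, using each file's own 1-based, inclusive column ranges (including ranges given out of order). The name `lines_match` is hypothetical. The sketch leaves out IGnore= processing and case handling.

```python
def lines_match(line_a, sections_a, line_b, sections_b):
    """Compare two lines section by section (no IGnore= processing).

    Each file may use its own column sections, but the counts must agree.
    Slicing past the end of a line yields a short or empty string, so empty
    sections compare equal and partial sections compare only the characters
    that are present.
    """
    if len(sections_a) != len(sections_b):
        raise ValueError("files must have the same number of column sections")
    for (sa, ea), (sb, eb) in zip(sections_a, sections_b):
        if line_a[sa - 1:ea] != line_b[sb - 1:eb]:
            return False
    return True

# Both lines end at column 74, so only columns 73-74 of the 73-80 section count.
print(lines_match("a" * 72 + "12", [(73, 80)], "b" * 72 + "12", [(73, 80)]))  # True
```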
There are four possible output formats, depending on the settings of the Matching and Differences options.
If you specify both -Matching and -Differences, you just get a summary of the number of differences between the files involved.
If you specify +Matching and -Differences, you get the lines that are found in all the files being compared. Groups of matching lines are separated into blocks: the first group of lines that are common to all files, the next group of lines that are common to all files, and so on. Blocks are separated by the primary separator. By default, the separator is
    -------------------------------------------------
but this can be changed with the SeParator= option. The line containing the primary separator also contains file names and line numbers that show where the block of matching lines appears in each file, as in
    ---------- file1,10 file2,14 file3,5
If you specify +Differences and -Matching, you get the lines where the files differ from each other. The output is divided into blocks. Each block shows one or more lines where the input files differed. Blocks are divided into subsections, with each subsection showing the contents of one or more files. Here is a typical example of a block:
    ---------------------- file1,line# file3,line#
    some lines in file1
    "a same line"
    - - - - - - - - - - - file2,line# file4,line#
    the equivalent (changed) lines in file2
    "a same line, as above"
    .....
In summary, the block begins with a primary separator line and consists of subsections which each begin with a secondary separator line. Each subsection within the block gives a set of lines that appear in one or more files. These subsections show where the groups of files differ.
If you specify +Matching and +Differences, you get a complete listing of where the files match and where they differ. Blocks of matching lines are marked at the end with primary separators, without file names. Blocks of differing lines are marked in the same way as in the previous output format.
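The choice among the four formats, including the rule that +Matching implies -Differences unless +Differences is given, amounts to a small decision table. In this Python sketch, `output_format` is a hypothetical name. Passing `None` for `differences` means that Differences was not given explicitly.

```python
def output_format(matching=False, differences=None):
    """Pick one of the four output formats from the Matching/Differences flags.

    differences=None means (+|-)Differences was not given explicitly: it is
    then on, unless +Matching was given, which implies -Differences.
    """
    if differences is None:
        differences = not matching
    if matching and differences:
        return "full listing"
    if matching:
        return "matching lines only"
    if differences:
        return "differing lines only"
    return "summary only"

print(output_format(matching=True))   # matching lines only
```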
If you specify the +Escaped option on the command line, lines are displayed in the following format:
    "colsect1" "colsect2" ... : original input line
The line begins with the individual column section strings that were used in the comparison. Within these strings, special characters are shown as C escape sequences (e.g. \t, \033, and so on). In other words, the output shows exactly what DIFF used in the comparison. The line ends with the full original input line. This format is useful when you want to know exactly what DIFF was looking at.
Note: In this format, empty column sections are shown as a single dash, not enclosed in quotes:
    "abc" "def" - : abcdef
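The +Escaped line layout can be sketched in Python. The escape set is an assumption based on C conventions:

* named escapes such as \t
* octal escapes such as \033 for other control characters
* escaped quotes and backslashes

DIFF's exact choices are not specified here. The function names `c_escape` and `escaped_line` are hypothetical.

```python
NAMED_ESCAPES = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\b": "\\b",
                 "\f": "\\f", "\\": "\\\\", '"': '\\"'}

def c_escape(s):
    """Show special characters as C escape sequences (assumed escape set)."""
    out = []
    for ch in s:
        if ch in NAMED_ESCAPES:
            out.append(NAMED_ESCAPES[ch])
        elif ord(ch) < 32 or ord(ch) == 127:
            out.append("\\%03o" % ord(ch))   # octal, e.g. ESC becomes \033
        else:
            out.append(ch)
    return "".join(out)

def escaped_line(sections, original):
    """Quoted column sections (a bare '-' for an empty one), ' : ', original line."""
    shown = ['"%s"' % c_escape(s) if s else "-" for s in sections]
    return " ".join(shown) + " : " + original

print(escaped_line(["abc", "def", ""], "abcdef"))   # "abc" "def" - : abcdef
```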
The purpose of +Original is to compare different versions of the same original file. A typical application would be
    diff file1,+orig file2 file3
In this example, "file1" is considered the original file, while "file2" and "file3" are both modified versions of "file1". Presumably, you want DIFF to detect conflicts between the changes made in "file2" and the changes made in "file3".
In precise terms, +Original has the following effect:
The presence of +Original affects all four output formats:
One useful feature of DIFF is the ability to compare two sets of modifications to the same original file. For example, suppose two programmers begin with "file1" and make two different sets of changes: "file2" and "file3". The following command determines places where the two modified files conflict with each other:
    diff file1,+orig file2 file3
The +Original option labels "file1" as the original. On lines where two of the three files match but the third does not, DIFF does not consider this a conflict; the difference is not reported. On lines where the three files all differ, DIFF reports the conflict.
You can also use DIFF to produce a single file that merges the changes found in the two modified files. The command line for doing this is
    diff +match +diff file1,+orig file2 file3 >merged
The "merged" file contains the merged output. The +Matching and +Differences options tell DIFF to write both matching and differing lines to the output. If there are conflicts between the changes in "file2" and "file3", the output file will contain blocks marked with primary separators, showing the places and ways in which the files differ.
Merge files can be created with a maximum of six input files. One and only one of the input files must be marked with +Original.
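The merging idea can be sketched in Python for the simple case where the input files have already been aligned line for line. DIFF does the alignment itself, using marker blocks. `merge` is a hypothetical name. At each position, the sketch does one of three things:

* keeps the original text when no file changed it
* takes the change when every modified file that changed the line made the same change
* reports a conflict otherwise

It also enforces the two rules above: at most six input files, and exactly one marked +Original.

```python
def merge(files, is_original):
    """Merge already-aligned files (lists of lines of equal length).

    is_original holds one flag per file, mirroring the +Original option.
    Returns a list whose items are either the merged line or a
    ("CONFLICT", [variants]) tuple where the modified files disagree.
    """
    if len(files) > 6:
        raise ValueError("a merge can use at most six input files")
    if sum(is_original) != 1:
        raise ValueError("exactly one input file must be marked +Original")
    original = files[is_original.index(True)]
    modified = [f for f, flag in zip(files, is_original) if not flag]
    merged = []
    for pos, base in enumerate(original):
        # Distinct changes that the modified files made at this position.
        changes = {f[pos] for f in modified if f[pos] != base}
        if not changes:
            merged.append(base)               # nobody changed this line
        elif len(changes) == 1:
            merged.append(changes.pop())      # one change, possibly made by several files
        else:
            merged.append(("CONFLICT", sorted(changes)))
    return merged

print(merge([["a", "b", "c"], ["a", "B", "c"], ["a", "b", "C"]],
            [True, False, False]))   # ['a', 'B', 'C']
```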
The -Nulllines option has a number of side effects in producing output. For example, consider the situation where different files begin with different numbers of null lines, followed by a number of lines that are the same in all files. The files are considered to match each other because null lines are being ignored in the comparison. However, if you specify +Matching, DIFF must decide how many null lines to produce in the output (since the files do not agree on a specific number of null lines). In this case, DIFF follows a heuristic algorithm that may be subject to change.
When both +LineNumbers and -Nulllines are in effect, a line that only contains a line number is considered a null line (since line numbers are being ignored).
Normally, if the lines in a file contain line numbers, DIFF uses the given line numbers in its output rather than assigning line numbers of its own. If DIFF is ignoring line numbers because of +LineNumbers, and if -Nulllines is also in effect, DIFF cannot determine the line number associated with a null line. If DIFF must display information about such a null line in its output, it uses the line number found on the first non-null line after the given null line.
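The following Python sketch applies the line-number rule for null lines. It assumes that each line has already been split into its text and the line number found on it, which is a hypothetical pre-parsed form. A null line is one whose text is empty once the line number is set aside. `effective_line_numbers` is a hypothetical name. The text above does not say what happens to null lines with no non-null line after them, so the sketch returns None for those.

```python
def effective_line_numbers(lines):
    """Return the number to display for each line, following the rule above.

    lines: list of (text, number) pairs, where number is the line number
    found on the line and text is the rest of it.  A null line (empty text)
    borrows the number of the next non-null line; trailing null lines with
    no non-null line after them get None in this sketch.
    """
    result = [None] * len(lines)
    next_number = None
    for idx in range(len(lines) - 1, -1, -1):   # walk backwards through the file
        text, number = lines[idx]
        if text:                                 # non-null line: use its own number
            next_number = number
            result[idx] = number
        else:                                    # null line: borrow the next number
            result[idx] = next_number
    return result

print(effective_line_numbers([("a", 100), ("", 110), ("", 120), ("b", 130)]))
# [100, 130, 130, 130]
```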
Copyright © 2000, Thinkage Ltd.