Thinkage Ltd.
85 McIntyre Drive
Kitchener, Ontario
Canada N2R 1H6
Copyright © 2008 by Thinkage Ltd.
1. General Principles
   1.1 The Purpose of LD
      1.1.1 Reading Object Code
      1.1.2 Linking and Editing Object Code
      1.1.3 Writing Object or Executable Code
   1.2 Relocation
   1.3 Standard Record Format
   1.4 Directive Types
   1.5 Dvalues
   1.6 Other Terminology
   1.7 Notation
   1.8 Continuation Directives
   1.9 Extended Records
   1.10 The End of a Module
2. Segments
   2.1 Predefined Segments
   2.2 Reference Numbers (SEGREFs)
   2.3 Creating New Segments
   2.4 Naming Locations in Segments
   2.5 Alignment
   2.6 Segment Information
   2.7 Symbol References
   2.8 System-Dependent Characteristics
   2.9 Other Segment Manipulation Directives
3. Control Directives
   3.1 LC_LDVERSION
   3.2 LC_FILENAME
   3.3 LC_REVISION
   3.4 LC_CPR
   3.5 LC_TITLE
   3.6 LC_MODULE
   3.7 LC_TARGET_INFO
   3.8 LC_LIB_HEADER
   3.9 LC_PATCH
   3.10 LC_PATCH_RECORD
4. Data
   4.1 Data Definitions
      4.1.1 The Relocation Origin
      4.1.2 Relocation Codes
      4.1.3 In Practice
   4.2 Bit Data
   4.3 Literals
      4.3.1 Creating Literals
      4.3.2 Creating a Literal Pool
5. Debugger Directives
   5.1 Defining Variable Lists
   5.2 Specifying Type Information
   5.3 Pointer Types
      5.3.1 Type Qualifiers
      5.3.2 Function Types
      5.3.3 Array Types
      5.3.4 Structure Types
      5.3.5 Bit Fields
      5.3.6 Union and Enumerated Types
      5.3.7 Typedef Declarations
   5.4 Describing Symbols
   5.5 Names with Associated Scopes
   5.6 Setting Scope Attributes
   5.7 Line Numbers
   5.8 Debug Directive Summary
6. Final Object Format
   6.1 The Marker Directive
7. Object Libraries
   7.1 Library Length
   7.2 Library Index Directives
      7.2.1 Number of Modules
      7.2.2 Module Location
      7.2.3 Specifying Names
      7.2.4 Index Entries
      7.2.5 Module Information
8. Introduction to Run-Units
   8.1 Glossary
9. Parts of an RU
   9.1 The Body of an RU
      9.1.1 Information in the Body
   9.2 The External Profile
   9.3 The Internal Profile
10. Program Invocation
11. Sharing
   11.1 Sharing Subpartitions
   11.2 Sharing Entries
   11.3 Shared Library Units
      11.3.1 Library Versions
      11.3.2 Using an SLU
      11.3.3 Building an SLU
      11.3.4 Naming Routines
12. RU Libraries
13. RU Format
   13.1 LD Directives
   13.2 The Instantiation Process
14. Dynamic Linking and Demand Segmentation
   14.1 Dynamic Linking
15. LD Directives for Run-Units
   15.1 RU_REFER
   15.2 RU_PARTITION
      15.2.1 Possible Partition Flags
   15.3 RU_SUBPARTITION
   15.4 RU_EXPORT
   15.5 RU_ENTRY
   15.6 RU_PRIMARY_ENTRY
   15.7 RU_DATA
   15.8 RU_RELOC
   15.9 RU_LOCATOR
Appendix A: Summary of Directives
   A.1 RU Directives
Appendix B: Deprecated Constructs
   B.1 LC_SECONDARY
Appendix C: Future Directions
   C.1 LD_DELETE
   C.2 Needed Enhancements
Appendix D: Target Machine Dependencies
   D.1 Bull HN DPS-8 Family
      D.1.1 Relocation Codes
      D.1.2 DPS-8 Symbol Options
Appendix E: LD Utility Routines
   E.1 General Concepts
   E.2 File Manipulation
      E.2.1 Open an LD Output File
      E.2.2 Open an LD Input File
      E.2.3 Conditionally Open an LD Input File
      E.2.4 Open an LD Library for Updating
      E.2.5 Close LD Output File
      E.2.6 Close LD Input File
      E.2.7 Close an LD Library
      E.2.8 Change Library to Write Mode
      E.2.9 Change Library to Read Mode
      E.2.10 Change Output File to Input File
      E.2.11 Reposition Input File
      E.2.12 Change Position in LD Output File
      E.2.13 Change Position in Input File
      E.2.14 Obtain Current Position in Input File
      E.2.15 Obtain Current Position in Output File
   E.3 Building Output Directives
      E.3.1 Start Building a Directive for Output
      E.3.2 Close Off Output Directive
      E.3.3 Write a TWORD
      E.3.4 Write a Byte
      E.3.5 Write Dvalue
      E.3.6 Write Block of Data
      E.3.7 Write String
      E.3.8 Write a ULONG
      E.3.9 Write a Time Value
   E.4 Generating Data
      E.4.1 Write Out Data for LD_DATA
      E.4.2 Write Relocation Information
      E.4.3 Flush Data and Relocation Information
      E.4.4 Flush and Close Literal
   E.5 Reading from the Input File
      E.5.1 Read Directive from Input File
      E.5.2 Read a Byte
      E.5.3 Read a ULONG from LD Input File
      E.5.4 Read a Block of Data
      E.5.5 Read a String from LD Input File
      E.5.6 Read Static String from LD Input File
      E.5.7 Read VString from LD Input File
      E.5.8 Read a Dvalue from LD Input File
      E.5.9 Read a TWORD from LD Input File
      E.5.10 Read a Time from LD Input File
      E.5.11 Obtain Length of Remaining Record
      E.5.12 Test for End of Directive
   E.6 Miscellaneous Routines
      E.6.1 Copy Data from Input File to Output File
Appendix F: Output Formats
   F.1 LD Format (Bull HN DPS-8)
   F.2 RU Format (Bull HN DPS-8, GCOS8 NS Mode)
   F.3 OM Format (Bull HN DPS-8, GCOS8 NS Mode)
Appendix G: Outstanding Issues
   G.1 Sharing Subpartitions and Entries
   G.2 Debugging Breakpoints
   G.3 Dynamic Linking
LD object code is a stream of bytes. Difficulties immediately arise if the size of a byte on the host machine is not the same as the size on the target machine. Therefore each byte in the object file uses only as many bits as fit in the smaller of the two machines' byte sizes.
For example, suppose the host machine has 9-bit bytes while the target machine has 8-bit bytes. The LD object code on the host machine will only use the (low order) eight bits of every byte; the uppermost bit is always ignored. This avoids complications if the object file is shipped to the target machine.
From this point onward, the byte size of an object file should be thought of as the number of significant bits in a byte.
Large programs are easier to work with if they are split over several source files. Such files may be compiled or assembled separately, producing an object file for each source file. When all source files have been compiled, the resulting object files may be linked together to form the complete program. Later in the chapter, we'll discuss this linking process in more detail.
An object library is a single file containing a number of object files, stored in a convenient and compact way. Libraries reduce the amount of disk space needed to store a program's object code and simplify the task of linking the program (since you only have to specify a single library file rather than a lot of separate object files).
The LD program has three phases:

1. Reading object code.
2. Linking and editing the object code.
3. Writing object or executable code.
We'll examine each phase in detail.
LD can read several different kinds of object code. It can read a single object file stored in the LD object format that is described in this document; and it can read an object library containing LD object code (usually called an LD library). The LD object format is system-independent.
LD can also read system-dependent object formats. The GCOS8 version of LD can read so-called OM files, OM libraries, B* files, and B* libraries. The MARK III version of LD can read MARK III object code and libraries.
As LD reads object code, it converts the code to the LD object format if the code is not already in that form. LD can read many files and libraries as input. These are specified on the LD command line as in
ld file1 file2 lib1 lib2 ...
LD reads the entire contents of such files and puts them together into a single large unit that is passed on to the next phase of the LD program.
Object code contains several types of information. Typically, it will contain executable code and descriptions of data. It may also contain debugging information. The linking and editing phase of LD collects object code from object files and libraries and merges it into a coherent whole.
For example, compilers tend to create debugging information "on the fly", as the source code is compiled. As a result, debugging information is scattered throughout the whole object file. The linking and editing phase of LD can gather the scattered debugging information and store it all together. This makes the information easier to find later on.
The linking and editing phase also does as much relocation as it can. Relocation is discussed in more detail later in this chapter.
The result of the linking and editing phase is LD object code in a more coherent format, with all of the input object code merged into a single unit. This unit will be written out in the third phase.
The third phase of LD writes out object code or an executable program. LD can write a single object file stored in the LD object format that is described in this document; and it can write an LD object library.
LD can also write system-dependent object and executable formats. The GCOS8 version of LD can write bound OM files, OM libraries, B* files, B* libraries, Q* files, and GCOS8 run-units. The MARK III version of LD can write MARK III object code and libraries, and MARK III run-units.
LD object code consists of symbol definitions (SYMDEFs) and symbol references (SYMREFs). For example, if a source file defines a subprogram named X, the object code that results from compiling that source file will contain a SYMDEF for X. If a source file contains a call to the subprogram X, the object code that results from compiling that source file will contain a SYMREF to X.
A typical object file has many SYMREFs to symbols not found in the file: a SYMREF for every external subprogram and variable that the code uses. Such SYMREFs are said to be unresolved, because they refer to symbols whose location is not currently known.
When a program is loaded for execution, the program loader chooses where each symbol will be placed in the computer's memory. The loader has a great deal of freedom in doing this: for example, the independent subprograms of a program may be arranged in any order. As a result of this freedom, the memory location of a symbol can only be determined when the program is loaded.
Many machine instructions in the executable code make use of memory locations. For example, the process of calling a subprogram must contain a machine instruction which jumps to the beginning of that subprogram. Since the location of the subprogram can't be determined until the program is loaded, the correct form of the jump machine instruction cannot be determined either. The object code can only contain a partial form of the jump machine instruction, with the understanding that the location of the subprogram will be filled into the partial instruction when the program is loaded.
Object code therefore contains many partial machine instructions. Object code must also contain any information that the program loader needs to allow the machine instruction to be completed. This information almost always involves a SYMREF.
For example, consider the "jump to subprogram" machine instruction that we have been discussing. The loader needs the partial jump instruction and a SYMREF naming the subprogram to which the jump instruction jumps. When the program loader loads subprogram X into memory, the loader can then look for SYMREFs to X, and fill in the true address of X into partial machine instructions that need it.
This process of completing partial machine instructions with memory locations is called relocation. The relocation process typically uses the following pieces of information:

- the location of the partial machine instruction (or data object) to be completed;
- a SYMREF naming the symbol whose address is required;
- a relocation code.
The relocation code provides more information about the relocation process. For example, the code indicates the format of the address that should be supplied (e.g. byte address or word address) and where the address should be placed in the partial machine instruction (e.g. top half of a machine word or bottom half). The relocation codes that may be specified correspond to the ways in which addresses may be used in machine instructions.
Data objects may also require relocation. For example, if a variable is initialized to hold a pointer to another object, the correct pointer value cannot be determined until the program is loaded. The object code will contain all the information required to calculate the correct pointer value; in place of the true pointer, the object code will either have some sort of partial pointer (if this makes sense in the machine architecture) or a simple placeholder.
Relocation codes are machine-dependent. Appendix D specifies the relocation codes recognized on each machine where LD is implemented.
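The general shape of the process can be sketched in Python. The relocation codes used below are purely hypothetical, invented for illustration; real codes are machine-dependent and are listed in Appendix D.

```python
# A purely illustrative sketch of relocation.  The code numbers and the
# 32-bit field layout below are hypothetical, not part of any real target.
def relocate(partial_word, reloc_code, symbol_address):
    """Complete a partial 32-bit machine word with a symbol's address."""
    if reloc_code == 0:       # hypothetical: full-word address
        return symbol_address & 0xFFFFFFFF
    if reloc_code == 1:       # hypothetical: address in bottom half of word
        return (partial_word & 0xFFFF0000) | (symbol_address & 0xFFFF)
    if reloc_code == 2:       # hypothetical: address in top half of word
        return (partial_word & 0x0000FFFF) | ((symbol_address & 0xFFFF) << 16)
    raise ValueError("unrecognized relocation code")
```

The sketch shows only the essential idea: the relocation code selects where and in what format the loader stores the resolved address within the partial instruction.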
The bytes in an object file are organized into variable length records. Each record represents a loader directive.
The standard record format is
+-----+-----+----------------------------+-----+
| DIR | LEN |            DATA            | CHK |
+-----+-----+----------------------------+-----+
DIR is a byte that indicates what kind of directive the record is. In all current directives, this byte contains a printable ASCII character; different directives are identified by different characters.
LEN is a byte indicating the length of the DATA field. This length is given in bytes.
DATA is the data for the directive. The maximum length for DATA is the maximum number that can be represented in the byte LEN (255 if bytes are eight bits, 511 if bytes are nine bits). If a directive is longer than this because it contains a long string, it must be represented in the long directive format or carried over onto a continuation record (as described at the end of this chapter).
The final byte of the directive is a checksum for the record (CHK). This is the exclusive OR (XOR) of all the bytes that precede CHK in the record (including DIR and LEN).
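As a sketch, verifying the checksum of a standard format record looks like this in Python:

```python
def verify_record(record):
    """Check the CHK byte of a standard format record.

    `record` holds the complete record: DIR, LEN, the DATA bytes,
    and the trailing CHK byte.
    """
    chk = 0
    for b in record[:-1]:     # XOR of every byte that precedes CHK
        chk ^= b
    return chk == record[-1]
```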
Each type of directive has a name beginning with LD_. For example, LD_BEGIN is the name of the directive that usually marks the beginning of an object module. These names will be used throughout this document. Appendix A summarizes the various directives and their contents.
The loader reads the object file one byte at a time, figures out what each directive is and how to handle it, and verifies that the checksum is correct. The output produced by the loader is object code in the format that is required by the system under which the compiler is running.
The DATA field of a directive often contains several arguments for the directive. Thus there must be a way of indicating where one argument ends and the next begins. This is often done by expressing the argument in Dvalue format.
The Dvalue format represents numeric values with a series of consecutive bytes that have their high order bit off (0), plus a final byte with its high order bit on (1). For example, if an object file has bytes that are eight (significant) bits long, arguments will be broken up into seven-bit chunks and stored in bytes that have the high order bit off. The high order bit on the last seven-bit chunk is turned on to indicate the end of the Dvalue.
Chunks are specified in the DATA with the least significant one first; thus an argument having the binary configuration
0110000001
would be broken up into seven-bit chunks as
011 0000001
and then represented as
00000001 10000011
The lower chunk comes first with its high order bit set to 0. The higher chunk comes last with its high order bit set to 1.
When an argument has been reconstructed from its seven-bit chunks, the argument's high order bit should be propagated to the proper alignment boundary. This is the usual sign extension process. Thus the 32-bit integer for -1 can be represented by the single byte
11111111 -- eight ones
The high order 1 indicates that this is the only byte in the Dvalue. The argument is therefore
1111111 -- seven ones
and the high order bit of this is propagated out to 32 bits.
Because of the sign extension process, the Dvalue form is often shorter than the full-length argument it represents. On the other hand, the Dvalue can also be slightly longer. For example, suppose we wanted to represent the hexadecimal number 7F, consisting of seven one-bits. This has to be done in two bytes.
01111111 10000000
The first byte gives the desired value; the second byte is necessary so that sign extension will propagate zeroes instead of ones.
On machines where bytes have nine bits instead of eight (e.g. the Bull HN DPS-8), the principle for creating a Dvalue is much the same. In this case, the value to be represented is broken up into eight-bit chunks. The ninth bit is 1 in the last byte of the Dvalue and zero in all the preceding bytes. Again, the high order bit of the last eight-bit chunk is propagated out to the appropriate length.
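The encoding and decoding rules above can be sketched in Python, assuming 8-bit bytes (i.e. seven-bit chunks):

```python
def decode_dvalue(data, pos=0):
    """Read one Dvalue: seven-bit chunks, least significant first."""
    value, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80:                   # high order bit on: last chunk
            break
    if value & (1 << (shift - 1)):     # sign-extend from the top chunk
        value -= 1 << shift
    return value, pos

def encode_dvalue(value):
    """Write a value as a Dvalue, using as few chunks as possible."""
    out = []
    while True:
        chunk = value & 0x7F
        value >>= 7                    # arithmetic shift preserves the sign
        # stop once the remaining bits are just sign extension of this chunk
        done = (value == 0 and not chunk & 0x40) or \
               (value == -1 and chunk & 0x40)
        out.append(chunk | (0x80 if done else 0))
        if done:
            return bytes(out)
```

With this sketch, encode_dvalue(-1) produces the single byte 11111111, and encode_dvalue(0x7F) produces the two bytes 01111111 10000000, matching the examples above.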
In addition to Dvalue, several other terms will be used frequently throughout this document.
The DATA field of certain directives can contain symbol names expressed as ASCII strings. By an ASCII string, we simply mean a sequence of ASCII characters. The number of characters in any ASCII string is usually determined from the LEN field of the directive that contains the string -- unlike strings in C, there is no \0 byte to mark the end of the string.
A ULONG number is a binary integer written as a sequence of four bytes. Only the bottom seven bits of each byte are significant. Thus a ULONG value is actually a 28-bit number, made up of the four 7-bit numbers joined together from left to right. The high order bit of this 28-bit number is propagated out to the left to get a (long) integer for the appropriate machine (usually 32 or 36 bits).
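A sketch of reading a ULONG, again assuming 8-bit bytes:

```python
def decode_ulong(data, pos=0):
    """Read a ULONG: four bytes, seven significant bits each,
    joined from left to right into a 28-bit number."""
    value = 0
    for i in range(4):
        value = (value << 7) | (data[pos + i] & 0x7F)
    if value & (1 << 27):        # propagate the high order bit to the left
        value -= 1 << 28
    return value, pos + 4
```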
A time value represents a date and time. Such a value is made from two ULONG values. The first ULONG gives a number of days from January 1, 1900. The second ULONG gives a number of milliseconds from midnight on the day in question.
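Once the two ULONGs have been decoded, converting a time value to a calendar date is simple arithmetic; a sketch:

```python
from datetime import datetime, timedelta

def decode_time(days, milliseconds):
    """Convert a time value (two already-decoded ULONGs) to a datetime:
    days counted from January 1, 1900, plus milliseconds from midnight."""
    return datetime(1900, 1, 1) + timedelta(days=days,
                                            milliseconds=milliseconds)
```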
A TWORD value represents the smallest chunk of memory to which relocation can be applied on the target machine. TWORD values in an object file are made up of a (fixed length) sequence of bytes. The number of (significant) bits in these bytes is greater than or equal to the number of bits in the corresponding chunk of memory on the target system.
In most cases, a TWORD will represent a machine word on the target machine. A TWORD value will be made up of a sequence of bytes containing sufficient (significant) bits to match a word on the target machine.
A local name is one that should not be known outside the source module in which it appears. Separately compiled modules of the same program will not be able to refer to such an object by name. The most common type of local name that will be visible in LD code is a static variable, either within a function or outside the scope of any function. Other local data that appears in source code (e.g. auto variables and function parameters) isn't usually visible in LD object code, because such items are usually resolved as part of compilation rather than linking.
A global name is one that can be referenced by separately compiled modules. Here are some examples of global data objects:
Secondary global symbols are special global symbols that usually appear in compiled modules that are stored in object libraries. When the linker searches a library in its attempt to resolve SYMREFs, it will not see the secondary global definitions that appear in the library. However, if the linker brings in an object module that happens to contain secondary global definitions, it can then use the secondary global definitions to resolve outstanding SYMREFs.
An example will make this situation clearer. The C library contains a standard function called open for opening files. This open function is fairly large, since it is designed to deal with a wide variety of file types (terminals, disk files, tape files, random, sequential, etc.). Whenever you make a normal call to open, the linker will search through the C library, find the full-sized open function and link it into your program.
If you are only going to do I/O to and from the terminal, you don't need the full-sized version of open. You can therefore reduce memory requirements by using a stripped down version. To get this stripped down version, all you have to do is specify
use=tty_only;
on the command line for the final link operation. The linker will resolve this reference by running through the library and obtaining the module that defines the symbol tty_only. This module also contains a secondary global definition for open and a stripped down version of the open routine. When the module is brought in to resolve the reference to tty_only, the linker also obtains the definition for open. Thus any subsequent references to open will be resolved with the stripped down version that is already available, and the linker will not search through the library for the full-sized open.
As this example shows, secondary global definitions can be used inside a module to create the module's "personalized" version of a routine. Those using the module will get the personalized version; if the module is not obtained, the linker will search through the normal global definitions in the library and will find the standard version of the routine. Note that the order of linking is important in this case -- the secondary definition must be found before the "normal one" or else the linker will search the library and find the standard definition.
Another concept that crops up in connection with object libraries is that of a secondary reference (not related to secondary global definitions). A secondary reference is a reference to a symbol that may or may not exist within the modules and library routines that are being linked together. When a secondary reference is found, the linker attempts to resolve the reference in the usual way, by searching through the compiled modules and various object libraries. If a corresponding definition for the symbol is found, the reference is resolved; if not, the linker simply creates a null definition and resolves the reference that way. No error messages are issued in this process, since secondary references are used for "optional" data objects.
When we describe the format of a directive, the following conventions will be used:

- {x} stands for a single byte, often shown as the ASCII character it contains;
- <x> stands for a value in Dvalue format;
- "x" stands for an ASCII string.
As a simple example of how this notation is used, we'll describe the LD_MODULE directive. This directive appears close to the beginning of an object file and indicates the start of an object module. The format of the directive can be written as
{M}{length}"module_name"{chk}
The parts of the directive are described below:

- {M} is the ASCII character 'M', identifying an LD_MODULE directive;
- {length} is the length of the DATA field (here, just the module name);
- "module_name" is the name of the module;
- {chk} is the checksum byte.
As an example, suppose that we have a machine with 8-bit bytes and a module named X. The LD_MODULE directive for this module could be written in hex as
4D 01 58 14
  -- 4D is ASCII 'M'
  -- 01 is data length, 1 byte
  -- 58 is ASCII 'X', module name
  -- 14 is checksum
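A sketch of a routine that assembles such records (assuming 8-bit bytes):

```python
def build_record(dir_char, data):
    """Assemble a standard format record: DIR, LEN, DATA, CHK."""
    rec = bytes([ord(dir_char), len(data)]) + data
    chk = 0
    for b in rec:
        chk ^= b              # CHK is the XOR of all preceding bytes
    return rec + bytes([chk])
```

For example, build_record('M', b'X') yields the four bytes 4D 01 58 14 shown above.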
As noted earlier, the number of bytes in a single standard format directive is limited by the length that can be expressed by the {length} byte. If this is not long enough to hold a piece of data required by the directive (e.g. a long string) and it is not appropriate to use the long directive format, a continuation directive is required. The directive is called LD_CONTINUATION, and it has the form
{&}{length}data{chk}
where {&} is the ASCII ampersand character. The data in this directive is considered a direct continuation of the data in the preceding directive.
An LD_CONTINUATION directive is always required when the {length} byte of the preceding directive has the maximum possible value. For example, on a machine where a byte has eight bits, a {length} byte has a maximum value of 255. If a directive happens to have exactly 255 bytes of data, there must still be an LD_CONTINUATION directive (with a zero length).
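The splitting rule can be sketched as follows; note the zero-length continuation emitted when the data length is an exact multiple of the maximum:

```python
def emit_with_continuations(dir_char, data, max_len=255):
    """Split DATA across a directive plus LD_CONTINUATION records."""
    def record(ch, payload):
        rec = bytes([ord(ch), len(payload)]) + payload
        chk = 0
        for b in rec:
            chk ^= b
        return rec + bytes([chk])

    out = [record(dir_char, data[:max_len])]
    data = data[max_len:]
    # A continuation is required whenever the previous {length} byte
    # had the maximum value, even if the continuation carries no data.
    while len(out[-1]) == max_len + 3:      # DIR + LEN + max_len data + CHK
        out.append(record('&', data[:max_len]))
        data = data[max_len:]
    return out
```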
The LD_CONTINUATION directive is one way of handling long directives. Another is to use the extended record format.
An extended record is called LD_EXTEND. It has the format
{+}(length){type}contents{checksum}
The beginning of the directive is the ASCII '+' character. After this comes a ULONG (length) giving the length of the contents of the record (plus one for the checksum). Since the length is given as a ULONG rather than a byte, the maximum length of this kind of directive is 2**28 characters.
The {type} byte is the type of another record (LD_MODULE, LD_DATA, etc.). After this come the arguments that would normally be given for that type of directive (except that the usual length byte is not specified).
The checksum at the end of the record is a checksum for the entire record, from the '+' at the beginning to the end of the contents.
In the rest of this manual, we will write up directives as if they were in standard format. However, any directive may be written in the extended format if the length of the data makes this necessary.
As another example of a simple directive, LD_END marks the end of an object module. It can be written
{E}{length}{chk}
The first byte is the ASCII character 'E', identifying an LD_END directive. The {length} byte will always be 0 (because there is no DATA area) and the {chk} byte will always be the ASCII character 'E' (which is the exclusive OR of 'E' and 0).
After the LD_END directive, there should be another marker consisting of
{\0}{\0}{\0}
where all three bytes are zero (ASCII NUL characters). This is used to mark the very end of an LD file, and the end of modules in LD libraries.
LD object code describes a program in terms of segments. Each segment corresponds to a block of memory that will be used when the program executes. For example, there might be one segment that holds all the data and another that holds all the executable code.
There is no direct relationship between LD segments and other constructs that might be called "segments" on a particular machine (e.g. hardware segments). The only connection is that each LD segment will always be contained entirely within a single hardware segment.
Segments are often defined within other segments. For example, external variables may be defined as separate segments within the data segment. When a segment is created inside another segment, LD places the contents of the sub-segment at the end of anything else that is currently in the enclosing segment.
To define a segment fully, LD object code must specify several pieces of information:

- the segment's name;
- the segment (if any) that contains it;
- any alignment requirements;
- the segment's size;
- the segment's contents.
Sometimes the contents of the segment do not need to be specified, as in the case of uninitialized data areas. Also, the size of a segment can often be inferred from the highest initialized location.
In summary, the major characteristics of a segment are:

- name;
- parent segment;
- alignment;
- size;
- contents.
Predefined segments are segments which do not appear in the specified input files, but are required for a specific output format. For example, if the GCOS8 NS mode version of LD is asked to prepare a run-unit, LD must implicitly create a linkage segment to contain the descriptors for the segments that are explicitly specified in the input files.
The number and nature of predefined segments is system-dependent, and is also dependent on the output format that LD is asked to generate. For example, the GCOS8 NS mode version of LD does not put in a linkage segment when it is creating an LD object library; the linkage segment is only needed for the run-unit format.
Predefined segments are sometimes added by the output writer for the particular format, but the preferred method is to supply them in an extra input module rather than building innate knowledge of the format into the writer.
Segments and symbols are identified by reference numbers or SEGREFs. Each time a new segment or symbol is referenced or defined, it is implicitly assigned the next sequential SEGREF. LD object code always uses SEGREFs to refer to segments.
Conceptually, SEGREF numbers begin at one and increase from there. In practice, however, it is sometimes desirable to change the beginning point.
The LD_BEGIN directive gives a value that should be used as the next SEGREF. Subsequent SEGREFs will follow sequentially from this value, until a new LD_BEGIN directive is encountered. The directive has the format
{B}{length}<init_seg_no>{chk}
where

- <init_seg_no> is a Dvalue giving the SEGREF that will be assigned to the next segment or symbol to be referenced or defined.
In future directive descriptions, we will usually omit the {length} and {chk} bytes because these are always present. Thus we might describe LD_BEGIN in the abbreviated format
{B}<init_seg_no>
Note that an LD_BEGIN directive can leave gaps, i.e. SEGREF numbers for which there is no corresponding reference or definition. Attempting to use such a SEGREF is an error.
An LD_BEGIN directive can also specify an initial segment number that has already been used for some other segment. In this case, new references or definitions will hide the previous ones with the same SEGREF numbers. The previous segments will still exist, but they can no longer be mentioned by LD directives -- their SEGREF numbers now refer to the new segments.
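The numbering rules, including the gaps and hiding that LD_BEGIN can introduce, can be sketched as below (the class and method names are hypothetical, invented for illustration):

```python
class SegrefTable:
    """Sketch of implicit sequential SEGREF assignment."""

    def __init__(self):
        self.next_segref = 1      # SEGREFs conceptually begin at one
        self.entries = {}

    def begin(self, init_seg_no):
        """LD_BEGIN: restart sequential numbering at init_seg_no."""
        self.next_segref = init_seg_no

    def assign(self, entry):
        """Each new reference or definition takes the next SEGREF.
        Reusing a number hides the earlier segment or symbol."""
        segref = self.next_segref
        self.entries[segref] = entry
        self.next_segref += 1
        return segref

    def lookup(self, segref):
        if segref not in self.entries:    # a gap left by LD_BEGIN
            raise KeyError("no segment or symbol with SEGREF %d" % segref)
        return self.entries[segref]
```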
New segments are usually created with the LD_CRSEG directive. This has the format
{S}{flags}<parent>"name"
(Remember that we are leaving out the {length} and {chk} in our format descriptions, even though these will be present in the actual directive.) The fields of LD_CRSEG are described below.

- {flags} is a byte of flags describing the segment. The possible flag values are:

     1 -- GLOBAL
     2 -- COMMON
     4 -- SECONDARY

- <parent> is a Dvalue giving the SEGREF of the segment that will contain the new segment;
- "name" is the name of the new segment.
The COMMON flag indicates a Fortran-style common block (and similar constructs in other languages). This allows multiple definitions of the same segment, with the final size being the largest of the sizes of all the definitions. On some machines, certain C constructs must be expressed as COMMON segments, because of loader requirements. For example, on the Bull HN DPS-6, the linker requires every external variable to be put in its own COMMON block. Flags are ORed together. Thus the flags for a secondary global symbol would have a 1 ORed with a 4.
When the loader reads an LD_CRSEG directive in the input, it assigns the next sequential SEGREF to the new segment and records the parent segment that contains the new segment. It also creates a SYMDEF for the segment, using the name that is given in the directive.
Notice that the SEGREF of the segment being defined does not actually appear in the LD_CRSEG directive. It is obtained implicitly by incrementing the count of the number of segments that have already been assigned SEGREFs.
Here is an example of an LD_CRSEG directive (written in a combination of ASCII characters and hex digits).
S 08 01 82 ..code CHK
  -- 'S' identifies LD_CRSEG directive
  -- 08 is length, eight bytes
  -- 01 is flag, indicating global segment
  -- 82 is Dvalue; says parent segment has SEGREF 2
  -- "..code" is name of segment
  -- CHK is checksum, whatever it has to be
Note that the length of the name can be determined from the {length} byte. LD knows how long the {flags} and <parent> are, so the rest of the length must be the name.
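A sketch of parsing such a record (ignoring the checksum byte, and assuming SEGREF Dvalues are non-negative so no sign extension is needed):

```python
def parse_crseg(record):
    """Parse an LD_CRSEG record into (flags, parent, name)."""
    assert record[0] == ord('S')
    length = record[1]
    flags = record[2]
    parent, shift, pos = 0, 0, 3
    while True:                     # <parent> is a Dvalue
        b = record[pos]
        pos += 1
        parent |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80:
            break
    # whatever remains of the {length} bytes is the segment name
    name = record[pos:2 + length].decode('ascii')
    return flags, parent, name
```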
The LD_NAME directive associates a symbol name with an offset from the beginning of a segment that has already been created with LD_CRSEG. For example, the first function in a program is usually defined as an offset of zero from the beginning of the code segment. A symbol defined with LD_NAME is assigned the next available SEGREF number, just like segments defined with LD_CRSEG.
LD_NAME has the format
{N}{flags}<offset><parent>"name"
where

- {flags} has the same meaning as in LD_CRSEG;
- <offset> is a Dvalue giving the offset of the symbol from the beginning of the parent segment;
- <parent> is a Dvalue giving the SEGREF of the segment that contains the symbol;
- "name" is the name of the symbol.
If the {flags} indicate a global symbol, the loader will generate a SYMDEF for the symbol being defined.
Here is an example of an LD_NAME directive.
N 07 01 80 8A main CHK
  -- 'N' indicates LD_NAME
  -- 07 is length, 7 bytes
  -- 01 is flag, indicating global symbol
  -- 80 is Dvalue offset (0)
  -- 8A is Dvalue SEGREF of parent segment (hex A, segment 10)
  -- "main" is name of symbol
  -- CHK is checksum
The LD_ALIGN directive describes the alignment of a segment when the alignment is important. For example, a segment containing a double-precision floating point variable has to start on a double word boundary on many machines. LD_ALIGN would indicate this requirement. The directive has the format
{A}<segref><align_in_bits>
where

- <segref> is a Dvalue giving the SEGREF of the segment whose alignment is being specified;
- <align_in_bits> is a Dvalue giving the alignment boundary, measured in bits.
For example,
A 02 8B A0
  -- 'A' for LD_ALIGN
  -- 02 is data length, two bytes
  -- 8B is Dvalue for segment, hex B, segment 11
  -- A0 is Dvalue for 32
indicates that segment 11 should be aligned on a 32-bit boundary (probably four 8-bit bytes).
An LD_ALIGN directive for a segment may appear long after the LD_CRSEG that creates the segment, since alignment requirements are sometimes only discovered long after the segment is first defined.
When an alignment requirement is specified for a particular segment, the same requirement is automatically inherited by any segments containing that segment. If a particular data object must be aligned on a double word boundary, then all enclosing segments should have double word alignment (or better) so that there will be no problem getting the alignment that the data object needs. In this way, one LD_ALIGN directive may dictate alignment requirements for several (nested) segments.
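The inheritance rule amounts to a walk up the chain of parent segments; a sketch (the data structures here are hypothetical, not part of the LD format):

```python
def apply_align(parents, aligns, segref, align_bits):
    """Record an LD_ALIGN requirement and propagate it to every
    enclosing segment.

    `parents` maps SEGREF -> parent SEGREF (None at the outermost level);
    `aligns` maps SEGREF -> alignment in bits."""
    while segref is not None:
        # a segment never needs *less* alignment than already recorded
        aligns[segref] = max(aligns.get(segref, 1), align_bits)
        segref = parents.get(segref)
    return aligns
```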
In order to eliminate superfluous LD_ALIGN directives and to make it easier to lay out the segments of the program, the LD object format has a directive that summarizes information about each segment. The LD_SEGINFO directive has the format
{s}<segref><length_in_twords><align_bits>
where
One LD_SEGINFO directive eliminates the need for all LD_ALIGN directives describing the alignment of the segment. It also eliminates the need for LD_DATA and LD_RELOC directives stating the length of the segment. (LD_DATA and LD_RELOC are described in Chapter 4 of this document.)
An LD_REFER directive creates a reference to a symbol name that has not yet been defined. Local names must be defined later in the module; global names may be defined elsewhere in the module or in another module. If a global name is not defined in the module that contains the reference, different output writers deal with the situation in format-dependent ways.
LD_REFER has the format
{R}{flags}"name"
where the {flags} are the same as for LD_CRSEG and the "name" is the name of the symbol which has been referenced. The name cannot be null.
The symbol named in an LD_REFER is assigned the next available SEGREF number just like symbols defined with LD_CRSEG and LD_NAME directives.
If a module contains a definition for a name, as well as LD_REFER directives, the flags in the definition directive take precedence over those in any LD_REFER directive. Of course, if the definition is marked as global and the LD_REFER is local (or vice versa), the directives refer to different objects and the flags have no effect on each other. However, if both the definition and LD_REFER are global, the flags on the definition will be used and the flags on the LD_REFER will be ignored.
The LD_SYMOPTS directive specifies one or more characteristics for a segment, SYMREF, or SYMDEF. The format of the directive is
{o}<segref>"characteristics"
where
Characteristics vary from machine to machine and the format of the "characteristics" string is system-dependent.
The LD_LITERAL directive is used to create a segment that contains a literal. Such a literal may be "folded" with any other literal that has the same value. For more information about LD_LITERAL, see Chapter 4.
The LD_CONTROL directive is used to specify various kinds of information about a program or a module. It has the format
{#}{length}sub-directive{chk}
where {length} and {chk} are the usual length and checksum bytes, and {#} is the ASCII character '#'.
Sub-directives for LD_CONTROL all consist of a single ASCII character, followed by one or more pieces of data. Sub-directives are known by names that begin with LC_.
This specifies the version of the LD object format that a file contains. It should be the first directive in a file or library. The format of the sub-directive is
{V}{version}'create_time'{type}
where {V} is the ASCII character 'V' (upper case), {version} is an integer, and 'create_time' is a time value indicating when the file or library was last updated. The {type} byte is optional and indicates the type of the file: a value of LT_LD_OBJECT (0) indicates a regular LD file containing object code; a value of LT_RUN_UNIT (1) indicates a run-unit or run-unit library (on systems where run-units are supported). If this byte is omitted, the default is LT_LD_OBJECT.
LD version numbers began at zero. This manual describes version 2 of the LD format.
This gives the name of the source file that contained the original source code. The sub-directive has the form
{F}"filename"
where {F} is the ASCII character 'F' and "filename" is the name of the source file.
An object file may contain several LC_FILENAME directives (i.e. LD_CONTROL directives with LC_FILENAME sub-directives). For example, suppose the file cprog contains the C code
/* Program starts here */
#include <stdio.h>
main()
{
    ...
There will be an LC_FILENAME directive for cprog when the program first begins compilation. Immediately after that will be an LC_FILENAME directive for the <stdio.h> include file. If the <stdio.h> include file #includes other files, there will be LC_FILENAME directives for all those files. When the compiler has finished with the <stdio.h> file, it will output another LC_FILENAME for cprog to show that it has returned to the original source file.
The LC_REVISION sub-directive has the format
{R}"revision"
where {R} is the ASCII character 'R' and "revision" is a string. The contents of "revision" may give a version number to the program being linked. In C programs, the "revision" string is obtained from a #pragma version preprocessor directive.
The LC_CPR sub-directive has the format
{C}"copyright_string"
where {C} is the ASCII character 'C' and "copyright_string" is a string. The contents of "copyright_string" may state copyright information for the program being linked. In C programs, the "copyright_string" is obtained from a #pragma copyright preprocessor directive.
The LC_TITLE sub-directive has the format
{T}"title"
where {T} is the ASCII character 'T' and "title" is a string. The contents of "title" may state a name or title for the program. In C programs, the "title" is obtained from a #pragma title preprocessor directive.
The LC_MODULE sub-directive has the format
{M}"module_name"
where {M} is the ASCII character 'M' and "module_name" is a string. The contents of "module_name" may state a name for the module.
Note that this LD_CONTROL sub-directive provides the same information as the LD_MODULE directive we discussed in Chapter 1. Older versions of LD use LD_MODULE, while newer ones use LD_CONTROL and LC_MODULE.
The LC_TARGET_INFO sub-directive describes important aspects of the target machine. It has the format
{I}{byte_len}{reloc_align}{origin_size} "machine_name"
where
The LC_TARGET_INFO directive should be one of the first directives in an object file, since it strongly influences how the rest of the data in the file should be treated. In particular, Dvalues and TWORDs cannot be interpreted properly without the information provided by this directive.
The LC_LIB_HEADER directive is used at the beginning of object libraries. It provides information about the object library. It is often followed by unused space, since library modules usually start on a specific alignment boundary (e.g. a disk sector boundary) to improve performance.
The LC_LIB_HEADER directive has the format
{H}{version}(index_seek)(end_seek) LH_LIB_NAME(gap)'time'{type}
where
The LC_PATCH directive is used in run-units to indicate a patch. This is a change that should be made to the object code at the time that a run-unit is loaded or prepared for a debugging session.
For example, suppose the original source code initializes a variable to 0, and in a later debugging session you discover that the value should be 1. To avoid recompiling the original source code, some debuggers let you patch the object code, changing the 0 initialization value into a 1.
Typically, patches are used with programs that are already in active use, especially ones that are distributed to other sites. Rather than recompile the program and send out a new release of the software, it may be more convenient just to send out a collection of patch directives which can be added to the existing program.
LD offers two ways to patch code. The first is to leave the original object code as is, and to add an LC_PATCH directive describing the change you want to make. The second is to change the object code in the way you wish, and then add an LC_PATCH_RECORD directive describing what the object code originally said. This section discusses LC_PATCH, while the next discusses LC_PATCH_RECORD.
An LC_PATCH directive is a combination of the LD_DATA and LD_RELOC directives described in Chapter 4. In order to understand the contents of the directive, it is best to read Chapter 4 first. LC_PATCH has the format
{P}'time'<refno><offset>[data_word] <nrelocs>[{code}<symbol>]*"comment"
where
Patch records are only created by debuggers. They are not generated by compilers or by programs like LD or LEDIT.
The LC_PATCH_RECORD directive is used when a patch has been made to the run-unit. The directive is a record of what the run-unit contained before the patch was made. Thus the body of the object code contains the new material and the LC_PATCH_RECORD directive records the old. Contrast this with LC_PATCH, where the body of the object code contains the old material and the LC_PATCH directive provides the new.
LC_PATCH_RECORD has the same format as LC_PATCH. The [data_word] records the old contents of the machine word that was patched, and the relocation pairs record the relocation information that was associated with the old data word. The rest of the LC_PATCH_RECORD fields have the same meaning as in LC_PATCH.
Data directives are used to initialize the contents of a segment. Usually, the segment being initialized will be one of those defined with an LD_CRSEG or LD_NAME directive in the same input file. However, it can happen that the input contains initialization directives for segments that were only referenced by the input (with an LD_REFER directive). In this situation, the result depends on the output format.
There are two directives for producing absolute (constant) data:
After data has been specified, it can be relocated with the LD_RELOC directive. LD_RELOC is also used to specify the load origin(s) for the preceding LD_DATA directive, so LD_RELOCs may appear even if no relocation is required.
Fixed data is assigned in the order it is seen in the input, so only the most recently stored value in each bit is kept. Only after all constant data has been merged is the relocation performed. If some TWORDs have only been partly initialized (through LD_BIT_DATA directives), the result depends on the output format chosen.
The LD_DATA directive specifies data that should be stored in a segment.
Every LD_DATA directive is immediately followed by an LD_RELOC directive which gives relocation information for the data in the LD_DATA directive. The relocation information tells which segment contains the data, the offset of the data within the segment, and how to relocate the data.
The format of an LD_DATA directive is
{L}[data_word][data_word]...
where
Each of the TWORD values represents a fixed-length chunk of data that can be assigned on a boundary to which relocation may be applied. For example, on the DPS-8, each TWORD of data represents a machine word (36 bits). In this way, the contents of a segment may be given one word at a time.
The format of an LD_RELOC directive is
{O}reloc-triplet,reloc-triplet,...
where {O} is the ASCII character 'O' (upper case "oh"). The DATA field of the directive consists of a sequence of relocation triplets, one for each origin and for each relocatable TWORD in the corresponding LD_DATA directive. Some of the TWORDs in the LD_DATA will not need relocation (e.g. many machine instructions in a code segment), so there are often more TWORDs in the LD_DATA than relocation triplets in the LD_RELOC.
Each relocation triplet consists of three components:
We will refer to these three components as the CODE, the WORD#, and the SEGREF.
Most relocation codes are (target) system-dependent. However, there is one system-independent CODE value; this is given the symbolic name ORIGIN_RELOC, and has a value of zero. This CODE indicates that the corresponding TWORD(s) in the LD_DATA directive do not specify an initialization value but an offset (in TWORDs) from the symbol indicated by the triplet's SEGREF. Effectively, an ORIGIN_RELOC triplet says, "Start writing out data at this offset from this symbol." The first relocation triplet in any LD_RELOC directive must be an ORIGIN_RELOC triplet that tells where to write the first TWORD of the preceding LD_DATA directive.
The number of TWORDs required to specify the offset for ORIGIN_RELOC is system-dependent. This number is given by the {origin_size} value in the LC_TARGET_INFO directive that describes the target machine (see Chapter 3). Usually, a single TWORD is large enough to hold such an offset; but on a machine with small TWORDs (e.g. the PC, where TWORDs are only 8 bits) several TWORDs may be needed.
To show how ORIGIN_RELOC works, suppose a C program contains a declaration of the form
int K = 3;
We'll suppose that K is an external variable and has a segment all to itself. We could write directives to initialize K to the proper value in the following way (omitting LENGTH and CHK bytes to simplify things).
{L}[0][3]
{O}{ORIGIN_RELOC}{0}<K's SEGREF>
The LD_DATA (L) directive has a DATA field consisting of two TWORDs with the values 0 and 3. The LD_RELOC (O) directive has a DATA field consisting of one relocation triplet. This triplet has the following components:
The relocation triplet says that the loader should begin laying down data in K's segment, beginning at the offset 0. This offset is obtained from TWORD zero of the LD_DATA directive. All the remaining data in the LD_DATA directive (just the TWORD [3]) is laid down sequentially in K's segment. The result is that the value 3 is put in the segment that represents K.
Multiple initializations are handled in a similar way. For example,
int I[] = {1,2,3,4};
could be written with LD_DATA and LD_RELOC directives as
{L}[0][1][2][3][4]
{O}{ORIGIN_RELOC}{0}<I's SEGREF>
As before, the relocation triplet indicates that the loader should begin laying down data at offset 0 in I's segment. TWORDs one through four are laid down sequentially in this segment.
The situation gets slightly more complicated when some of the data TWORDs require relocation. Suppose that we are working with
int C[10];
int *P = &C[5];
where both P and C are external variables and therefore have their own segments. To initialize P on the DPS-8, we could use the following directives.
{L}[0][05000000]
{O}{ORIGIN_RELOC}{0}<P's SEGREF>
   {PTR_RELOC}{1}<C's SEGREF>
The LD_RELOC (O) directive has two relocation triplets. The first indicates that the loader should lay down data as TWORDs, beginning at offset 0 in P's segment. The offset is obtained from TWORD {0} in the LD_DATA directive.
The second relocation triplet tells how TWORD {1} should be relocated when it is laid down in P's segment. {PTR_RELOC} is another symbolic CODE, standing for a type of pointer relocation on the DPS-8. With this relocation code, the corresponding TWORD is expected to contain a (word) offset value in the upper half of the word (which is why the corresponding TWORD value is 05000000 octal). This offset is relocated by adding on the address of C's segment (as indicated by the relocation triplet). The final address laid down in P is therefore the address formed from the start of C's segment and a word offset of 5.
ORIGIN_RELOC is a special relocation code that is used on all machines. Apart from this, all relocation codes are system-dependent.
For example, on the DPS-8 machine, addresses are often stored in the upper 18 bits of the machine word. Therefore there is a frequently used relocation code indicating that LD should relocate the upper 18 bits of a particular data value and leave the rest alone. A different relocation code is used if the address is stored in the lower 18 bits of the word; this different relocation code tells LD to relocate the lower 18 bits of the data value and leave the upper bits alone. In some cases, a word may hold two addresses, one in its upper half and one in its lower. In this case, there would be two relocation triplets for the same word: one for each address that needed relocating.
On other machines, where addresses may be stored in a variety of locations within a chunk of data, the relocation codes tell what part of the data is the address and what part should not be touched. Appendix C describes the relocation codes for various systems.
It is usually possible to apply several relocations to the same TWORD (although some combinations may be meaningless).
On the other hand, a single relocation triplet may affect more than one TWORD; for example, in NS mode on the DPS-8, there will be a CODE that relocates a descriptor made up of two TWORDs. In this case, the TWORD named in the triplet will always be the one with the smallest (numerically least) offset.
LD_DATA and LD_RELOC directives can be used to provide information about the length of a segment. In essence, you use an ORIGIN_RELOC relocation triplet to set the relocation origin to the end of the segment. For example, consider
int C[10];
on the DPS-8. If C is an external variable and therefore a segment on its own, we could write
{L}[10]
{O}{ORIGIN_RELOC}{0}<C's SEGREF>
This sets a relocation origin for the segment to an offset of 10 words (given by TWORD {0}) in C's segment. This indicates that C's segment is at least 10 words long, even though zero words of data were supplied.
The examples we have given so far in this chapter show how LD_DATA and LD_RELOC could be used. In practice, however, the directives may be used in a slightly different way (by our C compilers, for example).
Let us consider the four extern declarations
int K = 3;
int I[] = {1,2,3,4};
int C[10];
int *P = &C[5];
When encountering one of these declarations, the compiler will put out directives in the following order.
The directives listed above create space for the variables but do not give initialization values. Initialization values will be given at the end of the object file, using large LD_DATA/LD_RELOC directives that initialize several data objects at once. For example, you might see
{L}[0][3] [0][1][2][3][4] [0][05000000]
{O}{ORIGIN_RELOC}{0}<K's SEGREF>
   {ORIGIN_RELOC}{2}<I's SEGREF>
   {ORIGIN_RELOC}{7}<P's SEGREF>
   {PTR_RELOC}{8}<C's SEGREF>
The above directives combine all the initializations we have discussed into a single LD_DATA/LD_RELOC pair.
It is sometimes desirable to initialize part of a TWORD, while leaving other parts untouched. Such bit data is specified with the LD_BIT_DATA directive. This has the form
{1}<symbol><bit_offset><Nbits>[data][data]...
where
Literals are values that are expected not to change in the course of program execution. Typically, literals are numeric constants or constant strings, although there are other kinds of literals too.
Each literal is created as a separate segment. Collections of literals are grouped into literal pools. A literal pool is associated with a particular segment. The pool will contain all the literals used by that segment and the segment's child segments.
If several literals in a pool have the same value, the literals are folded together during the loading process. By this, we mean that the pool will only contain one literal with that value, and all references to that value will be aimed at this unique literal.
Before identical literals are folded, each will have its own SEGREF (since each literal is its own segment). This means that the folding process associates all these separate SEGREFs with the segment that holds the (unique) folded literal.
A literal value is created using the LD_LITERAL and LD_END_LITERAL directives.
LD_LITERAL has the format
{[}<ref_segment>
where
An LD_LITERAL directive implicitly associates the next available SEGREF with the literal being created. In this way, the SEGREF can be used to refer to the literal (in the same way that SEGREFs refer to segments created with LD_CRSEG).
The LD_END_LITERAL directive marks the end of a literal whose definition has begun at a previous LD_LITERAL directive. LD_END_LITERAL has the format
{]}<lit_segref>
where
A compiler's first step in creating a literal is to output an LD_LITERAL directive giving the SEGREF of the segment that uses the literal. After the LD_LITERAL will come LD_DATA and LD_RELOC directives which generate the literal value. In some situations, other directives may also be needed to construct the literal (e.g. LD_ALIGN).
When the code generator has finished outputting directives to construct the literal value, it marks the end of the literal with LD_END_LITERAL. Once the LD_END_LITERAL has been found, LD can place the literal in the pool associated with the segment whose SEGREF appeared in the original LD_LITERAL directive. LD does this by looking at the containing segment, the segment's parent, the parent's parent, and so on, until it finds a parent segment that has an associated literal pool.
If the same literal is used several times in a program or module, the uses will be folded together to give a single value in the literal pool.
The LD_POOL directive creates a literal pool associated with a segment. The directive has the form
{=}<parent>
where
The literal pool will be contained in the segment given by <parent>. When literal values are created by the parent segment or by any of its child segments (segments contained in the parent), LD will collect them and store them in the literal pool. Depending on output format, the literals in a pool may eventually become children of the parent themselves.
The directives we have already discussed are sufficient for creating an object module that can be linked and loaded. However, compilers may generate additional directives that specify debugging information for the program being compiled.
The LD phase of the compiler assembles the information provided by these directives into debugging tables, which are included as part of the final load module. These tables can be read by a symbolic debugger to provide information when examining a post-abort dump or when running a program. The information supplied by the debugging directives includes the type and storage class of each variable and function defined in a program.
Debugger information must always be given in the context of scope within a program. For example, information about a local variable "i" in one function may not apply to a local variable "i" in a different function. Thus, all the information specified by debugger directives must be associated with a particular scope.
Scopes are represented by reference numbers known as VLREFs. This stands for Variable List Reference numbers. Variables with the same VLREF have the same scope, e.g. all the auto variables defined at the beginning of a function. The set of all variables with a given VLREF is called the variable list of the associated scope.
Variable lists are frequently nested inside other lists. For example, the elements of a structure form their own variable list because the element names are only meaningful within the context of the structure. The structure itself will be part of another variable list (e.g. an enclosing structure). Similarly, the local variable list of a particular function is enclosed in the variable list of external variables which are also accessible to that function.
The LD_DEFVLIST directive is the most common way of marking the start of a variable list. This directive has the format
{v}<enclosing_vlref>
where
VLREFs begin at 1. Every time an LD_DEFVLIST command is issued, the next available VLREF is used for variables in the associated variable list. This is much the same as the situation for LD_CRSEG directives, where every LD_CRSEG directive issued is implicitly given the next available SEGREF number.
The variable list with a VLREF of 1 is usually the outermost variable list in the module, representing a scope covering the entire compilation unit.
There is a special predefined variable list with a VLREF of zero. It is intended to contain externally visible objects.
The end of a particular scope is marked by an LD_ENDVLIST directive. This has the format
{}}<vlref><start_counter><end_counter>
where
After an LD_ENDVLIST directive, variables with the given VLREF are no longer recognized. For example, an LD_ENDVLIST directive is issued at the end of every function to mark the end of the scope of that function's local variables.
The LD_DEFTYPE directive describes a data type used in a module. Every time an LD_DEFTYPE is used to specify a type, a number is associated with the given type. This number is known as a TREF.
Like SEGREFs, TREFs are assigned sequentially. By default, TREFs begin at 1. The LD_INITTREF directive can be used to specify a different starting point. It has the form
{0}<TREF>
where {0} is a byte containing an ASCII zero, and <TREF> is a Dvalue giving a number that should be used the next time a TREF is defined. New TREFs will be assigned sequentially from this starting TREF.
The format of LD_DEFTYPE is
{T}{type_code}extra info
where
Types:
  0 -- typedef
  1 -- structure
  2 -- union
  3 -- enumerated class
  4 -- pointer
  5 -- array
  6 -- function
  7 -- bit field
  8 -- const modifier
  9 -- volatile modifier
 10 -- far modifier
 11 -- near modifier
 12 -- huge modifier
 13 -- void
 14 -- variable argument list (... in C)
 15 -- char (signed)
 16 -- short
 17 -- int
 18 -- long
 19 -- unsigned char
 20 -- unsigned short
 21 -- unsigned int
 22 -- unsigned long
 23 -- float (single precision)
 24 -- double
 25 -- long double
 26 -- statement label
 27 -- block label
 28 -- signed long char
 29 -- unsigned long char
(Note: new numbers may be added to this list in future releases of the compilers.) The type codes up to 12 represent type modifiers, while the remaining codes (from void on) represent basic types. The basic types correspond to the basic types of the C programming language, plus types for statement labels (26) and block labels (27).
Whenever a new data type is encountered in the source code, an LD_DEFTYPE directive is issued to describe that data type. The LD_DEFTYPE for a type is only issued the first time a particular type is mentioned; in other words, there is only one declaration for the int type, no matter how many int variables are declared in the source code.
LD predefines TREFs for all the basic types listed above. Each is given a TREF equal to the type code. Typically then, an LD_INITTREF directive will be used to begin numbering TREFs with the number that follows the last basic type value.
If the {type_code} of an LD_DEFTYPE directive is 4 (indicating a pointer type), the format of the LD_DEFTYPE is
{T}{4}<tref>
where <tref> is a Dvalue giving the TREF of the type pointed to.
As an example, the TREF for int values is 16, so an LD_DEFTYPE directive for a pointer to int could be written as
{T}{4}<16>
The 4 indicates a pointer type; the 16 is the TREF of the int type.
If we wanted to define a pointer to a pointer to integers, the first argument of the LD_DEFTYPE directive would be a 4 and the second argument would be the TREF of the pointer to integers that we just described.
The const and volatile keywords are known as type qualifiers in the C programming language. Speaking very loosely, an object with the const qualifier should not be assigned a value in the executable code of the routine that declares the object. An object with the volatile qualifier may change its value without direct program action (e.g. a hardware clock). For a more rigorous description of these two qualifiers, see the C Reference Manual.
If the {type_code} of an LD_DEFTYPE directive is 8 (const) or 9 (volatile), the format of the directive is
{T}{type_code}<tref>
where <tref> is a Dvalue giving the TREF of the type being modified by const or volatile.
For example,
const int *p;
would result in two LD_DEFTYPE directives.
{T}{8}<16>                 -- const int
{T}{4}<TREF of const int>  -- pointer to const int
The near and far qualifiers are used when a machine has more than one pointer format. Typically, far pointers are able to address a wider range of memory than near pointers, but far pointers take up more space themselves and are less convenient to work with. Thus near pointers are generally more efficient to use, but far pointers are required in situations where addressability is a concern. For a more rigorous description of near and far pointers, see the appropriate documentation on machines that support them.
Near and far qualifiers are added to types in a manner similar to the process of adding the const and volatile qualifiers. Note that a "far pointer" is actually a pointer to a "far" object; the "far" attribute is attached to the object type, not the pointer. The same goes for "near".
If the {type_code} of an LD_DEFTYPE directive is 6 (function returning a value), the format of the LD_DEFTYPE is
{T}{6}<tref>
where <tref> is a Dvalue giving the TREF for the type of value that the function returns.
As an example, the TREF of the int type is 16, so the LD_DEFTYPE directive for a function returning int could be written as
{T}{6}<16>
If the {type_code} of an LD_DEFTYPE directive is a 5 (indicating an array), the format of the LD_DEFTYPE is
{T}{5}<tref><length>
where
Thus the LD_DEFTYPE directive for an array of 10 integers could be written
{T}{5}<16><10>
(As always, the TREF for int is 16).
If an array has more than one dimension, it will be defined with several successive LD_DEFTYPE directives. For example,
int arr[20][30];
would be broken down into
{T}{5}<16><30>               -- array of 30 integers
{T}{5}<TREF of int[30]><20>  -- array of 20 such arrays
The second LD_DEFTYPE directive might be read as describing an array of 20 "arrays of 30 integers".
This method of breaking type declarations into parts is used for all complex types. For example, consider
int *p[20];
which declares an array of 20 integer pointers. First of all there is a TREF for integer pointers
{T}{4}<16>
and next there is a TREF for arrays of such things.
{T}{5}<TREF of *int><20>
If an array is declared with an unspecified dimension, as in
extern int x[];
the <length> argument in the LD_DEFTYPE will be given as zero, indicating that it is unknown at present.
If the {type_code} of an LD_DEFTYPE directive is a 1 (indicating a struct type), the format of the directive is
{T}{1}<vlref>"tag"
where
An LD_DEFTYPE for a structure automatically obtains a new VLREF (scope) because the fields in the structure form their own name space. Thus an LD_DEFTYPE for a structure implies an LD_DEFVLIST directive to start a new variable list.
As an example, suppose we have the definition
struct complex { float x; float y; };
outside the scope of any function in a program. Since this is an external type, the enclosing variable list has VLREF 1. The corresponding LD_DEFTYPE declaration would have the format
{T}{1}<1>complex
(assuming that 1 was the VLREF for file scope).
No special LD_DEFTYPE declarations are needed for other types of elements of a structure (except for bit fields which are described in the next section). For example, the elements of the "complex" structure declared above just have the normal float type and do not need a special LD_DEFTYPE. If an element has a type that has not been seen before in this module, a normal LD_DEFTYPE will be constructed for the type.
If the {type_code} of an LD_DEFTYPE directive is 7 (indicating a bit field), the format of the directive is
{T}{7}<tref><length>
where
For example,
{T}{7}<16><9>
describes a bit field nine bits long and having the int type.
LD_DEFTYPE directives for union and enum types are similar to those for struct types.
{T}{2}<vlref>"tag"
describes a union type with the given "tag", inside the variable list <vlref>.
{T}{3}<vlref>"tag"
describes an enum type with the given "tag", inside the variable list <vlref>. Both LD_DEFTYPE automatically start a new variable list scope, allocating a new VLREF for the variable list.
A typedef statement is usually handled with at least two LD_DEFTYPE directives. For example,
typedef char *STRING;
could be represented with two LD_DEFTYPE directives.
The format of a typedef LD_DEFTYPE directive is
{T}{0}<tref><vlref>"name"
where
The LD_DEFVAR directive describes data objects (functions and variables) in a program. There is an LD_DEFVAR directive for every variable and function used in the program. (Note that there is NOT a separate LD_DEFTYPE directive for every variable and function. LD_DEFTYPE directives are only issued when a NEW data type is encountered.)
In order to describe a variable, one needs several pieces of information: the storage class, the data type, the name, the scope, and one or two other details depending on the type of variable being described.
The format of the LD_DEFVAR directive is
{V}{class}<tref><vlref><offset><segref>"name"
where
Storage Classes:
  0 -- external
  1 -- static
  2 -- auto
  3 -- register
  4 -- argument
  5 -- structure tag (*)
  6 -- union tag (*)
  7 -- enum tag (*)
  8 -- structure element
  9 -- union element
 10 -- enum element
 11 -- debugger use (*)
 12 -- typedef (*)
 13 -- display
The classes marked with a star will never actually appear in an LD object file, but the class codes are reserved for the internal use of support software.
Most of the above classes are self-explanatory, but the need for an "argument" class may need some clarification. In C, function arguments are semantically and syntactically the same as "auto" variables. However, the actual machine code that deals with arguments is sometimes radically different from the code that deals with other auto variables, and the debugger must know the difference in order to get things straight.
The "display" class indicates that the object is a compiler-generated auto which contains a pointer to the most recent stack frame of its lexical parent. This is used for languages like Pascal, where subprograms can be local to other subprograms.
Below we list the possible values of <offset> depending on the type of object.
All subsequent LD_DEFVAR directives defining elements will give the VLREF of the structure that contains the elements. When the code generator comes to the end of the structure, it will issue an LD_ENDVLIST directive to indicate the end of the variable list that contains the structure elements.
As an example, consider the following declaration.
struct complex { float x; float y; } Z1;
This would generate the following directives.
{T}{1}<VLREF of enclosing VList>complex
        -- LD_DEFTYPE for struct
{V}{8}<TREF of float><VLREF of struct><0>x
        -- LD_DEFVAR for x, offset 0
{V}{8}<TREF of float><VLREF of struct><Bits in float>y
        -- LD_DEFVAR for y, offset is number of bits in float
{}}<VLREF of struct>
        -- show end of struct V List
{R}{01}Z1
        -- reference to Z1 (global)
{V}{0}<TREF of complex><VLREF of enclosing VList><0><SEGREF of Z1 segment>Z1
        -- Z1 is external, of type complex, offset of 0 in Z1's segment
{S}{01}<SEGREF of enclosing segment>Z1
        -- creation of global segment for Z1
{L}[sizeof Z1]
        -- set size of Z1 segment
{O}{ORIGIN_RELOC}{0}<SEGREF of Z1 segment>
        -- relocation triplet for size of Z1
{A}<SEGREF of Z1><bit alignment of struct>
        -- set alignment for Z1
Note that the code generator creates an LD_REFER directive for the reference to Z1 before it issues the LD_CRSEG directive that actually creates Z1's segment.
The LD_SCOPEVAR directive is used instead of LD_DEFVAR, for names that have associated scopes: the names of functions and labelled blocks. LD_SCOPEVAR specifies all the information the LD_DEFVAR does and also connects the name with its associated scope. The format of the directive is
{X}{class}<tref><vlref><offset><segref><scope>"name"
where the <scope> argument gives the VLREF of the scope that is associated with the name, and all other arguments are the same as those for LD_DEFVAR. The <tref> will either be a function type (in which case the scope will be the outermost scope of the function given by "name"), or else a block label type (for languages that support labelled internal blocks).
The LD_SCOPEFLAGS directive lets you set or change certain attributes of a scope (variable list). The directive has the form
{z}<VLREF>{flags}{flags}...
where <VLREF> identifies the variable list (scope) whose attributes are being set, and each {flags} is a flag byte as described below.
A flag byte can have one of two forms:
LF_SET | LF_flag
LF_CLEAR | LF_flag
(where '|' represents the C bitwise OR operation). If LF_SET is used, the attribute flag is turned on; if LF_CLEAR is used, the attribute flag is turned off. The possible attribute flags are LF_ROOT_SCOPE and LF_SAME_FRAME.
There are four possible combinations of these flags.
The flags used by LD_SCOPEFLAGS have the following numeric values.
LF_SET         0
LF_CLEAR       1
LF_ROOT_SCOPE  2
LF_SAME_FRAME  4
By default, LF_ROOT_SCOPE is off and LF_SAME_FRAME is on (which is appropriate for a C inner scope).
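The flag-byte arithmetic can be sketched as follows. The helper function is our own invention, but the numeric values are those listed above.

```python
# Sketch of LD_SCOPEFLAGS flag bytes, using the numeric values listed
# above.  A flag byte ORs an operation (LF_SET or LF_CLEAR) with the
# attribute flag it applies to.
LF_SET = 0
LF_CLEAR = 1
LF_ROOT_SCOPE = 2
LF_SAME_FRAME = 4

def flag_byte(operation, attribute):
    """Build one flag byte: operation | attribute."""
    return operation | attribute

# Defaults for a C inner scope are LF_ROOT_SCOPE off, LF_SAME_FRAME on;
# a scope with its own stack frame would instead clear LF_SAME_FRAME:
clear_same_frame = flag_byte(LF_CLEAR, LF_SAME_FRAME)   # value 5
```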
The LD_LINETAB directive shows how source code is broken into text lines. This lets a debugger associate compiled code with lines in the original source file. The format of the directive is
{l}<vlref><counter><line#>{stat_type} <counter><line#>{stat_type}...
where <vlref> gives the variable list (scope) that contains the code, <counter> gives the location of the statement's generated code within the scope's segment, <line#> gives the source line number, and {stat_type} is a statement type code from the list below.
 0 -- expression
 1 -- break statement
 2 -- goto
 3 -- continue
 4 -- return
 5 -- if
 6 -- test of for loop
 7 -- switch
 8 -- while
 9 -- repeat (Pascal)
10 -- else
11 -- assignment
12 -- initialization of for loop
13 -- do of do-while
14 -- while of do-while
15 -- call
16 -- write/writeln (Pascal)
17 -- with (Pascal)
18 -- until (Pascal)
19 -- miscellaneous
20 -- increment of for loop
21 -- end of if
22 -- end of while
23 -- end of for
24 -- end of else-if
25 -- end of switch
26 -- beginning of function definition
27 -- end of function definition
28 -- return statement with expression
29 -- file name
30 -- beginning of inner scope
31 -- end of with (Pascal)
32 -- read/readln (Pascal)
33 -- start of input file
34 -- restore to previous file
35 -- line number of end of file
36 -- marks end of statement if ambiguous
Note that statements with many parts to them (e.g., for) have a different code for each part so that the parts may be distinguished.
The <counter>, <line#>, and {stat_type} values form a triplet. If a particular line has more than one statement or statement type on it, one of these triplets will be issued for each one.
In current compilers, the code generator does not put out an LD_LINETAB directive every time it comes to a new statement or statement type. Instead, it saves information about a number of statements and then puts out one large LD_LINETAB directive with a number of triplets in it. Accumulated LD_LINETAB information must be flushed before the compiler can change the name of the current input source file.
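The batching behaviour described above can be sketched like this. The class and the tuple representation of a directive are illustrative only; a real code generator would emit the binary record format.

```python
# Sketch: accumulate (counter, line#, stat_type) triplets and emit
# them as one LD_LINETAB directive.  The tuple form stands in for the
# real binary encoding; all names here are illustrative.

class LineTab:
    def __init__(self, vlref):
        self.vlref = vlref
        self.triplets = []                 # pending (counter, line, type)

    def note(self, counter, line, stat_type):
        self.triplets.append((counter, line, stat_type))

    def flush(self):
        """Emit one {l} directive for all pending triplets, or None."""
        if not self.triplets:
            return None
        directive = ("l", self.vlref, tuple(self.triplets))
        self.triplets = []                 # flushed before the current
        return directive                   # input file name can change

tab = LineTab(vlref=7)
tab.note(0, 10, 0)      # expression on line 10
tab.note(4, 11, 28)     # return-with-expression on line 11
d = tab.flush()         # one directive carrying both triplets
```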
The LD_DEBUG_INFO directive provides a summary of information about the debugging directives associated with a module. By reading the LD_DEBUG_INFO directives, a program can determine important facts about the directives, in preparation for creating debugging tables for the module. LD_DEBUG_INFO has the form
{d}(codescopes)(trefs)(filenames)(linetabs)(ltabentries)(vars)
where each argument is a ULONG count: (codescopes) gives the number of code scopes, (trefs) the number of TREFs, (filenames) the number of source file names, (linetabs) the number of LD_LINETAB directives, (ltabentries) the total number of line table entries, and (vars) the number of variables described by LD_DEFVAR and LD_SCOPEVAR directives.
An object file coming fresh from a compiler's code generator has many LD_CRSEG, LD_NAME, and LD_REFER directives in it. Once the various library routines have been linked in to the object code, almost all of these can disappear.
Almost all segment creation directives can be removed -- once all the various pieces of the program have been brought together, the loader can choose an actual location for all the segments that are embedded in other segments. These segments can then be represented as offsets within the enclosing segment.
Since there are only a few segments which are not embedded in other segments, there are only a few LD_CRSEG directives required. LD_REFER and LD_NAME directives are not required because all references can be resolved. LD_ALIGN directives are not required because the embedded segments can be properly aligned at the time that the associated LD_CRSEGs are resolved. In fact, the entire object file can be reduced to a handful of LD_CRSEGs, a large number of LD_DATA/LD_RELOC pairs, and whatever debugging directives are to be included as part of the program. This format is called Final Object Format.
On most systems, object code never reaches Final Object Format. The object code is usually converted to the object format that is used on the target system. However, on a system that used LD object format as its object standard, Final Object Format would be used to store most compiled programs.
The LD_MARKER directive is used to indicate that an LD file is in Final Object Format, and to separate the file into its logical divisions. The format of LD_MARKER is
{*}
where {*} is a byte containing the ASCII character '*' (asterisk). The DATA field is empty.
LD_MARKER can be the first directive of an object module. Its presence indicates that directives of the module are arranged in Final Object Format. LD_MARKER also appears between key divisions of the object code. Below we show the divisions:
*   marker
LD_CONTROL directives
segment creation, reference, description directives
    (LD_CRSEG, LD_NAME, LD_REFER, LD_SEGINFO, etc.)
*   marker
LD_DATA/LD_RELOC directives
*   marker
Debugging directives
If no debugging tables are desired, LD may skip the debugging directives by stopping at the third marker.
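A reader of Final Object Format can locate the divisions by scanning for the markers. The sketch below models directives as strings whose first character is the directive byte; this is a simplification of the real record format.

```python
# Sketch: split a Final Object Format module into its divisions by
# scanning for LD_MARKER ('*') directives.  Directives are modeled
# as strings whose first character is the directive byte.

def split_divisions(directives):
    divisions = [[]]
    for d in directives:
        if d.startswith("*"):
            divisions.append([])      # a marker opens the next division
        else:
            divisions[-1].append(d)
    return divisions

module = ["*", "LC_LDVERSION", "LD_CRSEG",      # control + segments
          "*", "LD_DATA", "LD_RELOC",           # data/relocation
          "*", "LD_DEFTYPE"]                    # debugging
parts = split_divisions(module)
# A loader that wants no debugging tables stops at the third marker,
# i.e. it simply never reads parts[3].
```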
An LD object library is a collection of LD object modules, gathered into a single file for ease of use.
An object library should have the following properties.
There are several directives that are used solely inside LD libraries. These directives provide information about the library itself.
Each module has an associated LD_LENGTH directive, telling the length of the module in bytes. The directive has the format
{b}(length)
where (length) is a ULONG value giving the total number of bytes in the module. The LD_LENGTH directive for a module immediately precedes the module in the library.
The index of a library describes the modules contained in the library. The location of the index is given by the seek address contained in an LC_LIB_HEADER directive at the beginning of the library file. (LC_LIB_HEADER was described in Chapter 3.)
The index contains several sections, separated from one another by LD_MARKER directives. The sections are listed below, in the order in which they appear.
The directives that make up these sections are described in later sections.
The end of the index is indicated by a directive with a first byte of an ASCII NUL (octal 000). This directive has a zero {length} byte, and therefore a zero checksum. The same sort of directive is used to mark the end of each module.
The LD_INDEX_HEADER directive specifies the number of modules that are stored in the library. It has the format
{H}(number)
This is always the first directive of the index. Its location is given by the seek address in the LC_LIB_HEADER directive at the beginning of the file.
LD_LOCATOR directives appear immediately after the LD_INDEX_HEADER directive. There is one LD_LOCATOR directive for every object module in the library. These are given in the order that the modules appear in the library. The format of each directive is
{I}<objlen><spacelen>
where <objlen> gives the length in bytes of the object module itself, and <spacelen> gives the number of bytes of file space that the module occupies (the module plus any padding).
This information is enough to let a program reading the file calculate the seek address of each module within the library.
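The calculation is a simple running sum. The sketch below assumes the modules are laid out one after another starting at some known base offset, with each module occupying <spacelen> bytes; field encodings are simplified to plain integers, and the names are ours.

```python
# Sketch: given the (objlen, spacelen) pairs from the LD_LOCATOR
# directives, compute the seek address of each module.  Assumes
# modules follow one another from a known base offset, each taking
# up <spacelen> bytes of file space.

def module_offsets(locators, base=0):
    """Return the seek address of each module, in library order."""
    offsets = []
    addr = base
    for objlen, spacelen in locators:
        offsets.append(addr)
        addr += spacelen      # next module starts after this one's space
    return offsets

# Three modules occupying 0x100, 0x80 and 0x200 bytes of space:
addrs = module_offsets([(0xF0, 0x100), (0x7C, 0x80), (0x1FE, 0x200)],
                       base=64)
# addrs == [64, 320, 448]
```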
LD_INDEX_NAME directives specify the name of a SYMREF or SYMDEF. Such directives are found in the SYMDEF and SYMREF sections of the index. The format of the directive is
{n}"name"
Each LD_INDEX_NAME directive is immediately followed by an LD_INDEX_ENTRY directive. These directives provide information about the symbol named in the LD_INDEX_NAME. In the SYMDEF section, the LD_INDEX_ENTRY directive lists modules that contain SYMDEFs; in the SYMDEF cross-reference section and the SYMREF section, the directive lists modules that contain SYMREFs. Modules are referenced by number; the first module is number 1. The format of LD_INDEX_ENTRY is
{e}<module_number><module_number>...
When LD_INDEX_ENTRY directives appear in the SYMDEF section of the index, the first module number in the directive is the number of the library module that contains the primary definition of the symbol named in the accompanying LD_INDEX_NAME directive. (Recall that an LD library can only contain one primary definition for each SYMDEF.) If this kind of LD_INDEX_ENTRY directive contains additional module numbers, they tell which modules contain secondary definitions for the symbol. If a symbol has secondary SYMDEFs but no primary one, the first <module_number> in the list will be zero.
If a module contains a COMMON definition or reference to the symbol, the LD_INDEX_ENTRY directive will contain the negative of that module number.
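The conventions above (module numbering from 1, a zero first slot for "no primary definition", negated numbers for COMMON) can be decoded as follows; the function and dictionary keys are illustrative, not part of the format.

```python
# Sketch: interpret the module numbers of a SYMDEF-section
# LD_INDEX_ENTRY.  0 in the first slot means no primary definition;
# a negative number marks a COMMON definition/reference in that module.

def decode_symdef_entry(numbers):
    primary = numbers[0] if numbers[0] > 0 else None
    secondary = [abs(n) for n in numbers[1:] if n != 0]
    common = [abs(n) for n in numbers if n < 0]
    return {"primary": primary, "secondary": secondary, "common": common}

# Primary definition in module 3, secondary definitions in modules 5
# and 7, where module 7 holds a COMMON definition/reference:
entry = decode_symdef_entry([3, 5, -7])
```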
The Module Information section of the library index is made up of LD_INDEX_INFO directives. There is an LD_INDEX_INFO directive for each module in the library. The directives appear in the same order as the modules.
LD_INDEX_INFO has the format
{m}'time'{mflags}"filename"
where 'time' gives the time and date that the module was compiled or assembled, {mflags} is 000 if tables are not present and 051 if they are, and "filename" is the file that contained the original source code for the module. The file name used will be the file name in the first LC_FILENAME directive that appeared in the original LD file.
The remaining chapters of this document describe the use of an RU (run-unit) and its internal format.
An RU is a file that represents part of the in-memory image of a running program. We emphasize that it is only part of a program. A full program may be made up of the contents of an RU, plus data and software obtained from shared libraries and other RUs, as well as material supplied by the operating system.
Since an RU is only part of a program, it makes sense to use the RU format to represent anything that can be part of a program. In particular, the RU format will be used to represent shared libraries.
An RU is created at Link Time. When the RU is linked, the linker merges a number of object modules to form the RU. To reduce the amount of work that must be done in linking, the RU has the same format as an LD object module.
Since an RU only contains part of a program, it may contain references to items which are not found within the RU. Ideally, the linking process should resolve as many of these references as possible. The more references resolved at linking time, the fewer you have to resolve each time you run the RU.
We use the term instantiation for the point at which the contents of an RU are placed into virtual memory. This is not the same as "running" the program, since the contents of an RU may be placed into memory long before they are actually used. As we have said, an RU only contains part of a program, and the part that it contains may not be used for quite some time.
Since an RU only constitutes part of a program, a program may be put together from several RUs. These RUs can all be instantiated at the same time, or they can be instantiated as they are needed. For example, suppose a program occasionally needs to use an RU named X. If it usually doesn't need X, the RU loader may choose not to instantiate X when the rest of the program is invoked. Instead, it will only instantiate X if and when X is needed. This process is called dynamic linking.
Before we go on, it will be useful to introduce a number of new terms and to clarify the meaning of known terms.
A partition has associated attributes. Some of these are the same as page attributes (writable, privileged, and so on). Partitions may also have attributes which are not available with hardware pages (e.g. sharable).
Segments can overlap with one another, if the corresponding segment descriptors frame overlapping areas of virtual memory. For a well-formed RU, however, the memory area associated with a segment must be entirely contained within a single partition. (You can have a segment whose memory area is the entire partition, but you cannot have a segment with memory in more than one partition.)
There are two types of segments: descriptor segments and operand segments.
Each RU is divided into three parts: the body, the external profile, and the internal profile.
The body provides the actual memory image that the RU will produce, plus all the information needed to instantiate it. The memory image is divided into partitions, which are the basic building blocks of the associated program. Each partition is separate from the other partitions, and the instantiation process may arrange partitions in any order.
The body specifies the size of each partition, plus any initialization values for the partition. The initialization values may include hardware segment type information and flags, if a partition contains descriptors.
The entire partition represents an area of virtual memory. When the partition is instantiated, the pages for this memory area will be given sufficient permissions for the attributes of the partition; for example, the pages will be writable if the partition is. However, the partition itself is just an area of memory. The processor will not let code access this memory unless the code has a segment descriptor that frames the memory.
For the purposes of RU loading and initialization, the RU loader can generate descriptors as needed, by using its privileges. On the other hand, a program obtains its initial set of descriptors via the relocation information in the RU.
Partitions are independent of each other. The RU loader may arrange them in memory in whatever order happens to be convenient or efficient. When partitions are small, the RU loader may put several partitions into the same virtual memory page, provided that they all have the same attributes. This allows more efficient use of memory.
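The packing the loader is allowed to do can be sketched as a simple grouping by attributes. The page size, names, and first-fit strategy below are assumptions for illustration, and each partition is assumed to fit within one page.

```python
# Sketch: partitions with identical attributes may share a page.
# Group by attributes, then fill pages first-fit within each group.
# Assumes each partition fits within a single page.

PAGE = 4096

def pages_needed(partitions):
    """partitions: list of (name, size, attrs).  Return pages used."""
    groups = {}
    for name, size, attrs in partitions:
        groups.setdefault(attrs, []).append(size)
    pages = 0
    for attrs, sizes in groups.items():
        used = 0
        for s in sizes:
            if used + s > PAGE:   # start a fresh page for this group
                pages += 1
                used = 0
            used += s
        if used:
            pages += 1
    return pages

# A and B share one read-write page; C needs its own read-only page:
n = pages_needed([("A", 1000, "rw"), ("B", 1000, "rw"), ("C", 1000, "r")])
```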
The RU loader may generate several descriptors framing disjoint areas of a partition. However, the partition is still treated as an "atomic quantity". If part of the partition is swapped in or out, the whole partition is swapped in or out.
A partition can be reduced either in size or in attributes to give a subpartition. This is analogous to the hardware shrink operation.
It is possible to create segment descriptors that only refer to a part of a partition, but you cannot have a segment that straddles two partitions. Segment descriptors referring to the same region of memory can have different attributes; for example, one segment descriptor may let you write to the memory, while another marks the memory as read-only.
An RU can define as many partitions as it wants. This gives LD's RU format an advantage over the stock GCOS8 format, which only provides four partitions:
The stock GCOS8 arrangement does not provide sufficient control over sharing; if you want to share one piece of data, you have to share all similar data. The stock arrangement also precludes demand segmentation (discussed later).
In practice, LD does not place more than one segment in each partition. This gives the RU loader the maximum freedom to organize partitions, thereby allowing more efficient use of memory.
Typically, the RU body specifies all the information needed to create the partitions of the RU, then all the information needed to initialize those partitions. The body may also contain references to items in other RUs; later on, we will describe how these are handled.
Linking between RUs takes place at the subpartition level. An RU that references a subpartition may generate further "sub-subpartitions", using descriptors that frame only part of a subpartition.
The body of the RU provides "handles" to all externally visible partitions, subpartitions and entry points. A handle is similar to a SYMDEF (but a SYMDEF is not necessarily a handle). External entities (for example, other instantiated RUs) use handles to get at partitions, subpartitions and entry points. These handles are the only access points that are available by name after an RU has been instantiated.
In addition to named entry points, an RU may have one unnamed entry, called the primary entry. Typically, an RU has named subpartitions/entries or a primary entry, but not both (although the format does not forbid it). The purpose of the primary entry is to give a point where execution of a program should start. In other words, it lets you "run the RU", without having to know a specific entry name.
The external profile of an RU is a kind of symbol table that provides information which can be used when linking other programs. For example, suppose that you're linking an RU named A, and A refers to an external function named B. When LD links A, LD is passed directives that indicate A refers to B. LD therefore searches through some set of RUs to find one that contains B.
The external profile of each RU describes its contents. LD will eventually find that function B begins at a particular offset within some partition in some RU. LD can use this information to resolve such references wherever they occur within A. (As we noted earlier, we want to resolve as many references from one RU to another as possible at Link Time.)
When we say that references are "resolved", we do not mean that they will be completely resolved. Instead of being references to named functions or data objects, they are converted into references to offsets in partitions of other RUs. In our example above, the body of A will contain a reference (consisting of a descriptor and offset) to some point in a partition of the RU that contains B. When A is instantiated, this reference causes the instantiation of the RU that contains B.
The internal profile contains a symbol table and debugging information describing the body of the RU. This information is only relevant within the RU itself. It is only used by software like debuggers and the program that writes out dumps.
When a user invokes a program, the associated RU is instantiated. If the body of the associated RU references other subpartitions or entries, the RU loader searches for these entries and subpartitions within other RUs supplied by the user or within shared RUs recognized by the loader. Some of these RUs may already be in memory (for example, standard support software like the Operator Segment); others may need to be instantiated. Once the RU loader has instantiated any required RUs, the loader can resolve the references that caused the search in the first place.
To resolve one reference, the RU loader may have to instantiate an RU that contains other references that also need to be resolved. Thus one reference may require the instantiation of several RUs.
Each partition is instantiated as an area of virtual memory, in pages whose attributes are compatible with the partition attributes. The RU loader must record the primary entry (if it exists), plus any externally-visible entry descriptors. The names of these entry points are recorded so that they can be used in search rules (described later).
A program may specifically name the files that contain the RUs that it needs, or else these files may be located through search rules.
An instantiated RU may share material with other RUs. Sharing can take place between RUs put into execution by different users, or between RUs running simultaneously for the same user. Sharing reduces the total memory requirements for the system. It also makes possible a number of programming techniques that cannot be used when every program operates in its own separate environment.
Some people talk about "shared domains". However, on the DPS-8 and DPS-90, you do not share domains, you share segments (and thus you share the associated subpartitions). In essence, a subpartition can belong to several domains at once. It is incorrect to think of a subpartition as "belonging" to a particular domain.
If we must speak of ownership, it is better to speak of partitions being "owned" by RUs. A partition can only come into being two ways:
The ownership of a partition is mainly intended to control the deletion of the partition when the owner is finished with it.
A particular shared RU may have some partitions marked as sharable or unsharable. This marking is independent of the write permissions on the segment. When such an RU is instantiated into a shared working space, only the shared partitions are actually instantiated. The unshared ones are recorded, and the RU file is kept available for I/O. When any partition of such an RU is referenced from a user working space, all unshared partitions are instantiated and initialized in the user's working space (using information taken from the RU file). Any references from the shared RU are resolved at this point.
As noted earlier, sharing is done at the subpartition level. Making a subpartition available for sharing is easy: you simply store the parent partition in a location that is available to all programs. In hardware terms, this means that you access shared subpartitions using descriptors that refer to a working space register whose value doesn't change as programs are swapped in and out.
At instantiation time, a partition with sharable subpartitions cannot contain a reference (descriptor) to an unshared partition.
An entry definition implicitly contains a reference to a partition: the partition that contains the linkage segment defined by the entry descriptor. Thus, for instantiation purposes, resolving an entry reference implicitly resolves a partition reference as well.
A routine invoked through a shared entry cannot refer to an unsharable partition.
In order to make shared library units (SLUs) useful in the GCOS8 NS mode environment, it should be possible for outside groups to write SLU software in any supported language, using a programming style that is natural to that language. Without this requirement, it will be impossible for sites to make use of commodity software like the IMSL library. (IMSL is a library of mathematical and statistical routines.)
As a result, each SLU must have a fixed SEGID for its code and another for its static data. If an SLU does not have these fixed SEGIDs, a library routine will have great difficulty locating the static data and other library routines. (Certainly, most software could be written in a style that avoided such difficulties, but this style is not natural in most programming languages.)
There are only 1000 fixed SEGIDs available for use. Realistically, we should leave half of these for user libraries and routines. This leaves 500 for system-supplied software. Since each SLU requires one segment for its code and one for its static data, this makes it possible to have 250 shared libraries. Of these, a good number (maybe half) are used up for system operations.
Each SLU has its own pair of fixed SEGIDs. Obviously, no two SLUs can have the same SEGID(s); if any two have a SEGID in common, a program will not be able to use the two libraries together. To avoid conflicts, a large number of the available SEGIDs should be reserved for known products (e.g. the IMSL library).
The first part of each code segment should be a collection of transfer vectors to the library's routines (in later parts of the segment). With this organization, it is easy to add new routines and recode existing ones transparently. It is also easy to discontinue support for old routines, by changing the transfer vector to jump to an appropriate handler routine.
The first part of each static data segment should contain externally visible data. Ideally, these should be pointers, so that they are less likely to need to be moved as the library is changed. Private data should come in later parts of the segment. This lets you change the locations of the library's private data without affecting the outward appearance of the software.
There are two ways in which a library can be changed: transparently and visibly. A transparent change is not visible to user programs--the interface to all existing routines remains the same, as does the type and location of all user-visible data objects. A simple bug fix is a common sort of transparent change. A visible change is one that changes the appearance of the SLU (for example, changing the interface to an existing routine).
A visible change makes it necessary to recompile all programs that use that SLU (if the change is not backwards compatible). Thus visible changes should be avoided whenever possible.
To accommodate change, every SLU should have two version numbers: one that describes the visible version and one that describes the actual version (including transparent changes). When a user program uses an SLU, it should only have to specify the visible version. The actual version is only of interest to those who are maintaining the library (for example, when they are trying to track down bugs).
More generally, it is desirable to let SLUs specify a range of version numbers (or a list, or some other mechanism) that tells which visible/transparent versions the SLU supports. This allows the SLU to specify its degree of backward compatibility. To allow non-ambiguous identification of SLUs, it is a good idea to associate some kind of checksum with each library as an additional identifier.
The overall SLU version number should appear in part of the exported reference name of the SLU. This will only be updated when a non-upward compatible visible change is made.
There are two mechanisms that a program should have available when it wants to use an SLU.
When we say that a program requests access to an SLU, we do not mean that the programmer must code an explicit call. In most languages, the call is generated automatically when the program is linked, and put into the domain start-up code. An explicit call would only be needed in languages which can construct calls on the fly, and even in such languages, the call could often be generated by underlying support routines.
Note that the program specifies version numbers at the time that the library is brought in. Therefore, the routines of the library do not need to check version numbers.
Each SLU needs an initialization mechanism which is invoked when dynamic attaching takes place. This initialization code is responsible for the following:
The first 32K of every SLU code segment will be reserved for transfer vectors. The actual routines of the segment will be linked with LD, and the transfer vectors filled in appropriately.
When an SLU is linked, LD generates an appropriate run-unit, including an external profile naming all the routines in the SLU and references to any other libraries that the SLU depends on.
For example, a C version of the IMSL library would probably make calls to the standard C library. The external profile would be referenced when linking any program that called the SLU. It would also be used when creating a new version of the SLU, to make sure that every visible name is put in the same place as in the previous version.
Each SLU must have an associated version number. This is done by making version numbers part of each routine name. (Note: Readers may wonder why version numbers are not passed as arguments. Passing them as arguments just makes it that much harder to look at a routine and determine its version.)
Incorporating the version number into routine names (and into data names as well) makes it easy for one SLU to support several versions of the same routine: it just contains the different versions under their different names.
As an example of how you would support functions with version-specific names, suppose that a typical release contains enough routines to require 4K of transfer vectors. Then visible release 1 takes the first 4K, visible release 2 takes the next 4K, and so on. When the SLU finally decides it will no longer support visible release 1, the first 4K transfer vectors are freed for re-use. The version-checking mechanism makes sure that programs can no longer use the version 1 interpretations of those transfer vectors.
(Once an SLU contains a particular name associated with a location in the transfer vectors or the first part of the data segment, neither the name nor the location can be used by later releases, until support for the original release is discontinued.)
Using this scheme, a single SLU has no difficulty supporting several releases of the same routine. It is much easier and less bug-prone to use this approach, because it uses the same code as the previous release, not some attempted simulation. Also, supporting several releases eliminates the need to update a lot of code simultaneously; you can put in a new SLU without having to update a lot of old code.
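The release arithmetic sketched above is straightforward. The 4K block size follows the example in the text; the function name and the word-granularity addressing are our own assumptions.

```python
# Sketch: each visible release gets its own 4K block of transfer
# vectors, so a routine's slot is determined by its release number
# and its index within that release.  The 4K figure follows the
# example in the text; other details are illustrative.

BLOCK = 4096

def vector_address(release, index):
    """Address of transfer vector `index` within visible `release`."""
    if release < 1 or not (0 <= index < BLOCK):
        raise ValueError("bad release or index")
    return (release - 1) * BLOCK + index

addr = vector_address(2, 10)   # 11th routine of visible release 2
# addr == 4106
```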
An RU library is a single file that contains several RUs. For example, you might have an RU library that contains several versions of the Operator Segment.
Different versions of the same RU will have similar external profiles. For example, if you have several Operator Segments in the same RU library, they will all define routines like .CALL, .RETRN, and so on. The external profiles will not have unique entries, and therefore cannot be used as the "keys" for distinguishing different RUs in the library.
Instead, the keys have to be the names of partitions in the body of each RU. These must be unique. For example, if each RU has a version of the Operator Segment, Version 1.0 may have a partition named OPSEG1.0, Version 1.1 may have a partition named OPSEG1.1 and so on.
Having discussed the features that the RU format must support, we can proceed to describe the format itself.
Basically, the RU is a file made up of LD-style directives. This allows the RU to be read, written, and maintained by the same routines as LD files and libraries.
The internal profile consists of standard LD debug information directives. These directives may be converted into a more useful in-memory format at the time that the program is loaded under control of a debugger.
The body of the RU begins with a few LD information directives, stating that this is an RU and providing general information about the data (file name, create date/time, etc.). Specifically, we expect the following.
LC_LDVERSION
LC_TARGET_INFO
optional LC_FILENAME, LC_MODULE, LC_REVISION, LC_CPR, LC_TITLE in any order
RU_LOCATOR
The RU_LOCATOR directive gives the seek address of the beginnings of the external profile and internal profile. (When we talk about a seek address here, we mean a byte offset from the beginning of the RU itself, not the RU library as a whole.) RU_LOCATOR is fully described in a later chapter.
The body of the RU is divided into two parts. The first part contains directives that create:
The directives used are RU_PARTITION, RU_SUBPARTITION, RU_ENTRY, RU_PRIMARY_ENTRY, RU_REFER, and RU_EXPORT. These are fully described in a later chapter.
There is no need to give each partition its own page or set of pages. The RU loader may put several partitions on the same page, provided that the attributes of all the partitions are the same. (We want to emphasize that attributes describe the hardware page that holds the partition, not the segments that frame all or part of the partition.) It is LD's job to ensure that the partition options are compatible with the segments that are placed in the partition.
The end of the first part of the body is marked with an
LD_MARKER
directive.
The second part of the body consists of directives that initialize the contents of the partitions. These are called RU_DATA and RU_RELOC. They are similar to the LD_DATA and LD_RELOC directives that initialize memory in LD files.
The end of the body is indicated with an
LD_END
directive.
The internal and external profiles follow the body of the RU. The profiles are made up of similar directives. The end of the RU is indicated by a zero directive (i.e. one consisting only of a null byte, followed by a zero length and checksum).
The instantiation process begins with a request to do one of the following:
Such requests are resolved by locating the appropriate RU to instantiate.
When the RU loader is asked to instantiate an RU in user space, it begins by going through all the partition information at the beginning of the body of the RU. This information may contain references to other RUs, which are also read and their partitions instantiated. Instantiation (and initialization later on) therefore takes place recursively.
The loader instantiates all the required partitions before initializing any of them. This means that all the necessary memory is allocated before any of the contents are laid down. This allows the location of partitions to be shifted around if necessary, without worrying about moving contents too. If your requirements exceed the amount of memory available, you find out at this step, before you've spent time initializing the memory.
Once all the partitions from all relevant RUs have been created in memory, the RU loader begins to copy the initialization data into the partitions. Relocation takes place at this time. Relocation can be performed quickly, because everything else is already in memory.
Once the partitions have been set up, the RU loader walks through the descriptor partitions, performing validity checks on the descriptors. For example, the validity check makes sure that the descriptors do not refer to privileged memory, and that descriptor segments (hardware descriptor types 1 and 3) do not frame parts of non-descriptor partitions. This is necessary, because users should not be able to create arbitrary descriptors by patching their RU files.
When all this has been done, the RU has been properly instantiated. Once the RU has been instantiated, the requested entry or partition is returned to the caller.
If the requested named entry point or subpartition belongs to a shared RU, only the unsharable partitions of the RU are instantiated in user space. The shared partitions are already in the shared working space. As with an unshared RU, any other RUs referenced by the shared RU are also instantiated.
Demand segmentation is based on the following set-up. A hardware descriptor has a bit that indicates whether or not the associated virtual memory is actually in core. If the bit is off, the memory is not present (for example, it's been swapped out). If you try to use such a descriptor to access the associated memory, you get a fault.
Now, there are several ways you can deal with this fault. In GCOS8, the fault handler's standard response to the fault is to abort the program. A fault handler that supported demand segmentation would bring the partition containing the memory back into core. It would then replace the original descriptor with one that has the appropriate bit turned on, indicating that the partition is now back in core, and it would update all other descriptors which framed parts of that partition. Execution then resumes, performing the same operation with the corrected descriptor. At some later time, the partition may be swapped out again and any descriptors into it are marked "not present" again.
When a desired partition is missing, it is not in memory anywhere. As we noted earlier, the operating system does its swapping using partitions rather than individual segments; therefore, if a segment is missing, all segments in the same partition are also missing. When a partition is swapped out, descriptors to any parts of the partition must be marked missing "simultaneously"; similarly, when a partition is swapped back in, all relevant descriptors must be changed "simultaneously".
Dynamic linking is a somewhat similar concept, but is only expected to take place on a CLIMB instruction. It works with a dynamic linking descriptor. A dynamic linking descriptor takes the place of a normal descriptor. Four bits in the descriptor say "This is a dynamic linking descriptor." The other 68 bits contain additional information that we'll discuss shortly.
Every time you attempt a CLIMB, the hardware checks the four bits in the descriptor to see if it's a dynamic linking descriptor. If it is, the hardware triggers a dynamic linking fault, thereby invoking the associated fault handler. The operating system's standard fault handler examines the other 68 bits of the dynamic linking descriptor and figures out what happens next. Typically, those 68 bits indicate an RU that should be instantiated; for example, they can contain a pointer to a memory address that gives the name of the RU. In this case, the fault handler issues a call to instantiate the RU and then arranges for a CLIMB to the appropriate entry point in the RU.
When we say "arranges for a CLIMB", there are two ways it can be done:
For the sake of efficiency, snapping the link is desirable. However, there are a number of situations in which it is unwise. Thus the link should only be snapped if both the caller and callee agree to snap it. Part of the information for a dynamic linking descriptor must specify this information. There is therefore an "okay to snap" bit associated with each segment and entry point definition.
From the previous discussion, it should be apparent that there are several important differences between demand segmentation and dynamic linking. Once a dynamic link has been snapped, the linked-in material never disappears; however, a segment that has been swapped in through demand segmentation may be swapped out again at some later time. Even if a dynamic link has not been snapped, the target of the link operation may be somewhere in memory; in demand segmentation, the whole point is that the desired segment isn't present in memory.
It is possible for one partition to have several dynamic links to another partition. Some of these may be snapped while others are not. On the other hand, descriptors to a swappable segment are all valid (or invalid) at the same time.
In this section, we give details of the directives that describe RU (run-unit) constructs.
Before we do that, we want to discuss the concept of a logical descriptor. A logical descriptor serves the same purpose as a hardware descriptor: framing a block of memory. The difference is that logical descriptors are not restricted by the hardware descriptor format. The hardware restrictions only apply to the actual (program-accessible) descriptors created by the relocation operations (which will themselves be remarkably similar to the operation of RU_SUBPARTITION and RU_ENTRY described below). For example, a logical descriptor framing a partition need not be limited to the one megabyte maximum size that applies to a type 0 descriptor.
An RU directive that defines or references a new entity (e.g. an entry point, partition, or subpartition) implicitly reserves an RUREF for that entity. RUREFs are similar to the SEGREFs that are used in normal LD object files. RUREFs are simply integers which are assigned sequentially. Once an RUREF has been defined, subsequent RU directives use the RUREF to refer to the associated entity.
Each RUREF is associated with a logical descriptor that refers to the same entity.
The RU_REFER directive indicates that the RU references some external entity. The directive takes the same form as the LD_REFER directive:
{R}{flags}"name"
where
The reference to "name" is resolved using the standard search rules. During instantiation, an RU_REFER may result in the instantiation of other RUs if necessary.
RU_REFER automatically reserves an RUREF for the entity with the given name. It also generates a logical descriptor. This descriptor is a copy of the logical descriptor for the entity, taken from the RU that actually contains (defines) "name".
Possible flag values are listed below, with their values:
The RU_PARTITION directive creates a partition. It has the form:
{/}<size><alignment>{flags}
where
RU_PARTITION automatically reserves an RUREF for the partition. It also creates a logical descriptor that frames the entire memory area for the partition. The {flags} argument describes the attributes needed in the page that will hold the partition. The logical descriptor specifies the maximum permissions possible for the partition, restricted by the options given for the page that holds the partition.
The possible flag values for partitions are represented by symbolic names, each of which begins with RUF. Below we list these flags and their values.
RUF_WRITE and RUF_PRIVILEGED effectively describe the attributes of the page table entry using the write control bit and housekeeping bit. These bits will also be used in constructing descriptors for the associated segments.
The RU_SUBPARTITION directive defines a subpartition of a partition. The directive has the following form:
{<}<size><offset><parent>{pflags}
where:
An RU_SUBPARTITION directive may have a flag that is not available for RU_PARTITION:
If RUF_NEW_SIZE is not specified and <offset> is zero, the newly created subpartition represents the same memory area as the parent.
As is probably obvious, RU_SUBPARTITION effectively describes a shrink operation. RU_SUBPARTITION implicitly reserves an RUREF for the subpartition being defined.
The RU_EXPORT directive is similar to the LD_CRSEG directive in a normal LD file. It creates a name that should be made available for other RUs to reference. This may be the name of a partition or subpartition.
The directive has the format
{S}{flags}<RUREF>"name"
where
The {flags} must include the LF_GLOBAL flag from the LD object format. They may also contain LF_SECOND.
When an RUF_STATIC_LINK flag is used in an RU_EXPORT directive, it indicates that static links to the name are allowed (e.g. non-dynamic references or dynamic references that may be snapped).
The RU_ENTRY directive defines an entry point. In so doing, it creates an entry descriptor. The directive has the following form:
{:}<LSR_ref><iseg><ic>
where
Every RU_ENTRY directive reserves an RUREF for the entry.
The RU_PRIMARY_ENTRY directive tells which entry (if any) should be considered the primary entry in an RU that contains a normal program. The directive has the form
{@}<entref>
where
The RU_DATA directive is analogous to the LD_DATA directive for LD files. RU_DATA specifies data for a partition. It has the form
{L}<org_seg><offset>{waste_len} {waste}...{data}...
where
We expect that most RU_DATA directives will have the extended record format described earlier in this manual.
The RU_RELOC directive corresponds to the LD_RELOC directive used to relocate data in LD files. It has the format
{O}<partition>triplets
The <partition> argument is a Dvalue giving the RUREF of the partition being relocated. After this come relocation triplets, each of which tells how to relocate items that appeared in the preceding data.
The only type of relocation expected at this point is "descriptor relocation": creating a descriptor that will be stored in a descriptor partition. For this reason, there is very little relocation necessary: normally only one relocation for each descriptor in the linkage segment (maybe 1000 relocations in all). Thus we don't have to worry about the cost of deciphering Dvalues; we just won't be doing it that often.
A relocation triplet consists of three values:
Together, the offset and the RUREF specify a location in the RU. The instantiation process is supposed to store an appropriate descriptor at this location.
Before relocation takes place, the RU holds 72 bits of data in the location that is supposed to hold the descriptor that is being generated. This 72 bits, plus the RUREF labelled (c) above, provide enough information to perform the relocation. The 72 bits contain the following information:
The program loader will build a descriptor with the given type and attributes, framing the referenced entity. The type must be compatible with the type of the entity; for example, only types 8, 9, and 11 are allowed if the referenced entity is an entry.
The RU_LOCATOR directive makes it possible to find the important parts of an RU in an RU file. The directive has the form
{>}(ext_prof_seek)(int_prof_seek)
The two arguments are byte seek addresses relative to the beginning of the RU. When a file contains a single RU (as opposed to RU libraries), these will be actual seek addresses. The (ext_prof_seek) tells where to find the external profile, and the (int_prof_seek) tells where to find the internal profile. A zero value for either of these indicates that the corresponding section does not exist.
Below we summarize the various directives of the LD object format. The LENGTH and CHK bytes have been omitted. The following notation is used:
{byte} (ulong) "string" 'time' <Dvalue> [TWORD]
(Definitions of the above concepts are given in Chapter 1.) Directives that implicitly generate a REF (SEGREF, TREF, VLREF) are marked as such.
The following flags are used in LD_CRSEG, LD_NAME, and LD_REFER.
GLOBAL == 1
COMMON == 2
SECONDARY == 4
Below we list the directives that are only found in run-units.
This appendix describes constructs that have appeared in earlier versions of the LD object format, but are now considered obsolete.
The LC_SECONDARY sub-directive of the LD_CONTROL directive has the format
{S}"symbol_name"
where {S} is the ASCII character 'S' and "symbol_name" is a string giving the name of a symbol defined or referenced elsewhere in the module. If the symbol is defined in the module, LC_SECONDARY indicates that it should be a secondary SYMDEF; if it is just referenced, it is a secondary SYMREF. In C programs, the information is obtained from a #pragma secondary preprocessor directive.
Instead of using LC_SECONDARY, the current version of the LD format marks symbols as secondary using the "flags" argument of the directive that defines or references the symbol.
This appendix describes the ways in which we intend to change the LD object format in the near future. We also list some enhancements which are desirable but not yet fully designed.
The LD_DELETE directive will have the format
{D}<segref>
This directive removes an existing segment with the given SEGREF from the LD symbol table. If the symbol with the deleted name appears again in the object code, it will be taken as a reference to a new symbol.
LD_DELETE can be used to generate a linked list (e.g. of initialization code) at link time. For example, suppose some module refers to a symbol named "list". A subsequent module can then issue the following sequence of directives.
LD_NAME "list"
LD_DELETE "list"
LD_REFER "list"
The LD_NAME directive defines "list" at the current location. The LD_DELETE then deletes the "list" symbol, and the LD_REFER creates a reference to another instance of "list" defined in some subsequent module. Thus the previous module's reference to "list" will be resolved to the "list" defined in this module, but this module's references to "list" will be resolved to a "list" in a future module. When all the modules are linked together, the set of things known as "list" will turn into a linked list.
There needs to be some way to specify a machine-dependent version number for an LD file. This is required because relocation codes on a machine may change from version to version.
There must be some way to record what produced an LD object module originally (e.g. a C compiler, or the YAA assembler.)
The debug information structure must be supplemented. At present, the debug directives cannot represent everything that LD's internal structures can; they are not capable of supporting some data types from some programming languages; and they cannot keep track of combined debugging tables obtained by linking several modules.
Some relocation codes need to be split up. For example, on the DPS-8 we need to distinguish between adding a word offset to the bottom 18 bits of a word and adding the word offset to the entire 36-bit word.
It would be useful to have versions of several directives (e.g. LD_NAME) which took offset sizes in bits rather than bytes or TWORDs.
This appendix examines features that depend on the intended target machine.
The Bull HN DPS-8 machine family includes the Bull HN DPS-88 and DPS-90 machines. Operating systems running on this hardware include G.E.'s MARK III, and Bull HN's GCOS8 (SS and NS mode) and CP6. On this family of machines, a byte is 9 bits long and a TWORD is 36 bits long (one machine word).
Possible relocation codes for LD_RELOC directives are identified by the following keywords.
The LD_SYMOPTS directive lets you specify attributes for a symbol. Attributes are specified by strings. Some strings have the form +word; these turn an option or bit on. Others have the form -word; these turn an option or bit off.
The following options can be used to set or clear NSA segment flags:
+read +write +save +cache
-read -write -save -cache
+extended +execute +privileged +accessible
-extended -execute -privileged -accessible
+bounded -bounded
You can also set segment flags with either of the following.
The default segment flag settings are 0553, which means
+read +save +cache +execute +bounded +accessible
By default, segments will be taken to be Type 0; this can be changed with the following LD_SYMOPTS option.
Many of the other LD_SYMOPTS options influence the behavior of the LD output writer which creates run-units. Loosely speaking, this output writer attempts to reduce the number of symbols, by putting eligible symbols inside other segments. Various symbol options are required to control this process. Below, we list the relevant options.
The RU output writer uses all this information to bind the segments of the program into separate hardware segments. Segments with the segment property become hardware segments. Segments without the segment property are bound into +flexible segments wherever possible; the +first32k flag is taken into consideration as this binding takes place.
The remaining symbol options are needed for proper handling of Bull HN OM object modules.
The LD software family is distributed with a library of routines that can be used to manipulate LD object files. This appendix briefly describes these routines, using prototypes of the C programming language to show how the routines are called.
The LD utility routines work with at most two files: an input file and an output file. The input file is assumed to be in standard LD format. The output file will be written in standard LD format.
In general, routines that operate on the input file have names that begin with lr_. Routines that operate on the output file have names that begin with lw_.
It is possible to read and write on the same file, provided that it is an LD library. To do this, the library should be opened with the lw_lopen routine. A library opened in this way will always be in read mode or in write mode. When it is in read mode, the library is used as the input file, and you can have a second file for the output file. Similarly, when a library is in write mode, you can have a different file as the input file. The functions lw_lwrite and lr_lread can switch a library from read mode to write mode, and vice versa. However, you cannot switch to write mode if you already have an open output file, and you cannot switch to read mode if you already have an open input file.
A non-library output file (opened with lw_open) can be changed to the input file with a call to lw_reread. However, it cannot be changed back.
The LD utility routines do not allow error recovery. If a routine encounters an error of some sort, it simply prints out an appropriate message and terminates the calling program. The one exception to this rule is lr_test, which returns a status if it fails to open a file.
Several C typedef definitions are used to define types used by the LD utilities. These are
ld_target_word -- unsigned integer big enough to hold a TWORD
ld_dvalue -- signed integer big enough to hold a Dvalue
The routines described in this section perform simple file manipulations.
#include <ld.h>
lw_open(filename,reloc_align,bits_per_byte,
    most_reloc,longest_reloc,origin_bits,target);
lw_open opens a file as the LD output file. It also outputs appropriate LC_LDVERSION and LC_TARGET_INFO directives as the first two directives of the file.
#include <ld.h>
lr_open(filename,min_version,max_version,target);
lr_open opens a file as the LD input file. It also checks that the LD version number of the input file falls into the range specified by min_version and max_version. If a non-NULL target argument is specified, lr_open verifies that this target machine matches the one named in the file's LC_TARGET_INFO directive (if any).
After this checking has taken place, lr_open positions the input file at seek position zero. Thus the first directive that the program will find is the LC_TARGET_INFO.
#include <ld.h>
ret = lr_test(file,minversion,maxversion,target);
lr_test is similar to the lr_open function in that it attempts to open a file as the LD input file. The difference is in error handling: where lr_open terminates the program if the file's version number falls outside the given range or if the file's target machine name does not match the target argument, lr_test just returns zero.
#include <ld.h>
lw_lopen(file,clearflag,index_seek,first_module);
lw_lopen opens an LD library for updating and reads the LC_TARGET_INFO and LC_LIB_HEADER directives that appear at the beginning of the library.
The file is accessed for both reading and writing, but it will begin in read mode at seek position zero. To begin writing on the file, use lw_lwrite.
If the file is cleared, there will be no LC_TARGET_INFO directive. Such a directive will be inserted by lw_lclose, using values taken from object files copied by ld_copy.
#include <ld.h>
lw_close();
lw_close writes a null directive on the current LD output file (marking the end of the file), then closes the file.
#include <ld.h>
lr_close();
lr_close closes the current LD input file.
#include <ld.h>
lw_lclose(index_seek,first_module);
lw_lclose closes a library that is currently being used as the LD output file. This file should have been opened by lw_lopen and changed to write mode with lw_lwrite. The current write position in the library MUST be positioned at the end of the index. This routine will append the required NULL directive to the end of the library header, but not to the end of the index.
#include <ld.h>
lw_lwrite();
lw_lwrite changes a library from read mode to write mode. The seek position in the library does not change. The library must have been opened with lw_lopen.
#include <ld.h>
lr_lread();
lr_lread changes an LD library from write mode to read mode. The seek position in the library does not change. The library must have been opened with lw_lopen.
#include <ld.h>
lw_reread();
lw_reread changes the current LD output file into the current input file. The file is positioned so that reading takes place at the beginning of the file.
#include <ld.h>
lr_reread();
lr_reread repositions the current input file back to the beginning of the file.
#include <ld.h>
lw_seek(pos);
lw_seek moves to a new position in the current output file. lw_seek can only be issued when the current write position is the first byte of a directive, and it can only be used to move to the first byte of another directive.
#include <ld.h>
lr_seek(pos);
lr_seek moves to a new position in the current input file. lr_seek can only be issued when the current read position is the first byte of a directive, and it can only be used to move to the first byte of another directive.
#include <ld.h>
pos = lr_tell();
lr_tell obtains a value representing the current read position in the LD input file. This can later be passed to lr_seek to return to the position.
#include <ld.h>
pos = lw_tell();
lw_tell obtains a value that represents the current write position in the LD output file. This value can be used in subsequent calls to lw_seek to come back to this same position.
The following routines all help you build an output directive.
#include <ld.h>
lw_start(code);
lw_start is the first step in writing out a directive. After this, you use other lw_ functions to write out the various fields of the directive. You do not have to calculate the length or the checksum of the directive -- these are calculated when you call lw_end to close off the directive you have built.
You do not need to create your own LD_DATA and LD_RELOC directives with this routine. Instead, you should build these directives using lw_dword and lw_reloc. The LD utilities will write out appropriate LD_DATA and LD_RELOC directives as they accumulate.
#include <ld.h>
lw_end();
lw_end indicates that you have finished building a directive. lw_end will calculate the length and the checksum for the directive just produced, fill in these fields, and write out the completed directive.
#include <ld.h>
lw_word(tword);
lw_word writes a TWORD value to the output directive that is currently being built.
#include <ld.h>
lw_byte(byte);
lw_byte writes a byte to the output directive that is currently being built.
#include <ld.h>
lw_dvalue(dval);
lw_dvalue writes a Dvalue to the output directive currently being built.
#include <ld.h>
lw_data(ptr,length);
lw_data writes a block of data to the output directive currently being built.
#include <ld.h>
lw_string(ptr);
lw_string writes a string to the output directive currently being built.
#include <ld.h>
lw_ulong(ul);
lw_ulong writes a ULONG value to the output directive that is currently being built.
#include <ld.h>
#include <time.h>
lw_time(tim);
lw_time converts a C time number into an LD time and writes out the time to the output directive currently being built.
The following routines are used to construct paired LD_DATA and LD_RELOC directives. They should not be called if there is already a partly built output directive (i.e. if you have called lw_start to begin a directive, but have not called lw_end to end it).
#include <ld.h>
lw_dword(tword);
lw_dword is used when creating an LD_DATA/LD_RELOC directive pair. The LD utility functions let you create the LD_DATA directive and its associated LD_RELOC directive "simultaneously".
The first step in the process is to indicate the segment and offset where the data should be placed. This is done with external variables declared as
extern ld_dvalue lw_segment;
extern ld_dvalue lw_origin;
Assign the appropriate SEGREF to lw_segment and the offset of the data to lw_origin.
Next, call lw_reloc to specify relocation information for the data. The first argument of lw_reloc gives the SEGREF of a relocatable symbol. The second argument gives the desired relocation code. If there are several relocations to be applied to the data, issue all of them via separate lw_reloc function calls.
Finally, issue an lw_dword call for the TWORD you want to write out. This call will increment lw_origin automatically, so you don't have to adjust lw_origin if you are going to output data to the next TWORD position in the same segment.
Information produced via lw_reloc and lw_dword will be accumulated as it is produced. Just before closing the file, you should call lw_flush to flush the accumulated data and relocation information, producing LD_DATA and LD_RELOC directives.
#include <ld.h>
lw_reloc(segref,reloc_code);
lw_reloc writes out relocation information about relocatable data. For further information, see the description of lw_dword.
#include <ld.h>
lw_flush();
lw_flush is used to flush accumulated data and relocation information prior to closing off an object module.
#include <ld.h>
lw_lend(segref);
lw_lend is used to flush and close off a literal segment. The literal segment should have begun with an LD_LITERAL directive, which implicitly claimed a SEGREF for the literal segment. Then came calls to lw_dword and lw_reloc to create the contents of the literal segment. lw_lend flushes the accumulated data and relocation information (if necessary), and writes out an appropriate LD_END_LITERAL directive to end the literal definition.
The following routines all obtain information from the input LD file. The read routines handle continuation directives properly, so that you can just keep reading into the continuation directive. However, if you read past the end of a directive and there is no continuation, the read routines will return garbage. The lr_eor function lets you determine when you have reached the end of a directive.
#include <ld.h>
code = lr_getdir();
lr_getdir reads a new directive from the LD input file. Subsequent lr_ functions can be used to read fields from the directive. For a control directive, the code returned has the form

(LD_CONTROL << 8) | LC_code

where LD_CONTROL is '#' and LC_code is the code of the sub-directive.
If the previous directive has been completely read, lr_getdir verifies its checksum. Otherwise, lr_getdir skips over whatever is left of the previous directive and goes to the beginning of the next.
#include <ld.h>
byte = lr_byte();
lr_byte reads a byte from the input file.
#include <ld.h>
ul = lr_ulong();
lr_ulong reads a ULONG value from the current LD input file.
#include <ld.h>
lr_data(ptr,length);
lr_data reads a block of data.
#include <ld.h>
str = lr_string();
lr_string reads a string from an LD file and returns a pointer to that string. The length of the string is based on the number of bytes that remain in the directive being read. Space for this string is obtained using malloc.
lr_string adds the usual '\0' to mark the end of the string.
#include <ld.h>
str = lr_sstring();
lr_sstring is exactly like lr_string except that it stores the input string in a static location instead of obtaining space with malloc. Each call to lr_sstring will overwrite the string obtained by the previous call. For further information, see the description of lr_string.
#include <ld.h>
lr_vstring(ptr);
lr_vstring reads a string from the LD input file and stores it in the given VString. The result can then be manipulated by the VS_ routines in the utility library.
#include <ld.h>
dval = lr_dvalue();
lr_dvalue reads a Dvalue from the current input file.
#include <ld.h>
tword = lr_word();
lr_word reads a TWORD from the current LD input file.
#include <ld.h>
#include <time.h>
tim = lr_time();
lr_time reads a time value from the current LD input file and converts it to the time_t format recognized by standard C functions.
#include <ld.h>
length = lr_length();
lr_length tells how many bytes of the current record have not yet been read. This is always a positive integer, except for records that are continued by LD_CONTINUATION records; in that case, lr_length returns the negative of the number of bytes left in the record, to indicate that the record continues in a new record.
Note that the correct length is returned for extended records.
#include <ld.h>
bool = lr_eor();
lr_eor determines whether or not you have reached the end of the current record in the input file.
The following routine performs general LD file operations.
#include <ld.h>
ret = ld_copy(length);
ld_copy copies the given number of bytes from the input file to the output file. It should only be used to copy complete directives.
ld_copy checks that the target machines of the two files match, but does no other checking. For example, it does not check to see if the checksums are correct.
In this appendix, we will examine the work that LD output writers do in preparing various output formats. Most of these formats are system-specific, and will be marked as such.
Before writing an LD file, all relocation in the LD object code is simplified so that the triplet in the LD_RELOC directive will always refer to a SYMREF or a segment created by an LD_CRSEG directive.
Next, all LPTR_RELOC relocations are replaced with ADDR_RELOC relocations wherever addressability is guaranteed. (See Appendix D for more discussion of LPTR_RELOC.)
Finally, segments containing no global definitions are bound to their parents. Other segments cannot be bound, since further linking may grow the segment.
The RU format uses the term SEGREF to mean something different from LD's SEGREFs. To avoid confusion, we will refer to RU's type of SEGREF as an RUREF and keep the term SEGREF itself to mean an LD SEGREF.
The output writer begins by resolving undefined SYMREFs to NULL. Undefined segments become global RUREFs. Undefined "entry" objects become ENTREFs.
Next, the output writer must determine the RU's primary entry point. If there is an Entry=name option on the LD command line, the given name will be used. Otherwise, if there is an "entry" definition starting with "......" (six dots), this will be used. Otherwise, the first primary entry definition will be used. All other "entry" definitions and "segment" names become local to the RU. The name of the RU is taken from the "Name=name" option of the LD command line; if there is none, the default name is used (see "expl ld").
LD then creates D$LINKAGE_SEG. In it, LD places the descriptors for defined segments and entries, plus slots for ENTREFs and RUREFs. Certain "known" segments are given specific SEGIDs (i.e. slots in the linkage segment); these are:
Name              SEGID        Linkage Segment Offset
----              -----        ----------------------
D$NULL_DESC       6000 octal   0
D$PRIVILEGE_SEG   6001 octal   2
D$SLIS            6002 octal   4
D$LINKAGE_SEG     6010 octal   20
All offsets above are given in words. NOTE: Except for D$LINKAGE_SEG, LD does not define these -- it just references them. Thus, module(s) in the input must supply a definition and appropriate initialization for the other segments.
The RU writer will implicitly bind non-"segment" segments to "segment +flexible" segments as long as they fit. It will then create additional unnamed "segment +flexible" segments to hold any remaining unbound non-"segment" segments.
All literals are expanded, i.e. replaced with an appropriate segment bound to the LD segment that contains the appropriate literal pool.
At present, all debugging information is discarded. In future releases, the RU writer will create the debug schema required by DSSV.
First, any data in a "segment" segment is moved to an LD-created child, since OM format forbids placing data directly in a "segment".
All "segment" definitions become OM SEGDEF directives. Every segment referenced purely for relocation becomes an SSREF directive. If a segment definition has no parent, or its parent is a "segment", the definition becomes a SECTDEF.
Every "constant" reference becomes a CONREF. Every "constant" definition and symbol whose value is not relocatable becomes a CONDEF.
Every "entry" definition is replaced by an ENTDEF and ENTREF. Every "entry" reference is replaced by an ENTREF.
All other global references are replaced by SYMREFs, and all other global definitions are replaced by SYMDEFs.
Other segment definitions are folded into their parent segments, since OMs forbid multi-level nesting.
All literals are expanded, i.e. replaced with an appropriate segment bound to the LD segment that contains the appropriate literal pool.
Triplets with LPTR_RELOC relocation are changed to ADDR_RELOC or an LDP opcode.
Unnamed objects in the output are given dummy names of the form "#N", where N is a decimal number. Local objects in the output are given dummy names of the form "original_name#N", where "original_name" is the original name of the object, and N is a decimal number that distinguishes objects with the same name but from different modules.
The symbol tables from the first input module having debug directives will become the local segment named "(SCHEMA)".
The following design issues have yet to be resolved.
While LD ensures that a sharable subpartition does not contain a reference (descriptor) to an unshared subpartition, the RU loader must verify that this is the case. We do not want users to be able to break security by patching their RUs.
Similarly, a routine invoked through a shared entry cannot refer to an unsharable partition. This must be verified by the RU loader.
Creating breakpoints in shared code for the purpose of debugging is a tricky process. We see three alternatives:
If Bull does not want to support dynamic linking at this time, we recommend the following:
In this way, you can prevent dynamic linking until you are ready to support it.
Dynamic linking can be partly simulated by replacing dynamic linking descriptors as they are encountered during instantiation. This replacement amounts to snapping the link, as described earlier. Therefore it can only be done safely if the caller and the callee both agree that the link can be snapped.