Thinkage Ltd.
85 McIntyre Drive
Kitchener, Ontario
Canada N2R 1H6
Copyright © 2008 by Thinkage Ltd.
1. General Principles
   1.1 The Purpose of LD
      1.1.1 Reading Object Code
      1.1.2 Linking and Editing Object Code
      1.1.3 Writing Object or Executable Code
   1.2 Relocation
   1.3 Standard Record Format
   1.4 Directive Types
   1.5 Dvalues
   1.6 Other Terminology
   1.7 Notation
   1.8 Continuation Directives
   1.9 Extended Records
   1.10 The End of a Module
2. Segments
   2.1 Predefined Segments
   2.2 Reference Numbers (SEGREFs)
   2.3 Creating New Segments
   2.4 Naming Locations in Segments
   2.5 Alignment
   2.6 Segment Information
   2.7 Symbol References
   2.8 System-Dependent Characteristics
   2.9 Other Segment Manipulation Directives
3. Control Directives
   3.1 LC_LDVERSION
   3.2 LC_FILENAME
   3.3 LC_REVISION
   3.4 LC_CPR
   3.5 LC_TITLE
   3.6 LC_MODULE
   3.7 LC_TARGET_INFO
   3.8 LC_LIB_HEADER
   3.9 LC_PATCH
   3.10 LC_PATCH_RECORD
4. Data
   4.1 Data Definitions
      4.1.1 The Relocation Origin
      4.1.2 Relocation Codes
      4.1.3 In Practice
   4.2 Bit Data
   4.3 Literals
      4.3.1 Creating Literals
      4.3.2 Creating a Literal Pool
5. Debugger Directives
   5.1 Defining Variable Lists
   5.2 Specifying Type Information
   5.3 Pointer Types
      5.3.1 Type Qualifiers
      5.3.2 Function Types
      5.3.3 Array Types
      5.3.4 Structure Types
      5.3.5 Bit Fields
      5.3.6 Union and Enumerated Types
      5.3.7 Typedef Declarations
   5.4 Describing Symbols
   5.5 Names with Associated Scopes
   5.6 Setting Scope Attributes
   5.7 Line Numbers
   5.8 Debug Directive Summary
6. Final Object Format
   6.1 The Marker Directive
7. Object Libraries
   7.1 Library Length
   7.2 Library Index Directives
      7.2.1 Number of Modules
      7.2.2 Module Location
      7.2.3 Specifying Names
      7.2.4 Index Entries
      7.2.5 Module Information
8. Introduction to Run-Units
   8.1 Glossary
9. Parts of an RU
   9.1 The Body of an RU
      9.1.1 Information in the Body
   9.2 The External Profile
   9.3 The Internal Profile
10. Program Invocation
11. Sharing
   11.1 Sharing Subpartitions
   11.2 Sharing Entries
   11.3 Shared Library Units
      11.3.1 Library Versions
      11.3.2 Using an SLU
      11.3.3 Building an SLU
      11.3.4 Naming Routines
12. RU Libraries
13. RU Format
   13.1 LD Directives
   13.2 The Instantiation Process
14. Dynamic Linking and Demand Segmentation
   14.1 Dynamic Linking
15. LD Directives for Run-Units
   15.1 RU_REFER
   15.2 RU_PARTITION
      15.2.1 Possible Partition Flags
   15.3 RU_SUBPARTITION
   15.4 RU_EXPORT
   15.5 RU_ENTRY
   15.6 RU_PRIMARY_ENTRY
   15.7 RU_DATA
   15.8 RU_RELOC
   15.9 RU_LOCATOR
Appendix A: Summary of Directives
   A.1 RU Directives
Appendix B: Deprecated Constructs
   B.1 LC_SECONDARY
Appendix C: Future Directions
   C.1 LD_DELETE
   C.2 Needed Enhancements
Appendix D: Target Machine Dependencies
   D.1 Bull HN DPS-8 Family
      D.1.1 Relocation Codes
      D.1.2 DPS-8 Symbol Options
Appendix E: LD Utility Routines
   E.1 General Concepts
   E.2 File Manipulation
      E.2.1 Open an LD Output File
      E.2.2 Open an LD Input File
      E.2.3 Conditionally Open an LD Input File
      E.2.4 Open an LD Library for Updating
      E.2.5 Close LD Output File
      E.2.6 Close LD Input File
      E.2.7 Close an LD Library
      E.2.8 Change Library to Write Mode
      E.2.9 Change Library to Read Mode
      E.2.10 Change Output File to Input File
      E.2.11 Reposition Input File
      E.2.12 Change Position in LD Output File
      E.2.13 Change Position in Input File
      E.2.14 Obtain Current Position in Input File
      E.2.15 Obtain Current Position in Output File
   E.3 Building Output Directives
      E.3.1 Start Building a Directive for Output
      E.3.2 Close Off Output Directive
      E.3.3 Write a TWORD
      E.3.4 Write a Byte
      E.3.5 Write Dvalue
      E.3.6 Write Block of Data
      E.3.7 Write String
      E.3.8 Write a ULONG
      E.3.9 Write a Time Value
   E.4 Generating Data
      E.4.1 Write Out Data for LD_DATA
      E.4.2 Write Relocation Information
      E.4.3 Flush Data and Relocation Information
      E.4.4 Flush and Close Literal
   E.5 Reading from the Input File
      E.5.1 Read Directive from Input File
      E.5.2 Read a Byte
      E.5.3 Read a ULONG from LD Input File
      E.5.4 Read a Block of Data
      E.5.5 Read a String from LD Input File
      E.5.6 Read Static String from LD Input File
      E.5.7 Read VString from LD Input File
      E.5.8 Read a Dvalue from LD Input File
      E.5.9 Read a TWORD from LD Input File
      E.5.10 Read a Time from LD Input File
      E.5.11 Obtain Length of Remaining Record
      E.5.12 Test for End of Directive
   E.6 Miscellaneous Routines
      E.6.1 Copy Data from Input File to Output File
Appendix F: Output Formats
   F.1 LD Format (Bull HN DPS-8)
   F.2 RU Format (Bull HN DPS-8, GCOS8 NS Mode)
   F.3 OM Format (Bull HN DPS-8, GCOS8 NS Mode)
Appendix G: Outstanding Issues
   G.1 Sharing Subpartitions and Entries
   G.2 Debugging Breakpoints
   G.3 Dynamic Linking
LD object code is a stream of bytes. Difficulties immediately arise if the size of a byte on the host machine is not the same as the size on the target machine. Therefore each byte in the object file uses only as many bits as fit in the smaller of the two machines' byte sizes.
For example, suppose the host machine has 9-bit bytes while the target machine has 8-bit bytes. The LD object code on the host machine will only use the (low order) eight bits of every byte; the uppermost bit is always ignored. This avoids complications if the object file is shipped to the target machine.
From this point onward, the byte size of an object file should be thought of as the number of significant bits in a byte.
Large programs are easier to work with if they are split over several source files. Such files may be compiled or assembled separately, producing an object file for each source file. When all source files have been compiled, the resulting object files may be linked together to form the complete program. Later in the chapter, we'll discuss this linking process in more detail.
An object library is a single file containing a number of object files, stored in a convenient and compact way. Libraries reduce the amount of disk space needed to store a program's object code and simplify the task of linking the program (since you only have to specify a single library file rather than a lot of separate object files).
The LD program has three phases:

1. Reading object code.
2. Linking and editing the object code.
3. Writing object or executable code.
We'll examine each phase in detail.
LD can read several different kinds of object code. It can read a single object file stored in the LD object format that is described in this document; and it can read an object library containing LD object code (usually called an LD library). The LD object format is system-independent.
LD can also read system-dependent object formats. The GCOS8 version of LD can read so-called OM files, OM libraries, B* files, and B* libraries. The MARK III version of LD can read MARK III object code and libraries.
As LD reads object code, it converts the code to the LD object format if the code is not already in that form. LD can read many files and libraries as input. These are specified on the LD command line as in
ld file1 file2 lib1 lib2 ...
LD reads the entire contents of such files and puts them together into a single large unit that is passed on to the next phase of the LD program.
Object code contains several types of information. Typically, it will contain executable code and descriptions of data. It may also contain debugging information. The linking and editing phase of LD collects object code from object files and libraries and merges it into a coherent whole.
For example, compilers tend to create debugging information "on the fly", as the source code is compiled. As a result, debugging information is scattered throughout the whole object file. The linking and editing phase of LD can gather the scattered debugging information and store it all together. This makes the information easier to find later on.
The linking and editing phase also does as much relocation as it can. Relocation is discussed in more detail later in this chapter.
The result of the linking and editing phase is LD object code in a more coherent format, with all of the input object code merged into a single unit. This unit will be written out in the third phase.
The third phase of LD writes out object code or an executable program. LD can write a single object file stored in the LD object format that is described in this document; and it can write an LD object library.
LD can also write system-dependent object and executable formats. The GCOS8 version of LD can write bound OM files, OM libraries, B* files, B* libraries, Q* files, and GCOS8 run-units. The MARK III version of LD can write MARK III object code and libraries, and MARK III run-units.
LD object code consists of symbol definitions (SYMDEFs) and symbol references (SYMREFs). For example, if a source file defines a subprogram named X, the object code that results from compiling that source file will contain a SYMDEF for X. If a source file contains a call to the subprogram X, the object code that results from compiling that source file will contain a SYMREF to X.
A typical object file has many SYMREFs to symbols not found in the file: a SYMREF for every external subprogram and variable that the code uses. Such SYMREFs are said to be unresolved, because they refer to symbols whose location is not currently known.
When a program is loaded for execution, the program loader chooses where each symbol will be placed in the computer's memory. The loader has a great deal of freedom in doing this: for example, the independent subprograms of a program may be arranged in any order. As a result of this freedom, the memory location of a symbol can only be determined when the program is loaded.
Many machine instructions in the executable code make use of memory locations. For example, the process of calling a subprogram must contain a machine instruction which jumps to the beginning of that subprogram. Since the location of the subprogram can't be determined until the program is loaded, the correct form of the jump machine instruction cannot be determined either. The object code can only contain a partial form of the jump machine instruction, with the understanding that the location of the subprogram will be filled into the partial instruction when the program is loaded.
Object code therefore contains many partial machine instructions. Object code must also contain any information that the program loader needs to allow the machine instruction to be completed. This information almost always involves a SYMREF.
For example, consider the "jump to subprogram" machine instruction that we have been discussing. The loader needs the partial jump instruction and a SYMREF naming the subprogram to which the jump instruction jumps. When the program loader loads subprogram X into memory, the loader can then look for SYMREFs to X, and fill in the true address of X into partial machine instructions that need it.
This process of completing partial machine instructions with memory locations is called relocation. The relocation process typically uses the following pieces of information:

- the location of the partial machine instruction (or data object) to be completed;
- a SYMREF naming the symbol whose address is required;
- a relocation code.
The relocation code provides more information about the relocation process. For example, the code indicates the format of the address that should be supplied (e.g. byte address or word address) and where the address should be placed in the partial machine instruction (e.g. top half of a machine word or bottom half). The relocation codes that may be specified correspond to the ways in which addresses may be used in machine instructions.
Data objects may also require relocation. For example, if a variable is initialized to hold a pointer to another object, the correct pointer value cannot be determined until the program is loaded. The object code will contain all the information required to calculate the correct pointer value; in place of the true pointer, the object code will either have some sort of partial pointer (if this makes sense in the machine architecture) or a simple placeholder.
Relocation codes are machine-dependent. Appendix D specifies the relocation codes recognized on each machine where LD is implemented.
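The general shape of the process can be sketched in Python. The relocation codes used below are purely hypothetical, invented for illustration; real codes are machine-dependent and are listed in Appendix D.

```python
# A purely illustrative sketch of relocation.  The code numbers and the
# 32-bit field layout below are hypothetical, not part of any real target.
def relocate(partial_word, reloc_code, symbol_address):
    """Complete a partial 32-bit machine word with a symbol's address."""
    if reloc_code == 0:       # hypothetical: full-word address
        return symbol_address & 0xFFFFFFFF
    if reloc_code == 1:       # hypothetical: address in bottom half of word
        return (partial_word & 0xFFFF0000) | (symbol_address & 0xFFFF)
    if reloc_code == 2:       # hypothetical: address in top half of word
        return (partial_word & 0x0000FFFF) | ((symbol_address & 0xFFFF) << 16)
    raise ValueError("unrecognized relocation code")
```

The sketch shows only the essential idea: the relocation code selects where and in what format the loader stores the resolved address within the partial instruction.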
The bytes in an object file are organized into variable length records. Each record represents a loader directive.
The standard record format is
+-----+-----+----------------------------+-----+
| DIR | LEN |            DATA            | CHK |
+-----+-----+----------------------------+-----+
DIR is a byte that indicates what kind of directive the record is. In all current directives, this byte contains a printable ASCII character; different directives are identified by different characters.
LEN is a byte indicating the length of the DATA field. This length is given in bytes.
DATA is the data for the directive. The maximum length for DATA is the maximum number that can be represented in the byte LEN (255 if bytes are eight bits, 511 if bytes are nine bits). If a directive is longer than this because it contains a long string, it must be represented in the long directive format or carried over onto a continuation record (as described at the end of this chapter).
The final byte of the directive is a checksum for the record (CHK). This is the exclusive OR (XOR) of all the bytes that precede CHK in the record (including DIR and LEN).
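As a sketch, verifying the checksum of a standard format record looks like this in Python:

```python
def verify_record(record):
    """Check the CHK byte of a standard format record.

    `record` holds the complete record: DIR, LEN, the DATA bytes,
    and the trailing CHK byte.
    """
    chk = 0
    for b in record[:-1]:     # XOR of every byte that precedes CHK
        chk ^= b
    return chk == record[-1]
```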
Each type of directive has a name beginning with LD_. For example, LD_BEGIN is the name of the directive that usually marks the beginning of an object module. These names will be used throughout this document. Appendix A summarizes the various directives and their contents.
The loader reads the object file one byte at a time, figures out what each directive is and how to handle it, and verifies that the checksum is correct. The output produced by the loader is object code in the format that is required by the system under which the compiler is running.
The DATA field of a directive often contains several arguments for the directive. Thus there must be a way of indicating where one argument ends and the next begins. This is often done by expressing the argument in Dvalue format.
The Dvalue format represents numeric values with a series of consecutive bytes that have their high order bit off (0), plus a final byte with its high order bit on (1). For example, if an object file has bytes that are eight (significant) bits long, arguments will be broken up into seven-bit chunks and stored in bytes that have the high order bit off. The high order bit on the last seven-bit chunk is turned on to indicate the end of the Dvalue.
Chunks are specified in the DATA with the least significant one first; thus an argument having the binary configuration
0110000001
would be broken up into seven-bit chunks as
011 0000001
and then represented as
00000001 10000011
The lower chunk comes first with its high order bit set to 0. The higher chunk comes last with its high order bit set to 1.
When an argument has been reconstructed from its seven-bit chunks, the argument's high order bit should be propagated to the proper alignment boundary. This is the usual sign extension process. Thus the 32-bit integer for -1 can be represented by the single byte
11111111 -- eight ones
The high order 1 indicates that this is the only byte in the Dvalue. The argument is therefore
1111111 -- seven ones
and the high order bit of this is propagated out to 32 bits.
Because of the sign extension process, the Dvalue form is often shorter than the full-length argument it represents. On the other hand, the Dvalue can also be slightly longer. For example, suppose we wanted to represent the hexadecimal number 7F, consisting of seven one-bits. This has to be done in two bytes.
01111111 10000000
The first byte gives the desired value; the second byte is necessary so that sign extension will propagate zeroes instead of ones.
On machines where bytes have nine bits instead of eight (e.g. the Bull HN DPS-8), the principle for creating a Dvalue is much the same. In this case, the value to be represented is broken up into eight-bit chunks. The ninth bit is 1 in the last byte of the Dvalue and zero in all the preceding bytes. Again, the high order bit of the last eight-bit chunk is propagated out to the appropriate length.
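The encoding and decoding rules above can be sketched in Python, assuming 8-bit bytes (i.e. seven-bit chunks):

```python
def decode_dvalue(data, pos=0):
    """Read one Dvalue: seven-bit chunks, least significant first."""
    value, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80:                   # high order bit on: last chunk
            break
    if value & (1 << (shift - 1)):     # sign-extend from the top chunk
        value -= 1 << shift
    return value, pos

def encode_dvalue(value):
    """Write a value as a Dvalue, using as few chunks as possible."""
    out = []
    while True:
        chunk = value & 0x7F
        value >>= 7                    # arithmetic shift preserves the sign
        # stop once the remaining bits are just sign extension of this chunk
        done = (value == 0 and not chunk & 0x40) or \
               (value == -1 and chunk & 0x40)
        out.append(chunk | (0x80 if done else 0))
        if done:
            return bytes(out)
```

With this sketch, encode_dvalue(-1) produces the single byte 11111111, and encode_dvalue(0x7F) produces the two bytes 01111111 10000000, matching the examples above.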
In addition to Dvalue, several other terms will be used frequently throughout this document.
The DATA field of certain directives can contain symbol names expressed as ASCII strings. By an ASCII string, we simply mean a sequence of ASCII characters. The number of characters in any ASCII string is usually determined from the LEN field of the directive that contains the string -- unlike strings in C, there is no \0 byte to mark the end of the string.
A ULONG number is a binary integer written as a sequence of four bytes. Only the bottom seven bits of each byte are significant. Thus a ULONG value is actually a 28-bit number, made up of the four 7-bit numbers joined together from left to right. The high order bit of this 28-bit number is propagated out to the left to get a (long) integer for the appropriate machine (usually 32 or 36 bits).
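A sketch of reading a ULONG, again assuming 8-bit bytes:

```python
def decode_ulong(data, pos=0):
    """Read a ULONG: four bytes, seven significant bits each,
    joined from left to right into a 28-bit number."""
    value = 0
    for i in range(4):
        value = (value << 7) | (data[pos + i] & 0x7F)
    if value & (1 << 27):        # propagate the high order bit to the left
        value -= 1 << 28
    return value, pos + 4
```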
A time value represents a date and time. Such a value is made from two ULONG values. The first ULONG gives a number of days from January 1, 1900. The second ULONG gives a number of milliseconds from midnight on the day in question.
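Once the two ULONGs have been decoded, converting a time value to a calendar date is simple arithmetic; a sketch:

```python
from datetime import datetime, timedelta

def decode_time(days, milliseconds):
    """Convert a time value (two already-decoded ULONGs) to a datetime:
    days counted from January 1, 1900, plus milliseconds from midnight."""
    return datetime(1900, 1, 1) + timedelta(days=days,
                                            milliseconds=milliseconds)
```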
A TWORD value represents the smallest chunk of memory to which relocation can be applied on the target machine. TWORD values in an object file are made up of a (fixed length) sequence of bytes. The number of (significant) bits in these bytes is greater than or equal to the number of bits in the corresponding chunk of memory on the target system.
In most cases, a TWORD will represent a machine word on the target machine. A TWORD value will be made up of a sequence of bytes containing sufficient (significant) bits to match a word on the target machine.
A local name is one that should not be known outside the source module in which it appears. Separately compiled modules of the same program will not be able to refer to such an object by name. The most common type of local name that will be visible in LD code is a static variable, either within a function or outside the scope of any function. Other local data that appears in source code (e.g. auto variables and function parameters) isn't usually visible in LD object code, because such items are usually resolved as part of compilation rather than linking.
A global name is one that can be referenced by separately compiled modules. Here are some examples of global data objects:
Secondary global symbols are special global symbols that usually appear in compiled modules that are stored in object libraries. When the linker searches a library in its attempt to resolve SYMREFs, it will not see the secondary global definitions that appear in the library. However, if the linker brings in an object module that happens to contain secondary global definitions, it can then use the secondary global definitions to resolve outstanding SYMREFs.
An example will make this situation clearer. The C library contains a standard function called open for opening files. This open function is fairly large, since it is designed to deal with a wide variety of file types (terminals, disk files, tape files, random, sequential, etc.). Whenever you make a normal call to open, the linker will search through the C library, find the full-sized open function and link it into your program.
If you are only going to do I/O to and from the terminal, you don't need the full-sized version of open. You can therefore reduce memory requirements by using a stripped down version. To get this stripped down version, all you have to do is specify
use=tty_only;
on the command line for the final link operation. The linker will resolve this reference by running through the library and obtaining the module that defines the symbol tty_only. This module also contains a secondary global definition for open and a stripped down version of the open routine. When the module is brought in to resolve the reference to tty_only, the linker also obtains the definition for open. Thus any subsequent references to open will be resolved with the stripped down version that is already available, and the linker will not search through the library for the full-sized open.
As this example shows, secondary global definitions can be used inside a module to create the module's "personalized" version of a routine. Those using the module will get the personalized version; if the module is not obtained, the linker will search through the normal global definitions in the library and will find the standard version of the routine. Note that the order of linking is important in this case -- the secondary definition must be found before the "normal one" or else the linker will search the library and find the standard definition.
Another concept that crops up in connection with object libraries is that of a secondary reference (not related to secondary global definitions). A secondary reference is a reference to a symbol that may or may not exist within the modules and library routines that are being linked together. When a secondary reference is found, the linker attempts to resolve the reference in the usual way, by searching through the compiled modules and various object libraries. If a corresponding definition for the symbol is found, the reference is resolved; if not, the linker simply creates a null definition and resolves the reference that way. No error messages are issued in this process, since secondary references are used for "optional" data objects.
When we describe the format of a directive, the following conventions will be used:

- {x} stands for a single byte, often shown as the ASCII character it contains;
- <x> stands for a value in Dvalue format;
- "x" stands for an ASCII string.
As a simple example of how this notation is used, we'll describe the LD_MODULE directive. This directive appears close to the beginning of an object file and indicates the start of an object module. The format of the directive can be written as
{M}{length}"module_name"{chk}
The parts of the directive are described below:

- {M} is the ASCII character 'M', identifying an LD_MODULE directive;
- {length} is the length of the DATA field (here, just the module name);
- "module_name" is the name of the module;
- {chk} is the checksum byte.
As an example, suppose that we have a machine with 8-bit bytes and a module named X. The LD_MODULE directive for this module could be written in hex as
4D 01 58 14
  -- 4D is ASCII 'M'
  -- 01 is data length, 1 byte
  -- 58 is ASCII 'X', module name
  -- 14 is checksum
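A sketch of a routine that assembles such records (assuming 8-bit bytes):

```python
def build_record(dir_char, data):
    """Assemble a standard format record: DIR, LEN, DATA, CHK."""
    rec = bytes([ord(dir_char), len(data)]) + data
    chk = 0
    for b in rec:
        chk ^= b              # CHK is the XOR of all preceding bytes
    return rec + bytes([chk])
```

For example, build_record('M', b'X') yields the four bytes 4D 01 58 14 shown above.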
As noted earlier, the number of bytes in a single standard format directive is limited by the length that can be expressed by the {length} byte. If this is not long enough to hold a piece of data required by the directive (e.g. a long string) and it is not appropriate to use the long directive format, a continuation directive is required. The directive is called LD_CONTINUATION, and it has the form
{&}{length}data{chk}
where {&} is the ASCII ampersand character. The data in this directive is considered a direct continuation of the data in the preceding directive.
An LD_CONTINUATION directive is always required when the {length} byte of the preceding directive has the maximum possible value. For example, on a machine where a byte has eight bits, a {length} byte has a maximum value of 255. If a directive happens to have exactly 255 bytes of data, there must still be an LD_CONTINUATION directive (with a zero length).
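The splitting rule can be sketched as follows; note the zero-length continuation emitted when the data length is an exact multiple of the maximum:

```python
def emit_with_continuations(dir_char, data, max_len=255):
    """Split DATA across a directive plus LD_CONTINUATION records."""
    def record(ch, payload):
        rec = bytes([ord(ch), len(payload)]) + payload
        chk = 0
        for b in rec:
            chk ^= b
        return rec + bytes([chk])

    out = [record(dir_char, data[:max_len])]
    data = data[max_len:]
    # A continuation is required whenever the previous {length} byte
    # had the maximum value, even if the continuation carries no data.
    while len(out[-1]) == max_len + 3:      # DIR + LEN + max_len data + CHK
        out.append(record('&', data[:max_len]))
        data = data[max_len:]
    return out
```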
The LD_CONTINUATION directive is one way of handling long directives. Another is to use the extended record format.
An extended record is called LD_EXTEND. It has the format
{+}(length){type}contents{checksum}
The beginning of the directive is the ASCII '+' character. After this comes a ULONG (length) giving the length of the contents of the record (plus one for the checksum). Since the length is given as a ULONG rather than a byte, the maximum length of this kind of directive is 2**28 characters.
The {type} byte is the type of another record (LD_MODULE, LD_DATA, etc.). After this come the arguments that would normally be given for that type of directive (except that the usual length byte is not specified).
The checksum at the end of the record is a checksum for the entire record, from the '+' at the beginning to the end of the contents.
In the rest of this manual, we will write up directives as if they were in standard format. However, any directive may be written in the extended format if the length of the data makes this necessary.
As another example of a simple directive, LD_END marks the end of an object module. It can be written
{E}{length}{chk}
The first byte is the ASCII character 'E', identifying an LD_END directive. The {length} byte will always be 0 (because there is no DATA area) and the {chk} byte will always be the ASCII character 'E' (which is the exclusive OR of 'E' and 0).
After the LD_END directive, there should be another marker consisting of
{\0}{\0}{\0}
where all three bytes are zero (ASCII NUL characters). This is used to mark the very end of an LD file, and the end of modules in LD libraries.
LD object code describes a program in terms of segments. Each segment corresponds to a block of memory that will be used when the program executes. For example, there might be one segment that holds all the data and another that holds all the executable code.
There is no direct relationship between LD segments and other constructs that might be called "segments" on a particular machine (e.g. hardware segments). The only connection is that each LD segment will always be contained entirely within a single hardware segment.
Segments are often defined within other segments. For example, external variables may be defined as separate segments within the data segment. When a segment is created inside another segment, LD places the contents of the sub-segment at the end of anything else that is currently in the enclosing segment.
To define a segment fully, LD object code must specify several pieces of information:

- the segment's name;
- the segment (if any) that contains it;
- any alignment requirements;
- the segment's size;
- the segment's contents.
Sometimes the contents of the segment do not need to be specified, as in the case of uninitialized data areas. Also, the size of a segment can often be inferred from the highest initialized location.
In summary, the major characteristics of a segment are:

- name;
- parent segment;
- alignment;
- size;
- contents.
Predefined segments are segments which do not appear in the specified input files, but are required for a specific output format. For example, if the GCOS8 NS mode version of LD is asked to prepare a run-unit, LD must implicitly create a linkage segment to contain the descriptors for the segments that are explicitly specified in the input files.
The number and nature of predefined segments is system-dependent, and is also dependent on the output format that LD is asked to generate. For example, the GCOS8 NS mode version of LD does not put in a linkage segment when it is creating an LD object library; the linkage segment is only needed for the run-unit format.
Predefined segments are sometimes added by the output writer for the particular format, but the preferred method is to supply them in an extra input module rather than building innate knowledge of the format into the writer.
Segments and symbols are identified by reference numbers or SEGREFs. Each time a new segment or symbol is referenced or defined, it is implicitly assigned the next sequential SEGREF. LD object code always uses SEGREFs to refer to segments.
Conceptually, SEGREF numbers begin at one and increase from there. In practice, however, it is sometimes desirable to change the beginning point.
The LD_BEGIN directive gives a value that should be used as the next SEGREF. Subsequent SEGREFs will follow sequentially from this value, until a new LD_BEGIN directive is encountered. The directive has the format
{B}{length}<init_seg_no>{chk}
where

- <init_seg_no> is a Dvalue giving the SEGREF that will be assigned to the next segment or symbol to be referenced or defined.
In future directive descriptions, we will usually omit the {length} and {chk} bytes because these are always present. Thus we might describe LD_BEGIN in the abbreviated format
{B}<init_seg_no>
Note that an LD_BEGIN directive can leave gaps, i.e. SEGREF numbers for which there is no corresponding reference or definition. Attempting to use such a SEGREF is an error.
An LD_BEGIN directive can also specify an initial segment number that has already been used for some other segment. In this case, new references or definitions will hide the previous ones with the same SEGREF numbers. The previous segments will still exist, but they can no longer be mentioned by LD directives -- their SEGREF numbers now refer to the new segments.
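The numbering rules, including the gaps and hiding that LD_BEGIN can introduce, can be sketched as below (the class and method names are hypothetical, invented for illustration):

```python
class SegrefTable:
    """Sketch of implicit sequential SEGREF assignment."""

    def __init__(self):
        self.next_segref = 1      # SEGREFs conceptually begin at one
        self.entries = {}

    def begin(self, init_seg_no):
        """LD_BEGIN: restart sequential numbering at init_seg_no."""
        self.next_segref = init_seg_no

    def assign(self, entry):
        """Each new reference or definition takes the next SEGREF.
        Reusing a number hides the earlier segment or symbol."""
        segref = self.next_segref
        self.entries[segref] = entry
        self.next_segref += 1
        return segref

    def lookup(self, segref):
        if segref not in self.entries:    # a gap left by LD_BEGIN
            raise KeyError("no segment or symbol with SEGREF %d" % segref)
        return self.entries[segref]
```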
New segments are usually created with the LD_CRSEG directive. This has the format
{S}{flags}<parent>"name"
(Remember that we are leaving out the {length} and {chk} in our format descriptions, even though these will be present in the actual directive.) The fields of LD_CRSEG are described below.

- {flags} is a byte of flags describing the segment. The possible flag values are:

     1 -- GLOBAL
     2 -- COMMON
     4 -- SECONDARY

- <parent> is a Dvalue giving the SEGREF of the segment that will contain the new segment;
- "name" is the name of the new segment.
The COMMON flag indicates a Fortran-style common block (and similar constructs in other languages). This allows multiple definitions of the same segment, with the final size being the largest of the sizes of all the definitions. On some machines, certain C constructs must be expressed as COMMON segments, because of loader requirements. For example, on the Bull HN DPS-6, the linker requires every external variable to be put in its own COMMON block. Flags are ORed together. Thus the flags for a secondary global symbol would have a 1 ORed with a 4.
When the loader reads an LD_CRSEG directive in the input, it assigns the next sequential SEGREF to the new segment and records the parent segment that contains the new segment. It also creates a SYMDEF for the segment, using the name that is given in the directive.
Notice that the SEGREF of the segment being defined does not actually appear in the LD_CRSEG directive. It is obtained implicitly by incrementing the count of the number of segments that have already been assigned SEGREFs.
Here is an example of an LD_CRSEG directive (written in a combination of ASCII characters and hex digits).
S 08 01 82 ..code CHK
  -- 'S' identifies LD_CRSEG directive
  -- 08 is length, eight bytes
  -- 01 is flag, indicating global segment
  -- 82 is Dvalue; says parent segment has SEGREF 2
  -- "..code" is name of segment
  -- CHK is checksum, whatever it has to be
Note that the length of the name can be determined from the {length} byte. LD knows how long the {flags} and <parent> are, so the rest of the length must be the name.
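A sketch of parsing such a record (ignoring the checksum byte, and assuming SEGREF Dvalues are non-negative so no sign extension is needed):

```python
def parse_crseg(record):
    """Parse an LD_CRSEG record into (flags, parent, name)."""
    assert record[0] == ord('S')
    length = record[1]
    flags = record[2]
    parent, shift, pos = 0, 0, 3
    while True:                     # <parent> is a Dvalue
        b = record[pos]
        pos += 1
        parent |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80:
            break
    # whatever remains of the {length} bytes is the segment name
    name = record[pos:2 + length].decode('ascii')
    return flags, parent, name
```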
The LD_NAME directive associates a symbol name with an offset from the beginning of a segment that has already been created with LD_CRSEG. For example, the first function in a program is usually defined as an offset of zero from the beginning of the code segment. A symbol defined with LD_NAME is assigned the next available SEGREF number, just like segments defined with LD_CRSEG.
LD_NAME has the format
{N}{flags}<offset><parent>"name"
where

- {flags} has the same meaning as in LD_CRSEG;
- <offset> is a Dvalue giving the offset of the symbol from the beginning of the parent segment;
- <parent> is a Dvalue giving the SEGREF of the segment that contains the symbol;
- "name" is the name of the symbol.
If the {flags} indicate a global symbol, the loader will generate a SYMDEF for the symbol being defined.
Here is an example of an LD_NAME directive.
N 07 01 80 8A main CHK
  -- 'N' indicates LD_NAME
  -- 07 is length, 7 bytes
  -- 01 is flag, indicating global symbol
  -- 80 is Dvalue offset (0)
  -- 8A is Dvalue SEGREF of parent segment (hex A, segment 10)
  -- "main" is name of symbol
  -- CHK is checksum
The LD_ALIGN directive describes the alignment of a segment when the alignment is important. For example, a segment containing a double-precision floating point variable has to start on a double word boundary on many machines. LD_ALIGN would indicate this requirement. The directive has the format
{A}<segref><align_in_bits>
where

- <segref> is a Dvalue giving the SEGREF of the segment whose alignment is being specified;
- <align_in_bits> is a Dvalue giving the alignment boundary, measured in bits.
For example,
A 02 8B A0
  -- 'A' for LD_ALIGN
  -- 02 is data length, two bytes
  -- 8B is Dvalue for segment, hex B, segment 11
  -- A0 is Dvalue for 32
indicates that segment 11 should be aligned on a 32-bit boundary (probably four 8-bit bytes).
An LD_ALIGN directive for a segment may appear long after the LD_CRSEG that creates the segment, since alignment requirements are sometimes only discovered long after the segment is first defined.
When an alignment requirement is specified for a particular segment, the same requirement is automatically inherited by any segments containing that segment. If a particular data object must be aligned on a double word boundary, then all enclosing segments should have double word alignment (or better) so that there will be no problem getting the alignment that the data object needs. In this way, one LD_ALIGN directive may dictate alignment requirements for several (nested) segments.
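The inheritance rule amounts to a walk up the chain of parent segments; a sketch (the data structures here are hypothetical, not part of the LD format):

```python
def apply_align(parents, aligns, segref, align_bits):
    """Record an LD_ALIGN requirement and propagate it to every
    enclosing segment.

    `parents` maps SEGREF -> parent SEGREF (None at the outermost level);
    `aligns` maps SEGREF -> alignment in bits."""
    while segref is not None:
        # a segment never needs *less* alignment than already recorded
        aligns[segref] = max(aligns.get(segref, 1), align_bits)
        segref = parents.get(segref)
    return aligns
```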
In order to eliminate superfluous LD_ALIGN directives and to make it easier to lay out the segments of the program, the LD object format has a directive that summarizes information about each segment. The LD_SEGINFO directive has the format
{s}<segref><length_in_twords><align_bits>
where
One LD_SEGINFO directive eliminates the need for all LD_ALIGN directives describing the alignment of the segment. It also eliminates the need for LD_DATA and LD_RELOC directives stating the length of the segment. (LD_DATA and LD_RELOC are described in Chapter 4 of this document.)
An LD_REFER directive creates a reference to a symbol name that has not yet been defined. Local names must be defined later in the module; global names may be defined elsewhere in the module or in another module. If a global name is not defined in the module that contains the reference, different output writers deal with the situation in format-dependent ways.
LD_REFER has the format
{R}{flags}"name"
where the {flags} are the same as for LD_CRSEG and the "name" is the name of the symbol which has been referenced. The name cannot be null.
The symbol named in an LD_REFER is assigned the next available SEGREF number just like symbols defined with LD_CRSEG and LD_NAME directives.
If a module contains a definition for a name, as well as LD_REFER directives, the flags in the definition directive take precedence over those in any LD_REFER directive. Of course, if the definition is marked as global and the LD_REFER is local (or vice versa), the directives refer to different objects and the flags have no effect on each other. However, if both the definition and LD_REFER are global, the flags on the definition will be used and the flags on the LD_REFER will be ignored.
The LD_SYMOPTS directive specifies one or more characteristics for a segment, SYMREF, or SYMDEF. The format of the directive is
{o}<segref>"characteristics"
where
Characteristics vary from machine to machine and the format of the "characteristics" string is system-dependent.
The LD_LITERAL directive is used to create a segment that contains a literal. Such a literal may be "folded" with any other literal that has the same value. For more information about LD_LITERAL, see Chapter 4.
The LD_CONTROL directive is used to specify various kinds of information about a program or a module. It has the format
{#}{length}sub-directive{chk}
where {length} and {chk} are the usual length and checksum bytes, and {#} is the ASCII character '#'.
Sub-directives for LD_CONTROL all consist of a single ASCII character, followed by one or more pieces of data. Sub-directives are known by names that begin with LC_.
This specifies the version of the LD object format that a file contains. It should be the first directive in a file or library. The format of the sub-directive is
{V}{version}'create_time'{type}
where {V} is the ASCII character 'V' (upper case), {version} is an integer, and 'create_time' is a time value indicating when the file or library was last updated. The {type} byte is optional and indicates the type of the file: a value of LT_LD_OBJECT (0) indicates a regular LD file containing object code; a value of LT_RUN_UNIT (1) indicates a run-unit or run-unit library (on systems where run-units are supported). If this byte is omitted, the default is LT_LD_OBJECT.
LD version numbers began at zero. This manual describes version 2 of the LD format.
This gives the name of the source file that contained the original source code. The sub-directive has the form
{F}"filename"
where {F} is the ASCII character 'F' and "filename" is the name of the source file.
An object file may contain several LC_FILENAME directives (i.e. LD_CONTROL directives with LC_FILENAME sub-directives). For example, suppose the file cprog contains the C code
/* Program starts here */
#include <stdio.h>
main()
{
    ...
There will be an LC_FILENAME directive for cprog when the program first begins compilation. Immediately after that will be an LC_FILENAME directive for the <stdio.h> include file. If the <stdio.h> include file #includes other files, there will be LC_FILENAME directives for all those files. When the compiler has finished with the <stdio.h> file, it will output another LC_FILENAME for cprog to show that it has returned to the original source file.
The LC_REVISION sub-directive has the format
{R}"revision"
where {R} is the ASCII character 'R' and "revision" is a string. The contents of "revision" may give a version number to the program being linked. In C programs, the "revision" string is obtained from a #pragma version preprocessor directive.
The LC_CPR sub-directive has the format
{C}"copyright_string"
where {C} is the ASCII character 'C' and "copyright_string" is a string. The contents of "copyright_string" may state copyright information for the program being linked. In C programs, the "copyright_string" is obtained from a #pragma copyright preprocessor directive.
The LC_TITLE sub-directive has the format
{T}"title"
where {T} is the ASCII character 'T' and "title" is a string. The contents of "title" may state a name or title for the program. In C programs, the "title" is obtained from a #pragma title preprocessor directive.
The LC_MODULE sub-directive has the format
{M}"module_name"
where {M} is the ASCII character 'M' and "module_name" is a string. The contents of "module_name" may state a name for the module.
Note that this LD_CONTROL sub-directive provides the same information as the LD_MODULE directive we discussed in Chapter 1. Older versions of LD use LD_MODULE, while newer ones use LD_CONTROL and LC_MODULE.
The LC_TARGET_INFO sub-directive describes important aspects of the target machine. It has the format
{I}{byte_len}{reloc_align}{origin_size} "machine_name"
where
The LC_TARGET_INFO directive should be one of the first directives in an object file, since it strongly influences how the rest of the data in the file should be treated. In particular, Dvalues and TWORDs cannot be interpreted properly without the information provided by this directive.
The LC_LIB_HEADER directive is used at the beginning of object libraries. It provides information about the object library. It is often followed by unused space, since library modules usually start on a specific alignment boundary (e.g. a disk sector boundary) to improve performance.
The LC_LIB_HEADER directive has the format
{H}{version}(index_seek)(end_seek) LH_LIB_NAME(gap)'time'{type}
where
The LC_PATCH directive is used in run-units to indicate a patch. This is a change that should be made to the object code at the time that a run-unit is loaded or prepared for a debugging session.
For example, suppose the original source code initializes a variable to 0, and in a later debugging session you discover that the value should be 1. To avoid recompiling the original source code, some debuggers let you patch the object code, changing the 0 initialization value into a 1.
Typically, patches are used with programs that are already in active use, especially ones that are distributed to other sites. Rather than recompile the program and send out a new release of the software, it may be more convenient just to send out a collection of patch directives which can be added to the existing program.
LD offers two ways to patch code. The first is to leave the original object code as is, and to add an LC_PATCH directive describing the change you want to make. The second is to change the object code in the way you wish, and then add an LC_PATCH_RECORD directive describing what the object code originally said. This section discusses LC_PATCH, while the next discusses LC_PATCH_RECORD.
An LC_PATCH directive is a combination of the LD_DATA and LD_RELOC directives described in Chapter 4. In order to understand the contents of the directive, it is best to read Chapter 4 first. LC_PATCH has the format
{P}'time'<refno><offset>[data_word] <nrelocs>[{code}<symbol>]*"comment"
where
Patch records are only created by debuggers. They are not generated by compilers or by programs like LD or LEDIT.
The LC_PATCH_RECORD directive is used when a patch has been made to the run-unit. The directive is a record of what the run-unit contained before the patch was made. Thus the body of the object code contains the new material and the LC_PATCH_RECORD directive records the old. Contrast this with LC_PATCH, where the body of the object code contains the old material and the LC_PATCH directive provides the new.
LC_PATCH_RECORD has the same format as LC_PATCH. The [data_word] records the old contents of the machine word that was patched, and the relocation pairs record the relocation information that was associated with the old data word. The rest of the LC_PATCH_RECORD fields have the same meaning as in LC_PATCH.
Data directives are used to initialize the contents of a segment. Usually, the segment being initialized will be one of those defined with an LD_CRSEG or LD_NAME directive in the same input file. However, it can happen that the input contains initialization directives for segments that were only referenced by the input (with an LD_REFER directive). In this situation, the result depends on the output format.
There are two directives for producing absolute (constant) data:
After data has been specified, it can be relocated with the LD_RELOC directive. LD_RELOC is also used to specify the load origin(s) for the preceding LD_DATA directive, so LD_RELOCs may appear even if no relocation is required.
Fixed data is assigned in the order it is seen in the input, so only the most recently stored value in each bit is kept. Only after all constant data has been merged is the relocation performed. If some TWORDs have only been partly initialized (through LD_BIT_DATA directives), the result depends on the output format chosen.
The LD_DATA directive specifies data that should be stored in a segment.
Every LD_DATA directive is immediately followed by an LD_RELOC directive which gives relocation information for the data in the LD_DATA directive. The relocation information tells which segment contains the data, the offset of the data within the segment, and how to relocate the data.
The format of an LD_DATA directive is
{L}[data_word][data_word]...
where
Each of the TWORD values represents a fixed-length chunk of data that can be assigned on a boundary to which relocation may be applied. For example, on the DPS-8, each TWORD of data represents a machine word (36 bits). In this way, the contents of a segment may be given one word at a time.
The format of an LD_RELOC directive is
{O}reloc-triplet,reloc-triplet,...
where {O} is the ASCII character 'O' (upper case "oh"). The DATA field of the directive consists of a sequence of relocation triplets, one for each origin and for each relocatable TWORD in the corresponding LD_DATA directive. Some of the TWORDs in the LD_DATA will not need relocation (e.g. many machine instructions in a code segment), so there are often more TWORDs in the LD_DATA than relocation triplets in the LD_RELOC.
Each relocation triplet consists of three components:
We will refer to these three components as the CODE, the WORD#, and the SEGREF.
Most relocation codes are (target) system-dependent. However, there is one system-independent CODE value; this is given the symbolic name ORIGIN_RELOC, and has a value of zero. This CODE indicates that the corresponding TWORD(s) in the LD_DATA directive do not specify an initialization value but an offset (in TWORDs) from the symbol indicated by the triplet's SEGREF. Effectively, an ORIGIN_RELOC triplet says, "Start writing out data at this offset from this symbol." The first relocation triplet in any LD_RELOC directive must be an ORIGIN_RELOC triplet that tells where to write the first TWORD of the preceding LD_DATA directive.
The number of TWORDs required to specify the offset for ORIGIN_RELOC is system-dependent. This number is given by the {origin_size} value in the LC_TARGET_INFO directive that describes the target machine (see Chapter 3). Usually, a single TWORD is large enough to hold such an offset; but on a machine with small TWORDs (e.g. the PC, where TWORDs are only 8 bits) several TWORDs may be needed.
To show how ORIGIN_RELOC works, suppose a C program contains a declaration of the form
int K = 3;
We'll suppose that K is an external variable and has a segment all to itself. We could write directives to initialize K to the proper value in the following way (omitting LENGTH and CHK bytes to simplify things).
{L}[0][3]
{O}{ORIGIN_RELOC}{0}<K's SEGREF>
The LD_DATA (L) directive has a DATA field consisting of two TWORDs with the values 0 and 3. The LD_RELOC (O) directive has a DATA field consisting of one relocation triplet. This triplet has the following components:
The relocation triplet says that the loader should begin laying down data in K's segment, beginning at the offset 0. This offset is obtained from TWORD zero of the LD_DATA directive. All the remaining data in the LD_DATA directive (just the TWORD [3]) is laid down sequentially in K's segment. The result is that the value 3 is put in the segment that represents K.
Multiple initializations are handled in a similar way. For example,
int I[] = {1,2,3,4};
could be written with LD_DATA and LD_RELOC directives as
{L}[0][1][2][3][4]
{O}{ORIGIN_RELOC}{0}<I's SEGREF>
As before, the relocation triplet indicates that the loader should begin laying down data at offset 0 in I's segment. TWORDs one through four are laid down sequentially in this segment.
The situation gets slightly more complicated when some of the data TWORDs require relocation. Suppose that we are working with
int C[10];
int *P = &C[5];
where both P and C are external variables and therefore have their own segments. To initialize P on the DPS-8, we could use the following directives.
{L}[0][05000000]
{O}{ORIGIN_RELOC}{0}<P's SEGREF>
   {PTR_RELOC}{1}<C's SEGREF>
The LD_RELOC (O) directive has two relocation triplets. The first indicates that the loader should lay down data as TWORDs, beginning at offset 0 in P's segment. The offset is obtained from TWORD {0} in the LD_DATA directive.
The second relocation triplet tells how TWORD {1} should be relocated when it is laid down in P's segment. {PTR_RELOC} is another symbolic CODE, standing for a type of pointer relocation on the DPS-8. With this relocation code, the corresponding TWORD is expected to contain a (word) offset value in the upper half of the word (which is why the corresponding TWORD value is 05000000 octal). This offset is relocated by adding on the address of C's segment (as indicated by the relocation triplet). The final address laid down in P is therefore the address formed from the start of C's segment and a word offset of 5.
ORIGIN_RELOC is a special relocation code that is used on all machines. Apart from this, all relocation codes are system-dependent.
For example, on the DPS-8 machine, addresses are often stored in the upper 18 bits of the machine word. Therefore there is a frequently used relocation code indicating that LD should relocate the upper 18 bits of a particular data value and leave the rest alone. A different relocation code is used if the address is stored in the lower 18 bits of the word; this different relocation code tells LD to relocate the lower 18 bits of the data value and leave the upper bits alone. In some cases, a word may hold two addresses, one in its upper half and one in its lower. In this case, there would be two relocation triplets for the same word: one for each address that needed relocating.
On other machines, where addresses may be stored in a variety of locations within a chunk of data, the relocation codes tell what part of the data is the address and what part should not be touched. Appendix C describes the relocation codes for various systems.
It is usually possible to apply several relocations to the same TWORD (although some combinations may be meaningless).
On the other hand, a single relocation triplet may affect more than one TWORD; for example, in NS mode on the DPS-8, there will be a CODE that relocates a descriptor made up of two TWORDs. In this case, the TWORD named in the triplet will always be the one with the smallest (numerically least) offset.
LD_DATA and LD_RELOC directives can be used to provide information about the length of a segment. In essence, you use an ORIGIN_RELOC relocation triplet to set the relocation origin to the end of the segment. For example, consider
int C[10];
on the DPS-8. If C is an external variable and therefore a segment on its own, we could write
{L}[10]
{O}{ORIGIN_RELOC}{0}<C's SEGREF>
This sets a relocation origin for the segment to an offset of 10 words (given by TWORD {0}) in C's segment. This indicates that C's segment is at least 10 words long, even though zero words of data were supplied.
The examples we have given so far in this chapter show how LD_DATA and LD_RELOC could be used. In practice, however, the directives may be used in a slightly different way (by our C compilers, for example).
Let us consider the four extern declarations
int K = 3;
int I[] = {1,2,3,4};
int C[10];
int *P = &C[5];
When encountering one of these declarations, the compiler will put out directives in the following order.
The directives listed above create space for the variables but do not give initialization values. Initialization values will be given at the end of the object file, using large LD_DATA/LD_RELOC directives that initialize several data objects at once. For example, you might see
{L}[0][3] [0][1][2][3][4] [0][05000000]
{O}{ORIGIN_RELOC}{0}<K's SEGREF>
   {ORIGIN_RELOC}{2}<I's SEGREF>
   {ORIGIN_RELOC}{7}<P's SEGREF>
   {PTR_RELOC}{8}<C's SEGREF>
The above directives combine all the initializations we have discussed into a single LD_DATA/LD_RELOC pair.
It is sometimes desirable to initialize part of a TWORD, while leaving other parts untouched. Such bit data is specified with the LD_BIT_DATA directive. This has the form
{1}<symbol><bit_offset><Nbits>[data][data]...
where
Literals are values that are expected not to change in the course of program execution. Typically, literals are numeric constants or constant strings, although there are other kinds of literals too.
Each literal is created as a separate segment. Collections of literals are grouped into literal pools. A literal pool is associated with a particular segment. The pool will contain all the literals used by that segment and the segment's child segments.
If several literals in a pool have the same value, the literals are folded together during the loading process. By this, we mean that the pool will only contain one literal with that value, and all references to that value will be aimed at this unique literal.
Before identical literals are folded, each will have its own SEGREF (since each literal is its own segment). This means that the folding process associates all these separate SEGREFs with the segment that holds the (unique) folded literal.
A literal value is created using the LD_LITERAL and LD_END_LITERAL directives.
LD_LITERAL has the format
{[}<ref_segment>
where
An LD_LITERAL directive implicitly associates the next available SEGREF with the literal being created. In this way, the SEGREF can be used to refer to the literal (in the same way that SEGREFs refer to segments created with LD_CRSEG).
The LD_END_LITERAL directive marks the end of a literal whose definition has begun at a previous LD_LITERAL directive. LD_END_LITERAL has the format
{]}<lit_segref>
where
A compiler's first step in creating a literal is to output an LD_LITERAL directive giving the SEGREF of the segment that uses the literal. After the LD_LITERAL will come LD_DATA and LD_RELOC directives which generate the literal value. In some situations, other directives may also be needed to construct the literal (e.g. LD_ALIGN).
When the code generator has finished outputting directives to construct the literal value, it marks the end of the literal with LD_END_LITERAL. Once the LD_END_LITERAL has been found, LD can place the literal in the pool associated with the segment whose SEGREF appeared in the original LD_LITERAL directive. LD does this by looking at the containing segment, the segment's parent, the parent's parent, and so on, until it finds a parent segment that has an associated literal pool.
If the same literal is used several times in a program or module, the uses will be folded together to give a single value in the literal pool.
The LD_POOL directive creates a literal pool associated with a segment. The directive has the form
{=}<parent>
where
The literal pool will be contained in the segment given by <parent>. When literal values are created by the parent segment or by any of its child segments (segments contained in the parent), LD will collect them and store them in the literal pool. Depending on output format, the literals in a pool may eventually become children of the parent themselves.
The directives we have already discussed are sufficient for creating an object module that can be linked and loaded. However, compilers may generate additional directives that specify debugging information for the program being compiled.
The LD phase of the compiler assembles the information provided by these directives into debugging tables, which are included as part of the final load module. These tables can be read by a symbolic debugger to provide information when examining a post-abort dump or when running a program. The information supplied by the debugging directives includes the type and storage class of each variable and function defined in a program.
Debugger information must always be given in the context of scope within a program. For example, information about a local variable "i" in one function may not apply to a local variable "i" in a different function. Thus, all the information specified by debugger directives must be associated with a particular scope.
Scopes are represented by reference numbers known as VLREFs. This stands for Variable List Reference numbers. Variables with the same VLREF have the same scope, e.g. all the auto variables defined at the beginning of a function. The set of all variables with a given VLREF is called the variable list of the associated scope.
Variable lists are frequently nested inside other lists. For example, the elements of a structure form their own variable list because the element names are only meaningful within the context of the structure. The structure itself will be part of another variable list (e.g. an enclosing structure). Similarly, the local variable list of a particular function is enclosed in the variable list of external variables which are also accessible to that function.
The LD_DEFVLIST directive is the most common way of marking the start of a variable list. This directive has the format
{v}<enclosing_vlref>
where
VLREFs begin at 1. Every time an LD_DEFVLIST command is issued, the next available VLREF is used for variables in the associated variable list. This is much the same as the situation for LD_CRSEG directives, where every LD_CRSEG directive issued is implicitly given the next available SEGREF number.
The variable list with a VLREF of 1 is usually the outermost variable list in the module, representing a scope covering the entire compilation unit.
There is a special predefined variable list with a VLREF of zero. It is intended to contain externally visible objects.
The end of a particular scope is marked by an LD_ENDVLIST directive. This has the format
{}}<vlref><start_counter><end_counter>
where
After an LD_ENDVLIST directive, variables with the given VLREF are no longer recognized. For example, an LD_ENDVLIST directive is issued at the end of every function to mark the end of the scope of that function's local variables.
The LD_DEFTYPE directive describes a data type used in a module. Every time an LD_DEFTYPE is used to specify a type, a number is associated with the given type. This number is known as a TREF.
Like SEGREFs, TREFs are assigned sequentially. By default, TREFs begin at 1. The LD_INITTREF directive can be used to specify a different starting point. It has the form
{0}<TREF>
where {0} is a byte containing an ASCII zero, and <TREF> is a Dvalue giving a number that should be used the next time a TREF is defined. New TREFs will be assigned sequentially from this starting TREF.
The format of LD_DEFTYPE is
{T}{type_code}extra info
where
Types:
  0 -- typedef
  1 -- structure
  2 -- union
  3 -- enumerated class
  4 -- pointer
  5 -- array
  6 -- function
  7 -- bit field
  8 -- const modifier
  9 -- volatile modifier
 10 -- far modifier
 11 -- near modifier
 12 -- huge modifier
 13 -- void
 14 -- variable argument list (... in C)
 15 -- char (signed)
 16 -- short
 17 -- int
 18 -- long
 19 -- unsigned char
 20 -- unsigned short
 21 -- unsigned int
 22 -- unsigned long
 23 -- float (single precision)
 24 -- double
 25 -- long double
 26 -- statement label
 27 -- block label
 28 -- signed long char
 29 -- unsigned long char
(Note: new numbers may be added to this list in future releases of the compilers.) The type codes up to 12 represent type modifiers, while the remaining codes (from void on) represent basic types. The basic types correspond to the basic types of the C programming language, plus types for statement labels (26) and block labels (27).
Whenever a new data type is encountered in the source code, an LD_DEFTYPE directive is issued to describe that data type. The LD_DEFTYPE for a type is only issued the first time a particular type is mentioned; in other words, there is only one declaration for the int type, no matter how many int variables are declared in the source code.
LD predefines TREFs for all the basic types listed above. Each is given a TREF equal to the type code. Typically then, an LD_INITTREF directive will be used to begin numbering TREFs with the number that follows the last basic type value.
If the {type_code} of an LD_DEFTYPE directive is 4 (indicating a pointer type), the format of the LD_DEFTYPE is
{T}{4}<tref>
where <tref> is a Dvalue giving the TREF of the type pointed to.
As an example, the TREF for int values is 16, so an LD_DEFTYPE directive for a pointer to int could be written as
{T}{4}<16>
The 4 indicates a pointer type; the 16 is the TREF of the int type.
If we wanted to define a pointer to a pointer to integers, the first argument of the LD_DEFTYPE directive would be a 4 and the second argument would be the TREF of the pointer to integers that we just described.
The const and volatile keywords are known as type qualifiers in the C programming language. Speaking very loosely, an object with the const qualifier should not be assigned a value in the executable code of the routine that declares the object. An object with the volatile qualifier may change its value without direct program action (e.g. a hardware clock). For a more rigorous description of these two qualifiers, see the C Reference Manual.
If the {type_code} of an LD_DEFTYPE directive is 8 (const) or 9 (volatile), the format of the directive is
{T}{type_code}<tref>
where <tref> is a Dvalue giving the TREF of the type being modified by const or volatile.
For example,
const int *p;
would result in two LD_DEFTYPE directives.
{T}{8}<16>                 -- const int
{T}{4}<TREF of const int>  -- pointer to const int
The near and far qualifiers are used when a machine has more than one pointer format. Typically, far pointers are able to address a wider range of memory than near pointers, but far pointers take up more space themselves and are less convenient to work with. Thus near pointers are generally more efficient to use, but far pointers are required in situations where addressability is a concern. For a more rigorous description of near and far pointers, see the appropriate documentation on machines that support them.
Near and far qualifiers are added to types in a manner similar to the process of adding the const and volatile qualifiers. Note that a "far pointer" is actually a pointer to a "far" object; the "far" attribute is attached to the object type, not the pointer. The same goes for "near".
If the {type_code} of an LD_DEFTYPE directive is 6 (function returning a value), the format of the LD_DEFTYPE is
{T}{6}<tref>
where <tref> is a Dvalue giving the TREF for the type of value that the function returns.
As an example, the TREF of the int type is 16, so the LD_DEFTYPE directive for a function returning int could be written as
{T}{6}<16>
If the {type_code} of an LD_DEFTYPE directive is a 5 (indicating an array), the format of the LD_DEFTYPE is
{T}{5}<tref><length>
where
Thus the LD_DEFTYPE directive for an array of 10 integers could be written
{T}{5}<16><10>
(As always, the TREF for int is 16).
If an array has more than one dimension, it will be defined with several successive LD_DEFTYPE directives. For example,
int arr[20][30];
would be broken down into
{T}{5}<16><30>               -- array of 30 integers
{T}{5}<TREF of int[30]><20>  -- array of 20 such arrays
The second LD_DEFTYPE directive might be read as describing an array of 20 "arrays of 30 integers".
This method of breaking type declarations into parts is used for all complex types. For example, consider
int *p[20];
which declares an array of 20 integer pointers. First of all there is a TREF for integer pointers
{T}{4}<16>
and next there is a TREF for arrays of such things.
{T}{5}<TREF of *int><20>
If an array is declared with an unspecified dimension, as in
extern int x[];
the <length> argument in the LD_DEFTYPE will be given as zero, indicating that it is unknown at present.
If the {type_code} of an LD_DEFTYPE directive is a 1 (indicating a struct type), the format of the directive is
{T}{1}<vlref>"tag"
where
An LD_DEFTYPE for a structure automatically obtains a new VLREF (scope) because the fields in the structure form their own name space. Thus an LD_DEFTYPE for a structure implies an LD_DEFVLIST directive to start a new variable list.
As an example, suppose we have the definition
struct complex { float x; float y; };
outside the scope of any function in a program. Since this is an external type, the enclosing variable list has VLREF 1. The corresponding LD_DEFTYPE declaration would have the format
{T}{1}<1>complex
(assuming that 1 was the VLREF for file scope).
No special LD_DEFTYPE declarations are needed for other types of elements of a structure (except for bit fields which are described in the next section). For example, the elements of the "complex" structure declared above just have the normal float type and do not need a special LD_DEFTYPE. If an element has a type that has not been seen before in this module, a normal LD_DEFTYPE will be constructed for the type.
If the {type_code} of an LD_DEFTYPE directive is 7 (indicating a bit field), the format of the directive is
{T}{7}<tref><length>
where
For example,
{T}{7}<16><9>
describes a bit field nine bits long and having the int type.
LD_DEFTYPE directives for union and enum types are similar to those for struct types.
{T}{2}<vlref>"tag"
describes a union type with the given "tag", inside the variable list <vlref>.
{T}{3}<vlref>"tag"
describes an enum type with the given "tag", inside the variable list <vlref>. Both LD_DEFTYPE automatically start a new variable list scope, allocating a new VLREF for the variable list.
A typedef statement is usually handled with at least two LD_DEFTYPE directives. For example,
typedef char *STRING;
could be represented with two LD_DEFTYPE directives.
The format of a typedef LD_DEFTYPE directive is
{T}{0}<tref><vlref>"name"
where
The LD_DEFVAR directive describes data objects (functions and variables) in a program. There is an LD_DEFVAR directive for every variable and function used in the program. (Note that there is NOT a separate LD_DEFTYPE directive for every variable and function. LD_DEFTYPE directives are only issued when a NEW data type is encountered.)
In order to describe a variable, one needs several pieces of information: the storage class, the data type, the name, the scope, and one or two other details depending on the type of variable being described.
The format of the LD_DEFVAR directive is
{V}{class}<tref><vlref><offset><segref>"name"
where
Storage Classes:
  0 -- external
  1 -- static
  2 -- auto
  3 -- register
  4 -- argument
  5 -- structure tag (*)
  6 -- union tag (*)
  7 -- enum tag (*)
  8 -- structure element
  9 -- union element
 10 -- enum element
 11 -- debugger use (*)
 12 -- typedef (*)
 13 -- display
The classes marked with a star will never actually appear in an LD object file, but the class codes are reserved for the internal use of support software.
Most of the above classes are self-explanatory, but the need for an "argument" class may need some clarification. In C, function arguments are semantically and syntactically the same as "auto" variables. However, the actual machine code that deals with arguments is sometimes radically different from the code that deals with other auto variables, and the debugger must know the difference in order to get things straight.
The "display" class indicates that the object is a compiler-generated auto which contains a pointer to the most recent stack frame of its lexical parent. This is used for languages like Pascal, where subprograms can be local to other subprograms.
Below we list the possible values of <offset> depending on the type of object.
All subsequent LD_DEFVAR directives defining elements will give the VLREF of the structure that contains the elements. When the code generator comes to the end of the structure, it will issue an LD_ENDVLIST directive to indicate the end of the variable list that contains the structure elements.
As an example, consider the following declaration.
struct complex { float x; float y; } Z1;
This would generate the following directives.
{T}{1}<VLREF of enclosing VList>complex
        -- LD_DEFTYPE for struct
{V}{8}<TREF of float><VLREF of struct><0>x
        -- LD_DEFVAR for x, offset 0
{V}{8}<TREF of float><VLREF of struct><Bits in float>y
        -- LD_DEFVAR for y, offset is number of bits in float
{}}<VLREF of struct>
        -- show end of struct V List
{R}{01}Z1
        -- reference to Z1 (global)
{V}{0}<TREF of complex><VLREF of enclosing VList><0><SEGREF of Z1 segment>Z1
        -- Z1 is external, of type complex, offset of 0 in Z1's segment
{S}{01}<SEGREF of enclosing segment>Z1
        -- creation of global segment for Z1
{L}[sizeof Z1]
        -- set size of Z1 segment
{O}{ORIGIN_RELOC}{0}<SEGREF of Z1 segment>
        -- relocation triplet for size of Z1
{A}<SEGREF of Z1><bit alignment of struct>
        -- set alignment for Z1
Note that the code generator creates an LD_REFER directive for the reference to Z1 before it issues the LD_CRSEG directive that actually creates Z1's segment.
The LD_SCOPEVAR directive is used instead of LD_DEFVAR, for names that have associated scopes: the names of functions and labelled blocks. LD_SCOPEVAR specifies all the information the LD_DEFVAR does and also connects the name with its associated scope. The format of the directive is
{X}{class}<tref><vlref><offset><segref><scope>"name"
where the <scope> argument gives the VLREF of the scope that is associated with the name, and all other arguments are the same as those for LD_DEFVAR. The <tref> will either be a function type (in which case the scope will be the outermost scope of the function given by "name"), or else a block label type (for languages that support labelled internal blocks).
The LD_SCOPEFLAGS directive lets you set or change certain attributes of a scope (variable list). The directive has the form
{z}<VLREF>{flags}{flags}...
where <VLREF> identifies the variable list (scope) whose attributes are being set, and each {flags} is a flag byte as described below.
A flag byte can have one of two forms:
LF_SET | LF_flag
LF_CLEAR | LF_flag
(where '|' represents the C bitwise OR operation). If LF_SET is used, the attribute flag is turned on; if LF_CLEAR is used, the attribute flag is turned off. The possible attribute flags are LF_ROOT_SCOPE and LF_SAME_FRAME.
There are four possible combinations of these flags.
The flags used by LD_SCOPEFLAGS have the following numeric values.
LF_SET         0
LF_CLEAR       1
LF_ROOT_SCOPE  2
LF_SAME_FRAME  4
By default, LF_ROOT_SCOPE is off and LF_SAME_FRAME is on (which is appropriate for a C inner scope).
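The flag-byte arithmetic can be sketched as follows. The helper function is our own invention, but the numeric values are those listed above.

```python
# Sketch of LD_SCOPEFLAGS flag bytes, using the numeric values listed
# above.  A flag byte ORs an operation (LF_SET or LF_CLEAR) with the
# attribute flag it applies to.
LF_SET = 0
LF_CLEAR = 1
LF_ROOT_SCOPE = 2
LF_SAME_FRAME = 4

def flag_byte(operation, attribute):
    """Build one flag byte: operation | attribute."""
    return operation | attribute

# Defaults for a C inner scope are LF_ROOT_SCOPE off, LF_SAME_FRAME on;
# a scope with its own stack frame would instead clear LF_SAME_FRAME:
clear_same_frame = flag_byte(LF_CLEAR, LF_SAME_FRAME)   # value 5
```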
The LD_LINETAB directive shows how source code is broken into text lines. This lets a debugger associate compiled code with lines in the original source file. The format of the directive is
{l}<vlref><counter><line#>{stat_type} <counter><line#>{stat_type}...
where <vlref> gives the variable list (scope) that contains the code, <counter> gives the location of the statement's generated code within the scope's segment, <line#> gives the source line number, and {stat_type} is a statement type code from the list below.
 0 -- expression
 1 -- break statement
 2 -- goto
 3 -- continue
 4 -- return
 5 -- if
 6 -- test of for loop
 7 -- switch
 8 -- while
 9 -- repeat (Pascal)
10 -- else
11 -- assignment
12 -- initialization of for loop
13 -- do of do-while
14 -- while of do-while
15 -- call
16 -- write/writeln (Pascal)
17 -- with (Pascal)
18 -- until (Pascal)
19 -- miscellaneous
20 -- increment of for loop
21 -- end of if
22 -- end of while
23 -- end of for
24 -- end of else-if
25 -- end of switch
26 -- beginning of function definition
27 -- end of function definition
28 -- return statement with expression
29 -- file name
30 -- beginning of inner scope
31 -- end of with (Pascal)
32 -- read/readln (Pascal)
33 -- start of input file
34 -- restore to previous file
35 -- line number of end of file
36 -- marks end of statement if ambiguous
Note that statements with many parts to them (e.g., for) have a different code for each part so that the parts may be distinguished.
The <counter>, <line#>, and {stat_type} values form a triplet. If a particular line has more than one statement or statement type on it, one of these triplets will be issued for each one.
In current compilers, the code generator does not put out an LD_LINETAB directive every time it comes to a new statement or statement type. Instead, it saves information about a number of statements and then puts out one large LD_LINETAB directive with a number of triplets in it. Accumulated LD_LINETAB information must be flushed before the compiler can change the name of the current input source file.
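The batching behaviour described above can be sketched like this. The class and the tuple representation of a directive are illustrative only; a real code generator would emit the binary record format.

```python
# Sketch: accumulate (counter, line#, stat_type) triplets and emit
# them as one LD_LINETAB directive.  The tuple form stands in for the
# real binary encoding; all names here are illustrative.

class LineTab:
    def __init__(self, vlref):
        self.vlref = vlref
        self.triplets = []                 # pending (counter, line, type)

    def note(self, counter, line, stat_type):
        self.triplets.append((counter, line, stat_type))

    def flush(self):
        """Emit one {l} directive for all pending triplets, or None."""
        if not self.triplets:
            return None
        directive = ("l", self.vlref, tuple(self.triplets))
        self.triplets = []                 # flushed before the current
        return directive                   # input file name can change

tab = LineTab(vlref=7)
tab.note(0, 10, 0)      # expression on line 10
tab.note(4, 11, 28)     # return-with-expression on line 11
d = tab.flush()         # one directive carrying both triplets
```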
The LD_DEBUG_INFO directive provides a summary of information about the debugging directives associated with a module. By reading the LD_DEBUG_INFO directives, a program can determine important facts about the directives, in preparation for creating debugging tables for the module. LD_DEBUG_INFO has the form
{d}(codescopes)(trefs)(filenames)(linetabs)(ltabentries)(vars)
where each argument is a ULONG count: (codescopes) gives the number of code scopes, (trefs) the number of TREFs, (filenames) the number of source file names, (linetabs) the number of LD_LINETAB directives, (ltabentries) the total number of line table entries, and (vars) the number of variables described by LD_DEFVAR and LD_SCOPEVAR directives.
An object file coming fresh from a compiler's code generator has many LD_CRSEG, LD_NAME, and LD_REFER directives in it. Once the various library routines have been linked in to the object code, almost all of these can disappear.
Almost all segment creation directives can be removed -- once all the various pieces of the program have been brought together, the loader can choose an actual location for all the segments that are embedded in other segments. These segments can then be represented as offsets within the enclosing segment.
Since there are only a few segments which are not embedded in other segments, there are only a few LD_CRSEG directives required. LD_REFER and LD_NAME directives are not required because all references can be resolved. LD_ALIGN directives are not required because the embedded segments can be properly aligned at the time that the associated LD_CRSEGs are resolved. In fact, the entire object file can be reduced to a handful of LD_CRSEGs, a large number of LD_DATA/LD_RELOC pairs, and whatever debugging directives are to be included as part of the program. This format is called Final Object Format.
On most systems, object code never reaches Final Object Format. The object code is usually converted to the object format that is used on the target system. However, on a system that used LD object format as its object standard, Final Object Format would be used to store most compiled programs.
The LD_MARKER directive is used to indicate that an LD file is in Final Object Format, and to separate the file into its logical divisions. The format of LD_MARKER is
{*}
where {*} is a byte containing the ASCII character '*' (asterisk). The DATA field is empty.
LD_MARKER can be the first directive of an object module. Its presence indicates that directives of the module are arranged in Final Object Format. LD_MARKER also appears between key divisions of the object code. Below we show the divisions:
*   marker
LD_CONTROL directives
segment creation, reference, description directives
    (LD_CRSEG, LD_NAME, LD_REFER, LD_SEGINFO, etc.)
*   marker
LD_DATA/LD_RELOC directives
*   marker
Debugging directives
If no debugging tables are desired, LD may skip the debugging directives by stopping at the third marker.
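A reader of Final Object Format can locate the divisions by scanning for the markers. The sketch below models directives as strings whose first character is the directive byte; this is a simplification of the real record format.

```python
# Sketch: split a Final Object Format module into its divisions by
# scanning for LD_MARKER ('*') directives.  Directives are modeled
# as strings whose first character is the directive byte.

def split_divisions(directives):
    divisions = [[]]
    for d in directives:
        if d.startswith("*"):
            divisions.append([])      # a marker opens the next division
        else:
            divisions[-1].append(d)
    return divisions

module = ["*", "LC_LDVERSION", "LD_CRSEG",      # control + segments
          "*", "LD_DATA", "LD_RELOC",           # data/relocation
          "*", "LD_DEFTYPE"]                    # debugging
parts = split_divisions(module)
# A loader that wants no debugging tables stops at the third marker,
# i.e. it simply never reads parts[3].
```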
An LD object library is a collection of LD object modules, gathered into a single file for ease of use.
An object library should have the following properties.
There are several directives that are used solely inside LD libraries. These directives provide information about the library itself.
Each module has an associated LD_LENGTH directive, telling the length of the module in bytes. The directive has the format
{b}(length)
where (length) is a ULONG value giving the total number of bytes in the module. The LD_LENGTH directive for a module immediately precedes the module in the library.
The index of a library describes the modules contained in the library. The location of the index is given by the seek address contained in an LC_LIB_HEADER directive at the beginning of the library file. (LC_LIB_HEADER was described in Chapter 3.)
The index contains several sections, separated from one another by LD_MARKER directives. The sections are listed below, in the order in which they appear.
The directives that make up these sections are described in later sections.
The end of the index is indicated by a directive with a first byte of an ASCII NUL (octal 000). This directive has a zero {length} byte, and therefore a zero checksum. The same sort of directive is used to mark the end of each module.
The LD_INDEX_HEADER directive specifies the number of modules that are stored in the library. It has the format
{H}(number)
This is always the first directive of the index. Its location is given by the seek address in the LC_LIB_HEADER directive at the beginning of the file.
LD_LOCATOR directives appear immediately after the LD_INDEX_HEADER directive. There is one LD_LOCATOR directive for every object module in the library. These are given in the order that the modules appear in the library. The format of each directive is
{I}<objlen><spacelen>
where <objlen> gives the length in bytes of the object module itself, and <spacelen> gives the number of bytes of file space that the module occupies (the module plus any padding).
This information is enough to let a program reading the file calculate the seek address of each module within the library.
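The calculation is a simple running sum. The sketch below assumes the modules are laid out one after another starting at some known base offset, with each module occupying <spacelen> bytes; field encodings are simplified to plain integers, and the names are ours.

```python
# Sketch: given the (objlen, spacelen) pairs from the LD_LOCATOR
# directives, compute the seek address of each module.  Assumes
# modules follow one another from a known base offset, each taking
# up <spacelen> bytes of file space.

def module_offsets(locators, base=0):
    """Return the seek address of each module, in library order."""
    offsets = []
    addr = base
    for objlen, spacelen in locators:
        offsets.append(addr)
        addr += spacelen      # next module starts after this one's space
    return offsets

# Three modules occupying 0x100, 0x80 and 0x200 bytes of space:
addrs = module_offsets([(0xF0, 0x100), (0x7C, 0x80), (0x1FE, 0x200)],
                       base=64)
# addrs == [64, 320, 448]
```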
LD_INDEX_NAME directives specify the name of a SYMREF or SYMDEF. Such directives are found in the SYMDEF and SYMREF sections of the index. The format of the directive is
{n}"name"
Each LD_INDEX_NAME directive is immediately followed by an LD_INDEX_ENTRY directive. These directives provide information about the symbol named in the LD_INDEX_NAME. In the SYMDEF section, the LD_INDEX_ENTRY directive lists modules that contain SYMDEFs; in the SYMDEF cross-reference section and the SYMREF section, the directive lists modules that contain SYMREFs. Modules are referenced by number; the first module is number 1. The format of LD_INDEX_ENTRY is
{e}<module_number><module_number>...
When LD_INDEX_ENTRY directives appear in the SYMDEF section of the index, the first module number in the directive is the number of the library module that contains the primary definition of the symbol named in the accompanying LD_INDEX_NAME directive. (Recall that an LD library can only contain one primary definition for each SYMDEF.) If this kind of LD_INDEX_ENTRY directive contains additional module numbers, they tell which modules contain secondary definitions for the symbol. If a symbol has secondary SYMDEFs but no primary one, the first <module_number> in the list will be zero.
If a module contains a COMMON definition or reference to the symbol, the LD_INDEX_ENTRY directive will contain the negative of that module number.
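The conventions above (module numbering from 1, a zero first slot for "no primary definition", negated numbers for COMMON) can be decoded as follows; the function and dictionary keys are illustrative, not part of the format.

```python
# Sketch: interpret the module numbers of a SYMDEF-section
# LD_INDEX_ENTRY.  0 in the first slot means no primary definition;
# a negative number marks a COMMON definition/reference in that module.

def decode_symdef_entry(numbers):
    primary = numbers[0] if numbers[0] > 0 else None
    secondary = [abs(n) for n in numbers[1:] if n != 0]
    common = [abs(n) for n in numbers if n < 0]
    return {"primary": primary, "secondary": secondary, "common": common}

# Primary definition in module 3, secondary definitions in modules 5
# and 7, where module 7 holds a COMMON definition/reference:
entry = decode_symdef_entry([3, 5, -7])
```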
The Module Information section of the library index is made up of LD_INDEX_INFO directives. There is an LD_INDEX_INFO directive for each module in the library. The directives appear in the same order as the modules.
LD_INDEX_INFO has the format
{m}'time'{mflags}"filename"
where 'time' gives the time and date that the module was compiled or assembled, {mflags} is 000 if tables are not present and 051 if they are, and "filename" is the file that contained the original source code for the module. The file name used will be the file name in the first LC_FILENAME directive that appeared in the original LD file.
The remaining chapters of this document describe the use of an RU (run-unit) and its internal format.
An RU is a file that represents part of the in-memory image of a running program. We emphasize that it is only part of a program. A full program may be made up of the contents of an RU, plus data and software obtained from shared libraries and other RUs, as well as material supplied by the operating system.
Since an RU is only part of a program, it makes sense to use the RU format to represent anything that can be part of a program. In particular, the RU format will be used to represent shared libraries.
An RU is created at Link Time. When the RU is linked, the linker merges a number of object modules to form the RU. To reduce the amount of work that must be done in linking, the RU has the same format as an LD object module.
Since an RU only contains part of a program, it may contain references to items which are not found within the RU. Ideally, the linking process should resolve as many of these references as possible. The more references resolved at linking time, the fewer you have to resolve each time you run the RU.
We use the term instantiation for the point at which the contents of an RU are placed into virtual memory. This is not the same as "running" the program, since the contents of an RU may be placed into memory long before they are actually used. As we have said, an RU only contains part of a program, and the part that it contains may not be used for quite some time.
Since an RU only constitutes part of a program, a program may be put together from several RUs. These RUs can all be instantiated at the same time, or they can be instantiated as they are needed. For example, suppose a program occasionally needs to use an RU named X. If it usually doesn't need X, the RU loader may choose not to instantiate X when the rest of the program is invoked. Instead, it will only instantiate X if and when X is needed. This process is called dynamic linking.
Before we go on, it will be useful to introduce a number of new terms and to clarify the meaning of known terms.
A partition has associated attributes. Some of these are the same as page attributes (writable, privileged, and so on). Partitions may also have attributes which are not available with hardware pages (e.g. sharable).
Segments can overlap with one another, if the corresponding segment descriptors frame overlapping areas of virtual memory. For a well-formed RU, however, the memory area associated with a segment must be entirely contained within a single partition. (You can have a segment whose memory area is the entire partition, but you cannot have a segment with memory in more than one partition.)
There are two types of segments: descriptor segments and operand segments.
Each RU is divided into three parts: the body, the external profile, and the internal profile.
The body provides the actual memory image that the RU will produce, plus all the information needed to instantiate it. The memory image is divided into partitions, which are the basic building blocks of the associated program. Each partition is separate from the other partitions, and the instantiation process may arrange partitions in any order.
The body specifies the size of each partition, plus any initialization values for the partition. The initialization values may include hardware segment type information and flags, if a partition contains descriptors.
The entire partition represents an area of virtual memory. When the partition is instantiated, the pages for this memory area will be given sufficient permissions for the attributes of the partition; for example, the pages will be writable if the partition is. However, the partition itself is just an area of memory. The processor will not let code access this memory unless the code has a segment descriptor that frames the memory.
For the purposes of RU loading and initialization, the RU loader can generate descriptors as needed, by using its privileges. On the other hand, a program obtains its initial set of descriptors via the relocation information in the RU.
Partitions are independent of each other. The RU loader may arrange them in memory in whatever order happens to be convenient or efficient. When partitions are small, the RU loader may put several partitions into the same virtual memory page, provided that they all have the same attributes. This allows more efficient use of memory.
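The packing the loader is allowed to do can be sketched as a simple grouping by attributes. The page size, names, and first-fit strategy below are assumptions for illustration, and each partition is assumed to fit within one page.

```python
# Sketch: partitions with identical attributes may share a page.
# Group by attributes, then fill pages first-fit within each group.
# Assumes each partition fits within a single page.

PAGE = 4096

def pages_needed(partitions):
    """partitions: list of (name, size, attrs).  Return pages used."""
    groups = {}
    for name, size, attrs in partitions:
        groups.setdefault(attrs, []).append(size)
    pages = 0
    for attrs, sizes in groups.items():
        used = 0
        for s in sizes:
            if used + s > PAGE:   # start a fresh page for this group
                pages += 1
                used = 0
            used += s
        if used:
            pages += 1
    return pages

# A and B share one read-write page; C needs its own read-only page:
n = pages_needed([("A", 1000, "rw"), ("B", 1000, "rw"), ("C", 1000, "r")])
```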
The RU loader may generate several descriptors framing disjoint areas of a partition. However, the partition is still treated as an "atomic quantity". If part of the partition is swapped in or out, the whole partition is swapped in or out.
A partition can be reduced either in size or in attributes to give a subpartition. This is analogous to the hardware shrink operation.
It is possible to create segment descriptors that only refer to a part of a partition, but you cannot have a segment that straddles two partitions. Segment descriptors referring to the same region of memory can have different attributes; for example, one segment descriptor may let you write to the memory, while another marks the memory as read-only.
An RU can define as many partitions as it wants. This gives LD's RU format an advantage over the stock GCOS8 format, which only provides four partitions:
The stock GCOS8 arrangement does not provide sufficient control over sharing; if you want to share one piece of data, you have to share all similar data. The stock arrangement also precludes demand segmentation (discussed later).
In practice, LD does not place more than one segment in each partition. This gives the RU loader the maximum freedom to organize partitions, thereby allowing more efficient use of memory.
Typically, the RU body specifies all the information needed to create the partitions of the RU, then all the information needed to initialize those partitions. The body may also contain references to items in other RUs; later on, we will describe how these are handled.
Linking between RUs takes place at the subpartition level. An RU that references a subpartition may generate further "sub-subpartitions", using descriptors that frame only part of a subpartition.
The body of the RU provides "handles" to all externally visible partitions, subpartitions and entry points. A handle is similar to a SYMDEF (but a SYMDEF is not necessarily a handle). External entities (for example, other instantiated RUs) use handles to get at partitions, subpartitions and entry points. These handles are the only access points that are available by name after an RU has been instantiated.
In addition to named entry points, an RU may have one unnamed entry, called the primary entry. Typically, an RU has named subpartitions/entries or a primary entry, but not both (although the format does not forbid it). The purpose of the primary entry is to give a point where execution of a program should start. In other words, it lets you "run the RU", without having to know a specific entry name.
The external profile of an RU is a kind of symbol table that provides information which can be used when linking other programs. For example, suppose that you're linking an RU named A, and A refers to an external function named B. When LD links A, LD is passed directives that indicate A refers to B. LD therefore searches through some set of RUs to find one that contains B.
The external profile of each RU describes its contents. LD will eventually find that function B begins at a particular offset within some partition in some RU. LD can use this information to resolve such references wherever they occur within A. (As we noted earlier, we want to resolve as many references from one RU to another as possible at Link Time.)
When we say that references are "resolved", we do not mean that they will be completely resolved. Instead of being references to named functions or data objects, they are converted into references to offsets in partitions of other RUs. In our example above, the body of A will contain a reference (consisting of a descriptor and offset) to some point in a partition of the RU that contains B. When A is instantiated, this reference causes the instantiation of the RU that contains B.
The internal profile contains a symbol table and debugging information describing the body of the RU. This information is only relevant within the RU itself. It is only used by software like debuggers and the program that writes out dumps.
When a user invokes a program, the associated RU is instantiated. If the body of the associated RU references other subpartitions or entries, the RU loader searches for these entries and subpartitions within other RUs supplied by the user or within shared RUs recognized by the loader. Some of these RUs may already be in memory (for example, standard support software like the Operator Segment); others may need to be instantiated. Once the RU loader has instantiated any required RUs, the loader can resolve the references that caused the search in the first place.
To resolve one reference, the RU loader may have to instantiate an RU that contains other references that also need to be resolved. Thus one reference may require the instantiation of several RUs.
Each partition is instantiated as an area of virtual memory, in pages whose attributes are compatible with the partition attributes. The RU loader must record the primary entry (if it exists), plus any externally-visible entry descriptors. The names of these entry points are recorded so that they can be used in search rules (described later).
A program may specifically name the files that contain the RUs that it needs, or else these files may be located through search rules.
An instantiated RU may share material with other RUs. Sharing can take place between RUs put into execution by different users, or between RUs running simultaneously for the same user. Sharing reduces the total memory requirements for the system. It also makes possible a number of programming techniques that cannot be used when every program operates in its own separate environment.
Some people talk about "shared domains". However, on the DPS-8 and DPS-90, you do not share domains, you share segments (and thus you share the associated subpartitions). In essence, a subpartition can belong to several domains at once. It is incorrect to think of a subpartition as "belonging" to a particular domain.
If we must speak of ownership, it is better to speak of partitions being "owned" by RUs. A partition can only come into being two ways:
The ownership of a partition is mainly intended to control the deletion of the partition when the owner is finished with it.
A particular shared RU may have some partitions marked as sharable or unsharable. This marking is independent of the write permissions on the segment. When such an RU is instantiated into a shared working space, only the shared partitions are actually instantiated. The unshared ones are recorded, and the RU file is kept available for I/O. When any partition of such an RU is referenced from a user working space, all unshared partitions are instantiated and initialized in the user's working space (using information taken from the RU file). Any references from the shared RU are resolved at this point.
As noted earlier, sharing is done at the subpartition level. Making a subpartition available for sharing is easy: you simply store the parent partition in a location that is available to all programs. In hardware terms, this means that you access shared subpartitions using descriptors that refer to a working space register whose value doesn't change as programs are swapped in and out.
At instantiation time, a partition with sharable subpartitions cannot contain a reference (descriptor) to an unshared partition.
An entry definition implicitly contains a reference to a partition: the partition that contains the linkage segment defined by the entry descriptor. Thus, for instantiation purposes, resolving an entry reference implicitly resolves a partition reference as well.
A routine invoked through a shared entry cannot refer to an unsharable partition.
In order to make shared library units (SLUs) useful in the GCOS8 NS mode environment, it should be possible for outside groups to write SLU software in any supported language, using a programming style that is natural to that language. Without this requirement, it will be impossible for sites to make use of commodity software like the IMSL library. (IMSL is a library of mathematical and statistical routines.)
As a result, each SLU must have a fixed SEGID for its code and another for its static data. If an SLU does not have these fixed SEGIDs, a library routine will have great difficulty locating the static data and other library routines. (Certainly, most software could be written in a style that avoided such difficulties, but this style is not natural in most programming languages.)
There are only 1000 fixed SEGIDs available for use. Realistically, we should leave half of these for user libraries and routines. This leaves 500 for system-supplied software. Since each SLU requires one segment for its code and one for its static data, this makes it possible to have 250 shared libraries. Of these, a good number (maybe half) are used up for system operations.
Each SLU has its own pair of fixed SEGIDs. Obviously, no two SLUs can have the same SEGID(s); if any two have a SEGID in common, a program will not be able to use the two libraries together. To avoid conflicts, a large number of the available SEGIDs should be reserved for known products (e.g. the IMSL library).
The first part of each code segment should be a collection of transfer vectors to the library's routines (in later parts of the segment). With this organization, it is easy to add new routines and recode existing ones transparently. It is also easy to discontinue support for old routines, by changing the transfer vector to jump to an appropriate handler routine.
The first part of each static data segment should contain externally visible data. Ideally, these should be pointers, so that they are less likely to need to be moved as the library is changed. Private data should come in later parts of the segment. This lets you change the locations of the library's private data without affecting the outward appearance of the software.
There are two ways in which a library can be changed: transparently and visibly. A transparent change is not visible to user programs--the interface to all existing routines remains the same, as does the type and location of all user-visible data objects. A simple bug fix is a common sort of transparent change. A visible change is one that changes the appearance of the SLU (for example, changing the interface to an existing routine).
A visible change makes it necessary to recompile all programs that use that SLU (if the change is not backwards compatible). Thus visible changes should be avoided whenever possible.
To accommodate change, every SLU should have two version numbers: one that describes the visible version and one that describes the actual version (including transparent changes). When a user program uses an SLU, it should only have to specify the visible version. The actual version is only of interest to those who are maintaining the library (for example, when they are trying to track down bugs).
More generally, it is desirable to let SLUs specify a range of version numbers (or a list, or some other mechanism) that tells which visible/transparent versions the SLU supports. This allows the SLU to specify its degree of backward compatibility. To allow non-ambiguous identification of SLUs, it is a good idea to associate some kind of checksum with each library as an additional identifier.
The overall SLU version number should appear in part of the exported reference name of the SLU. This will only be updated when a non-upward compatible visible change is made.
There are two mechanisms that a program should have available when it wants to use an SLU.
When we say that a program requests access to an SLU, we do not mean that the programmer must code an explicit call. In most languages, the call is generated automatically when the program is linked, and put into the domain start-up code. An explicit call would only be needed in languages which can construct calls on the fly, and even in such languages, the call could often be generated by underlying support routines.
Note that the program specifies version numbers at the time that the library is brought in. Therefore, the routines of the library do not need to check version numbers.
Each SLU needs an initialization mechanism which is invoked when dynamic attaching takes place. This initialization code is responsible for the following:
The first 32K of every SLU code segment will be reserved for transfer vectors. The actual routines of the segment will be linked with LD, and the transfer vectors filled in appropriately.
When an SLU is linked, LD generates an appropriate run-unit, including an external profile naming all the routines in the SLU and references to any other libraries that the SLU depends on.
For example, a C version of the IMSL library would probably make calls to the standard C library. The external profile would be referenced when linking any program that called the SLU. It would also be used when creating a new version of the SLU, to make sure that every visible name is put in the same place as in the previous version.
Each SLU must have an associated version number. This is done by making version numbers part of each routine name. (Note: Readers may wonder why version numbers are not passed as arguments. Passing them as arguments just makes it that much harder to look at a routine and determine its version.)
Incorporating the version number into routine names (and into data names as well) makes it easy for one SLU to support several versions of the same routine: it just contains the different versions under their different names.
As an example of how you would support functions with version-specific names, suppose that a typical release contains enough routines to require 4K of transfer vectors. Then visible release 1 takes the first 4K, visible release 2 takes the next 4K, and so on. When the SLU finally decides it will no longer support visible release 1, the first 4K transfer vectors are freed for re-use. The version-checking mechanism makes sure that programs can no longer use the version 1 interpretations of those transfer vectors.
(Once an SLU contains a particular name associated with a location in the transfer vectors or the first part of the data segment, neither the name nor the location can be used by later releases, until support for the original release is discontinued.)
Using this scheme, a single SLU has no difficulty supporting several releases of the same routine. It is much easier and less bug-prone to use this approach, because it uses the same code as the previous release, not some attempted simulation. Also, supporting several releases eliminates the need to update a lot of code simultaneously; you can put in a new SLU without having to update a lot of old code.
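The release arithmetic sketched above is straightforward. The 4K block size follows the example in the text; the function name and the word-granularity addressing are our own assumptions.

```python
# Sketch: each visible release gets its own 4K block of transfer
# vectors, so a routine's slot is determined by its release number
# and its index within that release.  The 4K figure follows the
# example in the text; other details are illustrative.

BLOCK = 4096

def vector_address(release, index):
    """Address of transfer vector `index` within visible `release`."""
    if release < 1 or not (0 <= index < BLOCK):
        raise ValueError("bad release or index")
    return (release - 1) * BLOCK + index

addr = vector_address(2, 10)   # 11th routine of visible release 2
# addr == 4106
```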
An RU library is a single file that contains several RUs. For example, you might have an RU library that contains several versions of the Operator Segment.
Different versions of the same RU will have similar external profiles. For example, if you have several Operator Segments in the same RU library, they will all define routines like .CALL, .RETRN, and so on. The external profiles will not have unique entries, and therefore cannot be used as the "keys" for distinguishing different RUs in the library.
Instead, the keys have to be the names of partitions in the body of each RU. These must be unique. For example, if each RU has a version of the Operator Segment, Version 1.0 may have a partition named OPSEG1.0, Version 1.1 may have a partition named OPSEG1.1 and so on.
Having discussed the features that the RU format must support, we can proceed to describe the format itself.
Basically, the RU is a file made up of LD-style directives. This allows the RU to be read, written, and maintained by the same routines as LD files and libraries.
The internal profile consists of standard LD debug information directives. These directives may be converted into a more useful in-memory format at the time that the program is loaded under control of a debugger.
The body of the RU begins with a few LD information directives, stating that this is an RU and providing general information about the data (file name, create date/time, etc.). Specifically, we expect the following.
LC_LDVERSION
LC_TARGET_INFO
optional LC_FILENAME, LC_MODULE, LC_REVISION, LC_CPR, LC_TITLE in any order
RU_LOCATOR
The RU_LOCATOR directive gives the seek address of the beginnings of the external profile and internal profile. (When we talk about a seek address here, we mean a byte offset from the beginning of the RU itself, not the RU library as a whole.) RU_LOCATOR is fully described in a later chapter.
The body of the RU is divided into two parts. The first part contains directives that create:
The directives used are RU_PARTITION, RU_SUBPARTITION, RU_ENTRY, RU_PRIMARY_ENTRY, RU_REFER, and RU_EXPORT. These are fully described in a later chapter.
There is no need to give each partition its own page or set of pages. The RU loader may put several partitions on the same page, provided that the attributes of all the partitions are the same. (We want to emphasize that attributes describe the hardware page that holds the partition, not the segments that frame all or part of the partition.) It is LD's job to ensure that the partition options are compatible with the segments that are placed in the partition.
The end of the first part of the body is marked with an
LD_MARKER
directive.
The second part of the body consists of directives that initialize the contents of the partitions. These are called RU_DATA and RU_RELOC. They are similar to the LD_DATA and LD_RELOC directives that initialize memory in LD files.
The end of the body is indicated with an
LD_END
directive.
The internal and external profiles follow the body of the RU. The profiles are made up of similar directives. The end of the RU is indicated by a zero directive (i.e. one consisting only of a null byte, followed by a zero length and checksum).
The instantiation process begins with a request to do one of the following:
Such requests are resolved by locating the appropriate RU to instantiate.
When the RU loader is asked to instantiate an RU in user space, it begins by going through all the partition information at the beginning of the body of the RU. This information may contain references to other RUs, which are also read and their partitions instantiated. Instantiation (and initialization later on) therefore takes place recursively.
The loader instantiates all the required partitions before initializing any of them. This means that all the necessary memory is allocated before any of the contents are laid down. This allows the location of partitions to be shifted around if necessary, without worrying about moving contents too. If your requirements exceed the amount of memory available, you find out at this step, before you've spent time initializing the memory.
Once all the partitions from all relevant RUs have been created in memory, the RU loader begins to copy the initialization data into the partitions. Relocation takes place at this time. Relocation can be performed quickly, because everything else is already in memory.
Once the partitions have been set up, the RU loader walks through the descriptor partitions, performing validity checks on the descriptors. For example, the validity check makes sure that the descriptors do not refer to privileged memory, and that descriptor segments (hardware descriptor types 1 and 3) do not frame parts of non-descriptor partitions. This is necessary, because users should not be able to create arbitrary descriptors by patching their RU files.
When all this has been done, the RU has been properly instantiated. Once the RU has been instantiated, the requested entry or partition is returned to the caller.
If the requested named entry point or subpartition belongs to a shared RU, only the unsharable partitions of the RU are instantiated in user space. The shared partitions are already in the shared working space. As with an unshared RU, any other RUs referenced by the shared RU are also instantiated.
Demand segmentation is based on the following set-up. A hardware descriptor has a bit that indicates whether or not the associated virtual memory is actually in core. If the bit is off, the memory is not present (for example, it's been swapped out). If you try to use such a descriptor to access the associated memory, you get a fault.
Now, there are several ways you can deal with this fault. In GCOS8, the fault handler's standard response to the fault is to abort the program. A fault handler that supported demand segmentation would bring the partition containing the memory back into core. It would then replace the original descriptor with one that has the appropriate bit turned on, indicating that the partition is now back in core, and it would update all other descriptors which framed parts of that partition. Execution then resumes, performing the same operation with the corrected descriptor. At some later time, the partition may be swapped out again and any descriptors into it are marked "not present" again.
When a desired partition is missing, it is not in memory anywhere. As we noted earlier, the operating system does its swapping using partitions rather than individual segments; therefore, if a segment is missing, all segments in the same partition are also missing. When a partition is swapped out, descriptors to any parts of the partition must be marked missing "simultaneously"; similarly, when a partition is swapped back in, all relevant descriptors must be changed "simultaneously".
Dynamic linking is a somewhat similar concept, but is only expected to take place on a CLIMB instruction. It works with a dynamic linking descriptor. A dynamic linking descriptor takes the place of a normal descriptor. Four bits in the descriptor say "This is a dynamic linking descriptor." The other 68 bits contain additional information that we'll discuss shortly.
Every time you attempt a CLIMB, the hardware checks the four bits in the descriptor to see if it's a dynamic linking descriptor. If it is, the hardware triggers a dynamic linking fault, thereby invoking the associated fault handler. The operating system's standard fault handler examines the other 68 bits of the dynamic linking descriptor and figures out what happens next. Typically, those 68 bits indicate an RU that should be instantiated; for example, they can contain a pointer to a memory address that gives the name of the RU. In this case, the fault handler issues a call to instantiate the RU and then arranges for a CLIMB to the appropriate entry point in the RU.
When we say "arranges for a CLIMB", there are two ways it can be done:
For the sake of efficiency, snapping the link is desirable. However, there are a number of situations in which it is unwise. Thus the link should only be snapped if both the caller and callee agree to snap it. Part of the information for a dynamic linking descriptor must specify this information. There is therefore an "okay to snap" bit associated with each segment and entry point definition.
From the previous discussion, it should be apparent that there are several important differences between demand segmentation and dynamic linking. Once a dynamic link has been snapped, the linked-in material never disappears; however, a segment that has been swapped in through demand segmentation may be swapped out again at some later time. Even if a dynamic link has not been snapped, the target of the link operation may be somewhere in memory; in demand segmentation, the whole point is that the desired segment isn't present in memory.
It is possible for one partition to have several dynamic links to another partition. Some of these may be snapped while others are not. On the other hand, descriptors to a swappable segment are all valid (or invalid) at the same time.
In this section, we give details of the directives that describe RU (run-unit) constructs.
Before we do that, we want to discuss the concept of a logical descriptor. A logical descriptor serves the same purpose as a hardware descriptor: framing a block of memory. The difference is that logical descriptors are not restricted by the hardware descriptor format. The hardware restrictions only apply to the actual (program-accessible) descriptors created by the relocation operations (which will themselves be remarkably similar to the operation of RU_SUBPARTITION and RU_ENTRY described below). For example, a logical descriptor framing a partition need not be limited to the one megabyte maximum size that applies to a type 0 descriptor.
An RU directive that defines or references a new entity (e.g. an entry point, partition, or subpartition) implicitly reserves an RUREF for that entity. RUREFs are similar to the SEGREFs that are used in normal LD object files. RUREFs are simply integers which are assigned sequentially. Once an RUREF has been defined, subsequent RU directives use the RUREF to refer to the associated entity.
Each RUREF is associated with a logical descriptor that refers to the same entity.
The RU_REFER directive indicates that the RU references some external entity. The directive takes the same form as the LD_REFER directive:
{R}{flags}"name"
where
The reference to "name" is resolved using the standard search rules. During instantiation, an RU_REFER may result in the instantiation of other RUs if necessary.
RU_REFER automatically reserves an RUREF for the entity with the given name. It also generates a logical descriptor. This descriptor is a copy of the logical descriptor for the entity, taken from the RU that actually contains (defines) "name".
Possible flag values are listed below, with their values:
The RU_PARTITION directive creates a partition. It has the form:
{/}<size><alignment>{flags}
where
RU_PARTITION automatically reserves an RUREF for the partition. It also creates a logical descriptor that frames the entire memory area for the partition. The {flags} argument describes the attributes needed in the page that will hold the partition. The logical descriptor specifies the maximum permissions possible for the partition, restricted by the options given for the page that holds the partition.
The possible flag values for partitions are represented by symbolic names, each of which begins with RUF. Below we list these flags and their values.
RUF_WRITE and RUF_PRIVILEGED effectively describe the attributes of the page table entry using the write control bit and housekeeping bit. These bits will also be used in constructing descriptors for the associated segments.
The RU_SUBPARTITION directive defines a subpartition of a partition. The directive has the following form:
{<}<size><offset><parent>{pflags}
where:
An RU_SUBPARTITION directive may have a flag that is not available for RU_PARTITION:
If RUF_NEW_SIZE is not specified and <offset> is zero, the newly created subpartition represents the same memory area as the parent.
As is probably obvious, RU_SUBPARTITION effectively describes a shrink operation. RU_SUBPARTITION implicitly reserves an RUREF for the subpartition being defined.
The RU_EXPORT directive is similar to the LD_CRSEG directive in a normal LD file. It creates a name that should be made available for other RUs to reference. This may be the name of a partition or subpartition.
The directive has the format
{S}{flags}<RUREF>"name"
where
The {flags} must include the LF_GLOBAL flag from the LD object format. They may also contain LF_SECOND.
When an RUF_STATIC_LINK flag is used in an RU_EXPORT directive, it indicates that static links to the name are allowed (e.g. non-dynamic references or dynamic references that may be snapped).
The RU_ENTRY directive defines an entry point. In so doing, it creates an entry descriptor. The directive has the following form:
{:}<LSR_ref><iseg><ic>
where
Every RU_ENTRY directive reserves an RUREF for the entry.
The RU_PRIMARY_ENTRY directive tells which entry (if any) should be considered the primary entry in an RU that contains a normal program. The directive has the form
{@}<entref>
where
The RU_DATA directive is analogous to the LD_DATA directive for LD files. RU_DATA specifies data for a partition. It has the form
{L}<org_seg><offset>{waste_len} {waste}...{data}...
where
We expect that most RU_DATA directives will have the extended record format described earlier in this manual.
The RU_RELOC directive corresponds to the LD_RELOC directive used to relocate data in LD files. It has the format
{O}<partition>triplets
The <partition> argument is a Dvalue giving the RUREF of the partition being relocated. After this come relocation triplets, each of which tells how to relocate items that appeared in the preceding data.
The only type of relocation expected at this point is "descriptor relocation": creating a descriptor that will be stored in a descriptor partition. For this reason, there is very little relocation necessary: normally only one relocation for each descriptor in the linkage segment (maybe 1000 relocations in all). Thus we don't have to worry about the cost of deciphering Dvalues; we just won't be doing it that often.
A relocation triplet consists of three values:
Together, the offset and the RUREF specify a location in the RU. The instantiation process is supposed to store an appropriate descriptor at this location.
Before relocation takes place, the RU holds 72 bits of data in the location that is supposed to hold the descriptor that is being generated. This 72 bits, plus the RUREF labelled (c) above, provide enough information to perform the relocation. The 72 bits contain the following information:
The program loader will build a descriptor with the given type and attributes, framing the referenced entity. The type must be compatible with the type of the entity; for example, only types 8, 9, and 11 are allowed if the referenced entity is an entry.
The RU_LOCATOR directive makes it possible to find the important parts of an RU in an RU file. The directive has the form
{>}(ext_prof_seek)(int_prof_seek)
The two arguments are byte seek addresses relative to the beginning of the RU. When a file contains a single RU (as opposed to RU libraries), these will be actual seek addresses. The (ext_prof_seek) tells where to find the external profile, and the (int_prof_seek) tells where to find the internal profile. A zero value for either of these indicates that the corresponding section does not exist.
Below we summarize the various directives of the LD object format. The LENGTH and CHK bytes have been omitted. The following notation is used:
{byte} (ulong) "string" 'time' <Dvalue> [TWORD]
(Definitions of the above concepts are given in Chapter 1.) Directives that implicitly generate a REF (SEGREF, TREF, VLREF) are marked as such.
The following flags are used in LD_CRSEG, LD_NAME, and LD_REFER.
GLOBAL == 1
COMMON == 2
SECONDARY == 4
Below we list the directives that are only found in run-units.
This appendix describes constructs that have appeared in earlier versions of the LD object format, but are now considered obsolete.
The LC_SECONDARY sub-directive of the LD_CONTROL directive has the format
{S}"symbol_name"
where {S} is the ASCII character 'S' and "symbol_name" is a string giving the name of a symbol defined or referenced elsewhere in the module. If the symbol is defined in the module, LC_SECONDARY indicates that it should be a secondary SYMDEF; if it is just referenced, it is a secondary SYMREF. In C programs, the information is obtained from a #pragma secondary preprocessor directive.
Instead of using LC_SECONDARY, the current version of the LD format marks symbols as secondary using the "flags" argument of the directive that defines or references the symbol.
This appendix describes the ways in which we intend to change the LD object format in the near future. We also list some enhancements which are desirable but not yet fully designed.
The LD_DELETE directive will have the format
{D}<segref>
This directive removes an existing segment with the given SEGREF from the LD symbol table. If the symbol with the deleted name appears again in the object code, it will be taken as a reference to a new symbol.
LD_DELETE can be used to generate a linked list (e.g. of initialization code) at link time. For example, suppose some module refers to a symbol named "list". A subsequent module can then issue the following sequence of directives.
LD_NAME "list"
LD_DELETE "list"
LD_REFER "list"
The LD_NAME directive defines "list" at the current location. The LD_DELETE then deletes the "list" symbol, and the LD_REFER creates a reference to another instance of "list" defined in some subsequent module. Thus the previous module's reference to "list" will be resolved to the "list" defined in this module, but this module's references to "list" will be resolved to a "list" in a future module. When all the modules are linked together, the set of things known as "list" will turn into a linked list.
There needs to be some way to specify a machine-dependent version number for an LD file. This is required because relocation codes on a machine may change from version to version.
There must be some way to record what produced an LD object module originally (e.g. a C compiler, or the YAA assembler.)
The debug information structure must be supplemented. At present, the debug directives cannot represent everything that LD's internal structures can; they are not capable of supporting some data types from some programming languages; and they cannot keep track of combined debugging tables obtained by linking several modules.
Some relocation codes need to be split up. For example, on the DPS-8 we need to distinguish between adding a word offset to the bottom 18 bits of a word and adding the word offset to the entire 36-bit word.
It would be useful to have versions of several directives (e.g. LD_NAME) which took offset sizes in bits rather than bytes or TWORDs.
This appendix examines features that depend on the intended target machine.
The Bull HN DPS-8 machine family includes the Bull HN DPS-88 and DPS-90 machines. Operating systems running on this hardware include G.E.'s MARK III, and Bull HN's GCOS8 (SS and NS mode) and CP6. On this family of machines, a byte is 9 bits long and a TWORD is 36 bits long (one machine word).
Possible relocation codes for LD_RELOC directives are identified by the following keywords.
The LD_SYMOPTS directive lets you specify attributes for a symbol. Attributes are specified by strings. Some strings have the form +word; these turn an option or bit on. Others have the form -word; these turn an option or bit off.
The following options can be used to set or clear NSA segment flags:
+read +write +save +cache
-read -write -save -cache
+extended +execute +privileged +accessible
-extended -execute -privileged -accessible
+bounded -bounded
You can also set segment flags with either of the following.
The default segment flag settings are 0553, which means
+read +save +cache +execute +bounded +accessible
By default, segments will be taken to be Type 0; this can be changed with the following LD_SYMOPTS option.
Many of the other LD_SYMOPTS options influence the behavior of the LD output writer which creates run-units. Loosely speaking, this output writer attempts to reduce the number of symbols, by putting eligible symbols inside other segments. Various symbol options are required to control this process. Below, we list the relevant options.
The RU output writer uses all this information to bind the segments of the program into separate hardware segments. Segments with the segment property become hardware segments. Segments without the segment property are bound into +flexible segments wherever possible; the +first32k flag is taken into consideration as this binding takes place.
The remaining symbol options are needed for proper handling of Bull HN OM object modules.
The LD software family is distributed with a library of routines that can be used to manipulate LD object files. This appendix briefly describes these routines, using prototypes of the C programming language to show how the routines are called.
The LD utility routines work with at most two files: an input file and an output file. The input file is assumed to be in standard LD format. The output file will be written in standard LD format.
In general, routines that operate on the input file have names that begin with lr_. Routines that operate on the output file have names that begin with lw_.
It is possible to read and write on the same file, provided that it is an LD library. To do this, the library should be opened with the lw_lopen routine. A library opened in this way will always be in read mode or in write mode. When it is in read mode, the library is used as the input file, and you can have a second file for the output file. Similarly, when a library is in write mode, you can have a different file as the input file. The functions lw_lwrite and lr_lread can switch a library from read mode to write mode, and vice versa. However, you cannot switch to write mode if you already have an open output file, and you cannot switch to read mode if you already have an open input file.
A non-library output file (opened with lw_open) can be changed to the input file with a call to lw_reread. However, it cannot be changed back.
The LD utility routines do not allow error recovery. If a routine encounters an error of some sort, it simply prints out an appropriate message and terminates the calling program. The one exception to this rule is lr_test, which returns a status if it fails to open a file.
Several C typedef definitions are used to define types used by the LD utilities. These are
ld_target_word -- unsigned integer big enough to hold a TWORD
ld_dvalue -- signed integer big enough to hold a Dvalue
The routines described in this section perform simple file manipulations.
#include <ld.h>
lw_open(filename,reloc_align,bits_per_byte,
    most_reloc,longest_reloc,origin_bits,target);
lw_open opens a file as the LD output file. It also outputs appropriate LC_LDVERSION and LC_TARGET_INFO directives as the first two directives of the file.
#include <ld.h>
lr_open(filename,min_version,max_version,target);
lr_open opens a file as the LD input file. It also checks that the LD version number of the input file falls into the range specified by min_version and max_version. If a non-NULL target argument is specified, lr_open verifies that this target machine matches the one named in the file's LC_TARGET_INFO directive (if any).
After this checking has taken place, lr_open positions the input file at seek position zero. Thus the first directive that the program will find is the LC_TARGET_INFO.
#include <ld.h>
ret = lr_test(file,minversion,maxversion,target);
lr_test is similar to the lr_open function in that it attempts to open a file as the LD input file. The difference is in error handling: where lr_open terminates the program if the file's version number falls outside the given range or if the file's target machine name does not match the target argument, lr_test just returns zero.
#include <ld.h>
lw_lopen(file,clearflag,index_seek,first_module);
lw_lopen opens an LD library for updating and reads the LC_TARGET_INFO and LC_LIB_HEADER directives that appear at the beginning of the library.
The file is accessed for both reading and writing, but it will begin in read mode at seek position zero. To begin writing on the file, use lw_lwrite.
If the file is cleared, there will be no LC_TARGET_INFO directive. Such a directive will be inserted by lw_lclose, using values taken from object files copied by ld_copy.
#include <ld.h>
lw_close();
lw_close writes a null directive on the current LD output file (marking the end of the file), then closes the file.
#include <ld.h>
lr_close();
lr_close closes the current LD input file.
#include <ld.h>
lw_lclose(index_seek,first_module);
lw_lclose closes a library that is currently being used as the LD output file. This file should have been opened by lw_lopen and changed to write mode with lw_lwrite. The current write position in the library MUST be positioned at the end of the index. This routine will append the required NULL directive to the end of the library header, but not to the end of the index.
#include <ld.h>
lw_lwrite();
lw_lwrite changes a library from read mode to write mode. The seek position in the library does not change. The library must have been opened with lw_lopen.
#include <ld.h>
lr_lread();
lr_lread changes an LD library from write mode to read mode. The seek position in the library does not change. The library must have been opened with lw_lopen.
#include <ld.h>
lw_reread();
lw_reread changes the current LD output file into the current input file. The file is positioned so that reading takes place at the beginning of the file.
#include <ld.h>
lr_reread();
lr_reread repositions the current input file back to the beginning of the file.
#include <ld.h>
lw_seek(pos);
lw_seek moves to a new position in the current output file. lw_seek can only be issued when the current write position is the first byte of a directive, and it can only be used to move to the first byte of another directive.
#include <ld.h>
lr_seek(pos);
lr_seek moves to a new position in the current input file. lr_seek can only be issued when the current read position is the first byte of a directive, and it can only be used to move to the first byte of another directive.
#include <ld.h>
pos = lr_tell();
lr_tell obtains a value representing the current read position in the LD input file. This can later be passed to lr_seek to return to the position.
#include <ld.h>
pos = lw_tell();
lw_tell obtains a value that represents the current write position in the LD output file. This value can be used in subsequent calls to lw_seek to come back to this same position.
The following routines all help you build an output directive.
#include <ld.h>
lw_start(code);
lw_start is the first step in writing out a directive. After this, you use other lw_ functions to write out the various fields of the directive. You do not have to calculate the length or the checksum of the directive -- these are calculated when you call lw_end to close off the directive you have built.
You do not need to create your own LD_DATA and LD_RELOC directives with this routine. Instead, you should build these directives using lw_dword and lw_reloc. The LD utilities will write out appropriate LD_DATA and LD_RELOC directives as they accumulate.
#include <ld.h>
lw_end();
lw_end indicates that you have finished building a directive. lw_end will calculate the length and the checksum for the directive just produced, fill in these fields, and write out the completed directive.
#include <ld.h>
lw_word(tword);
lw_word writes a TWORD value to the output directive that is currently being built.
#include <ld.h>
lw_byte(byte);
lw_byte writes a byte to the output directive that is currently being built.
#include <ld.h>
lw_dvalue(dval);
lw_dvalue writes a Dvalue to the output directive currently being built.
#include <ld.h>
lw_data(ptr,length);
lw_data writes a block of data to the output directive currently being built.
#include <ld.h>
lw_string(ptr);
lw_string writes a string to the output directive currently being built.
#include <ld.h>
lw_ulong(ul);
lw_ulong writes a ULONG value to the output directive that is currently being built.
#include <ld.h>
#include <time.h>
lw_time(tim);
lw_time converts a C time number into an LD time and writes out the time to the output directive currently being built.
The following routines are used to construct paired LD_DATA and LD_RELOC directives. They should not be called if there is already a partly built output directive (i.e. if you have called lw_start to begin a directive, but have not called lw_end to end it).
#include <ld.h>
lw_dword(tword);
lw_dword is used when creating an LD_DATA/LD_RELOC directive pair. The LD utility functions let you create the LD_DATA directive and its associated LD_RELOC directive "simultaneously".
The first step in the process is to indicate the segment and offset where the data should be placed. This is done with external variables declared as
extern ld_dvalue lw_segment;
extern ld_dvalue lw_origin;
Assign the appropriate SEGREF to lw_segment and the offset of the data to lw_origin.
Next, call lw_reloc to specify relocation information for the data. The first argument of lw_reloc gives the SEGREF of a relocatable symbol. The second argument gives the desired relocation code. If there are several relocations to be applied to the data, issue all of them via separate lw_reloc function calls.
Finally, issue an lw_dword call for the TWORD you want to write out. This call will increment lw_origin automatically, so you don't have to adjust lw_origin if you are going to output data to the next TWORD position in the same segment.
Information produced via lw_reloc and lw_dword will be accumulated as it is produced. Just before closing the file, you should call lw_flush to flush the accumulated data and relocation information, producing LD_DATA and LD_RELOC directives.
#include <ld.h>
lw_reloc(segref,reloc_code);
lw_reloc writes out relocation information about relocatable data. For further information, see the description of lw_dword.
#include <ld.h>
lw_flush();
lw_flush is used to flush accumulated data and relocation information prior to closing off an object module.
#include <ld.h>
lw_lend(segref);
lw_lend is used to flush and close off a literal segment. The literal segment should have begun with an LD_LITERAL directive, which implicitly claimed a SEGREF for the literal segment. Then came calls to lw_dword and lw_reloc to create the contents of the literal segment. lw_lend flushes the accumulated data and relocation information (if necessary), and writes out an appropriate LD_END_LITERAL directive to end the literal definition.
The following routines all obtain information from the input LD file. The read routines handle continuation directives properly, so that you can just keep reading into the continuation directive. However, if you read past the end of a directive and there is no continuation, the read routines will return garbage. The lr_eor function lets you determine when you have reached the end of a directive.
#include <ld.h>
code = lr_getdir();
lr_getdir reads a new directive from the LD input file. Subsequent lr_ functions can be used to read fields from the directive. For a control directive, the code returned has the form

(LD_CONTROL << 8) | LC_code

where LD_CONTROL is '#' and LC_code is the code of the sub-directive.
If the previous directive has been completely read, lr_getdir verifies its checksum. Otherwise, lr_getdir skips over whatever is left of the previous directive and goes to the beginning of the next.
#include <ld.h>
byte = lr_byte();
lr_byte reads a byte from the input file.
#include <ld.h>
ul = lr_ulong();
lr_ulong reads a ULONG value from the current LD input file.
#include <ld.h>
lr_data(ptr,length);
lr_data reads a block of data.
#include <ld.h>
str = lr_string();
lr_string reads a string from an LD file and returns a pointer to that string. The length of the string is based on the number of bytes that remain in the directive being read. Space for this string is obtained using malloc.
lr_string adds the usual '\0' to mark the end of the string.
#include <ld.h>
str = lr_sstring();
lr_sstring is exactly like lr_string except that it stores the input string in a static location instead of obtaining space with malloc. Each call to lr_sstring will overwrite the string obtained by the previous call. For further information, see the description of lr_string.
#include <ld.h>
lr_vstring(ptr);
lr_vstring reads a string from the LD input file and stores it in the given VString. The result can then be manipulated by the VS_ routines in the utility library.
#include <ld.h>
dval = lr_dvalue();
lr_dvalue reads a Dvalue from the current input file.
#include <ld.h>
tword = lr_word();
lr_word reads a TWORD from the current LD input file.
#include <ld.h>
#include <time.h>
tim = lr_time();
lr_time reads a time value from the current LD input file and converts it to the time_t format recognized by standard C functions.
#include <ld.h>
length = lr_length();
lr_length tells how many bytes of the current record have not yet been read. This is always a positive integer, except for records that are continued by LD_CONTINUATION records; in that case, lr_length returns the negative of the number of bytes left in the record, to indicate that the record continues in a new record.
Note that the correct length is returned for extended records.
#include <ld.h>
bool = lr_eor();
lr_eor determines whether or not you have reached the end of the current record in the input file.
The following routine performs general LD file operations.
#include <ld.h>
ret = ld_copy(length);
ld_copy copies the given number of bytes from the input file to the output file. It should only be used to copy complete directives.
ld_copy checks that the target machines of the two files match, but does no other checking. For example, it does not check to see if the checksums are correct.
In this appendix, we will examine the work that LD output writers do in preparing various output formats. Most of these formats are system-specific, and will be marked as such.
Before writing an LD file, all relocation in the LD object code is simplified so that the triplet in the LD_RELOC directive will always refer to a SYMREF or a segment created by an LD_CRSEG directive.
Next, all LPTR_RELOC relocations are replaced with ADDR_RELOC relocations wherever addressability is guaranteed. (See Appendix D for more discussion of LPTR_RELOC.)
Finally, segments containing no global definitions are bound to their parents. Other segments cannot be bound, since further linking may grow the segment.
The RU format uses the term SEGREF to mean something different from LD's SEGREFs. To avoid confusion, we will refer to RU's type of SEGREF as an RUREF and keep the term SEGREF itself to mean an LD SEGREF.
The output writer begins by resolving undefined SYMREFs to NULL. Undefined segments become global RUREFs. Undefined "entry" objects become ENTREFs.
Next, the output writer must determine the RU's primary entry point. If there is an Entry=name option on the LD command line, the given name will be used. Otherwise, if there is an "entry" definition starting with "......" (six dots), this will be used. Otherwise, the first primary entry definition will be used. All other "entry" definitions and "segment" names become local to the RU. The name of the RU is taken from the "Name=name" option of the LD command line; if there is none, the default name is used (see "expl ld").
LD then creates D$LINKAGE_SEG. In it, LD places the descriptors for defined segments and entries, plus slots for ENTREFs and RUREFs. Certain "known" segments are given specific SEGIDs (i.e. slots in the linkage segment); these are:
Name              SEGID        Linkage Segment Offset
----              -----        ----------------------
D$NULL_DESC       6000 octal   0
D$PRIVILEGE_SEG   6001 octal   2
D$SLIS            6002 octal   4
D$LINKAGE_SEG     6010 octal   20
All offsets above are given in words. NOTE: Except for D$LINKAGE_SEG, LD does not define these -- it just references them. Thus, module(s) in the input must supply a definition and appropriate initialization for the other segments.
The RU writer will implicitly bind non-"segment" segments to "segment +flexible" segments as long as they fit. It will then create additional unnamed "segment +flexible" segments to hold any remaining unbound non-"segment" segments.
All literals are expanded, i.e. replaced with an appropriate segment bound to the LD segment that contains the appropriate literal pool.
At present, all debugging information is discarded. In future releases, the RU writer will create the debug schema required by DSSV.
First, any data in a "segment" segment is moved to an LD-created child, since OM format forbids placing data directly in a "segment".
All "segment" definitions become OM SEGDEF directives. Every segment referenced purely for relocation becomes an SSREF directive. If a segment definition has no parent, or its parent is a "segment", the definition becomes a SECTDEF.
Every "constant" reference becomes a CONREF. Every "constant" definition and symbol whose value is not relocatable becomes a CONDEF.
Every "entry" definition is replaced by an ENTDEF and ENTREF. Every "entry" reference is replaced by an ENTREF.
All other global references are replaced by SYMREFs, and all other global definitions are replaced by SYMDEFs.
Other segment definitions are folded into their parent segments, since OMs forbid multi-level nesting.
All literals are expanded, i.e. replaced with an appropriate segment bound to the LD segment that contains the appropriate literal pool.
Triplets with LPTR_RELOC relocation are changed to ADDR_RELOC or an LDP opcode.
Unnamed objects in the output are given dummy names of the form "#N", where N is a decimal number. Local objects in the output are given dummy names of the form "original_name#N", where "original_name" is the original name of the object, and N is a decimal number that distinguishes objects with the same name but from different modules.
The symbol tables from the first input module having debug directives will become the local segment named "(SCHEMA)".
The following design issues have yet to be resolved.
While LD ensures that a sharable subpartition does not contain a reference (descriptor) to an unshared subpartition, the RU loader must verify that this is the case. We do not want users to be able to break security by patching their RUs.
Similarly, a routine invoked through a shared entry cannot refer to an unsharable partition. This must be verified by the RU loader.
Creating breakpoints in shared code for the purpose of debugging is a tricky process. We see three alternatives:
If Bull does not want to support dynamic linking at this time, we recommend the following:
In this way, you can prevent dynamic linking until you are ready to support it.
Dynamic linking can be partly simulated by replacing dynamic linking descriptors as they are encountered during instantiation. This replacement amounts to snapping the link, as described earlier. Therefore it can only be done safely if the caller and the callee both agree that the link can be snapped.