1. Introduction 1.1 How to Use this Document 1.2 Descriptive Shorthand 1.2.1 Addressing: "Above" & "Below" 1.2.2 Bytes: "Upper" & "Lower" 1.2.3 Stack Pointers: "Previous" & "Current" 1.3 Run-time Environment GMAP Macros: BMAC 1.4 Further Information 2. Initialization: B Set-up and Starting Execution 2.1 Structuring the Run-Time Memory 2.1.1 Growing to Accommodate Overlays 2.1.2 Creating the Unallocated Memory Pool 2.2 Allocating and Initializing the Main B Stack 2.3 Setting Fault Vectors 2.4 Parsing the Command Line 2.5 Calling MAIN 3. Stack-Oriented Languages 3.1 Local Storage Requirements 3.1.1 The FCV/Stack Descriptor 3.1.2 Location of the Stack 3.1.3 Stack Overflow Checking 3.2 The Stack Pointer 3.3 The Stack Frame 3.3.1 Components of the Stack Frame 4. Function Call, Entry, and Return 4.1 Passing Function Arguments 4.2 The Call/Return Linkage 4.2.1 Function Trap Linkage 4.2.2 "Cheap" Local Storage 4.3 Function Call, Entry, and Return: Detail 4.3.1 In the Caller 4.3.2 In the Callee 4.3.3 In .ENTRY 4.3.4 In .RETRN 4.3.5 Alternate Function Entry Routines 5. Memory Allocation and Management 5.1 External Storage 5.2 Auto Storage 5.3 Dynamic Storage 5.4 Memory Management: Which to Use 5.5 Memory-Image of a B Program 6. Input and Output 6.1 I/O Data Structure 6.2 Set-up and Tear-down 6.3 Terminal I/O 6.3.1 The New-Line Character 6.3.2 Flushing Output 6.3.3 Terminal Input 6.3.4 VIP Terminals 6.3.5 Raw Terminal I/O Mode 7. Register Conventions and Usage Appendix A: Variable Number of Arguments: .VNTRY A.1 Variable with a Maximum A.2 Variable with No Maximum A.3 Description of a .VNTRY Sequence A.3.1 In the Caller A.3.2 In the Callee A.3.3 In .VNTRY Appendix B: Run-Time Memory B.1 The Pool of Run-Time Memory B.1.1 Growing the Pool B.1.2 Shrinking the Pool B.2 RLSEVEC: Making Memory Unallocated B.3 GETVEC: Allocating Memory B.4 Obtaining More Run-Time Memory B.5 Wreaking Havoc with Run-Time Memory Appendix C: Function Traps and Gotos C.1 Setting a Function Trap C.2 The Trap Handler C.3 Multiple Traps C.4 Non-Local GOTOs C.4.1 Finding the Destination C.4.2 Triggering Traps C.4.3 B Library Support Appendix D: Debug Tables D.1 Loader External Symbol Tables D.2 Local Debug Tables D.2.1 Line Number Table D.2.2 Local Symbol Tables
This document describes the internal workings of programs that have been compiled by GCOS8 compilers maintained by Thinkage Ltd. This includes the following compilers.
This manual describes programs in terms of their overall structure and in terms of the equivalent GMAP assembler instructions generated for some of those structures. The information should prove helpful if you are trying to write or understand functions written in GMAP for the run-time environment. It will also help you understand the overhead associated with function call and entry, and memory allocation and management. Such "inside knowledge" can also help greatly in the efficient use of the interactive debuggers BOFF and COFF.
Because many of the topics in this manual are inter-related, most people will have difficulty assimilating all the material in a single reading. We suggest that you skim the document once or twice to get a general feel for the most important points, and then go for a detailed reading.
Before we get into the actual description of the run-time environment, it will be helpful to ensure that the reader understands some of the terminology that will be used in the rest of this document.
In visualizing the run-time environment, you should think of a program's address space originating at a base of zero and extending "upward" to higher addresses. Memory location A is "above" location B if A's address is higher than B's. The "bottom" of a block of memory or a structure is its lowest address. The "top" is its highest address.
We shall use the terms "below" to mean "lower in address" and "above" to mean "higher in address" throughout this document. Our diagrams have been drawn with this visualization in mind. Where possible, higher addresses are displayed above lower addresses. This convention must be violated in diagrams containing GMAP assembler statements, since assembler code is written "down" the page towards "higher" addresses.
The terms "upper" and "lower" refer to the left 18 bits (most significant half) and the right 18 bits (least significant half) of the machine word. Where two values are stored in two halves of the same word, we will write "upper" and "lower" to indicate where each is.
When we say each function has its "own" stack pointer, we mean that the single hardware register used as the stack pointer takes on a specific value for each function invocation. Each function invocation has its "own" value of the stack pointer.
In the same way, "the stack pointer of a function" means the value that the stack pointer has while the function is executing.
In the description of stack frames used by function calls, phrases like "the previous stack pointer" do not mean that there is more than one hardware stack pointer. This shorthand just means "the value of the stack pointer previous to the value it has now". When a function returns to its caller and "the previous stack pointer" is restored, we mean "the stack pointer is given the value it had previous to the current value".
The stack pointer is often referred to as if it had an actual memory address or location, e.g. a word lies "below the stack pointer". This sort of thing always refers to the address that the stack pointer points to. For example, saying that the linkage section of a stack frame lies "below its stack pointer" means that the locations that make up the linkage section lie below the location addressed by the function's stack pointer.
"Moving" the stack pointer means changing the value of the stack pointer, i.e. "moving" the location addressed by the stack pointer. When the stack pointer is "moved up" to the next stack frame, the address in the stack pointer register is replaced by the address of the next stack frame.
If you intend to work at the assembler level, you should look at the B system GMAP macro file. This file contains the correct sequences of GMAP assembler instructions for such things as function entry, return, defining auto storage, and bumping the stack. "expl b bmac" explains the use of these macros. The macros may be loaded with your assembler program, enabling you to write compatible GMAP functions easily and accurately. The macros also provide more detail on the exact code used to manipulate the structures described in this manual.
The information supplied in this document is subject to change, and is often abbreviated or slightly distorted for the sake of clarity. Please refer to the above macro file for the current implementation of the features described here.
More detailed explanations of the use and syntax of library functions described in this document may be obtained with "expl b lib". Note however that many internal library functions are not documented in an effort to prevent their use and abuse by normal users. Note also that the C and Pascal packages do not contain some of the functions that are distributed with B; therefore the corresponding explanations may not be available if your system does not have B.
There are several procedures which can or must be performed in every program before the function MAIN is finally invoked. The internal library routine .SETU. performs most of these procedures. They may be grouped as follows:
The loader brings your program into memory and does little else. The block of memory occupied by your program is often called the "memory hole" or (an anachronism) the "core hole". B, C, and Pascal programs require a structured memory hole to permit dynamic allocation and de-allocation of memory.
.SETU. determines the maximum memory size needed by the overlays that may be called into memory. The library function ADDMEM may be called to increase the memory-image size to accommodate the largest overlay. The memory obtained by this call to ADDMEM is kept reserved for use by incoming overlays. It cannot be used for run-time memory allocation, since the overlays must be called into this space and would overwrite any vectors dynamically allocated in this region of memory.
The B library function RLSEVEC is called to de-allocate all unused memory above the user's program, or above the highest address needed to accommodate the largest overlay, up to the next "K" boundary. This memory is now available to your program at run-time through calls to the B library function GETVEC. (See Appendix B: Run-Time Memory for details on GETVEC and RLSEVEC.)
Associated with the main stack is a standard stack descriptor, or Function Control Vector (FCV). The stacks allocated for co-routines also have FCVs (see "expl b coroutines fcv"). The main stack is a vector whose length is specified by the "Stack=" option of the command that invokes the compiler. This vector is allocated by the B library function GETVEC. The absolute high and low addresses of the stack are placed in the main stack descriptor located in .SETU.
An empty stack frame is constructed on the bottom of the new stack and Address Register 7, the stack pointer (sp), is initialized to point to the stack address of this empty stack frame. Subsequent function calls will now build their stack frames on the main stack.
Overflow faults are normally masked out. .SETU. calls the internal library function B.FALT to initialize the break and wrapup vectors. Other fault vectors are left untouched.
The command line that invoked the program is read and processed by the B library function .BSET. Once it has finished parsing the command line, .BSET releases itself into the pool of unallocated memory (using RLSEVEC). You can prevent .BSET from releasing itself by initializing an external variable named .KEEP to a non-zero value.
Finally, MAIN is called from the initialization routine. Two arguments are supplied, courtesy of .BSET:
main( argc, argv );"argc" is the number of arguments (separated by blanks or tabs) given on the command line. "argv" is a vector of pointers to those arguments (see "expl b lib .bset".)
When MAIN returns, it returns to .SETU., the initialization routine. .SETU. terminates your program.
B, C, and Pascal use a stack to store local variables and linkage information for each function. The main stack is allocated dynamically by .SETU. when the load module begins execution. The size of the stack is specified at compile time using the "Stack=" option of the compiler command. The default is 500 words. 300 words is the minimum stack needed to get through the initialization routines in .SETU., so you should not specify a value less than 300.
The memory for the stack is obtained from the pool of unallocated memory in your program (see Appendix B: Run-Time Memory for more detail on the available memory pool).
Each time a function is called, a segment of the stack is reserved for that function's local storage. When the function returns, the stack space is made available again. Each function call uses four words of stack space for the call/return linkage, plus one word for each function argument and one word for each auto variable. Each auto variable defined as a vector takes an additional "size of the vector plus one" words of stack space.
Unless special stack-monitoring routines are used, it is quite possible to "walk off the end" of the stack and overwrite other memory locations above the stack. (See Appendix B: Run-Time Memory for an example of this.) This generally leads to immediate or eventual disaster for your program; hence, it is wise to know what data gets stored on the stack, and to avoid the use of a lot of local storage when it is not needed.
The entry routine .SETU. contains a template FCV (Function Control Vector or stack descriptor) describing the main B stack. (See "expl b coroutines fcv" for a full description of the layout and contents of this descriptor.) In a regular B program, the coroutine pointer (cp=x6) points to .SETU.-6. This is the value of the FCV pointer for a regular B program, and the value returned by the B library function WMI.
The FCV for the main stack contains the absolute stack addresses in the upper and lower halves of .SETU.-9 (FCV[-3]). The current high water mark stack usage is in the upper half of .SETU.-8 (FCV[-2],upper). If the stack is severely overwritten or the registers altered, reference may be made to these locations in .SETU. to determine the actual location in memory of the main stack.
The main stack is allocated dynamically from available memory. The stack start is rounded up to an odd boundary and the absolute high and low addresses of the stack are placed in the upper and lower halves of FCV[-3].
When the stack monitoring routines are loaded with your program, the internal library routine .ENTRY performs a stack bound check before entering each function. These routines are loaded automatically unless the "-Tables" option of the B command is used, in which case "USe=.bound" will load the checking routines explicitly.
When the stack overflow checking code is being used, the stack high water mark is monitored with respect to the absolute stack addresses. If the high water mark ever runs off the top of the stack, an error message is printed and your program aborts.
If the monitoring routines have been loaded and the stack descriptor (FCV) and coroutine pointer (cp=x6) are intact after an abort, BOFF and COFF are able to interpret and print the stack bounds information from the FCV.
The stack is indexed using Address Register 7 (ar7), hereafter referred to as the stack pointer. Set-up initializes the stack pointer to point to the bottom of the stack, and the stack will grow upward toward higher memory from this point.
Since local storage is unique to each particular call or invocation of each function, each function call attempts to obtain a fresh segment of the stack for its use. The stack space associated with a particular function invocation is called a stack frame. When a function is called, a new stack frame is allocated for that function, above the stack frame used by the calling function. When the function returns, the new stack frame is abandoned and the stack pointer is once again aimed at the stack frame of the caller.
In practice, a stack frame is implemented by moving (changing the address in) the Stack Pointer (sp=ar7), and by addressing all local storage on the stack relative to this pointer. The address in the stack pointer determines the location of a stack frame. When a function is called and a new stack frame is needed, the address in the stack pointer is advanced to point a safe distance up the stack past the previous stack frame. When a function returns and its stack frame is abandoned, the address in the stack pointer is moved back down the stack to point to the previous stack frame.
The stack frame is the name given to the segment of the stack reserved for the current function invocation (call). It must be large enough to contain all the local storage needed for the function, as well as some space for function call and return linkage. In addition, it must be situated on the stack in such a way that it does not overwrite or interfere with any previous stack frames of functions still in execution. (These would be functions that called functions that called the current function.)
The components of the stack frame, in order of increasing address,
are as follows:
In keeping with the visualization of the B address space that places low addresses on the bottom, you should invert the above list so that you envision the call/return linkage as lying at the very bottom of the stack frame, "below" the stack pointer.
Note that the stack pointer points to the third word above the bottom of the stack frame, i.e. to the word containing a copy of the value of the caller's stack pointer. The first argument to the function is the first word above the stack pointer. The two words of linkage are actually below the stack pointer. There may be no arguments, autos, or temporaries in the stack frame if the corresponding function does not need such things.
In B, C, and Pascal, it is the responsibility of the caller to save any registers it needs across a function call. Most other GCOS-8 programming languages do NOT work this way.
Most of the overhead associated with a function call is to generate and organize a complete new stack frame for the function being called (the callee). This stack frame must be situated safely above the stack frame of the calling function (the caller). The creation of this new stack frame, with linkage, arguments, and space for local storage, is a shared responsibility between the caller and the callee.
Below we list the steps of this process and whether the operation is
carried out by the caller or the callee.
You will note that the caller stores arguments in the "temporaries" section of its own stack frame, but when the function call is complete the arguments must be situated immediately above the stack pointer in the callee's stack frame. In fact, the stack frames associated with each function call overlap.
The temporaries section of the stack frame contains temporary values needed to hold intermediate results while evaluating non-trivial expressions. A function call may be considered a special case of a non-trivial expression, where the arguments to the function are just intermediate results copied by the caller into the "temporaries" section.
When the stack pointer is advanced for the function call, it is only moved far enough to clear the local storage section but not the "temporaries" section of the caller. This places the stack pointer above the local storage section, but below the "temporaries" section into which the arguments have been copied. The caller's "temporaries" section containing the list of arguments becomes the start of the stack frame of the callee. In this way, the "temporaries" section is both the end of the caller's stack frame and the beginning of the callee's.
Not all arguments are stored on the stack for a function call. The first argument is passed in the A Register and the second in the Q Register; only arguments three and up are placed ahead in the "temporaries" section of the caller's stack frame, pending the function call.
Every function knows how many words are needed for its own local storage. This means that each function knows where it is safe to start storing temporaries and/or arguments for function calls. The callee is able to access this same information via the call/return linkage section, and it is this information that enables the callee to adjust the stack pointer the right number of words to point to the new stack frame being created.
This is a two-word area immediately below the stack pointer of the currently executing function. (This makes it the rock-bottom of the current stack frame.) It is not used until the current function does a function call. At this point, the callee places the call/return linkage information it needs to return to the caller into the caller's linkage section. This means that the call/return linkage for the callee is actually stored in the stack frame of the calling function, i.e. in the linkage section of the caller. Thus, the linkage section of any function's stack frame is only used by the callees of that function (possibly none).
When used normally, the linkage section contains the return address to the caller saved in the upper half of both linkage words. One of these words will be used by .RETRN to do the function return; the other is saved for a fixed reference to the first.
When a function trap is set, the return address in the lower linkage word is altered to point to the trap handler, which will ultimately call the desired trap routine. The trap is not actually executed until the function does its return transfer. Since this return is always done through the linkage, the return actually transfers into the trap handler, where additional trap processing takes place in preparation for the trap function call (see Appendix C: Function Traps and Gotos).
The return address in the upper linkage word is never altered. It is used by the BOFF and COFF debuggers (to obtain function call tracebacks), and by the B library function NARGS. Since it is possible to compare the two linkage words, one can tell if a trap is set for the associated function. If the two return addresses differ, the address in the lower linkage word actually points to the trap handler. If the addresses are the same, no trap is currently set.
Since it is the responsibility of the callee to store any return linkage that it needs, and also to bump the stack, it is possible for the callee to forego these responsiblities and use the caller's linkage for other things.
One of the things it can do is to use the caller's linkage as two words of local storage, either for arguments, auto variables, or one of each. This is done in GMAP functions that have such small local storage requirements that the overhead of a full stack-bumping function call is not needed. Only two words of local storage are available by this method and, if used, regular call and entry procedures cannot be used.
To summarize, here are the bottom three words of the stack frame, showing the two linkage words. (The pointer to the "framsi" word will be discussed later.)
Word Contents 0 saved stack pointer of caller -1 upper: return address (never changed) lower: pointer to "framsi" word in callee -2 upper: return address, or trap handler address lower: trap block address, if trap set
The next sections give more detail about the steps of function calling, entry, and return. After each list of actions is a "Notes" section giving more detail about the actions.
tsx1 func zero stbump,nargs
"stbump" is the stack-bump parameter. It is the size of the local storage requirements of the caller (arguments plus auto variables), as defined up to the point where the function call takes place. The callee will arrange to move the stack pointer up at least this many words to protect the stack frame of the caller. To ensure that double-word indirection works properly, the value of "stbump" is always kept even. The stack pointer itself will always be odd.
As will be noted later, there is enough extra space allowed in the "stbump" parameter to permit both the caller's stack pointer and the callee's call/return linkage to be stored safely below the stack pointer after it is advanced. This will put these items above the caller's local storage, and below the newly-constructed function argument list.
"nargs" is the number of words of arguments used in this function call. This may not be the same as the number of arguments if arguments take more than one word each (as may happen in C or Pascal). The B library routine NARGS returns exactly this value.
If you define auto variables in the middle of a function, the "stbump" parameter will be larger for function calls compiled after the auto definition, since there will be more local storage defined by the auto definition. This can lead to unhappy results if the function calls occurring before the misplaced auto definition overwrite an area of the stack not reserved until after the auto definition. (The first function calls will only bump the stack sufficiently to cover the auto variables defined up to that point.) This problem only exists if there is some backward transfer of control occurring after the misplaced auto definition that would cause the functions preceding the auto definition to overwrite the unprotected area of the stack.
func tra func+1 tsx0 .entry zero framsi,dbgptrThis will save the call/return linkage into the caller's linkage area, and then adjust the stack pointer to point to the new stack frame (see the description of .ENTRY below).
There are alternate function entry subroutines to .ENTRY; these are outlined in a following section and described fully in an appendix.
tsx0 .retrnThis will extract the return address from the caller's linkage and transfer through it (see the description of .RETRN below).
B functions always start with an instruction that transfers to the actual body of the function. This means that the value of a variable which is a function name, is this transfer instruction. This allows the value of a function name to be copied, and the copy used in the same way as the function. A function call using the copied name transfers to the word containing the copy, which in turn transfers to the actual function body.
funny; main(){ extrn funny, putstr; funny = putstr; /* the value of funny is now "tra putstr+1" */ funny("hi*n"); /* tsx1 funny transfers to tra putstr+1 */ }
"framsi" is the stack frame size parameter. It is the absolute high-water-mark size of the stack frame needed by this function, and should be determined at the time the function is compiled or assembled. It is the sum of the space used by function arguments, by auto variables and by temporaries, plus a few extra words to allow for the linkage section of its own callee(s). In the event of an asynchronous function call, as might occur during an interrupt, break, or fault, this value can be used by the asynchronous function to establish a safe place to construct its own stack frame, above the stack frame of the function in which the fault occurred.
"dbgptr" is a pointer to local debug tables (tables specific to the function), if they exist. Local debug tables describe auto variables and provide more detail on the line numbers of statements. If the "-Tables" option was used when compiling the program, these tables are not generated and "dbgptr" is either zero or points to the end of the function. (This applies only to B programs. C and Pascal programs have different approaches to debugging.)
At this point, the stack pointer still points to the caller's stack frame,
and all stack references here are within it.
The lower half of the upper linkage word points to the
zero framsi,dbgptrword of the callee. An asynchronous function can look in the linkage for the pointer to "framsi" and use it to establish the absolute top end of the callee's stack frame (see "In the callee").
This pointer is also the only backward link between this stack frame and the address of the associated function, since it points back to the address of the "framsi" word in the callee. Debugging programs use this feature to locate which function is associated with each stack frame on the stack.
There are several alternate subroutines that enter a function:
In addition, for functions which must not be interrupted by the BREAK key, there is protection which may be obtained through the use of the INHIB keyword in the macro call for .ENTRY, .VNTRY, or .ZNTRY. (See 1.3 Run-time Environment GMAP Macros: BMAC.)
B has exactly two data structures: "word" and "vector of words". These may be allocated in any of three ways: as externals, as auto variables, or dynamically from the pool of unallocated memory.
When the program is loaded, the memory is set aside and (optionally) initialized. This form of storage generates entries in the loader symbol and debug tables, since these are external symbols. These locations may be addressed by name in the BOFF debugger.
Located on the stack as part of the current stack frame, auto variables should be initialized locally by the function that uses them. (Otherwise, you get whatever value just happened to be on the stack where the variable is allocated.) Auto variables and their values "vanish" when the function returns and the stack frame disappears. They must be re-initialized on entry into the function again.
The upper bound on stack space is determined at compile time; if too much auto storage is used, the stack may "overflow" and data above the stack may get overwritten. If the stack monitoring routines have been loaded with your program, the message "Stack Overflow" will be printed before your program aborts. If the monitoring routines are not present, you don't find out about the problem until some critical piece of memory is overwritten.
The names of auto variables are defined in the local debug tables, which are not kept if the "-Tables" option of the B command has been used. Since they are offsets from the current value of the stack pointer (sp=ar7), they have no external (loader symbol table) names.
Vectors of memory may be allocated dynamically, from a pool of unallocated memory in your program. B supplies the run-time functions GETVEC and RLSEVEC to allocate and free vectors of words in this pool of unallocated memory. Dynamic memory is the only convenient way to allocate a two-dimensional matrix (see "expl b lib getmatrix"). For more detail on dynamic memory allocation and management, see Appendix B: Run-Time Memory.
In choosing how to allocate words or vectors, the following considerations should be kept in mind.
nn7777 exact "K" boundary, top of memory hole .. .. pool of unallocated memory .. <nnnn> .. .. Main stack (allocated dynamically .. from unallocated memory) .. <nnnn> .. .. run-time library functions .. <nnnn> .. .. user program functions .. 000144 (000100 in batch) .. .. slave prefix (program preface) .. fault vectors, program size, etc... .. 000000 zero, bottom of memory; start of slave prefix
If the command line parsing routine .BSET releases itself back into unallocated memory after parsing the command line, it will leave a large pool of unallocated memory somewhere in the section labeled "run-time library functions".
The I/O routines in the B library are designed in accordance with the strong "sequential character stream" orientation of I/O in the language. Other design goals were speed and size. To this end, almost all routines are written in assembler; the B compiler, unhappily, makes no serious attempt at code optimization. Limited facilities are provided for subverting the normal flow of a character stream (e.g. backspacing, rewinding, record and binary I/O), but these are not allowed to compromise the efficiency of the sequential character stream.
C and Pascal I/O routines are built on top of the B I/O structures. A typical C I/O routine will just set up some arguments and then call a B routine to do the actual work. Because of this, C and Pascal I/O behave in the same way as B I/O.
Underlying the whole I/O structure is a vector IO.UNITS which has range [-5:49]. Each element of this vector defines an I/O unit (the unit number being the offset into the vector) by containing either a pointer to an I/O control vector (henceforth abbreviated IOV), or 0 for an undefined unit.
The control vectors for the "system" and the teletype in TSS and the console in batch are initialized at load time, with only the line buffers being allocated dynamically in TSS (by .SETU.). Set-up of all other IOV's (files and sysout) is accomplished via OPEN which checks its arguments then allocates a buffer for the IOV and block image.
If the unit is being opened for reading, OPEN then calls .READ which maintains two variables: RD.UN., the current read unit, and RD.IOV, the current read IOV. .READ also maintains Address Register 4 (rd=ar4) pointing the same place as RD.IOV.
For units opened for writing, .WRITE is like .READ but maintains WR.UN., WR.IOV and Address Register 5 (wr=ar5).
For details on the structure of an IOV and more detail on media handling and character stream input and output, see "expl b ioplm".
There are several oddities related to reading and writing on terminals that may be relevant to programmers who are interested in the internal workings of their programs. This information is particularly important if you want to understand the sleight of hand that is associated when dealing with the new-line character. Therefore, we will begin with a discussion of new-lines.
When we speak precisely about the new-line character, we are referring to the ASCII linefeed character (octal 012). In B, this is written as "*n". In C and Pascal, it is written "\n", but we will use the B version in this manual.
The linefeed character is used to separate lines of text in text files. However, to separate lines on the terminal, you usually need two characters: a carriage return (to get the cursor to the beginning of the line) and then a linefeed. The carriage return is written as "*r" in B, and "\r" in C and Pascal. Because "*r*n" is used to separate terminal lines, this sequence is also called a new-line when speaking loosely. In the sections that follow, however, "new-line" will only refer to the ASCII linefeed.
Output written to the terminal is always buffered. The library functions only write out the buffer when it becomes necessary. Writing the buffer becomes necessary under any of the following conditions:
As output is written into the buffer, "*n" characters are converted to "*r*n" sequences. This ensures that when the buffer is finally written to the terminal, lines will be printed properly.
The routine that flushes the buffer to the terminal checks to see if the last character in the buffer is "*n". If this is the case, the "*n" is not written to the terminal immediately.
The reason for this is that it is sometimes not appropriate to write a linefeed at the end of a line of output. For example, consider what happens if the program terminates after the line of output is written. When a command terminates in build mode, TSS automatically outputs a linefeed before issuing the "*" prompt for another command. If the linefeed on the last line of output was also printed out, the result would be a blank line between the last line of output and the "*" prompt.
To avoid this problem, the library functions take special action if the last character written from a terminal output buffer is a "*n". The library outputs a carriage return "*r" and then sets a flag that indicates there is a linefeed character waiting for output on the terminal. Normally, the next I/O operation on the terminal will print out that linefeed and then carry on to read or write (as appropriate). However, if the program terminates (or if certain other special conditions occur), the pending linefeed will not be written out.
With normal CRTs, a program only receives a line of terminal input when the user presses ENTER (because of the way GCOS-8 does front end processing). Thus, all input is terminated by a "*r" character.
When a library routine reads the "*r" at the end of the input line, it sets the linefeed pending flag. As in the case with terminal output, this flag means that the library will usually output a linefeed to go to a new line on the terminal. This linefeed can be stopped if the program explicitly suppresses it using the NO.NL function (see "expl b lib no.nl").
Special things can happen if the terminal input is a null line (i.e.,
only a "*r").
In this case, the library sets a conditional pending linefeed flag.
This flag indicates that a linefeed should be printed out by the next
terminal I/O operation, unless all of the following are true.
Users can simulate an end of file condition on terminal output by entering a line that consists only of a File Separator character (CTRL-\) followed immediately by a carriage return. In CTRL-\ appears in any other way (e.g. in the middle of a longer line), the end of file condition is not induced. Note that this way of signaling end of file means that users cannot enter a line that consists only of CTRL-\, since the library will return an EOF indicator instead of the actual CTRL-\ value.
TSS treats VIP terminals unlike normal CRTs. This can be a nuisance to programs that expect to dealing with normal terminals, so the library attempts to make VIPs seem like normal terminals as much as possible.
Whenever a VIP opens for input, TSS automatically sends a linefeed character. Thus, the run-time library never sends out pending linefeed characters when opening the terminal for input.
VIP terminals submit input using the TRANSMIT key instead of a carriage return. To split the screen image into lines in the usual way, the library always puts a "*r" in the terminal output buffer after input is received.
VIPs can transmit multiple lines in one input operation. Each of these lines will be separated with "*r*n". The library input routines automatically convert all embedded "*r*n" sequences into normal "*n" delimiters.
If the user program wants to suppress all of the special handling that the library does with new-lines, it can execute the TTY.IO function on the terminal output and/or input units. This puts the terminal in "raw" mode.
If TTY.IO is applied to the terminal output unit, output to the terminal will not be processed in any way. For example, "*n" characters are not changed to "*r*n". In addition, tab characters are not expanded and the library does not keep track of what output column it is in.
If TTY.IO is applied to the terminal input unit, input is passed on to the program "raw". Carriage returns are not converted to "*n". CTRL-\ never induces the end of file condition. For VIPs, "*r*n" input sequences are not compressed to "*n", missing carriage returns are not echoed, and so on.
x1 Holds return address across call x0,x2, User scratch not preserved across call x3,x4 x5 Save before use, restore before return x6=cp Pointer to the current co-routine header x7 Holds Pascal display pointer across call
ar0,ar1, User scratch not preserved across call ar2,ar3 ar4=rd Pointer to current read IO vector Use, but change only by call to .READ or ..RD ar5=wr Pointer to current write IO vector Use, but change only by call to .WRITE or ..WR ar6 Reserved for co-routine package ar7=sp Stack pointer changed by entry/exit code A A Reg: first function argument Q Q Reg: second argument; function return value
Register x1 is used to hold the return address. If the routine uses the standard entry sequence, this is saved on the stack and x1 may be used as scratch in the program.
In Pascal environments, x7 may contain a display pointer. If so, the display must be copied to a safe place before x7 is used as scratch.
Note that the standard function entry sequence destroys x0, x1, and x2. On function entry, the AQ contains the first two words of the argument vector. On function return, the Q Register contains the function's return value, if any, as supplied by "return(value)". (In C and Pascal, the E, A, and Q registers may contain this return value.)
A function which accepts a variable number of arguments may be written in one of two ways.
Some maximum number of arguments to the function may be decided upon, and the function can assume that this maximum will not be exceeded. In this way, the maximum size of the function's stack frame (which is partially determined by the number of arguments the function is designed to accept) can be determined at compile (or assemble) time. Unused arguments are just so much unused space on the stack.
In execution, the function can call NARGS to determine exactly the number of arguments supplied by the caller. The remaining arguments, those which are not supplied, can then be initialized to default values by your program.
Since the number of arguments to this style of function is really fixed, even if not all are supplied, the standard .ENTRY function sequence may be used for this type of variable-number-of-arguments function.
The second case occurs when there can be no real specification of a maximum number of arguments, e.g. a call to CONCAT or PRINTF. One could conceivably define each function to accept 1000 arguments and only process those that were supplied, but this would waste a lot of stack space. The problem here is that the maximum size of the callee's stack frame cannot be determined at compile time, if one has to take into account the maximum number of arguments which might be supplied to the callee and which must be stored in the callee's stack frame.
The way around this is to keep the list of arguments out of the callee's stack frame, and to make the function accept and use a pointer to the list of arguments instead. When building the argument list, the arguments can be stored, as usual, in the "temporaries" section of the caller's stack frame. (Note that the caller knows how many arguments there are in the function call, and has made sure it has allocated enough "temporary" space for itself.) In the normal stack-bump procedure, this "temporaries" area becomes the beginning of the new callee's stack frame, so we need a special function entry subroutine to "double-bump" the stack frame of the callee past both the local storage of the caller (as usual) and also past the segment of the "temporaries" area where the argument list is stored. This function is .VNTRY.
(This would be a function written in GMAP to process a variable number
of arguments.)
tsx0 .vntry
zero framsi,dbgptr
The called function will have been written to accept and use this pointer to its argument list, rather than using the conventional arguments-in-callee-stack-frame approach. In this way, the arguments do not reside in the callee's stack frame and the size of the callee's stack frame can be determined at compile (assemble) time, without worrying about how many arguments will actually be supplied to the function when it is called.
Both GETVEC and RLSEVEC make use of the caller's linkage area to store local variables. Neither function needs to call .ENTRY or bump the stack, making both functions very efficient.
B supplies the run-time functions GETVEC and RLSEVEC to allocate and free vectors of words in the pool of run-time memory above the end of your program.
If you use GETVEC to request more memory than can be satisfied in this pool of run-time memory, B attempts to grow your program by obtaining additional memory from the operating system. The memory is always obtained in segments of 1K (1024 words) or more. This additional memory increases the memory-image size of your executing program correspondingly. There is an upper limit on the size of a TSS program imposed by TSS; this limit will be different at different sites.
If you use RLSEVEC to free one or more entire 1K segments of memory at the top of your program's memory-image, you may call RELMEM to return this memory to the operating system and reduce the size of your memory-image again. RLSEVEC does not call this function itself, no matter how much of the run-time memory pool you de-allocate. RELMEM must be called explicitly, and you must ensure that nothing in the segment(s) you are releasing will ever be accessed from your program again. Attempts to access this released memory will result in a memory fault.
The pool of unallocated memory above the last word of your program is managed as a linked list of unallocated blocks of words. Initially, during set-up when your program begins execution, this entire pool is made unallocated as one block by calling RLSEVEC. RLSEVEC initializes the external word VEC.CH to contain:
As memory is alternately allocated and de-allocated, the first word of each new block of unallocated memory is used in a similar manner to VEC.CH, as a pointer to the next unallocated block. The pointers all have the same format as VEC.CH.
Note that the size field in the pointer word in a block of unallocated memory does not refer to the size of the block it is imbedded in, but to the size of the block pointed to in the upper half of the word, i.e. the next unallocated block. In other words, the size of any given unallocated block of memory is contained in the pointer word of the previous block.
This chaining of pointers creates a forward-linked list of blocks of unallocated memory with an originating pointer in VEC.CH. RLSEVEC sees to it that when contiguous blocks of words are freed, they are merged into one larger block. It also ensures that newly freed blocks are inserted into the chain in such a way as to keep the linking pointers in ascending order, from low to high addresses. The last block of unallocated memory, which cannot point to anything, has its pointer set to 0777777777777.
GETVEC allocates a vector in run-time memory by scanning down the linked list of unallocated blocks, and looking for the first block of a size large enough to satisfy its request. This scan is done by one RPL assembler instruction, and is very fast. The requested size of vector is "broken off" the block of unallocated memory and the link pointer for the block is adjusted to reflect the allocation. GETVEC returns a pointer to the allocated vector.
If GETVEC fails to find a block of a size that will satisfy its request, the scan goes right to the pointer in the last block of memory with the value 0777777777777. This last pointer is actually interpreted by the RPL instruction as a 256K segment of memory (0777777) starting at location 256K (0777777). Even if the RPL scan does not find a large enough block of unallocated memory in the body of the linked list, the scan will stop here on seeing the 256K segment. This special pointer is an indication that memory will have to be obtained from the operating system to grant the GETVEC request. Failure to obtain this memory results in a "0K" abort in batch, and the message "not enough core to run job" in TSS.
Since the linked list of blocks of unallocated memory is maintained solely by the single chain of pointers and sizes running through the first word of every unallocated block, any program which tampers with or overwrites any of this chain, whether intentionally or by "walking off the end" of an allocated block of memory, is almost certainly doomed to failure the next time GETVEC attempts to scan this chain.
A favorite way of effecting this manner of program suicide is walking off the end of the main stack, which is itself one of the first vectors dynamically allocated by set-up. In a similar manner, you might write data past the end of a vector, onto the first word of some block of unallocated memory (e.g. a=getvec(10); a[11]="overwrite!";). More difficult to detect perhaps is the use of a pointer to a vector after you have freed the vector, as in
a = getvec(10); rlsevec(a,10); a[0] = "overwrite!";In this case, even staying within the bounds of the original vector leads to trouble when you overwrite the first word of the free block and destroy the link pointer put there by RLSEVEC.
When the link-following scan used by GETVEC (a GMAP RPL instruction) runs across the overwritten pointer, it may either cause an immediate memory fault or, possibly worse, it may allocate the vector at the random address formed by the upper half of the overwritten word. In the BOFF debugger, you can follow the chain of pointers starting at the external VEC.CH to determine the integrity of your run-time memory pool.
Normally, every function either returns to its caller or calls EXIT to terminate the program. Traps let you temporarily save the current return address of a function and replace it with a new one. When the trapped function does its return transfer, the new return address sends control into a "trap handler" that can perform user routines. In the B run-time library, traps are used to arrange that some special routine be performed at the future ending of a particular function invocation. This may best be illustrated by an example.
The B library function ALLOCATE sets up a temporary dynamic vector that must be released when the function that called ALLOCATE returns. The releasing of this vector is assured by the trap which ALLOCATE sets in its caller. When the trapped function does its return transfer, the trap handler gets control and makes sure the dynamic vector is released. It does not matter where the function return takes place, at the end of the function or at any point before; the trap is still invoked by the return process itself.
Traps apply to function calls because they are associated with the call/return linkage of a specific stack frame. The current invocation of a function might have a trap set in its return linkage, but a previous invocation might not. More precisely, the trap is set by replacing the lower of the two call/return linkage words currently being used by a function. The original word containing the function's return address is saved; the replacement word contains a new return address that points to the trap handler. Once control passes to the trap handler, the handler removes the trap by restoring the saved bottom linkage word back into the linkage.
Traps are temporary. The first thing the trap handler does on gaining control when a function returns is to delete the trap by restoring the bottom linkage word (containing the pre-trap return address of the function) back into the function's linkage. It then calls a user-specified function which may call other functions, free allocated memory, or print debug information before returning to the trap handler. The trap handler then does a normal return transfer using the return address in the restored bottom linkage word. If this was the only trap set on this function, this return transfer is to the function's caller, as usual.
The bottom linkage word saved and replaced at the time a trap is set may even be one which has already been saved and replaced by a previously set trap. In this case, the return transfer at the completion of the current trap will use the bottom linkage word which was replaced by the previously set trap. This sends control through the trap handler again into this previous trap, instead of into the function's caller. Similarly, a virtually endless series of traps can be chained together for the same function invocation. In effect, the traps are "stacked" and will trigger in reverse order to the order they were set, i.e. the last trap set is the first to be invoked. The first trap set is the one that saved the function's real return address from the linkage. The return transfer out of this first trap returns to the function's caller.
A trap is set using the internal library routine:
trap_blk = .trap(user_function, trap_blk_size, stack_frame);(You are cautioned not to use this routine yourself.)
"trap_blk" is a pointer to a dynamically allocated block of memory, the size of which is requested by trap_blk_size. This trap_blk is associated uniquely with this trap. "user_function" is the name of the user-supplied function to be called once the trap handler gains control. "trap_blk_size" is the size of the trap_blk. "stack_frame" is the stack pointer address of the stack frame to set the trap in.
When .TRAP is called, the trap_blk is dynamically allocated and initialized as follows:
Word Comment <nn> - <nn> is trap_blk_size . . - user requested memory space . 0 - this address returned to the user by .TRAP -1 upper: pointer to stack frame where trap set lower: trap_blk_size -2 upper: address of user_function lower: backward pointer to 3 words above the location where the trap is set -3 copy of the function's lower linkage word
Note that the actual size of the trap_blk is three words greater than the given trap_blk_size, because three words of internal information are needed. Also remember that references to the call/return linkage of any function actually reference the pair of words stored in the caller's stack frame. The trap_blk now contains all the information needed by the trap handler.
The trap is set by replacing the bottom linkage word in the specified stack_frame by a word pointing to the trap handler routine (upper half) and to the local trap_blk (lower half). When the return transfer out of the trapped function takes place, the bottom linkage word is accessed by .RETRN. The transfer takes place through the return address in this word; therefore it goes to the trap handler instead of whatever return address was in the original linkage word (normally, the return address of the caller).
This is the sequence of steps taken when control passes into the trap handler:
If the bottom linkage word saved in the current trap_blk
at the time the trap was set was itself saved and replaced
by a trap set earlier, then the return transfer out of the trap
handler is an effective "sideways bounce" into the previous trap:
This does not become an infinite loop because each trap has its own unique trap_blk allocated for it. Each invocation of the trap handler restores the bottom linkage word from its own trap_blk, processes its own user_function, and releases its trap_blk. Ultimately, the linkage word which is restored has to be the original return address, which comes from the first trap_blk allocated by the first trap set. The return address in this linkage word is not to the trap handler, and there is no associated trap_blk. This last time, .RETRN will transfer back into the caller.
Non-local gotos are transfers from an arbitrary location in one function to an arbitrary location in another. In practice, this means from one existing function invocation to another existing function invocation. The process presents two special problems for the run-time environment:
It is necessary to trigger the traps, since the traps were intended to execute when their respective functions ended. A non-local goto effectively ends several functions all at once, and it would not do to have the tidying-up features of the traps bypassed. As an example: ALLOCATE uses the trap feature to free the dynamic vectors it has allocated. If the goto bypassed these de-allocation traps, useful memory would be lost.
Since non-local gotos depend on being able to find an existing stack frame as a destination, it is only possible to go to function invocations still in existence, i.e. still having stack frames on the stack. These function invocations are all represented by stack frames below the one for the function originating the goto. Since each stack frame is centred on a word containing the saved stack pointer of the previous stack frame, it is easy to "walk backwards" down this chain of stack pointers, looking for a particular stack frame. When a match is found, the goto succeeds. If no match is found, disaster ensues when the walk finds the zero fence which marks the rock bottom of the chain of stack pointers. An error message is generated and the program aborts.
During the backward walk down the stack, traps must be identified and invoked. The fact that the return address of every function has been saved twice into both words in the call/return linkage for that function, lets us tell if one return address has been altered to point to the trap handler. If both return addresses are the same, no trap exists. If they differ, the bottom linkage word has been replaced by a trap and the trap can be invoked.
Referring back to the notes on the trap handler, remember that when the trap handler called the user_function it supplied a pointer to the trap_blk in the A Register, and the value "1" in the Q Register. The "1" indicated that the trap was being sprung normally, by a function returning. When a non-local goto is performed, the trap handler is not used to invoke the trap. Instead, the goto mechanism itself invokes the trap, and calls the user_function with a "0" in the Q Register. This indicates to the user_function that the stack is merely being "unwound", and that this is not a normal trap invocation. The user_function may switch on this value, to perform slightly different processing than during a regular trap sequence.
The B library functions SETEXIT and RESET perform non-local gotos by using the internal library routine:
.goto(stack_frame, IC [, Q_Reg ]);(You are cautioned not to use this routine yourself.)
"stack_frame" is the stack pointer address of the stack frame to transfer into. "IC" is the address at which to restart execution. The optional "Q_Reg" is a value to place in the Q Register before restarting execution.
.GOTO performs the stack unwinding housekeeping described above, invoking traps along the way until an exact match to the stack frame address is found. The "Q_Reg" value is loaded, and execution continues at location "IC".
Debug tables provide a method of keeping around the names you have given to your variables and memory locations, so that you can recognize and verify them using an appropriate debugging program (e.g. BOFF). There are three types of debugging tables generated in the course of compiling and loading a B program:
You may wonder how the loader debug tables can be of any use in debugging an aborted program if the tables are released by B set-up as soon as the program begins execution. The answer lies in the BOFF debugger's ability to use the tables from your original load module to examine and debug the program in the abrt file. See "expl boff" for details.
When released at B set-up, the address of the loader symbol table is:
ltable = *(&.setu.-4) & 0777777;Its format consists of a series of two-word entries containing the BCD symbol name and the address of the symbol:
Word Contents <nn> Fence value -1 . . . 4 (etc.) 3 upper: address of symbol lower: not used 2 BCD symbol name 1 upper: address of symbol lower: not used 0 BCD symbol name -1 upper: size (words) of loader debug table lower: not used
The local tables are pointed to by the "dbgptr" half of the word following the standard "tsx0 .entry" instruction in a function:
func tra func+1 tsx0 .entry zero framsi,dbgptrA function's local debug tables come after the actual code of the function (in higher addresses). There are two types of local table: the line number table, and the local symbol table. They follow each other:
Word Contents <nn> Fence value -1 . . <local symbol table> . . Fence value -1 . . <line number debug table> . 0 upper: backpointer to word containing dbgptr lower: line no. of func definition
In some cases, the part before the first fence (i.e. the line number table and the back pointer word) is missing. This is the case for scopes that are not associated with functions. For example, in C, each source file is a separate scope (file scope) independent of the functions inside the source file. Static objects declared outside of all functions are global in the sense that they can be referenced by any function in the file itself, but are local in the sense that they cannot be referenced by functions outside the file. Such a scope would have a local symbol table and two fences, but not a line number table.
Each entry in the line number table is a single word containing a line number in the bottom 13 bits, the type of expression identified at this line number in the next 5 bits, and the address which corresponds to the start of this expression in the upper half word. In the rare case where the line number is too large to fit in 13 bits (over 8192 lines), the line number is placed in the full word following this entry and the bottom 13 bits of the first word (which would normally contain the line number) are zeroed.
The types of expressions chosen to be recognized and listed in the debug table are: expr, break, goto, next, return(<value>), if, switch, while, repeat, else, for, do, and return. Pascal has a wider range of statement types than B; the table for Pascal is correspondingly richer in identified expressions. The table for C is about the same as the table for B, since the languages are very similar.
Each entry in the symbol table
consists of a two-word ASCII name followed by one word containing:
1 -- auto variable in stack
2 -- B label
3 -- C static object
4 -- static function
5 -- "static parent"
(A static parent entry is used when one scope is enclosed in another scope.
For example, one Pascal subprogram may be declared inside another; the
enclosing scope is the static parent of the enclosed scope.
As another example, the C file scope is the static parent of all function
scopes inside the source file.)
The ordering of the local symbol table appears random; it is actually a copy of the hashing order chosen by the compiler.