Thinkage GCOS8 SS C Reference Manual

Copyright © 1989, 1990, 1992, 1994, 1995, 1996 by Thinkage Ltd.

1. Introduction
2. Constants
    2.1  Decimal Integers
    2.2  Octal Integers
    2.3  Hexadecimal Integers
    2.4  Long Integers
    2.5  Unsigned Integers
    2.6  Floating Point Constants
    2.7  ASCII Character Constants
    2.8  Escape Sequences
    2.9  String Constants
    2.10  BCD Character Constants
3. Data Objects
    3.1  Identifiers
        3.1.1  Keywords
    3.2  Declarations
    3.3  Fundamental Data Types
        3.3.1  Character Data
        3.3.2  Integers
        3.3.3  Unsigned Objects
        3.3.4  Floating Point Numbers
    3.4  Derived Data Types
        3.4.1  Pointers
        3.4.2  Arrays
        3.4.3  Strings
        3.4.4  Structures
        3.4.5  Unions
        3.4.6  Functions
        3.4.7  Enumerated Data
    3.5  The void Type
    3.6  Type Groups
    3.7  Data Type Conversions
        3.7.1  Standard Arithmetic Conversions
4. Expressions
    4.1  Order of Evaluation
    4.2  Lvalues and Rvalues
    4.3  Primary Expressions
        4.3.1  Identifier Primary Expressions
        4.3.2  Constant and String Primary Expressions
        4.3.3  Parenthesized Expressions
        4.3.4  Subscripted Primary Expressions
        4.3.5  Function Primary Expressions
        4.3.6  The . Operator
        4.3.7  The -> Operator
    4.4  Unary Operators
        4.4.1  The Indirection Operator
        4.4.2  The Address Operator (&)
        4.4.3  The Unary Plus Operator (+)
        4.4.4  The Unary Minus Operator (-)
        4.4.5  The Logical Negation Operator (!)
        4.4.6  The Bitwise Complement Operator (~)
        4.4.7  The Auto-Increment Operator (++)
        4.4.8  The Auto-Decrement Operator (--)
        4.4.9  The Cast Operation
        4.4.10  The __alignof Operator
        4.4.11  The sizeof Operator
    4.5  Multiplicative Operators
        4.5.1  The Multiplication Operator (*)
        4.5.2  The Division Operator (/)
        4.5.3  The Remainder Operator (%)
    4.6  Additive Operators
        4.6.1  The Addition Operator (+)
        4.6.2  The Subtraction Operator (-)
    4.7  Shift Operators
        4.7.1  The Left Shift Operator (<<)
        4.7.2  The Right Shift Operator (>>)
    4.8  Relational Operators
    4.9  Equality Operators
    4.10  The Bitwise AND Operator (&)
    4.11  The Bitwise Exclusive OR Operator (^)
    4.12  The Bitwise Inclusive OR Operator (|)
    4.13  The Logical AND Operator (&&)
    4.14  The Logical OR Operator (||)
    4.15  The Conditional Operation (?:)
    4.16  Assignment Operators
        4.16.1  Simple Assignment
        4.16.2  Compound Assignments
    4.17  The Comma Operator (,)
    4.18  Constant Expressions
5. Statements
    5.1  Comments
    5.2  Expression Statements
    5.3  Compound Statements or Blocks
    5.4  The if Statement
    5.5  The while Statement
    5.6  The do-while Statement
    5.7  The for Statement
    5.8  The break Statement
    5.9  The continue Statement
    5.10  The switch Statement
    5.11  The return Statement
    5.12  Labels
    5.13  The goto Statement
    5.14  The Null Statement
6. Declarations
    6.1  Use of Identifiers
        6.1.1  Scope of Identifiers
        6.1.2  Linkage of Identifiers
        6.1.3  Name Space of Identifiers
        6.1.4  Storage Duration of Objects
    6.2  The Format of Declarations
    6.3  Storage Class Specifiers
        6.3.1  Automatic Declarations
        6.3.2  Register Declarations
        6.3.3  Static Declarations
        6.3.4  Extern Declarations
        6.3.5  Typedef Declarations
    6.4  Type Specifiers
        6.4.1  The const Specifier
        6.4.2  The volatile Specifier
        6.4.3  Structure Specifiers
        6.4.4  Union Specifiers
        6.4.5  Enumeration Specifiers
    6.5  Declarators
        6.5.1  Pointer Declarators
        6.5.2  Array Declarators
        6.5.3  Function Declarators
        6.5.4  Reading Declarations
    6.6  Initializers
        6.6.1  Static Duration Objects
        6.6.2  Automatic Duration Objects
        6.6.3  Initializing Scalar Objects
        6.6.4  Initializing Array Objects
        6.6.5  Initializing Character Arrays
        6.6.6  Initializing Structure Objects
        6.6.7  Initializing Union Objects
    6.7  Type Names
7. Program Structure
    7.1  Function Definition
        7.1.1  Variable Argument Lists
    7.2  Argument Passing
    7.3  Effect of Prototypes on Function Calls
        7.3.1  Argument Conversion Rules
        7.3.2  Passing Derived Types
    7.4  Return Values
    7.5  Flow of Control
        7.5.1  The main Function
    7.6  Program Parameters
    7.7  Program Status
8. Source Code Preprocessing
    8.1  Preprocessor Symbols
    8.2  Preprocessor Directives
        8.2.1  The #define Directive
        8.2.2  The #undef Directive
        8.2.3  The #if Directive
        8.2.4  The defined Expression
        8.2.5  The #ifdef and #ifndef Directives
        8.2.6  The #include Directive
        8.2.7  The #line Directive
        8.2.8  The #error Directive
    8.3  Implementation Specific Directives
        8.3.1  The #warn Directive
        8.3.2  The #equate Directive
        8.3.3  The #alias Directive
        8.3.4  The #eval Directive
        8.3.5  The #secondary Directive
        8.3.6  The #aligned Directive
        8.3.7  The #noreturn Directive
        8.3.8  The #optresult Directive
        8.3.9  The #used Directive
        8.3.10  The #varargs Directive
        8.3.11  The #argsused Directive
        8.3.12  The #notreached Directive
        8.3.13  The #copyright Directive
        8.3.14  The #title Directive
        8.3.15  The #version Directive
        8.3.16  The #idempotent Directive
        8.3.17  Special Directives in Standard Headers
        8.3.18  The #pragma Directive
    8.4  Trigraphs
    8.5  Input Concatenation
    8.6  Translation Phases
9. The C Library
    9.1  Library Concepts
        9.1.1  Headers
        9.1.2  Functions and Macros
        9.1.3  Standard Headers
        9.1.4  Error Return Values
        9.1.5  The Errno Symbol
        9.1.6  Error Names
    9.2  I/O Concepts
        9.2.1  Standard Streams
        9.2.2  I/O Modes
        9.2.3  Buffering
        9.2.4  Standard I/O Routines
        9.2.5  Opening Files
        9.2.6  Closing Files
        9.2.7  The getc Function
        9.2.8  The putc Function
        9.2.9  Reading and Writing Strings
        9.2.10  Formatted Output: printf
        9.2.11  Formatted Input: fscanf
        9.2.12  Redirecting Standard Streams
    9.3  String Manipulation Functions
    9.4  Memory Allocation Functions
    9.5  Using Variable Argument Lists
    9.6  Signals
    9.7  Miscellaneous Routines
Appendix A: Escape Sequences
Appendix B: Characteristics Files
    B.1  Limits
    B.2  Floating Point Characteristics
Appendix C: Library Names
Appendix D: Converting Old Programs
    D.1  The Library
        D.1.1  Time Routines
        D.1.2  The printf Family
        D.1.3  B Routines
Appendix E: Extensions
    E.1  Bit Fields
    E.2  The __typeof Operator
    E.3  Improved Constant Expressions
    E.4  Address of Constant Expression
    E.5  Reference Types
    E.6  Macros with Variable Argument Lists
Appendix F: Near, Far, and Huge Objects
Appendix G: Implementation-Defined Behavior
    G.1  Translation
    G.2  Environment
    G.3  Identifiers
    G.4  Characters
    G.5  Integers
    G.6  Floating Point
    G.7  Arrays and Pointers
    G.8  Registers
    G.9  Structures, Unions, and Enums
    G.10  Qualifiers
    G.11  Declarators
    G.12  Statements
    G.13  Preprocessing Directives
    G.14  Library Functions

1. Introduction

The C programming language was developed by B. W. Kernighan and D. M. Ritchie at Bell Telephone Laboratories. The version of C described in this manual was created and is maintained by Thinkage Ltd.

This version of C is based on the ANSI standard for C, ANSI document X3.159-1989. Features that are non-standard (i.e., extensions) are noted as such.

2. Constants

C lets you define several kinds of numeric constants as well as ASCII character and string constants.

Each kind of constant has its own data type. Data types are explained in more detail in Chapter 3. The rest of this chapter describes the kinds of constants available in C, and how to represent them in C source code.

2.1 Decimal Integers

Decimal integer constants are written as ordinary integers with no leading zero, as in

2400   5   87   1340932

Decimal integers are stored in one 36-bit word. A decimal integer has the int data type unless it is greater than the maximum positive value that can fit in an int; if it is too large for int, its data type is unsigned int.

Note: Technically speaking, negative integers are not considered constants. For example, -2 consists of the negation operator '-' followed by the integer constant 2; thus it is a constant expression, not a simple constant. The same goes for other types of numeric constants.

2.2 Octal Integers

Octal integer constants are written as integers formed with only the octal digits (from zero to seven). To distinguish octal integers from decimal integers, octal constants must begin with at least one leading zero, as in

01   007   000400   0777

Octal integers are stored in one 36-bit word. Each octal digit represents three bits. An octal integer has the int data type unless it is greater than the maximum positive value that can fit in an int; if it is too large for int, its data type is unsigned int.

2.3 Hexadecimal Integers

Hexadecimal integer constants are written as a string of hexadecimal digits. These include the numeric digits zero to nine, as well as the letters 'A' through 'F' standing for the values 10 through 15. The letters may be upper or lower case. To distinguish hexadecimal integers from other integer forms, hexadecimal constants must begin with 0x or 0X, as in

0x10   0XC0BB   0x12c   0XFFFF

Hexadecimal integers are stored in one 36-bit word. Each hexadecimal digit represents four bits. A hexadecimal integer has the int data type unless it is greater than the maximum positive value that can fit in an int; if it is too large for int, its data type is unsigned int.
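
As a minimal sketch of the three notations described so far, the following function checks that a decimal, an octal, and a hexadecimal constant can all denote the same value. The function name same_value_in_three_bases is hypothetical and is used only for illustration.

```c
/* Hypothetical helper: returns 1 if the three spellings below
   all denote the same integer value, 0 otherwise. */
int same_value_in_three_bases(void)
{
    int dec = 64;     /* decimal: no leading zero          */
    int oct = 0100;   /* octal: leading zero, 1*8*8 = 64   */
    int hex = 0x40;   /* hexadecimal: 0x prefix, 4*16 = 64 */

    return dec == oct && oct == hex;
}
```

A caller would expect same_value_in_three_bases() to return 1. Note that a stray leading zero changes the meaning of a constant: 010 is eight, not ten.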

2.4 Long Integers

All decimal, octal, and hexadecimal integer constants may be marked as long by immediately following the constant with an L or l (lower case L). For the GCOS8 compiler, this feature is included only for compatibility with C on other machines; because of the hardware architecture, there is no difference between a normal integer and a long one.

2.5 Unsigned Integers

You can add a u or U to any integer constant to indicate an unsigned integer data type. For example, 23U represents an unsigned value instead of a normal (signed) integer. Similarly, 135UL has the unsigned long data type. For more information about unsigned types, see "Unsigned Objects" in Chapter 3.

2.6 Floating Point Constants

Numeric constants containing a decimal point are treated as floating point numbers. Floating point constants may be preceded by leading zeros if desired.

Floating point constants may have an exponential part as well as the usual integer part, decimal point, and fractional part. The exponential part consists of an e or E followed by a signed integer exponent. If the exponent is positive, you may omit the + sign before the exponent. The following are examples of valid floating point constants.

2.3   1.   3.E5   72.48e-4   0.34e2  .123

Floating point constants are normally stored in double precision format, which means they occupy a total of 72 bits and have the double data type. To obtain a single precision constant (the float data type), add an f or F to the end of the number, as in 1.0F. You can also add an l or L to a floating point constant to indicate the long double type, but on GCOS8 machines, there is no difference between double and long double. For more information on floating point types, see "Floating Point Numbers" in Chapter 3.
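
The following sketch illustrates the exponent notation and the F suffix. The function names are hypothetical. The sizes of float and double differ from machine to machine, so the second function compares against sizeof rather than against the 36-bit and 72-bit sizes given above.

```c
/* Hypothetical helpers illustrating floating point constant forms. */

/* Returns 1 if the exponent forms agree with their expanded values. */
int exponent_forms_agree(void)
{
    return 3.E5 == 300000.0 && 0.34e2 == 34.0 && 1. == 1.0;
}

/* Returns 1 if the F suffix yields a float-sized constant and an
   unsuffixed constant yields a double-sized one. */
int suffix_selects_type(void)
{
    return sizeof(1.0F) == sizeof(float)
        && sizeof(1.0) == sizeof(double);
}
```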

2.7 ASCII Character Constants

An ASCII character constant consists of one to four ASCII characters enclosed in single quotes, as in

'a'    'ab'    'abc'    'abcd'

An ASCII character constant is stored in one 36-bit word. Each character occupies nine bits (one byte). If there are fewer than four characters in a constant, the given characters are stored right-justified in the word and the remaining bytes on the left are filled with 0-bits. A byte of 0-bits is represented by '\000' or more simply '\0'. Thus the constants 'ab' and '\0\0ab' are identical once they have been stored.

ASCII constants with more than one character may be handled differently in different implementations of C. In the interest of portability, you should avoid constants that contain more than one character.

ASCII character constants always have the int data type (even though C also has a char data type for character data). Because ASCII character constants are ints, they are always signed quantities. For more information on character types, see "Character Data" in Chapter 3.

2.8 Escape Sequences

Constructs like '\0' are known as escape sequences. These constructs represent certain ASCII characters, most of which are non-printable. They generally appear as characters within ASCII character constants and ASCII string constants.

We have already noted that '\0' stands for the ASCII null, octal 000. Other common ASCII escape sequences are '\n' for a new-line (linefeed), '\t' for a horizontal tab, '\b' for a backspace, '\\' for a backslash, and '\'' for a single quote. For example,

'\'\n'

is an ASCII character constant consisting of a single quote and a new-line character. For a list of other accepted escape sequences, see Appendix A.

The most general escape sequence is '\' followed by up to three octal digits. This stands for the ASCII character whose octal representation is given by the digits. For example, 'A' is equivalent to '\101'. Note that the escape sequence stops with the first character that is not an octal digit (or at the third octal digit). Therefore '\08' consists of two characters: '\0' and '8' (since '8' is not an octal digit). However, '\000' is equivalent to '\0'.

Another general escape sequence is '\x' followed by any number of hexadecimal digits. This stands for the ASCII character whose hexadecimal representation is given by the digits.
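
The equivalences described in this section can be sketched as follows. The function names are hypothetical, and the comparisons against 'A' assume an ASCII character set, as this manual does.

```c
/* Hypothetical helpers; the comparisons against 'A' assume an
   ASCII character set. */

/* Octal escape: 'A' is octal 101 in ASCII. */
int octal_escape_matches(void)
{
    return '\101' == 'A';
}

/* Hexadecimal escape: 'A' is hexadecimal 41 in ASCII. */
int hex_escape_matches(void)
{
    return '\x41' == 'A';
}

/* '\000' and '\0' both denote the null character, value zero. */
int null_forms_match(void)
{
    return '\000' == '\0' && '\0' == 0;
}
```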

2.9 String Constants

A string constant (also called a string literal) is a sequence of zero or more ASCII characters enclosed in double quotes, as in

"this is a string"
""
"the above is a null string"
"this is split\nover two lines"

In the final example, the '\n' breaks the string into two separate lines when it is printed. This example also illustrates that ASCII escape sequences may be used freely inside strings.

Internally, a string constant is stored as an array of characters. This type of data object is described in "Strings" in Chapter 3. Characters are packed four to a word, as with ASCII character constants.

The compiler adds a null byte '\0' to the end of each string constant. Thus "abc" is stored as an array with the contents 'abc\0'. Programs that scan the string can easily locate the end of the string by looking for the '\0'.
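
As a sketch of how a program can locate the end of a string, the following function counts characters until it reaches the '\0'. The name count_chars is hypothetical; the standard library function strlen does the same job.

```c
/* Hypothetical helper: counts the characters that come before the
   terminating '\0' of a string. */
unsigned long count_chars(const char *s)
{
    unsigned long n = 0;

    while (s[n] != '\0')   /* stop at the null byte */
        n++;
    return n;
}
```

With this function, count_chars("abc") is 3, while sizeof("abc") is 4 because the stored array includes the null byte.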

In source code, double quote characters appearing within the string must be preceded by a backslash (e.g., '\"') as in

"He said, \"Hello there.\"\n"

A string constant cannot normally be broken over more than one line of source code. For example,

"Hello
there"

would receive an error message. If you want a string to contain a new-line character, use the '\n' escape sequence. If you want to break a string over more than one line of source code, put a backslash immediately before each internal new-line character. C discards the backslash and the new-line character that follows it, thereby joining the input lines. For example,

"Hello\
there"

is valid and is equivalent to

"Hellothere"

2.10 BCD Character Constants

A BCD character constant consists of one to six BCD characters enclosed in grave accents, as in

`a`      `012`      `abcdef`

Such a constant is stored in a 36-bit word. Each BCD character occupies six bits. If there are fewer than six characters specified for the constant, the given characters are right-justified in the word and filled on the left with BCD zeros (octal 00).

BCD character constants are not recognized by the ANSI standard; they are an extension. In the interests of portability, programmers should avoid their use.

A number of library routines convert BCD characters to ASCII (e.g. _bcdasc). For the purpose of the conversion, the BCD characters are considered to be lowercase, except when stated otherwise.

3. Data Objects

C recognizes many different classes of data objects. These include simple variables of various types, as well as arrays, pointers, structures, and unions. This chapter describes how these objects are used.

3.1 Identifiers

Data objects in C are referenced using names or identifiers. Identifiers may be formed from the uppercase letters ('A'-'Z'), the lowercase letters ('a'-'z'), the digits ('0'-'9'), and the underscore ('_'). Identifiers may not begin with a digit.

The C compiler accepts identifiers of any length, and all characters are significant.

Unlike some other programming languages, C pays attention to the case of letters in identifiers. Thus the identifiers SUM, Sum, and sum refer to three distinct entities.

3.1.1 Keywords

The keywords of C may not be used as identifiers in C programs. These keywords must be entered in lowercase.

The recognized keywords in C are listed below.

__alignof  __far      __huge     __near
auto       break      case       char
const      continue   default    do
double     else       enum       extern
float      for        goto       if
int        long       register   return
short      signed     sizeof     static
struct     switch     typedef    union
unsigned   void       volatile   while

The keywords beginning with two underscore characters may not be recognized by other implementations of C. The keywords __far, __huge, and __near are discussed in Appendix F.

In addition to the keywords listed above, your programs should avoid creating names beginning with an underscore, or with any of the following character sequences:

E    is    mem    SIG    str    to

Such names are reserved to allow for future expansion of the ANSI standard library.

You should also avoid defining data objects or subprograms whose names are the same as library routines and other library symbols. The language does not prevent the creation of such matching names, but the practice almost always gets you in trouble eventually. Appendix C gives a list of library names.

3.2 Declarations

Every identifier that appears in a C program must be declared. Such declarations are generally given at the beginning of every function that references the identifier. Thus the first statements in a function usually consist of declarations for every identifier used in the function. In addition to this kind of declaration, one class of variables (external variables) must be declared separately, outside of all defined functions (hence the name "external").

The format of declarations is described in Chapter 6. For now, we simply note that a declaration lists the names of one or more data objects and describes the nature of these objects. In particular, a declaration states the type and storage class of one or more objects. A declaration may also indicate that an object is an array, a function, a pointer, a structure, or a union. The types that may be given to an identifier are described in the next sections.

3.3 Fundamental Data Types

The fundamental data types are

int       integer
char      character
double    floating point

The sections to come describe each fundamental data type and give examples of how to declare simple variables of that type. Note that these examples also show how to initialize simple variables using a declaration of the form

type identifier = value;

For example,

int i = 1;

initializes the integer i to the value 1. These initializations are merely intended to give an idea of what the data types look like; for specific details on initializations, see Section 6.6.

3.3.1 Character Data

Objects declared as char are allocated enough storage to hold one ASCII character. On GCOS8, this implies that the object occupies at least nine bits. The amount of space required to hold a char value is called a byte.

Examples:
char a;
char i, j, k;
char A = 'A';

Objects declared as char may be used as numeric values. The numeric value of a char object is always non-negative. When a char value is used numerically, it is converted to a non-negative value of the signed int type.

Because a char object always represents a non-negative value, it is considered to be an unsigned type. To emphasize this, you may declare objects to be unsigned char; this is equivalent to the normal char type.

Objects may also be declared as signed char. A signed char value is considered negative if the high order bit of the value is on. When a signed char value is used numerically, it is converted to a positive or negative int value, depending on whether the sign bit is off or on.

If a character constant with more than one character is assigned to a character variable, the variable is assigned the last character in the constant. For example,

char a;
a = 'bc';

assigns the character 'c' to the variable a. Such an operation makes for confusing code and should be avoided.

Even though char means unsigned char in this implementation of C, the ANSI standard also allows C compilers to treat the char type as signed char. If you wish to write portable programs, you must pay particular attention to situations where the difference between signed and unsigned char values may become important. In such cases, you should explicitly declare the values signed or unsigned.
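
The following sketch shows one way to make signedness explicit. The function names are hypothetical. CHAR_MIN, SCHAR_MIN, and UCHAR_MAX come from the standard header <limits.h>; their values depend on the machine, so the code does not assume the nine-bit bytes of GCOS8.

```c
#include <limits.h>

/* Hypothetical helpers showing explicitly signed and unsigned
   character types. */

/* Returns 1 if plain char behaves as a signed type on the compiler
   in use, 0 if it behaves as an unsigned type. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* An unsigned char always widens to a non-negative int. */
int widen_unsigned(unsigned char c)
{
    return c;
}

/* A signed char keeps its sign when widened to int. */
int widen_signed(signed char c)
{
    return c;
}
```

According to the text above, plain_char_is_signed() would return 0 under this implementation; other compilers may return 1, which is why portable code should state the signedness it needs.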

Long Characters: As an extension to the ANSI standard, the GCOS8 C compilers also support a long char type. long char values are 18 bits long. By default, the long char type is unsigned. You can also declare data to be signed long char.

3.3.2 Integers

Objects declared as int are allocated one 36-bit word of storage on GCOS8. This word is presumed to contain an integer in the system's standard format.

For compatibility with C compilers on other systems, integer objects may be declared as short int, int, or long int. On GCOS8 systems, there is normally no difference between these three data types. However, if you specify the +18bitShorts option on the command line when compiling, short objects are treated as long char (which means that they have 18 bits instead of 36). Note that long char items are always 18 bits long, regardless of command line options.

Examples:
int a, b, ccc, xone, pq;
int octal10 = 010;
int decim10 = 10;
int hex10 = 0X10;
long int v = 234 + 647;
short int k = 1;

short int may be abbreviated to short, and long int may be abbreviated to long.

An object declared to be int, short, or long may contain a positive or negative value. Therefore such objects are said to be signed. To emphasize this, you may declare objects to be signed int, signed short, or signed long. These types are equivalent to normal int, short, and long (respectively).

3.3.3 Unsigned Objects

Just as you can declare an unsigned char object, you can declare objects to be unsigned int, unsigned short, or unsigned long. An object whose type is just declared as unsigned is taken to be unsigned int.

An unsigned object takes up the same amount of memory space as its signed counterpart; however, the value of an unsigned object is always considered non-negative. The high order bit of an unsigned value is part of the number, not a sign bit. This means that an unsigned object can hold a range of positive numbers about twice as large as its signed counterpart can.

Examples:
unsigned int t1 = 0100000U;
unsigned long int t2 = 0100000LU;
unsigned char odd = '\377';
unsigned u = 1U; /* unsigned int */

Arithmetic with unsigned values is always performed using the laws of arithmetic modulo the largest value of the type, plus 1. For example, arithmetic with unsigned int values is performed modulo 2**36. Unsigned arithmetic operations can never overflow because large results are reduced modulo 2**36.
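
The wrap-around behavior can be sketched as follows. The function names are hypothetical. UINT_MAX, from the standard header <limits.h>, is the largest unsigned int value; the modulus is therefore UINT_MAX + 1, which is 2**36 on GCOS8 according to the text above and a different power of two on other machines.

```c
#include <limits.h>

/* Hypothetical helpers: unsigned int arithmetic is performed
   modulo UINT_MAX + 1, so neither operation overflows. */

/* One more than the largest value wraps around to zero. */
unsigned wrap_up(void)
{
    return UINT_MAX + 1U;
}

/* One less than zero wraps around to the largest value. */
unsigned wrap_down(void)
{
    return 0U - 1U;
}
```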

3.3.4 Floating Point Numbers

Objects declared double are allocated 72 bits of storage on GCOS8. This is presumed to contain a double precision floating point number in the system's standard format.

Objects declared float are allocated 36 bits. This is presumed to contain a single precision floating point number in the system's standard format.

Objects can also be declared long double. On a GCOS8 machine, this is equivalent to double.

Examples:
float x, y, z;
float b = 3.56e10;
double bb = 3.56e10;
long double g = 0.5 / 1.0E5 ;

3.4 Derived Data Types

Derived data types are constructed from the fundamental data types described in the previous section. For example, a simple array is a collection of elements, each of which may be a single object of one of the fundamental types.

3.4.1 Pointers

A pointer variable contains information that "points" to an area of memory. A pointer occupies at least 18 bits in memory on GCOS8.

A pointer's type indicates what sort of data is stored in the memory area where the pointer points. For example, a program may have pointers to integers, pointers to character data, or pointers to more complex data types (e.g., structures or arrays).

Declaring Pointers: A declaration indicates a pointer variable by putting a '*' in front of the variable's name. For example,

int *p;

declares a variable p that can be used to point to an integer value.

int x, *px;

declares a standard integer x and a pointer named px.

Chapter 4 discusses details of pointer arithmetic. This chapter simply gives a few examples of how pointers may be used. To do this, we must introduce two operators that are commonly used in C programs.

Obtaining Pointer Values: The & operator obtains a pointer to a C variable. Thus in

int x, *px;
px = &x;

the second statement uses &x to obtain a pointer to the variable x, then assigns this pointer value to px. In other words, the pointer px is pointed towards x. This is one way of initializing pointer variables. Another way would be

int x, *px = &x;

Pointer References: If p is a pointer variable, *p stands for the data object to which p points. For example, consider

int u, *v;
v = &u;
*v = 12;

First, v is set up to point at u. Then, the assignment assigns the value 12 to the object indicated by v. The result is to assign the value 12 to u. As another example, consider

char a, b, *pa = &a, *pb = &b;
b = 'b';
*pa = *pb;

pa points to a and pb points to b. The last statement in the example assigns the value of variable b to variable a.

There is an important difference between the two statements

*pa = *pb;
pa = pb;

The first assignment is equivalent to a=b, since *pa refers to a and *pb refers to b. The second assignment is an assignment between pointers; thus pa and pb now point to the same item in memory (the variable b).
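
The difference between the two assignments can be sketched in a complete function. The name pointer_assignment_demo is hypothetical.

```c
/* Hypothetical helper: copies a value through pointers, then
   re-aims one pointer. Returns 1 if the outcome matches the
   description in the text, 0 otherwise. */
int pointer_assignment_demo(void)
{
    char a = 'a', b = 'b';
    char *pa = &a, *pb = &b;
    int copied, reaimed;

    *pa = *pb;                 /* same as a = b; pa still points at a */
    copied = (a == 'b') && (pa == &a);

    pa = pb;                   /* pa now points at b; a is untouched  */
    reaimed = (pa == &b) && (*pa == 'b');

    return copied && reaimed;
}
```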

As shown above, pointers may be initialized when they are declared. If they are not initialized in this way, they remain undefined until they are explicitly pointed towards an appropriate object. Errors will probably occur if a pointer is used before it has been pointed at a data object.

Null Pointers: Every pointer type has a special value called the null pointer. You may have a null int pointer, a null float pointer, and so on. A null pointer is guaranteed not to point to any object of the given type. (This is not the same as an uninitialized pointer, which points to a random location in memory.)

To set a pointer to the null pointer value of its type, you assign the integer constant 0 to the pointer, as in

double *p;
p = 0;

You can also test to see if a pointer is a null pointer, as in

if (p == 0) ...

The use of 0 to represent null pointer values is merely a source code convention. The actual value of a null pointer may not be equivalent to the integer 0.
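
A common use of the null pointer test is to guard against using a pointer that does not point at an object. The following is a minimal sketch; the function safe_deref and its fallback parameter are hypothetical.

```c
/* Hypothetical helper: returns the integer that p points to, or
   the caller's fallback value if p is a null pointer. */
int safe_deref(int *p, int fallback)
{
    if (p == 0)                /* null pointer: nothing to read */
        return fallback;
    return *p;
}
```

Note that this test only detects null pointers. It cannot detect an uninitialized pointer, which is why pointers should be set to a valid address or to 0 before use.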

3.4.2 Arrays

An array is an ordered collection of individual data objects, each of which has the same type. For example, a program might have an array of integers, an array of characters, or an array of pointers. The data objects that make up an array are called the elements of the array.

Vectors: The simplest sort of array is just a list of individual elements, stored in consecutive blocks of memory. This sort of array is sometimes called a vector.

To declare a vector, you must indicate the type of elements in the vector, the name (identifier) of the vector, and the number of elements in the vector. For example, the declaration

int k[20];

declares a vector named k which has room to contain 20 separate integers. Note that the number of elements in the vector is given in square brackets immediately following the name of the vector in the declaration. This number can be given with any constant expression, as in

long la[20+30];

The same sort of notation is used to reference individual elements in the vector. Thus you might see a statement like

k[5] = 12;

to assign the value 12 to the element of k whose index is 5. The number inside the square brackets is called the index or subscript of the element.

C uses the integer zero to index the initial element of a vector or array. Thus the 20 elements of k would be referred to as

k[0], k[1], k[2], ..., k[19]

Note that the maximum index of the vector is one less than the number of elements in the vector. This is a result of using zero to index the initial element.
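
As a sketch of zero-based indexing, the following function fills a 20-element vector and adds up its elements. The name sum_vector is hypothetical. The loops use i < 20 as the test, so the index never goes past 19.

```c
/* Hypothetical helper: stores the values 0 through 19 in a
   20-element vector and returns the sum of the elements. */
int sum_vector(void)
{
    int k[20];
    int i, sum = 0;

    for (i = 0; i < 20; i++)   /* valid indices run from 0 to 19 */
        k[i] = i;
    for (i = 0; i < 20; i++)
        sum += k[i];
    return sum;                /* 0 + 1 + ... + 19 */
}
```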

Multi-Dimensional Arrays: Arrays of two or more dimensions are declared in a manner analogous to the declaration of vectors. For example,

float r[10][20];

declares an array of 10 arrays, each of which contains 20 floating point numbers. Individual elements are referenced with constructs such as r[0][0], r[3][5], r[9][12], and so on. The array holds a total of 200 elements (10*20). The maximum value of the first subscript is 9 and that of the second subscript is 19.

C arrays may be declared with any number of dimensions. Thus,

char str[10][5][3][30];

declares a large four-dimensional array whose elements are characters.
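
The following sketch works with a two-dimensional array of 10 rows by 20 columns. The function names are hypothetical. The first function uses the sizeof operator (described in Chapter 4) to count elements; the second stores a distinct value in each element and reads one back.

```c
/* Hypothetical helpers for a 10 by 20 two-dimensional array. */

/* Number of elements, computed from the size of the whole array
   divided by the size of one element. */
unsigned long element_count(void)
{
    float r[10][20];

    return (unsigned long)(sizeof(r) / sizeof(r[0][0]));
}

/* Stores row*100 + column in every element, then returns the
   element at the given position. */
int read_back(int row, int col)
{
    int t[10][20];
    int i, j;

    for (i = 0; i < 10; i++)       /* first subscript: 0 to 9   */
        for (j = 0; j < 20; j++)   /* second subscript: 0 to 19 */
            t[i][j] = i * 100 + j;
    return t[row][col];
}
```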

Pointer Arrays: To declare an array of pointers, you use a declaration like

float *parr[20];

This declares an array containing 20 elements, each of which is a pointer to a floating point number. Thus parr[0] is the initial pointer in the array while *parr[0] is the number to which the initial pointer points. Note that the above statement only declares and creates the array of pointers to numbers; it does not allocate space for the numbers themselves. As with individual pointers, each element in an array of pointers must be explicitly "aimed" at an existing data object before it can be used.
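
The need to aim each pointer before use can be sketched as follows. The function name sum_through_pointers and the three-element arrays are hypothetical, chosen to keep the example short.

```c
/* Hypothetical helper: aims each pointer in an array at a separate
   float object, then reads the values back through the pointers. */
float sum_through_pointers(void)
{
    float values[3];           /* the numbers themselves        */
    float *parr[3];            /* pointers only, no numbers yet */
    float sum = 0.0F;
    int i;

    values[0] = 1.0F;
    values[1] = 2.0F;
    values[2] = 4.0F;

    for (i = 0; i < 3; i++)
        parr[i] = &values[i];  /* aim each pointer before using it */
    for (i = 0; i < 3; i++)
        sum += *parr[i];       /* the number that parr[i] points to */
    return sum;
}
```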

Array Names as Pointers: When the name of an array appears without subscripts, it is interpreted as a pointer to the first element in the array. For example,

int ar[10];
*ar = 1;

assigns the value 1 to ar[0] (since ar points to this initial element). Other elements of the array may also be accessed by treating the array name as a pointer. For example,

*(ar+5)

is the location in memory that is five integers beyond ar; thus it is equivalent to ar[5]. Similarly, any construct of the form E1[E2] (where E1 and E2 are expressions) is equivalent to

*( (E1) + (E2) )

Addition involving pointers is described more fully in Chapter 4.
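
The equivalence of subscript notation and pointer notation can be checked with a short sketch. The function name notations_agree is hypothetical.

```c
/* Hypothetical helper: returns 1 if subscript notation and pointer
   notation reach the same elements, 0 otherwise. */
int notations_agree(void)
{
    int ar[10];
    int i;

    for (i = 0; i < 10; i++)
        ar[i] = i * i;

    *ar = 1;                       /* same as ar[0] = 1 */

    return ar[0] == 1
        && *(ar + 5) == ar[5]      /* E1[E2] is *((E1) + (E2)) */
        && &ar[5] == ar + 5;       /* both locate the same element */
}
```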

Undimensioned Arrays: You may sometimes omit the size of an array in the declaration. There are two cases in which this is allowed.

3.4.3 Strings

There is no specific "string" type for variables, but because strings are so frequently used, this section gives a brief outline of how they may be manipulated. Suppose a function contains the following two lines.

char *msg;
msg = "Good-bye world!";

The first line declares msg as a pointer to characters. The second line is implemented in two steps:

  1. A string constant
    Good-bye world!

    is created in memory.

  2. A pointer to this string is assigned to the pointer variable msg.

msg points to the 'G' at the beginning of the string. Thus *(msg) refers to the 'G', *(msg+1) refers to the first 'o', and so on. As stated previously, this terminology is equivalent to msg[0], msg[1], etc. The two statements above could be compressed into

char *msg = "Good-bye world!";

which creates the given string constant and points msg towards the first character of the constant.

All string constants have a '\0' to mark the end of the string. The following example simulates this situation.

char s[20];
s[0] = 'A';
s[1] = 'B';
s[2] = '\0';

This declares a vector s with space for 20 characters and sets the first two characters to 'A' and 'B'. The third character is set to '\0' to indicate the end of the string. Thus the string in s looks the same as the string you would get with the string constant "AB".
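
The claim can be checked with the standard library functions strlen and strcmp. The sketch below is only an illustration; the function name built_string_matches is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the hand-built string matches "AB". */
int built_string_matches(void)
{
    char s[20];

    s[0] = 'A';
    s[1] = 'B';
    s[2] = '\0';                 /* terminator marks the end of the string */

    /* strlen counts characters up to, but not including, the '\0'. */
    return strlen(s) == 2 && strcmp(s, "AB") == 0;
}
```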

The compiler may arbitrarily store a string constant in read-only memory. If you say

char *msg = "Good-bye world!";

you should not say

msg[0] = 'x';

The reason is that msg may be pointing into read-only memory, so the attempt to change a character in that memory results in an error.
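
If the characters must be changed, one common approach is to copy the string constant into an ordinary character array and modify the copy. The following is a sketch under that assumption; make_editable and its buf parameter are hypothetical names.

```c
#include <string.h>

/* Hypothetical helper: copies a string constant into buf, then edits the copy.
   buf must have room for at least 16 characters. */
char *make_editable(char *buf)
{
    strcpy(buf, "Good-bye world!");   /* 15 characters plus the '\0' */
    buf[0] = 'x';                     /* safe: buf is ordinary writable memory */
    return buf;
}
```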

Later chapters give more examples of strings and string manipulation.

3.4.4 Structures

A structure is a data object consisting of several sub-objects called elements. Each element has a name and a type. Below we give a typical structure declaration.

struct record {
   char name[30];
   int age;
};

This defines a structure type named record. As shown above, the name of the structure type is an identifier that appears immediately after the keyword struct in the declaration. This name is also known as the structure tag. Every object of the declared record type consists of two elements: name, an array of 30 characters, and age, an integer.

Once the structure of record has been declared in this way, items with the record structure may be declared with declarations of the form

struct record john, mary, bruce;

This declaration declares three objects, all having the record structure. In general, this sort of declaration consists of the keyword struct, the tag of a structure that has already been declared, and a list of variables that should have the structure type.

Allocating Structures: Structure objects always begin on a machine word boundary. The elements of a structure are stored in memory in the order in which they are declared. Elements are aligned on addressing boundaries appropriate to their type. This avoids the undesirable situation of data objects straddling a word boundary.

One result of this allocation method is that there may be unnamed holes within the memory allocated to structure elements. For example, objects of the type

struct hole {
   char a;
   int b;
};

occupy two words of memory. The first word contains a single character in its first byte and the second word contains a standard integer. Thus there is an unnamed gap in the first word.
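
The size of the gap can be observed with sizeof and the standard offsetof macro from <stddef.h>. The sketch below makes no assumption about the actual figures, which depend on the machine; the helper names are hypothetical.

```c
#include <stddef.h>

struct hole {
    char a;
    int b;
};

/* Hypothetical helper: byte offset of b from the start of the structure. */
size_t offset_of_b(void)
{
    return offsetof(struct hole, b);
}

/* Hypothetical helper: bytes in the structure not occupied by a or b. */
size_t padding_in_hole(void)
{
    return sizeof(struct hole) - sizeof(char) - sizeof(int);
}
```

A nonzero result from padding_in_hole indicates that the structure contains unnamed space.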

Derived Structure Types: Pointers to structures may be declared in the same way that other pointers are declared. For example,

struct record *ph;

declares a variable ph that points at objects that have the record structure (declared previously). The declaration

struct tree {
   int entry;
   struct tree *left;
   struct tree *right;
};

may be used to define a node in a standard binary tree, with entry holding the contents of the node and left and right acting as pointers to lower nodes in the tree.

Arrays of structures may be allocated in the same way as other arrays. For example,

struct tree nodes[100];

declares an array named nodes. nodes contains 100 elements, each having the structure defined as tree. A binary tree may be established using elements in this array by setting the appropriate left and right pointers of each node.
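
As a small sketch of this technique, the function below links three array elements into a tree with nodes[0] as the root. The function name build_and_sum and the entry values are illustrative only. The -> operator used here is described in Section 4.3.7.

```c
#include <stddef.h>

struct tree {
    int entry;
    struct tree *left;
    struct tree *right;
};

/* Hypothetical helper: links three array elements into a small tree
   with nodes[0] as the root, and returns the sum of the entries. */
int build_and_sum(void)
{
    struct tree nodes[3];

    nodes[0].entry = 10; nodes[0].left = &nodes[1]; nodes[0].right = &nodes[2];
    nodes[1].entry = 5;  nodes[1].left = NULL;      nodes[1].right = NULL;
    nodes[2].entry = 20; nodes[2].left = NULL;      nodes[2].right = NULL;

    /* Reach the lower nodes through the root's pointers. */
    return nodes[0].entry + nodes[0].left->entry + nodes[0].right->entry;
}
```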

Structures may be initialized. This process is described in Section 6.6.6.

Structures Containing Undimensioned Arrays: As an extension to the ANSI standard, the last element of a structure may be an undimensioned array, as in

struct sample {
    int count;
    double list[];
};

This declares a structure that ends with an array whose length is not determined when the program is compiled. In the example above, the "count" could give the number of elements in the "list" array.

You cannot create a structure object of this kind with a direct declaration. However, you can allocate space for such a structure using the "malloc" function (discussed in Section 9.4), allocating as much space as needed to hold the trailing array.

If you apply the sizeof operator (explained in Section 4.4.11) to this kind of structure, you get the length of the structure up to but not including the array. In the example above, you'd get the length up to the beginning of "list". The size is therefore 8 because of the alignment requirements of the double array.
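
The following sketch shows one way such a structure might be allocated, assuming a compiler that accepts the trailing undimensioned array. The function name new_sample is hypothetical. Because sizeof excludes the array, the space for the n elements is added explicitly.

```c
#include <stdlib.h>

struct sample {
    int count;
    double list[];      /* trailing array with no declared length */
};

/* Hypothetical helper: allocates a sample with room for n doubles (n >= 0).
   Returns NULL if the allocation fails. */
struct sample *new_sample(int n)
{
    struct sample *sp;
    int i;

    sp = malloc(sizeof(struct sample) + (size_t) n * sizeof(double));
    if (sp == NULL)
        return NULL;
    sp->count = n;               /* record the length of the array */
    for (i = 0; i < n; i++)
        sp->list[i] = 0.0;
    return sp;
}
```

The caller is responsible for releasing the space with free when it is no longer needed.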

Bit Fields: Structures may contain bit fields. A bit field is declared with a signed or unsigned int declaration followed by a colon, followed by a number or constant expression, as in

int h : 3;

The number after the colon indicates how many bits the field should occupy. In this case, h occupies three bits. This feature may be used to optimize memory use in a structure or to name bits inside structures.

As an extension to the ANSI standard, this version of C lets you declare bit fields to have an enum type, as well as signed and unsigned int types.

Bit fields do not have to be named. For example,

struct example {
   int a : 5;
   : 25;
   int b : 2;
};

includes an unnamed bit field of 25 bits. As a special case, a bit field of the form

: 0

forces alignment to the next alignment boundary suitable for an int value. You can also specify a type before the colon, as in

double : 0

which forces alignment to the next boundary suitable for that type. Specifying a type in this way is an extension to the ANSI standard.

Named and unnamed bit fields do not straddle word boundaries. If a field requires more bits than are left in the current word, the field begins with the next word.
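
As an illustration of how the declared width limits what a field can hold, consider the sketch below. The structure flags and the function store_level are hypothetical names. Because the field is unsigned, a value too large for three bits is reduced modulo 8.

```c
struct flags {
    unsigned int ready : 1;   /* one bit: holds 0 or 1 */
    unsigned int level : 3;   /* three bits: holds 0 through 7 */
};

/* Hypothetical helper: stores v in the 3-bit field and returns what was kept. */
unsigned int store_level(unsigned int v)
{
    struct flags f;

    f.ready = 1;
    f.level = v;      /* only the low three bits of v fit in the field */
    return f.level;
}
```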

Accessing Structure Elements: The elements of a structure may be accessed using the . operator. For example,

john.age

refers to the age element within the record named john.

mary.name[0] = 'M';

assigns 'M' to the initial element of the name character array within the record named mary.

Structures Containing Structures: A structure may contain another structure, as in

struct A {
    int a;
    struct {
        int c;
        double d;
    } b;
} xxx;

In this example, the internal structure has the element name b. Therefore,

xxx.b.c

refers to the element c within the structure b within the structure xxx.

As an extension to the ANSI standard, this version of C lets you have unnamed structure elements inside other structures, as in

struct A {
    int a;
    struct {
        int c;
        double d;
    };
} xxx;

This is almost like the previous example, except that the internal structure has no element name. The expression

xxx.c

refers to the element c inside the unnamed structure within xxx.

In simple examples like this, there's no good reason to have unnamed structure elements. They only become useful when you have complicated types containing many nested levels of structures and/or unions (discussed below). In such cases, omitting names on internal structures can help you shorten references to elements in those internal structures and make your code easier to read.

When structures contain unnamed structure elements, there may be conflicts between element names. For example, consider

struct A {
    int a;
    struct {
        int c;
        double a;
    };
} xxx;

In this case, xxx.a is an ambiguous reference; it could refer to the int a at the beginning of the structure or the double a inside the unnamed structure. You should make an effort to avoid ambiguities of this type; in future versions of the compiler, they may be considered errors.

3.4.5 Unions

The simplest definition of a union is that it is a single object which may have a number of different interpretations. The declaration of a union type is similar to that of a structure: the declaration begins with the keyword union, followed by an identifying tag, followed by a description of the union. For example,

union multi {
   int fix;
   char c[10];
   float flt;
};

declares a union type with the tag multi. The entries in the brace brackets describe possible interpretations of such a union object. An object of the type multi may be an integer, a character vector of length 10, or a floating point number.

Variables of a union type may be declared in a way similar to the declaration of structure variables. For example,

union multi x, y, z;

declares variables x, y, and z of the union type multi.

The length allocated for a union object in memory is sufficient to hold the largest interpretation of the union. For example, an object of type multi must be long enough to hold a character array of length 10. This length is also sufficient to hold an integer or a floating point number. The alignment of a union object is chosen to be appropriate for all possible interpretations of the object.

Union objects always begin on a machine word boundary.

Selecting the Interpretation of a Union: You select a particular interpretation of a union object using the same notation as selecting structure elements. For example,

x.fix = 3;

indicates that the union object x of type multi is to take on the interpretation named fix (an integer), and that the object should be assigned a value of 3.

x.flt = 3;

indicates that x is to take on its interpretation named flt (a floating point number) and is again assigned a value of 3. C automatically converts this to the floating point representation of 3 (3.0) because C performs automatic conversion of integers to floating point when necessary. This shows that the interpretation of a union object has a significant effect on the meaning of C statements.
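
The two assignments, together with the earlier statement about the length of a union, can be sketched as follows. The function names are hypothetical.

```c
union multi {
    int fix;
    char c[10];
    float flt;
};

/* Hypothetical helper: stores 3 through the flt interpretation and reads it back. */
float store_as_float(void)
{
    union multi x;

    x.fix = 3;     /* x currently holds an int */
    x.flt = 3;     /* the int 3 is converted to 3.0 and replaces the int */
    return x.flt;
}

/* Hypothetical helper: returns 1 if the union is at least as long
   as its largest interpretation, the character array. */
int union_is_big_enough(void)
{
    return sizeof(union multi) >= sizeof(char[10]);
}
```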

Derived Union Types: Union arrays and pointers to union objects may be declared in the usual way. For example,

union multi t, *pt, tarr[5];

declares t as a union object, pt as a pointer to such a union, and tarr as an array of such unions.

The possible interpretations of a union object may include structures, as in

union types {
   struct person {
       char *name;
       int age;
   } t1;
   struct company {
       char *name;
       int capital, revenue, expenses;
   } t2;
};

This kind of object may be interpreted as two different types of structure. It has the length of the longest structure.

Initializing Unions: Unions may be initialized. The initializer always applies to the first given interpretation of the union. For example,

union fi {
   float x;
   int i;
} var = 3;

initializes var to a floating point 3.0 value because the first interpretation of the union is floating point.

Unions Within Structures: A structure may contain named union elements. As an extension to the ANSI standard, a structure may also contain an unnamed union element; such elements are treated in a comparable way to unnamed structure elements.

3.4.6 Functions

A function is a subprogram; functions contain all the blocks of executable code found in the program. Functions typically calculate results. For example, a "square root" function would calculate the square root of a number.

Programs must declare the type of result that a function calculates. For example,

float func(int x);

declares that the function named func returns a float value. The parentheses after func show that it is a function. The int x inside the parentheses says that the function takes a single int argument named x. As another example, consider

float x, y, fsin(float x);
x = 3.14;
y = fsin(x);

This declares two floating point variables x and y, and a floating point function fsin that takes a single floating point argument. The final assignment statement shows how the fsin function may be used.

Functions may return any type of value, including pointers, structures, and other derived types. For example,

char *fn(char c);

indicates that the function fn returns a pointer to a character.

Chapter 7 explains more about functions.

3.4.7 Enumerated Data

Enumerated data types are special types created by the user. The user begins by naming all the values that an object of a certain type may have. This is done in a declaration of the form

enum identifier { value, value, value, ...  };

as in

enum weekday {
    sun, mon, tue, wed, thu, fri, sat
};

The identifier after the keyword enum is the tag of the enumerated type. The values inside the brace brackets are the values that objects of this type may take on. Each of these values must be a valid C identifier.

Once the values of the data type have been listed as shown above, the program may declare variables having that data type, as well as pointers to such objects, arrays, and so on.

enum weekday today, *pday, week[7];

declares today as a variable of type weekday, pday as a pointer to weekday objects, and week as an array of seven weekday objects.

The values of an enumerated type and variables of that type may be declared in a single statement, as in

enum month {
   jan, feb, mar, apr,
   may, jun, jul, aug,
   sep, oct, nov, dec
} year[12], *pmon;

Variable initializations are performed in the usual way, as in

enum month start = jan ;

The variable start is initialized to the value jan.

Enumerated Constants: The values named in the enumeration list are called enumerated constants. They are considered to be int constants and may be used wherever constants are allowed. Normally, the first value shown in the enum declaration has the value 0, the next 1, and so on. Thus feb+3 is equal to the integer 4.

The values in the enum declaration can be given specific integer values with a construction like

enum coins {
   cent=1, nickel=5,
   dime=10, quarter=25
};

Some of the values in the list may be given explicit integer equivalents while others are not. In this case, the values written with an = are given the specified values, and each value written without one is given the value of the preceding item in the list plus one. For example,

enum roman { I=1, II, III, IV };

gives I the value 1, II the value 2, and so on.
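
The numbering rules can be confirmed with a short sketch that uses the month and roman types from this section. The helper function names are hypothetical.

```c
enum month {
    jan, feb, mar, apr,
    may, jun, jul, aug,
    sep, oct, nov, dec
};

enum roman { I = 1, II, III, IV };

/* Hypothetical helper: feb has the value 1, so feb + 3 is 4. */
int feb_plus_three(void)
{
    return feb + 3;
}

/* Hypothetical helper: IV follows I=1, II, and III, so it has the value 4. */
int value_of_iv(void)
{
    return IV;
}
```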

3.5 The void Type

The void type represents the absence of data. For example, a function that does not return a value has the type "function returning void". Similarly, the void type can be used in declarations of functions that take no arguments. Lastly, converting an expression to the void type effectively discards the expression's result.

(void *) is a "generic" pointer type. This kind of pointer does not point to any specific type of data; it can point to any type of data.
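
The uses of void mentioned above might be sketched as follows. All of the names here (calls, note_call, read_int, call_count) are hypothetical and serve only to illustrate.

```c
static int calls = 0;

/* Takes no arguments and returns no value. */
void note_call(void)
{
    calls++;
}

/* Hypothetical helper: reads an int through a generic pointer.
   The caller must pass the address of an int object. */
int read_int(void *p)
{
    int *ip = p;      /* a void * converts to any object pointer type */
    return *ip;
}

/* Hypothetical helper: returns how many times note_call has run. */
int call_count(void)
{
    (void) read_int(&calls);   /* result computed, then deliberately discarded */
    return calls;
}
```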

3.6 Type Groups

C uses special names for certain groups of related types.

Integral types
Signed and unsigned int and char types of all lengths (e.g. short, long), plus the enum types.
Floating types
float, double, and long double.
Arithmetic types
All integral and floating types.
Scalar types
All arithmetic and pointer types.
Aggregate types
All array and structure types.
Derived types
All aggregate, union, function, and pointer types.

3.7 Data Type Conversions

Various operations may cause implicit or explicit conversions from one data type to another. This section discusses the way in which these conversions are performed.

Characters to Integers
Signed and unsigned char values may be used in any expression where int values are valid. The char object is converted to the int value that is equal to the character's ASCII representation. If a signed char is negative, it is converted into the appropriate negative int; otherwise, it is converted to the appropriate positive number (or zero). Notice that unsigned char objects are converted to signed int objects with the same numeric value.

The same process holds for long char.

Bit Fields to Integers
Bit fields are converted to int values in the same way as char values. Note that an unsigned bit field is converted to a signed int unless the bit field is the length of an int.
Integers of Different Lengths
Since integers on this system's hardware all have the same length, no change occurs when converting longer integer types to shorter ones or vice versa.
Signed Integer to Unsigned
Positive integers are converted to the unsigned integer that has the same value. Negative integers are converted to the unsigned integer that is congruent to the int value, modulo 2**36. Since the GCOS8 hardware uses a 2's complement representation, there is no actual change in the bit pattern of the int item.
Unsigned Integer to Signed
If the value of the unsigned object can be represented as an int, the result of the conversion is equal to the original unsigned value. If the value of the unsigned object cannot be represented as an int, it is converted to the int value that is congruent to the unsigned value modulo 2**36. Again, this makes no change in the actual bit pattern of the value.
Floating Point to Double Precision
All floating point arithmetic in C is carried out using double precision. Thus floating point numbers that appear in an expression are converted to double precision.
Double Precision to Floating Point
When a double precision result is assigned to a single precision float item, it is rounded before it is shortened to single length.
Floating Point to Integer
Positive floating point numbers are converted to integers by truncation (so that 3.7 becomes 3). Negative floating point numbers are converted to integers by truncation towards zero (so that -3.7 becomes -3). Undefined results occur if the floating point number is too large or too negative to be represented as an integer.
Integer to Floating Point
Integers are converted to floating point numbers in the obvious way (e.g. 3 becomes 3.0). Some loss of accuracy may occur if the integer has more significant digits than may be represented in floating point format.
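
Two of these conversions can be illustrated with the sketch below. The function names are hypothetical. The code uses UINT_MAX from <limits.h> rather than a fixed modulus, so it does not assume the 36-bit word described above; on GCOS8, UINT_MAX + 1 would correspond to the 2**36 modulus.

```c
#include <limits.h>

/* Hypothetical helper: floating point to integer, discarding the fraction. */
int truncate_to_int(double d)
{
    return (int) d;
}

/* Hypothetical helper: signed to unsigned. Negative values are converted
   to the congruent unsigned value, modulo UINT_MAX + 1. */
unsigned int to_unsigned(int i)
{
    return (unsigned int) i;
}
```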

3.7.1 Standard Arithmetic Conversions

A large number of binary operators cause automatic conversions of operands and results. The conversions follow the process described below, known as the standard arithmetic conversions.

  1. Signed or unsigned bit fields and char operands are converted to int. (If an unsigned bit field has a length of 36 bits, it is converted to unsigned int.)
  2. float operands are converted to double. (Note: in other implementations of C, float operands may remain float.)
  3. If either operand is double, the other operand is converted to double if necessary. In this case, the type of the result is double.
  4. If either operand is unsigned long, the other operand is converted to unsigned long. In this case, the type of the result is unsigned long.
  5. Otherwise, if one operand is long and the other is unsigned int: if a long can represent all values of an unsigned int, the operand of type unsigned int is converted to long; if a long cannot represent all the values of an unsigned int, both operands are converted to unsigned long.
  6. Otherwise, if either operand is long, the other operand is converted to long if necessary. In this case, the type of the result is long.
  7. Otherwise, if either operand is unsigned, the other operand is converted to unsigned if necessary. In this case, the type of the result is unsigned.
  8. Otherwise, both operands are converted to int and the type of the result is int.
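
Two consequences of these rules are sketched below; the function names are hypothetical. The first shows char operands being widened to int (rule 1), and the second shows an int operand being converted to unsigned (rule 7), which can give a surprising result when the int is negative.

```c
/* Hypothetical helper: char operands are converted to int before the addition,
   so the sum is not limited to the range of a char. */
int add_chars(char a, char b)
{
    return a + b;
}

/* Hypothetical helper: compares a negative int with an unsigned value. */
int minus_one_less_than_one(void)
{
    int i = -1;
    unsigned int u = 1;

    /* i is converted to unsigned and becomes a very large value,
       so the comparison is false and the function returns 0. */
    return i < u;
}
```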

4. Expressions

This chapter deals with expressions in C. These include numeric expressions, logical expressions, relational expressions, and so on.

4.1 Order of Evaluation

The various operations in an expression are evaluated according to a set order of precedence. For example, any multiplications in an expression are normally evaluated before any additions (as in conventional arithmetic). This standard order of operation may be altered using parentheses in the usual manner.

Some operations share the same precedence (e.g., addition and subtraction). The set of all operations with a given precedence forms a precedence class. When the time comes for operations of this class to be performed, they may be evaluated from right to left or left to right, depending on the class.

The ANSI standard allows some freedom for the order in which C evaluates an expression. For example, if A and B are expressions, A+B may be evaluated by calculating A first then B, or vice versa. In such situations, the order of evaluation is unspecified; the C compiler considers itself free to evaluate sub-expressions in the most efficient or convenient way it finds, even if the order of evaluation has side effects. As an example, consider

i = 1;
j = i + (i++);

In the second statement, if the compiler evaluates i++ first, j takes the value 3; if the compiler evaluates i++ second, j takes the value 2. In cases like this, you can't even use parentheses to dictate order of evaluation. If a particular order is necessary, the expression usually has to be broken into two separate expressions with the result of one stored in a temporary variable.
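
As a sketch of the temporary variable technique, the two functions below force each of the two orders in turn. The function names and the variable tmp are hypothetical.

```c
/* Hypothetical helper: i is read before the increment takes place. */
int read_then_increment(void)
{
    int i = 1;
    int tmp, j;

    tmp = i;          /* read i first: tmp is 1 */
    j = tmp + (i++);  /* i++ yields 1; i becomes 2 afterwards */
    return j;         /* 1 + 1 */
}

/* Hypothetical helper: the increment takes place before i is read. */
int increment_then_read(void)
{
    int i = 1;
    int tmp, j;

    tmp = i++;        /* tmp is 1; i becomes 2 */
    j = i + tmp;      /* 2 + 1 */
    return j;
}
```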

Arithmetic overflows may also affect the result of an expression. For example, you can construct expressions of the form A+B where an overflow occurs if you evaluate A first but not if you evaluate B first. Since the GCOS8 compiler normally performs overflow-checking, your program usually terminates if such an overflow occurs. However, if you turn off overflow-checking, the program keeps on going when the overflow occurs and the order of evaluation may not be significant.

The sections in this chapter describe the operators in each precedence class. These sections are arranged from highest precedence to lowest. Subsections within the sections describe individual operators. However, before the operator descriptions can be given, it is necessary to speak about the difference between Lvalues and Rvalues.

4.2 Lvalues and Rvalues

We use the term Rvalue to refer to the contents of a location in memory or the value of an expression. The term Lvalue is used for any expression that refers to an address in memory. A modifiable Lvalue is an Lvalue that can be used in expressions to change the associated memory address. Some Lvalues are not modifiable, such as ones that have the const attribute (described in Section 6.4.1).

The names "Rvalue" and "Lvalue" come from their uses in an assignment statement like A=B. The expression on the LEFT of the assignment must be a modifiable Lvalue; it must refer to an address in memory that can be assigned a new value. Standard examples of Lvalues are variable names or expressions of the form *p where p is a pointer.

The expression on the RIGHT of the assignment A=B is treated as an Rvalue. For example, if B is a variable, the quantity which is assigned to A is the value of B, not the address of B.

Any expression that can be used as an Lvalue can also be used as an Rvalue. For example, the name of a variable can be used to supply the value of the variable (on the right hand side of an assignment) or the address of the variable (on the left hand side). Some expressions can only be used as Rvalues. For example, something like 2+3 is valid on the right hand side of an assignment, but

2 + 3 = x ;

is a syntax error.

The question of whether an expression is treated as an Lvalue or an Rvalue depends on the operator that uses that expression. For example, A+B uses the Rvalues of A and B; A=B uses the Rvalue of B and the Lvalue of A. The descriptions in the sections that follow explicitly tell when an operand is taken as an Lvalue. In all other cases, operands are taken as Rvalues.

4.3 Primary Expressions

Primary expressions have the highest precedence of evaluation. They are evaluated from left to right. Primary expressions have one of the following forms.

identifier
constant
string
( expression )
primary-exp [ expression ]
primary-exp ( expression-list )
primary-Lvalue . identifier
primary-exp -> identifier

4.3.1 Identifier Primary Expressions

Section 3.1 described what constitutes a valid identifier. An identifier is a primary expression, provided that it has been suitably declared before it is used. The type of an identifier is specified in its declaration.

An identifier with the type "array of type" may not be used as an Lvalue. Expressions with this type include array identifiers when they are not followed by subscripts, or multi-dimensional array references with fewer subscripts than dimensions. Such values are automatically treated as pointers to the first element of the appropriate array or sub-array (though they still may not be used as Lvalues). For example, consider

int x[30],y[10][20];
int *ip;
ip = x;      /* equivalent to ip = &x[0]; */
ip = y[5];   /* equivalent to ip = &y[5][0]; */

As the comments explain, the array name x is treated as a pointer to the first element in the x array; the reference y[5] is treated as a pointer to the first element in the y[5] sub-array.

Similarly, when an identifier is declared as the name of a function but the identifier appears without the parentheses that are used in a function call, the identifier is treated as a pointer to a function of the appropriate type.
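
Both conversions can be seen in the following sketch. The names twice, call_through_pointer, and row_points_at_first_element are hypothetical.

```c
static int twice(int n)
{
    return 2 * n;
}

/* Hypothetical helper: calls a function through a pointer
   obtained from the bare function name. */
int call_through_pointer(int n)
{
    int (*fp)(int);

    fp = twice;         /* no parentheses: the name yields a pointer to the function */
    return (*fp)(n);    /* *fp designates the function, which is then called */
}

/* Hypothetical helper: returns 1 if a partly subscripted array
   reference yields a pointer to the first element of the sub-array. */
int row_points_at_first_element(void)
{
    int y[10][20];
    int *ip;

    ip = y[5];
    return ip == &y[5][0];
}
```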

4.3.2 Constant and String Primary Expressions

Chapter 2 described the form of valid constants and string literals. Integer and character constants have the type int or unsigned. Floating point constants with the suffix 'f' or 'F' are float; otherwise, they are double.

The appearance of a string literal in an expression creates an array of char containing the given characters plus a trailing '\0'; however, the value that is actually used in evaluating the expression is a pointer to the first character of this array. Thus when a string appears in an expression, it is treated as the type "pointer to char".

4.3.3 Parenthesized Expressions

Expressions in parentheses are regarded as primary expressions because the parentheses indicate high precedence of evaluation. The type and value of the parenthesized expression are the type and value of the expression inside the parentheses.

4.3.4 Subscripted Primary Expressions

A primary expression followed by an expression in square brackets denotes a subscripting operation. The primary expression before the brackets is generally a pointer to some data object; remember that an array identifier is actually a pointer. The expression inside the brackets is generally of type int. The resulting subscripted primary expression has the type of the data object to which the pointer points.

The intuitive meaning of the expression E1[E2] is subscripting as described above. However, the precise meaning of E1[E2] is

*( E1 + (E2) )

Consequently, E1 and E2 can be any expressions that make sense in the above operation, whether or not E1 is a pointer. For example, if a is the name of an array, a[1] and 1[a] are both valid and both refer to the same data object.
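
A brief sketch confirms the equivalence; the function name subscripts_commute is hypothetical. The reversed form is shown only to illustrate the definition and is not recommended style.

```c
/* Hypothetical helper: returns 1 if a[1] and 1[a] refer to the same object. */
int subscripts_commute(void)
{
    int a[3] = { 10, 20, 30 };

    /* Both forms mean *(a + 1), so the values and the addresses agree. */
    return a[1] == 1[a] && &a[1] == &1[a];
}
```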

4.3.5 Function Primary Expressions

The primary expression

primary-exp ( expression-list )

is a function call, as in sin(x). The primary expression before the parentheses must have the type of "function returning X" where X is a valid data type (e.g., "function returning float"). The result of the function call has type X. If the compiler encounters an undeclared identifier immediately followed by a left parenthesis, it assumes that the identifier refers to a function of type int. Thus int functions need not be declared explicitly, but functions of all other types must be. (NOTE: for good programming style, int functions should still be declared.)

The expression-list in the parentheses consists of a number of valid expressions separated by commas, as in

f(x, y, 2 + z)

These expressions are passed by value to the function which is being invoked (passing by value is explained in Section 7.2).

The expression-list in a function call may be null; it need not contain any elements. The parentheses must still be included as in

no_args()

Chapter 7 gives more information on how function calls work.

4.3.6 The . Operator

The . operator is used to access a specific element of a structure or a particular interpretation of a union. It creates a primary expression of the form

primary-Lvalue . identifier

The item before the . must be a primary expression that yields the Lvalue of a structure or a union. For example, if c is an element in a structure b which itself resides in a structure a, you can refer to the element as a.b.c.

The result of the . operator is an Lvalue referring to the named structure element or union interpretation.

4.3.7 The -> Operator

The -> operator is another way to refer to an element in a structure or an interpretation of a union. The operator is made from a minus sign "-" and a greater-than sign ">".

In a primary expression of the form

primary-exp -> identifier

the item before the -> must be a pointer to a structure or a union and the identifier after the -> must be the name of an element of the structure or an interpretation of the union. The resulting primary expression is an Lvalue referring to the named structure element or union interpretation. Thus the primary expression

E1->IDEN

is equivalent to

(*E1).IDEN
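
The following sketch uses both notations on the record structure from Section 3.4.4. The function name set_age and its parameters are hypothetical.

```c
struct record {
    char name[30];
    int age;
};

/* Hypothetical helper: sets the age through a pointer, then reads it
   back with the equivalent notation. */
int set_age(struct record *ph, int years)
{
    ph->age = years;     /* arrow form */
    return (*ph).age;    /* equivalent form using * and . */
}
```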

4.4 Unary Operators

Unary operators are evaluated after primary expressions have been computed. They are evaluated from right to left. Recognized unary operators are:

*Rvalue
&Lvalue
+Rvalue
-Rvalue
!Rvalue
~Rvalue
++Lvalue
--Lvalue
Lvalue++
Lvalue--
( type-name ) Rvalue
__alignof Rvalue
__alignof ( type-name )
sizeof Rvalue
sizeof ( type-name )

4.4.1 The Indirection Operator (*)

The argument following the unary * must be a pointer. If it is a pointer to a function, the result is a function designator which can be used in a function call. If it is a pointer to an object, the result is the Lvalue of the object to which the pointer points. If the type of the pointer is "pointer to X", the type of *pointer is X.

4.4.2 The Address Operator (&)

When the unary & is applied to an Lvalue, the result is a pointer to the memory location associated with that Lvalue. For example, &y is a pointer to the memory for the variable y. If the Lvalue's type is X, applying & gives the type "pointer to X". For any Lvalue expression E, *&E is equal to E.

4.4.3 The Unary Plus Operator (+)

The result of +A is just A. The operand must have an arithmetic type. If the operand is short, char, long char, or a bit field, the result is int. Otherwise, the result has the same type as the operand.

4.4.4 The Unary Minus Operator (-)

The unary minus operator returns the negative of a numeric operand. If it is applied to an item of type char, the character is first converted to int and then negated. If it is applied to an item of type unsigned, the result is unsigned, computed by subtracting the operand from 2**36.

4.4.5 The Logical Negation Operator (!)

The logical negation operator ! may be applied to any scalar type (arithmetic or pointer). The result is always int.

If the operand X is zero (or a null pointer), !X is the integer 1. Otherwise, !X is the integer 0.
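
As a brief sketch, with hypothetical function names, the operator can test for a null pointer, and applying it twice reduces any scalar value to 0 or 1.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if p is a null pointer, 0 otherwise. */
int is_null(const char *p)
{
    return !p;
}

/* Hypothetical helper: maps any nonzero value to 1 and zero to 0. */
int as_flag(int n)
{
    return !!n;
}
```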

4.4.6 The Bitwise Complement Operator (~)

The ~ operator yields the one's complement of its operand. In other words, the result has a 1-bit wherever the operand has a 0-bit, and a 0-bit wherever the operand has a 1-bit.

The standard arithmetic conversions are performed. If the type of the operand is int or (signed or unsigned) char, the type of the result is int. If the type of the operand is long, the type of the result is long. If the type of the operand is unsigned, the result is unsigned. Other types of operands are invalid.

4.4.7 The Auto-Increment Operator (++)

The expression

++Lvalue

is evaluated by incrementing the value of the Lvalue. Numeric arguments are incremented by the integer 1; pointer arguments are incremented by the size of the object to which the pointer points. Thus if a pointer into an array of structures is incremented with ++, it points at the next structure in the array. The result of the ++ expression is the incremented value of Lvalue. For example,

int i,j;
i = 1;
j = ++i;

is evaluated in the following way.

  1. The first statement assigns i the value 1.
  2. To begin evaluation of the second statement, i is incremented by 1. i now has the value 2.
  3. The new value of i is assigned to j.

When the ++ follows the Lvalue as in

Lvalue++

the result has the current value of the Lvalue. After this result has been obtained, the current value of the Lvalue is incremented as before. For example,

double *p1, *p2, v[10];
p1 = v;
p2 = p1++;

is evaluated in the following way.

  1. The first statement points p1 towards v[0].
  2. The second statement obtains the current value of p1 and saves this temporarily.
  3. Once the current value of p1 has been saved, p1 is incremented by an appropriate amount so that it points to the next element in the array, namely v[1].
  4. The previously saved value of p1 is assigned to p2. The result is that p1 points at v[1] and p2 points at v[0].
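
The two walkthroughs above can be restated as small functions. This is only a sketch; the function names are hypothetical, and each function returns the value that the corresponding example would assign to the left side.

```c
/* Prefix form: *ip is incremented first, and the new value is the result. */
int prefix_result(int *ip)
{
    return ++*ip;
}

/* Postfix form: the old value is the result; *ip is incremented afterwards. */
int postfix_result(int *ip)
{
    return (*ip)++;
}

/* Pointer version of the second example: returns the old pointer and
   leaves *pp pointing at the next double in the array. */
double *postfix_pointer(double **pp)
{
    return (*pp)++;
}
```

If i is 1, prefix_result(&i) returns 2 and postfix_result(&i) would have returned 1; in both cases i ends up as 2.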

4.4.8 The Auto-Decrement Operator (--)

The auto-decrement operator -- works in a similar way to the auto-increment operator ++.

--Lvalue

decrements Lvalue by 1 if it is numeric. If Lvalue is a pointer, it is decremented by the size of the object pointed to. The decremented value is the result of the expression.

Lvalue--

takes the current value of the Lvalue as the result of the expression and then decrements the Lvalue as before.
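
Both forms appear in the following sketch, which copies an array in reverse order. The function name reverse_copy is hypothetical.

```c
/* Hypothetical helper: copies n ints from src to dst in reverse order. */
void reverse_copy(int *dst, const int *src, int n)
{
    const int *p = src + n;   /* one past the last element of src */

    while (n-- > 0)           /* test the old value of n, then decrement */
        *dst++ = *--p;        /* step p back first, then read through it */
}
```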

4.4.9 The Cast Operation

The cast operation converts the result of an expression to a given data type. It has the general form

( type-name ) Rvalue

type-name may be one of the recognized fundamental types. For example,

(int) (3.14 + a)

converts the double result of the addition into int. type-name may also be more complex, as in

(struct X *) p

which converts p into a pointer to the structure type X. For further details, see Section 6.7.

The cast operator can be applied to a value of any arithmetic type and can convert it to any other arithmetic type. It can also convert a pointer type to any other pointer type. However, the result of casting a pointer of type A into a pointer of type B may not be a valid pointer value of type B if the alignment requirements for A are less strict than those for B. For example, in

char *cp;
double *dp;
dp = (double *) cp;

the value assigned to dp may not be a valid double pointer. char values only have byte alignment, while double values need double-word alignment. Converting in the opposite direction (i.e. from a stricter alignment type to a less strict one) is guaranteed to work. For example, if you cast a double pointer to a char pointer and back, you get the original double pointer.

A pointer to a function of one type may be converted to pointer to a function of another type and back again without loss of information. However, if a converted pointer is used to call a function with anything other than the original type, the result is undefined.

Pointers may be cast into long integers and vice versa. The result of casting a long into a pointer may not be a valid pointer; for example, it might lie outside the program's address space. If a pointer is converted to long and back again, there is no loss of information.

The cast operation is only necessary when the standard arithmetic conversions do not yield a result of the proper data type. Proper use of function prototypes means that cast operations are almost never needed in function calls.
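
The following sketch shows three common uses of a cast. The function names are hypothetical and are chosen only for illustration.

```c
/* Without the cast, sum / n would be an integer division and the
   fractional part would be lost. */
double average(int sum, int n)
{
    return (double) sum / n;
}

/* Converting a double to int discards the fractional part. */
int whole_part(double x)
{
    return (int) x;
}

/* Casting to a less strictly aligned pointer type and back again
   yields the original pointer. */
double *round_trip(double *dp)
{
    char *cp = (char *) dp;
    return (double *) cp;
}
```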

4.4.10 The __alignof Operator

The __alignof operator obtains the alignment of a data type or object. (Note that the operator name begins with two underscore characters.)

The argument of __alignof is either an expression or a parenthesized data type, as in

__alignof X
__alignof (double)

The result of __alignof is an int value that gives the multiple-byte boundary on which the data type or expression result is aligned. For example,

__alignof (char)

has the value 1, indicating that char values can be aligned on any byte boundary.

__alignof (double)

has the value 8, indicating that double values must be aligned on an 8-byte boundary.

__alignof is an extension to the ANSI standard, and therefore its use is non-portable.

4.4.11 The sizeof Operator

The sizeof operator returns the size of a data object, in bytes. In the case of complicated data objects such as structures, sizeof does not give the exact size of the object but rounds up to the next alignment boundary for objects of that type. For example,

sizeof (struct x {int y; char z;} )

returns a value of 8, even though a record of type x only occupies 5 bytes of storage.

The first form of the sizeof operator is

sizeof expression

If expression is the name of an array, the result is the number of bytes in the entire array. If expression is a structure, the result is the number of bytes in the structure. If expression is a union, the result is the size of the longest interpretation of the union.

The sizeof operator may also have the form

sizeof ( type-name )

For example,

sizeof ( double )

returns the number of bytes in an arbitrary object of type double. For further information on type names, see Section 6.7.

The result of the sizeof operator has a type named size_t. This type is defined with a typedef statement in the standard header <stddef.h>. (The typedef statement is described in Section 6.3.5 and headers are explained in Section 9.1.1.) size_t is an unsigned integral type (which means that any expression using sizeof is evaluated with unsigned arithmetic).

The sizeof operator is useful for eliminating machine-dependent source code in programs. Note that sizeof is evaluated at compile time. During compilation, the compiler replaces the sizeof expression with the equivalent integer constant; hence, a sizeof expression is valid anywhere an integer constant is valid in C source code.

The operand of sizeof is not evaluated. Therefore

sizeof (i++)

calculates the size of the result of the expression, but does not actually increment i. Similarly,

sizeof func(arg)

determines the size of the value returned by func but does not actually call the function.
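
A short sketch of two of the points above: computing the number of elements in an array without hard-coding it, and the fact that the operand is not evaluated. The names table, table_length, and unchanged_after_sizeof are hypothetical.

```c
#include <stddef.h>

/* Hypothetical table used only for illustration. */
static double table[10];

/* Number of elements in table, computed at compile time. */
size_t table_length(void)
{
    return sizeof table / sizeof table[0];
}

/* The operand of sizeof is not evaluated, so i keeps its value. */
int unchanged_after_sizeof(int i)
{
    size_t n = sizeof (i++);

    (void) n;                  /* n is the size of an int; not used further */
    return i;
}
```

Because the element count is derived from the declaration, changing the size of table does not require any other change to the source.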

4.5 Multiplicative Operators

The multiplicative operators follow the unary operators in order of precedence. They are evaluated from left to right. The multiplicative operators are

Rvalue * Rvalue
Rvalue / Rvalue
Rvalue % Rvalue

4.5.1 The Multiplication Operator (*)

The binary * is used for normal multiplication. Each operand must have an arithmetic type, and the standard arithmetic conversions are performed.

Multiplication is associative, so expressions may be regrouped by the compiler. For example, in

A*B*C

the compiler may evaluate B*C first, even though operators in this precedence class are usually evaluated from left to right. Furthermore, if A, B, and C are expressions themselves, the compiler may evaluate these sub-expressions in any order.

4.5.2 The Division Operator (/)

The binary / is used for normal division. Each operand must have an arithmetic type, and the standard arithmetic conversions are performed.

When one positive integer is divided into another, the result is truncated towards zero. For example, 7/2 has a result of 3. For other types of integer division, the remainder has the same sign as the dividend. For example:

-7/2:   result  -3     remainder  -1
7/-2:   result  -3     remainder  +1
-7/-2:  result  +3     remainder  -1

No special notes need to be made about floating point division.

4.5.3 The Remainder Operator (%)

The binary % is used to return the remainder in an integer division. For example, A%B gives the remainder obtained when A is divided by B. Each operand must have an integral type, and the standard arithmetic conversions are performed.

In mathematical terms, A%B gives A modulo B when both are positive. To calculate A%B when A and/or B is negative, % is defined so that

(A/B)*B + A%B

is always equal to A (provided that B is non-zero).
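
The identity can be checked with a small sketch. The function name identity_holds is hypothetical. Note that the 1989 ANSI standard leaves the direction of truncation implementation-defined when an operand is negative; the sketch assumes truncation toward zero, as described in Section 4.5.2.

```c
/* Hypothetical helper: returns 1 if (a/b)*b + a%b equals a.
   b must be non-zero. */
int identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}
```

For instance, -7/2 is -3 and -7%2 is -1, and (-3)*2 + (-1) gives -7 again.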

4.6 Additive Operators

The additive operators follow the multiplicative operators in order of precedence. They are evaluated from left to right. The additive operators are

Rvalue + Rvalue
Rvalue - Rvalue

4.6.1 The Addition Operator (+)

The binary + is used for addition in several senses.

Numeric Addition: The most straightforward sense is conventional numeric addition. Each operand must have an arithmetic type and the standard arithmetic conversions are performed.

Numeric addition is associative, so expressions may be regrouped by the compiler. For example, in

A+B+C

the compiler may evaluate B+C first, even though operators in this precedence class are usually evaluated from left to right. Furthermore, if A, B, and C are expressions themselves, the compiler may evaluate these sub-expressions in any order.

Pointer/Integer Addition: You may also have additions that add an integral type to a pointer type. If p is a pointer to an object in an array, p+1 points to the next object in the array, p+2 the object after that, and so on. The result of adding an integral value to an array pointer is always a pointer to the same type of data object. If the pointer does not point into an array, the result of the addition is undefined (not meaningful in the context of the program).

Users should be cautioned that adding too large a value to a pointer may cause it to point beyond the end of the array into other regions of memory.
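
As a minimal sketch, the following function adds up array elements using pointer/integer addition and stays within the bounds supplied by the caller. The function name sum_elements is hypothetical.

```c
/* Hypothetical helper: adds up the n elements starting at base. */
long sum_elements(const int *base, int n)
{
    long total = 0;
    int k;

    for (k = 0; k < n; ++k)
        total += *(base + k);   /* base + k points at element k; same as base[k] */
    return total;
}
```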

4.6.2 The Subtraction Operator (-)

The binary - is used for subtraction in several senses.

Numeric Subtraction: The most straightforward sense is conventional numeric subtraction. Each operand must have an arithmetic type and the standard arithmetic conversions are performed.

Pointer/Integer Subtraction: Values with an integral type may be subtracted from pointers in the same way that they may be added to pointers. For example, if p points to an object in an array, p-1 points to the array object which immediately precedes *p. If the pointer does not point into an array, the result of the subtraction is undefined.

Pointer/Pointer Subtraction: It is also possible to subtract two pointers, provided they have the same type and point to elements in the same array. The result of the subtraction is the difference between the subscripts of the elements to which the pointers point. For example, if you have

int arr[100],*p,*q;
p = &arr[4];
q = &arr[7];

then q-p is 3, since q points to arr[7] and p points to arr[4]. p-q would be -3.

The result of subtracting two pointers has a type named ptrdiff_t, defined with a typedef statement in the header <stddef.h>. (The typedef statement is described in Section 6.3.5 and headers are explained in Section 9.1.1.) ptrdiff_t is a signed integral type.
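
Pointer/pointer subtraction is a convenient way to turn a pointer back into a subscript. The following is a sketch only; the function name index_of is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: returns the subscript of the first element equal
   to key, or -1 if key does not occur in the n elements of arr. */
ptrdiff_t index_of(const int *arr, int n, int key)
{
    const int *p;

    for (p = arr; p < arr + n; ++p)
        if (*p == key)
            return p - arr;     /* difference of subscripts, not of addresses */
    return -1;
}
```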

4.7 Shift Operators

The shift operators follow the additive operators in order of precedence. They are evaluated from left to right. The shift operators have the form

Rvalue << Rvalue
Rvalue >> Rvalue

4.7.1 The Left Shift Operator (<<)

The binary << shifts the bits of an integer value to the left. In the expression A<<B, the bits of A are shifted left by the amount B. Vacated bits are filled with zeros. For example, 007<<3 is 070.

Each operand must have an integral type. The right operand is converted to int. The type of the result is the type of the left operand. If the right operand is negative, or greater than or equal to the number of bits in the left operand, the result is undefined.

4.7.2 The Right Shift Operator (>>)

The binary >> is used to shift the bits of an integer to the right. In the expression A>>B, the bits of A are shifted right by the amount B. For example, 070>>3 is 007. If the left operand is signed, vacated bits are filled with the sign bit (i.e., an arithmetic shift). If the left operand is unsigned, vacated bits are filled with zeros (i.e., a logical shift).

Note: The ANSI standard does not specify whether right shifts should be logical or arithmetic. Therefore different implementations may use different algorithms for this kind of shifting. As noted above, GCOS8 C uses arithmetic shifting for signed values and logical shifting for unsigned.

Each operand of >> must have an integral type. The right operand is converted to int. The type of the result is the type of the left operand.

According to the ANSI standard, the result of >> is undefined if the right operand is negative, or greater than or equal to the number of bits in the left operand. However, as an extension to the standard, this implementation lets you shift by any non-negative value up to and including 127.
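
The following sketch uses both shift operators on unsigned values, which avoids any dependence on how sign bits are handled. The function names are hypothetical, and the shift counts are kept small so that the code does not rely on the word size of the machine.

```c
/* Hypothetical helper: returns the 3-bit field of word that starts at
   bit position pos (bit 0 is the least significant bit).  pos must be
   non-negative and less than the number of bits in an unsigned. */
unsigned octal_digit(unsigned word, int pos)
{
    return (word >> pos) & 07u;
}

/* Hypothetical helper: multiplies x by 8 using a left shift. */
unsigned times_eight(unsigned x)
{
    return x << 3;
}
```

For the octal constant 0752, octal_digit returns 2, 5, and 7 for positions 0, 3, and 6 respectively.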

4.8 Relational Operators

The relational operators follow the shift operators in order of precedence. They are evaluated from left to right, but this is seldom useful. The relational operators have the form

Rvalue < Rvalue
Rvalue > Rvalue
Rvalue <= Rvalue
Rvalue >= Rvalue

< stands for "less than". > stands for "greater than". <= stands for "less than or equal to". >= stands for "greater than or equal to".

The result of each relational operation is always int: 1 if the relation is true and 0 if the relation is false. For example, A<B equals 1 if A is less than B, and 0 otherwise.

Each operand of a relational operation may have an arithmetic type, and the standard arithmetic conversions are performed before the comparisons are made. This can lead to some unusual results. For example, consider

int i = -1;
unsigned u = 1;
    ...
if (i > u) ...

u is unsigned, so by the standard arithmetic conversions, i becomes unsigned as well. Since a negative int becomes a very large unsigned value, i compares greater than u, even though i is negative and u is positive. Avoid comparing signed and unsigned quantities if possible. If not, use explicit casts, as in

if (i > (int) u) ...
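
The pitfall can be reproduced with the sketch below. The function names are hypothetical. The second function tests the sign of i first, which is one possible alternative to casting u when u might be too large to be represented as an int.

```c
/* Mixed comparison: i is converted to unsigned before the test. */
int mixed_greater(int i, unsigned u)
{
    return i > u;
}

/* Hypothetical safer comparison: a negative i is never greater than u,
   and a non-negative i can be converted to unsigned without changing
   its value. */
int safe_greater(int i, unsigned u)
{
    return i >= 0 && (unsigned) i > u;
}
```

With i equal to -1 and u equal to 1, mixed_greater returns 1 while safe_greater returns 0.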

Relational comparisons can also be made between pointers. The result of pointer comparison is meaningful when the two pointers point into the same array. One pointer is greater than another when it points to a higher address. If P is a pointer to the last element of an array, the expression (P+1) can be compared to P, even though (P+1) does not point at an element of the array and must not be used to access memory.

Note that expressions like a<b<c are seldom useful. When this expression is evaluated, the first relation a<b is tested and a 1 or 0 returned. Comparing this 1 or 0 to c is seldom a useful operation.
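
To make the difference concrete, the sketch below compares the chained form with the test that is usually intended. Both function names are hypothetical.

```c
/* What a<b<c actually computes: the 0 or 1 from a<b is compared with c. */
int chained(int a, int b, int c)
{
    return (a < b) < c;
}

/* What is usually intended: b lies strictly between a and c. */
int between(int a, int b, int c)
{
    return a < b && b < c;
}
```

For the values 3, 2, 1 the chained form yields 1, because 3<2 is 0 and 0 is less than 1, whereas between yields 0.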

4.9 Equality Operators

The equality operators follow the relational operators in order of precedence. They are evaluated from left to right. The equality operators have the form

Rvalue == Rvalue
Rvalue != Rvalue

== stands for "equal to". != stands for "not equal to". The following types of operands may be compared.

Like the relational operators, the equality operators return an int 1 if the specified relation is true and 0 if the relation is false.

Important Note: One of the most common programming errors in C is using the assignment = when you actually mean the equality operator ==. You should watch for this problem when you are writing and debugging programs.

4.10 The Bitwise AND Operator (&)

The binary & operator is used to "AND" together bits in operands with integral type. It takes the form

Rvalue & Rvalue

The standard arithmetic conversions are performed on the operands. The result has a 1-bit where both operands have a 1-bit, and a 0-bit in all other bit positions.

Because the bitwise & is associative, the compiler may regroup expressions.

Programmers often run into trouble with the order of evaluation for == and &. For example, you might write

if ( A == B & C ) ...

thinking it will be evaluated as

A == (B & C)

In fact, it is evaluated as

(A == B) & C

which is not usually what you want.
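
The two groupings can give different answers, as the following sketch shows. The function names are hypothetical; explicit parentheses are used in both so that the grouping is visible.

```c
/* How A == B & C is grouped by the compiler. */
int as_parsed(int a, int b, int c)
{
    return (a == b) & c;
}

/* What the programmer in the example above meant. */
int as_intended(int a, int b, int c)
{
    return a == (b & c);
}
```

With A, B, and C equal to 4, 6, and 12, the intended test is true (6 & 12 is 4), but the expression as parsed is 0.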

4.11 The Bitwise Exclusive OR Operator (^)

The binary ^ (caret) operator obtains the exclusive "OR" result of the bits in operands with integral type. It takes the form

Rvalue ^ Rvalue

The standard arithmetic conversions are performed on the operands. The result has a 0-bit in every position where the operands both have 0-bits or both have 1-bits; the result has a 1-bit in positions where one operand has a 1-bit and the other has a 0-bit.

Because the bitwise ^ is associative, the compiler may regroup expressions.

4.12 The Bitwise Inclusive OR Operator (|)

The binary | operator obtains the inclusive "OR" result of the bits in operands with integral type. It takes the form

Rvalue | Rvalue

The standard arithmetic conversions are performed on the operands. The result has a 0-bit in positions where both operands have 0-bits, and a 1-bit in positions where either operand or both have a 1-bit.

Because the bitwise | is associative, the compiler may regroup expressions.
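
The bitwise operators are often used together to maintain a set of flags in a single word. The following is a sketch under assumed names: the FLAG_ constants and the four functions are hypothetical.

```c
/* Hypothetical flag bits, written in octal. */
#define FLAG_READ   01u
#define FLAG_WRITE  02u
#define FLAG_EXEC   04u

unsigned set_flag(unsigned flags, unsigned f)    { return flags | f; }
unsigned clear_flag(unsigned flags, unsigned f)  { return flags & ~f; }
unsigned toggle_flag(unsigned flags, unsigned f) { return flags ^ f; }
int      has_flag(unsigned flags, unsigned f)    { return (flags & f) != 0; }
```

Here | turns bits on, & combined with ~ turns bits off, ^ inverts bits, and & tests them.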

4.13 The Logical AND Operator (&&)

The binary && operation is evaluated from left to right. The operation takes the form

Rvalue && Rvalue

Each operand must have a scalar type. The result of the operation is an int 1 if both operands are non-zero and int 0 otherwise. Unlike the bitwise &, && always evaluates its operands from left to right; no regrouping occurs.

The second operand is not evaluated if the first operand is found to be zero. For example, in

(A && B++)

B is not incremented if A is zero.

Users should be careful not to confuse the bitwise & with the logical &&.
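
Because the second operand is skipped when the first is zero, && is commonly used to guard an operation that would otherwise be invalid. A minimal sketch, with hypothetical function names:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is a non-null, non-empty string.
   The test on *s is only reached when s is not a null pointer. */
int non_empty(const char *s)
{
    return s != NULL && *s != '\0';
}

/* Mirrors (A && B++): *b is incremented only when a is non-zero. */
int and_with_increment(int a, int *b)
{
    return a && (*b)++;
}
```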

4.14 The Logical OR Operator (||)

The binary || operation is evaluated from left to right. The operation takes the form

Rvalue || Rvalue

Each operand must have a scalar type. The result of the operation is an int 1 if either of the two operands is non-zero and int 0 if both are zero. Unlike the bitwise |, || always evaluates its operands from left to right; no regrouping occurs.

The second operand is not evaluated if the first operand is found to be non-zero.
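
The following sketch counts how often the right operand of || is evaluated. The names calls, counted, and right_operand_evaluations are hypothetical.

```c
/* Hypothetical counter used to observe whether the right operand ran. */
static int calls = 0;

static int counted(int value)
{
    ++calls;
    return value;
}

/* Returns the number of times the right operand of || was evaluated. */
int right_operand_evaluations(int left, int right)
{
    int result;

    calls = 0;
    result = left || counted(right);
    (void) result;             /* the 0 or 1 result is not needed here */
    return calls;
}
```

When left is non-zero the function returns 0, since counted is never called; when left is zero it returns 1.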

4.15 The Conditional Operation

Conditional expressions are evaluated from right to left. They have the form

Rvalue ? Rvalue : Rvalue

The first step in evaluating a conditional expression is evaluating the expression before the ?. If this expression is non-zero, the second expression is evaluated and its result is the result of the whole conditional expression. If the first expression is zero, the third expression is evaluated and its result is the result of the whole conditional expression. For example,

(A>B) ? A : B

checks whether A is greater than B. If this comparison is true, the result of the conditional expression is the value of A; if the comparison is false, the result of the expression is the value of B. In other words, the conditional expression returns the maximum of A and B.

The first operand of a conditional expression should have a scalar type. The second and third operands must satisfy one of the following conditions.

In any conditional expression, one of the operands after the ? is evaluated and the other is ignored. The ignored operand is not evaluated.
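
Both points can be seen in the sketch below, which uses hypothetical function names. The second function relies on the fact that the operand not selected is never evaluated.

```c
/* Hypothetical helper: the maximum of a and b. */
int max_of(int a, int b)
{
    return (a > b) ? a : b;
}

/* Only the selected operand is evaluated: exactly one of *x and *y
   is incremented, depending on pick_first. */
int bump_one(int pick_first, int *x, int *y)
{
    return pick_first ? ++*x : ++*y;
}
```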

4.16 Assignment Operators

Unlike many other languages (e.g., FORTRAN), there is no specific "assignment statement" in C. Instead, there are a number of assignment operators which may be used in normal expressions. Assignments in an expression are executed from right to left.

The recognized assignment operators are listed below.

Lvalue  =  Rvalue
Lvalue +=  Rvalue
Lvalue -=  Rvalue
Lvalue *=  Rvalue
Lvalue /=  Rvalue
Lvalue %=  Rvalue
Lvalue >>= Rvalue
Lvalue <<= Rvalue
Lvalue &=  Rvalue
Lvalue ^=  Rvalue
Lvalue |=  Rvalue

All of these require an Lvalue on their left and an Rvalue on their right. The Lvalue cannot have the const attribute (described in Section 6.4.1). The value of any assignment expression is the value that is assigned to the Lvalue on the left. The type of the result is the type of the Lvalue, but the result is not an Lvalue itself.

4.16.1 Simple Assignment

Simple assignments use the = operator. Each operand may have an arithmetic type, in which case the right operand is converted to the type of the left operand.

If the left operand is a pointer, the right operand may be a pointer of the same type, a pointer of the (void *) type, or an integral constant expression with the value 0 (representing the null pointer). You may also assign any type of pointer value to a (void *) pointer.

In general, you may not assign a pointer value of one non-void type to a pointer of a different non-void type; you must explicitly cast the right operand to the type of the left operand or to (void *). However, you may assign a pointer to a type which does not have the const and/or volatile attribute to a pointer of the same base type with the const and/or volatile attribute. For example,

const int *cip;
int *pi;
  ...
cip = pi;

is valid. However, the converse

pi = cip;

is an error. In other words, you can make an assignment that adds the const and/or volatile attributes, but you cannot make an assignment that drops them. For more details on const and volatile, see Section 6.4.

If the left operand is a union or structure, the right operand must be a union or structure of the same type.

The GCOS8 C compiler also allows the following assignments:

pointer = integer
integer = pointer

In such cases, the assignment simply copies values without performing any conversions. This feature may not be portable to C compilers on other machines; furthermore, pointers with values assigned in such an assignment may cause addressing faults if used. The GCOS8 C compiler issues a warning message if you try such operations.

4.16.2 Compound Assignments

When the assignment operator has the form op= for some suitable binary operator op,

L op= R

is the same as

L = L op R

except that L is only evaluated once. For example,

A += 2

is equivalent to

A = A + 2

To understand what we mean when we say L is only evaluated once, consider

Arr[i++] += 1

and compare it to

Arr[i++] = Arr[i++] + 1

In evaluating the first assignment, i is only incremented once. In evaluating the second assignment, i is incremented twice.

Each operand in a compound assignment may have any arithmetic type, provided that the type is consistent with the operation being performed as part of the assignment. For example, the operands of >>= must both have an integral type.

With += and -=, the left operand may be a pointer. In this case, pointer addition or subtraction takes place appropriately.
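
A short sketch of both points, using hypothetical function names: the left operand of a compound assignment is evaluated once, and the left operand of += may be a pointer.

```c
/* Hypothetical helper: adds 1 to arr[*ip] and advances *ip.
   The subscript expression, including its side effect, is evaluated once. */
void bump_and_advance(int *arr, int *ip)
{
    arr[(*ip)++] += 1;
}

/* With += the left operand may be a pointer: skip n elements. */
const int *skip(const int *p, int n)
{
    p += n;
    return p;
}
```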

4.17 The Comma Operator (,)

The comma operator separates expressions. The operator groups left to right.

In an expression of the form

Rvalue , Rvalue

the left expression is evaluated and then discarded. The right expression is then evaluated; the type and value of the result are the type and value of this right operand. For example, in evaluating

a = (pi = 3.14 , 2 * pi)

the variable pi is assigned the value 3.14 and then 2*pi is evaluated. The result of 2*pi is assigned to a.

In contexts where a comma has another meaning (e.g., in a list of function arguments or in a list of variable initializations), comma operations must be enclosed in parentheses. For example,

sin( (pi=3.14, 0.5*pi) )

has only one argument, namely 0.5*pi. On the other hand,

func( pi=3.14, 0.5*pi)

has two arguments: 3.14 (the result of the assignment) and 0.5*pi. Note that this call to func may not evaluate its arguments in the order they appear within the parentheses. Therefore, there is no guarantee about the value of pi used in evaluating the second argument.
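
A frequent use of the comma operator is in a for statement that advances two variables at once. The following is a sketch with hypothetical function names.

```c
/* Hypothetical helper: reverses the n elements of a in place.
   The commas in the first and third for expressions are comma
   operators; the commas in the declaration of i, j, and t are not. */
void reverse_in_place(int *a, int n)
{
    int i, j, t;

    for (i = 0, j = n - 1; i < j; ++i, --j) {
        t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}

/* The value of a comma expression is the value of its right operand. */
int last_value(void)
{
    int x;

    return (x = 3, 2 * x);
}
```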

4.18 Constant Expressions

Constant expressions may be used in many contexts where a single numeric constant is usually expected: in specifying initialization values for static and extern variables, declaring the size of arrays, and enumerating cases in switch statements (see Section 5.10). A constant expression is made up only of integer constants, character constants, and sizeof expressions, using the unary operators

-   ~

the binary operators

+   -   *   /   %   &   |   ^
<<  >>  ==  !=  <   >   <=  >=

and the conditional operator ?:. Parentheses are allowed for grouping operations, but may not be used for creating function calls.

When constant expressions appear as initialization values for variables, you may use floating point constants and cast operations in addition to the operators listed above. You may also use the & operator to obtain the address of extern or static objects or the address of elements in an extern or static array (provided that the subscript for the element is given by a constant expression as described previously). Essentially, initialization expressions must yield a constant, or the address of a previously declared extern or static object plus or minus a constant.
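
The contexts listed above are illustrated in the following sketch. All of the names (ROWS, COLS, grid, limit, last, and the functions) are hypothetical and are chosen only for this example.

```c
/* Hypothetical sizes used only for illustration. */
#define ROWS 4
#define COLS (2 * ROWS + 1)

static int grid[ROWS * COLS];                  /* array size */
static int limit = (COLS > 8) ? 100 : 10;      /* initialization value */
static int *last = &grid[ROWS * COLS - 1];     /* address of a static element */

int classify(int n)
{
    switch (n) {
    case 1 << 3:                               /* case label with value 8 */
        return 1;
    case sizeof (grid) / sizeof (grid[0]):     /* case label with value 36 */
        return 2;
    default:
        return 0;
    }
}

int get_limit(void)   { return limit; }
int last_offset(void) { return (int) (last - grid); }
```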

5. Statements

This chapter deals with the statements of the C language. Statements control all of the actions performed by a program (except for assignment operations to initialize variables in declarations).

Before giving specifics about the various statement types, a few general remarks are in order.

5.1 Comments

Comments are not really statements at all, but it is convenient to deal with them at this time. The beginning of a comment is signaled by the characters /* and the end of the comment is signaled by */. Everything in between is ignored. The comment may stretch over several lines, as in

/*
 *  This
 *  is
 *  a
 *  comment
 */

Comments may be used anywhere that white space would normally appear, even in the middle of statements. Comments may appear inside or outside functions.

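Comments may not be nested. For example, the following is not valid, because the comment ends at the first */ and the remaining text is then treated as code.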

/* This looks /* nested */ but it's wrong! */

In code, comments are considered white space. Therefore

a/*comment*/b

is equivalent to

a b

5.2 Expression Statements

The simplest type of statement takes the form

expression ;

where the expression is any valid expression made up of operators described in Chapter 4. The most common type of expression is one that performs an assignment, as in

a = b + 1;

Function call expressions are also common.

5.3 Compound Statements or Blocks

A block or compound statement may be used wherever a single statement is expected. Blocks consist of zero or more declarations followed by one or more statements, all enclosed in braces. For example, a typical block might be

{
    int temp;
    temp = a;
    a = b;
    b = temp;
}

The variable temp declared in the block is local to the block (i.e., it is only accessible within the block and is not available to statements outside the block). The variables a and b used within the block must be declared somewhere in the source code which precedes the block.

It is possible to transfer into the middle of a block using a goto statement. However, this is bad programming style and often leads to errors.

5.4 The if Statement

An if statement executes another statement if a particular condition is true.

The first form of the if statement is

if (expression) statement

The given expression must have a scalar type. The expression is evaluated and if it is non-zero, the statement is executed. For example,

if (a) b=8;

executes the statement b=8; provided that a is non-zero. (Notice that the expression of an if statement may have a pointer type. The pointer is considered "zero" if it is a null pointer.)

The second form of the if statement is

if (expression) statement1
else statement2

Again, the expression must have a scalar type. The first step in executing this statement is the evaluation of the given expression. If the result is non-zero, statement1 is executed; otherwise, statement2 is executed.

ifs and elses may be nested. Each else is associated with the most recent else-less if. Below we give two examples; if-else pairing is shown by indentation.

if (...)
    if (...) statement;
    else statement;
else statement;

if (...) statement;
else
    if (...) statement;
    else statement;

Note that an if-else statement is a single statement even though it may contain several semicolons as shown above.
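
The pairing rule matters most when the inner if has an else and the outer if does not. In the sketch below, which uses hypothetical function names, braces are written out to make each pairing explicit.

```c
/* How the compiler pairs the un-braced form
       if (a) if (b) r = 1; else r = 2;
   The else belongs to the most recent else-less if, the inner one. */
int inner_pairing(int a, int b)
{
    int r = 0;

    if (a) {
        if (b) r = 1;
        else r = 2;
    }
    return r;
}

/* Braces force the else to pair with the outer if instead. */
int outer_pairing(int a, int b)
{
    int r = 0;

    if (a) {
        if (b) r = 1;
    }
    else r = 2;
    return r;
}
```

When a is zero, inner_pairing leaves r at 0, while outer_pairing sets it to 2.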

5.5 The while Statement

A while statement executes a statement or block repeatedly while a given condition is true. It has the form

while (expression)
statement

The expression must have a scalar type. If the expression is non-zero, the statement is executed. The expression is evaluated again after the execution of the statement and if it is still non-zero, the statement is executed again. This process is repeated until the expression evaluates to zero on some iteration. For example,

int i, a[10];
i = 0;
while (i < 10) a[i++] = 3;

repeats the statement

a[i++] = 3;

until i is no longer less than 10. In other words, it sets all the elements of a equal to 3.

Note that the expression is tested before executing the statement. If the expression is zero to begin with, the statement is not executed at all.

5.6 The do-while Statement

The do-while statement is also used to execute a statement or block repeatedly. It has the form

do statement
while (expression);

The expression must have a scalar type. The statement is executed and then the expression is evaluated. If the expression is non-zero, the statement is executed again and so on. For example,

int i, a[10];
i = 0;
do a[i++] = 3; while (i < 10);

repeats the statement

a[i++] = 3;

until i is no longer less than 10. Although it may look odd, the semicolon is required after the 3 and before the while, since the statement between do and while must be a complete statement (or statement block).

Note that do-while tests the expression after executing the statement. Thus the statement is executed at least once, regardless of whether the expression is zero or non-zero.
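
The difference between the two loops shows up only when the condition is false at the start. The following sketch counts loop iterations; the function names are hypothetical.

```c
/* Each function returns the number of times the loop body ran. */
int while_iterations(int start, int limit)
{
    int i = start, count = 0;

    while (i < limit) {
        ++i;
        ++count;
    }
    return count;
}

int do_while_iterations(int start, int limit)
{
    int i = start, count = 0;

    do {
        ++i;
        ++count;
    } while (i < limit);
    return count;
}
```

Starting at 0 with a limit of 10, both loops run 10 times. Starting at 10, the while loop runs 0 times and the do-while loop runs once.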

5.7 The for Statement

The for statement has the form

for (exp1 ; exp2 ; exp3) statement

where exp1, exp2, and exp3 are valid expressions. Except for the action of the continue statement (described later), the for statement is equivalent to

exp1;
while (exp2) {
    statement
    exp3;
}

The first expression performs initializations before the while loop. The second expression is the test used in the while. The third expression is executed at the end of each loop, often incrementing a variable that counts iterations. For example,

for (i=0 ; i<10 ; ++i) a[i] = b[i];

initializes i to zero, then executes the given statement while i is less than 10. The ++i serves to increment i by 1 at the end of each iteration. The result of the above for statement is to copy the first 10 elements of vector b into vector a.

Any or all of the three expressions may be omitted in the parentheses. In particular,

for (exp1 ; ; exp3) ...

is equivalent to

exp1;
while (1) {
    ...
    exp3;
}

while (1) loops repeatedly because the expression (1) is never zero.

If either exp1 or exp3 is omitted, it is simply dropped from the while loop expansion.

5.8 The break Statement

The statement

break;

is used to "break out" of the smallest enclosing while, do-while, for, or switch statement. Control passes to the statement immediately following the while, do-while, for, or switch. For example,

for (i=0 ; i<10 ; ++i) {
    if ( b[i] == -1) break;
    else a[i] = b[i];
}

is much like the for loop above which copies 10 elements of b into a. However, if any of the b elements is equal to -1, the break breaks out of the for loop and the remaining b elements are not copied.
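
The same loop can be wrapped in a function that reports how many elements were copied. This is a sketch only; the function name copy_until_sentinel and the parameter n are hypothetical.

```c
/* Hypothetical helper: copies elements of b into a until n elements
   have been copied or an element equal to -1 is found.  Returns the
   number of elements copied. */
int copy_until_sentinel(int *a, const int *b, int n)
{
    int i;

    for (i = 0; i < n; ++i) {
        if (b[i] == -1)
            break;          /* leave the for loop; ++i is not executed */
        a[i] = b[i];
    }
    return i;
}
```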

5.9 The continue Statement

The statement

continue ;

passes control to the "condition-testing" part of the smallest enclosing while, do-while, or for. When continue is found in a while, control returns to the top of the loop to test the given expression and determine whether another iteration is required. For example,

while (++i<10) {
    if (a[i] <= -1.0) continue;
    b[i] = (a[i] + 2.0) / (a[i] + 1.0);
}

calculates b[i] unless a[i] is less than or equal to -1.0, in which case it simply drops out of this iteration of the while loop and goes back to increment i and see if it is still less than 10. (Obviously the above code could have been written with a normal if statement; however, the example shows how continue behaves.)

When continue appears in a do-while loop, it passes control to the condition-testing part at the end of the loop. When it appears in a for loop, it passes control to the "incrementing" expression at the end of the equivalent while statement.
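
A minimal sketch of continue inside a for loop follows. The function name sum_non_negative is hypothetical.

```c
/* Hypothetical helper: adds up the non-negative elements of a. */
int sum_non_negative(const int *a, int n)
{
    int i, total = 0;

    for (i = 0; i < n; ++i) {
        if (a[i] < 0)
            continue;       /* control passes to ++i, then to the test i < n */
        total += a[i];
    }
    return total;
}
```

Because control passes to ++i, the loop still advances past each negative element instead of testing the same element again.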

5.10 The switch Statement

The switch statement evaluates an expression and passes control to one of a series of statements depending on the value of the expression. It has the form

switch (expression) {
    case-list;
}

The given expression must have an integral type, and will undergo the standard arithmetic conversions if necessary (i.e. if it is a char or bit field). The case-list in the braces consists of one or more valid C statements or blocks. Any statement or block within the case-list may be preceded by one or several of the following labels.

case constant-expression:
default:

There may only be one default label inside any switch. The constant expression in a case label must have an integral type and will be converted to the same type as the expression following the keyword switch. It is an error for two case labels in the same switch to have constant expressions with the same value.

A case label of the form

case constant-expression:

is said to be satisfied if the constant-expression is equal to the expression following the switch. For example, if you have

switch (3) ...

it would satisfy such cases as

case 3:
case 1+2:
case 12/4:
/* etc. */

switch statements are executed in the following way.

  1. The expression following the keyword switch is evaluated.
  2. The first case label is examined. If this case is satisfied, execution begins with the first statement following the case label.
  3. If the first case is not satisfied, the second case is examined and so on through all the cases until one is satisfied.
  4. If no case in the block is satisfied, the program looks for a default label. The block of each switch statement may contain at most one default label. If such a label is found, execution begins with the first statement following the default.
  5. If no case in the block is satisfied and there is no default label, execution begins with the statement following the block. In other words, nothing inside the block gets executed.

Several case labels may prefix a single statement in the block. If the code associated with one case is being executed and another case or default label is encountered, execution continues with the statements that follow the new label; it does NOT break out of the block when a new label is found. The usual method for breaking out of a switch statement is the break statement.
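The two behaviors just described can be sketched as follows. Both functions are hypothetical and exist only to show several labels on one statement and the effect of leaving out break.

```c
/* Hypothetical helper: number of days in a month of a non-leap year,
   or 0 if the month number is not in the range 1..12. */
int days_in_month(int month)
{
    int days;

    switch (month) {
      case 4:
      case 6:
      case 9:
      case 11:              /* four labels prefix one statement */
        days = 30;
        break;
      case 2:
        days = 28;
        break;
      case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        days = 31;
        break;
      default:
        days = 0;
    }
    return days;
}

/* Without break, execution continues into the statements that follow
   the next label. */
int count_from(int start)
{
    int steps = 0;

    switch (start) {
      case 1:
        ++steps;            /* falls through */
      case 2:
        ++steps;            /* falls through */
      case 3:
        ++steps;
        break;
      default:
        steps = -1;
    }
    return steps;
}
```

Calling count_from(1) executes all three increments and returns 3, while count_from(3) executes only the last one and returns 1.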

Below we give an example of a typical switch statement. The char variable c contains an ASCII character going into the switch. If the character is '+', it is converted to a blank and the variable sign is set to 1. If the character is '-', it is left as is and sign is set to -1. If the character is anything else, sign is set to 0.

switch (c) {
  case '+':
    c = ' ';
    sign = 1;
    break;
  case '-':
    sign = -1;
    break;
  default:
    sign = 0;
}

5.11 The return Statement

The return statement lets a function return to its caller before reaching the last statement in the function. It also provides the means to return a value to the caller.

The simplest form of the statement is

return ;

When this statement is executed in a function, it returns to the function's caller. If the caller expected the function to return a result, the result is undefined.

The second form of the statement is

return expression ;

This returns the value of the expression to the function's caller. For example,

return (A>B) ? A : B ;

returns the maximum of A and B to the caller. If necessary, this return value is converted to the type declared for the function in which it appears. A return statement with an expression may not appear in a function returning the void type.
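The following sketch shows both forms of return together with the conversion of the return value. The function names are assumptions made for this example.

```c
/* Hypothetical helpers illustrating both forms of return. */

/* Returns the larger of a and b. */
int larger(int a, int b)
{
    return (a > b) ? a : b;
}

/* The double expression is converted to int, the declared return type,
   so the fractional part is discarded. */
int half_truncated(double x)
{
    return x / 2.0;
}

/* A void function may use the plain form to return early. */
void clear_if_negative(int *p)
{
    if (*p >= 0)
        return;
    *p = 0;
}
```

For instance, half_truncated(5.0) computes 2.5 but returns 2 because the function is declared to return int.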

5.12 Labels

Any statement may be preceded by one or more labels of the form

identifier:

Such a label may be used as the target of a goto. Labels are local to the function in which they appear. A function may not contain two labels with the same name.

5.13 The goto Statement

The statement

goto identifier;

transfers control unconditionally to a statement labeled by the given identifier. The label must be in the same function as the goto but may be in a different block. For example, you can jump into the body of a while loop, although this is poor style.
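A more common use of goto is to jump out of several nested loops at once, since break only leaves the smallest enclosing loop. The sketch below assumes a table with three columns; table_contains and the label done are names invented for the example.

```c
/* Hypothetical helper: report whether target occurs in a 3-column table.
   A goto leaves both loops at once; break would leave only the inner one. */
int table_contains(int table[][3], int rows, int target)
{
    int r, c;
    int found = 0;

    for (r = 0; r < rows; ++r) {
        for (c = 0; c < 3; ++c) {
            if (table[r][c] == target) {
                found = 1;
                goto done;  /* label is in the same function, outer block */
            }
        }
    }
done:
    return found;
}
```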

5.14 The Null Statement

The null statement is simply

;

It performs no action. It is useful for labels that require a statement (as in

label: ;

in which the null statement has the label label) and in looping constructs that do not need any body. For example,

for (i=0 ; i<10 ; a[i++] = 0 ) ;

assigns zero to the first 10 elements of a.
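Another loop that needs no body is one that only searches for a position. The following is a minimal sketch; string_length is a hypothetical name, not a library routine.

```c
/* Hypothetical helper: length of a string, not counting the '\0'. */
int string_length(const char *s)
{
    int n;

    for (n = 0; s[n] != '\0'; ++n)
        ;                   /* null statement: the loop has no body */
    return n;
}
```

Putting the null statement on a line of its own makes it clear that the empty body is intentional.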

6. Declarations

A declaration tells how one or more identifiers may be used. Identifiers may refer to types, functions, or data objects.

Declarations may appear at the beginning of any block, preceding the statements that describe the action of the block. Declarations may also appear outside blocks, either associated with a particular function or external to all functions.

A type declaration describes a particular type. For example, one kind of type declaration describes the elements of a structure or interpretations of a union. Another kind (known as a typedef declaration) gives a name to a particular type, after which the name can be used instead of the associated type. This is described later in this chapter.

A function declaration indicates that a particular identifier refers to a function. The declaration may also indicate the type of value returned by the function, specify a "scope" for the function, and describe the "formal parameters" of the function.

A variable declaration indicates that a given identifier refers to a particular data object. In this case, the identifier is commonly called a variable. The declaration may also specify a type for the data object, a "storage class", and an initialization value.

Declarations provide information for the translation of the C program, but they may also result in operations performed at execution time. For example, some declarations must allocate memory for variables and assign initialization values to those variables.

6.1 Use of Identifiers

Before we describe the syntax of C declarations, we must first discuss some important concepts related to the use of identifiers.

6.1.1 Scope of Identifiers

The scope of an identifier is the region of source code in which the identifier is recognized. There are four kinds of scope: function scope, file scope, block scope, and prototype scope.

The only kind of identifier that has function scope is a statement label. The label identifier is implicitly declared when it is used to label a statement. A label may only be used by goto statements that appear in the same function as the label. A goto statement referring to a particular label may appear before the statement that bears that label. All the label identifiers in a function must be unique; for example, you cannot have two different statements labelled A:.

If the declaration for an identifier appears outside all blocks, the identifier has file scope. The identifier is recognized in all source code from the end of the declarator that declares the symbol to the end of the source file where the declaration appears, as well as in any source files that include the source file or are included by the source file through preprocessor directives.

If the declaration for an identifier appears inside a block or in the list of parameter declarations that begins a function, the identifier has block scope. The identifier is recognized in all source code from the end of the declarator that declares the symbol to the closing brace that ends the block containing the declaration. If there is a declaration for the same identifier in some outer block or outside all blocks, the declaration in the inner block hides the outer declaration until the closing brace that ends the inner block. For example, consider

int i;
void func(void)
{
    float i;
    ...
}

Outside the function func, i refers to an integer, but inside func, i refers to an unrelated floating point number.
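Hiding also applies to nested blocks inside a single function. The sketch below uses the invented names level, read_outer, and read_inner to show which declaration is visible at each point.

```c
int level = 1;              /* file scope */

int read_outer(void)
{
    return level;           /* refers to the file scope level */
}

int read_inner(void)
{
    int level = 2;          /* block scope: hides the file scope level */

    {
        int level = 3;      /* hides the level declared in the outer block */
        (void) level;
    }
    return level;           /* the inner block has ended, so this is 2 */
}
```

The file scope level is never changed by read_inner, because every use of the name inside that function refers to one of the block scope objects.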

If the declaration for an identifier appears in the list of parameters of a function prototype that does not start a function definition, the identifier has prototype scope. This extends from the end of the declaration of the parameter to the end of the function declarator. Function prototypes are discussed in Section 7.1.

6.1.2 Linkage of Identifiers

When the same identifier appears in different scopes of a program, C must decide whether or not the different occurrences of the identifier refer to the same object or function. This means that C must decide how different occurrences of the same identifier are linked.

There are three different ways an identifier may be linked: external linkage, internal linkage, and no linkage.

The linkage of a particular identifier is determined by the form and position of the declaration that declares the identifier.

6.1.3 Name Space of Identifiers

C lets you use the same identifier for different purposes inside the same scope, provided that it is possible to distinguish between the different uses of the identifier. For example, C lets you create a function that uses A both as a variable name and as a statement label. The two meanings of A can be distinguished easily: when the name appears in a goto statement, it is being used as a statement label; when the name appears in an expression, it is being used as a variable.
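The case described above can be sketched as follows. The function count_down is hypothetical; it uses A both as a variable and as a statement label.

```c
/* Hypothetical helper: count how many times start can be decremented
   before it reaches zero. */
int count_down(int start)
{
    int A = start;          /* A as a variable */
    int passes = 0;

A:                          /* A as a statement label */
    if (A > 0) {            /* in an expression, A means the variable */
        --A;
        ++passes;
        goto A;             /* after goto, A means the label */
    }
    return passes;
}
```

This is legal but hard to read; the example is meant only to show that the two uses do not conflict.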

The ways in which identifiers may be used are divided into separate categories. These categories are called name spaces. C has several kinds of name spaces.

An identifier may be declared (implicitly or explicitly) with more than one meaning in a particular scope, provided that the meanings belong to different name spaces.

6.1.4 Storage Duration of Objects

If an object is declared inside a block and has no linkage, it has automatic storage duration. Each time execution enters the block that contains the declaration, a new instance of the object is created in memory. This happens whether the block is entered in the normal flow of execution or through a goto into the middle of the block. Automatic objects are discarded when execution leaves the block in any way (e.g. by normal termination, by using goto to jump out of the block, or by executing a break or return statement).

The next time execution enters the same block, a new instance of the object is created. This new object is not necessarily stored in the same memory location as the previous object, and it is not likely to have the same value as it held when the block previously finished execution. If the block is called recursively, there are several instances of the object in memory simultaneously.

Any object that does not have automatic storage duration has static storage duration. Such objects exist throughout the execution of the entire program and retain any values they are given until the program explicitly assigns new values.

6.2 The Format of Declarations

A declaration consists of zero or more specifiers followed by a declarator list, followed by a semicolon.

As an example, consider the declaration

extern int i[12], j=10;

extern is a storage class specifier, int is a type specifier, and i[12] and j=10 are declarators.

6.3 Storage Class Specifiers

Storage class specifiers control storage duration, scope, and linkage. A declaration may have at most one storage class specifier. The recognized storage class specifiers are

auto
static
extern
register
typedef

If a declaration does not have a storage class specifier, a default storage class is assigned, based on where the declaration appears. When the declaration appears inside a function, auto is assumed. When the declaration appears outside all functions, extern is assumed. Omitting the keyword extern in such a declaration has a special meaning, described later in this chapter.

If a function declaration has no storage class specifier, the function is assumed to be extern no matter where the declaration appears.

6.3.1 Automatic Declarations

Automatic or auto declarations may appear at the beginning of any block (i.e., any compound statement). An auto identifier has block scope, and the associated object has automatic storage duration.

When an auto declaration is encountered during execution, memory is allocated for the declared items. Initializations are also performed at this time.

Automatic variables declared in a block are discarded when the block finishes execution. The memory that was allocated to them is then made available for other purposes. If you attempt to access this memory location after the block has terminated (e.g., using a pointer that points to the location), the behavior is undefined.

6.3.2 Register Declarations

The register storage class specifier may be used anywhere auto is allowed. A register identifier has block scope, and the associated object has automatic storage duration.

Historically, the register specifier was interpreted as a hint that the declared object would be heavily used and that efficiency could be improved if the object was stored in a hardware register instead of main memory. There was no guarantee that the object would actually be stored in a register, since the implementation might need such registers for other purposes. If the object couldn't be stored in a hardware register, it would be treated as if it were a normal auto object.

Recently, the register specifier has been made available for more general optimization processes. Because the specifier once indicated that the object might be stored in a register, you may not use & to obtain the address of a register object. This guarantees that you cannot change the value of a register object through a pointer. Such knowledge can allow a C implementation to perform optimization operations, even if the object is not actually stored in a register.

6.3.3 Static Declarations

If the first declaration of a function in a source file contains the static specifier, the function has internal linkage.

If an object is declared static and the declaration appears outside all blocks, the object has static storage duration. If the declaration contains an initialization value, the object has that value when the program begins execution. The identifier will have file scope.

If an object is declared static and the declaration appears inside a block, the object still has static storage duration. The storage used by a static object in a block is not discarded upon completion of the block. This means that you can still refer to the value of this object (indirectly through a pointer) even after the block has terminated. The static object retains its value from one invocation of the block to the next.

If a static object has an initializer, it is only initialized once (before program execution begins). Even if the static object has block scope, it is not initialized each time the block begins execution.
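The difference between a static object and an auto object inside a block can be seen in the following sketch. Both function names are invented for the example.

```c
/* Hypothetical helper: hand out 1, 2, 3, ... on successive calls. */
int next_ticket(void)
{
    static int last = 0;    /* initialized once, before execution begins */

    ++last;                 /* the value survives from call to call */
    return last;
}

/* For contrast, an auto object is created and initialized on every call. */
int always_one(void)
{
    int last = 0;

    ++last;
    return last;
}
```

Successive calls to next_ticket return 1, 2, 3, and so on, while always_one returns 1 every time.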

6.3.4 Extern Declarations

A declaration with the extern specifier may appear inside a function or outside all functions. The extern specifier indicates that each item in the declarator list has an external data definition elsewhere in the collection of source files that make up the C program. An external data definition may take any of the following forms.

If the same source file contains several declarations of the same object outside all functions, the first one that contains an initializer is taken as the definition of the object. If none of the declarations has an initializer, the first declaration that does not contain a storage class specifier is taken as the definition of the object. If all of the declarations have the extern specifier, the definition of the object is assumed to be found outside the source file. In this case, the object has external linkage.

To give some examples, suppose a function contains the declaration

extern int y;

This says that y is defined somewhere outside of all functions in the program. The definition of y could appear in the same source file as the declaration or in a different source file. If the definition appears in the same source, it could have several forms.

static int y;

is a definition that says y has internal linkage and file scope. This definition would have to precede the declaration with the extern keyword. extern implies external linkage unless internal linkage has already been established, so there would be a conflict if a symbol had an extern declaration (external linkage) followed by a static declaration (internal linkage).

int y = 5;

has an initializer but not the keyword static. Therefore this definition would indicate that y has external linkage and file scope.

int y;

has no initializer. It serves as a definition for y provided it is the first declaration of y that appears outside all functions in this source file and there is no other declaration of y that has an initializer.

If all the other declarations of y also have the form

extern int y;

the actual definition of y must appear in another source file. That definition has an initializer, or else it does not have the keyword extern.
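As a sketch of these rules within a single source file, consider the following. All of the names (total, hidden, read_total, read_hidden) are assumptions made for the example.

```c
/* Hypothetical names throughout. */

int read_total(void)
{
    extern int total;       /* declaration only: no storage is allocated */

    return total;
}

int total = 42;             /* definition: it has an initializer */

static int hidden = 7;      /* definition with internal linkage */

int read_hidden(void)
{
    extern int hidden;      /* follows the static declaration, so it
                               refers to the same internal object */
    return hidden;
}
```

The extern declaration in read_total may appear before the definition of total. The extern declaration of hidden is acceptable only because the static definition precedes it.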

6.3.5 Typedef Declarations

Declarations with a storage class of typedef do not actually declare objects that can be stored in memory. Instead they define identifiers which can afterwards be used in place of type specifiers.

The form of a typedef declaration is

typedef type-specifier identifier;

as in

typedef float TEMPERATURE;
typedef struct {
    float real, imag;
} complex;

This defines the identifier TEMPERATURE as synonymous with float and complex as synonymous with the given structure. One can then use declarations of the form

TEMPERATURE fahren, celsius;
complex u[10];

to declare two floating point variables fahren and celsius, and an array of 10 elements, each a structure consisting of two floating point elements named real and imag.

Defined types can also refer to derived data types. For example,

typedef char STRING[30];
STRING s;

declares s as a 30-element array of characters.

typedef does not create new types; it merely sets up a synonym for an existing type. For example, a variable of type TEMPERATURE has exactly the same type as any other float object. There are no special compatibility rules for combining TEMPERATURE data with float data (as there might be in a language like Pascal).
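The sketch below mixes TEMPERATURE and float freely to show that they are the same type. The functions to_celsius and string_capacity are hypothetical.

```c
typedef float TEMPERATURE;
typedef char STRING[30];

/* Hypothetical helper: convert Fahrenheit to Celsius. */
TEMPERATURE to_celsius(TEMPERATURE fahren)
{
    float offset = 32.0f;   /* a plain float mixes freely with TEMPERATURE */

    return (fahren - offset) * 5.0f / 9.0f;
}

/* STRING names the type "array of 30 char", so s occupies 30 characters. */
int string_capacity(void)
{
    STRING s;

    return (int) sizeof(s);
}
```

No cast is needed anywhere in to_celsius, since the compiler treats TEMPERATURE exactly as it treats float.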

6.4 Type Specifiers

A type specifier refers to a data type. The simple type specifiers are

char      short   int      long     signed    void
unsigned  float   double   const    volatile

The "derived" type specifiers refer to structure or union types, enumerated types, array types, pointer types, or named types created with typedef declarations.

A declaration may have more than one type specifier. The arithmetic type specifiers may be combined with each other in the ways described in Section 3.3. Arithmetic type specifiers may not be combined with derived types. The const and volatile specifiers may be used with any other type, simple or derived.

If a declaration does not have a type specifier, or if it only has const, volatile or both, an additional specifier of int is assumed. Thus

extern x;
extern int x;

are equivalent.

6.4.1 The const Specifier

The const type specifier indicates that an Lvalue is not a modifiable Lvalue. If you attempt to change such an Lvalue, you normally receive an error.

The address of a const object cannot be assigned to a pointer to a type without the const attribute, unless you use an explicit cast operation. For example, suppose you have

int *p;
const int i;

Then the operation

p = &i;

would be invalid. The reasoning is that you might use p to change the contents of i, even though those contents are supposed to be "constant". If you really want to make this sort of assignment, you must use an explicit cast, as in

p = (int *) &i;

This shows that you are aware you are performing an unusual action. The results of using a pointer created in this way are undefined.

If an aggregate (structure or array) object is declared const, every member of the aggregate is also considered const. For example, if a structure is declared to be const, each element of the structure is also regarded as const.

The const specifier is often found in pointer declarations, as in

const char *p;

This says that p is a pointer to a character, and the character has the const attribute. The pointer p itself is not constant and can be changed to point to any character. However, the character to which p points is constant and the program cannot perform operations like

*p = 'x';

to try to change the character. Even if p is aimed at a non-constant character, as in

char c;
p = &c;

you cannot use *p as a modifiable Lvalue.

The const attribute is commonly used when declaring function parameters. For example, if a parameter is declared

const char *s;

the function may not use the s pointer to change the contents of the character or character string to which s points. In other words, this use of const says that the function only looks at the object pointed at by s; the function does not try to change the object's value.
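A function with such a parameter might look like the following sketch. The name count_char is invented for the example.

```c
/* Hypothetical helper: count occurrences of c in the string s.
   The const in the parameter declaration promises that the function
   only reads the characters s points to. */
int count_char(const char *s, char c)
{
    int n = 0;

    while (*s != '\0') {
        if (*s == c)
            ++n;
        ++s;                /* allowed: the pointer itself is not const */
        /* *s = 'x';  would be rejected: *s is not a modifiable Lvalue */
    }
    return n;
}
```

A caller may pass either a constant or a non-constant string, since the address of a non-const object can be assigned to a pointer to const.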

6.4.2 The volatile Specifier

An object declared with the volatile specifier may change without the program's knowledge or action. A good example of a volatile object is a system clock that is run by the hardware. Note that such a clock could also be regarded as const if the program was not allowed to change the time. This shows that a data object may be both volatile and const.

Declaring an object volatile tells the implementation to check the value of the object every time it is used. For example, an implementation is not allowed to store the value of the object in some temporary storage location in one statement and then use this stored value in a later statement, since the value of the original object might have changed in the meantime. The volatile specifier limits the number of "short-cuts" and optimizations that the implementation can perform when handling the object.

The address of a non-volatile object may be assigned to a pointer that normally points to volatile objects. The address of a volatile object may not be assigned to a pointer to a non-volatile type, except through an explicit cast. If an aggregate object is declared volatile, every member of the aggregate is also considered volatile.

6.4.3 Structure Specifiers

A structure type specifier has one of the following forms.

struct { element-list }
struct tag-identifier { element-list }
struct tag-identifier

An element-list consists of one or more declarations for the elements of the structure. They are stored in memory in the order they are listed. The optional tag-identifier is a name that can be used to refer to the structure type after it has been defined. For example, in

struct complex {
    double real, imag;
};
struct complex z1, z2;

the code defines a structure with the tag complex, then declares z1 and z2 to have the complex structure type.

Each structure specifier that has an element-list defines a new structure. Subsequent declarations of the structure in the same scope may use the tag-identifier but are given an error if they repeat the element-list.

A program may contain an incomplete specifier of the form

struct tag-identifier

before declaring an element-list for the structure, provided that the program doesn't need the size of the structure. For example, this kind of specifier can be used when declaring a typedef name as in

typedef struct xyz S;

even if the contents of the structure xyz have not yet been declared. The same technique can be used when declaring a pointer to a structure type. The classic example is a structure like

struct link_list {
    int value;
    struct link_list *next;
};

The element next is declared as a pointer to a link_list structure even before the link_list structure has been fully defined. This construction lets you make a list of structures where each entry in the list has a pointer to the next entry in the list.
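A function that walks such a list might be sketched as follows. The name sum_list is hypothetical, and the sketch assumes that the last entry in the list has a null next pointer.

```c
struct link_list {
    int value;
    struct link_list *next;
};

/* Hypothetical helper: add up the values in a list.  The last entry is
   assumed to have a null next pointer. */
int sum_list(const struct link_list *p)
{
    int total = 0;

    while (p != 0) {
        total += p->value;
        p = p->next;        /* follow the pointer to the next entry */
    }
    return total;
}
```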

If a program uses an incomplete specifier as shown above, the program must eventually declare a complete specifier for the structure in the same scope.

A special case occurs when you want to use an incomplete structure specifier in an inner scope and there is a structure with the same tag in an outer scope. For example, suppose you have

struct ll {
    /* file scope */
    int fvalue;
    struct ll *fnext;
};
void func(void)
{
    struct ll {
        /* block scope */
        float bvalue;
        struct ll *bnext;
    };
    /* and so on */

Since the block scope ll structure has not been fully declared when the bnext element is declared, bnext is regarded as a pointer to a file scope ll structure. If you want bnext to point to the block scope structure, put a declaration of the form

struct ll;

inside the block, before the beginning of the full declaration of the block scope ll. This special declaration "masks" the outer scope ll so that any incomplete structure specifiers refer to the inner scope ll.

The scope of a structure begins at the end of its declaration. The scope of a structure element begins at the end of the element's declaration and extends to the end of the enclosing structure's scope.

6.4.4 Union Specifiers

A union type specifier has one of the following forms.

union { interpretation-list }
union tag-identifier { interpretation-list }
union tag-identifier

Union specifiers follow the same rules as structure specifiers.

6.4.5 Enumeration Specifiers

An enumeration type specifier has one of the following forms.

enum { enum-list }
enum tag-identifier { enum-list }
enum tag-identifier

The elements in the enum-list have one of the forms

identifier
identifier = constant-expression

and elements in the list are separated by commas.

The rules for using enum tags are the same as those for structure and union tags.
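The following sketch uses both element forms. The tag color and the function next_color are invented for the example; the values noted in the comments follow the usual C rule that the first element defaults to 0 and an element without an explicit value is one more than the element before it.

```c
/* Hypothetical enumeration using both element forms. */
enum color {
    RED,                    /* 0 */
    GREEN = 5,              /* 5 */
    BLUE                    /* 6 */
};

/* Hypothetical helper: step through the colors in order, wrapping
   back to RED after BLUE. */
enum color next_color(enum color c)
{
    switch (c) {
      case RED:
        return GREEN;
      case GREEN:
        return BLUE;
      default:
        return RED;
    }
}
```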

6.5 Declarators

A declarator consists of an identifier and an indication of how the identifier will be used (e.g. as a pointer, a function, or a normal variable). A declarator may also include an initializer for the identifier being declared. The possible declarator forms are

identifier
(declarator)
* declarator
* type-specifier-list declarator
declarator[ constant-expression ]
declarator()
declarator( parameter-type-list )
declarator( identifier-list )
declarator = initializer

The binding of the operators *, (), and [] is the same as for expressions.

When the declarator consists only of an identifier, the identifier refers to a variable of the type and storage class given by the specifiers.

A declarator in parentheses has the same meaning as an unparenthesized declarator. Parentheses are merely used to alter the binding of operators associated with the declarator. Examples of the use of parentheses are shown in later sections.

6.5.1 Pointer Declarators

A declarator of the form

*declarator

declares a pointer to an object of the type given by the type specifiers of the declaration. For example,

char *str;

indicates that str is a pointer to a char object.

In the form

* type-specifier-list declarator

type-specifier-list can only contain the type specifiers const and/or volatile (separated by white space). For example, we might have

char * const cons_ptr;

A later section of this chapter discusses the difference between this and

const char * ptr_to_const;

6.5.2 Array Declarators

A declarator of the form

declarator[ constant-expression ]

declares an array whose elements each have the type given by the type specifiers of the declaration. The number of elements in the array is given by the constant expression in the square brackets. For example,

char j[30];

declares an array named j with space for 30 characters. Since the subscript of the initial element is 0, the maximum subscript for j is 29.

An array declaration may omit the size of the first dimension if the size can be determined by counting initialization values, or if the size is irrelevant (i.e. when storage is not being allocated as in a declaration that includes the word extern). For example, a function might contain the declaration

extern int arr[];

to say that arr is an array whose size is established in some external data definition.
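Both ways of omitting the first dimension appear in the sketch below. The array primes and the two functions are names invented for the example.

```c
/* The size of the first dimension is taken from the initializers. */
int primes[] = { 2, 3, 5, 7, 11 };

/* Hypothetical helper: number of elements in primes. */
int prime_count(void)
{
    return (int) (sizeof(primes) / sizeof(primes[0]));
}

/* Inside a function, an extern declaration may omit the size because
   no storage is being allocated here. */
int last_prime(void)
{
    extern int primes[];

    return primes[prime_count() - 1];
}
```

Since five initialization values are given, primes has five elements and its maximum subscript is 4.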

6.5.3 Function Declarators

A declarator of the form

declarator( parameter-type-list )

is called a function prototype. The elements of the parameter-type-list describe the arguments that the function accepts. Each element in the parameter-type-list is either a declaration of a function parameter, as in

double f(double x,double y)
char *strcpy(char *str1, const char *str2)

or a declaration that omits the parameter name, as in

double f(double,double)
char *strcpy(char *, const char *)

The only storage class specifier allowed in elements of a parameter-type-list is register.

As a special case, the keyword void can be used to indicate a function that takes no arguments, as in

unsigned rand(void);

Function prototypes may also use the notation "..." to indicate a variable argument list, as described in Section 7.1.1.

A function prototype may be used to begin a function definition or simply to describe the way a function is called. Prototypes which begin function definitions must specify parameter names in the parameter-type-list. The identifier names in the list have block scope and will be used as parameters referring to the argument values passed by the function's caller.

If a prototype does not start a function definition, any identifiers in the parameter-type-list have function prototype scope. This scope extends only to the end of the prototype.

Every prototype of a function must agree with the definition of the function in the number and types of the parameters, plus the use of the ellipsis notation "...". This includes agreement on the number of dimensions in arrays, and on the bounds of each dimension (including the first dimension if it is specified).

A declarator of the form

declarator()

declares a function that returns a value of the type given by the type specifiers of the declaration. For example,

double f();

indicates that the function f returns a double value. This declarator form is supported for historical reasons, but prototypes are preferred.

The form

declarator( identifier-list )

is an outdated way to begin the definition of a function. For example, you might see

double f(x,y,z)
double x,y,z;
{
    /* function definition */
}

at the start of a function definition. The identifiers in the parenthesized list after f name all the parameters of the function, in the order they will be passed by a caller. After this you see one or more declarations for the parameters, then the body of the function. If there are no identifiers in the identifier-list of a function definition, as in

unsigned rand()
{
    /* function definition */
}

the function takes no arguments. This declarator form is supported for historical reasons only, and again, function prototypes should be used instead.

6.5.4 Reading Declarations

C declarations can be difficult to understand. Two declarations which look almost the same can define very different data objects. You should therefore spend some time learning to read declarations.

The most important rule to remember is to pull declarations apart piece by piece, starting at the identifier and going right as far as you can, then returning to the identifier and going left as far as you can. If the declaration contains parentheses, you may have to go right then left several times.

We now give a few examples of declarations, emphasizing the format of items in the declarator list.

char c, ca[20], *cp;
The variable c contains a single character. ca is an array containing 20 characters, indexed from 0 to 19. cp is a pointer to a single character.
char *cpa[20];
We start by going right. cpa is an array with 20 elements. What are the elements? We go back to the identifier and go left. Each element is a pointer (because there is a "*" to the left of the identifier). What do the elements point to? Keep going left. They point to characters. Therefore, cpa is an array of 20 character pointers.
float ff(), *fpf();
ff is a function that returns a float value. To understand fpf, we go right, then left. fpf is a function, it returns a pointer, and the pointer points to a float value.
float (*ffp)();
The parentheses show that we look at (*ffp) first. Therefore, ffp is a pointer. Outside the parentheses, go right, then left. The thing that ffp points to is a function and the function returns a float value.
int (*ip[10])();
ip is an array of ten pointers to functions that return integers.
char stra[5][20];
stra is an array of five vectors with 20 characters each. It may be useful to think of this as an array of five strings, with each string 20 characters long. stra[0] refers to the first of these five strings. stra[0][19] refers to the last character in the first string.
const char *ptr_to_const;
ptr_to_const is a pointer that points to a character and the character cannot be modified (i.e. it has the const attribute).
char * const cons_ptr;
cons_ptr is not modifiable and it points to a character. Thus cons_ptr is a "constant" (unchangeable) pointer, while ptr_to_const points to a "constant" object.
long (*fxp)(void);
fxp is a pointer to a function that takes no arguments. The function returns a long value.
const double *diff(double x,int (*fp)(double));
diff is a function. The parameters of diff are a double value named x and a function pointer named fp. The function that fp points to takes one double argument and returns an int. diff itself returns a pointer to an unmodifiable double value.
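
The right-then-left rule can be checked against a compiler. The following is a minimal sketch, not taken from the manual's own examples: the names one, two, ops, sum_ops, and row_length are invented for this illustration.

```c
#include <stddef.h>

static int one(void) { return 1; }
static int two(void) { return 2; }

/* Go right: ops is an array of 2 elements.  Go left: each element is
   a pointer.  Outside the parentheses: each pointer points to a
   function that takes no arguments and returns int. */
static int (*ops[2])(void) = { one, two };

/* Call every function through the array and add up the results. */
int sum_ops(void)
{
    int total = 0;
    size_t i;

    for (i = 0; i < sizeof ops / sizeof ops[0]; ++i)
        total += (*ops[i])();
    return total;
}

/* stra is an array of 5 arrays of 20 characters, so the size of one
   "row" is 20. */
size_t row_length(void)
{
    char stra[5][20];

    return sizeof stra[0];
}
```

With these definitions, sum_ops() adds the results of one and two, and row_length() reports the size of stra[0].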

6.6 Initializers

A declarator may include an initializer that assigns a value to the object being declared. One exception to this is that declarations for the parameters of a function may not contain initializations.

6.6.1 Static Duration Objects

Objects with static storage duration (i.e. static or external objects) may only be initialized with constant expressions. These constant expressions may include the & operator to obtain the address of a static duration object.

If a static duration object has no explicit initializer, the implementation initializes it as if the object (or each member of the object) is assigned the integer constant 0. This is not necessarily the same as setting all the bits of the object to 0. For example, if a float object is assigned the integer value 0, the value becomes 0.0 and the bit pattern for 0.0 may not be all 0-bits.
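
As a small illustration of default initialization (the object names counter, ratio, and name are hypothetical), the sketch below compares each uninitialized static object with the zero value of its own type rather than inspecting bit patterns.

```c
#include <stddef.h>

static int    counter;   /* no initializer: starts as 0             */
static double ratio;     /* no initializer: starts as 0.0           */
static char  *name;      /* no initializer: starts as a null pointer */

/* Returns 1 if every object above holds the zero value of its type. */
int statics_start_at_zero(void)
{
    return counter == 0 && ratio == 0.0 && name == NULL;
}
```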

6.6.2 Automatic Duration Objects

An object with automatic duration may be initialized with any expression involving constants, function calls, and assignments. The expression may also contain previously declared identifiers that have been assigned values. An automatic initialization behaves exactly like an assignment statement. Therefore,

auto int i = 1;
auto int j = i++;

not only initializes i and j, but also increments the value of i at the same time that j is being initialized.
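
The effect can be observed with a short sketch. The structure pair and the function init_demo are assumptions made for this illustration; they simply hand the final values of i and j back to the caller.

```c
struct pair {
    int i;
    int j;
};

/* Performs the two initializations and returns the resulting values. */
struct pair init_demo(void)
{
    struct pair result;
    auto int i = 1;
    auto int j = i++;   /* j receives 1, then i becomes 2 */

    result.i = i;
    result.j = j;
    return result;
}
```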

Note that the scope of a local variable begins at the end of its declarator but before any initializers. This means that the following coding trick

int i = 1;
float f()
{
    auto int i = i * 2;
        ...

does not work in ANSI standard code. In earlier versions of C, the scope of the local i did not begin until the end of the initialization, so the i in the initialization expression referred to the value of the external i. This is no longer true, since ANSI allows expressions like

auto int i = (i=3,i+3);

where the initializer itself contains an assignment. All uses of i in the above initialization refer to the local i being declared.

If an automatic duration object is not initialized, its value is undefined until a value is assigned in some other way. Unlike static duration objects, automatic duration objects are not initialized in any default way.

6.6.3 Initializing Scalar Objects

An object with scalar type is initialized by putting an = after the declarator form, followed by a single expression. This expression may be enclosed in brace brackets if desired. For example,

int i = 7;
int i = {7};

are equivalent. The value of the expression after the = is the value used to initialize the object.

6.6.4 Initializing Array Objects

The initializer of an array object is a list of initialization expressions. These must be constant expressions, even if the array has automatic duration. The expressions are separated by commas and the entire list is enclosed in braces, as in

float x[3] = {
    3.1415,
    2.0 * 3.1415,
    3.0 * 3.1415
};

It is not necessary to initialize an array in its entirety. For example,

char alpha[26] = {'a','b','c'};

initializes the first three elements of the array and says nothing about the rest. If you initialize some of the elements of any array, the rest of the elements are implicitly initialized to 0.
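
The implicit zeros can be confirmed with a sketch such as the following; the function name rest_is_zero is hypothetical.

```c
/* Returns 1 if the three explicit values are in place and every
   remaining element of the array is 0. */
int rest_is_zero(void)
{
    char alpha[26] = {'a','b','c'};
    int i;

    if (alpha[0] != 'a' || alpha[1] != 'b' || alpha[2] != 'c')
        return 0;
    for (i = 3; i < 26; ++i)
        if (alpha[i] != 0)
            return 0;
    return 1;
}
```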

To write a declaration for a multi-dimensional array, picture the object as an array of arrays. Each element of the array is itself an array which is initialized by a list of initialization expressions in brace brackets. Thus the initializer is a list of lists. For example,

int oct[4][3] = {
    { 000, 001, 002 },
    { 010, 011, 012 },
    { 020, 021, 022 },
    { 030, 031, 032 }
};

initializes oct[1][2] to 012, oct[3][1] to 031, and so on. The first level of nesting refers to the first subscript, the second to the second subscript, etc.

Again, you do not have to initialize every element in a multi-dimensional array.

int upper_triangular[4][3] = {
    { 1, 1, 1 },
    { 1, 1 },
    { 1 },
};

only initializes some elements in some "rows" of the array.

Internal braces may be omitted when initializing multi-dimensional arrays, as in

int two_by_two[2][2] = {
    1, 2, 3, 4
};

In this case, initialization values are assigned to the array elements in row major order (with the rightmost subscript varying fastest). In the above example, you would have

two_by_two[0][0] == 1
two_by_two[0][1] == 2
two_by_two[1][0] == 3
two_by_two[1][1] == 4

When an array declaration contains an initializer the size of the leftmost dimension may be omitted. In this case, the size is calculated from the number of initialization expressions provided. For example,

int vec[] = { 0, 1, 2, 3 };

creates an array with four elements because there are four initialization values.

int matrix[][3] = {
    { 0, 0, 0 },
    { 1, 1, 1 }
};

creates a 2 by 3 array, because the number of initialization values produces two "rows".
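
One way to see the calculated sizes is to divide the size of the whole array by the size of one element. In the sketch below, vec and matrix are the arrays declared above; the functions vec_count and matrix_rows are names invented for this illustration.

```c
#include <stddef.h>

int vec[] = { 0, 1, 2, 3 };

int matrix[][3] = {
    { 0, 0, 0 },
    { 1, 1, 1 }
};

/* Number of elements in vec. */
size_t vec_count(void)
{
    return sizeof vec / sizeof vec[0];
}

/* Number of rows in matrix; each row is an array of 3 ints. */
size_t matrix_rows(void)
{
    return sizeof matrix / sizeof matrix[0];
}
```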

6.6.5 Initializing Character Arrays

In addition to the usual way of initializing arrays, you may initialize an array of char values with a string constant. For example,

char arr[10] = "abcd";

assigns 'a' to arr[0], 'b' to arr[1], and so on. A '\0' character is placed in the array element which follows the element that gets the last character of the string. In the above example, arr[3] is assigned 'd' and arr[4] is assigned '\0'. If the size of the character array is exactly the number of characters in the string literal, the '\0' is not added. An error is issued if the string literal has more elements than the given array.

Note the difference between the above declaration and

char *p = "abcd";

which creates a literal string and points p towards the first character. The values of the elements of arr may be changed; the characters to which p points may not be changed because they belong to a string literal.
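
The following sketch contrasts the two declarations. The function names are hypothetical, and the pointer is declared const char * here (an addition not in the text above) so that the compiler can reject accidental writes through it.

```c
#include <string.h>

/* Changes the first character of an array initialized from "abcd"
   and reports whether the change took effect. */
int array_is_writable(void)
{
    char arr[10] = "abcd";

    arr[0] = 'A';
    return strcmp(arr, "Abcd") == 0;
}

/* exact has room for the four letters only, so no '\0' is stored. */
int exact_fit_has_no_nul(void)
{
    char exact[4] = "abcd";

    return sizeof exact == 4 && exact[3] == 'd';
}

/* p points at a string literal: read through it, never write. */
char first_of_literal(void)
{
    const char *p = "abcd";

    return p[0];
}
```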

6.6.6 Initializing Structure Objects

An automatic duration structure object may be initialized by a single expression whose value has the same type. For example,

auto struct X  A = B;

is valid if B is a structure of type X and the elements of B have already been assigned values (e.g. in an outer scope).

Otherwise, a structure initializer is similar to an array initializer: a list of constant expressions separated by commas and enclosed in brace brackets. The values of the expressions are used to initialize the elements of the structure in the order they are given. For example, you might have

struct person {
    char name[40];
    int age;
} john = {
    "John Smith", 30
};

If a structure contains another structure, nested brackets are used in the same way they are used to initialize multi-dimensional arrays. The same principle applies with arrays of structures, as in

struct person list[] = {
    {"John", 30},
    {"Mary", 36},
    {"George", 54}
};

This creates an array of three elements, each of which is a person structure.

As with multi-dimensional arrays, brace brackets inside brace brackets may be omitted to get one long list of initialization expressions. In this case, expressions are taken from the list as needed. For example, the declaration above could have been written

struct person list[] = {
    "John", 30, "Mary", 36, "George", 54
};

It is not necessary to initialize all the elements of a structure. For example,

struct person boss = {
    "Gwen"
};

initializes the name element of the structure explicitly. As with arrays, the elements that are not mentioned (here, age) are implicitly initialized to 0.
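
The structure initializers in this section can be exercised with a sketch like the one below. The accessor functions are hypothetical. The check on boss.age relies on the standard C rule that elements omitted from an initializer list are set to 0.

```c
#include <string.h>

struct person {
    char name[40];
    int age;
};

/* Nested braces: one pair of braces per structure. */
static struct person list[] = {
    {"John", 30},
    {"Mary", 36},
    {"George", 54}
};

/* Only name is given; age is left to the implicit initialization. */
static struct person boss = {
    "Gwen"
};

int list_count(void)   { return (int)(sizeof list / sizeof list[0]); }
int second_age(void)   { return list[1].age; }
int boss_age(void)     { return boss.age; }
int boss_is_gwen(void) { return strcmp(boss.name, "Gwen") == 0; }
```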

6.6.7 Initializing Union Objects

When an initializer is specified for a union object, the first interpretation of the union is used. For example, in

union number {
    float x;
    int i;
} sample = { 3 };

the float interpretation is used. Therefore the 3 is converted to 3.0 before it is assigned.
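
A minimal sketch of this rule follows. The tag number and the function sample_as_float are names chosen for the illustration, and the initializer is written in braces, as standard C requires for a union.

```c
union number {
    float x;   /* first member: the one an initializer applies to */
    int i;
};

/* 3 is converted to float and stored in the x member. */
static union number sample = { 3 };

float sample_as_float(void)
{
    return sample.x;
}
```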

6.7 Type Names

Type-names are used in cast operations, in function prototypes, and with the sizeof operator. Type-names are specified as declarations without object identifiers. They consist of a type specifier (e.g. int) followed by an abstract declarator. Such declarators take the following forms:

null
(abstract-declarator)
*abstract-declarator
abstract-declarator()
abstract-declarator[ constant-expression ]

null means that no declarator is specified. In order to avoid confusion with function declarators, the abstract-declarator inside the parentheses of the second form may not be null.

From these possibilities, one can construct type-names which reflect every data type that may be constructed in C. For example,

  char          /* character */
  char *        /* pointer to character */
  char [10]     /* array of 10 characters */
  char *()      /* function returning ptr to char */
  char (*)()    /* ptr to function returning char */
  char (*)[10]  /* ptr to array of 10 characters */
  char *[10]    /* array of 10 ptrs to character */
  char * const  /* constant character pointer */

7. Program Structure

This chapter describes the layout of C programs. This ties together all the separate elements discussed in preceding chapters.

A C program consists of a series of definitions. Some define objects, some define data types, and some define the functions that do the actual work of the program.

7.1 Function Definition

A function definition begins with a declaration of the function. This declaration should give the type of the function's result, its storage class, and the names of the function's parameters. If the type is omitted, int is assumed. If the storage class is omitted, extern is assumed.

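The parameter names may be supplied in two ways:

(a) in a function prototype, where each parameter is declared with its type inside the parentheses;
(b) in an identifier list inside the parentheses, followed by declarations for the parameters before the body of the function.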

In both forms, no parameter can have the same name as a type defined in a typedef statement.

Format (b) above is "deprecated" by the ANSI standard, which means it may not be supported in later updates. The form may be seen in older C programs, but new programs should define all functions with prototypes. We therefore use prototype declarations for all our function definitions.

The body of a function is enclosed in braces and follows the prototype or parameter declarations that begin the function. Typically, the function body begins with declarations for any automatic variables used in the function. After the necessary declarations come the actual statements of the function.

As an example,

int chkstr(const char *s,int c)
{
    int i;
    for (i=0; *s != c ; ++i)
        if (*s++ == '\0') return -1;
    return i;
}

defines a function named chkstr. The parameters are named s and c. These represent a character pointer and a character passed when chkstr is called by another function. The const in the declaration of s indicates that the function does not use s to change any data. The character pointer is assumed to point to a string whose end is marked with a NUL ('\0').

chkstr determines if the argument character represented by c appears in the string indicated by the character pointer. If the character is found, chkstr returns the offset of the character from the beginning of the string. Thus chkstr returns 0 if the character is the first character of the string, 1 if it is the next character, and so on. If chkstr reaches the '\0' at the end of the string and has not yet found the character, chkstr returns -1.

The parameters of a function may not be declared with any storage class except register. They have function scope. If they are declared without a type specifier, int is assumed.

If a function definition begins with a function declaration of the form

specifiers function-name()

the function takes no arguments. This is equivalent to

specifiers function-name(void)

7.1.1 Variable Argument Lists

Most C functions are defined to take a fixed number of arguments. However, there is a way to define functions that take different numbers of arguments from call to call. For example, you might define a function that determines the maximum value in a list of integer arguments. You could pass this function two integers, three integers, four integers, or more.

C also lets you define functions that have a fixed set of arguments plus a set of arguments whose number varies from call to call. For example, a function may require some arguments all the time, and may also accept some optional arguments as well. The best known example of this kind of function is printf, which always takes a format string and may take other arguments too. For a description of printf, see Section 9.2.10, or "expl c lib printf".

If a function takes a variable number of arguments, its definition begins with a prototype whose parameter list ends in an ellipsis "...". For example, the prototype for printf is

int printf(const char *format, ...)

printf takes one argument that is always present, a string called format. It may also take an indefinite number of other arguments.

Section 9.5 explains how to access the arguments in a variable argument list.
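
As a rough preview, the maximum-value function mentioned above might be sketched as follows. This assumes the standard <stdarg.h> macros va_start, va_arg, and va_end; the function name max_of and its count parameter are inventions for this example, not library routines.

```c
#include <stdarg.h>

/* Returns the largest of the int arguments that follow count.
   count gives the number of those arguments and must be at least 1. */
int max_of(int count, ...)
{
    va_list ap;
    int i, best, value;

    va_start(ap, count);          /* start after the last fixed parameter */
    best = va_arg(ap, int);
    for (i = 1; i < count; ++i) {
        value = va_arg(ap, int);
        if (value > best)
            best = value;
    }
    va_end(ap);
    return best;
}
```

A call such as max_of(3, 4, 9, 2) passes the count first, then the three values to compare. The function has no other way to learn how many arguments were supplied.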

7.2 Argument Passing

All function calls pass arguments by value. This means that a function is passed copies of the argument values, not the actual arguments specified by the caller. If the function changes the value of one of its parameters, this change does not affect the caller's arguments. For example, consider the function

void fakeswitch(int a,int b)
{
    int temp;
    temp = a;
    a = b;
    b = temp;
}

What happens if we call this function with

fakeswitch(X,Y);

The fakeswitch function is passed copies of the values of X and Y. The actions that fakeswitch performs on the parameters a and b do not affect the arguments X and Y, because fakeswitch only works with copies.

A function that is passed a pointer to an object can use this pointer to change the value of the object. For example, consider

void trueswitch(int *a,int *b)
{
    int temp;
    temp = *a;
    *a = *b;
    *b = temp;
}

If we now make the call

trueswitch(&X,&Y);

the effect is much different. The trueswitch function receives copies of pointers to X and Y. Using these pointer values, trueswitch can affect the real values of X and Y. Thus the call to trueswitch exchanges the values of X and Y.
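
The difference can be demonstrated by starting from the same values and looking at X after each call. The two wrapper functions below are hypothetical; fakeswitch and trueswitch are repeated so that the fragment stands alone.

```c
void fakeswitch(int a,int b)
{
    int temp;
    temp = a;
    a = b;
    b = temp;
}

void trueswitch(int *a,int *b)
{
    int temp;
    temp = *a;
    *a = *b;
    *b = temp;
}

/* X is unchanged, because fakeswitch only exchanged its own copies. */
int x_after_fakeswitch(void)
{
    int X = 1, Y = 2;

    fakeswitch(X,Y);
    return X;
}

/* X now holds the old value of Y. */
int x_after_trueswitch(void)
{
    int X = 1, Y = 2;

    trueswitch(&X,&Y);
    return X;
}
```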

7.3 Effect of Prototypes on Function Calls

If the definition of a function is in the scope of a prototype for the function (for example, if the function definition starts with a prototype), every call to that function must be in the scope of a semantically equivalent prototype. "Semantically equivalent" means that the type of the function and of each parameter named in the prototype must have the same meaning as the corresponding types in the function definition. For example, in

typedef int ABC;
float f(ABC arg);
     ...
float f(int a)
{
    /* function body */
}

the prototype for f and the definition prototype are semantically equivalent, even if they are not identical. The parameter names in the prototypes do not matter, and ABC is just another name for int.

7.3.1 Argument Conversion Rules

When a function call is in the scope of a prototype of the function, all specified argument values are converted to the type given in the prototype. For example, in

void f(float x);
    ...
f(0);

the int constant 0 is automatically converted to a float 0.0F before being passed to the function. Similarly in

void g(int *p);
    ...
g(0);

the int 0 is converted to a null integer pointer.
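
Both conversions can be seen in a short sketch. The functions half and is_null are hypothetical, and double is used in place of float only to keep the comparison simple.

```c
#include <stddef.h>

double half(double x);   /* prototypes in scope for the calls below */
int is_null(int *p);

double half(double x)
{
    return x / 2.0;
}

int is_null(int *p)
{
    return p == NULL;
}

/* The int constant 3 is converted to the double value 3.0. */
double half_of_three(void)
{
    return half(3);
}

/* The int constant 0 is converted to a null int pointer. */
int zero_is_null(void)
{
    return is_null(0);
}
```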

If a function call is not in the scope of a prototype for the function, the following argument conversion rules are applied.

ORIGINAL TYPE          CONVERTED TO
-------------          ------------
float                    double
char                     int
short                    int
signed char              int
signed short             int
unsigned char            unsigned
unsigned short           unsigned

Other types are not changed. For example, in the call

char c = 'a';
float x = 3.0;
    ...
func(c,x);

the value of c is automatically converted to int before it is passed and the value of x is automatically converted to double.

When a function definition does not begin with a prototype, the function parameters undergo the same argument conversions. For example, consider

double f(x)
float x;
{
    /* function body */
}

Even though x is declared to be float, the corresponding argument value is treated as if it were double. The expectation is that there will be no prototype used when the function is called, so argument values undergo the argument conversions. Thus the parameters should be changed to reflect those conversions.

Finally, the argument conversion rules apply when a prototype contains "..." to indicate a variable argument list. Since the prototype does not indicate types for arguments in the variable part of the list, those arguments are converted according to the argument conversion rules. For example, in

float x;
   ...
printf("%f",x);

the value of x is converted to double before it is passed, since x corresponds to the variable argument list in the prototype

int printf(const char *format, ...);

While the default argument conversions are sometimes unavoidable, consistent use of prototypes eliminates most of the confusion of automatic argument conversion.

7.3.2 Passing Derived Types

If a function call contains an argument which is the name of an array, the value passed to the function is a pointer to the first member of the array. Effectively, an argument value whose type is "array of type" is automatically converted to "pointer to type". For example, in

int a[10];
   ...
f(a);

f is passed an int pointer to the first member of a. Because this happens, a function parameter declared as "array of type" is treated as "pointer to type". For example, the prototypes

int f(char *s)
int f(char s[])

are equivalent. The first dimension of an array parameter need not be declared, as shown above.
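
Because the function receives a pointer to the first element, it can change the caller's array. A minimal sketch, using the hypothetical functions fill and sum_after_fill:

```c
/* s is declared as an array, but it is really a pointer to the
   first element of the caller's array. */
void fill(int s[], int n, int value)
{
    int i;

    for (i = 0; i < n; ++i)
        s[i] = value;
}

/* Fills a local array through fill, then adds up its elements. */
int sum_after_fill(void)
{
    int a[10];
    int i, total = 0;

    fill(a, 10, 7);          /* a is converted to a pointer to a[0] */
    for (i = 0; i < 10; ++i)
        total += a[i];
    return total;
}
```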

Note: The ANSI standards committee has recently begun discussing the possibility of giving the two parameter forms

char *s
char s[]

slightly different meanings. It is too early to predict whether this will actually happen; however, we recommend that you use the

char *s

format for the time being, for passing both pointers and arrays. This format is the most frequently used, so its meaning is not likely to change.

If a function call contains an argument which is the name of a function, the callee is passed a pointer to the named function. Thus an argument value whose type is "function returning type" is converted to "pointer to function returning type".

Structures and unions may be passed as argument values and returned as function results. If a function expects to be passed a union value, the argument that is passed must be a union of the same type. It is not enough to pass a value of a type that matches one of the interpretations of the union.

7.4 Return Values

If an expression is specified in a return statement, the expression is automatically converted to the type that the function returns (specified at the beginning of the function definition).

If a function call is within the scope of a declaration for the function, the result of the function is assumed to have the type given by the declaration. If a function call is not within the scope of a declaration, the result of the function is assumed to be int.

7.5 Flow of Control

When a function is called, execution begins at the start of the block that makes up the body of the function. Memory is allocated for any auto or register variables declared at the beginning of the function block. If any of the declarations have initializers, initializations are performed as if they were assignment statements appearing at the beginning of the block. After these initializations, execution proceeds to the first statement in the function block.

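Execution of a function continues statement by statement until some action passes control out of the body of the function. This can happen in several ways: the function can execute a return statement, reach the end of its body, or call a library function such as exit that does not return.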

7.5.1 The main Function

Program execution consists of functions calling other functions. Obviously, there must be one function that is invoked to start the whole process. The first user-written function that is invoked is called main. (Some internal functions are executed before main is invoked.) Every working C program must have a main function.

The support software begins executing the program with the equivalent of a function call to main. When main reaches its final statement or issues a return, program execution terminates. Another way of terminating execution is to call the exit library function anywhere in the program.

Any function can call any other function in the program (except for static functions that appear in a different source file). In particular, functions can call themselves recursively. Functions can also call main (in which case, completion of main only returns to its caller, instead of terminating program execution).

7.6 Program Parameters

There are two ways in which you may define your main routine. The simplest is

int main(void)
{
    /* function block for "main" */
}

With this definition, main has no parameters. The second way is traditionally written

int main(int argc, char *argv[])
{
    /* function block for "main" */
}

or equivalently

int main(int argc, char **argv)
{
    /* function block for "main" */
}

You may use different names instead of argc and argv, but almost no one does. argc and argv are known as program parameters.

argv is an array of pointers that point to strings. These strings contain the arguments that were given on the command line that invoked the C program. For example, if a program was invoked with

prog filename word=value

the strings pointed to by argv would be

"prog"
"filename"
"word=value"

argc is the number of elements in the argv array. argc is always greater than zero. By convention, argv[argc] is always a NULL character pointer.

The string argv[0] represents the program name; in the above example, this is prog.

The strings pointed to by argv[1] through argv[argc-1] are called program arguments. These can be used to set options for the program and to provide other kinds of information.

All the strings pointed to by elements of argv can be modified by the program.
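
The sketch below shows one way a program might walk through its program arguments. The function count_assignments is hypothetical, and demo_count builds a stand-in argv array by hand so the logic can be tried without a real command line.

```c
#include <string.h>

/* Counts the program arguments (argv[1] through argv[argc-1]) that
   have the form word=value, that is, contain an '=' character. */
int count_assignments(int argc, char **argv)
{
    int i, n = 0;

    for (i = 1; i < argc; ++i)
        if (strchr(argv[i], '=') != NULL)
            ++n;
    return n;
}

/* Stand-in for the command line:  prog filename word=value */
int demo_count(void)
{
    char a0[] = "prog";
    char a1[] = "filename";
    char a2[] = "word=value";
    char *argv[4];

    argv[0] = a0;
    argv[1] = a1;
    argv[2] = a2;
    argv[3] = NULL;          /* argv[argc] is a null pointer */
    return count_assignments(3, argv);
}
```

In a real program, main would pass its own argc and argv straight to count_assignments.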

If you want a particular command line argument to contain white space characters, the argument must be enclosed in single or double quotes. For example, a command line of the form

prog "This contains blanks"

would result in an argv vector of pointers to

"prog"
"This contains blanks"

7.7 Program Status

As noted above, main is declared as a function that returns an int value. If the program actually terminates by main executing a return statement that returns an int value, the value is returned to the system as the status of the program. The same thing happens when you call the library function exit and pass exit a status value.

8. Source Code Preprocessing

C's preprocessing facilities modify C source code in the translation phase. These facilities allow for text and macro substitution, conditional translation, and the inclusion of other source files.

8.1 Preprocessor Symbols

The preprocessor facilities make use of several special identifiers. These names are replaced with other values in preprocessing phases. Each identifier begins and ends with two underscore characters.

The identifier __LINE__ stands for the line number of the current source file line. __LINE__ is replaced by the decimal integer constant that is one greater than the number of new-line characters read up to the point where __LINE__ appears. The value of __LINE__ is automatically set by the preprocessor as it reads the source file. It can also be set artificially by the #line directive (described in Section 8.2.7).

The identifier __FILE__ stands for the name of the current source file. Wherever it appears in the source code, __FILE__ is replaced by a string literal containing the appropriate file name. The file name can be changed artificially with the #line directive.

The identifier __DATE__ stands for a string literal of the form

"Mmm dd yyyy"

giving the date of translation. Mmm is the first three letters of the month, as in Jan, Feb, etc. dd is the current day of the month; if this number is less than 10, the first character is a blank. yyyy is the current year.

The identifier __TIME__ stands for a string literal of the form

"hh:mm:ss"

representing the time of translation of the source file (on a 24-hour-clock).

The identifier __STDC__ is the decimal constant 1. It indicates that a particular implementation of C conforms with the ANSI standard. Since this implementation conforms with the ANSI standard, the value of __STDC__ is always 1.

None of these symbols may be defined with #define or undefined with #undef.
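
As an illustration, the fragment below uses four of these symbols. The function names are made up for the example. The lengths checked follow directly from the formats "Mmm dd yyyy" and "hh:mm:ss" given above.

```c
#include <string.h>

/* Length of the translation date string, "Mmm dd yyyy". */
int date_length(void)
{
    return (int)strlen(__DATE__);
}

/* Length of the translation time string, "hh:mm:ss". */
int time_length(void)
{
    return (int)strlen(__TIME__);
}

/* __LINE__ is replaced by the number of the line it appears on. */
int line_is_positive(void)
{
    return __LINE__ > 0;
}

/* __FILE__ is replaced by a string literal naming this source file. */
const char *this_file(void)
{
    return __FILE__;
}
```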

8.2 Preprocessor Directives

Preprocessing directives control C's preprocessing facilities. These directives appear as single lines in the C source code (although there is a way to extend a directive to more than one line, described shortly). These lines are independent of the rest of the source code, and can be inserted between lines containing normal source code. The last section of this chapter explains the order in which preprocessing directives are executed.

Only blanks and horizontal tab characters are allowed as white space within preprocessor directives. Directive lines may begin with any number of white space characters, but the first character that is not white space must be #. After the # may come more white space, then a keyword indicating what kind of directive it is. The ANSI standard keywords are

define   elif     else    endif
error    if       ifdef   ifndef
include  line     pragma  undef

Some directives take more information after the keyword, while others do not. Details are given in the directive descriptions in later sections.

Preprocessing directives normally end at the first new-line character after the #. If you wish to break a directive over more than one source line, put a backslash character (\) immediately before each new-line character inside the directive. A backslash before a new-line tells the preprocessor to discard both the backslash and the new-line. For example,

#define printint(A) printf("%d\n",\
                           A)

is equivalent to

#define printint(A) printf("%d\n",A)

The same principle applies to normal code as well.

x = "abc\
def";

is equivalent to

x = "abcdef";

To let you make source files more readable, ANSI C lets you have source lines that only contain the # character and white space. These are called null directives. For example, you might write

#
# define SAMPLE "This stands out more clearly"
#

to make the #define directive stand out more clearly. Null directives are simply discarded during preprocessing.

8.2.1 The #define Directive

The simplest form of the #define directive is

#define identifier char-sequence

where identifier is a normal C identifier and char-sequence is any character sequence that does not contain a new-line. Note that there is no semicolon; the directive is terminated by the new-line character at the end of the line.

The #define directive instructs the preprocessor to replace the specified identifier with the given string wherever the identifier appears in the source file. We will call this sort of identifier a manifest. As an example,

#define VECSIZE 20

defines a manifest named VECSIZE. After this has been defined, you may use the identifier in source code, as in

float vector[VECSIZE];

During preprocessing, the identifier VECSIZE is replaced by the text 20. The preprocessor does not interpret this text as a number: it is only a sequence of characters. A later parsing phase recognizes 20 as an integer constant.

The #define directive can also have the form

#define name(name, ..., name) char-sequence

where each name is a valid C identifier and char-sequence is any character sequence not including a new-line. There must not be white space between the first name and the opening parenthesis "(".

This form of the directive defines a macro. The name of the macro is the name immediately before the "(". The names inside the parentheses are the parameters of the macro. The character sequence after the parentheses is the body of the macro.

Macros are called in source code the same way as functions: the name of the macro, followed by argument values enclosed in parentheses. The preprocessor replaces the macro call with the body of the macro, as given in the #define statement that created the macro. Wherever a macro parameter appears in the body of the macro, the preprocessor replaces the parameter name with the corresponding argument value. For example,

#define DIAG(A,B) {A,0}, {0,B}
int iden[2][2] = {DIAG(1,1)};
int negi[2][2] = {DIAG(-1,-1)};
int zero[2][2] = {DIAG(0,0)};

shows the definition of a macro named DIAG and its use. The preprocessor changes the above code to

int iden[2][2] = { {1,0}, {0,1} };
int negi[2][2] = { {-1,0}, {0,-1} };
int zero[2][2] = { {0,0}, {0,0} };

The number of arguments given in the macro call must match the number in the macro definition. Arguments in the macro call are merely token strings separated by commas. For example, suppose MAC is a macro. Then

MAC(A B C,D E F)

has TWO arguments: the token sequences A B C and D E F. If an argument contains commas, the argument should be enclosed in double quotes or parentheses to make sure that the comma is not taken as a delimiter for the argument list.

Whenever a manifest or macro is replaced by the associated text, the result is scanned again to see if it contains other manifests or macros. This allows you to #define something in terms of other #defined items. However, if the text that replaces a macro or manifest contains a reference to the same macro or manifest, the reference is not replaced. In other words, you cannot #define something to act recursively. If you try

#define MAC(X) MAC(X+5)
    ...
MAC(3)

the preprocessor makes one replacement, to get

MAC(3+5)

but does not try to expand MAC again.
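
One practical consequence is that a macro may share its name with a function and refer to that function in its body. The sketch below is an assumption-laden illustration: the function twice and the macro of the same name are invented here.

```c
static int twice(int x)
{
    return 2 * x;
}

/* The twice inside the body is not expanded again, so it names the
   function defined above. */
#define twice(x) (twice(x) + 1)

/* twice(3) expands once, to (twice(3) + 1), which then calls the
   function. */
int twice_three(void)
{
    return twice(3);
}
```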

Character constants and string literals are not scanned for manifests or macros. For example, in

#define CAT 9
   ...
char s[] = "A cat has CAT lives";

the CAT inside the string literal will not be changed.

Similarly, strings that are given in a macro body are not examined for formal parameter names. For example, in

#define LIVES(WHO,N) "A WHO has N lives"

the WHO and N inside the string will not be changed.

The scope of a macro or manifest extends from the end of the #define directive to the end of the source file, plus any files included by the source file, plus any files that include the source file. However, the manifest can be "undefined" with the #undef directive described in Section 8.2.2.

An identifier that has been defined as a macro or manifest may not be redefined with another #define directive, unless the second definition is identical to the first. To be identical, macro definitions must have identical parameter lists. For example,

#define add(A,B) A+B
#define add(X,Y) X+Y

may have the same behavior, but are not identical definitions. Therefore the second definition would be regarded as an error. If you want to change the meaning of a #defined item, you must first use #undef to undefine the identifier. After that, a new #define directive is valid.

The result of replacing a macro or manifest is never treated as a preprocessor directive, even if it looks like one.

The ANSI standard uses the term "macro" both for macros that are defined with parameters and for manifests. We prefer to distinguish the two because they are used in different ways.

Because macro parameters are not recognized inside strings that appear in macro definitions, a special notation is needed if you want a macro argument value to be placed inside a string. When the body of a macro contains a # character followed by a parameter name, the sequence is replaced by a string literal containing the value of the corresponding macro argument. For example, if we define

#define SAMPLE(A) #A

then the macro call

SAMPLE(xyz)

changes into

"xyz"

The #A construct is replaced by the macro argument enclosed in double quotes. As another example,

SAMPLE(A B)

changes into

"A B"

In situations like this, where an argument value contains white space characters, the white space characters are converted to a single space in the resulting literal string. White space before the first token and after the last token is deleted. Therefore

SAMPLE(    A       B    )

produces

"A B"

If the argument value contains character constants or string literals that contain double quotes or backslashes, suitable backslashes are added to maintain the original sense of the special characters. For example,

SAMPLE(A "abc" B)

produces

"A \"abc\" B"

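The following sketch gathers the cases above into one place. It also shows a hypothetical variant of the LIVES macro from earlier in this section that uses # to place its arguments in the text; that variant relies on adjacent string literals being concatenated (see Section 8.5). The variable names are assumptions made for illustration.

```c
#include <string.h>

#define SAMPLE(A) #A

/* Hypothetical variant of LIVES: each parameter is turned into a
   string literal, and the adjacent literals are then concatenated. */
#define LIVES(WHO,N) "A " #WHO " has " #N " lives"

static const char *spaced = SAMPLE(    A       B    );
static const char *quoted = SAMPLE(A "abc" B);
static const char *lives  = LIVES(cat, 9);
```

Here spaced holds the text A B, quoted holds the text A "abc" B (including the inner quotes), and lives holds the text A cat has 9 lives.
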
8.2.2 The #undef Directive

The directive

#undef identifier

"undefines" the given identifier. This means that a previous meaning set by a #define directive is discarded. The scope of the identifier ends at the #undef directive instead of continuing to the end of the source file. It is valid to undefine an identifier that is not currently defined.

8.2.3 The #if Directive

The #if directive tells the C compiler to discard a section of source code if a particular condition does not hold. It is commonly used as shown below.

#if constant-expression
  /* one set of statements */
#else
  /* another set of statements */
#endif

If the constant expression given in the #if directive is non-zero, the compiler uses the source code between the #if and #else and discards the source code between the #else and #endif. If the constant expression is zero, the compiler discards the source code between the #if and #else and uses the statements between the #else and the #endif. For example, consider

#define SITE_A 1
#if SITE_A==1
char site_name[] = "SITE_A Name";
#else
char site_name[] = "***Unknown Site***";
#endif

The string site_name is initialized to different values depending on the value of the manifest SITE_A. This lets you write a program's source code to apply to various sites. To change sites, you just have to change the preprocessor definition of SITE_A.

You may omit the #else directive after an #if. In this case, the statements between the #if and #endif are only compiled if the constant expression in the #if directive is non-zero.

#if constructions may be nested, as in

#if exp1          -+
   /*stuff1*/      |
                   |
#if exp2     -+    |
   /*stuff2*/ |    |
#else         |    |
   /*stuff3*/ |    |
#endif       -+    |
                   |
   /*stuff4*/      |
#else              |
   /*stuff5*/      |
#endif            -+

The directive

#elif constant-expression

may be used in place of

#else
#if constant-expression

in nested #if constructions. This usually makes the code more readable. The whole construction only needs one #endif, as in

#if A
  ...
#elif B
  ...
#elif C
  ...
#else
  ...
#endif

When evaluating the constant expressions in #if and #elif directives, all integer constants are converted to the long type. The expression in a #if directive must not contain a sizeof operator, a cast operation, or an enumerated constant. However, it may contain defined expressions (described below).

If the constant expression contains manifests or macros, they are replaced with their defined values before the expression is evaluated. Identifiers that have not been defined using #define are replaced with the text 0.
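
A short sketch of these rules follows. LEVEL and NEVER_DEFINED are hypothetical names; NEVER_DEFINED is deliberately left without a #define.

```c
#define LEVEL 2

/* LEVEL is replaced by 2 before each expression is evaluated. */
#if LEVEL == 1
static const char *mode = "one";
#elif LEVEL == 2
static const char *mode = "two";
#else
static const char *mode = "other";
#endif

/* NEVER_DEFINED has no #define, so it is replaced by 0 and the
   #else branch is the one that is compiled. */
#if NEVER_DEFINED
static const int flag = 1;
#else
static const int flag = 0;
#endif
```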

8.2.4 The defined Expression

The expressions

defined identifier
defined ( identifier )

can be used in preprocessor directives to determine if an identifier has been defined as a macro or manifest.

defined(X)

has the value 1 if the name X is currently defined as a macro or manifest, and has the value 0 if X has never been defined with #define, or if it has been undefined with #undef since its last definition.
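
Because defined yields 1 or 0, it can be combined with other operators in an #if expression. A minimal sketch, using the hypothetical names DEBUG_LOG and TRACE:

```c
#define DEBUG_LOG          /* defined, with an empty body */
#define TRACE 1
#undef TRACE               /* no longer defined */

/* Both forms of the defined expression appear here. */
#if defined(DEBUG_LOG) && !defined TRACE
static const int log_only = 1;
#else
static const int log_only = 0;
#endif
```

Note that DEBUG_LOG counts as defined even though its replacement text is empty.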

8.2.5 The #ifdef and #ifndef Directives

The directive

#ifdef identifier

is equivalent to

#if defined(identifier)

The code following the #ifdef is compiled if the given identifier is defined. The directive

#ifndef identifier

is equivalent to

#if !defined(identifier)

and is therefore the opposite of #ifdef.

You may freely nest #if, #ifdef, and #ifndef constructions. #else and #elif directives may follow #ifdef and #ifndef in the same way they are used with #if. #ifdef and #ifndef constructions end at an #endif.
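
A common use of #ifndef is to supply a default value for a manifest that may already have been defined elsewhere. The sketch below assumes a hypothetical manifest named BUFFER_SIZE.

```c
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 512    /* default when nothing else has defined it */
#endif

/* By this point BUFFER_SIZE is always defined, so the first branch
   is the one that is compiled. */
#ifdef BUFFER_SIZE
static char buffer[BUFFER_SIZE];
#else
static char buffer[1];
#endif
```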

8.2.6 The #include Directive

When the preprocessing facilities find a directive of the form

#include "filename"

the line is replaced by the entire contents of the specified file. For example,

#include "xxx"

is replaced by the contents of a file named xxx. The implementation searches for this file beginning in the directory that contains the original source file. If the file is not found there, the search continues through any directories named in Include= options on the compiler command line, then through a sequence of standard "include" directories under C_G8_SS/8CL3.2/INCLUDE.

If the directive takes the form

#include <filename>

as in

#include <stdio.h>

the implementation immediately searches through any directories named in StandardInclude= options on the compiler command line, then through the standard include directories. A file specified in this way (inside angle brackets) is called a header file.

It is valid to have a directive of the form

#include manifest

provided that the defined meaning of the manifest has one of the two previous forms of #include. For example, you could have

#define FILENAME "myfile"
#include FILENAME

Similarly, you may write

#include macro(arg,arg,...)

if the final result has one of the two accepted forms of the #include directive.
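
The manifest form also works with the angle bracket form of #include. In the sketch below, STRING_HEADER and name_length are hypothetical names; the header itself is the standard <string.h>.

```c
/* STRING_HEADER is a hypothetical manifest naming a standard header. */
#define STRING_HEADER <string.h>
#include STRING_HEADER

static size_t name_length(void)
{
    return strlen("myfile");   /* strlen is declared by <string.h> */
}
```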

A file that is obtained via #include may itself contain #include directives. In this way, #include directives may be nested.

The last line in an included file must end in a new-line.

8.2.7 The #line Directive

A directive of the form

#line number string

sets the value of __LINE__ to the given number and __FILE__ to the given string. This effectively makes the compiler begin numbering lines at the given number, and makes it believe that the name of the source file is the one given by string. For example,

#line 30 "newfile"

makes the compiler believe that the next source line is line 30 and the current input file name is newfile.

The string argument may be omitted; if so, __FILE__ is not changed.

The two arguments for #line may be macros or manifests. In this case, the arguments are expanded and then used.
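
The following sketch shows both forms, including a manifest used as the line number. RESTART and the variable names are hypothetical.

```c
#line 30 "newfile"
static const int   first_line = __LINE__;   /* this line is numbered 30 */
static const char *first_file = __FILE__;   /* "newfile" */

#define RESTART 100
#line RESTART
static const int   second_line = __LINE__;  /* this line is numbered 100 */
static const char *second_file = __FILE__;  /* unchanged: still "newfile" */
```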

8.2.8 The #error Directive

A directive of the form

#error message

issues the given message as part of a compiler diagnostic message. The message may be any sequence of tokens. #error also generates a compilation error, so that the compilation fails. For example,

#ifndef A
#error A is undefined
#endif

issues an error message if the macro or manifest A is not defined.

8.3 Implementation Specific Directives

In addition to the standard ANSI directives explained above, this version of C supports a number of other directives. These directives are not generally portable to other versions of C. All of these may be expressed as #pragmas if desired. For example, you may write either

#warn message
#pragma warn message

Although the directives that follow can be written as #pragmas, we recommend that you use the non-#pragma versions in non-portable code; that way, if you do try to port the code, you will receive error messages and be warned about the problem. This is the reason why we describe these in a separate section rather than putting them in with other #pragmas.

8.3.1 The #warn Directive

A directive of the form

#warn message

is similar to an #error directive, in that it prints out a compiler diagnostic message. The difference is that #error generates an error status causing the compilation to fail, while #warn just causes a warning. The message may be any token sequence.

The #warn directive is an extension to the ANSI standard.

8.3.2 The #equate Directive

The #equate directive has the form

#equate identifier text

where the identifier is a valid C identifier and text is any sequence of non-white space characters extending to the end of the input line. This directive states that the given text should be equated to the identifier in the list of extern items for the program.

For example, suppose that the program wishes to call an assembler routine named c.call. This cannot be done directly with a call of the form c.call() because c.call is not a valid C identifier (the "." is not accepted in names). Invoking the function can be done indirectly via

#equate newname c.call
    ...
newname();

In the list of extern variables that the compiler creates, c.call is equated to newname. Thus every reference to newname is replaced with a reference to c.call. In this way, c.call can be called indirectly.

Typically, the identifier in #equate is declared or defined somewhere in the source file that contains the #equate; if so, the #equate directive should precede the first declaration. The text (alternate name) in #equate may not be used as an identifier anywhere else in the program, except in another #equate directive.

If the identifier is not declared or defined somewhere in the source file, the #equate directive has no effect.

#equate directives are transitive. For example, the directives

#equate a b
#equate b c

have the side effect of equating a with c.

Note that there is no semicolon on the end of the #equate directive line.

This directive is an extension to the ANSI standard.

8.3.3 The #alias Directive

#alias C_name true_name alt_name

establishes alternate names (aliases) for a C function or object.

The first argument (C_name) is a valid C identifier. Typically, the source file contains a declaration or definition for this name.

The second argument (true_name) is considered to be the true name of the function or object associated with C_name. All references to C_name in this source file are replaced by references to true_name. However, true_name does not have to be a valid C identifier. For example, it may contain characters that are not valid in C identifiers.

The third argument (alt_name) is an alternate name associated with the function or object. The alternate name does not have to be a valid C identifier. This argument may be omitted if it is not needed.

If the first argument is defined in the source file that contains the #alias directive, all three names are externally visible. If the first argument is only referenced, all such references are changed to refer to the second (true) name.

As an example, consider

#alias c_rtn c.rtn c$rtn

Suppose this source file contains a definition for a function that the file calls c_rtn. The valid C name c_rtn is used throughout the file. However, all references to c_rtn are changed to refer to c.rtn. The name c$rtn is an alternate name for the function. All three names are externally visible and refer to the same function.

As a second example, consider

#alias c_obj c.obj c1_obj

and assume that the source file only references c_obj; it does not contain a definition for c_obj. In this case, all references to c_obj and c1_obj are changed into (external) references to c.obj.

Suppose a source file defines a symbol A and wants the symbol to be externally visible under both the names A and B. With

#alias A B

both names become externally visible. Inside the resulting object module, all references to A are automatically changed into references to B. The directive

#alias A A B

also makes A and B refer to the same function or object. However, in this case, all references to A or B are changed into references to A because its name appears in the true_name position.

It is not necessary for the source file to contain a declaration or definition for the C name that appears in an #alias directive. If the file contains neither, the #alias directive has no effect.

A source file is allowed to contain the same #alias directive several times.

This directive is an extension to the ANSI standard.

8.3.4 The #eval Directive

This directive has the form

#eval symbol expression

It associates the value of the expression with the given symbol. The expression must be a constant expression. It may contain macros, but the macros must yield constant expressions. For example,

#define PI 3.14159
#eval PI2 2*PI

This assigns the numeric value 6.28318 to PI2. This is not the same as

#define PI2 2*PI

#eval assigns the appropriate numeric value; #define assigns the text 2*PI. If you use #eval, the value of PI2 will not change once it is set. If you use #define, PI2 will change if PI changes.
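
Since #eval is specific to this compiler, the sketch below shows only the #define side of the comparison in portable C. Integer values are used, as an assumption made for illustration, so that the results are exact.

```c
#define PI 3
#define PI2 2*PI

static const int before = PI2;   /* expands to 2*3 */

#undef PI
#define PI 4

static const int after = PI2;    /* the same text now expands to 2*4 */
```

Had PI2 been set with #eval in place of the first #define, its value would have been fixed at that point and would not have followed the later change to PI.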

All integer values are converted to long for the purposes of calculation, and all floating point values are converted to long double.

If the expression contains any token that doesn't have a value at preprocessing time (for example, the name of a variable), the preprocessor uses a value of zero for that token.

8.3.5 The #secondary Directive

#secondary name

indicates that the given name should be created with a secondary SYMDEF rather than a primary one.

The given name may be an external identifier or a name defined in an #equate or #alias. For example, you could have

extern int a;
#secondary a
#equate b c
#secondary b
#secondary c

Notice that the #secondary directive may be applied to either the first or second argument of #equate or #alias.

8.3.6 The #aligned Directive

#aligned

may be used immediately before the definition or declaration of a function that returns a pointer result, particularly a (void *) pointer. It indicates that the pointer is suitably aligned for casting into any other pointer type.

For example, this directive precedes the prototype of malloc to indicate that malloc returns a pointer that can be cast into any other type. Without this directive, the compiler would generate a warning message indicating that the pointer might not be able to be cast into other types.

The compiler assumes you are correct and that the result is properly aligned. The compiler generates optimized code which will fail if the result is not properly aligned.

The #aligned directive is an extension to the ANSI standard.

8.3.7 The #noreturn Directive

#noreturn

is used immediately before the definition of a function that does not return (for example, a function that calls exit). It prevents warning messages that might otherwise be generated.

The compiler assumes you are correct about the function not returning, and generates code accordingly.

This directive is an extension to the ANSI standard.

8.3.8 The #optresult Directive

#optresult

is used immediately before the definition or declaration of a function that returns an "optional" result.

A result is considered optional if it is not central to the use of the function. For example, the result of printf is optional. The #optresult directive indicates that the program can safely ignore the result of the function.

This directive is an extension to the ANSI standard.

8.3.9 The #used Directive

#used NAME NAME ...

indicates that the named symbols should be considered "used" by the program. This avoids warning messages in code where some symbols are declared but not used. If any of the NAMEs are declared to be external, #used also creates SYMREFs for those NAMEs, to force corresponding SYMDEFs to be loaded when the program is being linked.

A second form

#used

may be put in front of source code that is apparently unreachable (for example, unlabeled code immediately after an unconditional goto). Normally, the compiler does not generate any object code for source code that is unreachable. However, #used tells the compiler to generate object code anyway, on the assumption that the user has created specialized assembler code that somehow accesses the unreachable instructions. The #used directive also suppresses the warning that is usually issued for unreachable code.

The #used directive is an extension to the ANSI standard.

8.3.10 The #varargs Directive

#varargs

is used immediately before the definition or declaration of a function that takes a variable number of arguments.

#varargs N

indicates that at least N arguments must be specified in every call to the function.

#varargs N M

indicates that at least N arguments must be specified, and that the first M arguments should be type-checked, to make sure that the type of value passed in a function call matches the type of argument expected. (Note that this form of #varargs is no longer needed, since the use of "..." in a function prototype provides the same information.)

#varargs printf

can be used immediately before the declaration of a function whose arguments are similar to printf. Such a function should have the general format

type func(args,format,...);

where args are zero or more required arguments, format is a printf-style format string, and "..." is the usual indication for a variable argument list. The #varargs directive tells the compiler to check the format string and the corresponding arguments for validity, in the same way that the compiler checks printf arguments to make sure that their types match the types of the placeholders inside the format string.

The directive

#varargs scanf

works the same way, except that it treats the format string like a scanf-style string and checks for validity with the arguments that follow.

Note that both the scanf and printf directive forms require the format string to be the last argument before the "...".
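
A function of the shape described above might be sketched as follows. The function format_into is hypothetical, and the directive itself appears only in a comment so that the sketch remains portable to other compilers.

```c
#include <stdarg.h>
#include <stdio.h>

/* On this compiler, a "#varargs printf" directive would be placed
   immediately before this declaration. */
int format_into(char *buffer, const char *format, ...);

int format_into(char *buffer, const char *format, ...)
{
    va_list args;
    int count;

    va_start(args, format);          /* format is the last named argument */
    count = vsprintf(buffer, format, args);
    va_end(args);
    return count;                    /* number of characters written */
}
```

Here buffer is the single required argument that precedes the format string, and the caller must supply a buffer large enough for the formatted text.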

The #varargs directive is an extension to the ANSI standard.

8.3.11 The #argsused Directive

#argsused

may immediately precede a function definition. It indicates that no messages of the form

Argument X not used

should be generated for that function.

This directive is an extension to the ANSI standard.

8.3.12 The #notreached Directive

#notreached

indicates that the current position in the source code is unreachable. For example, consider the following code.

void leave(int i)
{
    exit(i);
}

int f(int i)
{
    if (i > 0) return 5;
    leave(0);
#notreached
}

exit is a C library function that terminates a C program. Since the leave function calls exit, the position marked by #notreached can never be reached. If this position were not marked, the C compiler would think that there was a possibility that f would terminate without returning a value. The compiler would normally give a warning message to this effect, but #notreached indicates that the warning is unnecessary.

When the C compiler encounters a #notreached directive, it stops generating code until the next code construct that can be reached in some way (for example, a statement label or the statement following the end of the block). This means that if the #notreached directive was incorrectly used (in front of code that could actually be reached), the generated code will be incorrect.

The #notreached directive is an extension to the ANSI standard.

8.3.13 The #copyright Directive

#copyright TEXT

is one way of adding a copyright notice to your source code. The TEXT may be any string. #copyright has no effect on the way your program is compiled; in particular, it does not add any sort of copyright notice to the object code output.

The #copyright directive is an extension to the ANSI standard.

8.3.14 The #title Directive

#title TEXT

specifies the title of this program. The TEXT may be any string. The TEXT in #title will be used in the title field of the $OBJECT card associated with the compiled object program.

The #title directive is an extension to the ANSI standard.

8.3.15 The #version Directive

#version TEXT

specifies the version of this program. The TEXT may be any string. The first six characters of the TEXT in #version will be used in the "ttldat" field of the $OBJECT card.

The #version directive is an extension to the ANSI standard.

8.3.16 The #idempotent Directive

The directive

#idempotent

may be used anywhere in an #include file to indicate that the file does not need to be included more than once. If the preprocessor recognizes an attempt to include a file that was previously marked as idempotent, the preprocessor does not try to read the file again.

This directive can speed up compilation by reducing the number of times a popular #include file is read and processed. Note that the preprocessor simply remembers the name of the file that contained the #idempotent and doesn't try to read that file again; it's possible that the file will be read again anyway if it's referenced under a different name.

If it's important that you don't process the file more than once, you have to do more than just use #idempotent. For example,

#ifndef FILE_INCLUDED
#define FILE_INCLUDED
#idempotent
   /* body of include file */
#endif

makes sure that the contents of the file are only processed once, even if the file is read more than once.

The #idempotent directive is an extension to the ANSI standard.

8.3.17 Special Directives in Standard Headers

The standard headers (<stdio.h>, <string.h>, etc.) contain several special directives which are extensions to the ANSI standard. These are only intended for use in the standard headers; they are explained here because you may be curious about them.

The new directives are needed because the compiler now has built-in information about the symbols that are defined in the standard headers. The compiler doesn't have to read the headers themselves; it just has to be told to use the information that normally appears in the headers.

#protoset header

turns on all the definitions which are associated with a particular header. The header argument is given without its ".h" suffix. For example,

#protoset stdio

turns on all the definitions from <stdio.h>.

#proto_hide symbol

tells the compiler to forget the definition of a particular symbol. For example,

#proto_hide malloc

discards built-in information about malloc.

#proto symbol

turns on the definition of a single symbol. Note that this does not reverse the effects of #proto_hide. Once #proto_hide tells the compiler to discard built-in information, there's nothing that can bring that information back.

8.3.18 The #pragma Directive

The #pragma directive is provided by the ANSI standard to support directives that may not be found in other C compilers. The directive has the form

#pragma text

In this implementation of C, all #pragma directives correspond to other directives which are extensions to the ANSI standard. This means that any preprocessing directive of the form #word can be written as a #pragma, and vice versa. For example,

#varargs N
#pragma varargs N

have the same behavior. The difference between the two forms is that other compilers should not issue an error message if they find a #pragma they don't recognize, while they typically do give an error for other unrecognized directives.

As a result, we recommend that you use the #word form for directives that are crucial to the behavior of your program, and the #pragma form for ones that do not have a significant effect. Then when you port the code to another compiler, the compiler will ignore the insignificant #pragma directives, and warn you about the crucial unsupported #word ones.

Note: This compiler does issue a diagnostic message if it encounters an unrecognized #pragma directive; the same may be true of other compilers. However, the message is only a warning. It does not affect the success of the compilation.

The following #pragma directives are supported:

#pragma aligned
#pragma noreturn
#pragma optresult
#pragma used NAME NAME ...
#pragma used
#pragma varargs
#pragma argsused
#pragma notreached
#pragma copyright TEXT
#pragma title TEXT
#pragma version TEXT
#pragma idempotent
#pragma protoset header
#pragma proto_hide symbol
#pragma proto symbol

8.4 Trigraphs

Trigraphs let you represent special characters in source code if you cannot enter the characters in the normal way (for example, if you are working on a terminal which doesn't have the characters on the keyboard). They are similar to escape sequences; however, trigraphs are handled in the preprocessing phase while escape sequences are converted while parsing. In other words, trigraphs are used to represent characters in the source code, while escape sequences are used for characters in actual data (string and character constants). Most programs use escape sequences; only people who don't have full ASCII terminals need trigraphs.

All trigraphs begin with ??. Below we list the recognized trigraphs and the source code characters they represent.

??=    #
??(    [
??)    ]
??/    \
??'    ^
??<    {
??>    }
??!    |
??-    ~

If C finds two question marks followed by another character, and the combination is not recognized as one of the above trigraphs, the combination is just put into source code as is. For example, ??. remains two question marks followed by a dot.

If you want to use a literal character sequence that would normally be interpreted as a trigraph, write the second question mark as '\?'. For example,

?\?=

is interpreted as two question marks followed by an equal sign, not as the trigraph for '#'.
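
Inside a string literal, the escape might be used as in this minimal sketch (the array name marks is hypothetical):

```c
/* The \? escape keeps the two question marks from combining with
   the '=' that follows to form a trigraph. */
static const char marks[] = "?\?=";
```

The array holds the three characters ?, ? and =, followed by the terminating null character.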

8.5 Input Concatenation

When two or more string literals appear in input, separated only by white space, they are concatenated into a single string constant. For example,

x = "a" "b" "c";

is equivalent to

x = "abc";
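
This is most often useful for splitting a long string over several source lines. A minimal sketch, with hypothetical message text:

```c
#include <string.h>

/* Three literals separated only by white space (including new-lines)
   become one string constant. */
static const char usage[] =
    "usage: prog [options] file\n"
    "  -v   verbose output\n"
    "  -q   quiet\n";

static const char abc[] = "a" "b" "c";
```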

When the symbol ## is found in the definition of a #define directive, the two tokens on either side of the symbol are concatenated into one token, if possible. For example, if you have

#define POINT(A) 0. ## A

the macro call POINT(6) turns into 0.6. The ## operator may not appear as the first or last token in a definition. For example,

#define SOMETHING(A) ## A

is invalid.

If the two tokens cannot be concatenated into a single valid token, the concatenation does not take place. For example, if a #define directive contains

1 ## x

the result is

1 x

because 1x is not a valid token. However,

x ## 1

yields the identifier x1.
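
The sketch below shows the two successful cases in use. FIELD is a hypothetical macro that builds identifiers of the form field_1, field_2, and so on.

```c
#define POINT(A) 0. ## A
#define FIELD(N) field_ ## N     /* hypothetical: builds field_1, field_2, ... */

static const double six_tenths = POINT(6);   /* the single token 0.6 */
static const int FIELD(1) = 10;              /* declares field_1 */
static const int FIELD(2) = 20;              /* declares field_2 */

static int field_sum(void)
{
    return field_1 + field_2;
}
```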

8.6 Translation Phases

The order in which preprocessing operations are carried out has a significant effect on the compilation of source code. The following list divides the compilation process into separate phases.

  1. Characters in the physical source file are mapped into ASCII, if necessary. Trigraphs are replaced by single-character internal representations.
  2. Wherever a backslash is immediately followed by a new-line, the two characters are deleted, joining physical source lines into logical ones.
  3. The source file is decomposed into preprocessing tokens and sequences of white space characters (where comments are treated as white space). Each comment is replaced by a single space. New-line characters are retained.
  4. Preprocessing directives are executed and macro invocations are expanded. Headers or files obtained with #include are processed beginning with the first step above.
  5. Escape sequences in character constants and string literals are converted to single characters.
  6. Adjacent string literals are concatenated.
  7. White space characters separating tokens are no longer significant. Preprocessing tokens are converted into normal tokens. Unsuccessful conversion of a preprocessing token is equivalent to violation of a syntax rule. The resulting tokens are parsed and appropriate object code is generated.

The linking phase follows compilation. All external data and function references are resolved. Library elements are obtained to satisfy external references to functions and objects not defined in the user-written code. The translated parts of the program are joined to form a single program "image" that contains all the information required for execution of the program.
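
The ordering of the preprocessing phases listed above has visible consequences in source code. The following sketch (all names are hypothetical, and the last case assumes the ASCII character set) illustrates phases 2, 3, 5, and 6.

```c
/* Phase 2: each backslash-newline pair is deleted before the #define
   directive is executed in phase 4, so the macro body may span
   several physical lines. */
#define SUM3(A,B,C) \
    ((A) + \
     (B) + (C))

/* Phase 3: the comment is replaced by a single space, so "int" and
   "total" remain separate tokens. */
static int/* comment */total = SUM3(1, 2, 3);

/* Phases 5 and 6: the escape sequence \x41 is converted to a single
   character ('A' in ASCII) before the literals are joined, so the B
   that follows is an ordinary letter, not another hexadecimal digit. */
static const char joined[] = "\x41" "B";
```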

9. The C Library

Much of the work of C programs is handled by the library functions provided as part of the C software package. This chapter examines the most important features of the C library. We do not discuss every function in the library, nor do we give full details of every function discussed. You can get full descriptions of the functions in the C library using the EXPLAIN system. For example, "expl c lib printf" explains the printf function. References of this form abound in this chapter.

9.1 Library Concepts

In order to describe how C library functions work, we must begin by explaining some of the fundamental concepts that underlie the C library.

9.1.1 Headers

A header is a file supplied as part of the C package. A header may contain any or all of the following.

The information stored in a header is obtained with an #include directive of the form

#include <name>

where name is the name of the header whose information you want. For example,

#include <stdio.h>

obtains the information from a header named "stdio.h" (the header that is commonly required for I/O operations). All of the headers required by the ANSI standard have names that end in ".h".

Each header is a file containing C code. #include directives obtain code from these header files and insert it in program source files that need the information.

Normally, you should put any required header #include directives at the very beginning of a source file so that the header information is available to all the code in the file. It is not necessary to include a header more than once in a given scope, but it is not an error if you do so.

9.1.2 Functions and Macros

The prototype of any library function is declared in one and only one header. For this reason, if you want to call a particular library function, you should #include the header that declares the function's prototype.

An ANSI implementation is allowed to define any library routine as a macro. When this is the case, the program may not declare the routine as if it were a function; that causes an error. Because of this possibility, programs should never explicitly declare prototypes for library functions. Instead, they should #include the appropriate header. The explain files tell which library routines are actually implemented as macros.

You may issue an #undef directive that "undefines" the name of any library routine, as in

#undef getchar

This discards any macro definition for the routine name. If you now call the routine, you get a version that is implemented as a function. In other words, there must be a function definition for every library routine, even if there is also a macro definition. If you #undef the macro, your program uses the underlying function.
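
The same technique applies to any library routine. The sketch below uses isdigit from <ctype.h> rather than getchar, purely so that the example needs no input; digit_test and count_digits are hypothetical names.

```c
#include <ctype.h>

/* Discard any macro version so the name refers to the library function. */
#undef isdigit

/* With the macro gone, the function's address can be taken. */
static int (*digit_test)(int) = isdigit;

static int count_digits(const char *s)
{
    int count = 0;

    for (; *s != '\0'; s++)
        if (digit_test((unsigned char)*s))
            count++;
    return count;
}
```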

9.1.3 Standard Headers

The following list gives a rough description of each standard header and the items it declares.

assert.h
declares information used by the assert function.
ctype.h
declares a number of routines for testing characters (for example, testing if a letter is upper case) and for converting characters (for example, converting letters from upper to lower case).
errno.h
defines the errno symbol and its possible values.
float.h
defines a number of manifests that describe the way the implementation handles floating point numbers.
limits.h
defines a number of manifests describing aspects of the hardware (for example, how many bits in a byte).
math.h
declares a number of mathematical functions (for example, sin, log, sqrt).
setjmp.h
declares functions and data types that can jump out of one function and into another.
signal.h
declares functions and other symbols for exception handling. For example, <signal.h> must be included if an interactive program wants to be able to handle user interrupts.
stdarg.h
declares macros and data types used by functions that take variable length argument lists.
stddef.h
declares commonly used symbols (for example, NULL).
stdio.h
declares routines and types used in input and output.
stdlib.h
declares miscellaneous routines and symbols.
string.h
declares routines for string manipulation.
time.h
declares routines and symbols for obtaining the time and date in various forms.

9.1.4 Error Return Values

When a library function does not succeed in performing a requested operation, the function usually indicates that something has gone wrong by returning some value that could not be obtained from a successful operation. For example, when the getchar function is unable to read a character for some reason, it returns a value that is not a valid ASCII character. The calling function can test the return value to determine whether or not the library routine succeeded in its job.

9.1.5 The Errno Symbol

An error return value from a library routine tells the caller that an error was discovered. However, it usually doesn't tell the caller what kind of error was found. For this reason, ANSI C uses an external symbol named errno to provide additional information about errors detected by library functions.

errno behaves as if it were a volatile int variable. It does not have to be implemented this way (for example, it could be implemented as a macro that expanded to an int expression) but it must give a modifiable Lvalue. An appropriate declaration for errno is obtained with

#include <errno.h>

All source files that refer to errno should have this #include directive.

When a library function encounters an error, errno is assigned a positive integer value indicating what kind of error was found. For example, when a program attempts to take the square root of a negative number, the library routine doing the square root operation sets errno to a value that indicates "Invalid argument value".

When a program begins execution, errno is initialized to zero. From this point on, library routines do not touch errno except to assign error values. Note especially that they do not set errno to zero if an operation is successful. This means that a program that tests errno must set it back to zero itself, either after an error occurs or just before calling the library routine it wants to check.

9.1.6 Error Names

Programs use symbolic names to refer to the possible values of errno. These symbolic names are defined in various headers.

The ANSI standard only defines two possible symbolic values for errno. The symbol EDOM is used when a library function receives an argument value outside the domain on which the function is defined (for example, when the sqrt function is asked to take the square root of a negative number). The symbol ERANGE is used when the result of a library function cannot be represented in the result type (for example, when the exp function generates a result that is too large to be represented as a double value). Other possible values of errno are discussed in "expl c lib errno".
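
The following sketch shows one way of using errno as described above: clear it, call the library routine, then test it. It uses the ANSI routine strtol from <stdlib.h>, which sets errno to ERANGE when the converted value does not fit in a long. The helper name parse_long is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * parse_long is a hypothetical helper.  It converts the decimal
 * number in "text" and stores it in *result.  It returns 1 on
 * success and 0 if strtol reported a range error.
 */
int parse_long(const char *text, long *result)
{
    long value;

    errno = 0;                      /* discard any stale error value */
    value = strtol(text, NULL, 10);
    if (errno == ERANGE)
        return 0;                   /* value does not fit in a long */
    *result = value;
    return 1;
}
```

Setting errno to zero immediately before the call matters because a successful library call leaves any earlier error value in place.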

9.2 I/O Concepts

C attempts to make all I/O operations look the same, regardless of the device involved. This means that the same I/O functions are used for I/O on disk files, terminals, etc. C uses the generic term I/O stream for any file, device, or facility on which a program may perform I/O.

I/O streams usually must be opened before I/O can be performed on them. The actions taken by the opening operation depend on what kind of I/O device is being opened: for example, opening a disk file accesses the file and determines if the user has appropriate permissions to read or write the file, whereas opening a terminal performs different operations.

The work involved in opening a stream is transparent to the user program: the program calls the appropriate library function to perform an opening action, and the library function decides what work has to be done to prepare the stream for I/O.

When a stream has been opened successfully, the library routine that opened the stream returns a pointer to a collection of information that is needed for performing I/O on the stream. This information is stored in a data object whose type is FILE, a type defined with a typedef statement in <stdio.h>.

Most functions that perform I/O on a stream need to be passed a pointer to the FILE block describing the stream.

9.2.1 Standard Streams

Three streams are opened automatically when a C program begins execution. The library also defines pointer variables that point to the FILE information blocks associated with these streams. Below we list the pointer variable names and the streams with which they are associated.

stdin
is associated with the standard input stream.
stdout
is associated with the standard output stream.
stderr
is associated with the standard error stream.

These standard streams are used by a variety of library routines. For further information on these streams, see Section 9.2.12.

9.2.2 I/O Modes

C can perform I/O on files in two different modes: text mode and binary mode.

Text mode treats input and output as an ordered stream of bytes. Conceptually, I/O takes place character-by-character, although data is often buffered and there are functions that can read or write many bytes at a time. In this mode, data is often translated during the I/O process. For example, when the program asks to write a new-line character, the output function may write a carriage return followed by a new-line (linefeed) character to get the effect of going to a new line on a terminal screen. Because of this translation process, the characters that are read from a file may not be exactly the same as the characters written to the file.

Binary mode treats input and output as a collection of arbitrary bits. If a program writes data to a file and reads it back, the program reads back exactly what was written. However, this does not guarantee that the bits written out to a storage medium are exactly the bits specified by the program (raw I/O); the library may perform special processing on binary output, provided that the processing is reversed on binary input. Binary mode only guarantees that the write/read-back process is transparent, not that I/O takes place in a raw, unprocessed way.

9.2.3 Buffering

The C I/O routines often buffer input and output. Buffering input means that the library function that reads input actually reads many input characters at once, even if it only returns a single character to the calling program. The remaining characters are stored in an area of memory called a buffer.

Each time the library function is called, it returns a new character from the buffer. In this way, the C program may make many calls to the input function, but there is only one physical input operation until every character in the buffer has been used.

Buffering output is similar: the library functions which are called to write out characters actually accumulate the material in a memory buffer until the buffer is full. The output may be written before the buffer is full in some cases.

9.2.4 Standard I/O Routines

All input and output operations in C are performed with library functions. These functions are written in C, assembler, or some other language. Internally, they are machine dependent but the format for calling the routines is the same from one version of C to another.

Prototypes for all I/O routines may be obtained with the directive

#include <stdio.h>

Programs should not contain their own declarations for I/O routines.

9.2.5 Opening Files

The standard streams (stdin, stdout, stderr) are opened automatically as text streams when a program begins execution. To perform I/O on any other stream, you must first open the stream explicitly. This is done by a call to the library function fopen. It has the prototype

FILE *fopen(const char *filename,
            const char *options);

where FILE is the special type declared in <stdio.h>, filename is a string giving the name of the file you want to open, and options is a string telling how the file should be opened. As an example,

#include <stdio.h>
   ...
FILE *f;
f = fopen("raheinlein/tomorrow/file","r");

opens the given file for reading as a text stream.

As shown above, fopen returns a pointer to FILE data. The program stores this pointer in the variable f (also declared as a pointer to FILE data).

The second argument of fopen is a string giving options for opening the stream. The following options are the most commonly used.

"r"
Opens the file for reading as a text stream.
"rb"
Opens the file for reading as a binary stream.
"w"
Opens the file for writing as a text stream. The old contents of the file are discarded. If the named file does not already exist, it is created.
"wb"
Opens the file for writing as a binary stream. The old contents of the file are discarded. If the named file does not already exist, it is created.
"a"
Opens the file for appending as a text stream. This means that the file is opened for writing, but the old contents of the file are not discarded. If data is written to the file, it is appended on the end of the current contents. If the named file does not already exist, it is created.
"ab"
Opens the file for appending as a binary stream. Otherwise, it is identical to the "a" option.

The value returned by fopen is a pointer to FILE data. This pointer is used when calling I/O routines to read or write the opened file. If fopen fails to open the requested file (for example, if you don't have appropriate permissions to open the specified file), fopen returns a null file pointer. This means that you should always check for a NULL pointer to see if the open operation was successful, as in

#include <stdio.h>
    ...
FILE *f;
    ...
f = fopen("file","r");
if (f == NULL) {
   /* perform error processing */
}
else
{
   /* proceed with normal work */
}

fopen can also be used to open a string instead of a file or device. To tell fopen that it is opening a string, put an 's' on the end of the options argument, as in

#include <stdio.h>
    ...
FILE *f;
char s[100];
f = fopen(s,"ws");

In this case, the first argument is a pointer to the beginning of the string. Once the string has been opened for I/O, all the usual text I/O functions may be used to read from or write to the string.

Note that opening a string for I/O is an extension to the ANSI standard. This version of fopen has many other extensions to the ANSI standard; for further information, see "expl c lib fopen".

9.2.6 Closing Files

When you have finished performing all desired I/O on a file, the file should be closed using the fclose function. It has the prototype

int fclose(FILE *fp);

The argument for fclose is always a pointer to the FILE structure for the stream you want to close. The result of fclose is zero if the stream is successfully closed, and non-zero otherwise.

It is not absolutely necessary to execute fclose on every open file, since open files are automatically closed when program execution terminates. However, closing files frees internal memory space for other uses, and thus can keep your program down to a reasonable size.
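
Putting fopen and fclose together, a routine that writes a file might be sketched as follows. The helper name save_text and its return convention are assumptions made for this illustration; the only library facts relied on are that fopen returns a null pointer on failure, fputs returns EOF on a write error, and fclose returns non-zero on failure.

```c
#include <stdio.h>

/*
 * save_text is a hypothetical helper.  It writes "text" to the
 * named file, discarding any old contents.  It returns 0 on
 * success and -1 if the file cannot be opened, written, or closed.
 */
int save_text(const char *filename, const char *text)
{
    FILE *f;
    int failed;

    f = fopen(filename, "w");
    if (f == NULL)
        return -1;                      /* open failed */
    failed = (fputs(text, f) == EOF);   /* EOF indicates a write error */
    if (fclose(f) != 0)                 /* non-zero: close failed */
        failed = 1;
    return failed ? -1 : 0;
}
```

Checking the result of fclose is worthwhile on output files, because buffered data may not actually be written until the file is closed.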

9.2.7 The getc Function

The most basic input function is getc, which reads a single character. It has the prototype

int getc(FILE *fp);

fp must point to a file that has already been opened for reading. The result of getc is a character read from the stream. Note that this is returned as an int value, not a char.

If the given file is at end of file, getc returns a special int value declared as EOF in <stdio.h>. This value cannot be represented as a char, so it can never be confused with a valid input character. Thus,

#include <stdio.h>
     ...
int c;
while ( ( c=getc(fp) ) != EOF ) {
     ...
}

loops around reading one character at a time into c until it reaches the end of the file.

9.2.8 The putc Function

The most basic output function is putc. It outputs a single character. It has the prototype

int putc(int c,FILE *fp);

putc outputs the character in c to the stream indicated by fp. This file must have been opened previously. The result of putc is the character it prints out. Thus an expression like

FILE *f;
char c, d[100];
c = putc( d[3], f);

outputs the character d[3] and also assigns it to c.
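
As a sketch of how getc and putc work together, the following function copies one stream to another a character at a time. The name copy_stream and its return convention are hypothetical. It relies on putc returning EOF when a write error occurs.

```c
#include <stdio.h>

/*
 * copy_stream is a hypothetical helper.  It copies characters from
 * "in" to "out" until end of file and returns the number of
 * characters copied, or -1L if putc reports an error.
 */
long copy_stream(FILE *in, FILE *out)
{
    int c;              /* int, not char, so EOF can be recognized */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1L;
        ++count;
    }
    return count;
}
```

Both streams must already be open, "in" for reading and "out" for writing.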

9.2.9 Reading and Writing Strings

To read in a string, use the fgets function. It has the prototype

char *fgets(char *s, int N, FILE *fp);

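fp points to an input file (or string); N is a number telling how many characters to read; and s points to an area in memory where the input data should be stored. fgets returns a pointer to the string read in; this pointer is identical to s. If end of file is reached before any characters are read, or if a read error occurs, fgets returns a null pointer instead.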

fgets reads characters until it encounters a new-line or it has read a total of N-1 characters, whichever comes first. The last character read into s is followed by a null character '\0'. If fgets stops reading because it encounters a new-line, the new-line is included in the string s.

To write out a string, use the fputs function. It has the prototype

int fputs(const char *s,FILE *fp);

fp points to an output stream, and s points to the string that should be written.

fputs keeps writing characters until it encounters a null character '\0' marking the end of the string. It does not output this closing '\0'.

fputs doesn't automatically put a new-line character on the end of the string printed out. If you want a new-line, and the output string does not already end with one, you must output it explicitly, as in

fputs(s,fp);
putc('\n',fp);

9.2.10 Formatted Output: fprintf

More complicated I/O operations can be performed using the functions fscanf and fprintf. Both these functions use format strings to control the I/O operations.

fprintf outputs a number of values in ASCII form. It has the prototype

int fprintf(FILE *fp, const char* format, ...);

fp points to the stream that is to receive the output; the variable argument list gives expressions whose values are to be output; and format dictates the form in which these values should be written. The result of fprintf is the number of characters output.

The format string contains ordinary characters (which are just copied to the output stream) and placeholders (which dictate output formats for the arguments that follow). Each placeholder consists of a '%' character followed by one or more other characters specifying conversions. For example, %d is a placeholder indicating that an int value should be output in decimal format while %o indicates octal format. This means that

fprintf(fp, "%d + %d = %d;\n", 4, 4, 4+4);

prints out

4 + 4 = 8;

In the output, the %d placeholders are replaced by the corresponding argument values. Other characters in the format string are copied as they appear. Note that a new-line character had to be specified explicitly as '\n' so that the output line ended with a new-line.

fprintf(fp, "%o + %o = %o;\n", 4, 4, 4+4);

uses octal output and prints out

4 + 4 = 10;

Below we list a few recognized placeholders. This is not a complete list, but does give an idea of the type of formats available.

%c
prints a char value.
%d
prints an integer in its decimal representation.
%f
prints a double argument in the style
[-]ddd.ddd
%e
prints a double argument in the style
[-]d.ddde+dd
In other words, %e uses scientific notation, with one digit before the decimal point and a signed exponent (+ or -) of at least two digits at the end of the number.
%g
prints a double argument in either %e or %f format, whichever gives full precision in the least space.
%o
prints an integer in its octal representation.
%s
prints a string. The corresponding argument is taken as a pointer to an array of char. The string begins at the location indicated by the argument and ends when a '\0' character is encountered.
%u
prints an unsigned integer in decimal format.
%x
prints an integer in its hexadecimal representation.

It is possible to dictate the printing precision for floating point numbers, as well as minimum lengths for most output fields. For full details, see "expl c lib fprintf".

The effect of fprintf is the same as would be obtained with calls to putc for each output character.

9.2.11 Formatted Input: fscanf

fscanf is the counterpart of fprintf; it uses a format string to determine how to read input. fscanf has the prototype

int fscanf(FILE *fp, const char *format, ...);

fp indicates the stream that should be read and format is a string indicating how input should be interpreted. The format string for fscanf is similar to that for fprintf. fscanf normally skips over white space (blanks, tabs, and new-lines) as it is searching for numeric input. The variable argument list gives pointers to memory locations where input data should be stored. For example,

fscanf(fp,"%d",&i);

reads in a decimal integer and assigns it to the variable i. If the input stream contains the ASCII characters 10, i would be assigned the value 10.

fscanf(fp,"%o",&i);

is almost the same, but the integer is assumed to be in octal. If the file contains 10, i would be assigned the value 8.

Suppose a file contains the line

78  3.645    beeblebrox

The lines

int j; float y; char str[100];
FILE *fp;
fscanf(fp,"%d%f%s",&j,&y,str);

put the decimal number 78 into j, the floating point number 3.645 into y, and the string beeblebrox into the char array str. (Note the way that str is treated as a pointer in the call to fscanf.)

There are a good many details about fscanf that have not been covered here. These details are given in "expl c lib fscanf".
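
One detail worth showing is the result of fscanf: it returns the number of input items successfully converted and stored, so a program can tell whether a line had the expected form. The sketch below reads the kind of line used in the example above. The helper name read_record is hypothetical, and the field width in %99s is added here so that a long word cannot overflow a 100-character array.

```c
#include <stdio.h>

/*
 * read_record is a hypothetical helper.  It reads an integer, a
 * floating point number, and a word of at most 99 characters.
 * It returns 1 if all three items were converted, 0 otherwise.
 */
int read_record(FILE *fp, int *j, float *y, char str[100])
{
    /* fscanf returns the number of items assigned */
    return fscanf(fp, "%d%f%99s", j, y, str) == 3;
}
```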

9.2.12 Redirecting Standard Streams

The default stdin and stdout can be redirected by specifications on the command line. The construct

<file

on the command line changes the standard input stream from the terminal to the named file. Similarly, the construct

>file

on the command line changes the standard output stream from the terminal to the named file. Output sent to stdout writes over whatever is currently in the given file. The construct

>>file

is similar to >file but appends output to the current contents of the file instead of overwriting it.

Below we give a few examples of how redirection works. If a program is invoked with the line

prog

stdin and stdout both refer to the terminal.

prog <in1

changes stdin to refer to the file in1; stdout still refers to the terminal.

prog >out1

changes stdout to refer to the file out1, while stdin refers to the terminal.

prog <in2 >out2

makes stdin refer to the file in2 and stdout refer to the file out2. Order of specification is not important.

prog >out2 <in2

is equivalent to the previous command line.

stdin and stdout should never be opened with fopen, even if they are redirected as shown above.

<stdio.h> declares a number of special functions for I/O on the standard input and output streams. The following lists some of these functions and their equivalents.

getchar()             ==     getc(stdin)
putchar(c)            ==     putc(c,stdout)
puts(s)               ==     fputs(s,stdout); putc('\n',stdout)
printf(fmt,a,a,...)   ==     fprintf(stdout,fmt,a,a,...)
scanf(fmt,a,a,...)    ==     fscanf(stdin,fmt,a,a,...)

Note that puts writes a new-line after the string, whereas fputs does not.

There is also a gets function that is similar to fgets; however,

gets(s)

reads a line from stdin and deletes the new-line on the end, while

fgets(s,N,stdin)

includes the new-line in the input. The gets function also assumes that you have enough memory at s to hold the input string, no matter how big it is.
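
Because gets cannot limit the amount of input it stores, a bounded replacement built on fgets is often preferable. The following is a minimal sketch; the name read_line is hypothetical. It relies on fgets returning a null pointer at end of file and on strchr from <string.h> to find the new-line.

```c
#include <stdio.h>
#include <string.h>

/*
 * read_line is a hypothetical helper.  It reads one line of at most
 * size-1 characters from fp into buf and removes the trailing
 * new-line, if there is one.  It returns buf, or a null pointer at
 * end of file or on a read error.
 */
char *read_line(char *buf, int size, FILE *fp)
{
    char *nl;

    if (fgets(buf, size, fp) == NULL)
        return NULL;
    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';         /* delete the new-line, as gets would */
    return buf;
}
```

A call such as read_line(s, sizeof(s), stdin) then behaves much like gets(s), except that a line too long for s is read in pieces instead of overrunning the array.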

The standard error stream stderr cannot be redirected on the command line; it is always attached to the terminal. Traditionally, error messages and other communications intended to go directly to the user are written to stderr. In this way they can be certain of going to the terminal where they can be seen, even if the standard output stream stdout has been redirected on the command line.

9.3 String Manipulation Functions

The C library contains several functions for manipulating strings. In the descriptions that follow, we assume that the ends of all strings are marked by a null character '\0'.

char *strcat(char *s1,const char *s2);
strcat appends a copy of string s2 to the end of string s1. The memory allocated to s1 must be large enough to hold the new longer string or else errors may result. strcat returns a pointer to the new string (that is, s1).
char *strncat(char *s1,const char *s2,size_t N);
strncat appends at most N characters of s2 to the end of s1. Otherwise, it behaves like strcat.
int strcmp(const char *s1,const char *s2);
strcmp compares its string arguments. If s1 is lexicographically greater than s2, strcmp returns a positive integer. If s1 is lexicographically less than s2, strcmp returns a negative integer. If the two strings are identical, strcmp returns zero.
int strncmp(const char *s1,const char *s2,size_t N);
strncmp works in the same way as strcmp but only examines a maximum of N characters from the two strings.
char *strcpy(char *s1,const char *s2);
strcpy copies s2 into the area pointed to by s1. The copy process ends when the trailing null character of s2 has been copied. strcpy returns a pointer to string s1.
char *strncpy(char *s1,const char *s2,size_t N);
strncpy copies exactly N characters into the area pointed to by s1. If s2 is N characters long or longer, only the first N characters are copied and no '\0' is put on to the new s1. If s2 is shorter than N characters, s1 is padded to a length of N with trailing '\0' characters. strncpy returns a pointer to string s1.
size_t strlen(const char *s1);
strlen returns the length of s1, not counting the terminating '\0'.
char *strchr(const char *s1,int c);
strchr returns a pointer to the first occurrence of the character c in the string s1. If c is not found in the string, strchr returns a null pointer.
char *strrchr(const char *s1,int c);
strrchr returns a pointer to the last occurrence of the character c in the string s1. If c is not found in the string, strrchr returns a null pointer.
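
The following sketch shows how some of these routines might be combined safely. Both helper names (copy_bounded and join_words) are hypothetical. The first works around the fact that strncpy does not always write a terminating '\0'; the second uses strlen to check that the destination is large enough before calling strcpy and strcat.

```c
#include <string.h>

/*
 * copy_bounded is a hypothetical helper.  It copies at most size-1
 * characters of src into dst and always terminates the result.
 */
char *copy_bounded(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return dst;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';       /* strncpy may not have added one */
    return dst;
}

/*
 * join_words is a hypothetical helper.  It builds "first second"
 * in dst, which has room for size characters.  It returns 1 on
 * success and 0 if the result would not fit.
 */
int join_words(char *dst, size_t size,
               const char *first, const char *second)
{
    /* two strings, one blank, and the terminating '\0' */
    if (strlen(first) + 1 + strlen(second) + 1 > size)
        return 0;
    strcpy(dst, first);
    strcat(dst, " ");
    strcat(dst, second);
    return 1;
}
```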

9.4 Memory Allocation Functions

C programs may allocate and free blocks of memory using the functions malloc and free. malloc has the prototype

void *malloc(size_t Nbytes);

Nbytes is the number of bytes of memory desired. malloc returns a pointer to a block of exactly that size, allocated on an alignment boundary suitable for any data type. This memory can then be used for any purpose. If the requested memory cannot be obtained, malloc returns a null pointer.

free has the prototype

void free(void *ptr);

where ptr is a pointer to a block of memory previously allocated by malloc. The space is then made available for allocation through future calls to malloc.

Another way to allocate memory is via the function calloc. It has the prototype

void *calloc(size_t N,size_t Size);

This allocates enough space for N objects, each of which has a size of Size bytes. For example,

ptr = calloc( 100, sizeof(double) );

allocates enough space for an array of 100 objects of type double. The space allocated by calloc is initialized to 0-bits.

Dynamic allocation of memory through malloc and calloc is particularly useful for local arrays. If arrays are allocated by auto declarations, their space is obtained from an internal stack, and the size of this stack is limited. It is more practical to allocate them through malloc to save stack space.
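
As an illustration, the sketch below allocates an array with malloc instead of declaring it auto. The helper name make_ramp is hypothetical. The code checks for a null pointer, which malloc returns when the memory cannot be obtained, and leaves it to the caller to release the array with free.

```c
#include <stdlib.h>

/*
 * make_ramp is a hypothetical helper.  It allocates an array of n
 * doubles holding the values 0.0, 1.0, ..., n-1.  It returns a
 * null pointer if the memory cannot be allocated.  The caller
 * should release the array with free when finished with it.
 */
double *make_ramp(size_t n)
{
    double *p;
    size_t i;

    p = (double *)malloc(n * sizeof(double));
    if (p == NULL)
        return NULL;
    for (i = 0; i < n; ++i)
        p[i] = (double)i;
    return p;
}
```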

9.5 Using Variable Argument Lists

Variable argument lists let you define functions whose arguments are not fixed. Each time such a function is called, there may be a different number of arguments and the arguments may have different types.

If a source code file contains the definition of a function that uses a variable argument list,

#include <stdarg.h>

must appear before the first variable argument function.

To show how to use variable argument lists, suppose we begin a function definition with the prototype

int f(char *str,int count,...)

In order to access the arguments in the variable part of the list, you must first have a data structure to represent those arguments. To do this, you declare a variable of the type va_list, as in

va_list ap;

The va_list type is defined in <stdarg.h>.

Next you must associate this variable with the variable argument list. This is done with a call to the macro va_start. It has the prototype

void va_start(va_list ap,lastparm);

where lastparm is the last fixed parameter for the function. In our example of f, we would say

va_start(ap,count);

because count is the parameter preceding the "..." in the function prototype.

After the call to va_start, you use a macro called va_arg to obtain arguments one by one from the variable list. The macro is used with calls of the form

arg = va_arg(ap,type);

where ap is the variable that represents the variable list, type is the type of the next argument in the variable list and arg is a variable of that type. The assignment statement above obtains the value of the next argument on the list and assigns it to arg.

Remember that argument values specified in the variable part of a variable argument list undergo the default argument conversion rules. This means that char values are automatically converted to int, and float values are automatically converted to double. Thus you should never specify char or float for the type of va_arg, because these types will never be used for variable arguments.

When you have obtained all the arguments from the variable list, you must clean things up. This is done with a call to va_end, as in

va_end(ap);

This indicates that you are finished using the ap variable. Future calls to va_arg will not work properly unless you initialize ap again with a call to va_start.

Note that you can "walk" through the variable list several times. To end a walk, use va_end. To start a new walk, use va_start.

The variable ap may be passed as an argument to other functions. Therefore, your calls to va_arg may be spread over many functions. You must always call va_end to clean things up before the original function terminates.

Example:

  /*
   * This function determines the maximum integer
   * in a list of numbers.   It should be called
   * with
   *     max = maxint(count,value,value,value,...)
   * where "count" is the number of "values" given.
   * If "count" is zero, the result is INT_MIN.
   */
  #include <limits.h>
  #include <stdarg.h>
  int maxint(int N,...)
  {
       va_list ap;
       int arg,ourmax;
       va_start(ap,N);
       ourmax = INT_MIN;   /* no value can be smaller */
       while (N-- > 0) {
           arg = va_arg(ap,int);
           ourmax = (ourmax > arg) ? ourmax : arg;
       }
       va_end(ap);
       return(ourmax);
  }
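
As noted above, the variable ap may be passed to another function. The following sketch shows one way this might look; the names sum and sum_list are hypothetical. The function that calls va_start is also the one that calls va_end.

```c
#include <stdarg.h>

/*
 * sum_list is a hypothetical helper.  It adds up "count" int
 * arguments taken from an argument list that the caller has
 * already initialized with va_start.
 */
static int sum_list(int count, va_list ap)
{
    int total = 0;

    while (count-- > 0)
        total += va_arg(ap, int);
    return total;
}

/*
 * sum is called as
 *     total = sum(count,value,value,...)
 */
int sum(int count, ...)
{
    va_list ap;
    int total;

    va_start(ap, count);
    total = sum_list(count, ap);
    va_end(ap);             /* clean up in the original function */
    return total;
}
```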

9.6 Signals

Signals indicate the occurrence of special events. For example, if a C program attempts an illegal operation like dividing by zero, the hardware or software that detects the error sends an appropriate signal to the program. Different signals indicate different kinds of events. Each possible signal is represented by a manifest whose name begins with SIG. These manifests are defined in <signal.h>.

When a program receives a signal, normal execution is interrupted and a function known as a signal handler is called. A program can have a different signal handler for every type of signal, or it may use the same handler to deal with several different signals. A program uses the signal function to name the signal handler function that should be invoked if a particular signal occurs.

Signal handler functions should be short. This minimizes the likelihood of a new signal arriving while a signal handler is dealing with the previous signal.

Even if the handler function is short, there is still a chance a new signal may come in while the handler is executing. For example, a user may enter several interrupt signals in a row. As a result, signal handler functions must be written so they can be interrupted and start over again from the top. In the language of computer science, signal handlers must be re-entrant. To be re-entrant, signal handlers must abide by the following rules.

  1. They may not define local variables that have the static storage class.
  2. They may not use any library routines, with the exception of abort, exit, longjmp, and/or signal. (Note: there may be other library routines that can be used with some implementations. However, the routines just listed are the only ones that the ANSI standard requires to be usable in re-entrant code.)
  3. They may not change the value of any external data object, except for data objects with the type sig_atomic_t. This is an integral data type defined in <signal.h>. It is the only type that is guaranteed "safe" to change. Note that the function can examine the value of any external object of any type; the function is only prohibited from changing the value.

As is obvious from this list, a signal handler function has a limited set of actions that it can perform safely. It can terminate the program using abort or exit; it can jump to another function using longjmp; or it can assign a value to a sig_atomic_t object and return.

When a signal handler function returns, it normally returns to the instruction that was executing at the time the signal was received and execution resumes from that point. However, if the signal was SIGFPE (indicating an error calculating an expression), the effect of returning from a signal handler is undefined.

If a program does not use signal to set up a handler function for a particular signal, the signal is handled by a default handler referred to as SIG_DFL. On GCOS8, SIG_DFL just issues an error message and aborts the program.

Except for the SIGILL signal (illegal opcode), the first thing that happens when a signal is raised is that C issues the function call

signal(sig,SIG_DFL)

for the signal. As a result, the next time the signal occurs, it is handled by the default handler. In other words, a user signal handler is only set up for one occurrence of the related signal. Often then, a signal handler function calls signal to set itself up to handle the next occurrence of the signal (as well as the occurrence that just took place).
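
A handler that follows these rules might be sketched as shown below. The names interrupted, on_interrupt, and install_interrupt_handler are hypothetical. The handler first calls signal to set itself up again, then assigns a value to a sig_atomic_t object and returns; it does nothing else.

```c
#include <signal.h>

/* Set to 1 by the handler; the main program tests and clears it. */
volatile sig_atomic_t interrupted = 0;

/* Hypothetical handler for user interrupts. */
void on_interrupt(int sig)
{
    signal(sig, on_interrupt);  /* handle the next occurrence too */
    interrupted = 1;            /* the only data the handler changes */
}

/*
 * install_interrupt_handler is a hypothetical helper.  It returns 1
 * if the handler was set up for SIGINT and 0 if signal failed.
 */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) != SIG_ERR;
}
```

The main program would call install_interrupt_handler once, then examine interrupted at convenient points in its processing and reset it to zero after responding.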

As this brief introduction shows, signal handling is a conceptually difficult subject. For more information, see

expl c lib longjmp
expl c lib signal
expl c lib raise

9.7 Miscellaneous Routines

The exit routine terminates program execution. This is a simple way of stopping a program before it would finish naturally. For example, if an error occurs one might say

fprintf(stderr,"FATAL ERROR.\n");
exit(-1);

This terminates the program and returns a status of -1 to whatever invoked the program. In the process, any partly filled I/O buffers are flushed and all open files are closed.

The ANSI standard defines two special manifests which can be used as portable arguments for exit:

EXIT_SUCCESS
stands for a value which indicates successful program completion.
EXIT_FAILURE
stands for a value which indicates program failure of some kind.

These symbols are guaranteed to be portable, because they must be defined by every ANSI C compiler. If you have a call to exit which specifies some different value (e.g. -1, as in the preceding example), the program may not be portable to other systems, since the return value may have different meanings on different platforms.
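
As a small sketch of the portable style, the function below returns one of the two manifests so that its result can be passed directly to exit. The name check_readable is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * check_readable is a hypothetical helper.  It reports whether the
 * named file can be opened for reading, and returns a status that
 * is suitable as an argument for exit.
 */
int check_readable(const char *filename)
{
    FILE *f;

    f = fopen(filename, "r");
    if (f == NULL) {
        fprintf(stderr, "Cannot open %s.\n", filename);
        return EXIT_FAILURE;
    }
    fclose(f);
    return EXIT_SUCCESS;
}
```

A program could then finish with a statement such as exit(check_readable(name)); where name is the file it was asked to examine.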

Appendix A: Escape Sequences

Escape sequences are used in character constants and strings to obtain characters which for one reason or another are hard to represent directly. Here are the escapes.

\a  	beep (octal 007)
\b  	backspace
\f  	ASCII formfeed
\n  	new-line
\r  	carriage return (no line feed)
\t  	horizontal tab
\v  	vertical tab
\"  	"
\'  	'
\\  	\ (backslash)
\nnn 	character with octal code nnn (1-3 octal digits)
\xnnn 	character with hexadecimal code nnn (no limit on digits)

Appendix B: Characteristics Files

The ANSI standard requires two headers that define the "characteristics" of an implementation of C. For example, these headers tell how many bits are in a byte, the maximum size of an int value, and so on. The header <limits.h> describes general aspects of the machine; <float.h> describes the nature of floating point arithmetic.

B.1 Limits

Below we list the symbols defined in <limits.h>.

CHAR_BIT
number of bits in a byte.
CHAR_MAX
maximum value of a char object.
CHAR_MIN
minimum value of a char object.
INT_MAX
maximum value of an int object.
INT_MIN
minimum value of an int object.
LONG_MAX
maximum value of a long object.
LONG_MIN
minimum value of a long object.
SCHAR_MAX
maximum value of a signed char object.
SCHAR_MIN
minimum value of a signed char object.
SHRT_MAX
maximum value of a short object.
SHRT_MIN
minimum value of a short object.
UCHAR_MAX
maximum value of an unsigned char object.
UINT_MAX
maximum value of an unsigned int object.
ULONG_MAX
maximum value of an unsigned long int object.
USHRT_MAX
maximum value of an unsigned short int object.
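
One common use of these symbols is to guard arithmetic against overflow without assuming a particular word size. The sketch below is one way to do this; checked_add is a hypothetical name, not a library routine.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Adds a and b. Returns 1 and stores the result in *sum if it fits in
   an int; returns 0 and leaves *sum alone if the addition would
   overflow. The tests are arranged so that they cannot overflow. */
int checked_add(int a, int b, int *sum)
{
    assert(sum != NULL);
    if (b > 0 && a > INT_MAX - b)
        return 0;               /* result would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return 0;               /* result would fall below INT_MIN */
    *sum = a + b;
    return 1;
}
```

Because the limits come from <limits.h>, the same source should behave sensibly whether an int is 36 bits or some other size.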

B.2 Floating Point Characteristics

<float.h> uses the following terms for describing a floating point number X.

SIGN
is +1 or -1.
BASE
is the base of the exponential representation of X. This is an integer greater than 1.
EXP
is the exponent of X.
EMAX
is the maximum value of an exponent.
EMIN
is the minimum value of an exponent.
PREC
is the precision of X: the number of BASE digits in the mantissa.
DIG1,DIG2,...
are the digits of the mantissa. All of these digits are base BASE.

The value X is equal to the SIGN, times the BASE to the EXP, times the mantissa.
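
The sketch below evaluates this description for a given set of digits. It assumes the convention of the ANSI standard's model, in which the mantissa is a fraction and DIG1 is the first digit after the radix point; the function and variable names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sample mantissa digits: binary .101 (that is, 5/8). */
static const int binary_101[3] = { 1, 0, 1 };

/* Computes SIGN * BASE**EXP * mantissa, where the mantissa is
   DIG1/BASE + DIG2/BASE**2 + ... for `prec` digits. */
double model_value(int sign, int base, int exponent,
                   const int *dig, int prec)
{
    double mantissa = 0.0;
    double weight = 1.0;
    double scale = 1.0;
    int k;

    assert(sign == 1 || sign == -1);
    assert(base > 1 && prec > 0 && dig != NULL);

    for (k = 0; k < prec; k++) {
        weight /= base;                 /* BASE to the power -(k+1) */
        mantissa += dig[k] * weight;
    }
    for (k = 0; k < exponent; k++)      /* non-negative exponents */
        scale *= base;
    for (k = 0; k > exponent; k--)      /* negative exponents */
        scale /= base;

    return sign * scale * mantissa;
}
```

With SIGN +1, BASE 2, EXP 3 and the digits 1, 0, 1, the mantissa is 5/8 and the value is 8 times 5/8, which is 5.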

Below we list the symbols defined in <float.h>.

DBL_DIG
the number of decimal digits of precision in a double value.
DBL_EPSILON
the smallest positive double value that can be added to 1.0 to get a value different from 1.0.
DBL_MANT_DIG
the number of base BASE digits in the mantissa of a double value (i.e., the PREC).
DBL_MAX
the largest double value.
DBL_MAX_10_EXP
the largest integer such that 10 raised to that power is representable as a double value.
DBL_MAX_EXP
the largest integer such that BASE raised to that power minus 1 is representable as a double value.
DBL_MIN
the smallest normalized positive double value.
DBL_MIN_10_EXP
the smallest negative integer such that 10 raised to that power is still in the range of normalized double values.
DBL_MIN_EXP
the smallest negative exponent for a double value (i.e., EMIN).
FLT_DIG
the number of decimal digits of precision in a float value.
FLT_EPSILON
the smallest positive float value that can be added to 1.0F to get a value different from 1.0F.
FLT_MANT_DIG
the number of base BASE digits in the mantissa of a float value (i.e., the PREC).
FLT_MAX
the largest float value.
FLT_MAX_10_EXP
the largest integer such that 10 raised to that power is representable as a float value.
FLT_MAX_EXP
the largest integer such that BASE raised to that power minus 1 is representable as a float value.
FLT_MIN
the smallest normalized positive float value.
FLT_MIN_10_EXP
the smallest negative integer such that 10 raised to that power is still in the range of normalized float values.
FLT_MIN_EXP
the smallest negative exponent for a float value (i.e., EMIN).
FLT_RADIX
the BASE of a float number.
FLT_ROUNDS
indicates whether addition rounds or truncates. A positive number indicates it rounds; 0 indicates it truncates; -1 indicates the case is indeterminable.
LDBL_DIG
the number of decimal digits of precision in a long double value.
LDBL_EPSILON
the smallest positive long double value that can be added to 1.0L to get a value different from 1.0L.
LDBL_MANT_DIG
the number of base BASE digits in the mantissa of a long double value (i.e., the PREC).
LDBL_MAX
the largest long double value.
LDBL_MAX_10_EXP
the largest integer such that 10 raised to that power is representable as a long double value.
LDBL_MAX_EXP
the largest integer such that BASE raised to that power minus 1 is representable as a long double value.
LDBL_MIN
the smallest normalized positive long double value.
LDBL_MIN_10_EXP
the smallest negative integer such that 10 raised to that power is still in the range of normalized long double values.
LDBL_MIN_EXP
the smallest negative exponent for a long double value (i.e., EMIN).
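
A typical use of DBL_EPSILON is to compare two computed values with a tolerance scaled to their size, rather than testing them for exact equality. The following is a sketch of one such comparison; nearly_equal and its factor argument are hypothetical, and a suitable factor depends on the calculation involved.

```c
#include <assert.h>
#include <float.h>

/* Absolute value, written out so the sketch needs no <math.h>. */
static double magnitude(double x)
{
    return (x < 0.0) ? -x : x;
}

/* Returns 1 if a and b differ by no more than `factor` times
   DBL_EPSILON, relative to the larger of the two magnitudes. */
int nearly_equal(double a, double b, double factor)
{
    double diff = magnitude(a - b);
    double larger = (magnitude(a) > magnitude(b)) ? magnitude(a)
                                                  : magnitude(b);

    assert(factor >= 1.0);
    return diff <= factor * DBL_EPSILON * larger;
}
```

For example, nearly_equal(x, y, 4.0) accepts values that differ only by a few units of rounding error, while still distinguishing 1.0 from 1.1.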

Appendix C: Library Names

This appendix lists selected names defined as part of the C library. After each symbol name, we give the header file in which it is defined.

_IOFBF <stdio.h>
_IOLBF <stdio.h>
_IONBF <stdio.h>
abort <stdlib.h>
abs <stdlib.h>
acos <math.h>
asctime <time.h>
asin <math.h>
assert <assert.h>
atan <math.h>
atan2 <math.h>
atexit <stdlib.h>
atof <stdlib.h>
atoi <stdlib.h>
atol <stdlib.h>
bsearch <stdlib.h>
BUFSIZ <stdio.h>
calloc <stdlib.h>
ceil <math.h>
CHAR_BIT <limits.h>
CHAR_MAX <limits.h>
CHAR_MIN <limits.h>
clearerr <stdio.h>
CLOCKS_PER_SEC <time.h>
clock <time.h>
clock_t <time.h>
cos <math.h>
cosh <math.h>
ctime <time.h>
DBL_DIG <float.h>
DBL_EPSILON <float.h>
DBL_MANT_DIG <float.h>
DBL_MAX <float.h>
DBL_MAX_10_EXP <float.h>
DBL_MIN <float.h>
DBL_MIN_10_EXP <float.h>
DBL_MIN_EXP <float.h>
difftime <time.h>
div <stdlib.h>
div_t <stdlib.h>
EDOM <math.h>
EOF <stdio.h>
ERANGE <math.h>
ERANGE <stdlib.h>
errno <stddef.h>
exit <stdlib.h>
exp <math.h>
fabs <math.h>
fclose <stdio.h>
feof <stdio.h>
ferror <stdio.h>
fflush <stdio.h>
fgetc <stdio.h>
fgetpos <stdio.h>
fgets <stdio.h>
FILE <stdio.h>
floor <math.h>
FLT_DIG <float.h>
FLT_EPSILON <float.h>
FLT_MANT_DIG <float.h>
FLT_MAX <float.h>
FLT_MAX_10_EXP <float.h>
FLT_MIN <float.h>
FLT_MIN_10_EXP <float.h>
FLT_MIN_EXP <float.h>
FLT_RADIX <float.h>
FLT_ROUNDS <float.h>
fmod <math.h>
fopen <stdio.h>
fpos_t <stdio.h>
fprintf <stdio.h>
fputc <stdio.h>
fputs <stdio.h>
fread <stdio.h>
free <stdlib.h>
freopen <stdio.h>
frexp <math.h>
fscanf <stdio.h>
fseek <stdio.h>
fsetpos <stdio.h>
ftell <stdio.h>
fwrite <stdio.h>
getc <stdio.h>
getchar <stdio.h>
getenv <stdlib.h>
gets <stdio.h>
gmtime <time.h>
HUGE_VAL <stdlib.h>
HUGE_VAL <math.h>
INT_MAX <limits.h>
INT_MIN <limits.h>
isalnum <ctype.h>
isalpha <ctype.h>
iscntrl <ctype.h>
isdigit <ctype.h>
isgraph <ctype.h>
islower <ctype.h>
isprint <ctype.h>
ispunct <ctype.h>
isspace <ctype.h>
isupper <ctype.h>
isxdigit <ctype.h>
jmp_buf <setjmp.h>
L_tmpnam <stdio.h>
labs <stdlib.h>
LDBL_DIG <float.h>
LDBL_EPSILON <float.h>
LDBL_MANT_DIG <float.h>
LDBL_MAX <float.h>
LDBL_MAX_10_EXP <float.h>
LDBL_MIN <float.h>
LDBL_MIN_10_EXP <float.h>
LDBL_MIN_EXP <float.h>
ldexp <math.h>
ldiv <stdlib.h>
ldiv_t <stdlib.h>
localtime <time.h>
log <math.h>
log10 <math.h>
LONG_MAX <limits.h>
LONG_MIN <limits.h>
longjmp <setjmp.h>
malloc <stdlib.h>
memchr <string.h>
memcmp <string.h>
memcpy <string.h>
memmove <string.h>
memset <string.h>
mktime <time.h>
modf <math.h>
NDEBUG <assert.h>
NULL <stddef.h>
offsetof <stddef.h>
OPEN_MAX <stdio.h>
perror <stdio.h>
pow <math.h>
printf <stdio.h>
ptrdiff_t <stddef.h>
putc <stdio.h>
putchar <stdio.h>
puts <stdio.h>
qsort <stdlib.h>
raise <signal.h>
rand <stdlib.h>
RAND_MAX <stdlib.h>
realloc <stdlib.h>
remove <stdio.h>
rename <stdio.h>
rewind <stdio.h>
scanf <stdio.h>
SCHAR_MAX <limits.h>
SCHAR_MIN <limits.h>
SEEK_CUR <stdio.h>
SEEK_END <stdio.h>
SEEK_SET <stdio.h>
setbuf <stdio.h>
setjmp <setjmp.h>
setlocale <locale.h>
setvbuf <stdio.h>
SHRT_MAX <limits.h>
SHRT_MIN <limits.h>
sig_atomic_t <signal.h>
SIG_DFL <signal.h>
SIG_ERR <signal.h>
SIG_IGN <signal.h>
SIGABRT <signal.h>
SIGFPE <signal.h>
SIGILL <signal.h>
SIGINT <signal.h>
signal <signal.h>
SIGSEGV <signal.h>
SIGTERM <signal.h>
sin <math.h>
sinh <math.h>
size_t <stddef.h>
sprintf <stdio.h>
sqrt <math.h>
srand <stdlib.h>
sscanf <stdio.h>
stderr <stdio.h>
stdin <stdio.h>
stdout <stdio.h>
strcat <string.h>
strchr <string.h>
strcmp <string.h>
strcoll <string.h>
strcpy <string.h>
strcspn <string.h>
strerror <string.h>
strftime <time.h>
strlen <string.h>
strncat <string.h>
strncmp <string.h>
strncpy <string.h>
strpbrk <string.h>
strrchr <string.h>
strspn <string.h>
strstr <string.h>
strtod <stdlib.h>
strtok <string.h>
strtol <stdlib.h>
strtoul <stdlib.h>
system <stdlib.h>
tan <math.h>
tanh <math.h>
time <time.h>
time_t <time.h>
tm <time.h>
TMP_MAX <stdio.h>
tmpfile <stdio.h>
tmpnam <stdio.h>
tolower <ctype.h>
toupper <ctype.h>
UCHAR_MAX <limits.h>
UINT_MAX <limits.h>
ULONG_MAX <limits.h>
ungetc <stdio.h>
USHRT_MAX <limits.h>
va_arg <stdarg.h>
va_end <stdarg.h>
va_list <stdarg.h>
va_start <stdarg.h>
vfprintf <stdio.h>
vprintf <stdio.h>
vsprintf <stdio.h>

Appendix D: Converting Old Programs to ANSI

Most C source code will compile and execute correctly under this ANSI version of the C package, provided that the code could compile and execute correctly under previous non-ANSI packages. It is possible that old code could be given a large number of warning messages, but it still should work as is.

Most of the warning messages arise from disagreements between old library routines and ANSI ones. For example, many of the arguments that used to be (char *) are now considered (void *). If old source code contains (char *) declarations for these routines, the code will disagree with the (void *) declarations given in the standard library headers. However, the current implementation of (void *) pointers is exactly the same as the old (char *) pointers, so there is no actual conflict. They are just different names for the same thing.

As a rule of thumb, we suggest that programmers just compile old code, ignore all warnings, and see if the program works properly. Actual errors need attention, but there should be few of these. The rest of this appendix discusses special areas where you should take care to make sure code still works.

D.1 The Library

This section discusses problems that may arise with library routines.

D.1.1 Time Routines

The time_t type (used in the functions difftime, time, ctime, gmtime, and localtime) may not be an int type. You could lose a great deal of significance if you try to store time_t types in int variables.
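
As a sketch of the safer style, the fragment below keeps time values in time_t objects and leaves the subtraction to difftime, so nothing is ever stored in an int. The name seconds_between is hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Returns the number of seconds from `start` to `end`. Both values
   stay in time_t objects; difftime performs the subtraction. */
double seconds_between(time_t start, time_t end)
{
    /* time() returns (time_t)-1 when the time is not available */
    assert(start != (time_t)-1 && end != (time_t)-1);
    return difftime(end, start);
}
```

A caller would typically write "time_t began = time(NULL);" before the work to be timed and pass a second call to time afterwards.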

D.1.2 The printf Family

The ANSI standard has dictated new behaviors for printf and related functions. This means that our old printf had to change. The changes are not drastic (mostly to default precisions), but they may affect some programs.

%b
The old placeholder (for BCD strings) has been renamed %_s (with an underscore before the 's'). %b is no longer valid.
%e,%f,%g
The behaviors of these three formats have changed; see "expl c lib printf". The old versions are still supported, under the names %_e, %_f, and %_g.
%h
The old %h placeholder (for BCD characters) has been renamed %_c. In the ANSI version of printf, 'h' is used for short arguments.
%v
The old %v placeholder (for variable argument lists) has been renamed %_v. The corresponding argument should be a va_list type object. %v is no longer valid.

D.1.3 B Routines

Previous (single segment) C packages included a large number of functions from the B library (now known as the UW Tools library). These were never documented, but insiders knew they were there and some programmers used them.

The NS mode C package does not contain any of the UW Tools routines, although there are some library functions whose names are similar to UW Tools functions. Thus if you want a program to work under both SS mode and NS mode C, you must use C routines rather than UW Tools ones.

Appendix E: Extensions

The main body of this manual mostly discussed features of C that are consistent with the ANSI standard. This appendix looks at features which are extensions to the standard. Note that using the features described in this appendix will make your source code non-portable to many other implementations of C.

E.1 Bit Fields

Bit fields may be declared to be enum types as well as int and unsigned.

A typedef may define a bit field type. Of course, such a type may only be used inside structure definitions. For example,

typedef unsigned int BYTE:9;

defines BYTE to be an unsigned bit field type that is nine bits long.

E.2 The __typeof Operator

The __typeof operator can be used anywhere that a type reference is valid (e.g. declarations, cast operations). The keyword __typeof begins with two underscore characters.

__typeof(expression)

stands for the type of the given expression. For example,

__typeof(x) y;

declares y to have the same type as the variable x, whatever type that is.

__typeof(type)

stands for the given type. For example,

#define PT(T) __typeof(T) *

creates a macro whose result is the type "pointer to T".

E.3 Improved Constant Expressions

Most <math.h> and all <ctype.h> functions may be used in constant expressions, provided that the arguments to the functions are constant expressions. For example, you may have

double power_array[4] = {
	1, exp(1), exp(2), exp(3)
};

to initialize array elements to powers of e. The <math.h> functions that can be used in this way are

acos    asin    atan    ceil
cos     cosh    exp     fabs
floor   log     log10   sin
sinh    sqrt    tan     tanh

In addition, the comma operator may be used in constant expressions.

E.4 Address of Constant Expression

This version of C lets you take the address of a constant expression, as in

&5

To evaluate this, the compiler first creates a data object holding the value of the constant expression. The result of the & operation is a pointer to the created data object. The type of the result is a pointer to a const version of the type of the expression result.

E.5 Reference Types

Reference types were introduced as part of C++. A reference data object refers to another data object, in much the same way that a pointer does. (Internally, reference data objects are pointers.) References are declared using the & character. For example,

int &p = val;

declares p to be a reference variable which refers to the int value val. In some sense, p is another name for val.

p = 2;

assigns val the value 2. Technically speaking, the above statements are equivalent to

int *p = &val;
*p = 2;

The major use of reference types is to create call-by-reference functions.

int f(int &y);

is a prototype for a function f which takes a single reference value as its argument. The value passed to the function should be an int; but inside the function, the parameter y is a reference to the argument that was passed. Thus any change to y causes a corresponding change in the argument that was passed.

For example, consider

void swap(int &x,int &y)
{
	int temp;
	temp = x;
	x = y;
	y = temp;
}

If you call this with

int A, B;
swap(A,B);

the function actually switches the values of the variables A and B. The parameters x and y inside the function refer to the real A and B in the function call. In other words, you get the same effect as if you define

void swap(int *x,int *y)
{
	int temp;
	temp = *x;
	*x = *y;
	*y = temp;
}

and call the function with

swap(&A,&B);

(Internally, the two versions are identical.)

Since you can take the address of a constant, you can pass a constant as a call-by-reference argument. However, you should be careful about what the function does with the constant being referenced.

E.6 Macros with Variable Argument Lists

You can use "..." to indicate a macro with a variable argument list, in the same way that you use it in the prototype of a function. As a simple example, consider

#define err(FORMAT,...) fprintf(stderr,FORMAT)

The err macro prints out a message to the standard error stream. It always takes at least one argument, named FORMAT. It may take additional arguments as indicated by the "..." inside the parentheses.

The macro definition assigns names to all the macro arguments that must appear in a macro call, and there must always be at least one named argument. Looking at the above definition, we see that err must always be called with at least one argument, and that argument is known as FORMAT.

When the macro is called, the last named argument before the "..." will be replaced by the corresponding argument in the macro call, plus a list of all remaining arguments passed to the macro. For example,

  err("Invalid values %d %d\n",i,j);

would expand to

  fprintf(stderr,"Invalid values %d %d\n",i,j);

The FORMAT argument in the macro definition is replaced by the entire variable length list in the macro call.
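
For comparison, here is a sketch of how a similar macro can be spelled for compilers that follow the later ISO C99 standard, which names the extra arguments __VA_ARGS__. The macro format_into and the function invalid_values_message are hypothetical, added only to make the expansion easy to check.

```c
#include <assert.h>
#include <stdio.h>

/* ISO C99 form: __VA_ARGS__ stands for everything matched by "...".
   Here the format string is simply the first of those arguments. */
#define err(...) fprintf(stderr, __VA_ARGS__)

/* Hypothetical companion macro: formats into a buffer instead. */
#define format_into(BUF, ...) sprintf((BUF), __VA_ARGS__)

/* Builds the message from the example above in a static buffer. */
const char *invalid_values_message(int i, int j)
{
    static char buffer[64];
    int written = format_into(buffer, "Invalid values %d %d\n", i, j);

    assert(written > 0 && written < (int)sizeof buffer);
    return buffer;
}
```

Source written in this second form will not use the extension described in this section, so it is a better choice when the code must also compile elsewhere.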

Appendix F: Near, Far, and Huge Objects

The keywords __near, __far, and __huge are extensions to the ANSI standard. They are type qualifiers similar to const and volatile. Collectively, we will call __near, __far, and __huge the location qualifiers. All of these keywords begin with two underscore characters.

The location qualifiers are only recognized for compatibility with NS mode C. In SS mode, these qualifiers have no effect. However, SS mode C performs the same type-checking on the location qualifiers that NS mode C does; therefore, you can use SS mode C to test-compile code that will eventually run in NS mode.

For more information on these qualifiers, see the NS Mode C Reference Manual. If you are not writing an NS mode program, you do not need these qualifiers.

Appendix G: Implementation-Defined Behavior

The ANSI standard requires that every implementation of C document its behavior in a number of areas. This appendix summarizes the implementation-defined behaviors of this compiler, using the same section titles as Appendix F of the ANSI standard for C.

G.1 Translation

Diagnostics are written to stdout at the time the program is compiled. These diagnostics can be redirected by an appropriate >file construct on the compiler command line.

If a particular compilation results in a large number of messages, the compiler may stop printing them after the first 20 or so. In all likelihood, the source code only has a small number of problems, but the problems have confused the compiler to the point where it is no longer printing out useful diagnostics.

G.2 Environment

The main function may be called with

int main(void);
int main(int argc,char **argv);
int main(int argc,char *argv[]);

Interactive devices include terminals and consoles.

G.3 Identifiers

There is no limit on the number of significant characters in an identifier without external linkage.

In identifiers with external linkage, only the first six characters are significant.

Case distinctions are not significant in identifiers with external linkage.
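
The two rules for external names can be sketched as a small checking function, which may be useful when reviewing code for names that would clash. The function external_names_collide is hypothetical; it is not something the compiler or library provides.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define EXTERNAL_SIGNIFICANT 6   /* significant characters, per above */

/* Returns 1 if two external names would be treated as the same name:
   only the first six characters count, and case is ignored. */
int external_names_collide(const char *a, const char *b)
{
    int i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < EXTERNAL_SIGNIFICANT; i++) {
        int ca = toupper((unsigned char)a[i]);
        int cb = toupper((unsigned char)b[i]);

        if (ca != cb)
            return 0;            /* names differ within six characters */
        if (ca == '\0')
            return 1;            /* both names ended at the same place */
    }
    return 1;                    /* first six characters all matched */
}
```

Under these rules counter_a and COUNTEr_b collide, since both begin with the six letters "counte" once case is ignored, while total and totals do not.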

G.4 Characters

This compiler uses the 9-bit ASCII character set for both the source and execution character sets.

In the execution character set, characters are stored in 9-bit bytes.

The source and execution character sets are identical, so source characters map directly onto the corresponding execution character in character constants and string literals.

There is no way to specify an integer character constant that contains a character or escape sequence not represented in the basic execution character set.

Integer character constants may contain 1-4 ASCII characters. The given characters are stored right-justified in the machine word, and the remaining bytes on the left are filled with 0-bits.
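
The layout just described can be modelled on another machine as follows. This is only a sketch: pack_char_constant is a hypothetical name, and unsigned long long is used because a host long may be too short to hold 36 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Models the value of a character constant of 1-4 characters:
   each character occupies a 9-bit byte, the characters are
   right-justified, and the unused bytes on the left are zero. */
unsigned long long pack_char_constant(const char *chars, size_t count)
{
    unsigned long long word = 0;
    size_t i;

    assert(chars != NULL && count >= 1 && count <= 4);
    for (i = 0; i < count; i++)
        word = (word << 9) | ((unsigned char)chars[i] & 0777u);
    return word;
}
```

Since a 9-bit byte is exactly three octal digits, the result is easy to read in octal: with ASCII codes, the two characters A and B (octal 101 and 102) give octal 101102.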

All conversions between multibyte characters and wide characters (including wide character constants) are performed in the C locale.

A "plain" char value is treated as an unsigned char.

G.5 Integers

A value of the int or long int type is stored in a 36-bit machine word.

If you specify the +18bitShorts option while compiling, a value of the short int type is stored in an 18-bit halfword. If you do not specify +18bitShorts, values of the short int type are stored in 36-bit words (the same as int and long int).

Signed values of any integer type use the hardware's usual 2's-complement representation, while unsigned values are simple binary numbers.

When converting an integer to a shorter signed integer (an int or long to a short when +18bitShorts is in effect), the high order bits are simply discarded leaving the lower halfword. This preserves the value of the original integer if it can be represented as a short.
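
The effect of this conversion can be sketched with ordinary long arithmetic. The function to_halfword below is a hypothetical model of the conversion, not the code the compiler generates.

```c
#include <assert.h>

/* Models converting a value to an 18-bit signed short: keep the low
   18 bits and interpret them as a 2's-complement halfword. */
long to_halfword(long value)
{
    unsigned long low = (unsigned long)value & 0777777UL;

    if (low & 0400000UL)                /* halfword sign bit is set */
        return (long)low - 01000000L;   /* subtract 2 to the 18th */
    return (long)low;
}

/* Values that fit in 18 bits survive; larger ones do not. */
void halfword_demo(void)
{
    assert(to_halfword(5L) == 5L);
    assert(to_halfword(-3L) == -3L);
    assert(to_halfword(0400000L) == -0400000L);
}
```

The last line of the demonstration shows the other side of the rule: 2 to the 17th cannot be represented as an 18-bit short, so its low halfword reads back as a negative number.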

When converting an unsigned integer to a signed integer of equal length, the compiler simply retains the same bit pattern.

A bitwise operation on a signed integer reflects the 2's-complement representation of signed integers.

In an integer division operation A/B, the sign of the remainder is the same as the sign of the dividend A.
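
Code that must behave the same way on other compilers can get this result from the library routine div, which the ANSI standard defines to truncate the quotient toward zero. The wrapper name below is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the remainder of a/b with the sign of the dividend a,
   using div so the result does not depend on how a particular
   compiler implements the % operator for negative operands. */
int remainder_like_dividend(int a, int b)
{
    div_t result;

    assert(b != 0);
    result = div(a, b);
    return result.rem;
}
```

For instance, dividing -7 by 2 gives a quotient of -3 and a remainder of -1, while dividing 7 by -2 gives a quotient of -3 and a remainder of 1.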

When you right shift (>>) a negative-valued signed integral value, vacated bits are filled with the sign bit. In other words, this is an arithmetic shift.

G.6 Floating Point

A value of the float type is represented as a 36-bit single-precision floating point number using the hardware's standard format. For DPS90 machines, this format is described in the GCOS8 DPS90 Assembly Instructions Manual (Order Number DX20). For other machines, this format is described in the GCOS8 Assembly Instructions Manual (Order Number DH03).

Values of the double and long double type are represented as 72-bit double-precision floating point numbers using the hardware's standard format (as described in the previously cited manuals).

The Assembly Instructions Manuals previously cited also describe the direction of truncation when integral numbers are converted to floating point numbers that cannot exactly represent the original value, and the direction of truncation or rounding when a floating point number is converted to a narrower floating point number.

G.7 Arrays and Pointers

The sizeof operator returns a value of type unsigned int. Thus, size_t is equivalent to unsigned int.

When casting a pointer to an integer or vice versa, the compiler simply retains the bit pattern of the original value.

The ptrdiff_t type is defined as int.

G.8 Registers

The compiler makes no attempt to store any register data objects in hardware registers. However, applying the register specifier to a data object does make it possible for the compiler to perform certain code optimizations that might otherwise be impractical. The reason for this is that you cannot take a pointer to a register data object, nor refer to such objects in any way except by their names. Thus the compiler can keep track of all references to such a data object, possibly storing its value in a register for some of the object's lifetime, even if the object is not always kept in a dedicated register.

G.9 Structures, Unions, Enumerations, and Bit Fields

When a member of a union object is accessed using a member of a different type, the compiler simply uses the bit pattern of whatever value currently occupies the associated memory location.

Each member of a structure is aligned according to the alignment required by the type of that member. For more information, see Section 3.4.4.

A plain int bit field is treated as a signed int.

Within a machine word, bit fields are allocated from left to right, in the order specified by the source code.

A bit field may not straddle a machine-word boundary.

Values of an enumeration type are represented with int values.

G.10 Qualifiers

Since volatile objects are always stored in regular memory, all accesses to volatile objects are normal memory accesses. For basic types (int, long, float, pointers, etc.) such references are usually atomic. Access to char types is not atomic. The only volatile accesses guaranteed to be atomic are ones to a volatile version of the sig_atomic_t type.

G.11 Declarators

There is no maximum number of declarators that may modify an arithmetic, structure, or union type.

G.12 Statements

There is no limit on the number of case values in a switch statement.

G.13 Preprocessing Directives

The value of a single-character character constant in a constant expression that controls a conditional inclusion matches the value of the same character constant in the execution character set. Such a character constant may not have a negative value.

The method for locating includable source files is described in "expl c directive include". That explain file also tells how quoted names and other character sequences are handled in #include statements.

Section 8.3 explains the behavior of all recognized #pragma directives.

Since the date and time are always available, __DATE__ and __TIME__ are always defined from the current date and time.

G.14 Library Functions

The NULL macro expands to (void *)0.

If the expression of an assert macro is false, the macro outputs

Assertion "expression" failed, file xxx, line yyy

where "expression" is the supplied expression, xxx is the name of the file containing the assert macro, and yyy is the number of the line where the assert macro appeared. Once the message has been printed out, the program terminates via abort.

The islower function tests for the letters

abcdefghijklmnopqrstuvwxyz

of the ASCII character set. The isupper function tests for the letters

ABCDEFGHIJKLMNOPQRSTUVWXYZ

of the ASCII character set. The isalpha function tests for the combined upper and lower case character sets of islower and isupper. The isalnum function tests for those same letters, plus the 10 ASCII digits.

The isprint function tests for ASCII characters from octal 040 to octal 0176. All other characters are considered control characters (which means that iscntrl tests for those other characters).

All mathematical functions return 0.0 in the case of a domain error.

The mathematics functions do not set errno to ERANGE on underflow range errors.

The fmod function returns zero if its second argument is zero.

The set of signals recognized by the signal function is given in "expl c lib signal". That explain file also tells the default handling for each recognized signal.

SS C always performs the equivalent of

signal(sig,SIG_DFL)

before calling the user handler, whether the signal sig is SIGILL or some other signal. No blocking of the signal is performed.
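
One consequence is that a handler which should stay in effect must install itself again each time it is called. The following sketch shows the usual pattern; the names on_interrupt, hits, and count_two_raises are hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Number of times the handler has run. */
static volatile sig_atomic_t hits = 0;

/* Handler that re-installs itself, since the signal has already been
   reset to SIG_DFL by the time the handler is entered. */
static void on_interrupt(int sig)
{
    signal(sig, on_interrupt);   /* re-arm before doing anything else */
    hits++;
}

/* Installs the handler, raises SIGINT twice, and returns the number
   of times the handler ran. Without the re-arming call, the second
   raise would receive the default handling instead. */
int count_two_raises(void)
{
    void (*previous)(int);

    hits = 0;
    previous = signal(SIGINT, on_interrupt);
    assert(previous != SIG_ERR);
    raise(SIGINT);
    raise(SIGINT);
    signal(SIGINT, previous);    /* put back whatever was there before */
    return (int)hits;
}
```

Because no blocking is performed, there is still a short interval between the arrival of a signal and the re-arming call during which a second signal gets the default handling.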

The last line of a text stream requires a terminating new-line character.

Space characters written to a text stream immediately before a new-line do appear when read back in.

No null characters are appended to data written to a binary stream.

The file position indicator of an append mode stream is initially positioned at the end of the file.

A write on a text stream truncates the file beyond that point.

All input and output files are buffered.

A zero-length file can exist and will be created in the file system.

The rules for valid file names are explained in the GCOS8 FMS Reference manual (DH19).

A program may not have the same file open as several different streams.

If you execute remove on an open file, the operation succeeds on an FMS file system, unless the file is protected. On an FS8 file system, the operation will probably fail. If the operation fails on either FMS or FS8, remove sets errno to the file system's error status value, plus _ER_FMS (defined in <errno.h>).

If you attempt to use rename to give a file a name that is already in use, the operation fails with a status of -1 and the error message

RENAME: name not unique

The %p placeholder for fprintf displays the corresponding pointer value as an octal integer.

The %p placeholder for fscanf expects the corresponding value in the input string to be an integer (any format).

A dash character (-) has no special meaning when used anywhere in a %[...] placeholder for fscanf.

If the fgetpos function cannot seek on a particular type of device, it sets errno to _ER_LIO+2 (where _ER_LIO is a constant defined in <errno.h>). If ftell can't determine the current seek location, it sets errno to one of the following values:

errno=_ER_LIO+4
Record too long to encode seek address. This can happen if a record is so long that its length does not fit in the integer size allotted.
errno=_ER_LIO+3
Non-binary records found in binary stream. This occurs if a file is opened with the "b" option, but it contains records which are not Media 4.
errno=_ER_LIO+2
Cannot seek on given device class.

The fsetpos function may set an errno value of _ER_LIO+12 for Internal error; cannot determine file position.

If the size requested in a call to malloc, calloc, or realloc is zero, the functions return a unique pointer.

The abort function closes all open files.

For details about the way the exit function handles values other than zero, EXIT_SUCCESS, and EXIT_FAILURE, see "expl c lib exit".

Since GCOS8 does not have environment variables, the set of environment variables available to getenv is always empty.
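
Portable code should therefore be prepared for getenv to return NULL and supply its own default. A minimal sketch, using the hypothetical name env_or_default:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the value of the environment variable `name`, or
   `fallback` when the variable is not set (as is always the case
   when the set of environment variables is empty). */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL && fallback != NULL);
    value = getenv(name);
    return (value != NULL) ? value : fallback;
}
```

A program written this way runs with its built-in defaults here and picks up environment settings on systems that have them.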

The use of the system function is described in "expl c lib system".

For information about time zones and daylight savings time, see "expl c time".

The clock function measures time relative to a point during program start-up. Because of differences in program set-up, the clock may start at different times for different programs, so the absolute value of the clock is not particularly useful. However, all calls to clock during the execution of a single program are made relative to the same starting point, so the differences between calls to clock can be compared.
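
In practice this means taking two readings and using only their difference. The sketch below converts such a difference to seconds; elapsed_seconds is a hypothetical name.

```c
#include <assert.h>
#include <time.h>

/* Converts the difference between two clock readings to seconds.
   Only the difference is meaningful, not either reading alone. */
double elapsed_seconds(clock_t start, clock_t end)
{
    /* clock() returns (clock_t)-1 when no reading is available */
    assert(start != (clock_t)-1 && end != (clock_t)-1);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

A caller would save "clock_t began = clock();" before the code being measured and pass a second reading from clock afterwards.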