Variables and Constants
------------------------

Variables are stored in memory, whereas constants need not be (the
compiler may replace a constant with the value it refers to at compile
time). A variable is associated with a location and a value. The location
is called its l-value, while the value stored in this location is called
its r-value. These two values can be captured by a diagram such as:

                                          Location
                                        +----------+
                +------------------+    | +-----+  |
    Name -------| other attributes |----| |Value|  |
                +------------------+    | +-----+  |
                                        +----------+

Henceforth, we will omit the "other attributes" from this picture. (In
these notes, we use this kind of picture to represent the box-and-circle
diagrams used in your textbook. In particular, this picture captures the
first figure in Section 5.5.1 of your textbook.)

In an assignment statement of the form x = y, we implicitly use the
l-value of the variable x on the lhs of the assignment, and the r-value
of the variable y on the rhs. In other words, we are updating the
LOCATION corresponding to x with the VALUE corresponding to y. The
semantics of assignment can be understood precisely by looking at the
following box-and-circle diagram:

              +----------+
              | +-----+  |
    y --------| |Value|  |
              | +--|--+  |
              +----|-----+
                   |
              +----|-----+
              |    V     |
              | +-----+  |
    x --------| |Value|  |
              | +-----+  |
              +----------+

Some languages provide explicit operators to access the l-value or
r-value of a variable. For instance, the "*" operator in C/C++ makes the
r-value of a variable available even on the lhs of an assignment: in
*x = y, the r-value of x (a pointer) is used as the location to be
updated. Similarly, the "&" operator overrides the use of r-values on the
rhs: in x = &y, the l-value of y (its address) is stored in x.

Most imperative languages use the above semantics for assignment, which
requires the value of y to be copied into the location for x. This is
called "STORAGE SEMANTICS".
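The storage semantics of x = y and the effect of the "&" and "*" operators can be checked with a short C++ sketch. The function names are hypothetical, chosen only for illustration; each function returns a value that shows which location was actually updated.

```cpp
#include <cassert>  // for the assert-based checks a caller may run

// Storage semantics: "x = y" copies the r-value of y into the location
// (l-value) of x. The variables keep separate locations, so a later
// assignment to y leaves x unchanged. Returns the final value of x.
int storage_semantics_demo() {
    int y = 1;
    int x = 0;
    x = y;     // l-value of x on the lhs, r-value of y on the rhs
    y = 2;     // updates only y's location
    return x;  // still 1
}

// Explicit l-value/r-value operators:
//   p = &y  stores the l-value (address) of y as the r-value of p;
//   *p = 5  uses the r-value of p as the location to update.
// Returns the final value of y.
int address_and_dereference_demo() {
    int y = 1;
    int* p = &y;  // "&" yields y's location instead of its value
    *p = 5;       // "*" turns p's r-value back into a location: y becomes 5
    return y;
}
```

A caller can check the behavior with, e.g., assert(storage_semantics_demo() == 1): the later update of y never reaches x's location.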
Some languages use POINTER SEMANTICS, where the locations for x and y
are simply shared:

              +----------+
              | +-----+  |
    y --------| |Value|  |
             /| +-----+  |
            / +----------+
           /
          /
         /
        /
    x --

Pointer semantics is uncommon in languages that permit assignment, since
it causes confusion: a future assignment to y may change the value of x.
Pointer semantics is used in SNOBOL, and in some situations in LISP.

Constants
-----------

The semantics of constants is given in terms of their r-values. Although
constants may be stored in memory, we do not typically think about their
l-values: since their contents cannot be changed, there is no use in
accessing their l-values. Since we only think about the r-value of a
constant, we say that constants have "VALUE SEMANTICS".

Note that the notion of a constant is symbolic: a constant is simply a
symbolic name for a value.

Aliases, Dangling References and Garbage
-----------------------------------------

Alias: two variables have the same l-value.

In C++, we can define reference types using the syntax &:

    int& y

A reference variable must be initialized when it is declared, and its
initializer supplies the l-value that it will share, e.g.:

    int x = 1;
    int& y = x;

The resulting picture is:

              +----------+
              | +-----+  |
    x --------| |  1  |  |
             /| +-----+  |
            / +----------+
           /
          /
         /
        /
    y --

Later, if we execute the statement y = 3, then the value of x also
changes to 3:

              +----------+
              | +-----+  |
    x --------| |  3  |  |
             /| +-----+  |
            / +----------+
           /
          /
         /
        /
    y --

It would not be apparent just by looking at the statement "y = 3" that
the value of x would change as a result of this statement. For this
reason, we say that the assignment to y has the "side-effect" of changing
the value of x. Side-effects cause confusion for someone trying to read
or understand a program, and are hence to be avoided in general. Since
aliases cause such side-effects, they should be used sparingly.

Aliases can also be created through the use of pointer variables.
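Before turning to pointers, the reference example and the value semantics of constants can be reproduced in a minimal C++ sketch. The names constant_demo, reference_alias_demo, same_location_demo and kBufferSize are hypothetical. The sketch shows that an assignment through y is visible through x, that the two names share one l-value, and that a constant is used only for its r-value.

```cpp
#include <cassert>  // for the assert-based checks a caller may run

// Constants have value semantics: kBufferSize is a symbolic name for 5,
// and the compiler is free to substitute the value wherever it is used.
constexpr int kBufferSize = 5;  // hypothetical constant for illustration

int constant_demo() {
    int buffer[kBufferSize] = {0};  // only the r-value of kBufferSize matters
    return static_cast<int>(sizeof buffer / sizeof buffer[0]);  // 5
}

// A reference is an alias: y is bound to the l-value of x, so an
// assignment to y has the side-effect of changing x.
int reference_alias_demo() {
    int x = 1;
    int& y = x;  // must be initialized; y now shares x's location
    y = 3;       // also changes x
    return x;    // 3
}

// Because x and y share one l-value, their addresses compare equal.
bool same_location_demo() {
    int x = 1;
    int& y = x;
    return &x == &y;  // true
}
```

Note that nothing in the line "y = 3" inside reference_alias_demo hints that x changes; only the declaration int& y = x reveals the alias.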
    int *x = NULL;

              +----------+
              | +-----+  |
    x --------| |  o--|--|-----+
              | +-----+  |     |
              +----------+   --+--
                              ---

    x = new int;
    *x = 4;

              +----------+      +----------+
              | +-----+  |      | +-----+  |
    x --------| |  o--|--|----->| |  4  |  |
              | +-----+  |      | +-----+  |
              +----------+      +----------+

    int *y;
    y = x;

Execution of these two statements will result in:

              +----------+         +----------+
              | +-----+  |         | +-----+  |
    x --------| |  o--|--|--+----->| |  4  |  |
              | +-----+  |  |      | +-----+  |
              +----------+  |      +----------+
                            |
              +----------+  |
              | +-----+  |  |
    y --------| |  o--|--|--+
              | +-----+  |
              +----------+

Note that the assignment statement has worked in the same way as it did
when x and y were integer variables: the location of the variable on the
lhs has been updated with the r-value of the variable on the rhs. It
happens that the r-value of x is itself a pointer, hence the result shown
above: *x and *y now denote the same location, so they are aliases, and
an assignment such as *y = 7 also changes *x.

Arrays Vs Pointers in C
-------------------------

In C, an array of type T can be used wherever a pointer to type T is
expected: in most expressions, the array is converted ("decays") to a
pointer to its first element. For instance, if we have

    int a[5];
    int *b;

then a can be used as a value of type int*, just like b. (Strictly
speaking, a has type int[5]; for example, sizeof a is the size of five
ints, while sizeof b is the size of one pointer.) But the semantics of
arrays and pointers are quite different:

  -- you cannot assign a value to "a", but you can assign a value to "b".
     This is because "a" denotes a fixed block of storage: its value in
     an expression is the address of that block, i.e., its l-value,
     which cannot be changed.

  -- you can assign a value to *a, but before you can assign to *b, you
     have to first allocate storage for it.
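The pointer aliasing above (y = x) and the array/pointer differences can be observed in a short C++ sketch; it uses new and delete, following the notes' own use of new. The function names are hypothetical. The commented-out line a = nullptr would be rejected by the compiler, which is exactly the "cannot assign to a" rule.

```cpp
#include <cassert>  // for the assert-based checks a caller may run

// Pointer aliasing: after "y = x", both pointers hold the same r-value
// (the address of one heap int), so *x and *y name the same location.
int pointer_alias_demo() {
    int* x = new int;
    *x = 4;
    int* y = x;       // copies the pointer value, not the pointed-to int
    *y = 7;           // visible through x as well
    int result = *x;  // 7
    delete x;         // one allocation, freed once (y is now dangling)
    return result;
}

// a decays to int* in most expressions, but it is still an array of
// 5 ints, so sizeof reports different sizes for a and b.
bool array_is_not_pointer() {
    int a[5] = {0};
    int* b = nullptr;
    return sizeof a == 5 * sizeof(int) && sizeof b == sizeof(int*);
}

// *a writes into the array's own storage; *b needs storage first.
int array_vs_pointer_assign() {
    int a[5] = {0, 0, 0, 0, 0};
    *a = 3;             // a decays to &a[0], so this writes a[0]
    // a = nullptr;     // would not compile: an array cannot be assigned to

    int* b = new int;   // b must point at storage before *b is used
    *b = 3;
    int from_heap = *b;
    delete b;

    b = a;              // legal: b is an ordinary pointer variable
    return from_heap + b[0];  // 3 + 3 = 6
}
```

In pointer_alias_demo only x is passed to delete: freeing through both aliases would free the same location twice.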
The box-and-circle diagrams look as follows:

              (5 locations, each can store an integer)
              +-------------------------+
              | +---+---+---+---+---+   |
    a --------| |   |   |   |   |   |   |
              | +---+---+---+---+---+   |
              +-------------------------+

              (1 location that can store an integer pointer)
              +----------+
              | +-----+  |
    b --------| |  o--|--|-----> pointer to nowhere
              | +-----+  |       (uninitialized)
              +----------+

An assignment *a = 3 will result in:

              +-------------------------+
              | +---+---+---+---+---+   |
    a --------| | 3 |   |   |   |   |   |
              | +---+---+---+---+---+   |
              +-------------------------+

whereas "*b = 3" will result in the following picture, assuming that b
has previously been initialized with a pointer to a location that can
hold an integer (using a statement such as "b = new int;" in C++, or
"b = malloc(sizeof(int));" in C):

              +----------+      +----------+
              | +-----+  |      | +-----+  |
    b --------| |  o--|--|----->| |  3  |  |
              | +-----+  |      | +-----+  |
              +----------+      +----------+

Garbage
---------

Garbage is a location that has been allocated, but is no longer
accessible in a program:

    int *x;
    int y = 3;
    x = new int;

              +----------+      +----------+
              | +-----+  |      | +-----+  |
    x --------| |  o--|--|----->| |     |  |
              | +-----+  |      | +-----+  |
              +----------+      +----------+

              +----------+
              | +-----+  |
    y --------| |  3  |  |
              | +-----+  |
              +----------+

    *x = 5;
    x = &y;

results in:

              +----------+      +----------+   (Garbage location:
              | +-----+  |      | +-----+  |    allocated, but no
    x --------| |  o--|--|-+    | |  5  |  |    pointers to it, so
              | +-----+  | |    | +-----+  |    it cannot be
              +----------+ |    +----------+    accessed)
                           |
              +----------+ |
              | +-----+  |<+
    y --------| |  3  |  |
              | +-----+  |
              +----------+

Garbage leads to loss of available memory, but does not otherwise affect
the correctness of programs, i.e., programs that create garbage will
operate correctly as long as they do not run out of memory. A program
that produces garbage is said to have a MEMORY LEAK. Since premature
freeing of locations (see DANGLING POINTERS below) causes more serious
problems, programmers often err on the side of writing programs that
leak memory.
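The garbage example above can be made observable with a hypothetical instrumented type, Tracked, that counts its live instances. This is only a sketch under that assumption: leak_demo repeats the notes' sequence and leaves one object that nothing points to (an intentional leak), while no_leak_demo frees the location before the last pointer to it is overwritten.

```cpp
#include <cassert>  // for the assert-based checks a caller may run

// Hypothetical instrumented type: counts how many instances are alive,
// so that an allocation that is never freed becomes visible.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Repeats the sequence from the notes. After "x = &y" nothing points at
// the heap object any more: it is garbage, and it is never freed
// (an intentional leak). Returns the live count just before returning.
int leak_demo() {
    Tracked y(3);
    Tracked* x = new Tracked(0);
    x->value = 5;          // corresponds to *x = 5
    x = &y;                // the only pointer to the heap object is overwritten
    return Tracked::live;  // y plus the unreachable heap object
}

// Same steps, but the location is freed while it is still reachable.
int no_leak_demo() {
    Tracked y(3);
    Tracked* x = new Tracked(0);
    x->value = 5;
    delete x;              // reclaim the storage before losing the pointer
    x = &y;
    return Tracked::live;  // only y
}
```

After leak_demo returns, y is destroyed but the heap object is not, so Tracked::live stays one higher than before the call; after no_leak_demo it returns to its previous value.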
The net result is that long-running programs that leak memory may
eventually run out of memory and crash. While this behavior may be
tolerable for programs such as browsers, it is unacceptable for server
programs with high availability requirements.

In languages with a strong type discipline (e.g., Java, SML), it is
possible to detect garbage automatically and reclaim it. The process of
finding and reclaiming garbage is called GARBAGE COLLECTION. In languages
with lax type systems (C/C++), automatic garbage collection is much
harder, because the collector cannot always tell which values in memory
are pointers.
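The notes do not say how garbage is detected. One common approach is a tracing (mark-and-sweep) collector: starting from the program's roots (the variables currently in scope), it marks every location reachable by following pointers, and whatever remains unmarked is garbage. The sketch below models this with a hypothetical ToyHeap in which every pointer is known exactly, the property a strong type discipline provides. It is illustrative only, not how the Java or SML runtimes are actually implemented, and its sweep step merely reports garbage instead of freeing it.

```cpp
#include <cassert>  // for the assert-based checks a caller may run
#include <cstddef>
#include <vector>

// Hypothetical toy heap: object i holds pointers to the objects listed
// in edges[i]. Every pointer is known exactly, so tracing is precise.
struct ToyHeap {
    std::vector<std::vector<std::size_t>> edges;
    std::vector<bool> marked;

    std::size_t allocate() {
        edges.emplace_back();
        return edges.size() - 1;
    }
    void point(std::size_t from, std::size_t to) { edges[from].push_back(to); }

    // Mark phase: everything reachable from the roots is live.
    // The marked check also stops the walk from looping on cycles.
    void mark(const std::vector<std::size_t>& roots) {
        marked.assign(edges.size(), false);
        std::vector<std::size_t> work(roots.begin(), roots.end());
        while (!work.empty()) {
            std::size_t obj = work.back();
            work.pop_back();
            if (marked[obj]) continue;
            marked[obj] = true;
            for (std::size_t next : edges[obj]) work.push_back(next);
        }
    }

    // Simplified sweep: report unmarked objects instead of freeing them.
    std::vector<std::size_t> garbage() const {
        std::vector<std::size_t> result;
        for (std::size_t i = 0; i < edges.size(); ++i)
            if (i >= marked.size() || !marked[i]) result.push_back(i);
        return result;
    }
};

// Example: root -> a -> b, while c is allocated but never pointed to.
std::vector<std::size_t> toy_gc_demo() {
    ToyHeap heap;
    std::size_t root = heap.allocate();  // object 0
    std::size_t a = heap.allocate();     // object 1
    std::size_t b = heap.allocate();     // object 2
    std::size_t c = heap.allocate();     // object 3: unreachable
    (void)c;                             // kept only to name the garbage object
    heap.point(root, a);
    heap.point(a, b);
    heap.mark({root});
    return heap.garbage();               // contains only object 3
}
```

In C/C++ a collector cannot build an exact edges table like this one, since an integer and a pointer may look identical in memory; that is one reason the notes call automatic collection much harder there.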