Symbol Table --------------- Uses data structures that allow efficient name lookup operations in the presence of scope changes. We can use -- hash tables to lookup attributes for each name -- a scope stack that keeps track of the current scope (in which a symbol to be looked up appears) and its surrounding scopes. The top most element in the scope stack corresponds to the current (ie innermost) scope, while the bottommost element will correspond to the outermost (ie global) scope. Support for scopes: lexical scopes can be supported using a scope stack as follows: a) symbols in a program reside in multiple hash tables: in particular, symbols within each scope are contained in a single hash table for that scope b) at any point in the program, the scope stack keeps track of all the scopes surrounding that program point, as described above. The elements of the stack contain pointers to the corresponding hash table. c) to lookup a name, we start from the hash table pointed to by the top element of the stack. If the symbol is not found, we try hash table pointed by the next lower entry in the stack. This process is repeated until we find the name, or we reach the bottom of the stack (this means that the name has never been declared.) d) scope entry and exit operations modify the scope stack appropriately. Specifically, when a new scope is entered, a corresponding hash table is created. A pointer to this hash table is pushed onto the scope stack. When we exit a scope, the top of the stack is popped off. Example: <------------------ (1) float y = 1.0 <------------------ (2) void f(int x) { <------------------ (3) for (int x = 0; ...) { <------------------ (4) { <------------------ (5) int y = 1; <------------------ (6) } <------------------ (7) { <------------------ (8) float x = 1.0; <------------------ (9) } <------------------ (10) } <------------------ (11) } <------------------ (12) main() { <------------------ (13) float y = 10.0; <------------------ (14) f(1); } <------------------ (15) Illustration: At (1), we have a single hash table, which is the global hash table. The scope stack contains exactly one entry, which points to this global hash table. When the compiler moves from (1) to (2), the name y is added to the hash table for the current scope. Since the top of scope stack points to the global table, this means that "y" is being added to the global table. When compiler moves from (2) to (3), the name "f" is added to the global table, a new hash table for f's scope is created. A pointer to f's table is pushed on the scope stack. Then the parameter "x" added to hashtable for the current scope (ie f) (Students should work out the rest of the moves.) Static or lexical scoping: associations are determined at compile time, using a sequential processing of program. Dynamic scoping: associations are determined at runtime --- processing of program statements follows the execution order of different statements. Example: if we added a new function "g" to the above program as follows: void g() { int y ; f() ; } Consider references to the name "y" at (3). With static scoping, this name always refers to the global variable y defined between (1) and (2). With dynamic scoping, the search for non-local names follows the execution nesting, rather than static nesting. This means that if "f" is called from main, then at (3), "y" will refer to the float variable declared in main. If "f" is invoked from within g, the same name will refer to the integer variable "y" defined in g. Thus, the same name may refer to different entities, possibly of different types, in a dynamically scoped language. Since the type associated with the name "y" at (3) can differ depending upon the point of call, we cannot statically determine the type of y. This suggests that dynamic scoping does not fit well with static typing. Since static typing has now been accepted to be the right approach, almost all current languages (C/C++/Java/SML/LISP) use static scoping Scopes in SML: -------------- -- most names are at the "top-level," which corresponds to global scope. -- formal parameters of functions are within the scope of the function -- let statement introduces new bindings whose scope extends from the point of binding to the end of the let-block e.g., val v = let val x = 2 val y = x * y in x*y end Allocation, Extent, Environment -------------------------------- A variable is stored in memory at a location (determined by the environment) corresponding to the variable. Constants do not need to be stored in memory. (Constants Vs Literals) Environment stores the binding between variable names and the corresponding locations in memory. The process of setting up this binding is known as storage allocation. 1. static allocation: allocation performed at compile time. Compiler translates all names to corresponding location in the code generated by it. Examples: all variables in original FORTRAN all global and static variables in C/C++/Java 2. stack allocation: needed in any language that supports the notion of local variables for procedures. Also called "automatic allocation" but this is somewhat of a moisnomer now. Examples: all local variables in C/C++/Java procedures and blocks Implementation: Compiler translates all names to relative offsets from a location called the "base pointer" or "frame pointer". The value of this pointer varies will, in general, be different for different procedure invocations. The pointer refers to the base of the "activation record" (AR) for an invocation of a procedure. The AR holds such information as parameter values, local variables, return address (where execution should resume on exit from the current procedure) etc. Example: int fact (int n) { if n = 0 then 1 else { int rv = n * fact (n-1) ; return rv ; } } main() { fact (5) ; } An activation record is created on the stack for each a call to function. The start of activation record is pointed to by a register called BP. The other elements (e.g., parameters, local variables) are accessed by offset from BP (e.g., first parameter may be accessed as BP, second parameter as BP+2, assuming each parameter occupies 2 bytes etc). Then on the first call to fact, BP is decremented (since stack grows downwards, towards address 0 in the memory) to point to new activation record, n is initialiaed to 5, rv is pushed but not initialized (since fact has to compute factorial of 4 yet) and then new activation record (again with n initialized to 4, rv is pushed on top of n) is created for the next recursive call and so on until when n becomes 0, at which time, stack is unrolled where successive rv's are assigned the value of n at that stage and the rv of previous stage (thus computing the factorial). Dynamic allocation - explicit allocation, freeing e.g., malloc/free in C, new/delete in C++ - explicit allocation, automatic free Java - automatic allocation, automatic free Lisp, SML Extent, Lifetime: duration of allocation of a variable Variables and Constants Variables are stored in memory, whereas constants need not be (they may be replaced at the compile time by the compiler with the value they refer to). Value of variables can change at runtime. Variables have a location and value. Location refers to l-value (the value on left of assignment) and value refers to r-value (the value on right) e.g., in x = y, r-value of y is stored in location that is l-value of x.