In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier (a.k.a. symbol) in a program's source code is associated with information relating to its declaration or appearance in the source.
Video Symbol table
Background
A symbol table may only exist during the translation process, or it may be embedded in the output of that process, such as in an ABI object file for later exploitation. For example, it might be used during an interactive debugging session, or as a resource for formatting a diagnostic report during or after execution of a program.
Maps Symbol table
Implementation
Numerous data structures are available for implementing tables. Trees, linear lists and self-organizing lists can all be used to implement a symbol table. The symbol table is accessed by most phases of a compiler, beginning with lexical analysis, and continuing through optimization.
A compiler may use one large symbol table for all symbols or use separated, hierarchical symbol tables for different scopes.
A common data structure used to implement symbol tables is the hash table. The time for searching in hash tables is independent of the number of elements stored in the table, so it is efficient for a large number of elements. It also simplifies the classification of literals in tabular format.
The lexical analyser spends a great proportion of its time looking up the symbol table, this activity has a crucial effect on the overall speed of the computer. A symbol table must be organised in such a way that entries can be found as quick as possible. Hash tables are used to organise a symbol table, where the keyword or identifier is 'hashed' to produce an array subscript. Collisions are inevitable in a hash table, and a common way of handling them is to store the synonym in the next available free space in the table.
Applications
An object file will contain a symbol table of the identifiers it contains that are externally visible. During the linking of different object files, a linker will identify and resolve these symbol references.
While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.
At that time of accessing variables and allocating memory dynamically, a compiler should perform many works and as such the extended stack model requires the symbol table.
Example
Consider the following program written in C:
A C compiler that parses this code will contain at least the following symbol table entries:
In addition, the symbol table will also contain entries generated by the compiler for intermediate expression values (e.g., the expression that casts the i
loop variable into a double
, and the return value of the call to function bar()
), statement labels, and so forth.
Example: SysV ABI
An example of a symbol table can be found in the SysV Application Binary Interface (ABI) specification, which mandates how symbols are to be laid out in a binary file, so that different compilers, linkers and loaders can all consistently find and work with the symbols in a compiled object.
The SysV ABI is implemented in the GNU binutils' nm utility. This format uses a sorted memory address field, a "The symbol type" field, and a symbol identifier (called "Name").
One entry is a data symbol, denoted by the type "D". Many functions, including both user-defined functions and library functions are also present.
Example: the Python symbol table
The Python programming language includes extensive support for creating and manipulating symbol tables. Properties that can be queried include whether a given symbol is a free variable or a bound variable, whether it is block scope or global scope, whether it is imported, and what namespace it belongs to.
Example: Dynamic symbol tables
Some programming languages allow the symbol table to be manipulated at run-time, so that symbols can be added at any time. Racket is an example of such a language.
Both the LISP and the Scheme programming languages allow arbitrary, generic properties to be associated with each symbol.
The Prolog programming language is essentially a symbol-table manipulation language; symbols are called atoms, and the relationships between symbols can be reasoned over. Similarly, OpenCog provides a dynamic symbol table, called the atomspace, which is used for knowledge representation.
See also
- Debug symbol
References
Source of the article : Wikipedia