Final Year Project – Interim Report

 

  “Optical Communications Simulation Software with Windows Interface”

         

 

                                                                   Name:   Patrick O'Halloran

                            

 

 

                                                                             ID:   9637079

                            

 

                                                                    Course:   Computering Engineering

                                     

                                               

                                                Supervisors:     Michael Connelly and John Nelson

                                     

                                     

                                                                            Date:   05/11/99

 

 

 

 

Signature:                                                                                              

 

 

 

 

 

 

 

 

 


FYP Interim Information:

 

Optical Communications Simulation Software with Windows Interface

 

 

Introduction and Project Outline:

 

In Brief there are two main parts to this project, the design and implementation of a translator, which converts Pascal programs into C++ programs, with the translator written in OO C++. The second part of the project is the

Development of a windows interface to allow the user to input simulation parameters, run a simulation, view simulation results graphically and store results in a file. The first objective will be the most difficult to implement. I will try to implement the project without the use of any of the compiler building tools, which are available. The second part of the project will be developed using Borland C++ Builder to create a GUI (Graphical User Interface) which will be integrated with the translator. However the C++ code will have to be compiled separately using a general C++ compiler before any parameters are passed to a graphing package which will also be integrated into the GUI.

 

Background Information:

 

This project is basically stemming from the development of a Compiler and in fact most of the projects background information comes from the design and implementation phases of a Compiler. The only phase which will be different is the code generation section, as instead of conversion to assembly language this compiler based design will hopefully translate syntactically correct Pascal code to Semantically and Syntactically correct C++ code.

The main parts of Compilation are Analysis and Synthesis. The analysis part breaks up the source program into constituent pieces. This then creates an intermediate representation of the program.

The synthesis part constructs the desired target program from the intermediate representation. These two sections (Synthesis and Analysis) are treated as front and back ends of the Compiler respectively. The front end consists of those phases or parts of phases that depend primarily on the source language and are largely independent of the target machine, these normally include lexical and syntactic analysis, the creation of the symbol table, semantic analysis and the generation of intermediate code. The front end can also do a certain amount of code optimization as well. The front end also includes the error handling that goes along with these sections as well.

 

In compiling, Analysis can be segregated into three phases:

 

       Linear Analysis:

This is where the stream of characters representing the source program is read from left to right and grouped into tokens that are sequences of characters having a specific collective meaning i.e. in the case of the Pascal programming language the relational operators are examples of this {<>, <=, >= }, and Keywords of the language can also be symbolized.

 

       Hierarchical Analysis:

Involves the grouping of tokens of the source program into grammatical phrases that are used by the compiler the sythesise the output

 

       Semantic Analysis:

This is where certain checks are performed to ensure that the components of a program fit together meaningfully.

 

       Code Generation:

This section will deal with the actual conversion of the syntactically and semantically correct code being converted to the destination language, using all the information gathered from the previous sections of the Project. The destination Language in this case will be C++.


                                                                 Phases of a Compiler:

                               

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Lexical Analyser:

 

            The lexical analyser is the first phase of a compiler. The basic driver of a lexical analyser is a Finite State Machine with actions appended to each state. The main task of the lexical analyser is to read the input characters of a specified source program and produce as output a sequence of tokens that the parser uses for syntax analysis.

The manner in which it deals with the input is determined by the FSM (Finite State Machine). The FSM is derived from a Grammar representation of the Pascal language[1] which recognises such elements as numbers, identifiers, constants, uses, variables, begin-end blocks, statements, and other program units, as will be defined in more detail later.

 

Source

Program                                                                                                Token

 

                                                                                                                                                                Get next token

 

 

 

 

 

Sometimes lexical analysers are divided into a cascade of two phases, the first called “scanning” and the second “lexical analysis”. Since it is the lexical analyser that reads the source program, it may execute secondary tasks while reading in the source text. During the scanning phase of the lexical analyser white space and comments may be removed from the source text. When we refer to lexical analysis we use the terms “tokens”, “pattern”, and “lexeme” with specific meaning. In general there is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. We treat tokens as terminal symbols in the grammar. IN most programming languages the following are treated as tokens: Keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as parenthesis, comma’s and semicolons. When a character sequence appears in the source program a token representing the type of sequence seen is returned to the parser. The lexical analyser collects information about tokens into their associated attributes, all of which are used in the initial development of a specific data structure, called a symbol table.

 

 

Symbol Table Management:

An essential function of a compiler is to record the identifiers used in the source program and to collect information about the storage allocated for an identifier. The Symbol Table is a data structure containing a record for each identifier. It also contains fields for the attributes of each of those identifiers. The attributes involved consist of its type, it’s scope [2], and in the case of procedure names such things as the number and types of it arguments, the method of passing each argument and the type returned. When the lexical analyser detects an identifier in the source program, the identifier is entered into the symbol table.  However the attributes of an identifier cannot normally be determined during lexical analysis. For example consider the sample Pascal code declaration below:

 

                                                var count, initial, time : integer;

 

The lexical analyser does not know the identifiers are of type integer when they are read in therefore the remaining phases enter information about identifiers into the symbol table, and then use this information in various ways. When doing semantic analysis and intermediate code generation, we need to know what the types of identifiers are, so we check to see if the source program uses them correctly and so we can generate proper operations on them.

A symbol table mechanism must allow us to add new entries and find existing entries efficiently. Hence in the development of this data structure I will have to take into consideration the time required to add entries and to make inquiries to the symbol table. Another consideration in implementing the symbol table will be whether or not it will be generated dynamically or whether it will be set to a fixed size, in which case the size will have to be large enough to handle any source program. With this in mind I will keep my emphasis on a dynamically generated symbol table.

 

 

The Grammar:

One major part in the specification of the front end of this translator will be the definition of a grammar, which naturally describes the hierarchical structure of the complete[3] Pascal programming language. In general a grammar comprises of four components:

       A set of tokens, known as terminal symbols i.e. keywords, begin-end constructs, punctuation constructs.

       A set of non-terminals.

       A set of productions or rules, where each production consists of a non-terminal called the left side of the production, an arrow and a sequence of tokens and non-terminals, called the right side of the production.

       A designation of one of the non-terminals as a start symbol, in Pascal this will more than likely be taken as the token PROGRAM.

 

 

Syntactic Analysis:

Every programming language has rules that dictate the syntactic structure of well-formed programs.

       A grammar gives a precise, yet easy to understand syntactic specification of a programming language.

       From certain classes of grammars we can automatically construct an efficient parser that determines if a source program is syntactically well formed. As an additional benefit the parser construction process can reveal syntactic ambiguities and other difficult to parse constructs that might otherwise go undetected in the initial phase of compiler development.

This project will use a Top Down Parsing technique. This top down construction of a parse tree is done by starting with the root labeled with the starting non-terminal and repeatedly performing the following two steps:

       At node N, labeled with the non-terminal A select one of the productions for A and construct children at N for the symbols on the right side of the production.

       Find the next node at which a subtree is to be constructed.

 

The parser will obtain a string of tokens from the lexical analyser and verify that the string can be generated by the defined grammar for Pascal. The parser should then hopefully report any syntax errors in an intelligible fashion, and recover from any commonly occurring errors so that it can continue processing the remainder of the input.

In general there are a number of tasks that can be conducted during parsing, such as collecting information about various tokens and placing this gathered information in the symbol table and checking for other kinds of semantic errors. The error handling associated with the syntactic analysis stage has set goals:

 

       It should report the presence of errors clearly and accurately.

       It should recover from each error quickly enough to be able to detect subsequent errors

       It should not significantly slow down the processing of correct programs.

 

 

Semantic Analysis:

In general the compiler must check that the source program follows both the syntactic and semantic rules associated with the programming language in question. This checking is called static checking as it is done before the execution of the target program. Static checking ensures that certain kinds of programming errors will be detected and reported. Example of static checks include the following:

       Type checks:

The compiler will report an error when an operator is applied to an incompatible operand, for example, if we tried to add an integer value to a function call.

       Flow of control checks:

An Example of this would be in the use of a break statement in C, where the program breaks the flow of control from the smallest enclosing repetition loop or switch statement, and transfer flow of control the somewhere else, we need to be able to report an error if such an enclosing statement is not present.

There are just two brief examples of static checking many more will need reviewed and considered in the process of writing this translator. For now I will just give a brief overview of the manner in which type checking will be handled. A type checker verifies that the type of a construct matches that expected by the its context.

 

 

                                                                                Syntactic Analyser                 Semantic Analyser                Intermediate code Generator

 


Token stream

 

For example a type checker must verify that a user defined procedure is assigned the correct number of parameter arguments and also the correct type of arguments for which it was defined. When designing a type checker for Pascal I will have to work on the basis of the syntactic structure of the language, and the rules associated with assigning types to language constructs, such examples are detailed below as from Compilers, Principles, Techniques and Tool, By Aho, Sethi and Ullman, Addison-Wesley 1986.

       “ If both operands of the arithmetic operators of addition, subtraction and multiplication are of type integer, then the result is of type integer.”

       The result of the unary & operator is a pointer to the object referred to by the operand. If the type of the operand is “---“ the type of the result is a ‘pointer to ---‘.”

When considering types in a language, we can break the types up into Basic types or constructed types.

Basic Types are said to have no internal structure to them. Examples of basic types include character, integer, real, and Boolean. While constructed types are arrays, records, sets, and pointers. The above information is only a basic outline of the considerations which have to be taken account of when designing a semantic analyser, as I go through this project I’m sure I will be concerned about many more aspects of this phase.

 

Intermediate Code Generation:

In the synthesis model of a compiler, the front end translates a source program into an intermediate representation from which the back end generates the target code. Details of the target code are confined to the back end of the Compiler as far as possible. The reason for this is because the target code to be generated can be changed as long as the front and back ends are kept separate i.e. you can translate directly to a different programming language or a machine dependant language. Plus you can apply a code optimizer between the two sections to produce more efficient code. Syntax trees or postfix notation are two kinds of intermediate representations for programming languages. 

Code Generation:

 

 


Source program                                                                                                                                                                                 

 

 

 

 

 

 

 

The final phase in the Translator will be the actual translation of the code using the code generator. It will take as input the intermediate representation of the source program and produce as output the equivalent target code i.e. C++. The requirements, which will be imposed on the code generator, will be severe. The output code should be efficient and correct.

 

Design and Implementation of GUI for Final Half of Project:

 

At the moment I haven’t determined in detail how I will implement the second part of this project, except for the fact that I will try and generate the GUI in Borland C++ Builder 4. As I have used this on a previous occasion in development of a team project, and found that it was a relatively easy to use. The plan for this section is to create a GUI where the user can change the parameters as needed per the project, using this interface. The user should will then be able to use the built in Graphing program to generate the specific simulations and to save all the data and information produced to disk. The manner in which all this is to be implemented have not been fully determined as of yet, but I hope to have these worked out in the next few weeks.

 

 

 

Resources:

In the development of this project I will need a licensed version of both the Borland C++ compiler 4.5X or higher, Borland C++ Builder 4. I will also need the use of a computer in the Intel lab, preferably with internet access, to gather more information.

 

Timeline for Semester One:

Week /

Objective

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8-10

Week 11-13

1

 

 

 

 

 

 

 

 

 

2

 

 

 

 

 

 

 

 

 

3

 

 

 

 

 

 

 

 

 

4

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Tasks Involved in Semester One:

Objective 1:

Research on Compiler Construction and Design. Write preliminary paper on understanding of the project.

Objective 2:

                Implementation in C++ of Lexical Analysis Phase of the Translator using EBNF grammar for the Pascal

Language.

Objective 3:

                Extend into Syntactic Analysis Phase and Error detection and Recovery associated with these phases.

Objective 4:

                Extend into Semantic Analysis Phase, which includes type checking and Flow control analysis.

 

Timeline for Semester Two:

 

Week /

Objective

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week

8 -9

Week

10 –11

1

 

 

 

 

 

 

 

 

 

2

 

 

 

 

 

 

 

 

 

3

 

 

 

 

 

 

 

 

 

4

 

 

 

 

 

 

 

 

 

 

Tasks Involved in Semester Two:

Objective 1:

Start Intermediate Code Generation Phase of Translator, and Translation to C++. A lot of this will

be done over the 3 week beak after Christmas, the GUI will also have been started in this phase.

Objective 2:

GUI to be implemented, and testing will have started.

Objective 3:

                Final Testing and writing of Report will have commenced. However I plan on keeping the report up to date

 as I go through the whole project.

Objective 4:

                Ensure everything is working correctly, well as close to working as is possible.

                Project Completed!!!

 

Reference:

Alfred V.Aho, Ravi Sethi, and Jeffrey D.Ullman; Compliers : Principles, Techniques and Tool;Addison and Wesley; USA; 1986.

Ronald Mak; Writing Compilers and Interpreters : An applied approach Using C++, Second Ed.;Wiley & Sons; USA; 1996.

Colin Flanagan; CE4717- Language Processors;UL Printroom;Ireland;1999.

 



[1] This Grammar will be defined in more detail, in the next section, it will be defined using an EBNF (Extended Backus Naur Form) technique which is a standard used for defining the syntax of a language.

[2] Where in the program it is valid.

[3] This could be a piped dream as I may not be able to fully implement this step, but I will try to cover as much of the options as possible.