[Custom Thumbnail]
All the Code of the series can be found at the Github repository:
https://github.com/drifter1/compiler
Introduction
Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will continue with the implementation of Assignment checking using the Revisit Queue. The actual integration and explanation on examples will be done in the next part!The topics that we will cover today are:
- Revisit Queue changes
- Expression datatype function
Requirements:
Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases, including function parameters
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for AST nodes and more
Difficulty:
Talking about the series in general this series can be rated:- Intermediate to Advanced
- Medium
Actual Tutorial Content
Revisit Queue changes
Based on the visualization of the previous part, the Revisit Queue structure code has to change into:typedef struct revisit_queue{
// symbol table entry
list_t *entry;
// name of identifier
char *st_name;
// type of revisit
int revisit_type;
// parameters of function calls
int **par_types;
int *num_of_pars;
int num_of_calls;
// assignment expression nodes
void **nodes;
int num_of_assigns;
// maybe additional information to simplify the process ...
struct revisit_queue *next;
}revisit_queue;
You can see that I included some new entries that will be used for the assignment checking. The AST structure can't be used in the Symbol Table and I can't just include the header file, as this causes conflicts in my weird coding structuring! That's why I put the type "void", cause this type allows us to type-cast anything to it..
Let's now get into the function changes!
add_to_queue function
The array of nodes will be managed using a similar principle to the parameter checking (search queue entry and tweak this entry), but this doesn't mean that we can't initialize anything! Let's initialize the counter "num_of_assigns" to 0, so that we can check this value during the parser integration.The new code of the function looks as following:
void add_to_queue(list_t *entry, char *name, int type){
revisit_queue *q;
/* queue is empty */
if(queue == NULL){
/* set up entry */
q = (revisit_queue*) malloc(sizeof(revisit_queue));
q->entry = entry;
q->st_name = name;
q->revisit_type = type;
q->next = NULL;
/* additional info */
if(type == PARAM_CHECK){
q->num_of_calls = 0;
}
else if(type == ASSIGN_CHECK){
q->num_of_assigns = 0;
}
/* q "becomes" the queue */
queue = q;
}
/* queue not empty */
else{
/* find last element */
q = queue;
while(q->next != NULL) q = q->next;
/* add element to the end */
q->next = (revisit_queue*) malloc(sizeof(revisit_queue));
q->next->entry = entry;
q->next->st_name = name;
q->next->revisit_type = type;
q->next->next = NULL;
/* additional info */
if(type == PARAM_CHECK){
q->next->num_of_calls = 0;
}
else if(type == ASSIGN_CHECK){
q->next->num_of_assigns = 0;
}
}
}
revisit function
During the revisit of the assignment check, we will have to perform the type inter-compatibility check for each assignment that the identifier is used in (looping through the nodes array). After performing the check we will have to remove the entry from the revisit queue, like always...The new code looks like this:
int revisit(char *name){
int i, type1, type2;
revisit_queue *q = search_queue(name);
revisit_queue *q2;
if(q == NULL){
return -1; // no entry
}
/* revisit entry depending on the type */
switch(q->revisit_type){
case PARAM_CHECK:
/* run parameter check */
if(!func_param_check(name, q->num_of_calls, q->par_types, q->num_of_pars)){
printf("Successful Parameter Check of %s\n", name);
}
/* remove entry by making it point to it's next */
q2 = search_prev_queue(name);
if(q2 == NULL){ /* special case: first entry */
queue = queue->next;
}
else{
q2->next = q2->next->next;
}
break;
case ASSIGN_CHECK:
/* run assignment check for each assignment */
type1 = get_type(q->entry->st_name);
for(i = 0; i < q->num_of_assigns; i++){
type2 = expression_data_type(q->nodes[i]);
/* perform assignment check */
get_result_type(
type1, /* variable datatype */
type2, /* expression datatype */
NONE /* checking compatibility only (no operator) */
);
}
/* remove entry by making it point to it's next */
q2 = search_prev_queue(name);
if(q2 == NULL){ /* special case: first entry */
queue = queue->next;
}
else{
q2->next = q2->next->next;
}
break;
/* ... */
}
return 0; // success
}
revisit_dump function
Now that we have a new type of revisit we also have to tweak the dump function, so that it prints out that we have to do an assignment check of a specific number of assignments for a specific identifier!In code this looks like this:
void revisit_dump(FILE *of){
int i;
revisit_queue *q;
q = queue;
fprintf(of,"------------ -------------\n");
fprintf(of,"Identifier Revisit Type\n");
fprintf(of,"------------ -------------\n");
while(q != NULL){
fprintf(of, "%-13s", q->st_name);
if(q->revisit_type == PARAM_CHECK){
fprintf(of,"%s","Parameter Check ");
fprintf(of,"for %d function calls",q->num_of_calls);
}
else if(q->revisit_type == ASSIGN_CHECK){
fprintf(of,"%s","Assignment Check ");
fprintf(of,"for %d assignments",q->num_of_assigns);
}
// more later on
fprintf(of, "\n");
q = q->next;
}
}
Expression datatype function
To be able to know if a variable needs to be revisited, talking about assignments only, we can use the already developed function "expression_data_type". Let's first define a new flag variable "cont_revisit" which will get a value of '1', whenever an identifier of a function call has no declared type. This will help us later on in the parser to understand if the assignment check can be performed or if we have to add the assignment into the revisit queue!The flag is defined like that:
int cont_revisit = 0; // 1: contains revisit, 0: not
Talking about the function, we will first have to make sure that we set the datatype of arithmetic, boolean, relational and equality expressions again, during that function! That way the type stored in the "data_type" entry is updated to the correct value. This can be done very easily using the same function that we use for assignment checking, which is "get_result_type", with the correct parameters. For example:
case ARITHM_NODE: /* arithmetic expression */
temp_arithm = (AST_Node_Arithm *) node;
/* set datatype again */
temp_arithm->data_type = get_result_type(
expression_data_type(temp_arithm->left), /* data type of left expression */
expression_data_type(temp_arithm->right), /* data type of right expression */
ARITHM_OP /* operation type */
);
return temp_arithm->data_type;
Talking about function calls, we will do a "dummy return" of the "INT_TYPE", by also setting the flag to '1', whenever the function identifier has no declared type! In code this looks like this:
case FUNC_CALL: /* function call */
temp_func_call = (AST_Node_Func_Call *) node;
/* check if it needs revisit */
if(temp_func_call->entry->st_type == UNDEF){
if(temp_func_call->entry->inf_type == UNDEF){
cont_revisit = 1; /* contains revisit */
return INT_TYPE; /* dummy return */
}
}
return temp_func_call->entry->inf_type; /* return type */
The final part will be about how we use today's code to solve the problem! :)
RESOURCES
References:
No references, just using code that I implemented in my previous articles.Images:
All of the images are custom-made!Previous parts of the series
General Knowledge and Lexical Analysis
- Introduction
- A simple C Language
- Lexical Analysis using Flex
- Symbol Table (basic structure)
- Using Symbol Table in the Lexer
Syntax Analysis
Semantic Analysis (1)
Intermediate Code Generation (AST)
Semantic Analysis (2)
Final words | Next up on the project
And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.Next up on this series are:
- Semantic analysis (using even more action rules in Bison)
- Machine Code generation (MIPS Assembly)
So, see ya next time!
GitHub Account:
https://github.com/drifter1
Keep on drifting! ;)