Writing a simple Compiler on my own - Function Semantics (part 2) [C][Flex][Bison]

[Custom Thumbnail]

All the Code of the series can be found at the Github repository:


    Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will talk about a way that let's us check the parameters of function call after the function get's declared.

The topics that we will cover today are:

  1. Revisit Queue Concept
  2. Revisit Queue Core Implementation
  3. Inserting undeclared variables/identifiers in that queue
  4. Run the compiler for the full example and discuss the results


    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases, including function parameters


Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Revisit Queue Concept

    The language for which we are implementing our compiler has one very difficult aspect: "Functions get declared in the end". So, we have a problem when calling functions before they are declared, cause we don't know what the function returns and also don't know if the parameters are actually compatible! The first one is quite difficult, cause we will have to suspend assignment-compatibility checks, as we don't know the type of value that comes from the function call. This is something that we will do later on when we are doing stuff on the "actual" compiler. The second thing is what we will try to prepare today.

    So, what's the concept of all that I'm trying to discuss and implement today? Well, I already explained it sort of. Whenever something is not declared yet we will not throw off an "declaration error" directly, but will insert that identifier in a so called revisit queue. When the identifier that corresponds to that entry in the revisit queue get's declared we will then do some Backtracking actions to set up that entry correctly by also feeding that information to the expressions and assignments where it belongs to. BUT, this will only be allowed and checked for function calls (for now), meaning that anything that's not a function, and so some undeclared variable, will remain in that queue even after the whole parsing procedure is finished. This will show us that some variables are really undeclared, meaning that we now have an "Variable not declared error"!I don't included any revisit action yet and so will just output a warning so that we know that this is working right!

So, let's now get into the actual implementation!

Revisit Queue Core Implementation

    This revisit queue will of course be some kind of structure. As the name already says I will implement it as a queue, inserting new identifiers in the end and taking identifiers starting from the beginning. Of course each revisit will search for it's corresponding identifier and so the actual structure might be irrelevant in the end. Each entry of that queue will store the name of the identifier, so that we are able to lookup the entry in the symbol table, and also the type of revisit in a form of a integer code that get's defined in the same way that we defined data types, parameter passing types etc. Afterwards we might include additional information.
So, the struct for the queue is:

typedef struct revisit_queue{
    // name of identifier
    char *st_name;
// type of revisit int revisit_type;
// maybe additional information to simplify the process ...
struct revisit_queue *next; }revisit_queue;

    Similar to the symbol table this structure will be declared as a global static variable in the "symtab.h" header file of the symbol table. So, the declaration code of this structure is:

static revisit_queue *queue;

    Before starting the parsing procedure we should also initialize this queue-pointer to NULL, so that we know that the structure is empty. This is also something that we have to check after the procedure is done so that we know if there are still entries (undeclared identifiers) in the queue. So, the parser should contain the lines:

// initialize revisit queue
queue = NULL;
... parsing procedure ...
if(queue != NULL){ printf("Warning: Something has not been checked in the revisit queue!\n"); } ...

So, what are the functions that we have to include to manage such a structure? It's actually very simple. We only have to add entries to the queue when finding an undeclared identifier, remove entries or better said revisit them when they are declared and we might also include a dump file generation similar to the one of the symbol table. So, the functions are:
  • add_to_queue -> for adding identifiers to the revisit queue
  • revisit -> to revisit and remove an entry by also returning a code that tells us if it succeeded or not
  • revisit_dump -> to generate a dump file for the remaining entries
Let's get into each one on it's own!

add_to_queue function

    Right now an entry of the revisit queue needs the name of the identifier and type of revisit, which means that these are also the parameters of the function! We don't need to return anything and so the function will be void. The main purpose of this function is to create and add a new entry. Of course we have two cases for that:
  1. Queue is empty, where we just have to set up an entry and make the queue equal to that entry.
  2. Queue is not empty, where we will have to find the last element and add then new entry after that element
So, the code is:
void Add_to_queue(char *name, int type){
    revisit_queue *q;
/* queue is empty */ if(queue == NULL){ /* set up entry */ q = (revisit_queue*) malloc(sizeof(revisit_queue)); q->st_name = name; q->revisit_type = type; q->next = NULL;
/* q "becomes" the queue */ queue = q; } /* queue not empty */ else{ /* find last element */ q = queue; while(q->next != NULL) q = q->next;
/* add element to the end */ q->next = (revisit_queue*) malloc(sizeof(revisit_queue)); q->next->st_name = name; q->next->revisit_type = type; q->next->next = NULL; } }

Where this function has to be called will be discussed later on in this article :)

Revisit function

    When revisiting an entry we of course have to know which entry we are referring to. This shows us that one very important parameter is the name of the identifier, which is more than enough! The main purpose of this function is doing the corresponding backtracking action for that identifier by also removing the entry afterwards. When an entry was found and taken care of we will return '0' to know that it was successful. By returning '1' we let the caller function know that no entries where found. This is quite important when declaring functions, cause it will show us that an function got declared, but not used yet previously (it might be useless).
    To able to search for the entry and remove it at any position it might be found at, we have to take care of some cases:
  1. Special case of being the first entry, where we do the revisit actions depending on the revisit type and then remove the entry by setting the queue to it's "next"
  2. Simple case, where we just search for the entry pointing to our entry, do the action and then make that "pointing entry", point to the "next" of the found entry
  3. Not being in the queue at all!
So, the code is:
int revisit(char *name){
    revisit_queue *q;
/* special case - first entry */ if( strcmp(queue->st_name, name) == 0 ){
/* revisit entry depending on the type */ switch(queue->revisit_type){ case PARAM_CHECK: /* TO DO: run parameter check */ break; /* ... */ }
/* remove entry by setting queue to "next" */ queue = queue->next;
return 0; // success }
/* search for the entry that points to it */ q = queue; while( strcmp(q->next->st_name, name) != 0 ) q = q->next;
/* check if entry was not found */ if(q == NULL){ return 1; // not found }
/* revisit entry depending on the type */ switch(q->next->revisit_type){ case PARAM_CHECK: /* TO DO: run parameter check */ break; /* ... */ }
/* remove entry by making the previous entry point at */ /* the "next" of the entry that we want to remove */ q->next = q->next->next;
return 0; // success }

    This function will be called when functions get declared, to take care of function calls that has not been checked yet! This actual integration will come in later articles...

Revisit_dump function

    Last but not least we need a function that prints out what remained in the queue. We just have to loop through the queue, print the name of the identifier, and depending on the revisit type also the revisit type, but in text (not an integer value). Very simple process actually.
So, the code is:
void revisit_dump(FILE *of){
    int i;
    revisit_queue *q;
    q = queue;
fprintf(of,"------------ -------------\n"); fprintf(of,"Identifier Revisit Type\n"); fprintf(of,"------------ -------------\n"); while(q != NULL){ fprintf(of, "%-13s", q->st_name); if(q->revisit_type == PARAM_CHECK){ fprintf(of,"%s","Parameter Check"); } // more later on fprintf(of, "\n"); q = q->next; } }

    This will be called after the parsing procedure is done in the same way as the symbol table dump. The code is:
yyout = fopen("revisit_dump.out", "w");

Inserting undeclared identifiers in that queue

    The first thing I thought about was declaring a flag variable similar to the declare variable, but there is one big problem. We can't run an action before finding the ID that easily, which is what happens in function call statements. This shows us that we can't do the same programming trick that we did with declarations. So, to be able to check all this we ended up making all the stuff discussed through-out the article.
    The simplest way of integrating the new revisit queue is by adding it to the correct point of the "insert" function that also takes care of declarations. Knowing when exactly we declare, we should have already taken care of insertions to the symbol table, when we are currently not declaring! So, what does this mean? Well, we already took care of the case of having an entry already inside of the symbol table, but of different scope, by creating a new entry only when currently declaring. In the same way, there is also this case of having no entry at all, that I completely forgotten in the Scope resolution article! So, when there is no entry yet, we should only add an entry when currently declaring. BUT, to take care of function calls, we should add the identifier either way, by also adding the identifier to the revisit queue, whenever we have "declare = 0".
So, the first part of the insert function becomes:
/* variable not yet in table */
if (l == NULL){
    /* check if we are really declaring */
    if(declare == 1){
        /* set up entry */
        l = (list_t*) malloc(sizeof(list_t));
        strncpy(l->st_name, name, len);
        l->st_type = type;
        l->scope = cur_scope;
        l->lines = (RefList*) malloc(sizeof(RefList));
        l->lines->lineno = lineno;
        l->lines->next = NULL;
/* add to hashtable */ l->next = hash_table[hashval]; hash_table[hashval] = l; printf("Inserted %s for the first time with linenumber %d!\n", name, lineno); } else{ /* add it to check it again later */ l = (list_t*) malloc(sizeof(list_t)); strncpy(l->st_name, name, len); l->st_type = type; l->scope = cur_scope; l->lines = (RefList*) malloc(sizeof(RefList)); l->lines->lineno = lineno; l->lines->next = NULL; l->next = hash_table[hashval]; hash_table[hashval] = l; printf("Inserted %s at line %d to check it again later!\n", name, lineno);
/* Adding identifier to the revisit queue! */ add_to_queue(l->st_name, PARAM_CHECK); } }

Full example result discussion

    Running the compiler for "full_example.c" we now get more messages and also a new "revisit_dump.out" file after the procedure is done.
Messages of "check later":

The "queue has entries" warning:

Queue dump file:

    Very interesting indeed! We can see that the function call identifiers have been truly inserted to the revisit queue. Now what remains to do afterwards is to revisit them for parameter checking whenever a function get's declared. Of course the revisit function will be called always, but a result of '1' (as we mentioned earlier) will tell the "caller" function that we had no entries to revisit, meaning that the function will be used later on in another function or be completely useless!

All the code and the dump files are in GitHub, as always :)



    No actual references. Only stuff we discussed through-out the series, but now "implemented"...


All of the images are either custom-made or screenshots!

Previous parts of the series

Final words | Next up on the project

     And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:
  • Semantic analysis (creating and using even more semantic rules)
  • Intermediate Code generation (Abstract Syntax Tree)
  • Machine Code generation (MIPS Assembly)
     Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:


Keep on drifting! ;)

3 columns
2 columns
1 column