[Custom Thumbnail]

All the Code of the series can be found at the Github repository:
https://github.com/drifter1/compiler

Introduction

Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will check if the types of the expression and identifier in an assignment statement are compatible with each other or not. Furthermore, we will also simplify some rules and tweak other things. One of those things is a "sneak-peak" of the next articles :)

The topics that we will cover today are:

Simplify pointer and array rules
Advanced type error message
Assignment type check
Some stuff around parameters

Requirements:

Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:

What Compiler Design is (mainly the steps)
For which exact Language the Compiler is build for (Tokens and Grammar)
How to use Flex and Bison
How to implement a lexer and parser for the language using those tools
What the Symbol Table is and how we implement it
How we combine Flex and Bison together
How we can pass information from the Lexer to the Parser
How we define operator priorities, precedencies and associativity
What Semantic Analysis is (Attributes, SDT etc.)
How we do the so called "Scope Resolution"
How we declare types and check the type inter-compatibility for different cases, including function parameters
How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
Intermediate Code Representations, including AST's
How we implement an AST (structure and management)
Action Rules for AST nodes and more

Difficulty:

Talking about the series in general this series can be rated:

Intermediate to Advanced

Today's topic(s) can be rated:

Medium

So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Simplify pointer and array rules

First of all, I would like to point out that I removed the DOT token (.) from the lexer, as it's not used right now. We might add it back later on, but right now it just gives us an "unused" warning. Also, the pointer and array rules from the grammar that we have right now, allow multiple pointers and multi-dimensional arrays. But, managing those in the symbol table structure and generating Assembly code is not that easily. So, to make the whole process easier for you to understand, and most importantly easier to code, I will only allow single-pointers and one-dimensional arrays. This means that the rules for them change into:

pointer: MULOP ; /* for now only single pointers! */

array: /* for now only one-dimensional arrays */
  LBRACK expression RBRACK /* can only be used in expressions */
  {
    // if declaration then error!
    if(declare == 1){
      fprintf(stderr, "Array declaration at %d contains expression!\n", lineno);
      exit(1);
    }
  }
  | LBRACK ICONST RBRACK
  {
    // set array_size for declaration purposes
    $$ = $2.ival;
  }
;

That way the actual structure that we have for storing such information (symbol table entry) makes much more sense now! We could also add new entries for such things, but why make things so complicated from the beginning? We can always add such behavior later on...

Advanced type error message

Next up is the "type_error" function that we defined during Semantics Analysis and so inside of the "semantics.c" file. This function get's triggered from the "get_result_type" function, whenever there is a type conflict. To help us distinguish the error that we have, we also added an error message. But this error message just prints out numbers, and remembering all the data and operator type integer mappings is quite difficult! So, let's make it print out the actual names of the data and operator types.

In code this looks as following:

void type_error(int type_1, int type_2, int op_type){
  fprintf(stderr, "Type conflict between ");
  /* first type */
  if      (type_1 == INT_TYPE)           fprintf(stderr,"%s","int");
  else if (type_1 == REAL_TYPE)          fprintf(stderr,"%s","real");
  else if (type_1 == CHAR_TYPE)          fprintf(stderr,"%s","char");
  else                                   fprintf(stderr,"%s","other");

  fprintf(stderr, " and ");

  /* second type */
  if      (type_2 == INT_TYPE)           fprintf(stderr,"%s","int");
  else if (type_2 == REAL_TYPE)          fprintf(stderr,"%s","real");
  else if (type_2 == CHAR_TYPE)          fprintf(stderr,"%s","char");
  else                                   fprintf(stderr,"%s","other");

  /* operator */
  fprintf(stderr," using op type ");
  switch(op_type){
    case NONE:
      fprintf(stderr,"%s","NONE");
      break;
    case ARITHM_OP:
      fprintf(stderr,"%s","ARITHM_OP");
      break;
    case INCR_OP:
      fprintf(stderr,"%s","INCR_OP");
      break;
    case BOOL_OP:
      fprintf(stderr,"%s","BOOL_OP");
      break;
    case NOT_OP:
      fprintf(stderr,"%s","NOT_OP");
      break;
    case REL_OP:
      fprintf(stderr,"%s","REL_OP");
      break;
    case EQU_OP:
      fprintf(stderr,"%s","EQU_OP");
      break;
    default: 
      fprintf(stderr, "Error in operator selection!\n");
      exit(1);  
  }

  /* line */
  fprintf(stderr, " in line %d\n", lineno);

  exit(1);
}

Consider the following code snippet:

...

int i;
i = 5.5;

...

Running the compiler now, when having a type conflict, would give us:

Type error cause we can't assign (operator NONE) a real to an integer!

Assignment type check

Having all the needed functions already implemented in previous articles/parts of the series, we just have to call the "get_result_type" function with the correct arguments:

The variable datatype can be found using the "get_type" function.
The expression datatype can be found with the "expression_data_type" function
The operator is NONE, which corresponds to assignments and so type checks only

Knowing all that we just have to add the following code to the assignment-rule:

assigment: var_ref ASSIGN expression
{
  AST_Node_Ref *temp = (AST_Node_Ref*) $1;
  $$ = new_ast_assign_node(temp->entry, temp->ref, $3);

  // check assignment semantics
  get_result_type(
    get_type(temp->entry->st_name), /* variable datatype */
    expression_data_type($3),       /* expression datatype */
    NONE  /* checking compatibility only (no operator) */
  );
}
;

Running this function with a NONE argument, will make it check if the first datatype can get a value of the second datatype. That way we know if the expression can be assigned as a value of the identifier. If not, then we will get an error message (like the one that we saw earlier).

Some stuff around parameters

The stuff that we covered up to now is actually all that I have to do for today's topic! But, as that would be just to little, let's now also dive into some other stuff, to prepare for the next big topic of function call semantics. In this topic we have to:

check if the calling parameters are compatible with the formal (declaration) parameters (which includes checking the parameter count)
set the return type of the function, so that we can check if the function can be used in an expression, assignment etc.

All this means that we will have to do a lot of stuff in the "Revisit queue" structure. Also, a lot of semantic checking will have to be delayed up to the point where we have the needed information (mainly datatype).
Let's first add some new entries to the revisit queue entry. To check parameters and to be able to tweak the datatype of the symbol table entry/item we have to add:

a "list_t" entry
an array of parameter datatype ("par_types"
the number of parameters ("num_of_pars")

So, the new structure in code looks like this now:

typedef struct revisit_queue{
  // symbol table entry
  list_t *entry;

  // name of identifier
  char *st_name;

  // type of revisit
  int revisit_type;

  // parameters
  int *par_types;
  int num_of_pars;

  // maybe additional information to simplify the process ...
   
  struct revisit_queue *next;
}revisit_queue;

The first of those newly added entries can be accessed easily during the "entry" creation inside of the "insert" function of the Symbol Table. This makes us think that the "add_to_queue" function of the Revisit Queue structure, should contain such a parameter. In the same way, to be able to access the queue entry easier, we should also create a search function that gives back the entry without removing it. This will become useful when we actually want to add the information of parameter data-types ("par_types") and the parameter count ("num_of_pars") to this entry. That way we now end up with the following new functions:

symtab.h:

...

void add_to_queue(list_t *entry, char *name, int type);
revisit_queue *search_queue(char *name);

...

----------------------------------------------------------------------------

symtab.c:

...

void add_to_queue(list_t *entry, char *name, int type){
    revisit_queue *q;

    /* queue is empty */
    if(queue == NULL){
        /* set up entry */
        q = (revisit_queue*) malloc(sizeof(revisit_queue));
        q->entry = entry;
        q->st_name = name;
        q->revisit_type = type;
        q->next = NULL;

        /* q "becomes" the queue */
        queue = q;
    }
    /* queue not empty */
    else{
        /* find last element */
        q = queue;
        while(q->next != NULL) q = q->next;

        /* add element to the end */
        q->next = (revisit_queue*) malloc(sizeof(revisit_queue));
        q->next->entry = entry;
        q->next->st_name = name;
        q->next->revisit_type = type;
        q->next->next = NULL;
    }       
}

revisit_queue *search_queue(char *name){    
    revisit_queue *q;
    /* search for the entry */
    q = queue;
    while( strcmp(q->st_name, name) != 0 ) q = q->next;

    return q;
}

...

Doing that we of course have to change the "insert"-function, as it uses the "add_to_queue" function. That's pretty easily actually, as we just have to tweak the following line:
add_to_queue(l, l->st_name, PARAM_CHECK);
which is line 68 of the "symtab.c" file.

The "search_queue" function can be used in the "function_call"-rule, to be able to set the two other new entries of the revisit queue entry. This function gives back the queue entry and so we can then easily set the number of parameters, allocate memory for the array, and get the expression data type of each parameter, passing it to each corresponding "par_types" index. In code that looks as following:

function_call: ID LPAREN call_params RPAREN
{
  AST_Node_Call_Params *temp = (AST_Node_Call_Params*) $3;
  $$ = new_ast_func_call_node($1, temp->params, temp->num_of_pars); 

  /* add information to revisit queue entry (if one exists) */
  revisit_queue *q = search_queue($1->st_name);
  if(q != NULL){
    q->num_of_pars = temp->num_of_pars;
    q->par_types = (int*) malloc(temp->num_of_pars * sizeof(int));
    /* get the types of the parameters */
    int i;
    for(i = 0; i < temp->num_of_pars; i++){
      /* get datatype of parameter-expression */
      q->par_types[i] = expression_data_type(temp->params[i]);
    }
  } 
}
;

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

Introduction -> What is a compiler, what you have to know and what you will learn
A simple C Language -> Simplified C, comparison with C, tokens, basic structure
Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
Using Symbol Table in the Lexer -> Flex and Symbol Table combination
Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
Bison basics -> Bison tutorial actually
Creating a grammar for our Language -> Grammar and first Parser
Combine Flex and Bison -> lexer and parser combined
Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler
Action Rules for If-Else Statements -> Statements-node, Action rules for “simple” and “complicated” if-else statements, running the compiler
Action Rules for Loop Statements and some Fixes -> Some fixes, Action rules of while statements, Action rules of for statements, running the compiler
Action Rules for Function Declarations (part 1) -> Small Tweaks and Fixes, Visualizing the Problem, The new AST nodes and creation functions, Action Rules for Declarations
Action Rules for Function Declarations (part 2) -> Action Rules for Function declarations, Action Rules for a Function declaration Action Rules for Parameters, Traversing after parsing is done using "program", Compiler validation using examples
Action Rules for Function Calls -> Visualizing the Problem,AST nodes and creation functions, Action Rules for Function Calls, Running the compiler
Datatype attribute for Expressions -> Visualizing the Problem, Parameter datatype, Changes in AST Nodes and creation functions, Running the compiler

Final words | Next up on the project

And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:

Semantic analysis (using even more action rules in Bison)
Machine Code generation (MIPS Assembly)

Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

Writing a simple Compiler on my own - Type Checking for Assignments [C][Flex][Bison]