Writing a simple Compiler on my own - Action Rules for Function Declarations (part 1) [C][Flex][Bison]





[Custom Thumbnail]

All the Code of the series can be found at the Github repository:
https://github.com/drifter1/compiler


Introduction

    Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will be writing the needed Action Rules for Function Declarations. This topic came out as an enormous article and so I will split it into 2 parts. Today's article will mostly be about the preparation of the AST structures, whilst the next article (tomorrow) will contain most of the actual use of today's code in the parser. The next article will also contain a compiler validation using examples. :)

More specifically, the topics that we will cover today are:

  1. Small Tweaks and Fixes
  2. Visualizing the Problem
  3. The new AST nodes and creation functions
  4. Action Rules for Declarations

Requirements:

    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases, including function parameters
  • How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
  • Intermediate Code Representations, including AST's
  • How we implement an AST (structure and management)
  • Action Rules for other cases

Difficulty:

Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Small Tweaks and Fixes

    There are some things that I miss-typed or forgot to change after copy-pasting, and also some other small things that I changed while adding code related to function declarations.

A small change in the "symtab.c" function insert

    A symbol table item defined in "symtab.h" stores the length of the stored entry in "st_size". But, we never really set the value of that entry inside of the function "insert". This might not be that important, but still worth noting as I did change it to be able to access the length of "st_name" without the use of the "strings.h" function "strlen()", which makes code smaller in some cases...

So, a snipper of the code looks such as:
...
l = (list_t*) malloc(sizeof(list_t)); strncpy(l->st_name, name, len); l->st_size = len;
...

Wrong type in creation function of Return_Node

    While running the compiler I had a Segmentation Fault Error! I kept searching and searching for what caused the error. It really drove me insane! BUT, after putting lots and lots of console messages in each point of the new code (always do that!!) I found out that the problem must be caused by the return node somehow. Well, after taking a look at the creation function I saw that we used "FUNC_DECL" as the type of this node. This had to be some copy-paste error, as I kept copy-pasting to write all the functions faster. Well, either way the correct type that I've set now is "RETURN_NODE".

Changes in printing and traversal

    While writing the code I also made some changes in the messages that get printed out using the functions "ast_print_node" and "ast_traversal". I'm not talking about the new nodes that we will talk about later on, but about changes in the previous nodes :)

    I don't know every single change that I made, so just go to GitHub to see the changes for yourself in "ast.c". By the way, the code that I've uploaded on GitHub is about both parts. The changes in "ast.c" will be shown today tho..

After covering these small tweaks I made, let's now get into today's problem.

Visualizing the Problem

    Our program is build up of declarations and statements of a main function (that we somewhat took care of already), but can also contain optional functions, with their own declarations and statements. A simple visualization of that looks as following:


    So, the topic of this article are these optional functions. But, before getting into them, we should first talk about the declarations in general, which of course are also used in functions. In previous articles we already defined a node that stores multiple statements, but we never really did that for declarations! Similar to statements, we have to define a new node that will store multiple declaration nodes that occur in a specific part of the program, which can be the main or any optional function. Knowing that the rule "declarations" is left-recursive we will use a similar concept to statements, where we take the information stored in the previous "declarations"-rule and add the newly found "declaration" to it. When there's no other declaration yet, we will just add this new declaration. A visualization of that is:


    In the same way the function declarations can also be of any count, including zero. So, we will have to define a AST node that will store multiple function declarations as they occur. Here's a visualization of that:


    So, how exactly is a function declaration build up? Well, the rule "function" is split up into two parts: "function_head" and "function_tail". The first part contains the return type, identifier (function name) and parameters, whilst the second part contains the "actual" code of the function, which includes optional declarations, optional statements and an optional "return_node". But, how can we access the information about the newly "forming" function declaration node? Similar to other times we will have to define a global variable that stores a temporary function declaration node, so that we can set the entries of that function, while they occur. A visualization of that is:


    Taking a look at this visualization we can see that we will have to define nodes for the return type and parameters. The rest has already been done in previous articles. The return type will store the datatype and if we have a pointer or not. The parameters node will store Param structures and the count of them (num_of_pars), which means that a parameter can be easily defined by an entry "Param par" inside of the YYSTYPE union. We also created the function "def_param", which creates a parameter, taking a type and name (and also if it's being referenced or not, which will be used in function call parameters). Visualizing parameters we can can see that:


So, after visualizing all these structures let's now write them down!

The new AST nodes and creation functions

    Adding these new nodes we now end up with these somewhat final Node_Type snippet:
typedef enum Node_Type {
    BASIC_NODE,   // no special usage (for roots only)
    // declarations
    DECLARATIONS, // declarations
    DECL_NODE,    // declaration
...
// functions FUNC_DECLS, // function declarations FUNC_DECL, // function declaration RET_TYPE, // function return type DECL_PARAMS, // function parameters RETURN_NODE, // return statement of functions }Node_Type;

Let's get into the new nodes and their functions!

AST Node of Declarations

    A "declarations"-node needs to have a dynamic array of "declaration"-node pointers (type-casted to AST_Node of course) and also the count of these declarations (declaration_count). So, this structure is defined as following:
typedef struct AST_Node_Declarations{
    enum Node_Type type; // node type
// declarations struct AST_Node **declarations; int declaration_count; }AST_Node_Declarations;

    The creation function of this type will take the previous declarations and declaration_count of another "declarations"-node (or NULL and 0 for a single "declaration") that stores the declarations that occurred up to this points and the newly occurred "declaration" that has to be inserted. So, the two files of the AST structure now contain:
ast.h:
...
AST_Node *new_declarations_node(AST_Node **declarations, int declaration_count, AST_Node *declaration);
...
--------------------------------------------------------------------------------------------------------
ast.c:
...
AST_Node *new_declarations_node(AST_Node **declarations, int declaration_count, AST_Node *declaration){ // allocate memory AST_Node_Declarations *v = malloc (sizeof (AST_Node_Declarations));
// set node type v->type = DECLARATIONS;
// first declaration if(declarations == NULL){ declarations = (AST_Node**) malloc (sizeof (AST_Node*)); declarations[0] = declaration; declaration_count = 1; } // add new declaration else{ declarations = (AST_Node**) realloc (declarations, (declaration_count + 1) * sizeof (AST_Node*)); declarations[declaration_count] = declaration; declaration_count++; }
// set entries v->declarations = declarations; v->declaration_count = declaration_count;
// return type-casted result return (struct AST_Node *) v; }
...

AST Node of Function Declarations

    Similarly, the structure that stores multiple function declarations looks as following:
typedef struct AST_Node_Func_Declarations{
    enum Node_Type type; // node type
// declarations struct AST_Node **func_declarations; int func_declaration_count; }AST_Node_Func_Declarations;

    The code with which we create such a node is also similar! Just add the newly found function declaration to the previously found function declarations of the "function declarations" node, by also incrementing the count. So, the code looks as following:
ast.h:
...
AST_Node *new_func_declarations_node(AST_Node **func_declarations, int func_declaration_count, AST_Node *func_declaration);
...
----------------------------------------------------------------------------------------------------------------------------
ast.c:
...
AST_Node *new_func_declarations_node(AST_Node **func_declarations, int func_declaration_count, AST_Node *func_declaration){ // allocate memory AST_Node_Func_Declarations *v = malloc (sizeof (AST_Node_Func_Declarations));
// set node type v->type = FUNC_DECLS;
// first declaration if(func_declarations == NULL){ func_declarations = (AST_Node**) malloc (sizeof (AST_Node*)); func_declarations[0] = func_declaration; func_declaration_count = 1; } // add new declaration else{ func_declarations = (AST_Node**) realloc (func_declarations, (func_declaration_count + 1) * sizeof (AST_Node*)); func_declarations[func_declaration_count] = func_declaration; func_declaration_count++; }
// set entries v->func_declarations = func_declarations; v->func_declaration_count = func_declaration_count;
// return type-casted result return (struct AST_Node *) v; }
...

AST Node of a Function Declaration

    The already defined function declaration node needs to be extended to be able to store if the return type is a pointer or not, and should also be able to include the declarations, statements and return_node AST nodes of the function tail. So, in the end the structure looks as following:
typedef struct AST_Node_Func_Decl{
    enum Node_Type type; // node type
// return type int ret_type;
// is pointer or not int pointer; // 0: not pointer, 1: pointer
// symbol table entry list_t *entry;
// declarations, statements and return struct AST_Node *declarations; struct AST_Node *statements; struct AST_Node *return_node; }AST_Node_Func_Decl;

    The creation function will only contain the first 3 entries, as the others will be managed later on! There's no need of any complicated behavior. We just the entries. So, the node creation code of this is:
ast.h:
...
AST_Node *new_ast_func_decl_node(int ret_type, int pointer, list_t *entry);
...
----------------------------------------------------------------------------
ast.c:
...
AST_Node *new_ast_func_decl_node(int ret_type, int pointer, list_t *entry){ // allocate memory AST_Node_Func_Decl *v = malloc (sizeof (AST_Node_Func_Decl));
// set entries v->type = FUNC_DECL; v->ret_type = ret_type; v->pointer = pointer; v->entry = entry;
// return type-casted result return (struct AST_Node *) v; }
...

AST Node of the Return Type

    As already told before a return type node has to store the actual return data type and if it's a pointer or not. So, the structure becomes:
typedef struct AST_Node_Ret_Type{
    enum Node_Type type; // node type
// return type int ret_type;
// is pointer or not int pointer; // 0: not pointer, 1: pointer }AST_Node_Ret_Type;

The creation function is also fairly simple and looks as following:
ast.h:
...
AST_Node *new_ast_ret_type_node(int ret_type, int pointer);
...
----------------------------------------------------------------
ast.c:
...
AST_Node *new_ast_ret_type_node(int ret_type, int pointer){ // allocate memory AST_Node_Ret_Type *v = malloc (sizeof (AST_Node_Ret_Type));
// set entries v->type = RET_TYPE; v->ret_type = ret_type; v->pointer = pointer;
// return type-casted result return (struct AST_Node *) v; }
...

AST Node for Function Parameter Declarations

    Function parameter declarations are again using the same concept as before with declarations. So, a function declaration parameters node has to store a dynamic array of parameters, by also storing the count of elements in it (num_of_pars). The structure is:
typedef struct AST_Node_Decl_Params{
    enum Node_Type type; // node type
// parameters Param *parameters; int num_of_pars; }AST_Node_Decl_Params;

    The code with which we create such a node is also similar to declarations and function declarations, taking the parameters array and parameter count from previous function declarations parameters rules and just adding the newly found parameter to this array. So, the code looks as following:
ast.h:
...
AST_Node *new_ast_decl_params_node(Param *parameters, int num_of_pars, Param param);
...
-----------------------------------------------------------------------------------
ast.c:
...
AST_Node *new_ast_decl_params_node(Param *parameters, int num_of_pars, Param param){ // allocate memory AST_Node_Decl_Params *v = malloc (sizeof (AST_Node_Decl_Params));
// set node type v->type = DECL_PARAMS;
// first declaration if(parameters == NULL){ parameters = (Param*) malloc (sizeof (Param)); parameters[0] = param; num_of_pars = 1; } // add new declaration else{ parameters = (Param*) realloc (parameters, (num_of_pars + 1) * sizeof (Param)); parameters[num_of_pars] = param; num_of_pars++; }
// set entries v->parameters = parameters; v->num_of_pars = num_of_pars;
// return type-casted result return (struct AST_Node *) v; }
...

Non-terminal type definitions

    Adding all these new node types and functions, and knowing which non-terminals will be used in our rules later on, we can also define the following new non-terminal types:

%type <node> functions_optional functions function
%type <node> parameters_optional parameters
%type <par>  parameter
%type <node> return_type

Action Rules for Declarations

    So, before finishing of today's part, let's also get into the action rules for the declarations. By already creating the declaration in previous articles and talking about the creation function today, we can easily use this function as following:

declarations: 
  declarations declaration
  {
    AST_Node_Declarations *temp = (AST_Node_Declarations*) $1;
    $$ = new_declarations_node(temp->declarations, temp->declaration_count, $2);
  }
  | declaration
  {
    $$ = new_declarations_node(NULL, 0, $1);
  }
;

    Pretty simple right? We are just type-casting back to the correct type to be able to access the array and counter, so that we can pass them as parameters of the function, along the declaration that's "to be added". In the second sub-rule we put NULL and 0, as there's no other declaration yet!

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series


Final words | Next up on the project

     And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
Next up on this series are:
  • Semantic analysis (using more action rules in Bison)
  • Intermediate Code generation (using the AST structure for more cases)
  • Machine Code generation (MIPS Assembly)
     Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:

https://github.com/drifter1

Keep on drifting! ;)

H2
H3
H4
3 columns
2 columns
1 column
4 Comments
Ecency