Transforming Terraform Variable Types into JSON Schema

Introduction

Several months ago, I set out to investigate how Terraform templates can be parsed, with a particular focus on parsing Terraform variable types.

My goal was to build a tool that could parse any given Terraform template, including its variable types, and produce output that a frontend app could use to generate forms dynamically.

Challenges

Several challenges arose as soon as I started the project. First, I had to find a way to generate frontend forms dynamically. I settled on jsonform, a frontend library that generates forms from a JSON schema.

Now that I knew the frontend required a JSON schema to function properly, the second challenge was to create a solution to convert Terraform templates to JSON schema. In my case, I only needed to parse the variables.tf file.

While there are numerous tools that parse Terraform templates, the ones I tried leave variable type expressions unparsed and return them as plain strings.

To illustrate the pain point I faced, here's the output of a variable block parsed using python-hcl2:

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import hcl2

In [2]: input = """
   ...: variable "project_metadata" {
   ...:     description = "Project metadata"
   ...:     type = object({
   ...:         name = string,
   ...:         id = string,
   ...:         tags = list(string)
   ...:     })
   ...: 
   ...:     validation {
   ...:         condition = substr(var.project_metadata.name, 0, 2) == "p-"
   ...:         error_message = "Project name must be prefixed with 'p-'"
   ...:     }
   ...: }"""

In [3]: output = hcl2.loads(input)

In [4]: output
Out[4]: 
{'variable': [{'project_metadata': {'description': 'Project metadata',
    'type': "${object({'name': '${string}', 'id': '${string}', 'tags': '${list(string)}'})}",
    'validation': [{'condition': '${substr(var.project_metadata.name, 0, 2) == "p-"}',
      'error_message': "Project name must be prefixed with 'p-'"}]}}]}

In [5]: output['variable'][0]["project_metadata"]["type"]
Out[5]: "${object({'name': '${string}', 'id': '${string}', 'tags': '${list(string)}'})}"    # Remains a string

As you can see in the last output, the variable type remains a string. Consequently, I had to find a way to further process the output and convert it to a JSON schema.
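
Before a string like this can be handed to a parser, the "${...}" interpolation wrappers and the quotes that python-hcl2 adds have to be removed. The post does not show how this was done, so the following is only a sketch of one possible approach. normalize_type is a hypothetical helper, and replacing commas with spaces is an assumption made because the grammar in the next section has no comma terminal.

```python
def normalize_type(expr: str) -> str:
    """Strip "${...}" wrappers, quotes, and commas from a python-hcl2 type string.

    Hypothetical helper: an illustrative assumption, not the project's code.
    """
    out = []
    # One entry per open brace: True if it came from "${", False for a plain "{".
    wrappers = []
    i = 0
    while i < len(expr):
        if expr.startswith("${", i):
            wrappers.append(True)
            i += 2
            continue
        ch = expr[i]
        if ch == "{":
            wrappers.append(False)
            out.append(ch)
        elif ch == "}":
            # Drop the brace only when it closes a "${" wrapper.
            if not (wrappers and wrappers.pop()):
                out.append(ch)
        elif ch == ",":
            out.append(" ")  # assumption: entries are separated by whitespace
        elif ch not in "'\"":
            out.append(ch)
        i += 1
    # Collapse the runs of whitespace left behind by the removals.
    return " ".join("".join(out).split())


raw = "${object({'name': '${string}', 'id': '${string}', 'tags': '${list(string)}'})}"
print(normalize_type(raw))
# object({name: string id: string tags: list(string)})
```

The brace stack is what keeps the closing "})" of the object intact while the closing brace of each "${...}" wrapper is discarded.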

Parsing Variable Type

Despite the small setback, I was excited by the challenge. It reminded me of subjects I had studied back in school on the formal specification of programming languages, and of a book on the same subject that I had read not long before. My first idea was to write a custom parser.

To build a custom parser, I first had to define the grammar of the language. A quick search brought me to Lark (published on PyPI as lark, formerly lark-parser), a parsing library for Python. Its getting-started guide is well written and easy to follow.

The fun part was defining the grammar. Lark's grammar syntax is based on EBNF, with several enhancements that make grammars easier to write.

Fortunately, the syntax of Terraform variable types is relatively simple. After several iterations, I came up with the following grammar, which parses all the complex variable types I had:

from lark import Lark

# Parser for Terraform variable types; whitespace between tokens is ignored.
type_parser = Lark(r"""
    // A type is either a primitive keyword or a (possibly nested) complex type.
    ?type: "any" -> any
        | "string" -> string
        | "number" -> number
        | "bool" -> bool
        | "object({" [keyval (keyval)*] "})" -> object
        | "list(" [type] ")"  -> list
        | "set(" [type] ")" -> set
        | "map(" [type] ")" -> map
        | "tuple(" [type (type)* ] ")" -> tuple

    // An object attribute: name, separator (= or :), type, optional trailing comment.
    keyval: CNAME keyval_separator type [comment]
    ?keyval_separator: "=" | ":"
    ?comment: SH_COMMENT

    %import common.SH_COMMENT
    %import common.CNAME
    %import common.WS
    %ignore WS
    """, start='type')

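To make the structure of the grammar concrete without depending on Lark, the same rules can be approximated with a small hand-written recursive-descent parser. This is an illustrative stand-in, not how Lark works internally and not the parser used in the project. The names tokenize, parse_type, and parse are hypothetical, the (name, children) tuples only mimic the shape of a parse tree, and comments are skipped everywhere rather than only after an attribute.

```python
import re

# One token per match: multi-character openers and closers first, then identifiers.
TOKEN_RE = re.compile(
    r"\s*(object\(\{|\}\)|list\(|set\(|map\(|tuple\(|\)|[=:]|#[^\n]*|[A-Za-z_]\w*)"
)
PRIMITIVES = {"any", "string", "number", "bool"}
OPENERS = {"list(": "list", "set(": "set", "map(": "map", "tuple(": "tuple"}


def tokenize(text):
    """Split a type expression into tokens, dropping whitespace and # comments."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        if not match.group(1).startswith("#"):
            tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_type(tokens):
    """Consume one type from the front of tokens and return (name, children)."""
    tok = tokens.pop(0)
    if tok in PRIMITIVES:
        return (tok, [])
    if tok == "object({":
        fields = []
        while tokens[0] != "})":
            key, sep = tokens.pop(0), tokens.pop(0)
            if not key.isidentifier() or sep not in ("=", ":"):
                raise SyntaxError(f"bad attribute near {key!r}")
            fields.append(("keyval", [key, parse_type(tokens)]))
        tokens.pop(0)  # consume "})"
        return ("object", fields)
    if tok in OPENERS:
        children = []
        while tokens[0] != ")":
            children.append(parse_type(tokens))
        tokens.pop(0)  # consume ")"
        if tok != "tuple(" and len(children) > 1:
            raise SyntaxError(f"{OPENERS[tok]} takes at most one element type")
        return (OPENERS[tok], children)
    raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    """Parse a whole type expression, rejecting truncated or trailing input."""
    tokens = tokenize(text)
    try:
        tree = parse_type(tokens)
    except IndexError:
        raise SyntaxError("unexpected end of input") from None
    if tokens:
        raise SyntaxError(f"unexpected trailing tokens: {tokens}")
    return tree


print(parse("object({name = string id = string tags = list(string)})"))
```

Each branch of parse_type corresponds to one alternative of the type rule, and the nested call inside the object branch plays the role of the keyval rule.
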
Converting Parsed Type To JSON Schema

Defining a grammar for variable types was only the first part. With the grammar in place, variable types are parsed into a parse tree. A further step was needed to convert that parse tree into a JSON schema.

With Lark, we can define transformers that convert a parse tree into another form, including back into text. My first experiment was to convert the parse tree back into its original form, and it worked well. The next step was to convert it into a JSON schema, which first required mapping Terraform variable types to JSON schema types. Primitive types such as string, number, and bool are straightforward, as they have direct counterparts in JSON schema. Complex types such as map and object required more thought.

Eventually, I settled on the following mapping, with the Terraform variable type on the left and its corresponding JSON schema type on the right:

string -> string
number -> number
bool -> boolean
list/tuple/set -> array (with items)
map -> object (with additionalProperties)
object -> object (with properties)
any -> string
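
The mapping above can be sketched as a recursive function. To keep the example free of dependencies, it works on plain (name, children) tuples that stand in for parse tree nodes, rather than on Lark's own tree objects. The function name to_json_schema and the tuple layout are assumptions made for illustration, as is the handling of a tuple with several element types, which the post does not specify.

```python
import json

# Terraform primitive -> JSON schema type, following the mapping above.
PRIMITIVE_TYPES = {
    "string": "string",
    "number": "number",
    "bool": "boolean",
    "any": "string",
}


def to_json_schema(node):
    """Convert a (name, children) node into a JSON schema fragment."""
    name, children = node
    if name in PRIMITIVE_TYPES:
        return {"type": PRIMITIVE_TYPES[name]}
    if name in ("list", "set", "tuple"):
        schema = {"type": "array"}
        if len(children) == 1:
            schema["items"] = to_json_schema(children[0])
        elif children:
            # Assumption: one schema per position, a form allowed by
            # JSON Schema draft-07 for the "items" keyword.
            schema["items"] = [to_json_schema(child) for child in children]
        return schema
    if name == "map":
        schema = {"type": "object"}
        if children:
            schema["additionalProperties"] = to_json_schema(children[0])
        return schema
    if name == "object":
        # Each child is ("keyval", [key, type_node]).
        properties = {key: to_json_schema(value) for _, (key, value) in children}
        return {"type": "object", "properties": properties}
    raise ValueError(f"unknown type node: {name!r}")


tree = ("object", [
    ("keyval", ["name", ("string", [])]),
    ("keyval", ["id", ("string", [])]),
    ("keyval", ["tags", ("list", [("string", [])])]),
])
print(json.dumps(to_json_schema(tree), indent=2))
```

With Lark, the same logic would more naturally live in a Transformer subclass with one method per rule name. The recursion here does by hand what a transformer does when it visits the tree bottom-up.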

Results

Let's revisit the Terraform variable block that I was trying to parse:

variable "project_metadata" {
    description = "Project metadata"
    type = object({
        name = string,
        id = string,
        tags = list(string)
    })

    validation {
        condition = substr(var.project_metadata.name, 0, 2) == "p-"
        error_message = "Project name must be prefixed with 'p-'"
    }
}

With the custom parser and transformer I developed, I was able to reliably generate the JSON schema required by the frontend to dynamically create forms.

{
  "title": "",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "project_metadata": {
      "description": "Project metadata",
      "validation": [
        {
          "condition": "${substr(var.project_metadata.name, 0, 2) == \"p-\"}",
          "error_message": "Project name must be prefixed with 'p-'"
        }
      ],
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "id": {
          "type": "string"
        },
        "tags": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      }
    }
  }
}
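
Judging from the shape of this output, each variable's other attributes (description, validation) are carried over as-is and the schema derived from its type is merged in next to them. The following sketch shows one way such a document could be assembled. It is inferred from the output alone and is not the project's code: build_form_schema and its type_schemas argument (a mapping from variable name to an already converted type schema) are hypothetical.

```python
import json


def build_form_schema(hcl_output, type_schemas):
    """Assemble a top-level JSON schema from python-hcl2 style output.

    Hypothetical helper; type_schemas maps each variable name to the
    JSON schema fragment already derived from its Terraform type.
    """
    properties = {}
    for block in hcl_output.get("variable", []):
        for var_name, attrs in block.items():
            # Carry over everything except the raw type string...
            prop = {key: value for key, value in attrs.items() if key != "type"}
            # ...then merge in the schema derived from that type.
            prop.update(type_schemas[var_name])
            properties[var_name] = prop
    return {
        "title": "",
        "type": "object",
        "additionalProperties": False,
        "properties": properties,
    }


hcl_output = {"variable": [{"project_metadata": {
    "description": "Project metadata",
    "type": "${object({'name': '${string}', 'id': '${string}', 'tags': '${list(string)}'})}",
    "validation": [{
        "condition": '${substr(var.project_metadata.name, 0, 2) == "p-"}',
        "error_message": "Project name must be prefixed with 'p-'",
    }],
}}]}
type_schemas = {"project_metadata": {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "id": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}}
print(json.dumps(build_form_schema(hcl_output, type_schemas), indent=2))
```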

The parser was eventually packaged as a Python library and distributed internally. Backend APIs were also built to interact with the frontend.

In conclusion, developing this custom parser and transformer has been a personally rewarding journey. If you have any feedback, questions, or thoughts, please feel free to reach out. Thank you for making it this far! =)

Melvin Koh
