Enterprise JavaScript
Interrogate legacy code with ASTs
Warren Bickley, Technical Consultant
12 December 2023 · 4 minute read
Code accumulates over time, and enterprise products amass a considerable volume of it. As a result, we frequently find ourselves relying on code that was written a long time ago, or in languages that no longer align with our strategic goals. As these technologies fall out of favour, the pool of engineers proficient in them shrinks, and bringing in new engineers to work on such code becomes progressively harder. Consequently, product teams often find themselves limited to minor fixes, fearing the unintended consequences of altering unseen dependencies.
Eventually, a big rebuild project is undertaken, which often discards much of the value held in this legacy code.
What if you could parse and walk over this code to extract that value and find the answers you’re looking for, but in your modern language of choice? What if you could provide tools which enable anyone in your team to do this easily?
Parsing code into an AST is one of the ways Griffiths Waite demystify legacy transformation projects, and it's part of the reason we are adept at these kinds of enterprise projects.
What is an AST?
[Image: AST as represented by Antlr]
Abstract Syntax Trees (ASTs) are hierarchical data structures used to represent the structure of a program. They abstract away specific syntax details and focus on the underlying code structure. ASTs are used in various areas of software development such as compilation, code analysis, optimisation, and transformation. They are a fundamental component in compilers, interpreters, and even syntax highlighters.
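To make this concrete, here is a minimal sketch of an AST for a tiny arithmetic language. The node types (`Num`, `BinOp`) and helper names are invented for illustration and do not come from any particular parser.

```typescript
// Hypothetical node types for a tiny arithmetic language.
type Expr =
  | { kind: "Num"; value: number }
  | { kind: "BinOp"; op: "+" | "*"; left: Expr; right: Expr };

// The source texts "1+2*3" and "1 + (2 * 3)" both reduce to this same tree:
// whitespace and redundant parentheses are syntax details the AST drops.
const tree: Expr = {
  kind: "BinOp",
  op: "+",
  left: { kind: "Num", value: 1 },
  right: {
    kind: "BinOp",
    op: "*",
    left: { kind: "Num", value: 2 },
    right: { kind: "Num", value: 3 },
  },
};

// Analysis: walk the tree and count how many nodes of each kind it contains.
function countKinds(
  node: Expr,
  counts: Record<string, number> = {}
): Record<string, number> {
  counts[node.kind] = (counts[node.kind] ?? 0) + 1;
  if (node.kind === "BinOp") {
    countKinds(node.left, counts);
    countKinds(node.right, counts);
  }
  return counts;
}

// Interpretation: the same walk, but computing a value instead of statistics.
function evaluate(node: Expr): number {
  if (node.kind === "Num") return node.value;
  const left = evaluate(node.left);
  const right = evaluate(node.right);
  return node.op === "+" ? left + right : left * right;
}

console.log(countKinds(tree)); // { BinOp: 2, Num: 3 }
console.log(evaluate(tree)); // 7
```

Both functions are the same recursive walk with a different payload, which is why one tree can serve compilers, analysers and documentation tools alike.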
Producing an AST can be trivial. For example, TypeScript ASTs are easily produced with the TypeScript package available on npm, and Python ships with an ast module which does the same for Python. You will probably find that most languages have their own AST parser, great!
But what if you want to produce a Python AST in TypeScript?…
Antlr4
Enter Antlr4, “ANother Tool for Language Recognition”. Antlr4 is special because it is not quite a parser, but instead a parser generator. You give it a grammar, which is essentially the set of rules for a language, and it generates a parser in a language of your choosing (the target).
For example, you could parse Fortran with PHP code if you wanted, Java in Go, or Swift in TypeScript. Whilst the choice of targets is limited, the number of grammars is endless.
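To give a feel for how grammar rules map onto parser code, here is a small hand-written parser. It is not Antlr4 output and does not use the Antlr4 runtime; the grammar in the comments is an Antlr-like notation made up for this example, and every function name is hypothetical. A generated parser does much the same job, with one method per grammar rule, only for a far larger grammar.

```typescript
// Grammar (Antlr-like notation, for illustration only):
//   expr : term ('+' term)* ;
//   term : NUM ('*' NUM)* ;
//   NUM  : [0-9]+ ;

type Ast =
  | { type: "num"; value: number }
  | { type: "add" | "mul"; left: Ast; right: Ast };

// Lexer: pick out number and operator tokens. Whitespace and any other
// characters are silently skipped in this sketch.
function tokenize(src: string): string[] {
  return src.match(/\d+|[+*]/g) ?? [];
}

function parseExpr(src: string): Ast {
  const tokens = tokenize(src);
  let pos = 0;

  // NUM
  function num(): Ast {
    const tok = tokens[pos];
    if (tok === undefined || !/^\d+$/.test(tok)) {
      throw new Error(`Expected a number at token ${pos}`);
    }
    pos++;
    return { type: "num", value: Number(tok) };
  }

  // term : NUM ('*' NUM)*
  function term(): Ast {
    let node = num();
    while (tokens[pos] === "*") {
      pos++;
      node = { type: "mul", left: node, right: num() };
    }
    return node;
  }

  // expr : term ('+' term)*
  function expr(): Ast {
    let node = term();
    while (tokens[pos] === "+") {
      pos++;
      node = { type: "add", left: node, right: term() };
    }
    return node;
  }

  const ast = expr();
  if (pos !== tokens.length) {
    throw new Error(`Unexpected token ${tokens[pos]}`);
  }
  return ast;
}

console.log(JSON.stringify(parseExpr("1 + 2 * 3")));
```

Because `term` is called from `expr`, multiplication binds tighter than addition, so the operator precedence lives entirely in the shape of the rules.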
Getting started with Antlr4 is not always straightforward: depending on the grammar and target, there can be extra steps or fine-tuning required for the generated code to work correctly. The community on GitHub is helpful and welcoming, however, so it won’t take too long to work through any difficulties!
Why should you parse your legacy codebase?
Digging into old, unfamiliar code might not sound like the most exciting task, but it's worth it! Parsing your legacy codebase can reveal a wealth of insights and opportunities. It helps you understand the nitty-gritty details, spot potential risks, and even pave the way for modernisation. Here are five clear reasons why we think parsing your legacy codebase is a smart move:
- Demystify: Parsing your legacy codebase allows you to gain a deeper understanding of its structure, dependencies, and overall functionality. This understanding is crucial for making informed decisions and planning future changes.
- Identify Risks: By parsing the code, you can identify potential risks and pitfalls that may arise during refactoring or modernisation efforts. This helps you mitigate those risks and ensures a smoother transition.
- Enable Modernisation: Parsing the codebase enables you to modernise and refactor it using your preferred modern language. This opens up possibilities for leveraging modern frameworks, libraries, and coding practices to improve maintainability and performance.
- Enhance Collaboration: When anyone in your team can parse the legacy code, it promotes collaboration and knowledge sharing. It empowers team members to contribute to the codebase and make informed decisions based on their findings.
- Facilitate Transformation Projects: Parsing the legacy codebase is a crucial step in transformation projects. It helps demystify the complexities of the code, provides insights for planning, and ensures a successful migration to a more modern and maintainable system.
How do Griffiths Waite leverage parsers and ASTs to add value for their clients?
Griffiths Waite believe in leveraging cutting-edge tools and techniques to drive value for our clients. With tools such as parsers and ASTs, we are able to redefine how code is understood, managed, and improved in a manner tailored to a particular problem, challenge, or codebase. Some examples of how we leverage these tools are:
- Documentation: Automatically generate documentation from particular nodes within the code, providing a comprehensive understanding of the codebase's structure and functionality at whatever level is required, e.g. integration or system. Coupling this with short-node traversal and LLMs allows us to deduce what the code does.
- Migration Compatibility: Analyse and enforce compatibility rules by producing minimum requirements for new technology or by automatically migrating test suites, enabling smoother technology changes where beneficial or required.
- Type-Safe Integrations: Automatically produce type-safe integrations for your modern solutions, creating a contractual link between them and the legacy system. When integrated into CI processes, this helps to ensure conformity across your technologies.
- Dead Code Elimination: Deduce unused functions, files, tables, columns, etc. for removal, reducing overall technical debt.
- Refactoring Support: Identify areas of code that can be improved or optimised by interrogating common code smells or bad patterns. This helps in enhancing code quality and maintainability.
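As a rough illustration of the dead code elimination idea, the sketch below assumes a parser pass has already collected each declared routine along with the names it calls. The `RoutineDecl` shape, the function name and the sample routines are all hypothetical. Anything that cannot be reached from a known entry point is reported as a candidate for removal.

```typescript
// Hypothetical summary of what a tree walk might collect from parsed code:
// every declared routine and the routine names it calls.
interface RoutineDecl {
  name: string;
  calls: string[];
}

// Returns the routines that cannot be reached from any entry point,
// in declaration order.
function findUnusedRoutines(
  decls: RoutineDecl[],
  entryPoints: string[]
): string[] {
  const byName = new Map<string, RoutineDecl>();
  for (const decl of decls) byName.set(decl.name, decl);

  // Depth-first traversal of the call graph starting at the entry points.
  const reachable = new Set<string>();
  const stack = [...entryPoints];
  while (stack.length > 0) {
    const name = stack.pop() as string;
    if (reachable.has(name)) continue;
    reachable.add(name);
    const decl = byName.get(name);
    if (decl) stack.push(...decl.calls);
  }

  return decls.map((d) => d.name).filter((n) => !reachable.has(n));
}

// Made-up sample data for the example.
const routines: RoutineDecl[] = [
  { name: "calc_invoice", calls: ["apply_tax", "round_total"] },
  { name: "apply_tax", calls: [] },
  { name: "round_total", calls: [] },
  { name: "legacy_export", calls: ["format_csv"] },
  { name: "format_csv", calls: [] },
];

console.log(findUnusedRoutines(routines, ["calc_invoice"]));
// [ 'legacy_export', 'format_csv' ]
```

Note that `format_csv` is called, but only from code that is itself unreachable, so it is flagged too. Calls made dynamically or from outside the parsed codebase would not show up in this kind of analysis and need checking separately.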
How we empower teams to autonomously parse legacy code
Our team's obsession with automation extends to running automations on code, whether that code was written by us or not. This approach makes it easier for us to demystify and improve legacy codebases, which we believe is key to both remaining competitive and long-term success. We continuously work to develop and share any tooling which works toward these goals.
An example of this would be our PL/SQL AST Viewer, which parses PL/SQL code within the browser and returns a visual representation of the AST alongside highlighted and linked code. It is built using an open-source PL/SQL parser, which is an Antlr4 grammar generated and packaged for TypeScript.
The purpose of the viewer is to make interrogation via the parser incredibly fast. You can see how each token within the code is evaluated and then start to produce listeners based on that much more quickly than without the viewer. The viewer is also a great way to introduce new members of the team to AST structure and how the underlying parser works.
Once we have that understanding of the AST which makes up our code, we can write our own automations, such as a TypeScript type generator.
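A minimal sketch of what such a type generator could look like is shown here. It is not Griffiths Waite's implementation and does not use the Antlr4 runtime: the tree shape, the rule names (`recordType`, `fieldName`, `dataType`) and the PL/SQL to TypeScript type mapping are all assumptions made for illustration. The listener reacts to nodes as a walker enters and exits them, which mirrors the listener style mentioned above.

```typescript
// Hypothetical parse-tree shape. A real generated parser exposes context
// classes whose rule names are defined by the grammar.
interface TreeNode {
  rule: string;
  text?: string;
  children?: TreeNode[];
}

interface TreeListener {
  enter(node: TreeNode): void;
  exit(node: TreeNode): void;
}

// Depth-first walk that notifies the listener on the way in and out.
function walkTree(node: TreeNode, listener: TreeListener): void {
  listener.enter(node);
  for (const child of node.children ?? []) {
    walkTree(child, listener);
  }
  listener.exit(node);
}

// Assumed mapping from PL/SQL scalar types to TypeScript types.
const SQL_TO_TS: Record<string, string> = {
  NUMBER: "number",
  VARCHAR2: "string",
  DATE: "Date",
};

function toPascalCase(name: string): string {
  return name
    .split("_")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1).toLowerCase())
    .join("");
}

class InterfaceGenerator implements TreeListener {
  private lines: string[] = [];
  private pendingField = "";

  enter(node: TreeNode): void {
    const text = node.text ?? "";
    if (node.rule === "recordType") {
      this.lines.push(`export interface ${toPascalCase(text)} {`);
    } else if (node.rule === "fieldName") {
      this.pendingField = text.toLowerCase();
    } else if (node.rule === "dataType") {
      // Drop any length or precision, e.g. VARCHAR2(100) becomes VARCHAR2.
      const base = text.replace(/\(.*$/, "").toUpperCase();
      const tsType = SQL_TO_TS[base] ?? "unknown";
      this.lines.push(`  ${this.pendingField}: ${tsType};`);
    }
  }

  exit(node: TreeNode): void {
    if (node.rule === "recordType") this.lines.push("}");
  }

  result(): string {
    return this.lines.join("\n");
  }
}

function generateInterface(tree: TreeNode): string {
  const generator = new InterfaceGenerator();
  walkTree(tree, generator);
  return generator.result();
}

// Hand-built tree standing in for the parse of:
//   TYPE customer_rec IS RECORD (id NUMBER, name VARCHAR2(100), created DATE);
const field = (name: string, dataType: string): TreeNode => ({
  rule: "field",
  children: [
    { rule: "fieldName", text: name },
    { rule: "dataType", text: dataType },
  ],
});

const customerRec: TreeNode = {
  rule: "recordType",
  text: "customer_rec",
  children: [
    field("id", "NUMBER"),
    field("name", "VARCHAR2(100)"),
    field("created", "DATE"),
  ],
};

console.log(generateInterface(customerRec));
```

Running this prints an `export interface CustomerRec` with `id: number`, `name: string` and `created: Date` fields. Types missing from the mapping fall back to `unknown`, so gaps are visible in the generated code rather than silently guessed.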
Parsing legacy code is the future of technology transformation
In the ever-evolving landscape of software development, parsing legacy code is going to become increasingly commonplace. Having this capability is a superpower for extracting value from existing long-standing systems and logic, whilst enabling technology modernisation by effectively reducing risk.