Part 1: Create a web editor with syntax colorization.

Have you ever wondered how web editors like Visual Studio (Online), CodeSandbox or Snack work? Or wanted to make a custom web or desktop editor but didn’t know where to start?
In this article I’m going to explain how web editors work, and we’ll create one for a custom language.

The language we are going to build the editor for is simple: it declares a list of TODOs and then applies some predefined instructions to them. I’ll call this language TodoLang. Here are some examples of those instructions:

ADD TODO "Make the world a better place"
ADD TODO "read daily"
ADD TODO "Exercise"
COMPLETE TODO "Learn & share"

We simply add some TODOs using:

ADD TODO "TODO_TEXT"

...or complete a TODO using COMPLETE TODO “todo_text”, so the output of interpreting this code may tell us about the remaining TODOs and the ones we have completed so far. This is a simple language that I invented for the purpose of this blog post. It may seem useless, but it has everything I need to cover in this article.
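To give an idea of what "interpreting" TodoLang could mean, here is a minimal sketch in TypeScript. It assumes the code has already been parsed into a list of instructions; the `Instruction` type and the `interpret` function are names made up for this illustration and are not part of the project.

```typescript
// Hypothetical shape of a parsed TodoLang instruction (an assumption for this sketch).
type Instruction = { action: "ADD" | "COMPLETE"; text: string };

// Walks the instructions in order and splits the TODOs into remaining and done.
function interpret(instructions: Instruction[]): { remaining: string[]; done: string[] } {
    const remaining: string[] = [];
    const done: string[] = [];
    for (const { action, text } of instructions) {
        const index = remaining.indexOf(text);
        if (action === "ADD" && index === -1 && done.indexOf(text) === -1) {
            remaining.push(text);
        } else if (action === "COMPLETE" && index !== -1) {
            remaining.splice(index, 1);
            done.push(text);
        }
        // Anything else is silently ignored here; reporting it is left to validation.
    }
    return { remaining, done };
}
```

Calling `interpret` with two ADD instructions followed by a COMPLETE for one of them would leave one TODO in `remaining` and one in `done`.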
We are going to make the editor support the following features:

  • Auto formatting
  • Auto completion
  • Syntax highlighting
  • Syntax and semantic validation
The editor will only support editing a single file (or piece of code) at a time; it will not support editing multiple files.

TodoLang semantic rules

Here are some semantics I’ll be using for semantic validation of TodoLang code:

  • If a TODO is completed using the COMPLETE TODO instruction, we cannot apply any other instruction to it.
  • The COMPLETE instruction should not be applied to a TODO that has not been declared using ADD TODO.
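As a rough illustration of these two rules, here is a sketch of a checker in TypeScript. It is not the validation code built later in this series; the `Instruction` type, the `validate` function and the error wording are all assumptions made for the example.

```typescript
// Hypothetical shape of a parsed instruction, with the line it came from.
type Instruction = { action: "ADD" | "COMPLETE"; text: string; line: number };

// Returns one error message per instruction that breaks a semantic rule.
function validate(instructions: Instruction[]): string[] {
    const declared = new Set<string>();
    const completed = new Set<string>();
    const errors: string[] = [];
    for (const { action, text, line } of instructions) {
        if (completed.has(text)) {
            // Rule 1: nothing can be applied to a TODO that is already completed.
            errors.push(`Line ${line}: "${text}" is already completed`);
        } else if (action === "ADD") {
            declared.add(text);
        } else if (!declared.has(text)) {
            // Rule 2: COMPLETE needs a TODO declared earlier with ADD TODO.
            errors.push(`Line ${line}: "${text}" has not been declared with ADD TODO`);
        } else {
            completed.add(text);
        }
    }
    return errors;
}
```

Run against the four example instructions from the introduction, this sketch would report the COMPLETE TODO "Learn & share" line, since that TODO is never added.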

I’ll get back to these semantic rules later in this article.

Before we dig deep into the code, let’s start first with a general architecture of a web editor or any editor in general.

App architecture

As we can see from the above schema, there are generally two threads in any editor: one that is responsible for UI work, such as waiting for the user to type some code or perform some action, and another that takes the changes the user made and does the heavy computation, which includes parsing the code and other compilation work.

For every change in the editor (this could be every character the user types, or only once the user has stopped typing for, say, 2 seconds), a message is sent to the language service worker to perform some action, and the worker responds with a message containing the results. For example, when the user types some code and wants to format it (by pressing Shift + Alt + F), the worker receives a message containing the action Format and the code to format. This should happen asynchronously to keep the user experience smooth.
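The request and response messages described above could be sketched like this. The message shapes and the `handleRequest` function are assumptions made for illustration, and the formatting and validation logic are deliberately naive placeholders rather than the real language service.

```typescript
// Hypothetical message shapes exchanged between the editor and the worker.
type WorkerRequest = { action: "format" | "validate"; code: string };
type WorkerResponse =
    | { action: "format"; code: string }
    | { action: "validate"; errors: string[] };

// Decides what to do with a request and builds the response message.
function handleRequest(request: WorkerRequest): WorkerResponse {
    const lines = request.code.split(/\r?\n/);
    if (request.action === "format") {
        // Placeholder formatter: only trims the whitespace around each line.
        return { action: "format", code: lines.map((line) => line.trim()).join("\n") };
    }
    // Placeholder validation: flags non-empty lines that do not start with a known keyword.
    const errors = lines
        .map((line, index) => ({ line: line.trim(), number: index + 1 }))
        .filter(({ line }) => line !== "" && !/^(ADD|COMPLETE)\b/.test(line))
        .map(({ number }) => `Line ${number}: unknown instruction`);
    return { action: "validate", errors };
}
```

Inside a real web worker, a function like this would typically be called from the `onmessage` handler and its result sent back with `postMessage`, which is what keeps the work off the UI thread.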

The language service, on the other hand, is responsible for parsing the code, generating the abstract syntax tree (AST), finding any syntax or lexical errors, using the AST to find any semantic errors, formatting the code, etc.

A more advanced way to implement the language service is to use the Language Server Protocol (LSP). In this example, however, the language service and the editor will live in the same process, which is the browser, without any back-end processing. If you want your language to be supported in other editors such as VS Code, Sublime Text or Eclipse without reinventing the wheel, it’s better to keep the language service separate from the editor. Implementing LSP will allow you to make plugins for other editors to support your language. Take a look at the LSP page to learn more.

The editor provides an interface which allows the user to type code and perform some actions. As the user types, the editor consults a list of configurations that describe how it should highlight the code tokens (keywords, types…). This could be done by the language service, but in our example we will do it in the editor. We will see how to do that later.

Communicating with web worker

Monaco provides an API, monaco.editor.createWebWorker, to create a proxy web worker using built-in ES6 Proxies. Use the getProxy method to get the proxy object (the language service). To access any service in the language service worker, we call its methods on this proxied object. All the methods return a Promise.

Check out Comlink, a tiny library developed by Google which makes working with web workers enjoyable using ES6 Proxies.
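To show the general idea behind a proxied worker, here is a small sketch that turns every method call on a proxy into a message. It is not how Monaco or Comlink are implemented; `createProxy`, `Message`, `Send` and the `LanguageService` interface are hypothetical names, and the `send` callback stands in for the real `postMessage` round trip.

```typescript
// A method call captured as data, ready to be posted to a worker.
type Message = { method: string; args: unknown[] };
// Stand-in for "post the message and wait for the worker's reply".
type Send = (message: Message) => Promise<unknown>;

// Builds an object on which any method call is forwarded through `send`.
function createProxy<T extends object>(send: Send): T {
    return new Proxy({} as T, {
        get(_target, property) {
            return (...args: unknown[]) => send({ method: String(property), args });
        },
    });
}

// Hypothetical language service surface, used only for this example.
interface LanguageService {
    format(code: string): Promise<string>;
}

const sent: Message[] = [];
const service = createProxy<LanguageService>((message) => {
    sent.push(message); // a real implementation would post this to the worker
    return Promise.resolve(`handled ${message.method}`);
});
const pending = service.format('ADD TODO "read daily"');
```

The caller only sees an ordinary object whose methods return promises, which is why every method on the proxied language service is asynchronous.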

Without further ado, let’s start writing some code.

What are we going to use?

For this project I’m going to use:

React: For UI.

ANTLR: (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It’s widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees. ANTLR supports a lot of target languages, which means it can generate a parser in Java, C#, and others. For this project I’m going to use ANTLR4TS, a Node.js version of ANTLR that can generate a lexer and a parser in TypeScript.

ANTLR uses a special syntax for declaring a language grammar. Grammars are typically placed in a *.g4 file, which allows you to define lexer and parser rules in a single combined grammar file. In this repository you will find grammar files for a lot of well-known languages.

This grammar syntax is based on a notation known as Backus-Naur form (BNF, also called Backus normal form), extended with regular-expression-style operators such as * and +, to describe the syntax of languages.

TodoLang Grammar:

Here is a simplified grammar of our TodoLang. It declares a root rule, todoExpressions, which holds the list of expressions. An expression in TodoLang can be either an addExpression or a completeExpression. The asterisk *, as in regular expressions, means that the expression may occur zero or more times. Note that, as written, this simplified root rule expects all addExpressions to come before the completeExpressions.

Each expression begins with terminal keywords (ADD or COMPLETE, followed by TODO), has a string (“…”) identifying the TODO, and ends with an end of line.

grammar TodoLangGrammar;

todoExpressions : (addExpression)* (completeExpression)*;

addExpression : ADD TODO STRING EOL;
completeExpression : COMPLETE TODO STRING EOL;

ADD : 'ADD';
TODO : 'TODO';
COMPLETE: 'COMPLETE';
STRING: '"' ~ ["]* '"';
EOL: [\r\n] +;
WS: [ \t] -> skip;
TodoLangGrammar.g4
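To make the lexer rules more concrete, here is a hand-written sketch of what they describe. This is not the code ANTLR generates: `Token`, `rules` and `tokenize` are names invented for the illustration, and the first-match-wins loop is a simplification of how a generated lexer picks a rule.

```typescript
type TokenType = "ADD" | "TODO" | "COMPLETE" | "STRING" | "EOL";
type Token = { type: TokenType; text: string };

// One regular expression per lexer rule, anchored at the start of the remaining input.
const rules: [TokenType | "WS", RegExp][] = [
    ["COMPLETE", /^COMPLETE/],
    ["ADD", /^ADD/],
    ["TODO", /^TODO/],
    ["STRING", /^"[^"]*"/],
    ["EOL", /^[\r\n]+/],
    ["WS", /^[ \t]/],
];

// Splits the input into tokens, dropping whitespace like the `-> skip` rule does.
function tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    let rest = input;
    while (rest.length > 0) {
        const rule = rules.find(([, pattern]) => pattern.test(rest));
        if (!rule) {
            throw new Error(`Unexpected character: ${rest[0]}`);
        }
        const [type, pattern] = rule;
        const text = rest.match(pattern)![0];
        if (type !== "WS") {
            tokens.push({ type, text });
        }
        rest = rest.slice(text.length);
    }
    return tokens;
}
```

The parser rules addExpression and completeExpression then describe which sequences of these tokens are valid, for example ADD TODO STRING EOL.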

Monaco-Editor: The Monaco Editor is the code editor that powers VS Code. It’s a JavaScript library which offers an API for syntax highlighting, auto-completion, etc.

Development tools:

Typescript, webpack, webpack-dev-server, webpack-cli, html-webpack-plugin, ts-loader

So let’s start by setting up the project.

Initiate a new TypeScript project:

npm init

Create a tsconfig.json file with this minimum content:

{
    "compilerOptions": {
        "target": "es6",
        "module": "commonjs",
        "allowJs": true,
        "jsx": "react"
    }
}
tsconfig.json

...and add a config file webpack.config.js for webpack:

const path = require('path');
const htmlWebpackPlugin = require('html-webpack-plugin');
module.exports = {
    mode: 'development',
    entry: {
        app: './src/index.tsx'
    },
    output: {
        filename: 'bundle.[hash].js',
        path: path.resolve(__dirname, 'dist')
    },
    resolve: {
        extensions: ['.ts', '.tsx', '.js', '.jsx']
    },
    module: {
        rules: [
            {
                test: /\.tsx?$/,
                loader: 'ts-loader'
            }
        ]
    },
    plugins: [
        new htmlWebpackPlugin({
            template: './src/index.html'
        })
    ]
}
webpack.config.js

...add dependencies for react and typescript:

npm add react react-dom
npm add -D typescript @types/react @types/react-dom ts-loader html-webpack-plugin webpack webpack-cli webpack-dev-server

Create a src directory with your entry point index.tsx (the file the webpack entry points to), and an index.html which contains a div with the id container.

Here is the source code for this starter project

If you are targeting an existing language like TypeScript, HTML or Java, you don’t have to reinvent the wheel: Monaco-editor and Monaco-Languages support most of those languages.

For our example we are going to use a core version of Monaco-editor called monaco-editor-core.

Add the package:

npm add monaco-editor-core

We also need some loaders for CSS as Monaco uses them internally:

npm  add -D style-loader css-loader

Add this rule to the rules array of the module property in the webpack config:

{
    test: /\.css$/,
    use: ['style-loader', 'css-loader']
}

Finally add CSS to the resolved extensions:

extensions: ['.ts', '.tsx', '.js', '.jsx','.css']

Now we are ready to create the editor component. Create a React component, which we will call Editor, and return an element that has a ref attribute so we can take its reference and let the Monaco API inject the editor inside it.

To create a Monaco editor we need to call monaco.editor.create. It takes as arguments the DOM element in which Monaco will inject the editor, and some options for the language ID, the theme, etc. Check out the documentation for more details.

Add a file that will contain all the language configuration in src/todo-lang:

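export const languageID = 'todoLang';
export const languageExtensionPoint = { id: languageID }; // passed to monaco.languages.register in setup.ts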
config.ts

Add a component in src/components:

import * as React from 'react';
import * as monaco from 'monaco-editor-core';

interface IEditorProps {
    language: string;
}

const Editor: React.FC<IEditorProps> = (props: IEditorProps) => {
    let divNode: HTMLDivElement | null = null;
    const assignRef = React.useCallback((node: HTMLDivElement | null) => {
        // On mount, get the ref of the div and assign it to divNode
        divNode = node;
    }, []);

    React.useEffect(() => {
        if (divNode) {
            const editor = monaco.editor.create(divNode, {
                language: props.language,
                minimap: { enabled: false },
                autoIndent: true
            });
            // Dispose of the editor when the component unmounts
            return () => editor.dispose();
        }
    }, [assignRef])

    return <div ref={assignRef} style={{ height: '90vh' }}></div>;
}

export { Editor };
Editor.tsx

We basically use a callback hook to get the reference of the div when mounted, so we can pass it to the create function.

Now you can add the editor component to your application and add some styling if you want.

Register our language using Monaco API

To make the Monaco editor support our language (we already specified its language ID when we created the editor), we need to register it using the API monaco.languages.register. Let’s create a file in src/todo-lang called setup.ts. We also need to call monaco.languages.onLanguage with a callback that Monaco invokes when the language is first needed, for example when a model uses it (we will use this callback later to register our language providers for syntax highlighting, auto-completion, formatting, etc.):

import * as monaco from "monaco-editor-core";
import { languageExtensionPoint, languageID } from "./config";

export function setupLanguage() {
    monaco.languages.register(languageExtensionPoint);
    monaco.languages.onLanguage(languageID, () => {

    });
}
setup.ts

Now call the setup function from the entry point.

Add a web worker for Monaco

At this point, if you run the project and open it in the browser, you will get an error concerning the web worker:

Could not create web worker(s). Falling back to loading web worker code in main thread, which might cause UI freezes. Please see https://github.com/Microsoft/monaco-editor#faq
You must define a function MonacoEnvironment.getWorkerUrl or MonacoEnvironment.getWorker
The Monaco FAQ referenced in the error message explains: “Language services create web workers to compute heavy stuff outside of the UI thread. They cost hardly anything in terms of resource overhead and you shouldn’t worry too much about them, as long as you get them to work (see above the cross-domain case).”

Monaco-Editor uses a web worker of its own; I think it’s used for highlighting and for performing other built-in actions. We will create another one that will handle our language service.

Let's first tell webpack to bundle Monaco’s editor web worker. Add this line to the entry property of the webpack config:

entry: {
    app: './src/index.tsx',
    "editor.worker": 'monaco-editor-core/esm/vs/editor/editor.worker.js'
},

Change the output to tell webpack to give the web worker a specific name without the hash, and to use ‘self’ as the global object, as required by Monaco. Here is the content of the webpack config file so far:

const path = require('path');
const htmlWebpackPlugin = require('html-webpack-plugin');
module.exports = {
    mode: 'development',
    entry: {
        app: './src/index.tsx',
        "editor.worker": 'monaco-editor-core/esm/vs/editor/editor.worker.js'
    },
    output: {
        globalObject: 'self',
        filename: (chunkData) => {
            switch (chunkData.chunk.name) {
                case 'editor.worker':
                    return 'editor.worker.js';
                default:
                    return 'bundle.[hash].js';
            }
        },
        path: path.resolve(__dirname, 'dist')
    },
    resolve: {
        extensions: ['.ts', '.tsx', '.js', '.jsx', '.css']
    },
    module: {
        rules: [
            {
                test: /\.tsx?/,
                loader: 'ts-loader'
            },
            {
                test: /\.css/,
                use: ['style-loader', 'css-loader']
            }
        ]
    },
    plugins: [
        new htmlWebpackPlugin({
            template: './src/index.html'
        })
    ]
}
webpack.config.js

As the error above shows, Monaco-Editor calls a method named getWorkerUrl on the global variable MonacoEnvironment. Go to the setup function and add the following:

import * as monaco from "monaco-editor-core";
import { languageExtensionPoint, languageID } from "./config";

export function setupLanguage() {
    (window as any).MonacoEnvironment = {
        getWorkerUrl: function (moduleId, label) {
            return './editor.worker.js';
        }
    }
    monaco.languages.register(languageExtensionPoint);
    monaco.languages.onLanguage(languageID, () => {

    });
}
setup.ts

This will tell Monaco where to find the worker. We'll add our custom language service worker soon.
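Once a second worker exists, getWorkerUrl can use its label argument to decide which file to return. The sketch below shows one possible shape for that; the 'todoLang' label and the './todoLangWorker.js' file name are assumptions made for this example, not names defined so far in this article.

```typescript
// Picks the worker script for a given label.
// 'todoLang' and './todoLangWorker.js' are hypothetical names used for illustration.
function getWorkerUrl(_moduleId: string, label: string): string {
    if (label === "todoLang") {
        return "./todoLangWorker.js";
    }
    // Every other label falls back to Monaco's built-in editor worker.
    return "./editor.worker.js";
}
```

A function like this would replace the inline getWorkerUrl assigned to MonacoEnvironment, and the extra worker file would also need its own entry in the webpack config.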

Run the application. You should see an editor which does not yet support any features.

Add syntax highlighting & language configuration

In this section, we will add highlighting for the TodoLang keywords.

Monaco-editor uses the Monarch library, which allows you to create declarative syntax highlighters using JSON. Take a look at its documentation if you want to learn more about this syntax.

Here is an example of a Java configuration for syntax highlighting, code folding, etc.

Create a file in src/todo-lang called TodoLang.ts. We are going to configure the TodoLang highlighter and tokenizer using the Monaco API monaco.languages.setMonarchTokensProvider, which takes two parameters: the language ID, and a configuration of type IMonarchLanguage.

Here is the configuration for TodoLang:

import * as monaco from "monaco-editor-core";
import IRichLanguageConfiguration = monaco.languages.LanguageConfiguration;
import ILanguage = monaco.languages.IMonarchLanguage;

export const monarchLanguage = <ILanguage>{
    // Set defaultToken to invalid to see what you do not tokenize yet
    defaultToken: 'invalid',
    keywords: [
        'COMPLETE', 'ADD',
    ],
    typeKeywords: ['TODO'],
    escapes: /\\(?:[abfnrtv\\"']|x[0-9A-Fa-f]{1,4}|u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8})/,
    // The main tokenizer for our languages
    tokenizer: {
        root: [
            // identifiers and keywords
            [/[a-zA-Z_$][\w$]*/, {
                cases: {
                    '@keywords': { token: 'keyword' },
                    '@typeKeywords': { token: 'type' },
                    '@default': 'identifier'
                }
            }],
            // whitespace
            { include: '@whitespace' },
            // strings for todos
            [/"([^"\\]|\\.)*$/, 'string.invalid'],  // non-terminated string
            [/"/, 'string', '@string'],
        ],
        whitespace: [
            [/[ \t\r\n]+/, ''],
        ],
        string: [
            [/[^\\"]+/, 'string'],
            [/@escapes/, 'string.escape'],
            [/\\./, 'string.escape.invalid'],
            [/"/, 'string', '@pop']
        ]
    },
}
TodoLang.ts

We basically specify the CSS classes, or token names, for each type of keyword in TodoLang. For example, we instructed Monaco to give the keywords ‘COMPLETE’ and ‘ADD’ the class ‘keyword’, and the typeKeyword ‘TODO’ the class ‘type’. We also instructed Monaco to colorize strings by giving them the ‘string’ class predefined by Monaco. Keep in mind that you can override the theme and add new CSS classes by using the defineTheme API, then specify the theme when creating the editor or set it using setTheme.
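The `cases` lookup in the root rule can be pictured as a plain function that maps a matched word to a token name. This is only a mental model written for this article, not Monarch's actual implementation, and `classify` is a made-up name.

```typescript
// Same word lists as in the Monarch configuration above.
const keywords = ["COMPLETE", "ADD"];
const typeKeywords = ["TODO"];

// Mirrors the '@keywords' / '@typeKeywords' / '@default' cases for a matched word.
function classify(word: string): "keyword" | "type" | "identifier" {
    if (keywords.indexOf(word) !== -1) {
        return "keyword";
    }
    if (typeKeywords.indexOf(word) !== -1) {
        return "type";
    }
    return "identifier";
}
```

The token name returned for each word is what the theme then uses to pick a color.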

To tell Monaco to use this configuration, go to the setup function and, in the onLanguage callback, call monaco.languages.setMonarchTokensProvider with the configuration as the second argument:

import * as monaco from "monaco-editor-core";
import { languageExtensionPoint, languageID } from "./config";
import { monarchLanguage } from "./TodoLang";

export function setupLanguage() {
    (window as any).MonacoEnvironment = {
        getWorkerUrl: function (moduleId, label) {
            return './editor.worker.js';
        }
    }
    monaco.languages.register(languageExtensionPoint);
    monaco.languages.onLanguage(languageID, () => {
        monaco.languages.setMonarchTokensProvider(languageID, monarchLanguage);
    });
}
setup.ts

Run the app, the editor should now support syntax highlighting.

Here is the source code of the project so far:

https://github.com/amazzalel-habib/TodoLangEditor/tree/add-syntax-highlighter

In the next part of this article, I’ll cover the language service. I’ll use ANTLR to generate the TodoLang lexer and parser and implement most features of the editor using the AST provided by the parser. Then we’ll see how to create a web worker to provide the language services with auto-completion.

Stay tuned.