Module std.experimental.lexer
Summary
This module contains a range-based compile-time lexer generator.
Overview
The lexer generator consists of a template mixin, Lexer, along with several helper templates for generating such things as token identifiers.
To write a lexer using this API:
- Create the string array constants for your language.
- Create aliases for the various token and token identifier types specific to your language.
  - TokenIdType
  - tokenStringRepresentation
  - TokenStructure
  - TokenId
- Create a struct that mixes in the Lexer template mixin and implements the necessary functions.
  - Lexer
Examples
- A lexer for D is available here.
- A lexer for Lua is available here.
- A lexer for JSON is available here.
Template Parameter Definitions
- defaultTokenFunction: A function that serves as the default token lexing function. For most languages this will be the identifier lexing function.
- tokenSeparatingFunction: A function that determines whether an identifier or keyword has come to an end. This function must return bool and take a single size_t argument representing the number of bytes to skip over before looking for a separating character.
- staticTokens: A listing of the tokens whose exact value never changes and which cannot possibly be a token handled by the default token lexing function. The most common example of this kind of token is an operator such as "*" or "-" in a programming language.
- dynamicTokens: A listing of tokens whose value is variable, such as whitespace, identifiers, number literals, and string literals.
- possibleDefaultTokens: A listing of tokens that could possibly be one of the tokens handled by the default token handling function. A common example of this is a keyword such as "for", which looks like the beginning of the identifier "fortunate". tokenSeparatingFunction is called to determine whether the character after the 'r' separates the identifier, indicating that the token is "for", or whether lexing should be turned over to the defaultTokenFunction.
- tokenHandlers: A mapping of prefixes to custom token handling function names, stored as a flat array of pairs. The generated lexer will search for the even-index elements of this array, and then call the function whose name is the element immediately after the matching even-indexed element. This is used for lexing complex tokens whose prefix is fixed.
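The interplay between possibleDefaultTokens and tokenSeparatingFunction described above can be sketched without the generator. The following is a minimal, self-contained illustration, not the module's implementation: the struct KeywordProbe and its members isSeparating and startsWithKeyword are hypothetical names, and the sketch assumes ASCII-only identifiers made of letters, digits, and underscores.

```d
// Hypothetical stand-in for a lexer struct. None of these names come from
// std.experimental.lexer; they only illustrate the decision described above.
struct KeywordProbe
{
    string input; // remaining source text that has not been lexed yet

    // Plays the role of tokenSeparatingFunction: returns true if the byte
    // `offset` bytes ahead cannot be part of an identifier or keyword.
    // Running off the end of the input also counts as a separator.
    bool isSeparating(size_t offset) const
    {
        if (offset >= input.length)
            return true;
        immutable char c = input[offset];
        immutable bool identifierChar = (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_';
        return !identifierChar;
    }

    // True when the input starts with `keyword` and a separator follows it.
    // False means lexing would be handed to the default token function.
    bool startsWithKeyword(string keyword) const
    {
        return input.length >= keyword.length
            && input[0 .. keyword.length] == keyword
            && isSeparating(keyword.length);
    }
}

unittest
{
    // "for" followed by a space is the keyword.
    assert(KeywordProbe("for (;;)").startsWithKeyword("for"));
    // "fortunate" only begins with "for", so it is an identifier.
    assert(!KeywordProbe("fortunate").startsWithKeyword("for"));
}
```

In this sketch the argument passed to isSeparating is the keyword's length, which matches the description of the size_t parameter as the number of bytes to skip before looking for a separating character.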
Here are some example constants for a simple calculator lexer:
// There are a near infinite number of valid number literals, so numbers are
// dynamic tokens.
enum string[] dynamicTokens = ["numberLiteral", "whitespace"];
// The operators are always the same, and cannot start a numberLiteral, so
// they are staticTokens.
enum string[] staticTokens = ["-", "+", "*", "/"];
// In this simple example there are no keywords or other tokens that could
// look like dynamic tokens, so this is blank.
enum string[] possibleDefaultTokens = [];
// If any whitespace character or digit is encountered, pass lexing over to
// our custom handler functions. These will be demonstrated in an example
// later on.
enum string[] tokenHandlers = [
    "0", "lexNumber",
    "1", "lexNumber",
    "2", "lexNumber",
    "3", "lexNumber",
    "4", "lexNumber",
    "5", "lexNumber",
    "6", "lexNumber",
    "7", "lexNumber",
    "8", "lexNumber",
    "9", "lexNumber",
    " ", "lexWhitespace",
    "\n", "lexWhitespace",
    "\t", "lexWhitespace",
    "\r", "lexWhitespace"
];
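To make the prefix/name pairing in tokenHandlers concrete, here is a small self-contained sketch of how such a flat array can be read. It is not the generated lexer's dispatch code: handlerFor and sampleHandlers are hypothetical names, and preferring the longest matching prefix is an assumption of this sketch rather than documented behavior.

```d
// Hypothetical helper, not part of std.experimental.lexer. Returns the
// handler name paired with the longest prefix that matches the start of
// `input`, or an empty string when no prefix matches.
string handlerFor(const(string)[] handlers, string input)
{
    assert(handlers.length % 2 == 0, "handlers must hold prefix/name pairs");
    string best;
    size_t bestLength;
    // Even indexes hold prefixes; the odd index that follows holds the name.
    for (size_t i = 0; i + 1 < handlers.length; i += 2)
    {
        string prefix = handlers[i];
        if (prefix.length > bestLength
            && input.length >= prefix.length
            && input[0 .. prefix.length] == prefix)
        {
            best = handlers[i + 1];
            bestLength = prefix.length;
        }
    }
    return best;
}

// A cut-down copy of the calculator's handler table, for illustration only.
enum string[] sampleHandlers = [
    "1", "lexNumber",
    "2", "lexNumber",
    " ", "lexWhitespace"
];

// The lookup is plain D, so it can also be evaluated at compile time.
static assert(handlerFor(sampleHandlers, "1 + 2") == "lexNumber");
static assert(handlerFor(sampleHandlers, " + 2") == "lexWhitespace");
// "+" is a static token in the calculator example, so no handler applies.
static assert(handlerFor(sampleHandlers, "+ 2").length == 0);
```

Keeping prefixes and names in one flat string array means the table is an ordinary compile-time constant, which is presumably what lets the generator turn the names into function calls.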
Functions
Name | Description |
---|---|
tokenStringRepresentation(type) | Looks up the string representation of the given token type. |
Structs
Name | Description |
---|---|
LexerRange | Range structure that wraps the lexer's input. |
TokenStructure | The token that is returned by the lexer. |
Templates
Name | Description |
---|---|
Lexer | The implementation of the lexer is contained within this mixin template. |
Aliases
Name | Type | Description |
---|---|---|
TokenId | id | Generates the token type identifier for the given symbol. |
TokenIdType | ubyte | Template for determining the type used for a token type. |
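The ubyte shown for TokenIdType suggests that the identifier type is sized to fit the number of token types. One plausible way to do that is sketched below. SmallestIdType and CalculatorIdType are hypothetical names, the real TokenIdType may take different parameters and use different thresholds, and the extra slot reserved for an "unknown token" id is an assumption.

```d
// Hypothetical re-creation for illustration only; not the module's template.
// Picks the smallest unsigned integer type that can number `tokenCount` ids.
template SmallestIdType(size_t tokenCount)
{
    static if (tokenCount <= ubyte.max)
        alias SmallestIdType = ubyte;
    else static if (tokenCount <= ushort.max)
        alias SmallestIdType = ushort;
    else static if (tokenCount <= uint.max)
        alias SmallestIdType = uint;
    else
        static assert(false, "too many token types");
}

// The calculator example has 4 static, 2 dynamic, and 0 possible default
// tokens. One extra id is reserved here for an unknown token (an assumption).
alias CalculatorIdType = SmallestIdType!(4 + 2 + 0 + 1);

static assert(is(CalculatorIdType == ubyte));
static assert(is(SmallestIdType!(300) == ushort));
```

A language with only a few hundred token types would fit in a ushort under this scheme, which keeps each token small.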