The egt file format
The Gold Parser System uses two types of files: ".grm" and ".egt". GRM files are text files that contain grammar definitions (such as regular expressions and productions in Backus-Naur form), while EGT files are the compiled grammar tables that the parser engine loads and uses. EGT stands for "Enhanced Grammar Tables".
EGT is a proper file format (although a relatively simple one) that the authors of the Gold Parser System have apparently developed. At first I was not able to find documentation on it (there is some, it is just not easy to find on the official website: Enhanced Grammar Tables), let alone tools to work with it, so I had to look through the code on my own and develop my own tools for reading it as text, which I will share with you.
EGT files consist of byte, boolean, integer and string values. Strings are zero-terminated, and the other three are of fixed size. These values make up Entries and Records. There is also one more entity: the file header.
Record
.int EntriesCount
.List<Entry> Entries
Entry
.object Data
.EntryType Type
EntryType
.Empty
.UInt16
.String
.Boolean
.Byte
.Error
At the beginning of the file is the file header, which is a zero-terminated string that should be something like this: "GOLD Parser Tables/v5.0". If it does not start with "GOLD", the file is not valid.
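The header check described above can be sketched roughly as follows. This is only an illustration, not the engine's actual code: the function names `read_string` and `read_header` are made up for this example, and it assumes that strings are stored as UTF-16LE (two bytes per character, terminated by two zero bytes), which the text above does not state.

```python
import io


def read_string(stream):
    """Read a zero-terminated string.

    Assumption: UTF-16LE, so the terminator is two zero bytes.
    """
    raw = bytearray()
    while True:
        unit = stream.read(2)
        if len(unit) < 2:
            raise EOFError("stream ended inside a string")
        if unit == b"\x00\x00":
            break
        raw += unit
    return raw.decode("utf-16-le")


def read_header(stream):
    """Read the file header and reject anything that does not start with GOLD."""
    header = read_string(stream)
    if not header.startswith("GOLD"):
        raise ValueError(f"not a valid EGT file, header is {header!r}")
    return header


# Hand-built header bytes standing in for the start of a real .egt file.
sample = "GOLD Parser Tables/v5.0".encode("utf-16-le") + b"\x00\x00"
print(read_header(io.BytesIO(sample)))
```

With a real file you would pass `open(path, "rb")` instead of the `io.BytesIO` object; the stream is then positioned at the first record.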
Everything else is records, one after another. A record starts with a specific byte, 0x4D 'M'. Next is an integer value that represents the number of entries in the record, followed by the data for those entries.
Each entry consists of a byte that represents its type, followed by a value of that type. 0x62 'b' signifies a byte entry, 0x42 'B' a boolean, 0x49 'I' an integer, 0x53 'S' a string, and 0x45 'E' an empty entry. If anything else is read, an error entry must be produced.
Each record is a piece of data in the grammar tables. For example, look at the code below: it is a record of 4 entries. The first is a byte entry, which in this case means that the record is a grammar property; the second entry tells us that it has an index of 1. The property is named "Version" and its value is "0.3".
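To make the record and entry layout more concrete, here is a minimal reader sketch. It is not the parser engine's implementation. All names in it are invented for this example, and it assumes details the text above does not spell out: integers are unsigned 16-bit little-endian, booleans occupy one byte, and strings are UTF-16LE terminated by two zero bytes.

```python
import io
import struct
from enum import Enum


class EntryType(Enum):
    EMPTY = "Empty"
    UINT16 = "UInt16"
    STRING = "String"
    BOOLEAN = "Boolean"
    BYTE = "Byte"
    ERROR = "Error"


def read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of data")
    return data


def read_uint16(stream):
    # Assumption: 16-bit unsigned, little-endian.
    return struct.unpack("<H", read_exact(stream, 2))[0]


def read_string(stream):
    # Assumption: UTF-16LE code units up to a two-byte zero terminator.
    units = bytearray()
    while (unit := read_exact(stream, 2)) != b"\x00\x00":
        units += unit
    return units.decode("utf-16-le")


def read_entry(stream):
    """Read one entry: a type byte, then a value of that type."""
    tag = read_exact(stream, 1)
    if tag == b"E":
        return (EntryType.EMPTY, None)
    if tag == b"b":
        return (EntryType.BYTE, read_exact(stream, 1)[0])
    if tag == b"B":
        return (EntryType.BOOLEAN, read_exact(stream, 1)[0] != 0)
    if tag == b"I":
        return (EntryType.UINT16, read_uint16(stream))
    if tag == b"S":
        return (EntryType.STRING, read_string(stream))
    # Unknown type byte: produce an error entry. The size of its value is
    # unknown, so anything read after this point is not reliable.
    return (EntryType.ERROR, tag[0])


def read_record(stream):
    """Read one record: 'M', the entry count, then that many entries."""
    if read_exact(stream, 1) != b"M":
        raise ValueError("record does not start with 0x4D 'M'")
    count = read_uint16(stream)
    return [read_entry(stream) for _ in range(count)]


# A hand-built record with four entries: byte, integer, string, string.
sample = (
    b"M" + struct.pack("<H", 4)
    + b"b" + bytes([0x70])
    + b"I" + struct.pack("<H", 1)
    + b"S" + "Version".encode("utf-16-le") + b"\x00\x00"
    + b"S" + "0.3".encode("utf-16-le") + b"\x00\x00"
)
for entry_type, value in read_record(io.BytesIO(sample)):
    print(entry_type.value, value)
```

Returning plain `(type, value)` tuples keeps the sketch short; a real tool would probably wrap them in Record and Entry classes like the ones outlined earlier.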
4
{Byte} 0x70
{UInt16} 1
{String} "Version"
{String} "0.3"
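A text dump like the one above can be produced from already decoded entries with a few lines of code. The sketch below is an illustration only and is not the tool that generated the listing: `format_record` and `describe_property` are hypothetical helpers, entries are assumed to be simple `(type name, value)` pairs, and the formatting of boolean and empty entries is a guess, since the example only shows byte, integer and string entries.

```python
def format_record(entries):
    """Render a record as text: the entry count, then one line per entry."""
    lines = [str(len(entries))]
    for kind, value in entries:
        if kind == "Byte":
            text = f"0x{value:02X}"
        elif kind == "String":
            text = f'"{value}"'
        elif kind == "Empty":
            text = ""  # assumption: an empty entry carries no value
        else:
            text = str(value)  # UInt16, Boolean, Error
        lines.append(f"{{{kind}}} {text}".rstrip())
    return "\n".join(lines)


def describe_property(entries):
    """Interpret a 4-entry record whose first byte is 0x70 as a grammar property."""
    if len(entries) != 4 or entries[0] != ("Byte", 0x70):
        return None
    (_, index), (_, name), (_, value) = entries[1:]
    return {"index": index, "name": name, "value": value}


record = [("Byte", 0x70), ("UInt16", 1), ("String", "Version"), ("String", "0.3")]
print(format_record(record))
print(describe_property(record))
```

For the record above, `describe_property` returns the index, name and value as a dictionary, and returns `None` for any record that does not look like a property.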