Spring 2022 April 14, 2023
Batuhan Celik, Kadir Ersoy
- Horadrim QL
We implemented a query language that supports;
- Creating relations with desired fields and field data types,
- Deleting the existing relations or listing them,
- Adding records to a relation,
- Updating and deleting the records,
- Efficiently searching for a record or filtering the records, via Btrees,
- Listing all records in the relation.
Clearly specify your assumptions and constraints of the system in an itemized or tabular format.
- The program expected to be executed in a Unix environment due to the new line character differences.
- The longest string stored in as a type name, a field name or data can be 20 characters at maximum.
- The longest integer stored in a field can be at most 20 digits long.
- Maximum number of the fields in a table is 119. Number of fields is not limited, but our page size limit constraints it.
- Each File stores up to 10 pages. Actually our files support infinite amount of pages but as they get larger, delete operation becomes more costly, thus, we had set this constraint.
- File headers have maximum of 1000 byte length.
Page size is 2.5 kilobytes. Pages have a header and list of records. Page header includes;
- Page type, which provides what type of data this page will held (for now this feature is unused, we were expecting to store Btrees in pages and created page types for that, but we decided to use json later on),
- Table name, which table this page belongs to,
- Number of fields,
- List of field types(str, int)
- List of bools(0,1) indicating whether that slot of the page is filled or not.
Each page object has dictionary of records. We have stringify functions which turns page data into a string to be stored in disk. While writing pages to files, we write page header and records in order. Page header items separated by ”|” char and ends with ’ˆ’ char. Number of records stored depends on the number of fields the records have. We limit the page size to 2.5 kb so that as number of fields a record has increased, the space each record covers increases and number of records decreases. Within pages, each record is given an ID and starts with it. Then comes the record data each separated by ”-” chars.
Page = Page header + record id + record + ”-”Record objects store header and data. Record data is list of field values. We have stringify function for records as well. Records provide type, header and list of field values;
- Record type (for now this feature only have type=record option and not used properly, we were expecting to store Btrees in pages and created record types for that, but we decided to use json later on),
- Record header, customizable, in our implementation it only stores which table the record belongs to.
- List of field values, separated by ”|”.
Record objects have the structure of : Records = type + record header + list of field values. Currently both types and headers are 20 characters long, this value can set to any sufficiently large number.
Records = type + record header + list of field values.Files have maximum of 10 pages. Each file has a header and list of pages. File header is at max 1000 bytes. File header includes list of page metadatas. Page metadata consists of;
- Page ID, given by file constructor,
- Table name, which table the page belongs to,
- Type of page, (we only have record pages type=record, stores table records)
- Bool(0,1), 1 if page is full.
We have table objects for each relation created. Tables have files to store pages and records and, a json file to store its Btree. At constructor, table reserves itself files for record pages and Btree. To avoid creating already existing tables from scratch, we save their data to a json before program end and reload from json in a later execution.
We store system catalog as another table. It stores names of the existing tables.
Btrees are essentially linked Node objects. Each node stores;
- data: list of pointers and key values,
- isRoot: indicating whether node is root or not,
- isLeaf: indicating whether node is leaf or not,
- rids: dictionary of (data, recordID), only stored by leaf nodes.
While index nodes have 4 key values and corresponding 5 pointers in data, leaf nodes have only 4 key values.
Each Btree stored in its own json file. Whenever a table object is created, whether a new one or an existing one which is loaded from its file, a new tree json will be created for the former option and the tree will be loaded to tree object from the corresponding json for latter option. Creating tree object from json handled by reading json and building the tree from root to leaves, connect- ing pointers on fly. Essentially, in tree json, each node has a list of pointers and key values. Since pointers are also pointing to nodes, we stored them as list of pointers and key values as well.
To create a type, 2 operations are performed. Firstly a Table object is created, then, a Btree object is created. This object stores and manages the files and the B-Tree created for this type and manages their storage to the disk. If the given type name already exists in the system catalog, then creation fails.
When a relation is deleted,deleteTable()function is called from the specific table instance. This instance firstly deletes files containing records, then the B Tree. Finally, type name is removed from the system catalog. If the given type name does not exist, the operation fails.
To list all the types in the database, all type names stored in the system catalog table is fetched using List DMLO.
To create a record, firstly, table instance checks for an available file. If there is no available file, it creates one and sets a page within that file. After file situation handled, It finds a page within file and inserts the record. Also inserts given record to the related table tree.
To delete a record, we search the record within the related tree. If record exists, which is the expected case,search()function of tree returns recordId(filename, pageID, slot). We get the file name and page ID that record belongs to from recordID. Then delete the record from the page and the tree. Before finishing, we update page header, to indicate the deleted record slot is now free.
To update the given record, we acquire its primary key and search with it within table’s tree. Againsearch()function of tree returns us recordID. Using the given file name, pageID and record slot within recordID, we update the related page withupdate()function of page.update()basically checks if given field values are valid, w.r.t type concerns, and updates the record.
To search the given primary key, we usesearch()function of tree again. As function returns the recordID(filename, pageID, slot), we use it to find the record within file and acquire all its fields.
We have alist()function in Btree that finds min and max record and traverses between, using the linked list of leafs. It returns list of recordIDs for all records. Then similar to search, we use recordIDs to get all fields of records from files.
Parsing the filter condition, we acquire a key value and a condition, greater than, less than or equals to. For each case, we first search for given key in the tree. After finding the record in the tree, we use the linked list of leafs to acquire the rest. Then w.r.t to each case, for example for greater than, we get the records with keys greater than the given condition key via traversing leaf nodes using links between them. For less than, we traverse the record keys that are less than the given condition key up until the minimum record stored in tree. Equality case is nothing but a simple tree search.
In overall, we were able to implement the structures we learned in the lecture. We have quite the agile data storage but we have not used its properties. Our files support multiple types of pages, our pages support multiple types of records, with constant length, and these records must not be related! However, since throwing everything in the same file or page would be too complicated to track, we decided to match files and pages to tables for an easier implementation. On the other hand, we implemented our B tree by ourselves and are quite proud of it! We tested it extensively with hundreds of records and it performed well in our last tests. One non DB related but valuable thing to learn was that each acyclig graph’s being json serializable. If we mention our shortcomings: Weak side of our implementation is buffer management and deletion operation. Since buffer management was not included in the scope of our project, we did not get involved with it. When a table is required, it is loaded, then it loads a page. Upon completion of the operation, table and page instances are deleted, then reloaded in the next operation if requires. This results in a huge number of read operations. On the latter, we used cursor and this constrained our ability to delete quite much. To delete a page, we must load many others which is a slow operation and limits size of our files quite much.