BACKEND: Multiline Solutions with Renaming and Caching #5

@JMAR059

Description

Relational Algebra in Database Systems quickly dives into more complex problems that use renaming and caching. Caching saves the table that results from one line into a new relation, so that those results can be used in later lines. Renaming builds on top of caching by renaming the attributes of the expression's result on that line. Examples:

# Only caching, new relation R is made
R = S join T

# Caching and renaming, where B's attributes (id, name) get renamed and saved to new relation A(id1, name1)
# New Relation A is made
A(id1, name1) = B 

# Both used, using new relation A that was not previously known
# Cached to new F relation
F(id2, name2) = A join_{id1 < id} B 

Use class material to further understand these operations if needed. They are essential to tackling more complex problems that simulate actual SQL queries, and are CRUCIAL functionalities for REX that will be expected by students taking the class.

For implementation, keep it simple and separate from the current relationParser. Don't parse the whole line as one string; let whatever 'controls' the input split the parts for you. Have a simple string parser for any saving/renaming on the left, and leave the relational algebra operations on the right to the existing parser:

# DON'T:
Given: "F(id2, name2) = A join_{id1 < id} B"
Call: generalParser("F(id2, name2) = A join_{id1 < id} B", relationsDictionary)

# DO:
Given: ("F(id2, name2)", "A join_{id1 < id} B")
Call:
      resultDataframe = (relationalParser("A join_{id1 < id} B", relationsDictionary)).resolve(relationsDictionary)
      cacheRenameParser("F(id2, name2)", resultDataframe, relationsDictionary)
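
The DO pattern above, applied to several lines in a row, might look like the sketch below. It is only an illustration: `runLines`, `evaluate` and `save` are hypothetical names that do not exist in the project. `evaluate` stands in for the `relationalParser(...).resolve(...)` call and `save` stands in for `cacheRenameParser`, so the loop itself stays independent of both.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def runLines(
    lines: Iterable[Tuple[str, str]],
    relationsDictionary: Dict[str, Any],
    evaluate: Callable[[str, Dict[str, Any]], Any],
    save: Callable[[str, Any, Dict[str, Any]], Any],
) -> Optional[Any]:
    """Evaluate pre-split (target, expression) pairs in order.

    Hypothetical driver: the control/front-end is assumed to have already
    split each line into its left part (target) and right part (expression).
    """
    lastResult = None
    for target, expression in lines:
        # Right-hand side first: it may use relations cached by earlier lines.
        lastResult = evaluate(expression, relationsDictionary)
        # An empty target means the line has no caching/renaming part.
        if target.strip():
            save(target, lastResult, relationsDictionary)
    return lastResult
```

Because each line is saved before the next one is evaluated, a relation such as A from the second example is already in relationsDictionary when a later line refers to it.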

cacheRenameParser should take care of:

  • Parsing the new relation name and column names
  • Making a new dataframe from resultDataframe with its columns renamed
  • Throwing an error if the number of renaming columns does not match the number of columns in resultDataframe
  • Saving the resulting dataframe into relationsDictionary
  • Skipping the renaming step if no renaming columns are found between the () in the given string

This would be the cleanest and most straightforward implementation. Any control/front-end should be able to provide two separate inputs for the two parsers on each line, and be able to build up the relationsDictionary after each given line.
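
A minimal sketch of the cacheRenameParser responsibilities listed above, offered as one possible starting point, not as the required implementation. It assumes resultDataframe is a pandas DataFrame (only `.copy()` and `.columns` are used) and that relation and column names look like identifiers. `parseTarget` and the regular expressions are hypothetical helpers introduced for this example.

```python
import re
from typing import Any, Dict, List, Tuple

# Assumed syntax: a relation name, optionally followed by "(col, col, ...)".
_TARGET_PATTERN = re.compile(r"^\s*([A-Za-z_]\w*)\s*(?:\((.*)\))?\s*$")
_NAME_PATTERN = re.compile(r"[A-Za-z_]\w*")


def parseTarget(target: str) -> Tuple[str, List[str]]:
    """Split "F(id2, name2)" into ("F", ["id2", "name2"]); hypothetical helper."""
    match = _TARGET_PATTERN.match(target)
    if match is None:
        raise ValueError(f"Cannot parse relation target: {target!r}")
    name, inner = match.group(1), match.group(2)
    # No parentheses, or empty parentheses: nothing to rename.
    if inner is None or not inner.strip():
        return name, []
    columns = [column.strip() for column in inner.split(",")]
    for column in columns:
        if _NAME_PATTERN.fullmatch(column) is None:
            raise ValueError(f"Invalid column name {column!r} in {target!r}")
    return name, columns


def cacheRenameParser(
    target: str, resultDataframe: Any, relationsDictionary: Dict[str, Any]
) -> Any:
    """Rename the result's columns if requested, then cache it under the new name."""
    name, columns = parseTarget(target)
    # Work on a copy so the dataframe passed in is left untouched.
    renamed = resultDataframe.copy()
    if columns:
        if len(columns) != len(renamed.columns):
            raise ValueError(
                f"{name} lists {len(columns)} columns, "
                f"but the result has {len(renamed.columns)}"
            )
        renamed.columns = columns
    relationsDictionary[name] = renamed
    return renamed
```

With this sketch, a target of "R" (or "R()") would store the result unchanged under R, while "A(id1, name1)" would store a renamed copy under A and raise a ValueError if the result did not have exactly two columns.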
