@@ -0,0 +1,66 @@
### Computer course: Linear Programming

This course explains ways to solve complicated mathematical problems with the aid of computers, specifically with Python and Gurobi.

Gurobi is a proprietary solver for linear, quadratic and mixed-integer programming problems, with interfaces for several programming languages.

#### Python basics

Python is open source (!!!) and easy to learn. It has many useful libraries for high-level tasks such as data visualization or working with graphs.

However, it is generally slower than compiled languages.

#### Linear Programming

Linear programs have the form $\min\ c^Tx\ \text{s.t.}\ Ax \leq b$, with a linear objective function $c^Tx$ over a feasible region (an intersection of half-spaces).

Linear programming problems can be solved with the simplex method, which moves from vertex to vertex along the edges of the feasible region until an optimal solution is found.

An optimal solution minimizes the value of the objective function while respecting all the constraints.

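To make the geometry concrete, here is a minimal sketch that solves a two-variable linear program by brute-force vertex enumeration. This is not the simplex method: it simply checks every intersection of two constraint boundaries, which only works for tiny problems with a finite optimum. The instance reuses the constraints and objective of the Gurobi snippet in these notes; the non-negativity constraints are an assumption (they match Gurobi's default lower bound of 0), and all function names are illustrative.

```python
from itertools import combinations

# Each constraint is (a, b, r), meaning a*x + b*y >= r.
CONSTRAINTS = [
    (2.0, 1.0, 3.0),   # 2x +  y >= 3
    (2.0, 2.0, 5.0),   # 2x + 2y >= 5
    (1.0, 0.0, 0.0),   # x >= 0 (assumed non-negativity)
    (0.0, 1.0, 0.0),   # y >= 0 (assumed non-negativity)
]
COST = (3.0, 5.0)      # minimize 3x + 5y


def intersection(c1, c2):
    """Point where the boundary lines of two constraints meet (None if parallel)."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    # Cramer's rule for the 2x2 system.
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def is_feasible(point, tol=1e-9):
    """True if the point satisfies every constraint (up to a tolerance)."""
    x, y = point
    return all(a * x + b * y >= r - tol for a, b, r in CONSTRAINTS)


def objective(point):
    return COST[0] * point[0] + COST[1] * point[1]


def best_vertex():
    """Enumerate all feasible vertices and return the cheapest one."""
    candidates = (intersection(c1, c2) for c1, c2 in combinations(CONSTRAINTS, 2))
    vertices = [p for p in candidates if p is not None and is_feasible(p)]
    return min(vertices, key=objective)


vertex = best_vertex()
print(vertex, objective(vertex))
```

For this instance the cheapest feasible vertex is $x = 2.5$, $y = 0$, with objective value $7.5$.
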
#### Gurobi

```python
import gurobipy as gp
from gurobipy import GRB

# Create an empty model
m = gp.Model()

# Two continuous decision variables (lower bound 0 by default)
x = m.addVar(vtype=GRB.CONTINUOUS)
y = m.addVar(vtype=GRB.CONTINUOUS)
```

Variable types:

* Continuous;
* Binary;
* Integer;
* Semi-continuous, $\{0\} \cup [a, b]$;
* Semi-integer, $\{0\} \cup ([a, b] \cap \mathbb{Z})$.

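The five domains can be spelled out as a small membership check. This is a plain-Python sketch that does not call Gurobi: the string labels and the function name are made up for illustration (in `gurobipy` the corresponding constants are `GRB.CONTINUOUS`, `GRB.BINARY`, `GRB.INTEGER`, `GRB.SEMICONT` and `GRB.SEMIINT`), and the bounds are assumed to be inclusive.

```python
def in_domain(value, vtype, lb=0.0, ub=float("inf")):
    """Return True if `value` is admissible for a variable of the given type."""
    is_int = float(value).is_integer()
    in_bounds = lb <= value <= ub
    if vtype == "continuous":
        return in_bounds
    if vtype == "binary":
        return value in (0, 1)
    if vtype == "integer":
        return in_bounds and is_int
    if vtype == "semi-continuous":
        # Either exactly zero, or anywhere inside [lb, ub].
        return value == 0 or in_bounds
    if vtype == "semi-integer":
        # Either exactly zero, or an integer inside [lb, ub].
        return value == 0 or (in_bounds and is_int)
    raise ValueError(f"unknown variable type: {vtype}")


print(in_domain(0, "semi-continuous", lb=2, ub=5))    # zero is always allowed
print(in_domain(1, "semi-continuous", lb=2, ub=5))    # falls in the gap (0, 2)
print(in_domain(3.5, "semi-integer", lb=2, ub=5))     # in bounds, but not an integer
```
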
```python
# Signatures (default values shown):
#   m.addVar(lb=0, ub=GRB.INFINITY, obj=0.0, vtype=GRB.CONTINUOUS, name="")
#   m.addVars(*indices, lb=0, ub=GRB.INFINITY, obj=0.0, vtype=GRB.CONTINUOUS, name="")

# Constraints
c1 = m.addConstr(2*x + y >= 3)
c2 = m.addConstr(2*x + 2*y >= 5)

# Objective: minimize 3x + 5y
m.setObjective(3*x + 5*y, GRB.MINIMIZE)

m.optimize()
```

- `lb`, `ub`, lower and upper bound;
- `obj`, coefficient of the variable in the linear objective function;
- `vtype`, variable type;
- `name`, name for further referencing;
- `indices`, indices used to generate the set of variables.

Linear expressions can be created with arithmetic operators, with the `prod()` and `sum()` methods of the `tupledict` returned by `addVars` (`x.prod()`, `x.sum()`), or with `quicksum([2*x, 3*y])`.

For continuous models, Gurobi can run several LP algorithms (such as primal simplex, dual simplex and barrier) in parallel and return the solution of whichever finishes first.
@@ -0,0 +1,77 @@
% !TeX root = ../notes.tex

\section{Introduction}
Executing a query involves multiple steps. All techniques depend on the physical characteristics of the hardware; however, there are algorithms and data structures that help make query execution faster.

Query optimization is especially important because SQL-like languages are declarative and hence do not specify the exact computation. Furthermore, the same query can be evaluated in multiple ways, and choosing the best one is not trivial (it depends on data size, indices and more).

For instance, depending on the order in which joins are executed, an intermediate result (in the worst case a Cartesian product) can contain a hundred tuples or several thousand. Often the order in which a human wrote the operations is not the most efficient one.

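As a rough illustration of how much the join order matters, the following sketch sums the intermediate result sizes of every left-deep join order over three relations. The relation names, cardinalities, selectivities and the cost measure itself are assumptions made up for this example; they are not taken from the lecture.
\begin{lstlisting}[language=Python]
from itertools import permutations

# Hypothetical cardinalities and join selectivities.
CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}


def plan_cost(order):
    """Sum of the intermediate result sizes of a left-deep join order."""
    joined = [order[0]]
    size = float(CARD[order[0]])
    cost = 0.0
    for rel in order[1:]:
        selectivity = 1.0  # no join predicate means Cartesian product
        for prev in joined:
            selectivity *= SEL.get(frozenset((prev, rel)), 1.0)
        size = size * CARD[rel] * selectivity
        cost += size
        joined.append(rel)
    return cost


for order in permutations(CARD):
    print(" x ".join(order), plan_cost(order))
\end{lstlisting}
With these numbers, joining \texttt{S} and \texttt{T} first produces about 1100 intermediate tuples in total, whereas starting with the Cartesian product of \texttt{R} and \texttt{T} produces about 11000, although the final result is the same.
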
\subsection{Query Processing}
Query processing consists of:
\begin{enumerate}
	\item Taking a text query as input;
	\item Compiling and optimizing;
	\item Extracting the execution plan;
	\item Executing the query.
\end{enumerate}
Roughly, query processing can be divided into a compile-time system and a runtime system. Most systems strongly separate these phases, for example by using so-called prepared queries.
\begin{lstlisting}[language=SQL]
SELECT S.NAME
FROM STUDENTS S
WHERE S.ID = ?
\end{lstlisting}
Compilation takes time and limits the number of queries that can be run in parallel. A prepared query like this one can be compiled once and then executed over and over with a concrete value, e.g. \texttt{Q1(123)}, without the compilation overhead.

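The prepared query above can be tried out with a small sketch based on Python's standard \texttt{sqlite3} module, used here only as a convenient stand-in for a database system. The table contents are hypothetical sample data. The module keeps an internal cache of compiled statements, so repeated executions of the same query text with different parameters can avoid parsing it again.
\begin{lstlisting}[language=Python]
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(123, "Ada"), (456, "Alan")])

# The query text is fixed; only the value bound to '?' changes.
Q1 = "SELECT s.name FROM students s WHERE s.id = ?"


def q1(student_id):
    """Run the parameterized query and return the name, or None."""
    row = conn.execute(Q1, (student_id,)).fetchone()
    return row[0] if row else None


print(q1(123), q1(456))
\end{lstlisting}
Binding the value through the placeholder also keeps the parameter out of the query text, so the same plan can serve every call.
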
Embedded SQL is another technique to save compilation time when a long period of time passes between the two phases. The compiler of the host programming language takes care of the SQL part as well, optimizing it.

Specifically, the steps executed at compile time are:
\begin{enumerate}
	\item Parsing, AST production (abstract syntax tree, to understand the structure);
	\item Schema lookup, variable binding, type inference (semantic analysis about relations and columns, syntax check);
	\item Normalization, factorization (bringing the query into an abstract form, avoiding computing the same thing twice, evaluating expressions);
	\item Unnesting, deriving predicates, resolution of views (the plan generator can finally construct a cost-based model);
	\item Construction of the execution plan;
	\item Review, pushing joins and refining the plan in general;
	\item Production of the imperative plan (code generation).
\end{enumerate}
Rewrite I comprises steps 1--3, while steps 4 and 5 make up rewrite II.

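The ``evaluating expressions'' part of step 3 can be illustrated with constant folding. The sketch below uses Python's own \texttt{ast} module as a stand-in for the AST of a SQL expression, which is an assumption made purely for illustration; a real system works on its own tree representation. It requires Python 3.9 or later for \texttt{ast.unparse}.
\begin{lstlisting}[language=Python]
import ast
import operator

# Arithmetic operators that are folded at compile time.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _is_number(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, (int, float))


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on two numeric constants with its value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = _OPS.get(type(node.op))
        if op and _is_number(node.left) and _is_number(node.right):
            return ast.Constant(value=op(node.left.value, node.right.value))
        return node


def fold(expression):
    """Parse an expression, fold its constants and print it back."""
    tree = ast.parse(expression, mode="eval")
    return ast.unparse(ConstantFolder().visit(tree))


print(fold("salary > 1000 * 12 + 500"))
\end{lstlisting}
The predicate is rewritten to \texttt{salary > 12500} once, at compile time, instead of recomputing the arithmetic for every tuple.
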
Example (with views):
\begin{lstlisting}[language=SQL]
SELECT name, salary
FROM employee, department
WHERE dep = did
AND location = 'Munich'
AND area = 'Research'
\end{lstlisting}

\begin{figure}[h]
	\centering
	\includegraphics[scale=1.1]{query_plan.png}
\end{figure}
Finally, the execution tree is built and polished, with the join operation at the top and the selections pushed down towards the leaves, reducing the number of tuples to be joined. In other cases, such as predicates with regular expressions that are expensive to evaluate, filtering can be done later.

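The effect of pushing the selections below the join can be sketched with a naive nested-loop join that counts how many pairs of tuples it compares. The data is hypothetical, and so is the assignment of \texttt{location} to \texttt{department} and of \texttt{area} to \texttt{employee}, which the query above does not specify.
\begin{lstlisting}[language=Python]
def nested_loop_join(left, right, predicate):
    """Return the joined rows and the number of pairs compared."""
    rows, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if predicate(l_row, r_row):
                rows.append({**l_row, **r_row})
    return rows, comparisons


# Hypothetical data: 100 employees spread over 10 departments.
employee = [{"name": f"e{i}", "salary": 1000 + i, "dep": i % 10,
             "area": "Research" if i % 4 == 0 else "Sales"}
            for i in range(100)]
department = [{"did": d, "location": "Munich" if d < 3 else "Berlin"}
              for d in range(10)]


def same_department(e, d):
    return e["dep"] == d["did"]


def join_then_filter():
    """Join everything first, apply the selections afterwards."""
    rows, comparisons = nested_loop_join(employee, department, same_department)
    rows = [r for r in rows
            if r["location"] == "Munich" and r["area"] == "Research"]
    return rows, comparisons


def filter_then_join():
    """Apply the selections first, then join the reduced inputs."""
    researchers = [e for e in employee if e["area"] == "Research"]
    in_munich = [d for d in department if d["location"] == "Munich"]
    return nested_loop_join(researchers, in_munich, same_department)


print(join_then_filter()[1], filter_then_join()[1])
\end{lstlisting}
Both variants return the same rows, but on this data the first compares 1000 pairs and the second only 75.
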
The executable plan is a set of tuples containing variables or constants along with their types, with a main function that loads query parameters, operations and allocated resources.

In practice, query planners are much more complicated and run into practical difficulties: for instance, a long list of AND/OR predicates (machine-generated, in the order of hundreds of thousands) can make a recursive traversal of the binary expression tree crash because the stack runs out of space.

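The stack problem is easy to reproduce. The sketch below builds a deeply nested conjunction as a left-deep binary tree and counts its leaves twice: with plain recursion, which exceeds Python's recursion limit, and with an explicit stack, which does not. The tuple-based tree representation and all function names are assumptions for illustration.
\begin{lstlisting}[language=Python]
def build_conjunction(n):
    """Left-deep tree for p0 AND p1 AND ... AND p(n-1)."""
    tree = ("leaf", 0)
    for i in range(1, n):
        tree = ("and", tree, ("leaf", i))
    return tree


def count_leaves_recursive(node):
    """One call frame per tree level: fails on very deep trees."""
    if node[0] == "leaf":
        return 1
    return count_leaves_recursive(node[1]) + count_leaves_recursive(node[2])


def count_leaves_iterative(node):
    """Same traversal with an explicit stack kept on the heap."""
    stack, leaves = [node], 0
    while stack:
        current = stack.pop()
        if current[0] == "leaf":
            leaves += 1
        else:
            stack.append(current[1])
            stack.append(current[2])
    return leaves


def recursion_overflows(tree):
    """True if the recursive traversal runs out of stack."""
    try:
        count_leaves_recursive(tree)
    except RecursionError:
        return True
    return False


predicates = build_conjunction(100_000)
print(count_leaves_iterative(predicates), recursion_overflows(predicates))
\end{lstlisting}
Python reports the overflow as a \texttt{RecursionError}; in a system written in a language without such a guard the same pattern can crash the process.
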
\subsection{Query Optimization}
Possible goals of query optimization include minimizing response time, resource consumption or time to first tuple (producing the first tuple as quickly as possible, for instance for search results), or maximizing throughput. This can be expressed as a cost function: most systems aim to minimize response time, with the available resources as constraints.

Algebraic optimization is a branch that uses relational algebra to find the cheapest expression equivalent to the original. However, finding the cheapest one is practically impossible: equivalence is hard to test (it is undecidable in general, and details such as numerical overflow matter), the set of equivalent expressions is potentially huge, and some of the problems involved are NP-hard. In practice, the search space actually explored is limited and smaller than the potential one.

There are ways to transform calculus expressions into algebraic ones, yet they might be expensive: in some cases the calculus is faster to evaluate than the algebra.

Optimization approaches can be:
\begin{itemize}
	\item Transformative, taking an algebraic expression and iteratively making small changes; not efficient in practice;
	\item Constructive, starting from small expressions and joining them to obtain larger ones; usually the preferred approach.
\end{itemize}

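A minimal sketch of the constructive approach is bottom-up dynamic programming over sets of relations: the best plan for each small set is computed first and then combined into plans for larger sets. The cardinalities, selectivities and cost measure (sum of intermediate result sizes) are assumptions chosen for illustration, and this is not presented as the algorithm of any particular system.
\begin{lstlisting}[language=Python]
from itertools import combinations

# Hypothetical cardinalities and join selectivities.
CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}


def result_size(relations):
    """Estimated cardinality of joining all the given relations."""
    size = 1.0
    for rel in relations:
        size *= CARD[rel]
    for a, b in combinations(sorted(relations), 2):
        size *= SEL.get(frozenset((a, b)), 1.0)
    return size


def best_plan(relations):
    """Return (cost, plan) of the cheapest join tree found bottom-up."""
    names = sorted(relations)
    # Single relations cost nothing; larger sets are built from them.
    best = {frozenset([r]): (0.0, r) for r in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            whole = frozenset(subset)
            out = result_size(whole)
            for i in range(1, k):
                for part in combinations(subset, i):
                    left = frozenset(part)
                    right = whole - left
                    cost = best[left][0] + best[right][0] + out
                    if whole not in best or cost < best[whole][0]:
                        best[whole] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(names)]


cost, plan = best_plan(CARD)
print(cost, plan)
\end{lstlisting}
Each set of relations is solved once and reused, so the search never revisits a sub-plan; the number of sets still grows exponentially with the number of relations.
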
\subsection{Query Execution}
Query execution is the last step, and the one that directly benefits from optimization. In practice, operators can perform extremely specialized operations, treating data as bags (multisets, i.e. sets that may contain duplicates) or as streams.

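The difference between bag and set semantics shows up as soon as a projection drops the columns that made tuples distinct. The sketch below models a bag with \texttt{collections.Counter}; the sample rows are hypothetical.
\begin{lstlisting}[language=Python]
from collections import Counter


def project(rows, column):
    """Bag projection: duplicates are kept and counted."""
    return Counter(row[column] for row in rows)


employees = [
    {"name": "Ada", "dep": 1},
    {"name": "Alan", "dep": 1},
    {"name": "Grace", "dep": 2},
]

bag = project(employees, "dep")   # department 1 occurs twice
as_set = set(bag)                 # set semantics drops the duplicate
print(sum(bag.values()), len(as_set))
\end{lstlisting}
The bag keeps three entries while the set keeps two, which matters for aggregates such as counts or sums computed on top of the projection.
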
@@ -0,0 +1,29 @@
\documentclass{article}
\input{../packages.tex}

\title{Query Optimization}
\author{Ilaria Battiston \thanks{All notes are collected with the aid of material provided by T. Neumann. All images have been retrieved from the slides available on the \href{https://db.in.tum.de/teaching/ws2021/queryopt/}{TUM Course Webpage}.}}
\date{Winter Semester 2020--2021}
\pagestyle{fancy}

\input{../commons.tex}
\graphicspath{{./images/}}

\begin{document}

\maketitle

\lfoot{}
\cfoot{}
\rfoot{\thepage}

\newpage
\setcounter{tocdepth}{1}
\tableofcontents
\newpage
\input{lectures/introduction.tex}
%\input{lectures/textbook_query_optimization.tex} too simple!
%\input{lectures/join_ordering.tex}
%\input{lectures/accessing_the_data.tex}

\end{document}