\documentclass[nolof,digital]{fithesis3}
\thesissetup{faculty=fi}
\thesissetup{author=Bc. Vojtěch Polášek,id=410266, departmentEn = Department of Computer Systems and Communications, programmeEn = Informatics, fieldEn = Information Technology Security, assignment = {}}
\thesissetup{type= mgr}
\thesissetup{title=Argon2 security margin for disk encryption passwords}
\thesissetup{keywords = {security, password hashing, key derivation, disk encryption, password cracking, LUKS, LUKS2}}
\usepackage[english]{babel}
\usepackage{alltt}
\usepackage{listings}
\usepackage{csquotes}
\usepackage[hidelinks]{hyperref}
\usepackage[style=ieee,sorting=nty,block=ragged]{biblatex}
\renewbibmacro*{bbx:savehash}{} %disabling dashing
\addbibresource{bibliography.bib}
\usepackage{tabularx}
\usepackage{placeins}
\usepackage{tabu}
%\usepackage[chars=60, lines=30]{stdpage}
\usepackage{algorithm2e}
\SetAlCapSkip{3mm}
\thesislong{abstract}{
Passwords are a popular authentication method in the field of information technology. They were designed to be remembered by humans, which sometimes makes them unsuitable for direct use in encryption software. Therefore, key derivation functions exist which transform a password into a more suitable cryptographic key.
This thesis deals with such functions, in particular with their usage in disk encryption. The most popular function, PBKDF2, is described together with its vulnerabilities and attacks. Memory-hard functions have started being used as a mitigation of time-memory trade-off attacks; one such function is Argon2, the winner of the Password Hashing Competition. The thesis describes Argon2 in detail.
The practical part of the thesis deals with simulating an attack on a disk encrypted with the LUKS2 encryption scheme using Argon2 as its PBKDF. It includes collecting Argon2 parameters benchmarked by the Cryptsetup software. The attack is carried out on CPUs and GPUs using high-performance hardware provided by MetaCentrum VO.
The last part of the thesis introduces a price model for an attacker using either physical hardware or on-demand allocation of computing resources in the cloud. This model is then applied to real-world prices and to the data obtained during the attack simulation.
The thesis shows that it can take thousands of machines and hundreds of millions of dollars to crack an eight-character LUKS2 password within ten years.
}
\thesissetup{advisor={Ing. Milan Brož, Ph.D.}}
\thesislong{thanks}{
My big thanks go to my advisor Milan Brož for very inspirational and positive supervision and for help with gaining access to hardware needed for benchmarking.
Furthermore, I would like to thank Ondrej Mosnáček for his consultations concerning Argon2 and for the software used in this thesis.
My thanks also go to Lukáš Másilko for his help with polishing the graphical appearance of the thesis.
Last but not least, I would like to thank my family and my girlfriend Justyna for supporting me during the whole process.
Computational resources were provided by the CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures". Computational resources were supplied by the Ministry of Education, Youth and Sports of the Czech Republic under the Projects CESNET (Project No. LM2015042) and CERIT-Scientific Cloud (Project No. LM2015085) provided within the program Projects of Large Research, Development and Innovations Infrastructures.
}
\thesisload
\setcounter{tocdepth}{2}
\begin{document}
\chapter{Introduction}
This diploma thesis deals with password-based key derivation functions and their application in disk encryption software. At the very beginning of the computer era there were no passwords. They appeared for the first time in the 1960s, introduced by Fernando Corbató \parencite{ctss}. They have been evolving ever since, trying to keep pace with developing technology and the new ways in which it is used.
Passwords are becoming more and more important. They protect our secrets, our money, our communication. All these things are valuable not only to us but also to other people, and some of them would like to gain access to them for malicious purposes. There are many ways to gain access to a secret passphrase, including guessing, social engineering, exploiting application vulnerabilities, etc. This thesis deals with the first of these methods and tries to look at it from the attacker's perspective; more precisely, from the perspective of the attacker's financial resources. It tries to answer the question: how much would it cost to guess a password opening an encrypted disk volume? The thesis shows that guessing a password for a LUKS encrypted volume, even one created on low-performance hardware, currently costs more than a hundred million dollars and involves thousands of purchased or rented machines.
The second chapter introduces basic terminology and the rationale for the usage of PBKDFs in the scope of disk encryption. It also summarizes the history of and possible attacks against PBKDFs. The third chapter describes in detail the PBKDF2 and Argon2 functions. Their analysis shows the rationale for using memory-hard functions. The Argon2 function is described in greater detail because it is fundamental for the rest of the thesis. At the end of the chapter, several other password hashing functions and their features are briefly mentioned.
The fourth chapter analyses the LUKS2 disk encryption scheme and its front-end called Cryptsetup. First, the inner workings of LUKS2 are introduced with a special focus on the usage of PBKDFs in various processes, and the rationale for benchmarking PBKDF parameters is explained. This benchmarking function is then used to collect real-world data for subsequent analysis. The last part is dedicated to an experimental simulation of an attacker trying to break LUKS2 disk encryption by guessing the passphrase, a process which inevitably requires computing PBKDFs. The methodology, tools, and highlights from the resulting data are shown.
The fifth chapter summarizes the options of a potential attacker and introduces a price model suited for estimating the final price of an attack against a LUKS2 encrypted volume. The model is applied to real-world data based on the results from the previous chapter. The options of using CPUs, GPUs, and cloud-based on-demand allocated resources are taken into account.
The sixth chapter provides conclusions and several paths for further research in this area. The thesis closes with a description of the attached archive containing source code, scripts, and results; a glossary is also provided.
\chapter[Password hashing and key derivation functions]{Password hashing and key derivation\\functions}
\section{Definitions}
\label{definitions}
This thesis deals with various cryptographic terms including password hashing and key derivation. This section briefly explains some of them.
A \emph{hashing function} is a function which receives an input of arbitrary length and produces an output of a specified, shorter length, effectively compressing the input. These functions are used in many areas such as efficient data retrieval \parencite{itmc14}. \emph{Cryptographic hashing functions} are a subset of hashing functions and have to meet certain additional properties, namely preimage resistance, second preimage resistance, and collision resistance. Only cryptographic hash functions are considered in this thesis.
\emph{Password hashing} is a process in which a password is supplied to a hash function. This is the de facto standard method of storing passwords in operating systems and applications. If an attacker gets hold of such password hashes, it should be practically infeasible to derive the original passwords from them. Hashing functions are therefore also used in the process of password verification, during which the entered password is hashed and compared with the stored hash.
This thesis is not going to deal with \emph{password hashing} itself. However, many of the \emph{password hashing functions} described below meet the desired properties for being used in the key derivation process.
\emph{Key derivation functions} are based on \emph{hash functions}. Their basic purpose is to take an input and produce an output which can be used as a cryptographic key. The input is usually a password or other material, such as a biometric sample converted into binary form. These materials could, of course, be used as cryptographic keys on their own, but they often lack the properties of a good cryptographic key, such as sufficient entropy or length.
A \emph{cryptographic salt} is often used during the process of key derivation. The purpose of the salt is to prevent attacks which use precomputed tables, such as rainbow tables \parencite{rainbowtables}. The salt introduces another factor which influences the derived key, so the key is no longer dependent only on the passphrase. For example, suppose that a 32-bit integer is used as a salt; then there are \(2^{32}\) possible keys derived from the same passphrase, which makes precomputation attacks effectively infeasible. The salt is usually stored unobfuscated together with the hashed material.
A \emph{memory-hard algorithm} is an algorithm which asymptotically uses almost as many memory locations as the number of operations it performs. A \emph{sequential memory-hard function} is a function which can be computed by a sequential memory-hard algorithm and for which even the fastest parallel algorithm cannot asymptotically achieve a significantly lower cost.
\section{Why do we need PBKDFs?}
\label{whypbkdfs}
Today, as more and more private information is stored on various kinds of media and transferred over the Internet, it is becoming crucial to protect it from being accessed or changed by unauthorized actors. Although there are several interesting authentication options such as biometrics, passwords or passphrases are still the most common method.
Considering passwords, we face a problematic situation. Organizations and services provide guidelines or requirements which should help a user choose a strong password \parencites{nistpasswords}{sanspasswordguidelines}. Important parameters are password length (in characters), complexity, uniqueness, and others. I define complexity as the amount and diversity of characters used (letters, numbers, symbols, emojis\dots), and by uniqueness I mean that the password does not contain easily guessable or predictable sequences. See the mentioned policies for examples.
In some situations passwords do not provide good material to be used directly as a cryptographic key, for surprisingly many reasons. They are usually not sufficiently long. Because they are composed of printable characters, they do not meet the requirement of being uniformly distributed. If they are meant to be remembered, they will probably contain dictionary words, which lowers their entropy even more. See section~5.6.4 in \parencite{itmc14} for a short but interesting analysis.
PBKDF stands for password-based key derivation function. The goal of these functions is to derive one or more cryptographic keys from a password or a passphrase. The resulting key should be pseudorandom and sufficiently long to make brute-force guessing as time-consuming as possible. As stated above, such functions are based on cryptographic hashing functions.
Lately, PBKDFs have taken on another specific task. Due to the availability of GPUs, FPGAs, and ASICs, there are new possibilities for running functions in parallel computing environments; see chapter~4 in \parencite{mosnacek}. This increases the effectiveness of brute-force attacks. PBKDFs try to defend against such attacks by using memory-hard algorithms to slow down potential attackers and to make running the function in parallel extremely expensive or inefficient. See section~\ref{sec:attacks} for more details.
Examples of PBKDFs include Argon2, PBKDF2, Scrypt, Yescrypt, and more. See chapter~\ref{chap:pbkdfs} for a comparison of several functions.
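To make the interface of a PBKDF concrete, the following minimal Python sketch derives a 256-bit key from a password using PBKDF2 from the standard library; the parameter values are arbitrary illustrative choices, not recommendations.
\begin{lstlisting}[language=Python]
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(32)  # randomly generated cryptographic salt
# Derive a 32-byte (256-bit) key using HMAC-SHA-256 as the PRF
# and 100,000 iterations of the underlying function.
key = hashlib.pbkdf2_hmac("sha256", password, salt, 100000, 32)
\end{lstlisting}
Regardless of the password itself, the output is a fixed-length, pseudorandom-looking byte string suitable for use as an encryption key.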
\section{PBKDFs and disk encryption}
Disk encryption is a very good use case for PBKDFs. The encryption algorithms used require cryptographic keys of a certain length \parencite{veracrypt}. It is also important to consider that it is usually not desirable to change the encryption key often, because re-encryption of the whole disk takes a considerable amount of time. Moreover, if an attacker gains permanent access to such an encrypted disk, the key cannot be changed at all and there is an extensive period of time available for cracking the passphrase.
By looking at \parencite{pbkdf2usage} we can see that PBKDFs are used in many types of disk encryption software. Note that this list mentions only PBKDF2, as it has been the most widely used PBKDF until recently. PBKDF2 is, for example, used in LUKS version 1 \parencite{luks1}, the FileVault software used by macOS \parencite{filevault}, the CipherShed disk encryption software \parencite{ciphershed}, the VeraCrypt disk encryption software \parencite{veracrypt}, and more.
In 2013, a new open competition called the Password Hashing Competition was initiated. Its goal was to find a new password hashing function which would resist the new attacks devised against such functions \parencite{phc}. The winner was a function named Argon2, which is already used, for example, in LUKS version 2 \parencite{luks2}.
\section{Attacks on PBKDFs}
\label{sec:attacks}
The history of password cracking is as old as computer passwords themselves. People crack passwords for two main reasons: they either want to recover a forgotten password, or they want to recover someone else's password in order to use it later for authentication. There are two main techniques for password cracking: brute-force attacks and dictionary attacks. The idea behind the first method is to guess the password by trying all passwords of a given length composed of all combinations of given characters. Dictionary attacks exploit the fact that people often use passwords containing words found in language dictionaries; if an attacker tries passwords containing dictionary words or their permutations, they might have a quite good chance of success.
At the beginning, passwords were stored on computer systems in plain text, protected only by the fact that users should not be able to read passwords of other users. It was soon discovered that plain text passwords could be revealed, for example, by badly designed software permissions. This was shown in the case of Allan Scherr, who misused the capabilities of a printing program to print out the whole password file; see page~37 in \parencite{ctss}. Since then, passwords have been hashed.
This marks the beginning of the never-ending fight between authors of hash functions and people trying to crack passwords hashed by them. The first hash functions were very simple and definitely not cryptographically secure, such as the hashing mechanism used in Multics, which squared the numerical form of each password and applied a bit mask with an AND operation \parencite{multicssecurity}. This increased the number of required guesses, but only negligibly compared to modern functions.
The first cryptographic hash came with Robert Morris and his crypt function. The crypt used up to the 6th edition of Unix mimicked the M-209 cipher machine from World War II. It proved to be not very secure, because the algorithm could be recoded in a way which allowed testing a password in a very short period of time (1.25 milliseconds per password) \parencite{pshistory}. A later version, used since the 7th edition of Unix, employed the DES block cipher, which was at that time very slow when implemented in software. The password entry program also introduced two new concepts: automated proactive password strength checking and the \emph{cryptographic salt} (a 12-bit random number at that time). For comparison, the current LUKS2 specification uses a 32-byte salt.
During the 1980s, password cracking contests were organized and hash functions were further improved. Some of them were made deliberately slow to slow down a potential attacker; if this measure was not effective enough, more iterations of the hashing function could be used. During the 1990s, many password cracking programs appeared, including John the Ripper, Crack, L0phtCrack, and others \parencite{crackinghistory}. Most of these programs tried to improve the cracking speed by optimizing the underlying algorithms and later by using CPU parallelism.
The concept of \emph{KDFs} started being studied in the late 1990s. RFC~2898, defining PBKDF2, was released in 2000, and the function started being used primarily for key derivation in many applications such as WinZip, OpenDocument, TrueCrypt, or Android. However, it was also used for actual password hashing, for example in Mac OS X 10.8. Note that PBKDF1 exists, but it is not recommended because of its limited key length (20 bytes at best) and is provided solely for backward compatibility. PBKDF2 introduced a configurable pseudorandom function, number of iterations, and derived key length. These parameters allow flexibility when choosing a trade-off between security and user experience.
However, in 2007 the first password cracker using the parallel computing capabilities of GPUs appeared \parencite{elcomsoftgpu}. Other password crackers followed: Whitepixel in 2010, oclHashcat in 2012, and John the Ripper in 2012. Until 2012, such software could recover primarily MD5 and NTLM hashes, but oclHashcat introduced recovery of many other hashes.
As far as PBKDF2 is concerned, its resistance to password cracking using GPUs or even ASICs/FPGAs is currently not ideal, as discussed in section~7 of \parencite{mosnacek}. PBKDF2 does not offer support for parallelism when used as suggested; at the same time, its low memory requirements and GPU-friendly algorithm (see algorithm~\ref{pbkdf2alg}) give an advantage to the attacker. As shown in \parencite{mosnacek}, it was possible to improve the cracking speed of LUKS passwords forty times by rewriting the function for GPUs. Moreover, proper optimization of the underlying algorithms can greatly increase PBKDF2 performance even without GPUs, as was shown by an analysis of the closed-source oclHashcat in \parencite{pbkdf2accel}.
As shown in \parencite{pbkdf2weakness}, there exist other attacks not connected with GPUs. The paper shows that an attacker can save 50\,\% of PBKDF2 CPU operations compared to an implementation which does not follow the suggested performance improvements described in RFC~2104 \parencite{rfc2104} and NIST FIPS PUB 198 \parencite{fipspub198}. In this case it is possible to precompute the first message block of the underlying keyed hash function (used as the PRF) and replace it with the resulting constant in subsequent operations; see lines 11--13 in algorithm~\ref{pbkdf2alg}. The HMAC function itself is not described in this thesis; see section~4.1 of \parencite{pbkdf2weakness}.
Section~4.2 of \parencite{pbkdf2weakness} shows that an attacker can omit a considerable amount of XOR operations when SHA-1 is used as the pseudorandom function within PBKDF2, because this operation is sometimes performed on two blocks containing only zeroes. Additionally, more XOR operations can be omitted because the padding characters are constant and some XOR operations then simply zero themselves out. Finally, section~4.3 shows that it is possible to precompute the word expansion part of the second message block of the keyed hash function; the block is password independent and can thus be precomputed. However, this saves only a negligible amount of time compared to the attacks described above.
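To illustrate the first of these optimizations, the following Python sketch computes HMAC-SHA-1 while precomputing the hash states after the key-derived ipad and opad blocks, so that the first message block of the keyed hash is processed only once per password. This is merely a schematic illustration of the idea, not an attack implementation.
\begin{lstlisting}[language=Python]
import hashlib

def hmac_sha1_many(key: bytes, messages):
    # Pad (or hash) the key to the 64-byte SHA-1 block size.
    if len(key) > 64:
        key = hashlib.sha1(key).digest()
    key = key.ljust(64, b"\x00")
    # Precompute the states after absorbing the first, key-derived block.
    inner = hashlib.sha1(bytes(k ^ 0x36 for k in key))  # ipad block
    outer = hashlib.sha1(bytes(k ^ 0x5C for k in key))  # opad block
    for msg in messages:
        h = inner.copy()            # reuse the precomputed inner state
        h.update(msg)
        o = outer.copy()            # reuse the precomputed outer state
        o.update(h.digest())
        yield o.digest()
\end{lstlisting}
Because PBKDF2 keeps the password (and therefore the HMAC key) fixed across all iterations and all blocks, the two precomputed states can be reused for every PRF invocation, which is where the saving comes from.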
In 2009, Colin Percival suggested that, to defend against the usage of parallel computing, PBKDFs should fulfil the requirements of memory-hard functions \parencite{memoryhard}. As a reaction to the previously mentioned problems of PBKDF2, the Password Hashing Competition was held from 2013 to 2015 to select a new function for password hashing. The winner, Argon2, described in section~\ref{argon2}, is indeed a memory-hard function. Of course, that does not mean that memory-hard functions are immune to attacks.
In a paper from 2016, the authors show that there still exists an attack which can decrease the computational complexity of the Argon2i version 1.0 function. It was shown that, at that time, at least ten passes of Argon2i version 1.0 were needed to mitigate this attack, while the IRTF proposal suggested only six passes for \enquote{paranoid} situations. Fortunately, the authors of Argon2 reacted to these attacks and improved the function, so the attack is no longer effective against Argon2i, except when running with only a single pass. Note that the current version of Argon2 is 1.3. Details and rationale can be found in section~5.2 of \parencite{argon2}.
\chapter{State of the art PBKDFs}
\label{chap:pbkdfs}
\section{PBKDF2}
\label{sec:pbkdf2}
PBKDF2 is a password-based key derivation function defined in RFC~8018 \parencite{rfc8018}. This RFC thoroughly describes two use cases of PBKDF2: a password-based encryption scheme and a password-based message authentication scheme. Other mentioned use cases include password checking and derivation of multiple keys from one password. As shown in subsection~\ref{luks1}, LUKS version 1 uses PBKDF2 for password checking and for derivation of the key used to encrypt or decrypt the master key.
The function requires four input parameters: a passphrase, a cryptographic salt, an iteration count, and the length of the key to be derived. Moreover, the function requires a pseudorandom function (PRF) which is used in the process of key derivation.
The term \emph{passphrase} in this context means any data which serve as the source for the subsequent key derivation process; usually it is a password entered by the user. The \emph{cryptographic salt} is represented by a randomly generated number.
The purpose of the \emph{iteration count} parameter is to defend against brute-force and dictionary attacks performed on PBKDF2. The iteration count prolongs the time needed to derive a single key. Technically, the iteration count signifies the number of successive runs of the chosen PRF for every block of the derived key. In general, the \emph{iteration count} should be chosen as large as possible, taking into account that the processing time should still be acceptable for the end user \parencite{nistpbkdf2}. According to the cited document, the minimum iteration count should be 1,000 and for critical security systems a count of 10,000,000 iterations may be appropriate.
The function is described by the following algorithm; a verbal description is also provided. The following abbreviations and conventions are used in the algorithm and the description:
\begin{description}
\item[P] -- an octet string representing a passphrase
\item[S] -- an octet string representing a cryptographic salt
\item[C] -- a positive integer representing iteration count
\item[dkLen] -- a positive integer representing length of the derived key counted in octets
\item[DK] -- an octet string representing the derived key
\item[PRF] -- a pseudorandom function
\item[hLen] -- length of output of chosen pseudorandom function counted in octets
\item[CEIL(x)] -- the ceiling function returning the smallest integer which is greater than or equal to x
\item[F] -- a helper function for better description
\item[||] -- concatenation of strings
\item[INT(x)] -- a four-octet big-endian encoding of the integer x
\end{description}
\begin{algorithm}
\DontPrintSemicolon
\LinesNumbered
\SetKwInOut{Input}{input}
\SetKwInOut{Output}{output}
\SetKwFunction{KwCeil}{CEIL}
\SetKwFunction{KwF}{F}
\SetKwFunction{KwPrf}{PRF}
\SetKwProg{Fn}{Function}{}{}
\Input{P, S, C, dkLen}
\Output{DK}
\If{\(dkLen > (2^{32} - 1) \times hLen\)}{\Return{Derived key too long}}
\(l \leftarrow \) \KwCeil{\(dkLen/hLen\)} \;
\tcc{l is the number of hLen-octet blocks in the derived key}
\(r \leftarrow dkLen - (l - 1) \times hLen \) \;
\tcc{r is the number of octets in the last block}
\For{\(i \leftarrow 1\) \KwTo \(l\) }{
\(T_i \leftarrow \) \KwF{P, S, C, i} \;
}
\Return{\( T_1 || T_2 || \dots || T_{l-1} || T_l [0 \dots (r-1)]\)} \;
\Fn{F (P, S, C, i)}{
\( U_1 \leftarrow \) \KwPrf{P, S || INT(i)} \;
\For{\( j \leftarrow 2 \) \KwTo \( C \)}{
\( U_j \leftarrow \) \KwPrf{P, \( U_{j-1} \)} \;
}
\Return{\(U_1 \oplus U_2 \oplus \dots \oplus U_C \)} \;
}
\caption{PBKDF2 function algorithm}
\label{pbkdf2alg}
\end{algorithm}
At the beginning, the algorithm checks whether the desired length of the key exceeds \((2^{32} - 1) \times hLen\) octets; if it does, it exits immediately. Then it processes the input and creates the output key in blocks. Every block has a length of hLen octets, except for the last one, which can be shorter.
The pseudocode defines a helper function \emph{F} which is applied to every block. The function performs \emph{C} iterations of the underlying pseudorandom function \emph{PRF}. The \emph{PRF} takes the passphrase as its first parameter and the result of the previous iteration as its second parameter; the only exception is the first iteration, where the second parameter is the concatenation of the salt and the binary representation of the block index \emph{i}. The results of all iterations are XORed together and returned as one block of the derived key. Finally, all blocks are concatenated and returned as the derived key.
Notice that the computation of the function F can be rewritten to be efficiently evaluated in parallel computing environments such as GPUs; see section~4.1 of \parencite{mosnacek} for more details.
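The following Python sketch is a direct transcription of the helper function F from algorithm~\ref{pbkdf2alg}, assuming HMAC-SHA-256 as the PRF; it is meant as an illustration of the structure, not as an optimized implementation.
\begin{lstlisting}[language=Python]
import hashlib
import hmac

def F(P: bytes, S: bytes, C: int, i: int) -> bytes:
    """Compute one hLen-octet block T_i of the PBKDF2 output."""
    prf = lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest()
    u = prf(P, S + i.to_bytes(4, "big"))        # U_1 = PRF(P, S || INT(i))
    t = u
    for _ in range(C - 1):                      # U_2, ..., U_C
        u = prf(P, u)
        t = bytes(a ^ b for a, b in zip(t, u))  # running XOR of all U_j
    return t
\end{lstlisting}
Every password guess requires the full chain of C sequential PRF calls, but different guesses are completely independent of each other, which is why the workload maps so well onto GPUs.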
\section{Argon2}
\label{argon2}
As mentioned in the brief history of attacks on password hashes in section~\ref{sec:attacks}, the Argon2 function is the winner of the Password Hashing Competition. Argon2 is a hash function belonging to the class of memory-hard functions as defined in \parencite{memoryhard}. The current version of the function is 1.3 and the latest IETF draft is \parencite{argon2draft}.
The function quickly fills up a given amount of memory and performs a sequence of computations over the values stored in this memory. Argon2 comes in three versions, which differ in the way the data in the memory matrix (described further below) are processed. See subsection~\ref{argon2versions} for a detailed description.
Argon2 is used as the default PBKDF in LUKS version 2. Note that, according to \parencite{cryptsetupmanual}, the default PBKDF can be configured at compilation time.
The function is optimized for the x86 architecture, exploiting improvements in the handling of cache and memory access in recent Intel and AMD processors. To be exact, Argon2 can currently utilize the SSE2, SSSE3, XOP, AVX2, and AVX-512F CPU instruction set extensions; XOP is specific to AMD processors, while the remaining extensions originate from Intel.
The function can be implemented in specialized hardware, but the previously mentioned design makes such an implementation potentially very slow and expensive; even specialized ASICs should not gain a significant benefit, even if they employ large areas of memory. However, no implementations of any Argon2 mode in specialized hardware are known so far.
\subsection{Operation}
The function expects the following mandatory input parameters.
\begin{description}
\item[P] -- the message of any length from 0 to \(2^{32} - 1\) bytes. In case of PBKDF this is the passphrase.
\item[S] -- nonce with length from 8 to \(2^{32} - 1\) bytes. In case of PBKDF this is the \emph{cryptographic salt}.
\end{description}
The function also accepts secondary inputs, which do not need to be supplied; a brief usage example follows the list.
\begin{description}
\item[p] -- degree of parallelism as an integer with a value from 1 to \(2^{24} - 1\). The value determines the number of parallel computational chains to be run. The chains are not fully independent; synchronization occurs.
\item[\(\tau\)] -- tag length in bytes in the range from 4 to \(2^{32} - 1\). This determines the output length of the function; in the case of a PBKDF, the key length.
\item[m] -- memory size as an integer in the range from 8\emph{p} to \(2^{32} - 1\). The integer determines the amount of memory in kibibytes which should be used for the computation of the function.
\item[t] -- number of iterations as an integer in the range from 1 to \(2^{32} - 1\). This parameter is used to tune the running time of the function by specifying the number of iterations.
\item[v] -- one byte version number, currently hard-coded to 0x13.
\item[K] -- a secret value (key) with length from 0 to \(2^{32} - 1\) bytes. By default, no key is assumed.
\item[X] -- associated data with length from 0 to \(2^{32} - 1\) bytes.
\item[y] -- type (mode) of Argon2 to be used. 0 for Argon2d, 1 for Argon2i, 2 for Argon2id.
\end{description}
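To illustrate how these parameters are supplied in practice, the following sketch uses the argon2-cffi Python bindings (an assumption made for the example; the reference implementation exposes an equivalent C API) to derive a 256-bit key. The parameter values are arbitrary illustrative choices.
\begin{lstlisting}[language=Python]
from argon2.low_level import hash_secret_raw, Type

key = hash_secret_raw(
    secret=b"correct horse battery staple",  # P, the passphrase
    salt=b"0123456789abcdef",                # S, the nonce (salt)
    time_cost=4,              # t, number of iterations (passes)
    memory_cost=1024 * 1024,  # m, memory in KiB (here 1 GiB)
    parallelism=4,            # p, number of parallel lanes
    hash_len=32,              # tau, output (key) length in bytes
    type=Type.ID,             # y, Argon2 mode (here Argon2id)
)
\end{lstlisting}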
Argon2 makes use of a permutation function \emph{P} which is based on the Blake2b hash function \parencite{blake2}. The function copies the Blake2b design but additionally uses 64-bit multiplications. This particular modification makes the function more complicated to implement and optimize in ASICs, while the running speed on x86 processors should degrade only negligibly. See section~3.6 of \parencite{argon2draft} for a detailed explanation.
Another internal function of Argon2 is the compression function \emph{G}, which internally uses the previously mentioned function \emph{P}. \emph{G} accepts two 1,024-byte inputs and produces one 1,024-byte output. Let \emph{X} and \emph{Y} be the inputs. First, the function XORs them:
\(R = X \oplus Y \)
R is treated as an \(8 \times 8\) matrix of 16-byte registers. The function \emph{P} is first applied to every row of the matrix and then to every column; the intermediate result is denoted \emph{Z}. Finally, the output is computed as \(Z \oplus R\). For a more detailed description see section~3.5 of \parencite{argon2draft}.
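The data flow of \emph{G} can be sketched in Python as follows. This is only a structural illustration: the permutation \emph{P} is left as an abstract placeholder (the real \emph{P} is the Blake2b-based permutation mentioned above), and a row-major layout of the sixty-four 16-byte registers is assumed.
\begin{lstlisting}[language=Python]
def G(X: bytes, Y: bytes, P=lambda block: block) -> bytes:
    """Structural sketch of the Argon2 compression function."""
    assert len(X) == len(Y) == 1024
    R = bytes(a ^ b for a, b in zip(X, Y))             # R = X xor Y
    # View R as an 8x8 matrix of 16-byte registers (row-major layout).
    regs = [[R[128 * i + 16 * j:128 * i + 16 * (j + 1)]
             for j in range(8)] for i in range(8)]
    for i in range(8):                                 # apply P to every row
        row = P(b"".join(regs[i]))
        regs[i] = [row[16 * j:16 * (j + 1)] for j in range(8)]
    for j in range(8):                                 # then to every column
        col = P(b"".join(regs[i][j] for i in range(8)))
        for i in range(8):
            regs[i][j] = col[16 * i:16 * (i + 1)]
    Z = b"".join(regs[i][j] for i in range(8) for j in range(8))
    return bytes(z ^ r for z, r in zip(Z, R))          # output = Z xor R
\end{lstlisting}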
Argon2 makes use of two more hash functions: the function \(H^X\), where \(X\) denotes the output length in bytes, which is again based on Blake2b; and the variable-length hash function \(H'^X\), which is based on \(H^X\) and defined in section~3.3 of \parencite{argon2draft}.
The Argon2 operation can be described as follows. The emphasized variables are Argon2 input parameters; primary and secondary parameters are not distinguished. The steps correspond to algorithm~\ref{argon2alg}.
\begin{enumerate}
\item Initialize the block \(H_0\).
\item Allocate the memory according to the \emph{m} and \emph{p} parameters. The actual amount of allocated memory, counted in 1,024-byte blocks, is denoted \(m'\). Note that the memory is treated as a matrix \(B[i][j]\) with \emph{p} rows and \(q = m' / p\) columns.
\item Compute \(B[i][0]\) for \(0 \leq i < p\).
\item Compute \(B[i][1]\) for \(0 \leq i < p\).
\item Compute \(B[i][j]\) for \(0 \leq i < p\) and \(2 \leq j < q\). This step is different for every Argon2 mode.
\item If \(t > 1\), repeat step 5 with a slight change for every additional pass.
\item The final block C is computed.
\item The output tag is computed.
\end{enumerate}
\subsection{Algorithm}
This subsection describes Argon2 through pseudocode. In addition to the input parameters mentioned earlier, the following notation is used:
\begin{description}
\item[\(||\)] -- concatenation of two strings
\item[floor(x)] -- function returns the largest integer which is not larger than x
\item[ceil(x)] -- function returns the smallest integer which is not smaller than x
\item[LE32(x)] -- converts 32 bit long integer x to byte string in little endian
\item[length(s)] -- returns the length of the string s in bytes
\item[allocate(x)] -- allocates x bytes of memory
\end{description}
\begin{algorithm}
\DontPrintSemicolon
\LinesNumbered
\SetKwInOut{Input}{input}
\SetKwInOut{Output}{output}
\SetKwFunction{KwCeil}{ceil}
\SetKwFunction{KwFloor}{floor}
\SetKwFunction{KwHapostrophe}{\(H'\)}
\SetKwFunction{KwHapostropheonetwoeight}{\(H'^{1024}\)}
\SetKwFunction{KwG}{G}
\SetKwFunction{KwHsixfour}{\(H^{64}\)}
\SetKwFunction{KwLethreetwo}{LE32}
\SetKwFunction{KwLength}{length}
\SetKwFunction{KwAllocate}{allocate}
\SetKwFunction{KwHapostrophetau}{\(H'^\tau\)}
\Input{P, S, p, \(\tau\), m, t, v, K, X, y}
\Output{TAG}
\(H_0 \leftarrow \) \KwHsixfour{\KwLethreetwo{p} || \KwLethreetwo{\(\tau\)} || \KwLethreetwo{m} || \KwLethreetwo{t} || \KwLethreetwo{v} || \KwLethreetwo{y} || \KwLethreetwo{\KwLength{P}} || P || \KwLethreetwo{\KwLength{S}} || S || \KwLethreetwo{\KwLength{K}} || K || \KwLethreetwo{\KwLength{X}} || X } \;
\(m' \leftarrow 4 \times p \times \) \KwFloor{\(m / 4p\)} \;
\(B \leftarrow \) \KwAllocate{\(m'\)} \;
\For{\( i \leftarrow 0 \) \KwTo \(p - 1 \)} {
\(B[i][0] \leftarrow \) \KwHapostropheonetwoeight{\(H_0 \) || \KwLethreetwo{0} || \KwLethreetwo{i}} \;
}
\For{\(i \leftarrow 0 \) \KwTo \(p - 1\)} {
\(B[i][1] \leftarrow \) \KwHapostropheonetwoeight{\(H_0 \) || \KwLethreetwo{1} || \KwLethreetwo{i}} \;
}
\For{\(i \leftarrow 0 \) \KwTo \(p - 1\)} {
\For{\(j \leftarrow 2\) \KwTo \(q - 1\)} {
\(B[i][j] \leftarrow \) \KwG{\(B[i][j-1], B[l][z]\)} \;
\tcc{indexes l and z are computed differently for every Argon2 version, see subsection~\ref{argon2versions}}
}
}
\If{\(t > 1\)} {
\For{\(i \leftarrow 0 \) \KwTo \(p - 1\)} {
\(B[i][0] \leftarrow \) \KwG{\(B[i][q-1], B[l][z]\)} \;
\For{\(j \leftarrow 1\) \KwTo \(q - 1\)} {
\(B[i][j] \leftarrow \) \KwG{\(B[i][j-1], B[l][z]\)} \;
}
}
}
\(C \leftarrow B[0][q-1] \oplus B[1][q-1] \oplus \dots \oplus B[p-1][q-1]\) \;
\Return{\KwHapostrophetau{C}}
\caption{Argon2 function algorithm}
\label{argon2alg}
\end{algorithm}
\subsection{Differences in Argon2 versions}
\label{argon2versions}
Argon2 comes in three versions:
\begin{description}
\item[Argon2d] -- the data-dependent variant of Argon2. This version uses previously computed data while performing computations and memory accesses. It is recommended for cryptocurrencies and other proof-of-work applications, as well as for hashing on back-end servers. In general, this version is suited for environments where no side-channel timing attacks are expected. It provides better protection against brute-force attacks using specialized hardware and against trade-off attacks.
\item[Argon2i] -- the data-independent variant. This version does not rely on previously computed data while performing calculations in memory. It is recommended for key derivation and password hashing, where side-channel timing attacks are more probable because an adversary can have physical access to the machine. The mode is slower because it performs multiple passes over memory to defend against trade-off attacks.
\item[Argon2id] -- a combination of the previous versions. In this mode the function behaves as Argon2i during the computation of the first half of the first iteration over the memory; during the remaining operation, the Argon2d mode is used. The mode combines the benefits of both: mitigation of side-channel timing attacks and resistance to brute-force attacks.
\end{description}
The difference lies in the computation of the reference block indexes in algorithm~\ref{argon2alg}: the indexes \emph{l} and \emph{z} are computed differently in each Argon2 version. The allocated memory is represented as a matrix \(B[i][j]\) with \emph{p} rows and \(q = m' / p\) columns. Before the indexes \emph{l} and \emph{z} can be computed, the values \(J_1\) and \(J_2\) have to be calculated, and this is where the Argon2 versions differ. After these values are calculated, further processing is needed, but it is the same for all versions.
Rows of the memory matrix are called lanes; there are \emph{p} lanes, according to the parameter \emph{p}, which signifies the degree of parallelism. Moreover, the memory is divided into 4 vertical slices; the number of slices is denoted \emph{S}. Every intersection of a lane and a slice is called a segment. Segments belonging to the same slice are computed in parallel and must not reference each other's blocks.
Argon2d computes \(J_1\) and \(J_2\) based on data stored in memory. Therefore, it is data-dependent.
\(J_1 \leftarrow int32(extract(B[i][j-1], 1))\)
\(J_2 \leftarrow int32(extract(B[i][j-1], 2))\)
Here, the function int32(s) converts the 32-bit little-endian string s into a non-negative integer, and the function extract(s, i) extracts the i-th group of 32 bits from the bit string s (so extract(s, 1) yields the first 32 bits).
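In Python-like terms, and assuming that \(J_1\) and \(J_2\) are taken from the first two 32-bit little-endian words of the previous block (as in the Argon2 specification), the extraction can be sketched as follows:
\begin{lstlisting}[language=Python]
def argon2d_j1_j2(prev_block: bytes):
    """prev_block corresponds to B[i][j-1], a 1,024-byte memory block."""
    J1 = int.from_bytes(prev_block[0:4], "little")  # first 32 bits
    J2 = int.from_bytes(prev_block[4:8], "little")  # next 32 bits
    return J1, J2
\end{lstlisting}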
Argon2i computes indexes by running two rounds of the compression function \emph{G} in counter mode in the following way
\begin{eqnarray*}
x &\leftarrow& G(ZERO, LE64(r) || LE64(l) || LE64(s) || LE64(m') || LE64(t) ||\\
&& LE64(y) || LE64(i) || ZERO)
\end{eqnarray*}
The result \(x\) is 64 bits long and can therefore be viewed as two 32-bit strings.
\(x_1 || x_2 \leftarrow x\)
\(j_1 \leftarrow int32(x_1)\)
\(j_2 \leftarrow int32(x_2)\)
Inputs for the function \emph{G} are described below. Note that no data from memory blocks are used. Therefore, Argon2i is data-independent.
\begin{description}
\item[r] -- the pass number
\item[l] -- the lane number
\item[s] -- the slice number
\item[\(m'\)] -- total number of memory blocks
\item[t] -- total number of passes
\item[y] -- Argon2 mode
\item[i] -- counter starting from 1 in every segment
\item[ZERO] -- string of zeroes, in the first case 1024 bytes long, in the second case 968 bytes long
\end{description}
Argon2id behaves as Argon2i if the pass number is 0 and, at the same time, the slice number is 0 or 1. In all further computations it behaves like Argon2d.
Further computations common to all Argon2 modes are described in subsection~3.4.2 of \parencite{argon2draft}.
\section{Scrypt}
\label{sec:scrypt}
Scrypt is a PBKDF introduced by Colin Percival in \parencite{memoryhard}. At the time the paper was released (2009), the problems of PBKDF2 and its parallel computation were already well known. Percival came up with two new concepts: the memory-hard algorithm and the sequential memory-hard function. Both concepts came as a reaction to the increasing computing power of parallel hardware. He also defined the ROMix and SMix functions, which are sequential memory-hard functions performing mixing of data blocks in memory.
Scrypt accepts the memory/CPU cost parameter \emph{n}, the parallelization parameter \emph{p}, and the block size parameter \emph{r}. The idea behind the Scrypt function can be described as follows (a short usage example follows the list):
\begin{enumerate}
\item Use PBKDF2 with HMAC-SHA-256 as the PRF to process the passphrase and salt provided as Scrypt input and generate \emph{p} blocks. Their size is affected by the parameter \emph{r}.
\item The blocks generated in the previous step are independently mixed by a mixing function which is composed of the ROMix function, the SMix function, and an 8-round version of the Salsa20 core function. This step makes the whole computation expensive.
\item The result is used as the salt for another PBKDF2 computation, which produces the final output.
\end{enumerate}
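Scrypt is also available in the Python standard library (backed by OpenSSL), so a minimal usage sketch looks as follows; the cost parameters are arbitrary illustrative values.
\begin{lstlisting}[language=Python]
import hashlib
import os

salt = os.urandom(32)
key = hashlib.scrypt(b"correct horse battery staple", salt=salt,
                     n=2**15,   # CPU/memory cost (must be a power of two)
                     r=8,       # block size parameter
                     p=1,       # parallelization parameter
                     maxmem=64 * 1024 * 1024,  # these parameters need ~32 MiB
                     dklen=32)  # derive a 256-bit key
\end{lstlisting}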
There exists theoretical work describing a cache-timing side-channel attack against Scrypt, but it has not been put into practice yet \parencite{scryptattack}.
\section{Other PBKDFs}
The Password Hashing Competition received 24 submissions. The panel chose 9 finalists and finally Argon2 as the winner. Four finalists which received special recognition by PHC panel are described in this section.
Catena is presented as a password-scrambling framework \parencite{catena}. It can be used for many purposes, including password-based key derivation or proof of work. A Catena instance is characterized by a cryptographic primitive, a reduced primitive, an optional randomization layer, and a memory-hard function. These parts are independent of each other, and the whole framework is therefore modular. Three tunable parameters are provided, affecting time and memory requirements as well as resistance against precomputation attacks.
An interesting feature is the server relief protocol, which allows shifting the actual computation from a server to a client while keeping the memory and time requirements. For example, authentication servers could benefit from this feature by not being burdened by too many authentication requests. Another interesting feature is the option to increase the security parameters (recompute the hash) without knowledge of the actual password.
The Lyra2 function was chosen for its usage of cryptographic sponges \parencite{lyra2}. The function accepts parameters for tuning the memory and time cost. It supports parallelism on multi-core CPU platforms but tries to minimize the benefits of parallelism on GPUs. Different underlying sponge functions can be used.
The main highlight of the Makwa function is its ability to delegate its computation to untrustworthy clients \parencite{maqa}. This feature is similar to the one provided by Catena. A feature which makes this function unique is support for password escrow: it allows recovering a password with knowledge of a specific private key. This can be used, for example, for password recovery, although this method is definitely not ideal and should be considered carefully.
The Yescrypt function is based on the Scrypt function briefly described in section~\ref{sec:scrypt} \parencite{yescrypt}. It is therefore backward compatible with Scrypt and can compute Scrypt hashes. It also offers delegation of computation to external clients and hash upgrades without knowledge of the password. The running time can be increased separately from the memory usage and parallelism.
\chapter{Analysis of LUKS2}
\label{chap:analysis}
This chapter deals with the actual practical research performed as a part of the thesis. The main goal of this thesis is to create a price model for a potential attacker trying to gain unauthorized access to a LUKS2 encrypted partition. Another goal is tightly related: as explained in section~\ref{luks2pbkdf}, LUKS2 can determine three parameters for Argon2 through its own benchmarking function, and it is highly probable that most Cryptsetup users do not provide their own Argon2 parameters and therefore rely on this benchmark.
Therefore, before the actual model is created, it is necessary to collect sufficient data about the parameters produced by this benchmark under various real-world conditions. The process and results are described in section~\ref{sec:benchmark}. These parameters are further used while simulating attacks on LUKS2 encrypted volumes on real machines, which is described in section~\ref{sec:attack}. The results are used to create the actual price model in chapter~\ref{chap:model}.
\section{LUKS}
LUKS stands for Linux Unified Key Setup. The project started being developed by Clemens Fruhwirth as a reaction to several incompatible disk encryption schemes which coexisted at the beginning of the 21st century. At a certain point there were three incompatible disk encryption schemes which varied from Linux distribution to Linux distribution. If a user had created an encrypted disk, they could not be sure whether they would be able to decrypt it with a different distribution or even with a new version of the same distribution.
LUKS began as a metadata format for storing information about the cryptographic key setup. However, Fruhwirth discovered that to design a proper metadata format, he needed to know enough about the key setup process \parencite{newmethods}. Therefore, he created TKS1 and TKS2, templates for the key setup process. Together with LUKS, they ensure safe and standardized key management for disk encryption. After some user feedback, the LUKS on-disk specification version 1.0 was created in 2005 \parencite{luks1}. Currently the latest version is the LUKS on-disk specification version 2 \parencite{luks2}.
The reference implementation of both versions of LUKS is called Libcryptsetup; its user-space interface is called Cryptsetup. In the following two subsections, default parameters and information concerning command line switches are specific to this implementation.
\subsection{Usage of PBKDFs in LUKS version 1}
\label{luks1}
The PBKDF2 function is used as the key derivation function in LUKS version 1. It is used during master key initialization, adding of a new password, master key recovery, and also during password changing, because this operation is composed of the previously mentioned operations. In all operations it internally uses a hash algorithm specified by the user during initialization of the LUKS header; by default, the SHA-256 algorithm is used.
During initialization, the PBKDF2 function is used to create a checksum of the master key. This key is subsequently used for symmetric encryption of the actual data stored on the encrypted disk. The function receives the following parameters:
\begin{description}
\item[masterKey] -- a new randomly generated master key of user specified length
\item[phdr.mk-digest-salt] -- a cryptographic salt 32 bytes long which is used to prevent attacks against password using precomputed tables, see subsection~5.6.3 at \parencite{itmc14}
\item[phdr.mk-digest-iteration-count] -- number of iterations for PBKDF2, see section~\ref{sec:pbkdf2}
\item[LUKS\_DIGEST\_SIZE] -- length of the computed digest in bytes, the default is 20
\end{description}
The generated 20-byte checksum is stored in the LUKS header together with the iteration count and the salt. Please note that the \emph{phdr.mk-digest-iteration-count} parameter is obtained by performing a benchmark, with a minimum of 1,000 iterations.
When a new password is added, PBKDF2 is used to process the passphrase supplied by the user. It receives the following parameters:
\begin{description}
\item[password] -- a passphrase supplied by user
\item[ks.salt] -- randomly generated salt used for this particular keyslot with length of 32 bytes
\item[ks.iteration-count] -- number of PBKDF2 iterations
\item[MasterKeyLength] -- length of the derived key
\end{description}
The key derived from the passphrase is used to encrypt the master key. The master key has to be present in memory, either because initialization has happened recently or because it has been successfully recovered through a different key stored in another keyslot. Together with the encrypted master key, the salt and the iteration count are also written into the keyslot.
Note that the ks.iteration-count parameter can be influenced by the user in several ways \parencite{cryptsetupmanual}. One possibility is to specify the number of iterations directly with the command line option \verb+--pbkdf-force-iterations+. Another option is to specify the iteration time through the \verb+-i+ or \verb+--iter-time+ command line options; this option expects the number of milliseconds which should be spent unlocking the keyslot, and a benchmark is used to calculate the number of iterations corresponding to this time. If no option is specified, the default iteration time of 2,000 milliseconds is used.
The function is also used during master key recovery. This process is performed while unlocking the encrypted partition. During key recovery, the PBKDF is actually used twice for every keyslot until the master key is decrypted or there are no more keyslots to try. First, it is used to derive a decryption key from the passphrase supplied by the user. Then this decryption key is used to decrypt the encrypted master key stored in the current keyslot. The result is called the candidate master key, because we are still not sure whether the passphrase was correct. The candidate is again hashed with the PBKDF and finally compared with the digest of the master key stored in the LUKS header. If the digests match, the passphrase was entered correctly and the master key can be used. A simplified sketch of this flow is given after the two parameter lists below.
In the first case, the function receives the following parameters:
\begin{description}
\item[pwd] -- user supplied passphrase
\item[ks.salt] -- the salt value read from currently tried keyslot
\item[ks.iteration-count] -- the iteration count read from currently tried keyslot
\item[masterKeyLength] -- length of the derived key
\end{description}
In the second case the function receives following parameters:
\begin{description}
\item[masterKeyCandidate] -- candidate master key, see above
\item[ph.mk-digest-salt] -- the salt value which was used during the initialization phase and stored in the header
\item[ph.mk-digest-iter] -- number of PBKDF2 iterations used during the initialization phase and also stored in the header
\item[LUKS\_DIGEST\_SIZE] -- 20 bytes
\end{description}
The process of changing a password is composed of the previously mentioned operations, and hence I do not describe it here in greater detail. To sum it up, first the master key is recovered, then a new password is added to a new keyslot, and the previous keyslot is revoked.
Both password-based encryption and password checking require additional cryptographic primitives which process the derived key. For encryption, the reference implementation of LUKS uses AES-XTS-plain64, and for hashing it uses SHA-256. Usage of PBKDF2 requires an underlying pseudorandom function; in LUKS version 1, the default PRF is SHA-256. Alternatively, it is possible to choose SHA-512, RIPEMD-160, or Whirlpool.
\subsection{Usage of PBKDFs in LUKS version 2}
\label{luks2pbkdf}
LUKS version 2 extends LUKS version 1 and uses similar principles. Therefore, I will focus on the differences between versions 1 and 2 which are related to the usage of PBKDFs. For a detailed list of changes, see section~1.1 of \parencite{luks2}.
The new version supports configurable algorithms for encryption, hashing, and also key derivation. That means the set of algorithms can be extended as new ones are developed and others become obsolete. Note that there are still some requirements on the algorithms provided by the cryptographic back-end \parencite{luks2}: it has to support the SHA-1 and SHA-256 hashing functions, the PBKDF2, Argon2i, and Argon2id key derivation functions, and AES-XTS symmetric encryption.
LUKS2 introduces the memory-hard PBKDFs Argon2i and Argon2id, which are described in section~\ref{argon2}. The Argon2 functions should offer increased resistance to brute-force attacks.
The volume key digest is no longer limited to a length of 20 bytes, because it no longer relies on the SHA-1 hashing function. The processes described in subsection~\ref{luks1} remain the same in LUKS version 2, as do the situations in which PBKDFs are used.
As stated in subsection~\ref{luks1}, in the case of PBKDF2 the user can influence the number of iterations directly or specify the approximate time required for processing the passphrase; this stays the same for LUKS2. Functions from the Argon2 family introduce two additional parameters: memory cost and parallel cost. They can be specified through the \verb+--pbkdf-memory+ and \verb+--pbkdf-parallel+ command line options, respectively. The number of iterations is either benchmarked or can be specified manually through \verb+--pbkdf-force-iterations+.
The benchmarking function takes as its input a desired unlocking time specified in milliseconds and tries to find the best parameters while not exceeding this time. Note that the number of parallel threads is always at most 4, and it decreases if not enough CPUs are online at the time of benchmarking. In Cryptsetup version 2.1.0, the default unlocking time is set to 2,000 milliseconds and the default memory cost to 1,048,576 KiB; both default values can be changed at compilation time.
\section{Benchmarking of LUKS2 and Argon2 parameters}
\label{sec:benchmark}
In subsection~\ref{luks2pbkdf}, three parameters for Argon2 were mentioned. They have a crucial impact on the resistance of the resulting hash against dictionary or brute-force attacks; their effects are described in detail in section~\ref{argon2}. From Cryptsetup's point of view, these parameters impact not only security but user experience as well. If they are set too low, the security of the encryption keys can be endangered; if they are set too high, the unlocking process can take an undesirably long time or consume an excessive amount of computing power and memory. Cryptsetup does not assume that all users know the exact implications of choosing particular parameters, and therefore it provides a benchmarking function. Note that advanced users can still choose the parameters manually.
The benchmarking function takes as its input the desired unlocking time specified in milliseconds. It starts with default parameters and performs several PBKDF computations. By increasing and decreasing the parameters during these computations, it tries to find the most secure (the highest) parameters while not exceeding the unlocking time. Additionally, the function performs some estimations to reduce the number of required computations, because the benchmarking process should not last too long. Due to this approach, the resulting parameters depend on several conditions, in particular the available hardware and the current workload of the system.
Currently, some limitations are hard-coded into Cryptsetup. The minimal number of iterations for Argon2 is 4; the maximum is bounded by the unsigned 32-bit integer limit of the given platform. The minimum memory parameter is 32 KiB, the maximum is 4 GiB. The minimal degree of parallelism for Argon2 is 1, the maximum is 4. All the limits are hard-coded in \parencite{cryptsetupgitpbkdfcheck}. Note that the number of parallel threads is always at most 4 and decreases if not enough CPUs are online at the time of the benchmark.
In Cryptsetup version 2.1.0, the default unlocking time is set to 2,000 milliseconds and the default Argon2 memory cost to 1,048,576 KiB. Both default values can be changed at compilation time.
\subsection{Benchmarking tool}
As part of my thesis, I collected the Argon2 parameters estimated by the function mentioned above. I created a small tool which can perform a series of benchmarks with given parameters and output the results in a human-readable or machine-readable format. Additionally, the tool measures the time spent performing the benchmark. Although this information is not directly related to the topic of the thesis, it might be useful in the future. The source code of the tool is part of the thesis \parencite{thesisrepo}.
The tool accepts several parameters: the desired unlocking time, the maximum degree of parallelism, the maximum memory cost and the number of benchmarks to perform.
The tool can output results in a plain text format or in a CSV format with headers for easier machine processing. The tool does not perform the actual benchmarking itself; it utilizes the API of the Libcryptsetup library, which is therefore required for it to work. The tool does not create any Cryptsetup volumes and does not require root privileges.
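The core of such a tool is essentially a single call to \verb+crypt_benchmark_pbkdf()+ from the Libcryptsetup API. The following sketch only illustrates this call and is not the exact code of my tool; the test passphrase, the salt and the volume key size of 64 bytes are arbitrary values chosen for the example, and a \verb+NULL+ device context is assumed to be sufficient because the context is used only for logging.
\begin{verbatim}
#include <stdint.h>
#include <stdio.h>
#include <libcryptsetup.h>

/* The callback may abort the benchmark by returning non-zero. */
static int bench_progress(uint32_t time_ms, void *usrptr)
{
    (void) time_ms; (void) usrptr;
    return 0;
}

int main(void)
{
    struct crypt_pbkdf_type pbkdf = {
        .type             = CRYPT_KDF_ARGON2ID,
        .time_ms          = 2000,    /* requested unlocking time */
        .max_memory_kb    = 1048576, /* upper limit for memory cost */
        .parallel_threads = 4,
    };
    const char *salt = "0123456789abcdef";
    int r;

    r = crypt_benchmark_pbkdf(NULL, &pbkdf, "passphrase", 10,
                              salt, 16, 64, bench_progress, NULL);
    if (r < 0) {
        fprintf(stderr, "benchmark failed: %d\n", r);
        return 1;
    }

    printf("iterations: %u, memory: %u KiB, threads: %u\n",
           pbkdf.iterations, pbkdf.max_memory_kb,
           pbkdf.parallel_threads);
    return 0;
}
\end{verbatim}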
\subsection{Methodology of collecting real-world parameters}
\label{subsec:laptop}
I used the previously mentioned tool to collect real data on physical hardware, trying to simulate various environments in which Cryptsetup can be used to create and unlock encrypted volumes. I decided to use real hardware rather than virtual machines because of possible measurement inaccuracies caused by the virtualization layer.
The first hardware configuration was a Lenovo ThinkPad P50 laptop with an Intel Core i7-6820HQ CPU at 2.70 GHz (4 cores, 8 threads). The laptop was equipped with 32 GiB of DDR4 SODIMM RAM running at 2,133 MHz (0.5 ns) \parencite{laptopspecs}.
The second configuration was a Raspberry Pi 3 Model B+. This device is equipped with a quad-core Broadcom BCM2837B0 Cortex-A53 (ARMv8) 64-bit SoC at 1.4 GHz and 1 GiB of LPDDR2 SDRAM \parencite{raspberryspecs}. Note that the parameters of this device are considerably lower, but there still might be scenarios in which a user wishes to perform drive encryption on such a device (network-attached storage, a home media server). The device was running Raspbian GNU/Linux 9 (stretch).
I performed the following steps on both devices:
\begin{enumerate}
\item install the required libraries and headers
\item download and unpack the source code of Cryptsetup version 2.1.0
\item configure and compile Cryptsetup without support for udev and blkid to minimize the required dependencies (see below for remarks concerning the configuration options)
\item compile the benchmarking tool, linking it against the Libcryptsetup library built in the previous step
\item if desired, simulate hardware limitations (number of available CPUs, amount of available memory)
\item run a series of benchmarks for unlocking times of 1,000, 2,000, 3,000, 4,000, 5,000, 10,000 and 20,000 milliseconds with 1, 2, 3 and 4 parallel threads, with every benchmark repeated 100 times
\end{enumerate}
During compilation, I ran the configure script with the \verb+--disable-udev+ and \verb+--disable-blkid+ options. They disable support for udev and for device signature detection through blkid, respectively. These features were not needed during benchmarking, and they complicated the installation of the necessary development libraries.
Simulating various hardware conditions by limiting CPU performance and available memory turned out to be an interesting challenge. My goal was to simulate different numbers of available CPU cores and different amounts of available RAM and to collect results for various combinations of those parameters.
At first, I tried to use the Control Groups 2 feature of the Linux kernel \parencite{cgroups2}. Unfortunately, I was not able to get the desired results; in particular, I could not guarantee that the benchmarking process would really be confined to a limited set of CPU cores. Then I tried the Cpulimit project \parencite{cpulimit}, but I still could not guarantee that the benchmarking function would use only a certain number of CPU cores.
In the end, I found a solution for limiting the number of available CPU cores: individual cores can be disabled by writing a zero to the file
\begin{tt}
/sys/devices/system/cpu/cpu\emph{N}/online
\end{tt}
where \emph{N} is the number of the core to be turned off. Note that this method allowed me to change the state not only of individual physical cores but also of the individual virtual cores created by hyper-threading.
This method could not be applied to the Raspberry Pi, as the required file does not exist there, probably because of the different underlying hardware platform. Therefore, I limited the number of available CPUs with the
\begin{tt}
maxcpus=\emph{n}
\end{tt}
kernel parameter, where \emph{n} is the number of CPUs the kernel brings online. This device does not offer hyper-threading.
Due to the way in which Cryptsetup detects the available amount of memory, I decided to modify Cryptsetup itself \parencite{cryptsetuputils}. I directly modified the function \emph{crypt\_getphysmemory\_kb}, which returns the available physical memory in KiB, so that it reads the value from an environment variable; see the patch in the thesis Git repository \parencite{thesisrepo}.
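The idea of the patch can be sketched as follows. This is not the exact patch from the repository; the environment variable name \verb+FAKE_PHYSMEM_KB+ is hypothetical and serves only to illustrate the approach.
\begin{verbatim}
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of the patched detection: if the (hypothetical) variable
 * FAKE_PHYSMEM_KB is set, report that amount of physical memory
 * instead of the value derived from sysconf(). */
static uint64_t getphysmemory_kb(void)
{
    const char *fake = getenv("FAKE_PHYSMEM_KB");
    if (fake)
        return strtoull(fake, NULL, 10);

    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages < 0 || page_size < 0)
        return 0;

    return ((uint64_t)pages * (uint64_t)page_size) / 1024;
}
\end{verbatim}
With such a change, exporting the variable before running the benchmarking tool makes the library behave as if only the specified amount of RAM were installed.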
After finishing the benchmarks, I used the Datamash tool to perform basic statistical computations on the resulting CSV data \parencite{datamash}. The raw data as well as the Datamash results can be found in the thesis Git repository \parencite{thesisrepo}.
\FloatBarrier
\subsection{Collected benchmarking results}
The following tables present selected results obtained through the methodology described in the previous subsection. The following conventions are used in the table headers:
\begin{description}
\item[time] -- requested unlocking time in milliseconds
\item[avg(t)] -- average benchmarked Argon2 time cost expressed as a number of iterations
\item[stdev(t)] -- standard deviation of the benchmarked Argon2 time cost expressed as a number of iterations
\item[avg(M)] -- average benchmarked Argon2 memory cost expressed in KiB
\item[stdev(M)] -- standard deviation of benchmarked Argon2 memory cost expressed in KiB
\end{description}
Table~\ref{tab:l8c32g} serves as an introduction and shows what parameters to expect. The most important row concerns the unlocking time of 2,000 milliseconds, as these values were used as the baseline for simulating attacks in the next chapter. Tables~\ref{tab:l4c4g} and \ref{tab:l4c2g} demonstrate how hardware limitations affect the Argon2 parameters. See the Git repository for more combinations of CPU and memory limitations \parencite{thesisrepo}.
\noindent
\begin{table}
\caption{Benchmark results for laptop with 8 threads and 32 GiB of memory}
\label{tab:l8c32g}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
time in ms & avg(t) in iterations & stdev(t) in iterations & avg(M) in KiB & stdev(M) in KiB\\
\hline
1,000 & 4 & 0 & 804,032.75 & 3,886\\
\hline
2,000 & 6 & 0 & 1,048,576 & 0\\
\hline
3,000 & 9 & 0 & 1,048,576 & 0\\
\hline
4,000 & 12 & 0 & 1,048,576 & 0\\
\hline
5,000 & 15 & 0 & 1,048,576 & 0\\
\hline
10,000 & 31.01 & 0.099 & 1,048,576 & 0\\
\hline
20,000 & 64 & 0 & 1,048,576 & 0\\
\hline
\end{tabularx}
\end{table}
\noindent
\begin{table}
\caption{Benchmark results for laptop with 4 cores and 4 GiB of memory}
\label{tab:l4c4g}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
time in ms & avg(t) in iterations & stdev(t) in iterations & avg(M) in KiB & stdev(M) in KiB\\
\hline
1,000 & 4 & 0 & 785,173.44 & 4,532\\
\hline
2,000 & 5 & 0 & 1,048,576 & 0\\
\hline
3,000 & 9 & 0 & 1,048,576 & 0\\
\hline
4,000 & 12 & 0 & 1,048,576 & 0\\
\hline
5,000 & 15 & 0 & 1,048,576 & 0\\
\hline
10,000 & 31 & 0 & 1,048,576 & 0\\
\hline
20,000 & 61.97 & 0.298 & 1,048,576 & 0\\
\hline
\end{tabularx}
\end{table}
\noindent
\begin{table}
\caption{Benchmark results for laptop with 4 cores and 2 GiB of memory}
\label{tab:l4c2g}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
time in ms & avg(t) in iterations & stdev(t) in iterations & avg(M) in KiB & stdev(M) in KiB\\
\hline
1,000 & 4 & 0 & 781,944 & 1,715.758\\
\hline
2,000 & 6 & 0 & 1,048,576 & 0\\
\hline
3,000 & 9 & 0 & 1,048,576 & 0\\
\hline
4,000 & 12 & 0 & 1,048,576 & 0\\
\hline
5,000 & 15 & 0 & 1,048,576 & 0\\
\hline
10,000 & 31.26 & 0.438 & 1,048,576 & 0\\
\hline
20,000 & 63.57 & 1.041 & 1,048,576 & 0\\
\hline
\end{tabularx}
\end{table}
\FloatBarrier
Tables~\ref{tab:comp2000} and \ref{tab:comp1000} compare benchmarking results for unlocking times of 2,000 and 1,000 ms, respectively. Cryptsetup was compiled with different options, explained below:
\begin{description}
\item[none] -- no special options were added
\item[SSE] -- the \verb+--enable-internal-sse-argon2+ configuration option was added. If this option is specified, additional checks for special CPU features are performed (SSE2, SSE3, AVX2, AVX512). If they are detected, Argon2 is compiled with support for the most advanced one.
\item[native] -- the \verb+-march=native+ flag was passed to the compiler, optimizing the source code for the particular platform
\item[external] -- Cryptsetup was compiled against an external Argon2 library
\end{description}
\noindent
\begin{table}
\caption{Comparing results for various Cryptsetup compilation options with unlocking time of 2,000 ms}
\label{tab:comp2000}
\begin{tabularx}{\textwidth}{| l | X | X | X | X |}
\hline
flags & avg(t) in iterations & stdev(t) in iterations & avg(M) in KiB & stdev(M) in KiB\\
\hline
none & 6 & 0 & 1,048,576 & 0\\
\hline
SSE & 7 & 0 & 1,048,576 & 0\\
\hline
native & 6 & 0 & 1,048,576 & 0\\
\hline
native + SSE & 11 & 0 & 1,048,576 & 0\\
\hline
external & 6 & 0 & 1,048,576 & 0\\
\hline
\end{tabularx}
\end{table}
\noindent
\begin{table}
\caption{Comparing results for various Cryptsetup compilation options with unlocking time of 1,000 ms}
\label{tab:comp1000}
\begin{tabularx}{\textwidth}{| l | X | X | X | X |}
\hline
flags & avg(t) in iterations & stdev(t) in iterations & avg(M) in KiB & stdev(M) in KiB\\
\hline
none & 4 & 0 & 804,032.75 & 3,886.047\\
\hline
SSE & 4 & 0 & 949,740.83 & 2,305.669\\
\hline
native & 4 & 0 & 863,822.25 & 1,414.500\\
\hline
native + SSE & 5 & 0 & 1,048,576 & 0\\
\hline
external & 4 & 0 & 846,713.44 & 1,359.307\\
\hline
\end{tabularx}
\end{table}
Notice that combining the optimized version of Argon2 with compilation for the particular platform produces the best results: it almost doubles the number of iterations compared with the benchmark without these optimizations, and it yields only one iteration fewer than the 4,000-millisecond unlocking time benchmarked without optimizations. Unfortunately, such a combination of configuration options is not very common. A quick review of several Linux distributions (Debian, Ubuntu, Fedora) shows that they package the Argon2 library separately and link Cryptsetup against it. Also, in order to support the broadest possible range of devices, platform-specific optimizations are not used.
Table~\ref{tab:r4c1g} shows the benchmarking results for the Raspberry Pi. Notice that the memory cost is significantly lower. It is important to note that the benchmarking algorithm of Cryptsetup will never use more than half of the available physical memory, to prevent the out-of-memory killer from terminating the benchmarking process. Note also that the algorithm first tries to use the maximum available memory and only then increases the number of iterations.
\noindent
\begin{table}
\caption{Benchmark results for Raspberry Pi with 4 cores and 1 GiB of memory}
\label{tab:r4c1g}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
time in ms & avg(t) in iterations & stdev(t) in iterations & avg(M) in KiB & stdev(M) in KiB\\
\hline
1,000 & 4 & 0 & 43,291.74 & 3,628\\
\hline
2,000 & 4 & 0 & 95,296.58 & 8,199\\
\hline
3,000 & 4 & 0 & 150,860.59 & 10,508\\
\hline
4,000 & 4 & 0 & 203,604.14 & 6,723\\
\hline
5,000 & 4 & 0 & 255,361.51 & 10,598\\
\hline
10,000 & 4 & 0 & 474,446.97 & 2,737\\
\hline
20,000 & 7.81 & 0.879 & 474,722 & 0\\
\hline
\end{tabularx}
\end{table}
Comparing tables \ref{tab:r4c1g} and \ref{tab:r4c1go} shows that there is no significant difference between the values. The values in table~\ref{tab:r4c1g} were benchmarked after compilation with the \verb+-march=native+ flag. The \verb+--enable-internal-sse-argon2+ option cannot be used on the Raspberry Pi, because the ARM architecture does not offer this kind of CPU instructions. This also confirms the premise that Argon2 is difficult to optimize for platforms other than x86.
\noindent
\begin{table}
\caption{Benchmark results for Raspberry Pi with optimizations}
\label{tab:r4c1go}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
time in ms & avg(t) in iterations & stdev(t) in iterations & avg(M) in KiB & stdev(M) in KiB\\
\hline
1,000 & 4 & 0 & 25,639.15 & 4,745.422\\
\hline
2,000 & 4 & 0 & 53,143.46 & 1,493.866\\
\hline
3,000 & 4 & 0 & 82,432.72 & 2,296.222\\
\hline
4,000 & 4 & 0 & 111,220.07 & 1,818.082\\
\hline
5,000 & 4 & 0 & 140,627.5 & 1,984.865\\
\hline
10,000 & 4 & 0 & 292,344.17 & 1,726.421\\
\hline
20,000 & 4.17 & 0.375 & 474,722 & 0\\
\hline
\end{tabularx}
\end{table}
\FloatBarrier
\section{Attacking LUKS2}
\label{sec:attack}
This section describes the rest of the practical research. The next chapter then defines and presents the price model, which is based on the results collected here. The following three subsections describe the utilized hardware and software; the last subsection presents the results.
The motivation behind the experimental simulations described in this section was to make the price model more precise and realistic. I found some published benchmarks of Argon2 on CPUs and GPUs \parencite{argon2gpuold}; they are, however, almost four years old. Since that time, new technologies have appeared in the GPU and CPU segments and Argon2 has gone through several upgrades. Another reason was that using real hardware allowed me to fine-tune parameters such as the number of CPU cores, the amount of available memory and the available specialized CPU instructions.
\subsection{Utilized hardware}
For the purpose of this thesis I chose to utilize machines offered by the MetaCentrum VO organization, because they are easily accessible to registered students and offer a diverse spectrum of computing resources. This proved especially useful considering that the attacker defined in section~\ref{sec:attacker} is expected to utilize the most powerful machines possible. MetaCentrum VO is a virtual organization supporting all members of the Czech National Grid Organization. It manages and operates distributed computing and storage resources owned by Cesnet as well as by other cooperating academic organizations in the Czech Republic.
The following list describes the parameters of the most interesting machines used during the research. For a description of all the machines offered by MetaCentrum, see \parencite{metacentrumhw}.
\begin{description}
\item[Doom] -- $2 \times 8$ cores Intel Xeon E5-2650 v2 at 2.60 GHz, hyper-threading with 2 threads per core, supporting SSSE3 instructions, 64 GiB of RAM, Nvidia Tesla K20 with 4,743 MiB of memory
\item[Konos] -- $2\times10$ cores Intel Xeon E5-2630 v4 at 2.20 GHz, hyper-threading with 2 threads per core, supporting AVX2 instructions, 128 GB of RAM, Nvidia GeForce GTX 1080 Ti with 11,178 MiB of GPU memory
\item[Nympha] -- $2\times16$ cores Intel Xeon Gold 6130 at 2.10 GHz, hyper-threading with 2 threads per core, supporting AVX512F instructions, 192 GiB of RAM
\item[Manegrot] -- $4\times8$ cores Intel Xeon E5-4627 v2 at 3.30 GHz, no hyper-threading, supporting SSSE3 instructions, 512 GiB of RAM
\item[Uruk] -- $8\times18$ cores Intel Xeon Gold 6140 at 3.70 GHz, hyper-threading with 2 threads per core, supporting AVX512F instructions, 2.86 TiB of RAM
\item[White] -- $2\times12$ cores Intel Xeon Gold 6138 at 2.00 GHz, no hyper-threading, supporting SSSE3 instructions, 512 GiB of RAM, Nvidia Tesla P100 with 16,280 MiB of memory
\item[Zubat] -- $2\times8$ cores Intel Xeon E5-2630 v3 at 2.40 GHz, hyper-threading with 2 threads per core, supporting SSSE3 instructions, 128 GiB of RAM, Nvidia Tesla K20Xm with 5,700 MiB of memory
\end{description}
While performing computations on the CPU, I always allocated the whole computing host and requested the maximum number of available CPUs. I always requested at least $(T/L) \times M$ GiB of memory, where \emph{T} denotes the number of available CPU threads, \emph{L} the degree of parallelism of one Argon2 instance and \emph{M} the memory cost in GiB. For example, with 64 CPU threads, an Argon2 parallelism of 4 and a memory cost of 1 GiB, at least 16 GiB of memory was requested. While simulating attacks on the GPU, I always reserved one GPU instance, 1 CPU and 1 GiB of RAM; GPU reservations are always exclusive. All tests used SSDs as the underlying storage.
\subsection{Naive method}
\label{subsec:naive}
The first chosen attack method is naive but simple. It requires the Libcryptsetup library to be present because it directly utilizes its API. The main idea is to try to recover the master key through normal means, but to do so in multiple parallel threads.
I created a tool which performs this attack. The tool is based on a proof of concept called Dict\_search \parencite{cryptsetupdictsearch}. I modified it so that it uses threads instead of processes, measures every recovery attempt and provides statistics; I also improved its usability. The tool can be found in the thesis Git repository \parencite{thesisrepo}.
The tool expects the following inputs, where the letters denoting the parameters correspond to the command line options:
\begin{description}
\item[t] -- type of the encrypted volume; so far LUKS1, TrueCrypt and LUKS2 are recognized, and other types supported by Cryptsetup can easily be added with an appropriate handler in the source code
\item[i] -- base name of input device files
\item[p] -- input password file with one password per line
\item[T] -- number of parallel threads
\end{description}
The tool performs basic validation of the input parameters and launches the specified number of threads. Every thread receives a handle to the specified file containing the encrypted partition, opens the password file and starts attempting to unlock a keyslot with passwords read from it. The password file is internally divided into segments, and every thread reads a different segment. The tool runs until the passphrase is found or all the passphrases in the input file have been tried.
The memory footprint of this method can be expressed as $M \times T$, where \emph{M} denotes the memory cost parameter used during the creation of the encrypted partition and \emph{T} the number of cracking threads specified when launching the tool. The number of utilized CPU threads can be expressed as $P \times T$, where \emph{P} denotes the Argon2 degree of parallelism used during the initialization of the encrypted drive and \emph{T} is again the number of parallel cracking threads.
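The core of one cracking thread can be sketched as follows. This is a simplified illustration rather than the exact code of the tool: error handling, the division of the password file into segments and the collection of statistics are omitted. Passing \verb+NULL+ as the device name to \verb+crypt_activate_by_passphrase()+ only checks the passphrase against the keyslots without activating the device.
\begin{verbatim}
#include <stdio.h>
#include <string.h>
#include <libcryptsetup.h>

/* Simplified sketch: try every password from a wordlist against
 * the LUKS2 header of the given device image. */
static int try_passwords(const char *device, const char *wordlist)
{
    struct crypt_device *cd = NULL;
    FILE *f = NULL;
    char pwd[512];
    int r = -1;

    if (crypt_init(&cd, device) < 0)
        return -1;
    if (crypt_load(cd, CRYPT_LUKS2, NULL) < 0)
        goto out;

    f = fopen(wordlist, "r");
    if (!f)
        goto out;

    while (fgets(pwd, sizeof(pwd), f)) {
        pwd[strcspn(pwd, "\n")] = '\0';
        /* NULL name: only check the passphrase, do not activate. */
        r = crypt_activate_by_passphrase(cd, NULL, CRYPT_ANY_SLOT,
                                         pwd, strlen(pwd), 0);
        if (r >= 0) {
            printf("found passphrase: %s (keyslot %d)\n", pwd, r);
            break;
        }
    }
    fclose(f);
out:
    crypt_free(cd);
    return r;
}
\end{verbatim}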
Because an attacker would try to speed up the cracking process as much as possible, I compiled Cryptsetup with support for the optimized Argon2 using the \verb+--enable-internal-sse-argon2+ option, and I applied a simple patch which disables clearing of the internal Argon2 memory after every computation. The patch can be found in the Git repository of the thesis \parencite{thesisrepo}.
\subsection{Method based on Argon2-gpu-bench tool}
This method utilizes a tool called Argon2-gpu-bench written by Ondrej Mosnáček \parencite{argon2gpu}. The tool was originally created to benchmark the speed and energy consumption of computing Argon2 hashes. The background scenario was an offline brute-force attack against an Argon2 hash, which overlaps with the goals of my thesis; that is why I selected this tool.
Note that in its current implementation this is not a password guessing tool; it only computes Argon2 hashes of provided inputs. Originally the tool generated random 64-byte strings. I modified it so that the strings are loaded from a provided file containing one password per line. My modifications can be found at \parencite{argon2gpuvojta}.
The scenario expects that an attacker would use this tool to precompute Argon2 hashes and later, possibly in parallel, use the hashes as decryption keys to recover the master key. The right password could be verified by decrypting several sectors of the volume with the recovered master key and searching for well-known signatures. Note that the AES decryption of sectors can certainly be parallelized, and there exist special CPU instructions to improve its speed. Although the time required to try all the candidate hashes as decryption keys is not negligible, it is certainly smaller and less memory-intensive than the Argon2 computations.
The tool can compute Argon2 in three different modes. It can utilize the standard CPU implementation, automatically choosing the most optimized variant according to the supported CPU instructions; clearing of internal memory is disabled for performance reasons. It also contains an Argon2 implementation for the OpenCL platform and three different GPU kernels (implementations) to be used with the CUDA platform. Every GPU kernel handles the GPU memory differently, possibly producing different results under different parameters.
The \emph{Master} kernel uses only the shared memory of the GPU, the \emph{Warpshuffle} kernel uses only GPU registers, and the \emph{Warpshuffleshared} kernel uses a combination of both. Additionally, when running in CUDA or OpenCL mode, it is possible to select one of two modes of operation: the \emph{Oneshot} mode performs each Argon2 computation as one GPU / OpenCL task, whereas the \emph{By-segment} mode divides the computation of Argon2 slices among individual GPU / OpenCL tasks.
Argon2-gpu-bench can also precompute values for Argon2i and Argon2id, speeding up the process; this feature is available only in the CUDA and OpenCL modes.
The following list describes selected input parameters of the application; the letters match the command line options.
\begin{description}
\item[L] -- Argon2 degree of parallelism
\item[M] -- Argon2 memory cost parameter
\item[T] -- number of Argon2 iterations
\item[s] -- number of computations
\item[b] -- number of hashes computed in every computation (batches)
\end{description}
Please note the \emph{s} and \emph{b} parameters; they are important when interpreting the results collected with this tool. The tool performs \emph{s} consecutive discrete computations. Every such computation is measured separately, and in the OpenCL and CUDA modes the reported time includes the writing and reading of data. In every such phase, \emph{b} hash computations are performed, ideally in parallel (this is not always possible or desirable). The length of every such small computation is measured and averaged at the end of the whole benchmark. In total, the tool therefore performs $s \times b$ computations.
The tool tries to run as many parallel threads as possible. The number of such threads is $b \times L$ and the amount of memory needed is $b \times M$. However, there are practical upper limits for both values. When computing on the CPU, through the standard Argon2 CPU implementation or the OpenCL platform, the number of available CPU threads is effectively the limit. When computing on the GPU, through OpenCL or CUDA, the tool cannot run if the amount of memory required exceeds the memory available on the GPU.
\FloatBarrier
\subsection{Results}
Here I describe and compare the results collected by the two methods described in the previous subsections. These numbers serve as the direct basis for the price model described in the next chapter. The following list explains the column headings used in the tables, together with their units:
\begin{description}
\item[threads] -- number of threads passed as a parameter to the cracking tool
\item[batch size] -- number of concurrent Argon2 computations passed through \verb+-b+ parameter to the Argon2-gpu-bench tool
\item[avg(time per hash)] -- average time spent calculating one hash in milliseconds
\item[stdev(time per hash)] -- standard deviation of time spent computing one hash in milliseconds
\item[avg(time per batch)] -- average time spent computing one batch of hashes in milliseconds
\item[stdev(time per batch)] -- standard deviation of time spent computing one batch of hashes in milliseconds
\end{description}
Tables~\ref{tab:nymphacracker}, \ref{tab:nymphacpu} and \ref{tab:konosgpu} show a baseline of results. Nympha and Konos were the most powerful machines for computing Argon2 that I managed to find in MetaCentrum. We can see that cracking hashes with a high memory parameter (1,048,576 KiB) is highly ineffective with the naive approach. The best result was achieved while running two concurrent unlocking operations, therefore occupying 8 threads; this configuration enabled the computation of 1 hash per 1.107 seconds.
However, using a powerful enough CPU supporting AVX512F instructions, it is possible to get more interesting results. Table~\ref{tab:nymphacpu} is particularly interesting. As we can see, it is possible to reach a speed of 1 hash per roughly 215 milliseconds, that is, 4.651 hashes per second. From the theoretical point of view, the most effective computation should occur while running 16 concurrent Argon2 computations using $16 \times 4 = 64$ threads, as the Nympha machine offers 64 CPU threads. As we can observe, at this batch size one hash is computed in 296.3 milliseconds. However, increasing the batch size up to 512 decreases the time to roughly 215 milliseconds. This is probably caused by desynchronization of threads during the computation, because it is not possible to have 512 computations running at the same time when each of them requires 1 GiB of memory and the machine has only about 180 GiB of memory available. Apparently, the threads are managed by the operating system scheduler in such a way that performance is not degraded.
A single GPU can achieve comparable times. The GPU of the Konos machine can run at most 10 parallel Argon2 computations, since with a memory cost of 1 GiB per hash a batch of 10 already occupies about 10 GiB of its 11,178 MiB of memory; it calculates one hash in circa 307 milliseconds, that is, 3.25 hashes per second. Table~\ref{tab:konosgpuopt} shows that, thanks to precomputing values for Argon2i, it is possible to reach 294.5 milliseconds per hash. All values in table~\ref{tab:konosgpuopt} were computed with \emph{b} = 10.
\noindent
\begin{table}
\caption{Cracker running on Nympha against drive benchmarked on laptop with unlocking time of 2,000 ms}
\label{tab:nymphacracker}
\begin{tabularx}{\textwidth}{| X | X | X |}
\hline
threads & avg(time per hash) in ms & stdev(time per hash) in ms\\
\hline
1 & 1,584.767 & 50.545\\
\hline
2 & 1,634.917 & 82.044\\
\hline
4 & 1,906.227 & 106.585\\
\hline
8 & 2,502.031 & 178.339\\
\hline
16 & 4,302.858 & 385.439\\
\hline
32 & 8,482.634 & 816.596\\
\hline
64 & 17,180.945 & 1,810.248\\
\hline
128 & 34,929.874 & 3,893.617\\
\hline
\end{tabularx}
\end{table}
\noindent
\begin{table}
\caption{Argon2-gpu-bench running on Nympha in CPU mode against parameters benchmarked on laptop with unlocking time of 2,000 ms}
\label{tab:nymphacpu}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
batch size & avg(time per hash) in ms & stdev(time per hash) in ms & avg(time per batch) in ms & stdev(time per batch) in ms\\
\hline
1 & 1,641.408 & 29.667 & 1,641.408 & 29.667\\
\hline
2 & 893.785 & 21.578 & 1,787.570 & 43.156\\
\hline
4 & 525.360 & 10.351 & 2,101.441 & 41.407\\
\hline
8 & 351.323 & 10.219 & 2,810.587 & 81.753\\
\hline
16 & 296.345 & 4.376 & 4,741.530 & 70.018\\
\hline
32 & 252.618 & 2.967 & 8,083.808 & 94.964\\
\hline
64 & 235.185 & 2.112 & 15,051.856 & 135.227\\
\hline
128 & 224.986 & 2.046 & 28,798.256 & 261.982\\
\hline
256 & 219.888 & 1.691 & 56,291.514 & 433.037\\
\hline
512 & 217.429 & 1.660 & 111,323.760 & 850.345\\
\hline
\end{tabularx}
\end{table}
\noindent
\begin{table}
\caption{Argon2-gpu-bench running on Konos in CUDA mode against parameters benchmarked on laptop with unlocking time of 2,000 ms}
\label{tab:konosgpu}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
batch size & avg(time per hash) in ms & stdev(time per hash) in ms & avg(time per batch) in ms & stdev(time per batch) in ms\\
\hline
1 & 2,996.175 & 0.057 & 2,996.175 & 0.057\\
\hline
2 & 1,499.818 & 0.025 & 2,999.637 & 0.051\\
\hline
3 & 1,000.870 & 0.049 & 3,002.610 & 0.149\\
\hline
4 & 752.094 & 0.015 & 3,008.378 & 0.060\\
\hline
5 & 602.253 & 0.007 & 3,011.267 & 0.037\\
\hline
6 & 502.370 & 0.106 & 3,014.223 & 0.640\\
\hline
7 & 430.885 & 0.007 & 3,016.202 & 0.053\\
\hline
8 & 382.385 & 0.079 & 3,059.081 & 0.638\\
\hline
9 & 340.405 & 0.060 & 3,063.645 & 0.545\\
\hline
10 & 306.682 & 0.042 & 3,066.823 & 0.425\\
\hline
\end{tabularx}
\end{table}
\noindent
\begin{table}
\caption{Argon2-gpu-bench running on Konos in CUDA mode comparing various options against parameters benchmarked on laptop with unlocking time of 2,000 ms}
\label{tab:konosgpuopt}
\begin{tabularx}{\textwidth}{| X | X | X | X | X |}
\hline
option & avg(time per hash) in ms & stdev(time per hash) in ms & avg(time per batch) in ms & stdev(time per batch) in ms\\
\hline
master & 306.682 & 0.042 & 3,066.823 & 0.425 \\
\hline
master + precomputes & 294.535 & 0.045 & 2,945.355 & 0.456 \\
\hline
master + oneshot & 305.809 & 0.009 & 3,058.096 & 0.989 \\
\hline