background.tex

\chapter{Background}
\label{c:background}
Our traffic light detection system uses sensor fusion to estimate the motion and orientation of a smartphone and image processing in an estimated subpart of a video frame to detect the traffic lights.
Our system also uses neural network architecture to classify the walk and stop signs in video frames.
We use some existing techniques for these stages. 
In this section, we provide an overview of these techniques. 

%% In this chapter, we discuss the background of our system.
%% The main approach of our system is to improve the traffic light detection using smartphone sensor.
%% In the first part of this chapter, we discuss the inertial sensor of Android and sensor fusion.
%% Later we discuss the traffic bulb detection process.

\section{Inertial sensor of smartphone}
\label{s:sensor}
Virtually all smartphones come with a set of inertial sensors. These include an accelerometer, a gyroscope, and a geomagnetic field sensor. 
Below, we provide a brief overview of these sensors. 


\subsection{Accelerometer}
The accelerometer is one of the motion sensors in a smartphone.
Smartphones nowadays have three axis accelerometer, which can measure acceleration along the three axes of a smartphone's reference frame.
The accelerometer does not measure free fall acceleration.
Instead, it measures the forces that are applied to the accelerometer itself.
For example, when the phone is on the table, it measures the gravitational acceleration $g$ because this is proportional the accelerometer's weight.
%\todo{this is not clear}.

\ref{f:acc} shows the reference coordination frame of an accelerometer in an Android smartphone.
The accelerometer measures the acceleration with respect to the reference frame and returns a vector $[a_x, a_y, a_z]$.
Here, $a_z$ points perpendicular with respect to the smartphone's screen and $a_x$ and $a_y$ point right and up in the phone's planes respectively as shown in \ref{f:acc}.

\begin{figure*}[!ht]
  \centering
  \includegraphics[width=2.5in]{figures/coord_acceleration.pdf}
  \caption{Accelerator coordination}
  \label{f:acc}
\end{figure*}


\begin{figure*}
  \centering
  \includegraphics[width=2.5in]{figures/coord_omega.pdf}
  \caption{Gyroscope coordination}
  \label{f:gyro}
\end{figure*}
  
%\subfloat[Accelerator coordination] {\label{f:acc}\includegraphics[width=2.5in]{figures/coord_acceleration.pdf}}
%\hfill
%\subfloat[Gyroscope coordination] {\label{f:gyro}\includegraphics[width=2.5in]{figures/coord_omega.pdf}}

%%   \caption{Smartphone reference coordination system.}
%%   \label{f:coord_dia}
%% \end{figure*}

\subsection{Gyroscope}
The accelerometer is another motion sensor in a smartphone.
The gyroscope measures the rate of rotation or the angular velocity of rotation along the three axes of a smartphone's reference frame.
%This is also a motion sensor of smartphone like an accelerometer.
The gyroscope is a MEMS (Micro-Electro-Mechanical Sensing) sensor that uses vibrating elements to measure Coriolis effect \cite{coriolis}.
When a smartphone rotates, there is a change of direction in the vibrating elements because of the Coriolis force.
A MEMS gyroscope measures these variations along the three axes to estimate the rate of rotation.
Gyroscopes are very responsive to small rotations and can measure the variation in the rate of rotation smoothly.

\ref{f:gyro} shows the reference coordination frame of a gyroscope on an Android phone.
The coordination frame for the gyroscope is same as the coordination frame of accelerometer described above.
The gyroscope measures the angular velocity with respect to the reference frame and returns a vector $[\omega_x, \omega_y, \omega_z]$.

%% The angular velocity of rotation $\omega_z$ points towards the outside of screen face, that is perpendicular to the phone plane, which is the angular velocity of phone's rotation in the X-Y plane.
%% The rate of rotation $\omega_x$ and $\omega_y$ is along with the phone plane, which is the angular velocity of phone's rotation in the Y-Z plane and the Z-X plane respectively.

\subsection{Geomagnetic field sensor}
Geomagnetic field sensor is a position sensor of the smartphone.
It helps to determine the smartphone's physical position in the Earth's reference frame.
%Geomagnetic field sensor measures the change of the magnetic field and estimates the magnetic field at earth's point to find the declination from the true North.
Geomagnetic field sensor measures the change of the magnetic field and estimates the declination (the angle between geographical North and the magnetic North) of a smartphone with respect to the true North. 
The magnetic field in a certain point is directional and has its magnitude.
This geomagnetic field sensor of an Android smartphone provides a vector along the three axes, that describes the strength and the direction of the magnetic field relative to the phone.
The X coordinate represents the magnetic field strength in the direction of magnetic North.
A positive X value represents a smartphone's orientation with respect to the North and a negative X value represents the smartphone's orientation with respect to the South.
Similarly, Y coordinate represents the East and West direction.
The Z coordinate represents the direction of a smartphone's up or down direction.


\subsection{Orientation}
Each sensor described in the previous section has its own strength and weakness.
The gyroscope is fast, accurate and reliable.
It is also very responsive to a small rotation.
As a result, it can track the change of rotation rapidly.
We can get the orientation from the gyroscope by integrating its values.
However, the integration can accumulate a large error over time.
Because of this, integration of gyroscope values is not used in practice. 
Instead, the accelerometer is used to inject the correction term that keeps orientation accurate with respect to the gravity and removes the drift.
Additionally, to obtain the position of a smartphone in the Earth's reference frame, we combine the gyroscope and accelerometer data with geomagnetic field sensor.

In summary, we get orientation from the gyroscope and the accelerometer corrects the error accumulation.
Finally, geomagnetic field sensor provides correction due to the magnetic North.

Android Sensor API \cite {api_android} provides the orientation of the phone with respect to the reference frame described before by fusing the data from the accelerometer, gyroscope, and magnetic sensor. 
This orientation contains the pitch, roll, and azimuth of the smartphone. 
%Another motion sensor in android, rotation vector, use this three sensor data to report the orientation of the device in vector form along the axis of the reference frame.

%This rotation vector provides the axis of the system and the angle of rotation from these axes.
%
%\note{new figure with horizontal phone image}
\begin{figure}
\centering
\includegraphics[width=3.8in]{figures/roll_pitch_yaw.pdf}
\caption{Orientation of the smartphone}
\label{f:rpy_dia}
\end{figure}

\ref{f:rpy_dia} shows the orientation of a smartphone with respect to the smartphone's coordination frame.
Here, the pitch is the degree of rotation about the X axis.
Pitch value is positive when the positive Y axis rotates to the negative Z axis and this value is negative when positive Y axis rotates to the positive Z axis.
%Pitch is the angle between the plane parallel to the smartphone and parallel to the ground.
%\note{fixed?}
%\todo{clarify, parallel twice is ambiguous.}
%If the smartphone's top edge inclines to the ground the pitch will be positive and vice versa.
%\note{fixed?}
%\todo{overall below is confusing to read. May be table is better}
Roll is the rotation angle with respect to the Y axis.
Roll value is positive when the positive X axis turns to the negative Z axis and it is negative when positive X axis turns to the positive Z axis.
Azimuth is the rotation angle with respect to the Z axis.
This is the angle between the smartphone Y axis and the magnetic North of the earth.
That means if the phone's Y axis is placed in straight to the magnetic North of earth the azimuth is 0.

\ref{t:rpy} shows the range of pitch, roll, and the azimuth value.
\begin{table}[h!]
  \centering
 
  \rowcolors{2}{gray!25}{white}
  \begin{tabular}{  l   c  }
    \rowcolor{gray!50}
    \hline
    Range for pitch value & -180 degrees to 180 degrees \\
    Range for roll value & -90 degrees to 90 degrees  \\
    Range for azimuth value & 0 degree to 360 degrees \\
    \rowcolor{gray!50}
    & 0 degree when Y axis aligns with the magnetic North \\
    
  \end{tabular}
  \caption{The range of pitch, roll, and azimuth.}
  \label{t:rpy}
\end{table}


\section {Image processing}
We process each video frame using computer vision techniques to detect traffic lights. 
Below, we provide a brief overview of these techniques. 

\subsection{Color space}
\label{s:color_space}
RGB \cite{rgb} and HSV \cite{hsv} are two primary color spaces that are used in image processing and computer vision.
RGB defines a color as a combination of the three primary colors red (R), green (G), and Blue (B).
Hence, RGB is an additive color space.
%RGB color space describes colors with the amount of red, green and blue color presents on that frame.

\ref{f:rgb} shows the model of RGB space \cite{rgb}.
It shows that any color of this cube is a combination of red, blue, and green.

\begin{figure}[htbp]
\begin{minipage}[t]{0.45\linewidth}
    \includegraphics[width=\linewidth]{images/RGB.png}
    \caption{RGB color space}
    \label{f:rgb}
\end{minipage}
    \hfill
\begin{minipage}[t]{0.45\linewidth}
    \includegraphics[width=\linewidth]{images/HSV.png}
    \caption{HSV color space}
    \label{f:hsv}
\end{minipage} 
\end{figure}

HSV color space describes color in terms of Hue, Saturation, and Value.
HSV color space is similar to the way human perceive a color.
It separates the color information from the intensity/brightness.
Every color can be represented by a different value of Hue.

Saturation represents the vibrancy of the color.
The lower value of saturation represents the gray tone in a color and higher value of a saturation represents a primary color.

The Value represents the brightness of a color.
A lower Value means less bright color and vice-versa.

\ref{t:hsv} shows the ranges for hue, saturation, and value.
It shows that original hue range is 0-360, however, the OpenCV \cite{opencv} represents color values using the range from 0 to 255. 
Hence, the OpenCV converts the hue original range to 0-180.

\begin{table}[h!]
  \centering
  
  \rowcolors{2}{gray!25}{white}
  \begin{tabular}{  l  c  }
    \rowcolor{gray!50}
    \hline
    Range for Hue & 0 degree to 360 degrees \\
    \rowcolor{gray!50}    
    & at OpenCV - 0 degree to 180 degrees\\  
    Range for Saturation & 0 to 255 \\
    Range for Value & 0 to 255 \\
    \hline 
    
  \end{tabular}
  \caption{The range of hue, saturation, and value.}
  \label{t:hsv}
\end{table}


\ref{f:hsv} shows the cylindrical model of HSV color space \cite{hsv}.
Here, the Hue is the central part.
The radius of this cylinder represents Saturation and the height of the cylinder represents the Value.

\subsection{Hough circle transform}
Our system detects the circular bulb in a traffic light using the Hough circle algorithm \cite{houghcir_alg}.
The Hough circle algorithm first detect the edges in a video frame.
It uses Canny method \cite{canny} for edge detection.
%\todo{can you make it more clear?}
Then, at every edge pixel, it generates a circle.
To find the intersection of these circles, Hough method uses an accumulator matrix.
If a circle passes through the grid of the accumulator matrix, it increases the voting number of the grid.
The position of the local maxima of this accumulator matrix provides the corresponding circle's centers.

\section{Neural network architecture}
The Neural network is a supervised machine learning algorithm that that can achieve high accuracy in recognition tasks involving complex datasets. 
A neural network typically consists of a large number of nodes that are organized into layers along with interconnection between the layers. 
\ref{f:nn_archi} shows a sample neural network architecture.

\begin{figure}[h]
\centering
\includegraphics[width=3.8in]{figures/nn.pdf}
\caption{Architecture of neural network}
\label{f:nn_archi}
\end{figure}

At every layer, each node has different weights.
During data propagation, the data values are multiplied with the weights.
The final value passes through the activation layers if it exceeds the threshold.
During training, the threshold value and weight value are tuned iteratively to achieve best possible accuracy in the training data. 
Typically, back-propagation method is used to train a neural network.