cap-related-work.tex

%% ------------------------------------------------------------------------- %%
\chapter{Related Work}
\label{cap:related-work}

There is a large number of works of skin detection based on color information and there are a couple of them comparing different techniques and classifiers, mainly from the point of view of performance, color models, skin color modeling and different datasets \citep{vezhnevets:03,kakumanu:07,mahmoodi:16}.

On different techniques, statistical models are those which estimate the probability that an observed pixel is associated with skin, based on a estimation dataset. One approach is the single histogram based Look-UP Table (LUT) that is capable to obtain the distribution of skin pixels in a particular color space, by using a set of training pixels \citep{mahmoodi:16}. Considering the Red, Green, Blue (RGB) color model for instance, a histogram with 256 bins per channel - $256^3$ in total - can be constructed for further counting the probability (see Eq. \ref{eq:lut_probability}) of each possible RGB value \citep{jones:02}.

\begin{equation*}
    P(rgb) = \frac{\# counts~of~rgb}{total~counts}
    \label{eq:lut_probability}
\end{equation*}

In~\citet{jones:02}, the authors applied this technique to figure out the decision boundary of skin pixels distribution by using a $3$-dimensional histogram model constructed from approximately 2 billion pixels. Those pixels were collected from 18,696 images over the Internet to perform skin detection. First, visualization techniques were used to examine the shape of these distributions. Then, by examining the 3D histogram from several angles, \citet{jones:02} realized that its overall shape could be inferred.

So, two different histograms for skin and non-skin in the RGB color model were calculated. Using those histograms along with training data, a surprisingly accurate pixel-wise classifier was derived. The best performance at a success rate of 88\% was reached for histograms of size 32~\citep{jones:02}.

On the basis of the output of the skin detector, \citet{jones:02} also trained a classifier to determine whether a naked person is present or not in the scene. The features used from the output to create the feature vector were~\citep{jones:02}:
\begin{itemize}
    \item Percentage of pixels detected as skin
    \item Average probability of the skin pixels
    \item Size in pixels of the largest connected component of skin
    \item Number of connected components of skin
    \item Percent of colors with no entries in the skin and nonskin histograms
    \item Height of the image
    \item Width of the image
\end{itemize}

In the last step, 10,679 images were manually classified in 5,453 naked and 5,226 non-naked image sets to train a neural network classifier. The neural network outputs a number between 0 and 1, with 1 indicating a naked person in the image. The detection rate achieved was of 88\%, with a false alarm rate of 11.3\%~\citep{jones:02}.

Skin detection is sometimes used as a primary task for other applications. \citet{hsu:02}~have embedded skin segmentation with the purpose to build a face detector application. The algorithm first performs a lighting compensation on the R, G, and B components of the pixels of the image. Then, these color corrected components are transformed into YCbCr space. \citet{hsu:02}~observed that a skin tone cluster is formed in the $YCb$ and $YCr$ subspaces. Based on the premise that skin tone color is independent of the luminance component, they~\citep{hsu:02} non-linearly transformed the YCbCr. Then, in this transformed YCbCr space, the skin pixels are detected using an elliptical model.

The experiments have shown good skin detection results, around 96\% and 80\% detection rate respectively on HH1 MPEG7 video and on Champion datasets, which contain images with frontal, near-frontal, half-profile and profile faces under varying illumination conditions and backgrounds. On the other hand, the results also showed a high false positive rate, that is reduced further with facial feature detection procedure based on the spatial arrangement of the detected skin patches~\citep{hsu:02}.

Another typical method is to define explicitly, through a number of rules, the boundaries that delimit the grouping of skin pixels in some color space~\citep{vezhnevets:03}. This approach has been adopted by~\citet{kovac:03} in the RGB color model, resulting in a true positive rate of 90.66\%. Basically, to find out the skin cluster in the RGB color model, the skin color is determined with the following rules~\citep{kovac:03}:

\begin{align*}
\text{Skin color at uniform daylight illumination}\\
R > 95, G > 40, B > 20 \\
max(R, G, B) - min(R, G, B) > 15 \\
|R - G| > 15 \\
R > G \\
R > B
\end{align*}

\begin{align*}
\text{Skin color at light daylight illumination}\\
R > 220, G > 210, B > 170 \\
|R - G| \leq 15 \\
R > B \\
G > B
\end{align*}

They~\citep{kovac:03} also performed experiments in comparison with~\citet{hsu:02}, where only the chromaticity channels Cb and Cr from the YCbCr color space are used. The results showed that the performance of the classifier is inferior in relation to the approach using all the three channels. \citet{hsu:02} method indeed diminished the influence of noise in dark images, but in images that are captured under standard daylight illumination, they label too many pixels as skin, decreasing the performance of the face detector by increasing the number of false positive pixels~\citep{kovac:03}.

\citet{chai:99} proposed a similar method in YCbCr color space, where a skin color map was designed using a histogram approach based on a given set of training images. \citet{chai:99} observed that the Cb and Cr distributions of skin color fall in the ranges $[77, 127]$ and $[133, 173]$, respectively, regardless the skin color variation in different races.

However, after an exhaustive image histogram analysis, \citet{basilio:11} found that the thresholds given by~\citet{chai:99} was robust only for images with Caucasian people. Once their~\citep{basilio:11} purpose was to find human skin from different people races, from any place of the world, a new threshold for each chromaticity (Cb, Cr) channel has been set up, regardless of skin  color:

\begin{align*}
80  \leq Cb \leq 120 \\
133 \leq Cr \leq 173
\end{align*}

The key advantage of this method is the simplicity of skin detection rules that lead to the construction of a very fast classifier. On the other hand, achieving high recognition rates with this method is difficult because it is necessary to find a good color space and empirically appropriate decision rules~\citep{vezhnevets:03}.

Differently from~\citet{kovac:03}, the authors of~\citet{yogarajah:11} developed a technique where the thresholds defined in the rules are dynamically adapted. The method consists of detecting the region of the eye and extracting an elliptical region to delimit the corresponding face. A Sobel filter is applied to detect the edges of the resulting region which is subjected to a dilation in order to get non-smooth regions, (i.e. eyes and mouth). The resulting image is subtracted from the elliptical region. As a result, there is a more uniform skin region where the thresholds are calculated.

Every single pixel in the colored image is classified as skin or non-skin, based on the calculated dynamic threshold values. When the algorithm detects multiple possible face regions in the image, a dynamic threshold is constructed for each of them and, subsequently, submitted to perform skin segmentation on the whole image. Finally, a logical OR operation is applied in all of the segmented regions obtained as of each dynamic threshold~\citep{yogarajah:11}.

The technique was used as part of a preprocessing step in~\citet{tan:12} in a strategy combining a $2$-dimensional density histogram and a Gaussian model for skin color detection. First, human eyes are located and, then, an elliptical mask model is used to generate the elliptical face region in the image. Due to computational simplicity, a Sobel filter is employed to detect non-smooth (i.e., eyes, eyebrows, mouth, etc.) regions. Then, the detected edge pixels are further submitted to a dilation operation to get the optimal non-smooth regions. Finally, a new image, that only consists of face regions, is obtained.

It is worth mentioning that the dynamic threshold with smoothed 2-D histogram is based on the assumption that the face and body of a person always share the same colors~\citep{tan:12}.

Thereafter, a 2-D histogram with smoothed densities and a Gaussian model are used as features to represent the skin and non-skin distributions, respectively. They~\citep{tan:12} also applied a fusion technique that uses the product rule on the two features to obtain better skin detection results.

Experiments were carried out using three public databases: Pratheepan, ETHZ PASCAL, and Stottinger. A comparison between different color spaces as well as proposed fusion and non-fusion approaches were also established. Quantitative results are only available for Stottinger dataset with $F-measure = 0.6490$, $Precision = 0.6403$ and $Recall = 0.6580$ measures~\citep{tan:12}.

\citet{naji:12} built a rule-based classifier in the HSV color space for 4 different skin ethnic groups in parallel (standard skin, shadow/blackish skin, light skin and red concentrated skin and lips). Basically, the classifier applies a rule-based region growth algorithm, after primitive segmentation, in which the output of the first layer is used as a seed, and then, the final mask in subsequent layers is constructed iteratively by neighboring skin pixels. The problem of shadow regions has been also addressed by lightening them with a skin color correction approach. A number of 125 images were collected from two different datasets for experiments. The rate of true positive pixels reported was of 96.5\% with a very low false positive rate of 0.76\%.

The model proposed by~\citet{kawulok:13} is based on global and local image information from where a skin probability map is built and, next, used to generate the initial seed for spatial analysis of skin pixels. The spatial analysis result is improved once these seeds, extracted from a local model, are highly adapted to the image.

In their~\citep{kawulok:13} proposal, first, human faces are captured in the input image through a multi-level ellipse detector, from where a local skin color model is learned. The local model is used to perform skin detection in the input image. Next, the local skin probability map is used to apply a high-probability threshold and obtain the seeds. Finally, based on the seeds gathered on the previous step, the spatial analysis is carried out in the global probability map, created on the basis of the global skin color model.

In general, color can be considered one of the most decisive properties that affect the performance of skin detection algorithms, even though some of them do not use the color as a feature directly~\citep{mahmoodi:16}. Therefore, the choice of a color space is directly related to the outcome produced by a particular approach~\citep{mahmoodi:16}. However, \citet{albiol:01} proved that the optimum performance of the skin classifiers is independent of the color space chosen.

RGB is the most commonly used color space for storing and representing digital images since the cameras are enabled to provide the images in such model as well as displays to show them. However, RGB suffers a high influence of the environment illumination. In order to reduce this influence, the RGB channels can be normalized~\citep{kakumanu:07}.

This particular characteristic led~\citet{bergasa:00} to build an adaptive and unsupervised Gaussian model to segment skin in the normalized RGB color space, using only the $r$ and $g$ channels (the normalized R and G channels), after the evaluation of other color spaces for the same application.

In~\citet{jayaram:04}, a comparative study using a Gaussian and a histogram approaches in a dataset of 805 color images in 9 different color spaces has been performed. The results revealed that the absence of the luminance component, that means, using only two channels of the color space, significantly impacts the performance. The best results were obtained in the SCT, HSI, and CIELab color spaces with histogram approach.

In~\citet{chaves:10}, the authors compared, under the same circumstances, the performance of 10 color spaces based on the $k$-means clustering algorithm on 15 images of the Aleix and Robert (AR) face image database~\citep{ar-face-database:98}. The ground truth images have been carefully generated by the authors. This means that parts of the photo which do not include skin color, such as hair, beard, lips, eyes, and background, were completely removed.

Experiments have been executed over the test set images provided in AR database. For each color space, \citet{chaves:10}~have done several tests with each channel lonely, using two channels -- with all possible combinations --, and with the three channels.

The tests have been carried out comparing, in a detailed quantitative manner, each color space with an accurate and objective criterion. In other words, the output image produced by the $k$-means was compared, in a pixel-wise fashion, to the ground truth images to get the quantitative measures. According to the results obtained, using two channels combined did not produce a good outcome. On the other hand, the results with each channel isolated were surprisingly good in comparison to the three channels fused. Lastly, among all the color spaces used for the skin detection experiments, the most appropriate were, in this order, HSV, YCgCr, and YDbDr~\citep{chaves:10}.

A similar study on color spaces and different skin color modeling have been provided in~\citet{khan:12}. A total of six color spaces (IHLS, HSI, RGB, normalized RGB, YCbCr and CIELab) and nine algorithms (AdaBoost, Bayesian network, J48, Multilayer Perceptron, Naive Bayesian, Random Forest, RBF network, SVM and Histogram based) were evaluated on 8991 manually pixel-wise annotated images by means of F-measure. \citet{khan:12} concluded that color space transformation, as well as the removal of the luminance channel, degrades the performance. In addition, classification results can be improved with the usage of lighting correction algorithms.

In~\citet{kaur:12}, an algorithm similar to that proposed by~\citet{kovac:03} have been implemented, where the boundaries that delimit the grouping of skin pixels are defined by explicit rules. After segmenting the image with the explicit rules, the algorithm also performs morphological and filtering operations to improve the accuracy of the method. The authors applied the algorithm in the YCbCr and CIELab color spaces, ignoring the Y and L luminance components, respectively. The results were more satisfactory when the algorithm was applied on CIELab. A similar technique was implemented in~\citet{shaik:15} and~\citet{kumar:15} in the HSV and YCbCr color spaces, the latter providing the best results in both.

Finally, \citet{brancati:17} proposed a novel rule-based skin detection method that works in the YCbCr color space. Basically, to classify skin pixels in the image, the algorithm evaluates the combinations of chrominance (Cb, Cr) components by way of correlation rules. These correlation rules are calculated relying upon the shape and size of dynamically generated skin color clusters. Geometrically, the clusters create trapezoidal polygons in the YCb and YCr subspaces whose apparent arrangement reflects a proportional behavior observed in relation to chrominance components. They~\citep{brancati:17} compared the results, in terms of quantitative performance metrics, with six other well known rule-based methods in the literature, outperforming them in almost all of the metrics.