-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmethod.tex
157 lines (149 loc) · 6.34 KB
/
method.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
\section{Method}
\subsection{Dataset}
We used both public dataset and customized dataset to train and evaluate our model.
Firstly, the MS-COCO~\cite{datasetcoco} which contains 2.5 million annotated
instances in 328k images categorized as 91 types. As our aim is to train a model that
can not only locate birds in images but also avoid false detections that recognize
non-bird objects as birds. All 80 types of training subset is used. However, as
the MS-COCO dataset is lack of instances of birds that is difficult to be detected.
We collected 708 birds image in field environment and manually labeled all these birds
instances with their bounding boxes and segments. Many of these instances is hard to be
detected because of factors such as occlusion. Besides, we also noted birds with the
following criterias.
\begin{enumerate}
\item whether the instance is occluded.
\item whether the instance is not clear.
\item whether the instance is photographed in side direction.
\item whether the instance is photographed in backlighting condition.
\end{enumerate}
A tabel describing our dataset based on these four criterias is shown in~\ref{th-bird-table}.
\begin{table}[H]
\centering % 使表格居中
\captionsetup{justification=centering} % 使标题居中
\begin{tabular}{|c|c|}
\hline
& Instances amount \\ \hline
Occluded & 262 \\ \hline
Not clear & 400 \\ \hline
Not side direction & 136 \\ \hline
Backlighting & 70 \\ \hline
Total & 1142 \\ \hline
\end{tabular}
\caption{Summary of TH-Bird dataset}
\label{th-bird-table}
\end{table}
\subsection{Data augmentation}
In order to simulate a shooting scene in the field,
we adapted and designed some data argument methods.
In the existing method, we apply the random disorder
in color zone, random horizontal flip,
random crop and random extend.
In the field environment, birds may be occluded by
branches or be unclear on photos because of movement.
We designed the stochastic fuzzy algorithm and
random occlusion algorithm to simulate.
\subsubsection{Stochastic fuzzy algorithm}
Firstly, extract the bounding boxes coordinates of
birds instances and determine the area to be blurred.
Then, apply a convolution kernel of a specific size.
All weights of this kernel is one. Therefore, the
convolution results can be derived as
\[
I'(x, y) = \sum_{i=-k}^{k} \sum_{j=-k}^{k} K(i, j) \cdot I(x+i, y+j)
\]
\begin{itemize}
\item \( I(x, y) \): Original pixel value at \((x, y)\).
\item \( I'(x, y) \): Blurred pixel value at \((x, y)\).
\item \( K(i, j) \): Kernel value at \((i, j)\).
\item \( k \): Half the kernel size (e.g., for a 3x3 kernel, \( k=1 \)).
\end{itemize}
For example, to get average value from $3\times 3$ regions to blur the instances.
The weights are chosen as:
\[
K = \frac{1}{9}
\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix}
\]
\begin{figure}[H]
\centering
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{fuzz_example.jpg}
\end{subfigure}
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{fuzz_exampleX.jpg}
\end{subfigure}
\caption{An exaple of stochastic fuzzy algorithm}
\end{figure}
\subsubsection{Random occlusion algorithm}
Being blocked by branches is a very common circumstances in
bird photography. We design an algorithm to generate
simulated branches from data augmentation. To obtain
the optimized color, the RGB color and standard deviation
of each channel were counted in a branches dataset~\cite{datsetbr}.
The statistic results are shown in~\ref{bra-table}
\begin{table}[H]
\centering % 使表格居中
\captionsetup{justification=centering} % 使标题居
\begin{tabular}{|c|c|c|}
\hline
\backslashbox{Value}{Channels} & Average value & Standard deviation \\ \hline
R & 87.35 & 2.487 \\ \hline
G & 89.24 & 2.497 \\ \hline
B & 82.97 & 2.435 \\ \hline
\end{tabular}
\caption{Average value and standard deviation from~\cite{datsetbr}}
\label{bra-table}
\end{table}
\begin{figure}[H]
\centering
\begin{subfigure}{0.20\textwidth} % 每张图的宽度为容器宽度的45%
\includegraphics[width=\linewidth]{AugSampleImages/1.png} % 替换为你的图片路径
\end{subfigure}
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{AugSampleImages/1X.png}
\end{subfigure}
\vspace{0.5em} % 行间距调整
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{AugSampleImages/2.png}
\end{subfigure}
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{AugSampleImages/2X.png}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{AugSampleImages/3.png}
\end{subfigure}
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{AugSampleImages/3X.png}
\end{subfigure}
\vspace{0.5em}
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{AugSampleImages/4.png}
\end{subfigure}
\begin{subfigure}{0.20\textwidth}
\includegraphics[width=\linewidth]{AugSampleImages/4X.png}
\end{subfigure}
\caption{Demonstrations of branches simulation algorithm}
\label{aug-exp}
\end{figure}
The for each sample image, the birds instances
are found and determine whether the bird's
width and height are greater 25 to enable branches
augmentation. To draw a occlusion line, its "center" point
is randomly selected within the bounding box. After that,
the slope and x coordinates of the start and end points
are determined. The y coordinates is derived with x coordinates,
slop and bonding box limit. The thinness varies along
the line with an average value determined by the size of the
bird and a fixed standard deviation. The flow graph and the pseudo
code are displayed in~\ref{png:random-branch}
\begin{figure}[H]
\includegraphics[width=0.45\textwidth]{bracn_aug.png}
\caption{The flow graph of random branches augmentation}
\label{png:random-branch}
\end{figure}
\subsection{Model architecture}
Our model is based on DETRs Beat YOLOs on Real-time Object Detection or RT-DETR~\cite{RTDETR}.