3 changes: 3 additions & 0 deletions .gitmodules
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[submodule "vshampor/deshuffler/unit_tests/googletest"]
path = vshampor/deshuffler/unit_tests/googletest
url = https://github.com/google/googletest.git
57 changes: 57 additions & 0 deletions vshampor/SAS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
# "deshuffler"
## Software Architecture Specification
##### Vasily Shamporov, Apr 2017

### Overview
The program is written in C++ (targeting the C++14 standard). The basic program control flow is shown in the figure below:

![alt text](control_flow.png)

The input YUV file, in which every frame except the first has its 64x64 blocks shuffled in random order, is first opened for reading. Next, for each frame, the data describing the correct position of each shuffled tile on the unshuffled frame ("permutation data") is calculated. Afterwards (optionally) the original unshuffled stream is reconstructed in full from the input shuffled stream and the permutation data calculated in the previous step, and written to disk. The calculation of permutation data is based on motion estimation between consecutive frames of the input YUV stream. More details on some of the steps of the algorithm follow.

### Details
##### Calculate permutation for the stream
![alt text](perm_gen.png)
This step uses frame-level parallelism to improve performance - the input stream is divided into M equal batches of consecutive frames, and each batch is assigned a worker thread. Each worker thread then calculates permutation data between pairs of consecutive frames inside its batch, starting from the first pair in display order.
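
The batch split described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `FrameBatch` and `SplitIntoBatches` are hypothetical names, and distributing the remainder over the first batches is an assumption for streams whose frame count is not divisible by the thread count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: a half-open range [first, last) of frame indices.
struct FrameBatch {
    std::uint32_t first;
    std::uint32_t last;
};

// Splits frame_count frames into at most thread_count batches of consecutive
// frames. Batch sizes differ by at most one frame; the first batches absorb
// the remainder (an assumption, the specification only says "equal batches").
std::vector<FrameBatch> SplitIntoBatches(std::uint32_t frame_count,
                                         std::uint32_t thread_count)
{
    assert(thread_count > 0);
    std::vector<FrameBatch> batches;
    const std::uint32_t batch_count = std::min(frame_count, thread_count);
    if (batch_count == 0) {
        return batches;
    }
    const std::uint32_t base = frame_count / batch_count;
    const std::uint32_t extra = frame_count % batch_count;
    std::uint32_t begin = 0;
    for (std::uint32_t i = 0; i < batch_count; ++i) {
        const std::uint32_t size = base + (i < extra ? 1u : 0u);
        batches.push_back({begin, begin + size});
        begin += size;
    }
    return batches;
}
```

The batch that starts at frame 0 would belong to the primary thread, since it contains the unshuffled first frame.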

The batch containing the first, unshuffled frame and the corresponding worker thread (hereafter the "primary" thread) are of special interest. Non-primary threads calculate permutation data between pairs of shuffled frames, whereas the primary thread can always calculate permutations between a shuffled frame and a reconstructed preceding frame, since its batch contains the first, unshuffled frame. Hence, the permutation data produced by non-primary threads will only be relative to the first frames of their respective batches, while the permutation data produced by the primary thread will be absolute. An additional post-processing step is therefore required to produce absolute permutation data for the whole stream.

It is assumed that motion estimation between a shuffled frame and an unshuffled one will be more effective in producing correct permutation data than motion estimation between two shuffled (albeit consecutive) frames, so the calculation of permutation data for some of the frames in the non-primary thread batches may fail (see below for more details on the failure status assignment). To address this, the failed frames from each non-primary thread are aggregated. After all threads have finished their calculations, the failed frames are processed in sequential order using reconstructed preceding frames (which should be available by this point, either as video data or as absolute permutation data), and the correct permutation data is calculated for these frames.

##### Calculate permutation for a sequential frame batch
![alt text](perm_batch.png)
As stated above, each worker thread processes its own batch of sequential frames, starting with the first pair of consecutive frames in display order. Permutation data between two frames is calculated using FEI PREENC, which performs motion estimation on a 16x16 block basis, while shuffled tiles have a size of 64x64 pixels. Theoretically, it is sufficient to perform motion estimation for only a single 16x16 block inside the 64x64 tile to find the tile position on the preceding frame. This may be prone to errors, but brings an obvious performance gain; therefore, as a first step, for each pair of consecutive frames (K_(i - 1), K_i) a pair of special frames (S_(i - 1), S_i) is constructed by taking a 16x16 block from the center of each 64x64 tile and placing these blocks side by side in the same raster scan order as in the original frames. The permutation data is then calculated for frames (S_(i - 1), S_i). If this fails, the algorithm falls back to motion estimation on the full-resolution frames (K_(i - 1), K_i). If this fails as well (if, for example, it was not possible to reconstruct frame K_(i - 1)), then the whole frame K_i is assigned a failure status and processing moves on to the next pair of frames in the batch. It is assumed that the primary thread should not fail at this point; otherwise deshuffling as a whole fails, since the algorithm includes no other means of improving the motion estimation accuracy.
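
The construction of a special frame S from a frame K can be sketched for a single plane as below. The function name `BuildCenterBlockPlane` is hypothetical, and the sketch assumes a tightly packed 8-bit luma plane whose width and height are multiples of 64; chroma handling is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kTileSize = 64;   // shuffled tile size from the specification
constexpr std::size_t kBlockSize = 16;  // PREENC block size from the specification

// Builds the reduced plane: every 64x64 tile contributes its central 16x16
// block, and blocks keep the raster scan order of their tiles. The result is
// (width / 4) x (height / 4) pixels.
std::vector<std::uint8_t> BuildCenterBlockPlane(const std::vector<std::uint8_t>& luma,
                                                std::size_t width,
                                                std::size_t height)
{
    assert(width % kTileSize == 0 && height % kTileSize == 0);
    assert(luma.size() == width * height);
    const std::size_t tiles_x = width / kTileSize;
    const std::size_t tiles_y = height / kTileSize;
    const std::size_t out_width = tiles_x * kBlockSize;
    const std::size_t center = (kTileSize - kBlockSize) / 2;  // 24 pixels

    std::vector<std::uint8_t> out(out_width * tiles_y * kBlockSize);
    for (std::size_t ty = 0; ty < tiles_y; ++ty) {
        for (std::size_t tx = 0; tx < tiles_x; ++tx) {
            for (std::size_t row = 0; row < kBlockSize; ++row) {
                const std::uint8_t* src = luma.data()
                    + (ty * kTileSize + center + row) * width
                    + tx * kTileSize + center;
                std::uint8_t* dst = out.data()
                    + (ty * kBlockSize + row) * out_width
                    + tx * kBlockSize;
                std::copy(src, src + kBlockSize, dst);
            }
        }
    }
    return out;
}
```

With this layout, each 16x16 block of S corresponds to exactly one tile of K, so a motion vector found for a block of S identifies a tile on the reference frame.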

##### Calculate permutation for a frame pair
![alt text](perm_pair.png)
When permutation data is calculated for two frames A and B, one of them serves as a reference for the other in terms of motion estimation. Let A be the reference frame - depending on the situation, it may already have absolute permutation data (calculated previously by the primary thread), relative permutation data (calculated previously by a non-primary thread), or no permutation data at all (if motion estimation by a non-primary thread failed, or frame A is the first one in a batch belonging to a non-primary thread). If frame A has absolute permutation data, then frame B will be assigned absolute permutation data after PREENC run as well, and it is marked as such. Otherwise, frame B is marked as having relative permutation data.

Next, PREENC is run on frames A and B with A as reference. The output of PREENC is a map of (multiple) motion vectors per each 16x16 block of the frame and corresponding distortion values. Afterwards, if frames A and B were down-sized using the algorithm described in the previous section, a single best motion vector is selected for each 16x16 block (representing a 64x64 tile on the full-resolution frame); otherwise, if frames A and B had full resolution, a single best motion vector is selected for each 64x64 tile. Either way, at this point a per-tile map of motion vectors is produced for frame B relative to frame A. If this map specifies a valid permutation of tiles (i.e. no two MVs point to the same tile on frame A), then the calculation is deemed successful and actual permutation data is computed and assigned to frame B; a success status is returned. Otherwise, the calculation is deemed a failure - no permutation data is computed and a failure status is returned.
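
Selecting a single best motion vector from several candidates can be sketched as below. The specification does not define "best"; this sketch assumes it means the lowest distortion value. `MvCandidate` and `SelectBestCandidate` are hypothetical names and do not correspond to FEI PREENC output structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct MotionVector {
    int x;
    int y;
};

// Hypothetical record: one candidate motion vector and its distortion value.
struct MvCandidate {
    MotionVector mv;
    std::uint32_t distortion;
};

// Returns the candidate with the lowest distortion (assumed meaning of
// "best"); on ties the earliest candidate is kept.
MvCandidate SelectBestCandidate(const std::vector<MvCandidate>& candidates)
{
    assert(!candidates.empty());
    MvCandidate best = candidates.front();
    for (const MvCandidate& candidate : candidates) {
        if (candidate.distortion < best.distortion) {
            best = candidate;
        }
    }
    return best;
}
```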

###### PREENC call specifics
As stated above, PREENC works on a 16x16 block basis. However, the range of produced MVs is limited by the PREENC window size (roughly 128x96 pixels) - see picture below:

![alt text](preenc_single.png)

For our purposes the desired MVs (specifying the tile permutation) may be larger than the PREENC window size - as large as the frame width/height. In order to ensure that each 16x16 block is being searched for across the whole frame, PREENC will be called multiple times on the same pair of frames, but each time with a different "offset vector map" - a 2D-array of vectors (x;y), one for each 16x16 block, which specify offsets of the PREENC search window from the center of the 16x16 block.

The number of PREENC calls is determined by the frame size and the PREENC window size. The principle is to break the frame into an integer number of equal search areas, each having width and height equal to the PREENC window size; the number of PREENC calls is equal to the number of search areas. By this time the frame size is aligned to 16 pixels, but not to the search area size, so the search areas will overlap, as illustrated in the following picture, which has 12 search areas (red dots correspond to the centers of the search areas):
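
One possible way to place the overlapping search areas is sketched below. The exact placement rule is not given in the text, so spreading the areas evenly between the frame edges is an assumption, as are the names `AxisCenters` and `SearchAreaCenters` and the default window of 128x96 pixels (the text only says "roughly").

```cpp
#include <cassert>
#include <vector>

struct Point {
    int x;
    int y;
};

// Centers of the search areas along one axis. The count is the smallest
// number of windows that covers the axis; windows are spread evenly so the
// first starts at 0 and the last ends at the frame edge (they may overlap).
std::vector<int> AxisCenters(int frame_dim, int window_dim)
{
    assert(frame_dim > 0 && window_dim > 0);
    const int count = (frame_dim + window_dim - 1) / window_dim;
    std::vector<int> centers;
    if (count == 1) {
        centers.push_back(frame_dim / 2);
        return centers;
    }
    for (int i = 0; i < count; ++i) {
        const int start = i * (frame_dim - window_dim) / (count - 1);
        centers.push_back(start + window_dim / 2);
    }
    return centers;
}

// Centers of all search areas in raster scan order; one PREENC call would be
// issued per returned center.
std::vector<Point> SearchAreaCenters(int width, int height,
                                     int window_w = 128, int window_h = 96)
{
    std::vector<Point> centers;
    const std::vector<int> xs = AxisCenters(width, window_w);
    const std::vector<int> ys = AxisCenters(height, window_h);
    for (int y : ys) {
        for (int x : xs) {
            centers.push_back({x, y});
        }
    }
    return centers;
}
```

For a 352x288 frame this yields a 3x3 grid, i.e. 9 search areas and therefore 9 PREENC calls per frame pair.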

![alt text](preenc.png)

For each PREENC call corresponding to one search area, the offset vector map is constructed as follows: for each 16x16 block of the frame, the offset vector is drawn from the center of the block to the center of the search area. This is illustrated in the picture below (only the offset vectors for the first 9 top-left blocks are shown):
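
The offset vector map for one search area can be sketched as below. `Vec2` and `BuildOffsetMap` are hypothetical names; offsets are expressed in whole pixels, and how they would be converted into the units expected by the actual PREENC interface is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 {
    int x;
    int y;
};

// Builds one offset vector per 16x16 block in raster scan order. Each vector
// points from the center of the block to the center of the search area.
std::vector<Vec2> BuildOffsetMap(int width, int height, Vec2 area_center)
{
    const int kBlock = 16;
    assert(width > 0 && height > 0);
    assert(width % kBlock == 0 && height % kBlock == 0);
    const int blocks_x = width / kBlock;
    const int blocks_y = height / kBlock;

    std::vector<Vec2> offsets;
    offsets.reserve(static_cast<std::size_t>(blocks_x) * blocks_y);
    for (int by = 0; by < blocks_y; ++by) {
        for (int bx = 0; bx < blocks_x; ++bx) {
            const int block_center_x = bx * kBlock + kBlock / 2;
            const int block_center_y = by * kBlock + kBlock / 2;
            offsets.push_back({area_center.x - block_center_x,
                               area_center.y - block_center_y});
        }
    }
    return offsets;
}
```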

![alt text](preenc_map.png)

The resulting motion vectors and distortion values from each call are aggregated per-16x16 block and passed higher up the architecture for purposes of finding the ultimate per-64x64 tile motion vector map.

Since each PREENC call associated with a search area is independent of the others, these calls can be distributed among threads, achieving, roughly speaking, "search-area parallelism".

###### Checking the per-tile MV map for consistency
Determining whether the per-tile MV map specifies a valid permutation of tiles is performed in the following way. First, a 2-D array of M x N boolean values `bool hitmap[M][N]` is allocated (where M and N are the width and height of the frame in tile units respectively) and each value is initialized to `false`. Next, per-tile motion vectors are processed in tile raster scan order; for each one, the coordinates N_x, N_y (in tile units) of the "target" tile, i.e. the tile that the motion vector points to when anchored at the center of the tile it belongs to (the "source" tile), are calculated. If `hitmap[N_x][N_y]` is `false`, it is set to `true` to mark that the corresponding "target" tile has been associated with one of the "source" tiles. If `hitmap[N_x][N_y]` is already `true`, the MV map is deemed not to specify valid permutation data. If all per-tile MVs are processed without encountering an entry that is already `true`, the MV map is deemed to specify valid permutation data. The complexity of this algorithm is O(M * N) in both computation and memory.
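
The consistency check can be sketched as below, using a flat `std::vector<bool>` in place of `bool hitmap[M][N]`. The function name `IsValidPermutation` is hypothetical, motion vectors are assumed to be in whole pixels, and treating a vector that points outside the frame as a failure is an assumption not stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MotionVector {
    int x;
    int y;
};

// Returns true if no two per-tile motion vectors point to the same target
// tile. mvs holds one vector per source tile in raster scan order.
bool IsValidPermutation(const std::vector<MotionVector>& mvs,
                        int tiles_x, int tiles_y)
{
    const int kTile = 64;
    assert(tiles_x > 0 && tiles_y > 0);
    assert(mvs.size() == static_cast<std::size_t>(tiles_x) * tiles_y);

    std::vector<bool> hitmap(mvs.size(), false);
    for (int sy = 0; sy < tiles_y; ++sy) {
        for (int sx = 0; sx < tiles_x; ++sx) {
            const MotionVector& mv = mvs[static_cast<std::size_t>(sy) * tiles_x + sx];
            // Point reached by the vector when anchored at the source tile center.
            const int px = sx * kTile + kTile / 2 + mv.x;
            const int py = sy * kTile + kTile / 2 + mv.y;
            if (px < 0 || py < 0 || px >= tiles_x * kTile || py >= tiles_y * kTile) {
                return false;  // assumption: out-of-frame targets are invalid
            }
            const std::size_t target =
                static_cast<std::size_t>(py / kTile) * tiles_x + px / kTile;
            if (hitmap[target]) {
                return false;  // two source tiles claim the same target tile
            }
            hitmap[target] = true;
        }
    }
    return true;
}
```

Because the number of source tiles equals the number of target tiles, the absence of collisions also means every target tile is hit exactly once.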

##### Permutation data
The permutation data format for frame B relative to frame A is simple - it is a list of integers (one integer for each tile of frame B in raster scan order), each one representing a position of the corresponding tile on frame A in raster scan order.
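
Given this format, relative permutation data can be turned into absolute permutation data by composing two lists. The post-processing step is not spelled out in the text, so the sketch below is only what follows from the format definition; `ComposePermutations` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// relative_b_to_a[i] is the position on frame A of tile i of frame B.
// absolute_a[j] is the position on the unshuffled frame of tile j of frame A.
// The result maps every tile of frame B to its position on the unshuffled frame.
std::vector<std::uint32_t> ComposePermutations(
    const std::vector<std::uint32_t>& relative_b_to_a,
    const std::vector<std::uint32_t>& absolute_a)
{
    assert(relative_b_to_a.size() == absolute_a.size());
    std::vector<std::uint32_t> absolute_b(relative_b_to_a.size());
    for (std::size_t i = 0; i < relative_b_to_a.size(); ++i) {
        assert(relative_b_to_a[i] < absolute_a.size());
        absolute_b[i] = absolute_a[relative_b_to_a[i]];
    }
    return absolute_b;
}
```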

##### Reconstructing the original stream
Since the absolute permutation data is known by the time the original stream reconstruction step is executed (i.e. each frame can be reconstructed using only its own pixel data and the permutation data), this step is easily parallelizable on the pixel level - basically, a single thread may be assigned to each tile to be replaced.
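
Reconstructing one plane of a frame from its absolute permutation data can be sketched as below. `ReconstructPlane` is a hypothetical name, the plane is assumed to be tightly packed 8-bit data, and the `tile_size` parameter exists only to keep the example easy to try on small inputs (the specification uses 64).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// permutation[i] is the raster-scan position on the unshuffled frame of tile i
// of the shuffled frame. Each tile is copied independently of the others.
std::vector<std::uint8_t> ReconstructPlane(const std::vector<std::uint8_t>& shuffled,
                                           std::size_t width,
                                           std::size_t height,
                                           const std::vector<std::uint32_t>& permutation,
                                           std::size_t tile_size = 64)
{
    assert(tile_size > 0 && width % tile_size == 0 && height % tile_size == 0);
    assert(shuffled.size() == width * height);
    const std::size_t tiles_x = width / tile_size;
    const std::size_t tile_count = tiles_x * (height / tile_size);
    assert(permutation.size() == tile_count);

    std::vector<std::uint8_t> out(shuffled.size());
    for (std::size_t i = 0; i < tile_count; ++i) {
        const std::size_t target = permutation[i];
        assert(target < tile_count);
        const std::size_t src_x = (i % tiles_x) * tile_size;
        const std::size_t src_y = (i / tiles_x) * tile_size;
        const std::size_t dst_x = (target % tiles_x) * tile_size;
        const std::size_t dst_y = (target / tiles_x) * tile_size;
        for (std::size_t row = 0; row < tile_size; ++row) {
            std::copy_n(shuffled.data() + (src_y + row) * width + src_x,
                        tile_size,
                        out.data() + (dst_y + row) * width + dst_x);
        }
    }
    return out;
}
```

The iterations of the outer loop write to disjoint regions of the output when the permutation is valid, which is what makes one-thread-per-tile execution possible.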
Binary file added vshampor/control_flow.png
20 changes: 20 additions & 0 deletions vshampor/deshuffler/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
/Debug/
*.yuv
CMakeFiles/*
bin/*
*/bin/*
*/CMakeCache.txt
CMakeCache.txt
CMakeFiles
CMakeScripts
Testing
Makefile
cmake_install.cmake
install_manifest.txt
compile_commands.json
CTestTestfile.cmake
lib/*
sample_common/lib/*
*.pc
/build/
*.lib
37 changes: 37 additions & 0 deletions vshampor/deshuffler/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
cmake_minimum_required (VERSION 3.11)
project (deshuffler)
if (NOT CMAKE_BUILD_TYPE)
message(STATUS "No build type selected, default to Debug")
set(CMAKE_BUILD_TYPE "Debug")
endif()

set (CMAKE_CXX_STANDARD 11)

set(CMAKE_BINARY_DIR bin)
set(EXECUTABLE_OUTPUT_PATH ${CMAKE_BINARY_DIR})
set(LIBRARY_OUTPUT_PATH lib)

include_directories(include)
include_directories(msdk_api/include)
include_directories(sample_common/include)

add_subdirectory(unit_tests)

add_subdirectory(sample_common)

add_library(deshuffler STATIC
src/deshuffler.cpp
src/input_params.cpp
src/permutation_data.cpp
src/yuv_reader_seek_i420.cpp)
add_dependencies(deshuffler sample_common)
target_link_libraries(deshuffler sample_common)

add_executable(deshuffler_cl src/main.cpp)
add_dependencies(deshuffler_cl deshuffler)
target_link_libraries(deshuffler_cl deshuffler)

add_custom_target(run_tests ALL
DEPENDS deshuffler
COMMAND unit_tests
WORKING_DIRECTORY unit_tests/bin/)
27 changes: 27 additions & 0 deletions vshampor/deshuffler/include/deshuffler.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
#ifndef DESHUFFLER_H_
#define DESHUFFLER_H_

#include "permutation_data.h"
#include "input_params.h"
#include "yuv_reader_seek_i420.h"
#include "permut_calc_task.h"
#include <sample_utils.h>
#include <vector> // std::vector is used in the class declaration below

class Deshuffler
{
public:
    Deshuffler() = default;
    Deshuffler(const InputParams& params) : m_params(params) {}
    void CalculatePermutation();
    void OutputPermutation();
    void ReconstructStream();
    void OutputStream();
private:
    std::vector<PermutCalcTask> GeneratePermutCalcTasks();
    mfxStatus CalculatePermutCalcTask(PermutCalcTask& task);
    InputParams m_params;
    PermutationData m_permutation_data;
    CSmplYUVWriter m_YUVWriter;
};

#endif /* DESHUFFLER_H_ */
26 changes: 26 additions & 0 deletions vshampor/deshuffler/include/input_params.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@

#ifndef INPUT_PARAMS_H_
#define INPUT_PARAMS_H_

#include <mfxdefs.h>
#include <string>

struct StreamInfo
{
    std::string filename;
    mfxU32 width;
    mfxU32 height;
    mfxU32 frame_count;
};

class InputParams
{
public:
    InputParams() = default;
    InputParams(int argc, char* argv[]);
    mfxU32 thread_count = 8;
};


#endif /* INPUT_PARAMS_H_ */
10 changes: 10 additions & 0 deletions vshampor/deshuffler/include/permutation_data.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#ifndef PERMUTATION_DATA_H_
#define PERMUTATION_DATA_H_

class PermutationData
{
public:
    PermutationData();
};

#endif /* PERMUTATION_DATA_H_ */
18 changes: 18 additions & 0 deletions vshampor/deshuffler/include/yuv_reader_seek_i420.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
#ifndef SRC_YUV_READER_SEEK_H_
#define SRC_YUV_READER_SEEK_H_

#include <sample_utils.h>
#include <string>
#include <input_params.h>

class YUVReaderSeekI420 : public CSmplYUVReader
{
public:
    mfxStatus Init(const StreamInfo& stream_info);
    void Seek(mfxU32 frame_number);
protected:
    mfxU32 m_width = 0;
    mfxU32 m_height = 0;
};

#endif /* SRC_YUV_READER_SEEK_H_ */
162 changes: 162 additions & 0 deletions vshampor/deshuffler/msdk_api/include/mfxastructures.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
// Copyright (c) 2017 Intel Corporation
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#ifndef __MFXASTRUCTURES_H__
#define __MFXASTRUCTURES_H__
#include "mfxcommon.h"

#if !defined (__GNUC__)
#pragma warning(disable: 4201)
#endif

#ifdef __cplusplus
extern "C"
{
#endif /* __cplusplus */

/* CodecId */
enum {
MFX_CODEC_AAC =MFX_MAKEFOURCC('A','A','C',' '),
MFX_CODEC_MP3 =MFX_MAKEFOURCC('M','P','3',' ')
};

enum {
/* AAC Profiles & Levels */
MFX_PROFILE_AAC_LC =2,
MFX_PROFILE_AAC_LTP =4,
MFX_PROFILE_AAC_MAIN =1,
MFX_PROFILE_AAC_SSR =3,
MFX_PROFILE_AAC_HE =5,
MFX_PROFILE_AAC_ALS =0x20,
MFX_PROFILE_AAC_BSAC =22,
MFX_PROFILE_AAC_PS =29,

/*MPEG AUDIO*/
MFX_AUDIO_MPEG1_LAYER1 =0x00000110,
MFX_AUDIO_MPEG1_LAYER2 =0x00000120,
MFX_AUDIO_MPEG1_LAYER3 =0x00000140,
MFX_AUDIO_MPEG2_LAYER1 =0x00000210,
MFX_AUDIO_MPEG2_LAYER2 =0x00000220,
MFX_AUDIO_MPEG2_LAYER3 =0x00000240
};

/*AAC HE decoder down sampling*/
enum {
MFX_AUDIO_AAC_HE_DWNSMPL_OFF=0,
MFX_AUDIO_AAC_HE_DWNSMPL_ON= 1
};

/* AAC decoder support of PS */
enum {
MFX_AUDIO_AAC_PS_DISABLE= 0,
MFX_AUDIO_AAC_PS_PARSER= 1,
MFX_AUDIO_AAC_PS_ENABLE_BL= 111,
MFX_AUDIO_AAC_PS_ENABLE_UR= 411
};

/*AAC decoder SBR support*/
enum {
MFX_AUDIO_AAC_SBR_DISABLE = 0,
MFX_AUDIO_AAC_SBR_ENABLE= 1,
MFX_AUDIO_AAC_SBR_UNDEF= 2
};

/*AAC header type*/
enum{
MFX_AUDIO_AAC_ADTS= 1,
MFX_AUDIO_AAC_ADIF= 2,
MFX_AUDIO_AAC_RAW= 3,
};

/*AAC encoder stereo mode*/
enum
{
MFX_AUDIO_AAC_MONO= 0,
MFX_AUDIO_AAC_LR_STEREO= 1,
MFX_AUDIO_AAC_MS_STEREO= 2,
MFX_AUDIO_AAC_JOINT_STEREO= 3
};

typedef struct {
mfxU32 CodecId;
mfxU16 CodecProfile;
mfxU16 CodecLevel;

mfxU32 Bitrate;
mfxU32 SampleFrequency;
mfxU16 NumChannel;
mfxU16 BitPerSample;

mfxU16 reserved1[22];

union {
struct { /* AAC Decoding Options */
mfxU16 FlagPSSupportLev;
mfxU16 Layer;
mfxU16 AACHeaderDataSize;
mfxU8 AACHeaderData[64];
};
struct { /* AAC Encoding Options */
mfxU16 OutputFormat;
mfxU16 StereoMode;
mfxU16 reserved2[61];
};
};
} mfxAudioInfoMFX;

typedef struct {
mfxU16 AsyncDepth;
mfxU16 Protected;
mfxU16 reserved[14];

mfxAudioInfoMFX mfx;
mfxExtBuffer** ExtParam;
mfxU16 NumExtParam;
} mfxAudioParam;

typedef struct {
mfxU32 SuggestedInputSize;
mfxU32 SuggestedOutputSize;
mfxU32 reserved[6];
} mfxAudioAllocRequest;

typedef struct {
mfxU64 TimeStamp; /* 1/90KHz */
mfxU16 Locked;
mfxU16 NumChannels;
mfxU32 SampleFrequency;
mfxU16 BitPerSample;
mfxU16 reserved1[7];

mfxU8* Data;
mfxU32 reserved2;
mfxU32 DataLength;
mfxU32 MaxLength;

mfxU32 NumExtParam;
mfxExtBuffer **ExtParam;
} mfxAudioFrame;

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif

