Flip operations on larger images are surprisingly slow (e.g., flipping a 1280x720 pixel image takes approximately 20 ms).
Switching from Emgu CV's Flip wrapper function to a direct CvInvoke.cvFlip call reduced the execution time to approximately 10 ms, which is still too high.
A possible future improvement would be to use GpuInvoke.Flip() or, even better, its OpenCL counterpart.
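The two CPU variants compared above can be sketched roughly as follows. This is a minimal illustration, assuming the Emgu CV 2.x API (Image<Bgr, byte>, the CvEnum.FLIP enum, and the CvInvoke.cvFlip(IntPtr, IntPtr, FLIP) overload). The class name FlipTiming, the blank test image, and the preallocated destination buffer are assumptions made for this example, not the original measurement setup.

```csharp
using System;
using System.Diagnostics;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// Hypothetical timing harness; not the original benchmark code.
class FlipTiming
{
    static void Main()
    {
        // Blank 1280x720 images; pixel contents do not matter for timing.
        using (var src = new Image<Bgr, byte>(1280, 720))
        using (var dst = new Image<Bgr, byte>(1280, 720))
        {
            var sw = Stopwatch.StartNew();
            // Wrapper: returns a newly allocated image on every call.
            using (Image<Bgr, byte> flipped = src.Flip(FLIP.HORIZONTAL)) { }
            Console.WriteLine("Image.Flip:      {0} ms", sw.Elapsed.TotalMilliseconds);

            sw.Restart();
            // Direct call: writes into a destination buffer allocated once up front.
            CvInvoke.cvFlip(src.Ptr, dst.Ptr, FLIP.HORIZONTAL);
            Console.WriteLine("CvInvoke.cvFlip: {0} ms", sw.Elapsed.TotalMilliseconds);
        }
    }
}
```

The first call in a process may include one-time costs such as loading the native library, so for meaningful numbers each variant should be run several times and averaged. Reusing the destination image avoids a per-call allocation, which is one plausible (but here unverified) contributor to the difference between the two variants.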