This repository is a modified version of the original torch_dct. The original code from the repository only supported discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) on square matrices. I have extended the functionality to support DCT and IDCT on matrices with unequal height and width, allowing for broader applicability in scenarios where non-square matrices are involved.
Below are Referenced from:
Thanks!
For a two-dimensional signal (such as an image)
where:
-
$F(u, v)$ are the DCT coefficients in the frequency domain. -
$f(x, y)$ is the signal in the spatial domain (e.g., image pixel values). -
$N$ and$M$ are the width and height of the image, respectively. -
$\alpha(u)$ and$\alpha(v)$ are normalization factors defined as:
The DCT transforms the image from the spatial domain to the frequency domain. In the frequency representation, low-frequency components are concentrated in the top-left corner, while high-frequency components are distributed towards the bottom-right corner. A notable characteristic of DCT is its ability to concentrate most of the information in the low-frequency region, which is particularly useful in image compression.
The Inverse Discrete Cosine Transform (IDCT) is the inverse operation of DCT and is used to convert the signal from the frequency domain back to the spatial domain. Its mathematical formula is as follows:
Before diving into DCT transformation, let’s first take a look at the DFT transformation formula:
Of course, we can split the above equation into:
Clearly, the real part is handled by
-
Real part:
$Re[k]=\sum_{n = 0}^{N - 1}{x[n]}cos(kt)$ -
Imaginary part:
$Im[k]=- \sum_{n = 0}^{N - 1}{x[n]}sin(kt)$
Obviously, since cosine is an even function and sine is an odd function, we get:
If the original signal
$x[n]$ is a real and even function? Clearly, since an even function multiplied by an even function is still an even function, and an odd function multiplied by an even function is still odd, we get:$x[n]sin(kt)$ becomes an odd function. Since it’s an odd function, naturally:
As you can see, after the transformation, the imaginary part vanishes. Therefore, when the original time-domain signal is a real and even signal, we can rewrite the DFT as:
In fact, this is the core idea behind the DCT transformation. It’s quite simple, right? The DCT transformation is essentially a constrained form of the DFT transformation, and it’s not because the transformation method itself is different.
But this isn’t quite enough yet!
You might notice that this still looks a bit different from the DCT formula you see in textbooks. Let’s take a look at the most commonly used DCT transformation formula:
Where, when
Otherwise:
If this is the first time you’ve seen the DCT transformation, you might be a bit confused. What’s going on here? We said that DCT is just a DFT transformation of a real and even input signal, right? Don’t worry, let me explain it in detail.
First of all, we need to reiterate that DCT is indeed a special case of the DFT transformation. That’s correct. The special part lies in the fact that the original signal is a real and even function. However, in real-world applications, we rarely have perfectly real and even signals to work with. So, to make it more broadly applicable, we construct an even signal from a real signal if the natural signal isn’t already even.
Given a discrete real signal of length
Simply speaking, the signal becomes as shown in the following figure:
The blue line represents the original signal, and the red line represents the extended signal.
This way, we’ve transformed a real signal into a real and even signal. Now, how do we write the DFT transformation for this extended signal? Clearly, the signal’s interval has now changed from
However, extending the signal in this way introduces a problem: this signal is not symmetric around
Using Euler's formula and expanding the above equation, we only need the real part, as we have already discussed that the imaginary part becomes zero:
At this point, it’s still not ideal, as
Next, let
Now, we are very close to the standard DCT formula. The remaining issue is: what is that
In the case of DCT, this term appears mainly to orthogonalize the matrix when the DCT transformation is represented in matrix form, making further computation easier. In this case, the coefficient should be set to
Multiplying this coefficient into the above equation:
An ingenious operation is to apply the dct
function on an identity matrix, which yields the DCT matrix.
Here's a specific example:
Below is an example of a
Applying DCT function to a
For a 1D signal
Taking the 0th row as an example:
Therefore:
To understand the idct
function, it is essential to first establish a key conclusion:
For a real sequence
$x[n]$ , its frequency domain representation$X[k]$ exhibits conjugate symmetry:
Here is the proof of this conclusion:
The Discrete Fourier Transform (DFT) converts a time-domain signal
According to the DFT definition, by replacing
Simplifying the exponential part:
Since
Thus,
Consider the complex conjugate of
Using the linearity of complex conjugation:
Since
Comparing
This conclusion indicates that the frequency domain representation of a real sequence exhibits conjugate symmetry, meaning that the Fourier transform of a real sequence is symmetric about the midpoint, with opposite signs for the imaginary part.
Understanding this conclusion clarifies why the code operates as follows:
V_t_r = X_v
# Note: here 't' stands for temporal
V_t_i = torch.cat([X_v[:, :1] * 0, -X_v.flip([1])[:, :-1]], dim=1)
Note: Since torch.fft.irfft only handles the first half of the frequency domain, this operation is actually valid.
To understand the function's principle, we start with the standard inverse Fourier transform formula and utilize frequency-domain conjugate symmetry.
The standard inverse discrete Fourier transform formula is:
For a real sequence
Conjugate symmetry implies that the first half of the frequency domain contains all the information, while the second half is a mirror image of the first half. This property allows us to compute only the first half of the frequency data (i.e., from
Using conjugate symmetry, the inverse Fourier transform formula can be split into the symmetric first half and the second half:
Using symmetry, the sum of the second half can be rewritten as the conjugate of the first half:
We know that:
since
Substituting the above expression into the original formula:
Notice that
Therefore, the final inverse transform formula is: