This repository is a modified version of the original torch_dct. The original code from the repository only supported discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) on square matrices. I have extended the functionality to support DCT and IDCT on matrices with unequal height and width, allowing for broader applicability in scenarios where non-square matrices are involved.
The DCT transforms the image from the spatial domain to the frequency domain. In the frequency representation, low-frequency components are concentrated in the top-left corner, while high-frequency components are distributed towards the bottom-right corner. A notable characteristic of DCT is its ability to concentrate most of the information in the low-frequency region, which is particularly useful in image compression.
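The energy-compaction property can be checked on a small example. The sketch below is plain Python and does not use this repository's functions; `dct_ortho` is a hypothetical helper that evaluates the orthonormal DCT-II directly, and the ramp signal is an arbitrary choice of smooth input.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II of a 1D list (direct O(N^2) evaluation, illustration only)."""
    N = len(x)
    out = []
    for k in range(N):
        # k = 0 uses sqrt(1/N); every other k uses sqrt(2/N)
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(x[n] * math.cos((n + 0.5) * math.pi * k / N)
                               for n in range(N)))
    return out

signal = [float(n) for n in range(8)]      # smooth ramp 0, 1, ..., 7
coeffs = dct_ortho(signal)

total_energy = sum(c * c for c in coeffs)  # equals sum(x^2) for an orthonormal transform
low_energy = sum(c * c for c in coeffs[:2])
fraction_low = low_energy / total_energy
print(f"share of energy in the first 2 of 8 coefficients: {fraction_low:.4f}")
```

For this smooth input, almost all of the energy lands in the two lowest-frequency coefficients, which is what makes discarding high-frequency coefficients cheap in compression.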
The Inverse Discrete Cosine Transform (IDCT) is the inverse operation of DCT and is used to convert the signal from the frequency domain back to the spatial domain. Its mathematical formula is as follows:
Clearly, the real part is given by $\sum_{n = 0}^{N - 1}{x[n]}\cos\left(\frac{2\pi k n}{N}\right)$, and the imaginary part by $- j \sum_{n = 0}^{N - 1}{x[n]}\sin\left(\frac{2\pi k n}{N}\right)$. Let’s define $\cos\left(\frac{2\pi k n}{N}\right) = \cos(kt)$, that is, $t = \frac{2\pi n}{N}$, so we can summarize the equation as follows:
Real part: $Re[k]=\sum_{n = 0}^{N - 1}{x[n]}\cos(kt)$
Imaginary part: $Im[k]=-\sum_{n = 0}^{N - 1}{x[n]}\sin(kt)$
Since cosine is an even function and sine is an odd function, we get:
$$
\cos(-kt)=\cos(kt), \quad \sin(-kt)=-\sin(kt)
$$
What if the original signal $x[n]$ is real and even? The product of two even functions is even, and the product of an even function and an odd function is odd, so $x[n]\sin(kt)$ is an odd function of $n$. The terms of an odd function cancel in pairs when summed over a full period, so:
$$
Im[k]=-\sum_{n = 0}^{N - 1}{x[n]}\sin(kt)=0
$$
As you can see, after the transformation, the imaginary part vanishes. Therefore, when the original time-domain signal is real and even, we can rewrite the DFT as:
$$
X[k]=\sum_{n = 0}^{N - 1}{x[n]}\cos(kt)
$$
In fact, this is the core idea behind the DCT. It’s quite simple, right? The DCT is essentially the DFT restricted to real, even input; the transform itself is no different.
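The vanishing imaginary part is easy to verify numerically. The sketch below is plain Python with a hypothetical `dft` helper; it assumes "even" is meant in the periodic sense $x[n] = x[N-n]$, and the sample values are arbitrary.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, for illustration only."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Real and even in the periodic sense: x[n] == x[N - n] for n = 1..N-1
x_even = [4.0, 3.0, 1.0, 0.0, 1.0, 3.0]
N = len(x_even)

X = dft(x_even)
max_imag = max(abs(c.imag) for c in X)

# The cosine-only sum should reproduce the full DFT
cos_only = [sum(x_even[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
            for k in range(N)]
max_diff = max(abs(X[k].real - cos_only[k]) for k in range(N))

print("largest imaginary part:", max_imag)
print("largest difference from the cosine-only sum:", max_diff)
```

Both printed values should be at the level of floating-point round-off.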
But this isn’t quite enough yet!
You might notice that this still looks a bit different from the DCT formula you see in textbooks. Let’s take a look at the most commonly used DCT transformation formula:
If this is the first time you’ve seen the DCT transformation, you might be a bit confused. What’s going on here? We said that DCT is just a DFT transformation of a real and even input signal, right? Don’t worry, let me explain it in detail.
First of all, we need to reiterate that DCT is indeed a special case of the DFT transformation. That’s correct. The special part lies in the fact that the original signal is a real and even function. However, in real-world applications, we rarely have perfectly real and even signals to work with. So, to make it more broadly applicable, we construct an even signal from a real signal if the natural signal isn’t already even.
Given a discrete real signal of length $N$, $\{x[0], x[1], \dots, x[N-1]\}$, we first extend its length to twice the original, making it $2N$. We define the new signal $x^{'}[m]$ as:
$$
x^{'}[m]=x[m] \quad (0\leq m \leq N-1)
$$
$$
x^{'}[m]=x[-m-1] \quad (-N\leq m \leq -1)
$$
Simply speaking, the signal becomes as shown in the following figure:
The blue line represents the original signal, and the red line represents the extended signal.
This way, we’ve transformed a real signal into a real and even signal. Now, how do we write the DFT transformation for this extended signal? Clearly, the signal’s interval has now changed from $[0, N-1]$ to $[-N, N-1]$, so the DFT formula becomes:
$$
X[k]=\sum_{m=-N}^{N-1}{x^{'}[m]e^{\frac{-j2\pi mk}{2N}}} \quad \text{(Note that the length of the extended signal is now 2N)}
$$
However, extending the signal in this way introduces a problem: this signal is not symmetric around $m=0$, but around $m=-\frac{1}{2}$. Therefore, to make the signal symmetric about the origin, it’s a good idea to shift the entire extended signal by $\frac{1}{2}$ units to the right:
At this point, it’s still not ideal, as $m$ turns out to be a fraction and can even be negative. Therefore, we need to further modify the equation. Since we know that the sequence is even-symmetric, we can modify it as follows:
Now, we are very close to the standard DCT formula. The remaining issue is: what is that $\alpha(u)$ term in the standard formula?
In the case of DCT, this term appears mainly to orthogonalize the matrix when the DCT transformation is represented in matrix form, making further computation easier. In this case, the coefficient should be set to $\sqrt{\frac{1}{2N}}$ (except when $k=0$, which requires separate consideration; for a detailed derivation, refer to the reference).
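Multiplying this coefficient into the above equation, the factor of 2 that comes from folding the symmetric sum combines with $\sqrt{\frac{1}{2N}}$ to give $\sqrt{\frac{2}{N}}$:
$$
X[k]=\sqrt{\frac{2}{N}}\sum_{n=0}^{N-1}{x^{'}[n]\cos\left(\frac{(n+\frac{1}{2}) \pi k}{N}\right)}
$$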
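The derivation can be checked end to end: build the mirrored $2N$-point signal, take its DFT with the half-sample shift, scale by $\sqrt{\frac{1}{2N}}$, and compare with the cosine formula scaled by $\sqrt{\frac{2}{N}}$. This is a minimal sketch in plain Python; the function names are hypothetical, and the separate scaling for $k=0$ is deliberately left out, as in the text.

```python
import cmath
import math

def dct_direct(x):
    """sqrt(2/N) * sum_n x[n] * cos((n + 1/2) * pi * k / N), no special case for k = 0."""
    N = len(x)
    return [math.sqrt(2.0 / N)
            * sum(x[n] * math.cos((n + 0.5) * math.pi * k / N) for n in range(N))
            for k in range(N)]

def dct_via_mirrored_dft(x):
    """2N-point DFT of the mirrored signal, shifted by half a sample, times sqrt(1/(2N))."""
    N = len(x)
    # x'[m] = x[m] for 0 <= m <= N-1, and x'[m] = x[-m-1] for -N <= m <= -1
    ext = {m: (x[m] if m >= 0 else x[-m - 1]) for m in range(-N, N)}
    out = []
    for k in range(N):
        s = sum(ext[m] * cmath.exp(-1j * math.pi * (m + 0.5) * k / N)
                for m in range(-N, N))
        out.append(math.sqrt(1.0 / (2 * N)) * s)
    return out

x = [1.0, 4.0, -2.0, 0.5, 3.0]   # arbitrary real signal, not even
direct = dct_direct(x)
via_dft = dct_via_mirrored_dft(x)

max_imag = max(abs(c.imag) for c in via_dft)
max_diff = max(abs(via_dft[k].real - direct[k]) for k in range(len(x)))
print("largest imaginary part:", max_imag)
print("largest difference between the two routes:", max_diff)
```

The input does not need to be even; the mirroring step supplies the symmetry that makes the imaginary part cancel.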
Generating the DCT Matrix
An ingenious operation is to apply the dct function on an identity matrix, which yields the DCT matrix.
Here's a specific example:
Below is an example of a $4 \times 4$ DCT matrix for better understanding:
Applying the DCT function to a $4 \times 4$ identity matrix results in a DCT matrix $D$ like the one above. We construct this matrix row by row according to the DCT definition.
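The identity-matrix trick can be sketched without PyTorch. The code below uses a hypothetical `dct_ortho` helper (orthonormal DCT-II, with $\sqrt{\frac{1}{N}}$ for $k=0$) rather than this repository's `dct` function, so treat it as an illustration of the idea and not of the library's exact output layout.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II of a 1D list (direct evaluation, illustration only)."""
    N = len(x)
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(x[n] * math.cos((n + 0.5) * math.pi * k / N)
                               for n in range(N)))
    return out

N = 4
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

# The DCT of the i-th basis vector is the i-th column of D,
# so transform each row of the identity and transpose the result.
columns = [dct_ortho(e_i) for e_i in identity]
D = [[columns[i][k] for i in range(N)] for k in range(N)]

# D is orthogonal: D @ D^T should be the identity
gram = [[sum(D[r][n] * D[c][n] for n in range(N)) for c in range(N)]
        for r in range(N)]

# Multiplying by D should agree with calling the transform directly
v = [2.0, -1.0, 0.5, 3.0]
Dv = [sum(D[k][n] * v[n] for n in range(N)) for k in range(N)]

for row in D:
    print(["%.4f" % value for value in row])
```

Because $D$ is orthogonal, the inverse transform is multiplication by $D^{T}$.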
For a 1D signal $x$ of length $N$, the $k$-th DCT coefficient is defined as:
Comparing $X[N - k]$ and $\overline{X[k]}$ yields:
$$
X[N - k] = \overline{X[k]}.
$$
This conclusion shows that the frequency-domain representation of a real sequence is conjugate-symmetric: the real part is symmetric about the midpoint $k = \frac{N}{2}$, while the imaginary part has the same magnitude but the opposite sign.
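A quick numerical check of $X[N - k] = \overline{X[k]}$, as a plain-Python sketch with a hypothetical `dft` helper and an arbitrary real input:

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, for illustration only."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]   # arbitrary real sequence
N = len(x)
X = dft(x)

# Compare X[N - k] with the complex conjugate of X[k] for k = 1..N-1
max_dev = max(abs(X[N - k] - X[k].conjugate()) for k in range(1, N))
print("largest deviation from conjugate symmetry:", max_dev)
```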
Understanding this conclusion clarifies why the code operates as follows:
V_t_r = X_v  # Note: here 't' stands for temporal
V_t_i = torch.cat([X_v[:, :1] * 0, -X_v.flip([1])[:, :-1]], dim=1)
Note: This operation is valid because torch.fft.irfft takes only the first half of the spectrum as input and assumes the second half follows from conjugate symmetry.
Principle of torch.fft.irfft
To understand the function's principle, we start with the standard inverse Fourier transform formula and utilize frequency-domain conjugate symmetry.
The standard inverse discrete Fourier transform formula is:
$$
x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k] e^{j \frac{2\pi k n}{N}}.
$$
For a real sequence $x[n]$, its discrete Fourier transform $X[k]$ satisfies the following conjugate symmetry:
$$
X[N - k] = \overline{X[k]}.
$$
Conjugate symmetry implies that the first half of the spectrum contains all the information, while the second half is a conjugate mirror image of the first. This property lets us keep only the first half of the frequency data (i.e., from $k = 0$ to $\frac{N}{2}$) and use symmetry to recover the rest when reconstructing the signal.
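The idea can be illustrated by discarding the second half of a spectrum and rebuilding it from the first. This is a plain-Python sketch of the principle under the assumption that $N$ is even; it is not how torch.fft.irfft is implemented, and `dft` and `idft` are hypothetical helpers.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, for illustration only."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Direct inverse DFT: (1/N) * sum_k X[k] * exp(+j 2 pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]   # real signal, N even
N = len(x)

half = dft(x)[: N // 2 + 1]                       # keep only k = 0 .. N/2

# Rebuild k = N/2 + 1 .. N-1 from X[k] = conj(X[N - k])
full = list(half) + [half[N - k].conjugate() for k in range(N // 2 + 1, N)]

x_rec = idft(full)
max_err = max(abs(x_rec[n].real - x[n]) for n in range(N))
max_imag = max(abs(value.imag) for value in x_rec)
print("largest reconstruction error:", max_err)
print("largest imaginary residue:", max_imag)
```

Only $\frac{N}{2} + 1$ complex values are kept, yet the full real signal of length $N$ is recovered.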
Using conjugate symmetry, the inverse Fourier transform formula can be split into the symmetric first half and the second half:
$$
x[n] = \frac{1}{N} \sum_{k=0}^{\frac{N}{2}} X[k] e^{j \frac{2\pi k n}{N}} + \frac{1}{N} \sum_{k=\frac{N}{2}+1}^{N-1} X[k] e^{j \frac{2\pi k n}{N}}.
$$
Using symmetry, the sum of the second half can be rewritten as the conjugate of the first half: