
- Introduction
- Highlights
- Objective
- Methodology
- Software and Libraries Used
- Directory Structure
- Project Flow Diagram
- Usage
- Setup
- Examples
- Future Work
- Contributing
- License
- Acknowledgments
## Introduction

This project implements a lossless text file compression and decompression tool using the **Huffman Coding algorithm**. The tool compresses text files by assigning variable-length binary codes to characters based on their frequency of occurrence, typically reducing text file size by 40-50%.
Huffman Coding is a greedy algorithm that builds an optimal prefix-code tree: no code is a prefix of any other, so the compressed bitstream is uniquely decodable and the original data is recovered without loss.
## Highlights

✅ Lossless Compression - Perfect reconstruction of original files
✅ Variable-Length Encoding - More frequent characters get shorter codes
✅ Optimal Compression - Uses Huffman's greedy algorithm for optimal prefix codes
✅ Cross-Platform - Works on Windows, macOS, and Linux
✅ CLI Interface - Easy-to-use command-line interface
✅ Modular Design - Clean separation of concerns with dedicated classes
✅ Comprehensive Tests - Unit and integration tests for reliability
✅ Optimized File Format - Minimal header overhead for better compression ratios
## Objective

The primary objectives of this project are to:
- Demonstrate the practical implementation of Huffman Coding algorithm in Java
- Achieve significant compression of text files while maintaining data integrity
- Provide a user-friendly CLI tool for file compression and decompression
- Showcase software engineering best practices including modular design, testing, and documentation
- Create an educational resource for understanding lossless compression algorithms
## Methodology

### Compression

The compression process follows these key steps:
- Frequency Analysis - Calculate frequency of each character in the input file
- Huffman Tree Construction - Build optimal binary tree using a min-heap (priority queue)
- Code Generation - Assign binary codes: '0' for left branches, '1' for right branches
- File Header Creation - Store character frequencies for decompression
- Bit-Level Encoding - Replace characters with their Huffman codes and pack into bytes
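Steps 1-3 above can be sketched in a few methods. This is a minimal illustration of the approach, not the project's actual `HuffmanEncoder` API; class and method names here are invented for the example:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch of frequency analysis, min-heap tree construction, and code generation.
public class HuffmanSketch {
    // A tree node: leaves carry a character, internal nodes carry null.
    static class Node implements Comparable<Node> {
        final Character ch;
        final int freq;
        final Node left, right;
        Node(Character ch, int freq, Node left, Node right) {
            this.ch = ch; this.freq = freq; this.left = left; this.right = right;
        }
        public int compareTo(Node o) { return Integer.compare(freq, o.freq); }
    }

    // Step 1: count how often each character occurs.
    static Map<Character, Integer> frequencies(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);
        return freq;
    }

    // Step 2: repeatedly merge the two lowest-frequency nodes from a min-heap.
    static Node buildTree(Map<Character, Integer> freq) {
        PriorityQueue<Node> heap = new PriorityQueue<>();
        freq.forEach((c, f) -> heap.add(new Node(c, f, null, null)));
        while (heap.size() > 1) {
            Node a = heap.poll(), b = heap.poll();
            heap.add(new Node(null, a.freq + b.freq, a, b));
        }
        return heap.poll();
    }

    // Step 3: assign '0' to left branches and '1' to right branches.
    static void assignCodes(Node node, String prefix, Map<Character, String> codes) {
        if (node == null) return;
        if (node.ch != null) {                      // leaf: record accumulated code
            codes.put(node.ch, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assignCodes(node.left, prefix + "0", codes);
        assignCodes(node.right, prefix + "1", codes);
    }
}
```

Because the two least frequent nodes are merged first, rarer characters end up deeper in the tree and thus receive longer codes.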
### Decompression

The decompression process reverses these steps:
- Header Parsing - Read character frequencies from the compressed file
- Tree Reconstruction - Rebuild the Huffman tree using stored frequencies
- Bit-by-Bit Decoding - Traverse tree according to bits until leaf nodes are reached
- Character Output - Write decoded characters to output file
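The bit-by-bit decoding step above can be sketched as a tree walk: go left on `0`, right on `1`, and emit a character whenever a leaf is reached. The bit string stands in for the packed bits read from the compressed file; names are illustrative rather than the project's actual `HuffmanDecoder` API:

```java
// Sketch of decoding by traversing the Huffman tree bit by bit.
public class HuffmanDecodeSketch {
    static class Node {
        final Character ch;       // non-null only for leaves
        final Node left, right;
        Node(Character ch, Node left, Node right) {
            this.ch = ch; this.left = left; this.right = right;
        }
    }

    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (char bit : bits.toCharArray()) {
            node = (bit == '0') ? node.left : node.right;
            if (node.ch != null) {    // leaf reached: emit and restart at the root
                out.append(node.ch);
                node = root;
            }
        }
        return out.toString();
    }
}
```

The prefix property is what makes this loop correct: a leaf is reached exactly once per encoded character, with no ambiguity about where one code ends and the next begins.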
### Key Data Structures and Techniques

- Min-Heap (Priority Queue) - For efficient tree construction
- Greedy Algorithm - Huffman's approach for optimal code assignment
- Binary Tree Traversal - For code generation and decoding
- Bit Manipulation - For efficient binary I/O operations
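The bit-manipulation item above deserves a concrete sketch: Huffman codes are bit strings, but files are written byte by byte, so bits must be accumulated and flushed as full bytes, with the final partial byte zero-padded (a real format also records how many padding bits to ignore). This is an illustrative stand-in, not the project's `BitWriterReader`:

```java
import java.io.ByteArrayOutputStream;

// Sketch of packing a string of '0'/'1' characters into bytes.
public class BitPackSketch {
    static byte[] pack(String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0, count = 0;
        for (char bit : bits.toCharArray()) {
            buffer = (buffer << 1) | (bit == '1' ? 1 : 0);
            if (++count == 8) {                // a full byte is ready: flush it
                out.write(buffer);
                buffer = 0;
                count = 0;
            }
        }
        if (count > 0)                         // zero-pad the trailing partial byte
            out.write(buffer << (8 - count));
        return out.toByteArray();
    }
}
```

For example, `pack("01000001")` yields the single byte `0x41`, and `pack("1")` yields `0x80` (the lone bit followed by seven padding zeros).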
## Software and Libraries Used

| Technology | Version | Purpose |
|---|---|---|
| Java | 21.0.5 | Core programming language |
| Gradle | 8.14 | Build automation and dependency management |
| JUnit 5 | 5.10.0 | Unit testing framework |
| IntelliJ IDEA | 2024.2.6 | Integrated Development Environment |
### Java Standard Libraries

- `java.util.PriorityQueue` - Min-heap for Huffman tree construction
- `java.util.HashMap` - Frequency counting and code mapping
- `java.io.*` - File I/O operations
- `java.nio.file.*` - Modern file handling
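The header creation and parsing steps from the methodology (storing character frequencies so the decoder can rebuild the same tree) can be sketched with `java.io` data streams. The field layout here is an assumption for illustration; the project's actual header format may differ:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a minimal header: an entry count, then (character, frequency) pairs.
public class HeaderSketch {
    static byte[] writeHeader(Map<Character, Integer> freq) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(freq.size());               // number of entries
            for (Map.Entry<Character, Integer> e : freq.entrySet()) {
                out.writeChar(e.getKey());           // the character
                out.writeInt(e.getValue());          // its frequency
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);       // cannot happen for in-memory streams
        }
    }

    static Map<Character, Integer> readHeader(byte[] header) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
            Map<Character, Integer> freq = new LinkedHashMap<>();
            int n = in.readInt();
            for (int i = 0; i < n; i++) freq.put(in.readChar(), in.readInt());
            return freq;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since both sides rebuild the tree from identical frequencies (with deterministic tie-breaking), the decoder's tree matches the encoder's exactly.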
## Directory Structure

```
HuffmanCompressor/
│
├── build.gradle.kts              # Gradle Kotlin DSL build configuration
├── settings.gradle.kts           # Gradle project settings (rootProject.name, etc.)
├── README.md                     # Project documentation
├── .gitignore                    # Git ignore rules
│
├── out/                          # Gradle build outputs (ignored by git)
│   ├── classes/
│   │   └── java/
│   │       ├── main/             # Compiled main source .class files
│   │       └── test/             # Compiled test .class files
│   ├── libs/                     # Built JAR files
│   ├── reports/                  # Test + build reports
│   └── ...
│
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── com/
│   │   │       └── huffman/
│   │   │           ├── Main.java             # Entry point class
│   │   │           ├── HuffmanNode.java      # Tree node class
│   │   │           ├── HuffmanEncoder.java   # Compression engine
│   │   │           ├── HuffmanDecoder.java   # Decompression engine
│   │   │           ├── FileHandler.java      # File I/O utilities
│   │   │           └── BitWriterReader.java  # Bit-level write/read helper
│   │   └── resources/
│   │       └── sample.txt        # Test input file
│   │
│   └── test/
│       ├── java/
│       │   └── com/
│       │       └── huffman/
│       │           ├── HuffmanEncoderTest.java
│       │           ├── HuffmanDecoderTest.java
│       │           └── FileHandlerTest.java
│       └── resources/
│           └── test_sample.txt
│
└── gradle/                       # Gradle wrapper files
    └── wrapper/
        ├── gradle-wrapper.jar
        └── gradle-wrapper.properties
```
## Usage

Run the tool directly from the built JAR:

```shell
java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar compress input.txt compressed.bin
java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar decompress compressed.bin restored.txt
```

Or via the Gradle wrapper:

```shell
./gradlew runHuffman --args="compress input.txt output.bin"
./gradlew runHuffman --args="decompress output.bin restored.txt"
```
## Setup

### Prerequisites

- Java Development Kit (JDK) 17 or higher
- Git (for cloning the repository)
- Windows PowerShell or Command Prompt (Windows users)
- Clone the repository:

```shell
git clone https://github.com/antilneeraj/HuffmanCompressor.git
cd HuffmanCompressor
```

- Build the project:

```shell
./gradlew clean build
```

- Run tests:

```shell
./gradlew test
```

- The executable JAR will be generated at:

```
out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar
```
### PowerShell Aliases (Optional)

For convenient usage, set up PowerShell aliases:

- Open PowerShell as Administrator
- Create/edit your PowerShell profile:

```powershell
notepad $profile
```

- Add these functions to the profile, replacing `C:\path\to\your\project\` with your actual project path:

```powershell
function huffc {
    param($inputFile, $outputFile)
    java -jar "C:\path\to\your\project\out\libs\HuffmanCompressor-1.0-SNAPSHOT.jar" compress $inputFile $outputFile
}

function huffd {
    param($inputFile, $outputFile)
    java -jar "C:\path\to\your\project\out\libs\HuffmanCompressor-1.0-SNAPSHOT.jar" decompress $inputFile $outputFile
}
```

- Save and reload your profile:

```powershell
. $profile
```

- Now you can use the short commands:

```powershell
huffc input.txt compressed.bin
huffd compressed.bin restored.txt
```
## Examples

Create a sample file and compress it:

```shell
echo "This is a sample text file for Huffman compression testing." > sample.txt
java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar compress sample.txt sample.bin
```

Original: ~62 bytes; Compressed: ~35-40 bytes (a typical 35-45% reduction).

Decompress and verify the round trip:

```shell
java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar decompress sample.bin recovered.txt
fc sample.txt recovered.txt   # Windows
diff sample.txt recovered.txt # Linux/macOS
```
### Performance Notes

- Compression Ratio: 35-50% size reduction for typical text files
- Time Complexity: O(n log k), where n = file size and k = number of unique characters
- Space Complexity: O(k) for storing the Huffman tree
- File Types: Works best with `.txt`, `.java`, `.html`, `.css`, `.js`, etc.
## Future Work

- GUI Interface - Develop a user-friendly graphical interface
- Batch Processing - Support for compressing multiple files at once
- Directory Compression - Compress entire folders recursively
- Compression Statistics - Detailed reports on compression ratios and performance
- Advanced Algorithms - Implement LZ77, LZW, or hybrid compression methods
- Binary File Support - Extend support to non-text file types
- Progress Indicators - Show compression/decompression progress for large files
- Compression Level Options - Different optimization levels for speed vs. ratio trade-offs
- Adaptive Huffman Coding - Update tree dynamically as file is processed
- Canonical Huffman Codes - More efficient code representation
- Multi-threading - Parallel processing for large files
- Memory Optimization - Streaming compression for files larger than available RAM
## Contributing

Contributions are welcome! Please feel free to submit issues, suggestions, or pull requests.

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow Java coding conventions
- Write unit tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting
## License

This project is licensed under the MIT License.
## Acknowledgments

- David A. Huffman - For the Huffman Coding algorithm (1952)
- GeeksforGeeks - For algorithmic reference and inspiration
- Java Community - For excellent documentation and libraries
⭐ If you found this project helpful, please consider giving it a star!