
A simple, lossless text file compressor and decompressor built in Java using Huffman Coding. It shrinks files by giving shorter codes to frequent characters, and restores them exactly when decompressed. Features include a command‑line interface, clean modular design, and efficient bit‑level file handling.


Text File Compression and Decompression Using Huffman Coding


(Figure: compression ratio graph)


Introduction

This project implements a lossless text file compression and decompression tool using the **Huffman Coding algorithm**. The tool efficiently compresses text files by assigning variable-length binary codes to characters based on their frequency of occurrence, typically achieving 40-50% compression for text files.

Huffman Coding is a greedy algorithm that builds an optimal prefix code tree, ensuring that no code is a prefix of another, making the compressed data uniquely decodable without data loss.
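To make the prefix property concrete, here is a small illustrative check (not part of the project's codebase) that no code in a set is a prefix of another, which is exactly what makes the bit stream uniquely decodable:

```java
import java.util.List;

public class PrefixCheck {
    // Returns true if no code is a prefix of another code in the set.
    // Huffman codes always satisfy this, because characters live only
    // at the leaves of the tree.
    static boolean isPrefixFree(List<String> codes) {
        for (String a : codes)
            for (String b : codes)
                if (!a.equals(b) && b.startsWith(a)) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrefixFree(List.of("0", "10", "11")));  // prints "true"
        System.out.println(isPrefixFree(List.of("0", "01", "11"))); // prints "false": "0" prefixes "01"
    }
}
```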


Highlights

  • Lossless Compression - Perfect reconstruction of original files
  • Variable-Length Encoding - More frequent characters get shorter codes
  • Optimal Compression - Uses Huffman's greedy algorithm for optimal prefix codes
  • Cross-Platform - Works on Windows, macOS, and Linux
  • CLI Interface - Easy-to-use command-line interface
  • Modular Design - Clean separation of concerns with dedicated classes
  • Comprehensive Tests - Unit and integration tests for reliability
  • Optimized File Format - Minimal header overhead for better compression ratios


Objective

The primary objectives of this project are to:

  1. Demonstrate a practical implementation of the Huffman Coding algorithm in Java
  2. Achieve significant compression of text files while maintaining data integrity
  3. Provide a user-friendly CLI tool for file compression and decompression
  4. Showcase software engineering best practices including modular design, testing, and documentation
  5. Create an educational resource for understanding lossless compression algorithms

Methodology

Compression and decompression follow these key steps:

Compression

  1. Frequency Analysis - Calculate frequency of each character in the input file
  2. Huffman Tree Construction - Build optimal binary tree using a min-heap (priority queue)
  3. Code Generation - Assign binary codes: '0' for left branches, '1' for right branches
  4. File Header Creation - Store character frequencies for decompression
  5. Bit-Level Encoding - Replace characters with their Huffman codes and pack into bytes
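
Steps 1-3 above can be sketched in plain Java. This is a simplified illustration, not the project's actual `HuffmanEncoder`; the `Node` class and method names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanSketch {
    // Minimal tree node (the project's HuffmanNode class likely looks similar).
    static class Node {
        final char ch; final int freq; final Node left, right;
        Node(char ch, int freq) { this(ch, freq, null, null); }
        Node(char ch, int freq, Node l, Node r) {
            this.ch = ch; this.freq = freq; left = l; right = r;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Frequency analysis, tree construction via min-heap, code generation.
    static Map<Character, String> buildCodes(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);

        PriorityQueue<Node> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.freq, b.freq));
        freq.forEach((c, f) -> heap.add(new Node(c, f)));

        while (heap.size() > 1) {                 // greedily merge the two rarest
            Node l = heap.poll(), r = heap.poll();
            heap.add(new Node('\0', l.freq + r.freq, l, r));
        }

        Map<Character, String> codes = new HashMap<>();
        assign(heap.poll(), "", codes);
        return codes;
    }

    static void assign(Node n, String path, Map<Character, String> codes) {
        if (n == null) return;
        // Single-character input: give it the one-bit code "0".
        if (n.isLeaf()) { codes.put(n.ch, path.isEmpty() ? "0" : path); return; }
        assign(n.left, path + "0", codes);   // '0' for left branches
        assign(n.right, path + "1", codes);  // '1' for right branches
    }

    public static void main(String[] args) {
        Map<Character, String> codes = buildCodes("aaaabbc");
        System.out.println(codes); // 'a' (most frequent) gets the shortest code
    }
}
```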

Decompression

  1. Header Parsing - Read character frequencies from compressed file
  2. Tree Reconstruction - Rebuild the Huffman tree using stored frequencies
  3. Bit-by-Bit Decoding - Traverse tree according to bits until leaf nodes are reached
  4. Character Output - Write decoded characters to output file
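
The bit-by-bit decoding in step 3 can be sketched as follows. The tree here is built by hand (codes a="1", b="01", c="00") purely for illustration; the project reconstructs it from the stored frequencies:

```java
public class DecodeSketch {
    static class Node {
        final char ch; final Node left, right;
        Node(char ch) { this.ch = ch; left = right = null; }
        Node(Node l, Node r) { ch = '\0'; left = l; right = r; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Walk the tree one bit at a time; emit a character at each leaf,
    // then restart from the root.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.left : cur.right;
            if (cur.isLeaf()) { out.append(cur.ch); cur = root; }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built tree: left subtree holds c ("00") and b ("01"), right leaf is a ("1").
        Node root = new Node(new Node(new Node('c'), new Node('b')), new Node('a'));
        System.out.println(decode(root, "1101100")); // prints "aabac"
    }
}
```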

Key Algorithms Used

  • Min-Heap (Priority Queue) - For efficient tree construction
  • Greedy Algorithm - Huffman's approach for optimal code assignment
  • Binary Tree Traversal - For code generation and decoding
  • Bit Manipulation - For efficient binary I/O operations
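
Bit manipulation here means packing strings of '0'/'1' code characters into actual bytes. A minimal sketch, assuming an MSB-first layout (the project's `BitWriterReader` may use a different convention):

```java
public class BitPacking {
    // Pack a string of '0'/'1' chars into bytes, most significant bit first.
    // The last byte is zero-padded, so the decoder must also know the true
    // bit count to avoid decoding the padding.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++)
            if (bits.charAt(i) == '1')
                out[i / 8] |= (byte) (1 << (7 - i % 8));
        return out;
    }

    // Recover exactly bitCount bits, ignoring any padding in the last byte.
    static String unpack(byte[] bytes, int bitCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bitCount; i++)
            sb.append(((bytes[i / 8] >> (7 - i % 8)) & 1) == 1 ? '1' : '0');
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "110110001";
        byte[] packed = pack(bits);                        // 9 bits fit in 2 bytes
        System.out.println(unpack(packed, bits.length())); // prints "110110001"
    }
}
```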

Project Flow Diagram


Software and Libraries Used

| Technology    | Version  | Purpose                                    |
|---------------|----------|--------------------------------------------|
| Java          | 21.0.5   | Core programming language                  |
| Gradle        | 8.14     | Build automation and dependency management |
| JUnit 5       | 5.10.0   | Unit testing framework                     |
| IntelliJ IDEA | 2024.2.6 | Integrated Development Environment         |

Core Java Libraries

  • java.util.PriorityQueue - Min-heap for Huffman tree construction
  • java.util.HashMap - Frequency counting and code mapping
  • java.io.* - File I/O operations
  • java.nio.file.* - Modern file handling
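
As an illustration of how `java.io` streams can persist the frequency header described above: the project's actual header layout is not documented here, so the format below (a count followed by char/frequency pairs) is a hypothetical sketch:

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class HeaderSketch {
    // Hypothetical header layout (the real format may differ):
    // an int entry count, then (char, int frequency) pairs.
    static byte[] writeHeader(Map<Character, Integer> freq) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(freq.size());
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            out.writeChar(e.getKey());
            out.writeInt(e.getValue());
        }
        return bos.toByteArray();
    }

    static Map<Character, Integer> readHeader(byte[] header) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        Map<Character, Integer> freq = new HashMap<>();
        int n = in.readInt();
        for (int i = 0; i < n; i++) freq.put(in.readChar(), in.readInt());
        return freq;
    }

    public static void main(String[] args) throws IOException {
        Map<Character, Integer> freq = Map.of('a', 4, 'b', 2, 'c', 1);
        // Round trip: the decoder recovers exactly the table the encoder stored.
        System.out.println(readHeader(writeHeader(freq)).equals(freq)); // prints "true"
    }
}
```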

Directory Structure

HuffmanCompressor/
│
├── build.gradle.kts       # Gradle Kotlin DSL build configuration
├── settings.gradle.kts    # Gradle project settings (rootProject.name, etc.)
├── README.md              # Project documentation
├── .gitignore             # Git ignore rules
│
├── out/                   # Gradle build outputs (ignored by git)
│   ├── classes/
│   │   └── java/
│   │       ├── main/      # Compiled main source .class files
│   │       └── test/      # Compiled test .class files
│   ├── libs/              # Built JAR files
│   ├── reports/           # Test + build reports
│   └── ...
│
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── com/
│   │   │       └── huffman/
│   │   │           ├── Main.java               # Entry point class
│   │   │           ├── HuffmanNode.java        # Tree node class
│   │   │           ├── HuffmanEncoder.java     # Compression engine
│   │   │           ├── HuffmanDecoder.java     # Decompression engine
│   │   │           ├── FileHandler.java        # File I/O utilities
│   │   │           └── BitWriterReader.java    # Bit-level write/read helper
│   │   └── resources/
│   │       └── sample.txt                      # Test input file
│   │
│   └── test/
│       ├── java/
│       │   └── com/
│       │       └── huffman/
│       │           ├── HuffmanEncoderTest.java
│       │           ├── HuffmanDecoderTest.java
│       │           └── FileHandlerTest.java
│       └── resources/
│           └── test_sample.txt
│
└── gradle/                 # Gradle wrapper files
    └── wrapper/
        ├── gradle-wrapper.jar
        └── gradle-wrapper.properties

Usage

Command Line Interface

Compress a file:

java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar compress input.txt compressed.bin

Decompress a file:

java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar decompress compressed.bin restored.txt

Using Gradle Tasks (Development)

Compress

./gradlew runHuffman --args="compress input.txt output.bin"

Decompress

./gradlew runHuffman --args="decompress output.bin restored.txt"

Setup

Prerequisites

  • Java Development Kit (JDK) 17 or higher
  • Git (for cloning the repository)
  • Windows PowerShell or Command Prompt (Windows users)

Building the Project

  1. Clone the repository:

     git clone https://github.com/antilneeraj/HuffmanCompressor.git
     cd HuffmanCompressor

  2. Build the project:

     ./gradlew clean build

  3. Run tests:

     ./gradlew test

  4. The executable JAR will be generated at: out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar

Setting up Aliases (Windows)

For convenient usage, set up PowerShell aliases:

  1. Open PowerShell as Administrator

  2. Create/Edit your PowerShell profile:

     notepad $profile

  3. Add these functions to the profile:

     function huffc {
         param($inputFile, $outputFile)
         java -jar "C:\path\to\your\project\out\libs\HuffmanCompressor-1.0-SNAPSHOT.jar" compress $inputFile $outputFile
     }

     function huffd {
         param($inputFile, $outputFile)
         java -jar "C:\path\to\your\project\out\libs\HuffmanCompressor-1.0-SNAPSHOT.jar" decompress $inputFile $outputFile
     }

     Replace C:\path\to\your\project\ with your actual project path.

  4. Save and reload your profile:

     . $profile

  5. Now you can use short commands:

     huffc input.txt compressed.bin
     huffd compressed.bin restored.txt

Examples

Basic Compression Example

  1. Create a sample text file:

     echo "This is a sample text file for Huffman compression testing." > sample.txt

  2. Compress the file:

     java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar compress sample.txt sample.bin

  3. Check the compression ratio:

     Original: ~62 bytes, Compressed: ~35-40 bytes (typical 35-45% reduction)

  4. Decompress back:

     java -jar out/libs/HuffmanCompressor-1.0-SNAPSHOT.jar decompress sample.bin recovered.txt

  5. Verify integrity:

     fc sample.txt recovered.txt      # Windows
     diff sample.txt recovered.txt    # Linux/macOS

Performance Characteristics

  • Compression Ratio: 35-50% size reduction for typical text files
  • Time Complexity: O(n log k) where n = file size, k = unique characters
  • Space Complexity: O(k) for storing the Huffman tree
  • File Types: Works best with .txt, .java, .html, .css, .js, etc.

Future Work

Planned Enhancements

  • GUI Interface - Develop a user-friendly graphical interface
  • Batch Processing - Support for compressing multiple files at once
  • Directory Compression - Compress entire folders recursively
  • Compression Statistics - Detailed reports on compression ratios and performance
  • Advanced Algorithms - Implement LZ77, LZW, or hybrid compression methods
  • Binary File Support - Extend support to non-text file types
  • Progress Indicators - Show compression/decompression progress for large files
  • Compression Level Options - Different optimization levels for speed vs. ratio trade-offs

Algorithmic Improvements

  • Adaptive Huffman Coding - Update tree dynamically as file is processed
  • Canonical Huffman Codes - More efficient code representation
  • Multi-threading - Parallel processing for large files
  • Memory Optimization - Streaming compression for files larger than available RAM

Contributing

Contributions are welcome! Please feel free to submit issues, suggestions, or pull requests.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow Java coding conventions
  • Write unit tests for new features
  • Update documentation as needed
  • Ensure all tests pass before submitting

License

This project is licensed under the MIT License.


Acknowledgments

  • David A. Huffman - For the Huffman Coding algorithm (1952)
  • GeeksforGeeks - For algorithmic reference and inspiration
  • Java Community - For excellent documentation and libraries

⭐ If you found this project helpful, please consider giving it a star!
