
A Rust library designed to facilitate the conversion of various document formats into markdown text.



uhobnil/markitdown-rs


markitdown-rs

markitdown-rs is a Rust library designed to facilitate the conversion of various document formats into markdown text. It is a Rust implementation of the original markitdown Python library.

Features

It supports:

  • Excel (.xlsx)
  • Word (.docx)
  • PowerPoint
  • PDF
  • Images
  • Audio
  • HTML
  • CSV (UTF-8)
  • Text-based formats (.xml, .rss, .atom)
  • ZIP
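
To show what "conversion into markdown text" means for one of the formats above, here is a minimal sketch of turning CSV into a Markdown table. It is not the crate's CSV converter. The function name csv_to_markdown is hypothetical, and the parser is deliberately naive: it splits on commas, ignores quoting, and does not escape | characters inside cells.

```rust
/// Hypothetical helper: convert simple CSV text into a Markdown table.
/// Naive on purpose: splits on ',' and does not handle quoted fields.
fn csv_to_markdown(csv: &str) -> String {
    // Skip blank lines so a trailing newline does not create an empty row.
    let mut lines = csv.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<&str> = match lines.next() {
        Some(h) => h.split(',').map(str::trim).collect(),
        None => return String::new(),
    };

    let mut out = String::new();
    // Header row, e.g. "| name | qty |".
    out.push_str(&format!("| {} |\n", header.join(" | ")));
    // Separator row: one "---" cell per column.
    out.push_str(&format!("|{}\n", " --- |".repeat(header.len())));

    for line in lines {
        let mut cells: Vec<&str> = line.split(',').map(str::trim).collect();
        // Pad short rows (or truncate long ones) to the header's column count.
        cells.resize(header.len(), "");
        out.push_str(&format!("| {} |\n", cells.join(" | ")));
    }
    out
}

fn main() {
    let csv = "name,qty\napple,3\npear,5\n";
    print!("{}", csv_to_markdown(csv));
}
```

Running it prints a four-line table: the header `| name | qty |`, the separator `| --- | --- |`, and one row each for apple and pear. A real converter would use a proper CSV parser to handle quoted commas.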

Usage

Command-Line

Installation

cargo install markitdown

Convert a File

markitdown path-to-file.pdf

Or use -o to specify the output file:

markitdown path-to-file.pdf -o document.md
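
The command-line behavior above (print the Markdown, or write it to the file given with -o) can be sketched with only the standard library. This is an illustration of the pattern, not the markitdown CLI's actual source: CliArgs, parse_args, and emit are hypothetical names, and the "conversion" in main is a placeholder.

```rust
use std::fs;
use std::io::{self, Write};

/// Hypothetical parsed arguments: an input path plus an optional `-o` target.
#[derive(Debug, PartialEq)]
struct CliArgs {
    input: String,
    output: Option<String>,
}

/// Parse `<input> [-o <output>]` (program name already removed).
fn parse_args(args: &[String]) -> Result<CliArgs, String> {
    let mut input: Option<String> = None;
    let mut output: Option<String> = None;
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg == "-o" {
            // The token after `-o` is the output file name.
            let path = iter.next().ok_or("-o requires a file name")?;
            output = Some(path.clone());
        } else if input.is_none() {
            input = Some(arg.clone());
        } else {
            return Err(format!("unexpected argument: {arg}"));
        }
    }
    let input = input.ok_or("missing input file")?;
    Ok(CliArgs { input, output })
}

/// Write Markdown to the `-o` file if one was given, otherwise to stdout.
fn emit(markdown: &str, output: Option<&str>) -> io::Result<()> {
    match output {
        Some(path) => fs::write(path, markdown),
        None => io::stdout().write_all(markdown.as_bytes()),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    match parse_args(&args) {
        Ok(cli) => {
            // Placeholder: a real tool would convert `cli.input` here.
            let markdown = format!("# {}\n", cli.input);
            if let Err(e) = emit(&markdown, cli.output.as_deref()) {
                eprintln!("write failed: {e}");
            }
        }
        Err(e) => eprintln!("usage error: {e}"),
    }
}
```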

Rust API

Installation

Add the following to your Cargo.toml:

[dependencies]
markitdown = "0.1.9"

Initialize MarkItDown

use markitdown::MarkItDown;

let mut md = MarkItDown::new();

Convert a File

use markitdown::{ConversionOptions, DocumentConverterResult};

let options = ConversionOptions {
    file_extension: Some(".xlsx".to_string()),
    url: None,
    llm_client: None,
    llm_model: None,
};

let result: Option<DocumentConverterResult> = md.convert("path/to/file.xlsx", Some(options));

// To have a large language model describe images, set both llm_client and llm_model:

let options = ConversionOptions {
    file_extension: Some(".jpg".to_string()),
    url: None,
    llm_client: Some("gemini".to_string()),
    llm_model: Some("gemini-2.0-flash".to_string()),
};

let result: Option<DocumentConverterResult> = md.convert("path/to/file.jpg", Some(options));

if let Some(conversion_result) = result {
    println!("Converted Text: {}", conversion_result.text_content);
} else {
    println!("Conversion failed or unsupported file type.");
}
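
The LLM example above sets llm_client and llm_model together. A caller may want to check that pairing before converting. The sketch below uses a local stand-in struct that mirrors the four fields shown above. It is not the crate's ConversionOptions type, and the "both or neither" rule in llm_settings is an assumption made for illustration, not documented library behavior.

```rust
/// Local stand-in mirroring the fields shown above; NOT the crate's type.
#[allow(dead_code)]
#[derive(Debug, Default)]
struct ConversionOptions {
    file_extension: Option<String>,
    url: Option<String>,
    llm_client: Option<String>,
    llm_model: Option<String>,
}

/// Hypothetical check: treat the LLM as configured only when both the
/// client and the model are set; reject a half-filled configuration.
fn llm_settings(opts: &ConversionOptions) -> Result<Option<(&str, &str)>, String> {
    match (opts.llm_client.as_deref(), opts.llm_model.as_deref()) {
        (Some(client), Some(model)) => Ok(Some((client, model))),
        (None, None) => Ok(None),
        _ => Err("set both llm_client and llm_model, or neither".to_string()),
    }
}

fn main() {
    let opts = ConversionOptions {
        file_extension: Some(".jpg".to_string()),
        llm_client: Some("gemini".to_string()),
        llm_model: Some("gemini-2.0-flash".to_string()),
        ..Default::default()
    };
    match llm_settings(&opts) {
        Ok(Some((client, model))) => println!("describe image with {client}/{model}"),
        Ok(None) => println!("no LLM configured; skipping image description"),
        Err(e) => eprintln!("invalid options: {e}"),
    }
}
```

With the options shown, this prints `describe image with gemini/gemini-2.0-flash`.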

Register a Custom Converter

You can extend MarkItDown by implementing the DocumentConverter trait on your own converter type and registering it with register_converter:

use markitdown::{DocumentConverter, MarkItDown};

struct MyCustomConverter;

impl DocumentConverter for MyCustomConverter {
    // Implement the required methods here
}

let mut md = MarkItDown::new();
md.register_converter(Box::new(MyCustomConverter));
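
The registration pattern above can be illustrated without the crate: converters are stored as boxed trait objects and each one is asked in turn whether it can handle the input. Because this README does not show the DocumentConverter trait's methods, the Converter trait, its convert signature, and the Registry type below are assumptions made for illustration. This sketch tries converters in registration order; the real crate may use a different priority.

```rust
/// Stand-in trait; the real DocumentConverter's methods are not shown here.
trait Converter {
    /// Return Some(markdown) if this converter handles `ext`, otherwise None.
    fn convert(&self, content: &str, ext: &str) -> Option<String>;
}

struct PlainTextConverter;

impl Converter for PlainTextConverter {
    fn convert(&self, content: &str, ext: &str) -> Option<String> {
        // Claim only ".txt" and pass the text through unchanged.
        (ext == ".txt").then(|| content.to_string())
    }
}

/// Hypothetical custom converter for a made-up ".shout" format.
struct ShoutingConverter;

impl Converter for ShoutingConverter {
    fn convert(&self, content: &str, ext: &str) -> Option<String> {
        (ext == ".shout").then(|| format!("# {}", content.to_uppercase()))
    }
}

#[derive(Default)]
struct Registry {
    converters: Vec<Box<dyn Converter>>,
}

impl Registry {
    fn register_converter(&mut self, c: Box<dyn Converter>) {
        self.converters.push(c);
    }

    /// Try converters in registration order; the first Some wins.
    fn convert(&self, content: &str, ext: &str) -> Option<String> {
        self.converters.iter().find_map(|c| c.convert(content, ext))
    }
}

fn main() {
    let mut md = Registry::default();
    md.register_converter(Box::new(PlainTextConverter));
    md.register_converter(Box::new(ShoutingConverter));
    println!("{:?}", md.convert("hello", ".shout")); // Some("# HELLO")
    println!("{:?}", md.convert("hello", ".pdf")); // None
}
```

Returning Option lets each converter decline input it does not recognize, which keeps the dispatcher a simple find_map over the registered list.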

License

MarkItDown is licensed under the MIT License. See LICENSE for more details.
