
Reads a Markdown file, or many Markdown files in a large directory, and writes all URLs contained in these files to a CSV file


file-tools/file-links-to-csv


README

Ideas

  • Instead of "DirectoryToCSV" call this "MDReports" or "DirReports" or "MDStats" or "FlatStats"
  • Realizations - Notion is nice because you have tables and text ... Markdown docs are limited to mostly text, but this at least gives powerful survey-of-your-text capabilities

More ideas

  • AUTHOR REPORTS -- by RK, by Tim, by Jane ... if the author's name is stored in frontmatter, could do this
  • Word count - added a word count feature, but this fits better in a report on PAGES instead of LINKS
  • Backlinks - Could calculate this ... find pages without any
  • TAG REPORTS and sorted - Also from frontmatter
  • JSON Export as well ...

Process ..

Script Overview

This script will:

  1. Define configurable parameters:

    • $rootDirectory: Directory to scan for markdown files
    • $orderBy: Sort order ("default", "creation_newest", "modified_newest", or "domain")
    • $linkType: Type of links to extract ("internal" or "external")
  2. Create a CSV file with appropriate headers

  3. Recursively scan for all markdown files (.md and .markdown extensions)

  4. For each markdown file:

    • Extract links based on $linkType:
      • External: Both markdown-style links and plain URLs starting with http(s)
      • Internal: Only markdown-style links to local files
    • Extract root domains from URLs (for external links)
    • Get accurate file creation dates (specifically for macOS)
    • Get file modification dates
    • Store all information for sorting
  5. Sort the collected data based on $orderBy:

    • "default": Original scan order
    • "creation_newest": Newest files first
    • "modified_newest": Most recently modified first
    • "domain": Alphabetically by root domain
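
Steps 3 and 4 above could be sketched roughly as follows. This is a minimal illustration, not the code in Run.php: the function names (findMarkdownFiles, extractLinks, rootDomain) and the regular expressions are assumptions, and the root-domain helper simply keeps the last two host labels, which is wrong for suffixes such as co.uk.

```php
<?php
// Hypothetical helpers for illustration; names and regexes are assumptions.

/** Recursively collect .md and .markdown files under $rootDirectory. */
function findMarkdownFiles(string $rootDirectory): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootDirectory, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        $ext = strtolower($file->getExtension());
        if ($file->isFile() && in_array($ext, ['md', 'markdown'], true)) {
            $files[] = $file->getPathname();
        }
    }
    return $files;
}

/** Return ['url' => ..., 'name' => ...] entries for the given $linkType. */
function extractLinks(string $markdown, string $linkType): array
{
    $links = [];
    $seen = [];

    // Markdown-style links: [text](target) or [text](target "title").
    preg_match_all('/\[([^\]]*)\]\(([^)\s]+)[^)]*\)/', $markdown, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $target = $m[2];
        $isExternal = preg_match('#^https?://#i', $target) === 1;
        // Local file: no URL scheme and not a pure in-page anchor.
        $isLocal = !$isExternal && $target[0] !== '#'
            && preg_match('#^[a-z][a-z0-9+.-]*:#i', $target) !== 1;
        if (($linkType === 'external' && $isExternal) || ($linkType === 'internal' && $isLocal)) {
            $links[] = ['url' => $target, 'name' => $m[1]];
            $seen[$target] = true;
        }
    }

    // Plain URLs only count as external links.
    if ($linkType === 'external') {
        preg_match_all('#https?://[^\s<>()\[\]"\']+#i', $markdown, $plain);
        foreach ($plain[0] as $url) {
            $url = rtrim($url, '.,;:!?'); // drop trailing sentence punctuation
            if (!isset($seen[$url])) {
                $links[] = ['url' => $url, 'name' => ''];
                $seen[$url] = true;
            }
        }
    }
    return $links;
}

/** Naive root domain: the last two labels of the host name. */
function rootDomain(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }
    return implode('.', array_slice(explode('.', strtolower($host)), -2));
}
```

A URL that already appeared inside a markdown-style link is not reported a second time as a plain URL, so each link in a file yields one entry.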

CSV Output Columns

  1. Domain: Root domain of the URL (for external links)
  2. File: Just the filename
  3. URL: The complete extracted URL
  4. Link Name: The text of the markdown link (if different from URL)
  5. Source File: Full path to the file where the link was found
  6. Creation Date: Accurate creation date of the source file
  7. Last Modified Date: Last modification date of the source file
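
As a rough sketch of how rows with these seven columns might be written, the snippet below uses PHP's built-in fputcsv. The function name writeLinksCsv, the row keys (domain, url, name, source, created, modified) and the date format are assumptions made for the example, not details confirmed by the script.

```php
<?php
// Column headers as listed above; the row layout below is an assumption.
const CSV_HEADERS = [
    'Domain', 'File', 'URL', 'Link Name',
    'Source File', 'Creation Date', 'Last Modified Date',
];

/**
 * Write one CSV line per collected link.
 * Each $row is assumed to hold: domain, url, name, source (full path),
 * created and modified (Unix timestamps).
 */
function writeLinksCsv(string $csvPath, array $rows): void
{
    $handle = fopen($csvPath, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath for writing");
    }
    fputcsv($handle, CSV_HEADERS, ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($handle, [
            $row['domain'],
            basename($row['source']),                          // File: just the filename
            $row['url'],
            $row['name'] === $row['url'] ? '' : $row['name'],  // only if different from URL
            $row['source'],
            date('Y-m-d H:i:s', $row['created']),
            date('Y-m-d H:i:s', $row['modified']),
        ], ',', '"', '\\');
    }
    fclose($handle);
}
```

Passing the separator, enclosure and escape characters explicitly keeps the call valid across PHP versions, including those that warn when the escape argument is omitted.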

Usage

  1. Save as Run.php
  2. Configure parameters:
    $rootDirectory = __DIR__ . '/docs';
    $orderBy = "default";  // or "creation_newest", "modified_newest", "domain"
    $linkType = "external";  // or "internal"
  3. Run: php Run.php

Notes

  • Creates /docs directory if it doesn't exist
  • Uses stat command on macOS for accurate creation dates
  • Groups links by domain when using domain sorting
  • Filters internal/external links based on $linkType
  • Outputs to extracted_links.csv in the same directory as the script
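
The creation-date and sorting notes could look something like the sketch below. It assumes the BSD stat shipped with macOS, where `stat -f %B` prints a file's birth time as a Unix timestamp, and falls back to filectime() elsewhere (on Unix that is the inode change time, not a true creation time). The function names and row keys are hypothetical.

```php
<?php
// Hypothetical helpers; names and row keys are assumptions for illustration.

/** Best-effort creation time of $path as a Unix timestamp. */
function fileCreationTime(string $path): int
{
    if (PHP_OS_FAMILY === 'Darwin') {
        // BSD stat on macOS: -f %B prints the birth time in epoch seconds.
        $out = shell_exec('stat -f %B ' . escapeshellarg($path));
        if (is_string($out) && ctype_digit(trim($out)) && (int) trim($out) > 0) {
            return (int) trim($out);
        }
    }
    // Fallback: inode change time on Unix, creation time on Windows.
    $time = filectime($path);
    return $time === false ? 0 : $time;
}

/** Sort collected rows according to $orderBy; unknown values keep scan order. */
function sortRows(array $rows, string $orderBy): array
{
    switch ($orderBy) {
        case 'creation_newest':
            usort($rows, fn($a, $b) => $b['created'] <=> $a['created']);
            break;
        case 'modified_newest':
            usort($rows, fn($a, $b) => $b['modified'] <=> $a['modified']);
            break;
        case 'domain':
            // Case-insensitive alphabetical order groups links by root domain.
            usort($rows, fn($a, $b) => strcasecmp($a['domain'], $b['domain']));
            break;
    }
    return $rows;
}
```

Sorting alphabetically by domain places all links to the same domain next to each other, which is one simple way to get the grouping described in the notes.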
