file-tools/file-links-to-csv: Reads a single markdown file, or every markdown file in a directory tree, and writes all URLs found in those files to a CSV file

README

Ideas

Instead of "DirectoryToCSV", call this "MDReports", "DirReports", "MDStats", or "FlatStats"
Realizations - Notion is nice because you have tables and text ... markdown docs are limited mostly to text, but this at least gives you powerful survey-of-your-text capabilities

More ideas

AUTHOR REPORTS - by RK, by Tim, by Jane ... possible if the author's name is stored in frontmatter
Word count - a word count feature has been added, but it fits better in a report on PAGES than in a report on LINKS
Backlinks - could calculate these ... and find pages without any
TAG REPORTS, sorted - also from frontmatter
JSON Export as well ...
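
The author and tag report ideas both depend on reading frontmatter. As a rough sketch only, assuming YAML-style frontmatter delimited by `---` lines and containing flat `key: value` pairs, the metadata could be read like this. The function name `readFrontmatter` is hypothetical and is not part of Run.php, and this is not a full YAML parser.

```php
<?php
// Hypothetical helper (not part of Run.php): reads flat "key: value" pairs
// from a frontmatter block delimited by "---" lines at the top of a file.
function readFrontmatter(string $markdown): array
{
    // The frontmatter block must start on the very first line.
    if (!preg_match('/\A---\r?\n(.*?)\r?\n---\s*(?:\r?\n|\z)/s', $markdown, $m)) {
        return [];
    }

    $fields = [];
    foreach (preg_split('/\r?\n/', $m[1]) as $line) {
        // Only flat "key: value" lines are handled; nested YAML is ignored.
        if (preg_match('/^([A-Za-z0-9_-]+):\s*(.*)$/', $line, $kv)) {
            $fields[strtolower($kv[1])] = trim($kv[2], " \t\"'");
        }
    }
    return $fields;
}

// Made-up sample document for illustration.
$doc = "---\nauthor: Jane\ntags: php, csv\n---\n# Title\n";
$meta = readFrontmatter($doc);
echo $meta['author'] ?? '(no author)', PHP_EOL;
```

Grouping pages by `$meta['author']` or by the individual entries of `$meta['tags']` would then give the per-author and per-tag reports.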

Process ..

Script Overview

This script will:

1. Define configurable parameters:
   - $rootDirectory: Directory to scan for markdown files
   - $orderBy: Sort order ("default", "creation_newest", "modified_newest", or "domain")
   - $linkType: Type of links to extract ("internal" or "external")
2. Create a CSV file with appropriate headers
3. Recursively scan for all markdown files (.md and .markdown extensions)
4. For each markdown file:
   - Extract links based on $linkType:
     - External: Both markdown-style links and plain URLs starting with http(s)
     - Internal: Only markdown-style links to local files
   - Extract root domains from URLs (for external links)
   - Get accurate file creation dates (specifically for macOS)
   - Get file modification dates
   - Store all information for sorting
5. Sort the collected data based on $orderBy:
   - "default": Original scan order
   - "creation_newest": Newest files first
   - "modified_newest": Most recently modified first
   - "domain": Alphabetically by root domain
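
The scanning and extraction steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in Run.php: the function names `findMarkdownFiles`, `extractLinks`, and `rootDomain` are hypothetical, any link target that does not start with http(s) is treated as internal, and the root domain is approximated by the last two host labels.

```php
<?php
// Hypothetical helper: recursively collects .md and .markdown files.
function findMarkdownFiles(string $rootDirectory): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootDirectory, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        $extension = strtolower($file->getExtension());
        if ($file->isFile() && in_array($extension, ['md', 'markdown'], true)) {
            $files[] = $file->getPathname();
        }
    }
    return $files;
}

// Hypothetical helper: returns url/name pairs found in one markdown string.
function extractLinks(string $markdown, string $linkType): array
{
    $links = [];
    $seen = [];

    // Markdown-style links: [text](target), optionally followed by a title.
    // Note: image links such as ![alt](img.png) match this pattern too.
    preg_match_all('/\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)/', $markdown, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        [, $text, $target] = $m;
        // Assumption: anything not starting with http(s) counts as internal.
        $isExternal = (bool) preg_match('#^https?://#i', $target);
        if (($linkType === 'external') === $isExternal) {
            $links[] = ['url' => $target, 'name' => $text];
            $seen[$target] = true;
        }
    }

    // Plain URLs are only collected for external links.
    if ($linkType === 'external') {
        preg_match_all('#https?://[^\s<>()\[\]"\']+#i', $markdown, $plain);
        foreach ($plain[0] as $url) {
            $url = rtrim($url, '.,;:!?'); // drop trailing sentence punctuation
            if (!isset($seen[$url])) {
                $links[] = ['url' => $url, 'name' => ''];
                $seen[$url] = true;
            }
        }
    }
    return $links;
}

// Hypothetical helper: naive root domain, keeping the last two host labels.
// This is wrong for multi-part suffixes such as "example.co.uk".
function rootDomain(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }
    return implode('.', array_slice(explode('.', strtolower($host)), -2));
}

// Made-up sample text for illustration.
$sample = "See [PHP](https://www.php.net/manual) and [notes](notes/todo.md), or https://example.org/a.";
print_r(extractLinks($sample, 'external'));
print_r(extractLinks($sample, 'internal'));
echo rootDomain('https://www.php.net/manual'), PHP_EOL;
```

A real run would call `extractLinks(file_get_contents($path), $linkType)` for each path returned by `findMarkdownFiles($rootDirectory)`.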

CSV Output Columns

Domain: Root domain of the URL (for external links)
File: Just the filename
URL: The complete extracted URL
Link Name: The text of the markdown link (if different from URL)
Source File: Full path to the file where the link was found
Creation Date: Accurate creation date of the source file
Last Modified Date: Last modification date of the source file
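
As a minimal sketch of how rows with these columns could be written, the example below uses PHP's built-in `fputcsv`. The function name `writeLinksCsv`, the use of the column titles as array keys, and the date format in the sample row are assumptions made for illustration; Run.php may organize this differently.

```php
<?php
// Hypothetical helper: writes link rows under the column headers listed above.
function writeLinksCsv(string $csvPath, array $rows): void
{
    $handle = fopen($csvPath, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath for writing");
    }

    $columns = ['Domain', 'File', 'URL', 'Link Name', 'Source File', 'Creation Date', 'Last Modified Date'];
    fputcsv($handle, $columns, ',', '"', '\\');

    foreach ($rows as $row) {
        // Missing fields become empty cells so every line has seven columns.
        $line = [];
        foreach ($columns as $column) {
            $line[] = $row[$column] ?? '';
        }
        fputcsv($handle, $line, ',', '"', '\\');
    }
    fclose($handle);
}

// Made-up sample row; paths and dates are illustrative only.
$rows = [[
    'Domain' => 'php.net',
    'File' => 'notes.md',
    'URL' => 'https://www.php.net/manual',
    'Link Name' => 'PHP, the manual',
    'Source File' => '/path/to/docs/notes.md',
    'Creation Date' => '2024-01-05 10:00:00',
    'Last Modified Date' => '2024-02-01 09:30:00',
]];
$csvPath = sys_get_temp_dir() . '/extracted_links_example.csv';
writeLinksCsv($csvPath, $rows);
echo file_get_contents($csvPath);
```

Using `fputcsv` rather than joining fields by hand means link names that contain commas or quotes are quoted correctly.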

Usage

Save as Run.php

Configure parameters:

$rootDirectory = __DIR__ . '/docs';
$orderBy = "default";  // or "creation_newest", "modified_newest", "domain"
$linkType = "external";  // or "internal"

Run: php Run.php
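
To illustrate what the $orderBy setting does, here is one way the four values could be applied to the collected rows. It is a sketch only: the function name `sortRows` and the row keys `domain`, `created`, and `modified` (Unix timestamps) are hypothetical and not taken from Run.php.

```php
<?php
// Hypothetical helper: returns the rows sorted according to $orderBy.
function sortRows(array $rows, string $orderBy): array
{
    switch ($orderBy) {
        case 'creation_newest':
            // Larger timestamp first, so the newest file leads.
            usort($rows, fn(array $a, array $b) => $b['created'] <=> $a['created']);
            break;
        case 'modified_newest':
            usort($rows, fn(array $a, array $b) => $b['modified'] <=> $a['modified']);
            break;
        case 'domain':
            // Case-insensitive alphabetical order by root domain.
            usort($rows, fn(array $a, array $b) => strcasecmp($a['domain'], $b['domain']));
            break;
        case 'default':
            break; // keep the original scan order
        default:
            throw new InvalidArgumentException("Unknown orderBy value: $orderBy");
    }
    return $rows;
}

// Made-up sample rows for illustration.
$rows = [
    ['domain' => 'php.net', 'created' => 100, 'modified' => 300],
    ['domain' => 'example.org', 'created' => 200, 'modified' => 250],
];
$byDomain = sortRows($rows, 'domain');
$byCreated = sortRows($rows, 'creation_newest');
$byModified = sortRows($rows, 'modified_newest');
echo $byDomain[0]['domain'], PHP_EOL;
```

The arrow functions need PHP 7.4 or later. Since PHP 8.0 `usort` is stable, so rows that compare equal keep their scan order.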

Notes

Creates the docs directory next to the script if it doesn't exist
Uses stat command on macOS for accurate creation dates
Groups links by domain when using domain sorting
Filters internal/external links based on $linkType
Outputs to extracted_links.csv in the same directory as the script
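
The note about creation dates could look roughly like the sketch below. On macOS, `stat -f %B` prints a file's birth time in seconds since the epoch. Whether Run.php uses exactly these flags is an assumption, and the function name `creationTimestamp` is hypothetical.

```php
<?php
// Hypothetical helper: returns a file's creation time as a Unix timestamp.
function creationTimestamp(string $path): int
{
    if (PHP_OS_FAMILY === 'Darwin') {
        // BSD stat on macOS: -f %B prints the birth time in epoch seconds.
        $output = shell_exec('stat -f %B ' . escapeshellarg($path));
        if (is_string($output) && ctype_digit(trim($output))) {
            return (int) trim($output);
        }
    }
    // Fallback for other systems. On Unix, filectime() is the inode change
    // time, which is not a true creation time.
    $changed = filectime($path);
    return $changed === false ? 0 : $changed;
}

// Illustration with a temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'md');
$created = creationTimestamp($tmp);
echo date('Y-m-d H:i:s', $created), PHP_EOL;
unlink($tmp);
```

The modification date needs no special handling, since `filemtime($path)` works the same way on every platform.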
