Some Awk scripts to generate documentation from Markdown-formatted comments in source code.
The d.awk script creates documentation for languages that use /* */
for multiline comments, such as C, C++, Java, C# and JavaScript.
The file hashd.awk does the same, but for languages that use # symbols
for comments, like Perl, Python, Ruby, and others.
For example, add a comment like this to your source file:
/**
 * My Project
 * ==========
 *
 * This is some _Markdown documentation_ in a `source
 * file`.
 */
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("hello, world\n");
    return 0;
}
Then use Awk to run the d.awk script on it like so:
# Run the script on a file:
./d.awk file.c > doc.html
# alternatively: awk -f d.awk file.c > doc.html
The text within the /** */ comment blocks is parsed as Markdown and
rendered as HTML. Comments may also start with three slashes: /// Markdown here.
A typical use case is to bundle the d.awk script with your project's source and
then add a docs target to the Makefile:
AWK ?= awk
.PHONY: docs
docs: api-doc.html
api-doc.html: header.h d.awk
	$(AWK) -f d.awk $< > $@
The script can also generate HTML from a normal Markdown document using the -v Clean=1 command-line option:
./d.awk -v Clean=1 README.md > README.html
There are additional scripts in the distribution:
- hashd.awk - Like d.awk, but for languages that use # symbols for comments.
- mdown.awk - Generates HTML from a normal Markdown file.
- xtract.awk - Extracts the Markdown comments of a source file.
- wrap.awk - Formats a Markdown text file to fit on a page.
It supports most of Markdown:
- Bold, italic and monospaced text.
- Both header styles (Setext and ATX)
- Horizontal rules
- Ordered and unordered lists
- Code blocks and block quotes
- Hyperlinks and images
- A large number of HTML tags can be embedded in a document
- The output has a dark mode toggle.
It also supports a number of extensions, mostly based on GitHub syntax:
- ```-style code blocks
  - You can specify a language according to GitHub's Syntax Highlighting
    rules, for example ```java
  - It uses the highlight.js library for the syntax highlighting.
    - This causes the generated HTML to pull in a third-party script.
      It can be disabled by specifying -vHighlight=0 on the command line.
- Tables, using the same syntax as GitHub-flavoured markdown.
- Mermaid diagrams are supported through the same ```mermaid syntax as in GitHub-flavoured markdown.
  - This causes the generated HTML to pull in a third-party script.
    It can be disabled by specifying -vMermaid=0 on the command line.
- MathJax support for rendering mathematical expressions, using the same syntax
  as GitHub-flavoured markdown.
  - This causes the generated HTML to pull in a third-party script.
    It can be disabled by specifying -vMathjax=0 on the command line.
- [x] GitHub-style task lists
- MultiMarkdown-style footnotes and abbreviations.
- GitHub-style alerts
- Definition lists
- A backslash at the end of a line forces a line break.
- There is a special \![toc] mode that generates a Table of Contents automatically.
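Conceptually, a table of contents like the one \![toc] produces can be built by collecting the document's headers and nesting them by level. The Python sketch below illustrates that idea under stated assumptions: it only recognises ATX-style (#) headers, does not skip headers inside code blocks, and uses a simplified, hypothetical slug rule for the anchor ids, so its output will not match d.awk's HTML.

```python
import html
import re

# ATX header: 1-6 '#' characters, whitespace, the title, optional closing #'s.
HEADER = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")

def slugify(title):
    # Hypothetical anchor-id rule: lowercase, drop punctuation,
    # turn runs of whitespace into single hyphens.
    title = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"\s+", "-", title.strip())

def build_toc(markdown):
    """Return nested <ul> lists linking to the ATX headers in `markdown`."""
    out = []
    stack = []  # header levels of the <ul> elements that are still open
    for line in markdown.splitlines():
        m = HEADER.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        if stack and level <= stack[-1]:
            # Close the previous item, then any deeper lists we are leaving.
            out.append("</li>")
            while len(stack) > 1 and level <= stack[-2]:
                stack.pop()
                out.append("</ul></li>")
        else:
            # First header, or a deeper one: open a list inside the open item.
            out.append("<ul>")
            stack.append(level)
        out.append('<li><a href="#%s">%s</a>'
                   % (slugify(title), html.escape(title)))
    while stack:
        stack.pop()
        out.append("</li></ul>")
    return "".join(out)

print(build_toc("# Intro\n## Install\n## Usage\n# License"))
```

The stack holds one entry per open <ul>, which keeps every nested list inside its parent <li>, as HTML requires, even when header levels are skipped.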
The file demo.c in the distribution serves as an example, user guide and test at the same time.
d.awk is inspired by the Javadoc and Doxygen tools, which generate
HTML documentation from comments in source code.
It is meant for programming languages like C, C++ or JavaScript that use the
/* */ syntax for comments (it will work with Java and C#, though the
existence of bundled documentation tools for those languages makes it
redundant).
It has two distinguishing features:
Firstly, it is written in the ubiquitous Awk language. You can distribute the
d.awk script with your project's source code and your users will be able to
generate documentation without requiring additional third-party tools.
Secondly, the documentation uses Markdown for text formatting, which has several advantages:
- It is well known and widely used.
- It reads easily and won't clutter your code comments with markup tokens.
The included Makefile demonstrates what the different scripts in the repository are and how they're meant to be used.
Comments must start with /**, and each line in the comment must start with an
asterisk (*). This lets you control which comments are included in the
documentation; ordinary /* */ comments are ignored.
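These comment rules can be illustrated with a small Python sketch. It is not d.awk's parser: the function name extract_doc_comments is hypothetical, it assumes /// comments occupy a whole line, and it knows nothing about string literals, so a /** inside a string would be picked up.

```python
import re

# Either a /** ... */ block (group 1) or a whole-line /// comment (group 2).
DOC_COMMENT = re.compile(r"/\*\*(.*?)\*/|^[ \t]*///[ \t]?([^\n]*)",
                         re.DOTALL | re.MULTILINE)
# An inner block line: optional indentation, '*', at most one space.
INNER_LINE = re.compile(r"[ \t]*\*[ \t]?(.*)")

def extract_doc_comments(source):
    """Return the Markdown text of the /** */ and /// comments in `source`."""
    lines = []
    for m in DOC_COMMENT.finditer(source):
        if m.group(2) is not None:      # a /// comment
            lines.append(m.group(2))
            continue
        first, *rest = m.group(1).split("\n")
        if first.strip():               # text on the /** line itself
            lines.append(first.strip())
        for line in rest:
            inner = INNER_LINE.match(line)
            if inner:                   # lines without a leading '*' are skipped
                lines.append(inner.group(1).rstrip())
    return "\n".join(lines)

sample = """/**
 * My Project
 * ==========
 *
 * This is some _Markdown documentation_.
 */
/* An ordinary comment: not included. */
/// A triple-slash line.
int x;
"""
print(extract_doc_comments(sample))
```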
To generate documentation from a file demo.c, run the d.awk script on it
like so:
./d.awk demo.c > doc.html
Or to use it in clean mode, which treats the input file as a normal Markdown file:
./d.awk -v Clean=1 README.md > doc.html
The file demo.c in the distribution provides a demonstration of all the
features and the supported syntax.
Configuration options can be set in the BEGIN block of the script, or passed
to the script through Awk's -v command-line option:
- -v Title="My Document Title" to set the <title/> of the HTML.
- -v Clean=1 to treat the input file as a normal Markdown file. Use this option to create HTML documents from your project's README.md and related files.
- -v StyleSheet=style.css to use a separate file as style sheet.
- -v TopLinks=1 to have links to the top of the document next to headers.
- -v Highlight=0 to disable syntax highlighting.
  By default a ```lang-style block causes the generated HTML to pull in the highlight.js library to syntax highlight the block in the language lang.
  This switch disables that functionality.
- -vMermaid=0 to disable Mermaid diagrams.
- -vMathjax=0 to disable MathJax mathematical expression rendering.
- -v HideToCLevel=n to specify the level of the Table of Contents that should be collapsed by default. For example, a value of 3 means that headers above level 3 will be collapsed in the Table of Contents initially.
- -v classic_underscore=1 to make words_with_underscores behave like classic Markdown, where underscores inside a word count as emphasis. By default, words_like_this do not contain any emphasis.
The stylesheet for the output HTML can also be modified at the bottom of the script.
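The difference between the default behaviour and classic_underscore=1 can be sketched with two regular expressions. This Python illustration is an assumption, not the script's actual rule: the emphasize helper is hypothetical and handles only single-underscore emphasis.

```python
import re

# Default rule: an underscore opens emphasis only if it is not preceded by a
# word character, and closes it only if it is not followed by one.
DEFAULT = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")
# classic_underscore=1 rule: any pair of underscores counts, even mid-word.
CLASSIC = re.compile(r"_(?=\S)(.+?)(?<=\S)_")

def emphasize(text, classic_underscore=False):
    """Hypothetical helper: turn _text_ into <em>text</em>."""
    pattern = CLASSIC if classic_underscore else DEFAULT
    return pattern.sub(r"<em>\1</em>", text)

print(emphasize("call words_like_this here"))
print(emphasize("call words_like_this here", classic_underscore=True))
print(emphasize("an _emphasized_ word"))
```

With the default rule, identifiers such as words_like_this pass through untouched, which is usually what you want in documentation about source code.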
Like d.awk, but generates documentation for programming languages that use
# symbols for comments.
For example, to generate an HTML file from the comments at the top of the d.awk script, use this command:
./hashd.awk d.awk > d.awk.html
The first comment must start with two # symbols. The following is an example
in Python:
##
# My Project
# ==========
#
# This is some _Markdown documentation_ in a `source
# file`.
#
print("Hello, World!")
If you have a language that uses a different symbol for comments, you can use this file and modify the regular expressions at the top to match your language's comment syntax.
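The block rule, and the idea of adapting it to another comment symbol, can be sketched in Python. The function and its opener/prefix parameters are hypothetical stand-ins for the regular expressions in hashd.awk; the sketch assumes the opening ## sits on a line of its own and that the block ends at the first line that is not a comment.

```python
import re

def extract_comment_doc(source, opener="##", prefix="#"):
    """Collect Markdown from comment blocks that begin with an `opener` line.

    Hypothetical stand-in for the regexes at the top of hashd.awk: the block
    starts at a line holding only `opener` and continues while lines start
    with `prefix`; the first non-comment line ends it.
    """
    start = re.compile(r"^\s*" + re.escape(opener) + r"\s*$")
    inner = re.compile(r"^\s*" + re.escape(prefix) + r" ?(.*)$")
    out, in_block = [], False
    for line in source.splitlines():
        if not in_block:
            in_block = bool(start.match(line))  # the opener itself is not text
            continue
        m = inner.match(line)
        if m:
            out.append(m.group(1))
        else:
            in_block = False
    return "\n".join(out)

python_src = '##\n# My Project\n# ==========\n#\n# Some _Markdown_.\nprint("Hello, World!")\n'
lua_src = '---\n-- My Project\n-- ==========\nprint("Hello, World!")\n'
print(extract_comment_doc(python_src))
print(extract_comment_doc(lua_src, opener="---", prefix="--"))
```

The Lua call shows the kind of change meant above: only the two patterns differ, the block logic stays the same.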
Creates an HTML document from a Markdown file.
It is functionally equivalent to using d.awk with the -v Clean=1 command
line option.
For example, to generate HTML from this README.md file, type:
./mdown.awk README.md > README.html
The command line options are the same as d.awk's.
This script extracts the comments from a source file, without processing it as Markdown.
./xtract.awk demo.c > demo.md
A use case is to extract the comments from a source file into a new Markdown document, such as a GitHub wiki page.
wrap.awk makes a Markdown document more readable by word wrapping long lines
to fit into 80 characters.
For example, to use it on this README.md file, run:
cp README.md README.md~
./wrap.awk README.md~ > README.md
To specify a different width, set the Width variable from the command line, for example -v Width=60.
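Word wrapping of this kind is usually done greedily: keep adding words to the current line until the next word would exceed the width, then start a new line. The Python sketch below shows that approach. It is not wrap.awk's implementation: it only handles plain paragraphs separated by blank lines, copies ```-fenced code blocks through unchanged, and does not treat lists, headers or hard line breaks specially.

```python
FENCE = "`" * 3  # built this way so no line of this example starts with a fence

def wrap_markdown(text, width=80):
    """Greedy word wrap of paragraphs; fenced code blocks pass through as-is."""
    out, para, in_code = [], [], False

    def flush():
        # Re-wrap the collected paragraph: fill each line with as many words
        # as fit, starting a new line when the next word would overflow.
        line = ""
        for word in " ".join(para).split():
            if line and len(line) + 1 + len(word) > width:
                out.append(line)
                line = word
            else:
                line = line + " " + word if line else word
        if line:
            out.append(line)
        para.clear()

    for raw in text.splitlines():
        if raw.lstrip().startswith(FENCE):
            flush()
            in_code = not in_code
            out.append(raw)
        elif in_code:
            out.append(raw)           # keep code lines exactly as written
        elif not raw.strip():
            flush()
            out.append("")            # blank line separates paragraphs
        else:
            para.append(raw.strip())
    flush()
    return "\n".join(out)

sample = "aaa bbb ccc ddd\n\n" + FENCE + "\nkeep   this    spacing\n" + FENCE + "\n"
print(wrap_markdown(sample, width=7))
```

Leaving fenced blocks alone is one way to avoid the extra whitespace in code blocks noted in the TODO list below.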
The license is officially the MIT-0 license (see the file LICENSE for details), but the individual scripts may be redistributed with this notice:
(c) 2016-2025 Werner Stoop
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. This file is offered as-is,
without any warranty.
The reasoning is that if you're just using one of the scripts in this repository to create documentation for your projects then I'd like for you to be able to include the script in your project without worries.
- https://en.wikipedia.org/wiki/AWK
- https://en.wikipedia.org/wiki/Markdown
- https://tools.ietf.org/html/rfc7764
- http://daringfireball.net/projects/markdown/syntax
- https://guides.github.com/features/mastering-markdown/
- http://fletcher.github.io/MultiMarkdown-4/syntax
- http://spec.commonmark.org
r-lyeh's stddoc.c also generates HTML documentation from Markdown comments in source code, but takes a very different approach to achieve it: It simply extracts the comments, and appends Markdeep's tags to the output.
Here is an Awk script that more or less achieves the same thing:
#! /usr/bin/awk -f
BEGIN { print "<meta charset=\"utf-8\">" }
/\/\*\*/ {
    sub(/^.*\/\*/,"");
    incomment=1;
}
incomment && /\*\// {
    incomment=0;
    sub(/[[:space:]]*\*\/.*/,"");
    sub(/^[[:space:]]*\*[[:space:]]?/,"");
    print
}
incomment && /^[[:space:]]*\*/ {
    sub(/^[[:space:]]*\*[[:space:]]?/,"");
    print
}
!incomment && /\/\/\// {
    sub(/.*\/\/\/[[:space:]]?/,"");
    print
}
END {
    print "<!-- Markdeep: -->";
    print "<style class=\"fallback\">body{visibility:hidden;white-space:pre;font-family:monospace}</style>";
    print "<script>markdeepOptions={tocStyle:\"auto\"};</script>";
    print "<script src=\"https://morgan3d.github.io/markdeep/latest/markdeep.min.js\" charset=\"utf-8\"></script>";
    print "<script>window.alreadyProcessedMarkdeep||(document.body.style.visibility=\"visible\")</script>"
}
Markdeep has significantly more features than d.awk, but the tradeoff is that it
has some incompatibilities with GitHub-flavoured Markdown and it requires the
markdeep.js file to be distributed with the documentation.
There is also TeXMe as an alternative to Markdeep.
yiyus' md2html.awk is an Awk script that generates HTML from Markdown with a much cleaner parser. I only discovered it long after I wrote my own Markdown parser.
Things I'd like to add/fix in the future:
- wrap.awk adds too much whitespace to code blocks...
- It is known to not work with versions of mawk prior to 1.3.4.
  (The default Awk on some Debian-derived distros is version 1.3.3.)
  Please upgrade mawk, or use Gawk instead.
  - The issue seems to be that the older mawk was created while POSIX was
    still in draft and lacks some regex features, such as the character class
    [:space:], which are quite important for d.awk (see here).
- Speaking of which, I've been using gawk mostly and I should try using the
  other Awks (mawk, nawk etc.) to make sure it's portable.
  - It worked with Mawk (specifically mawk-snapshots-t20250131).
  - The scripts also worked with the Windows version of Mawk I found here.
  - It worked with One True Awk (specifically tag 20250116).
  - It worked with the Windows version of Nawk from here.
    - The output in the README-alt.html file was broken.
  - It worked with Ben Hoyt's goawk.
  - I tried it with Raymond Gardner's wak:
    - It had a problem with hyperlinks that I didn't get around to investigating.
    - There was also an issue with stray "000" strings ending up in the output that I couldn't explain.
- The table of contents is in a <div> that ends up inside a <p>, which is incorrect.
- I've considered adding support for typograms but it seems it is no longer being maintained.