Literate Programming in HTML

What Is Literate Programming

In Literate Programming, Tim Daly wrote:

The best programming language is English. Everything else is notation.

Why? He went on to explain:

Consider the best possible world. You've been hired at a company and join a team that is already working on a program. They hand you a book, tell you to go home and read it over the next two weeks. At the end of the two weeks you can work on the program as effectively as anyone on the team. The team has successfully communicated from one human to another.

What is in the book? Remember our calculus textbook? It started from the ideas like limits and gradually developed the ideas until they could be expressed in equations. By the time you got to the equations you already understood the concepts. You could look at the equations and see why they matched the text. It is the why that is the important part. It is the part that our programs are missing.

The book you took home uses the same method. You started with the problem in chapter 1. Chapter 2 expresses the ideas needed to solve the problem. The next few chapters expand on each idea, gradually becoming more specific until the idea is reduced to code. By the time you get to the code it should be perfectly clear what the code should look like. Any part of the code you don't understand means that the book needs some additional words.

Literate programming is like explaining your code to a colleague. You have to spell out what you were thinking when you wrote the code, step by step, piece by piece. You are telling the whole story: which step is essential, why the order must be so, and what each part is used for. All the material is organized in a structure that makes sense to you (not to the compiler).

Technically, literate programming has two components: weave and tangle. Weaving converts the source into a printable document, such as PDF or HTML. Tangling extracts the code contained in the document and assembles it in the right order.

The basic building blocks are chunks and chunk references. For example, we can define a chunk named hello.c:

        ⟨include⟩
        int main() {
          ⟨say hello⟩
          return 0;
        }

where the chunk references ⟨include⟩ and ⟨say hello⟩ (written in angle brackets) are defined elsewhere, for example as #include <stdio.h> and printf("Hello, world!\n"); respectively. Tangling then extracts the code and assembles it in order:


        #include <stdio.h>
        int main() {
          printf("Hello, world!\n");
          return 0;
        }
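
To make the idea concrete, here is a minimal sketch of tangling on plain strings, outside the browser. It is an illustration only: the chunkTable map, the tangleSketch function and the ⟨name⟩ notation for references inside the strings are assumptions of this sketch. Unlike the implementation developed later in this document, it does not indent multi-line chunks and does not detect circular references.

```javascript
// Hypothetical chunk table: chunk name -> code pieces in document order.
const chunkTable = new Map([
  ["hello.c", ["⟨include⟩\nint main() {\n  ⟨say hello⟩\n  return 0;\n}"]],
  ["include", ["#include <stdio.h>"]],
  ["say hello", ['printf("Hello, world!\\n");']],
]);

// Expand a chunk by recursively replacing every ⟨name⟩ with its tangled code.
function tangleSketch(name, table) {
  if (!table.has(name)) throw new Error(`no chunk named "${name}"`);
  return table
    .get(name)
    .join("\n") // pieces that share a name are stacked in order
    .replace(/⟨([^⟩]+)⟩/g, (match, refName) => tangleSketch(refName, table));
}

console.log(tangleSketch("hello.c", chunkTable));
```

Running it prints the same five-line C program as shown above.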

The original literate programming designed by Donald Knuth is independent of specific programming languages. What is presented in a chunk is pseudo code that represents the algorithm, rather than a language-specific implementation. The tangling process also translates the pseudo code into a language-specific implementation, such as Pascal (using WEB) or C (using CWEB). In this document, however, we write language-specific implementations instead of pseudo code in chunks. The reason is that different languages solve the same problem in intrinsically different ways. For example, using a dictionary to reduce time complexity is a common trick in object-oriented languages. But this trick does not carry over to pure functional languages like Haskell, where a mutable structure such as a dictionary backed by a hash table is not allowed. For this reason, we suggest using language-specific implementations instead of pseudo code in chunks. After all, it is the idea underlying the code (that is, the context), rather than the code itself, that really matters for our understanding.

Why Use HTML

The main reason is that an HTML file can be displayed directly in a browser, so there is no need for a separate weaving tool. Another benefit is that HTML is a common target for markup languages: other markup formats (such as Markdown) can be converted to HTML.

Compared with TeX, HTML is more modern and more widely used. It can be displayed on any device, including mobile phones. There is no need to install a huge TeX Live environment (the minimal version is about 1GB). All you need is a browser.

The only issue is that some languages, such as C++ and HTML itself, make heavy use of the characters "<" and ">" in ways that look like HTML tags, such as vector<string> in C++. We have to escape these "tag characters". But not every occurrence needs escaping. For example, the "<" character in x < 0 is not "tag-like", so there is no need to escape it.
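
As an illustration of the escaping step, the following sketch converts code into text that is safe to paste into a chunk. The function name escapeTagCharacters is hypothetical and not part of literate.js. For simplicity it escapes every "<", ">" and "&", including harmless ones such as the "<" in x < 0.

```javascript
// Replace the characters that HTML treats specially with their entities.
// "&" must be handled first, otherwise the "&" of "&lt;" would be escaped again.
function escapeTagCharacters(code) {
  return code
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeTagCharacters("vector<string> names;"));
// → vector&lt;string&gt; names;
```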

Chunk in HTML

HTML Elements for Chunks

As usual, we use the "class" attribute to mark chunks and chunk references. A block chunk is represented by a div element with class "chunk" and a name attribute. The chunk name shall be plain text. For example:


        <div class="chunk" name="hello.c">

          ...code block...

        </div>

An inline chunk is represented by a span element with class "chunk" and a name attribute. For example:


        <span class="chunk" name="say hello"> ...one line code... </span>

A chunk reference, which is always inline, is represented by a span element with class "chunkref". The name of the referred chunk is the content of the span element. Thus,


        <span class="chunkref">say hello</span>

Different chunks may share the same name. For example:


        <div class="chunk" name="include">
          #include <stdio.h>
        </div>

and then after some explanation,


        <div class="chunk" name="include">
          #include <math.h>
        </div>

While tangling, chunks with the same name are stacked in document order. Thus, the previous example is equivalent to:


        <div class="chunk" name="include">
          #include <stdio.h>
          #include <math.h>
        </div>

Appending Newline

Additionally, we can add an "append-newline" attribute to a block chunk. If a block chunk has this attribute, a newline character is appended to the end of the code block when it is tangled. For example:


        <div class="chunk" name="hello.c" append-newline>
          void say_hello () {
            printf("Hello, world!\n");
          }
        </div>

followed by


        <div class="chunk" name="hello.c">
          void say_hello_again () {
            printf("Hello, world!\n");
          }
        </div>

will be tangled into


        void say_hello () {
          printf("Hello, world!\n");
        }

        void say_hello_again () {
          printf("Hello, world!\n");
        }

There is an empty line at the end of the first function. It makes the code much more readable. Otherwise, functions will be densely packed together. The second block chunk does not have the "append-newline" attribute, so there is no empty line at the end of the second function.

If "append-newline" is set to a number, that many newline characters are appended. For example, append-newline="2" appends two newline characters. This is useful for Python, where PEP 8 recommends two blank lines between top-level definitions.

How to Use

While reading through this document, you will meet block chunks named literate.js and literate.css. Click the head of either block chunk, and the tangled code will pop up in a new tab. This is the JavaScript and CSS code needed for writing literate programs in HTML, just like this one.

If you find this document too long to read, just download the JavaScript and CSS code from the repository hosted on GitHub (or Gitee for Chinese users). They are in literate.js and literate.css respectively.

After obtaining the two files, put them in your HTML head. For example,


        <link rel="stylesheet" type="text/css" href="literate.css"/>
        <script type="text/javascript" src="literate.js"></script>

Then, remember to weave when the page loads. For example,


        <script>
          window.onload = weaveAll;
        </script>

Now, you can start your own journey of literate programming!

How to Read this Document

In the rest of this document, we implement the JavaScript code for our purpose. We also add some style to make it pretty. Try clicking the head of any block chunk for a surprise (a new tab will pop up). Clicking a chunk reference is fun too.

Let us continue and (try to) enjoy this trip.

Weaving

Chunks are woven for display in the browser. We use JavaScript to convert the HTML elements of chunks and chunk references into ones that are more suitable for display and for styling with CSS.

Unweaving

Weaving modifies the innerHTML of chunks and chunk references in place, so before tangling them we need to unweave them. To keep this simple, we store the original content of each chunk or chunk reference in a new element (of the same type as the original) with the class "unwoven", and set it as hidden so that the browser does not display it. Unweaving is then nothing but extracting the element with class "unwoven" from the woven element. (If it is not found, we throw an error.)
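
A minimal sketch of such an unweave function is given below. It is an assumption about how the function could look, not necessarily the code in literate.js. It inspects only the direct children of the woven element, because a woven block chunk may also contain nested "unwoven" spans that belong to chunk references inside its code. It reads the className string, so it also works on plain objects for testing.

```javascript
// Return the direct child with class "unwoven"; throw if there is none.
function unweave(woven) {
  for (const child of Array.from(woven.children)) {
    const classes = (child.className || "").split(/\s+/);
    if (classes.includes("unwoven")) return child;
  }
  throw new Error("[unweave] no unwoven element found");
}

// Usage with a stand-in object; in the browser, pass a woven chunk element.
const wovenStandIn = {
  children: [
    { className: "block-chunk-head" },
    { className: "unwoven", innerHTML: "original code" },
  ],
};
console.log(unweave(wovenStandIn).innerHTML); // → original code
```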

Weaving Chunk Reference

We first deal with chunk reference. Chunk reference is displayed in the format


        ⟨<a href="#chunk-name">chunk name</a>⟩

within the original <span class="chunkref"> element. The hyperlink links to the first chunk with the referred chunk name. To do so, we have to add an id attribute to that first chunk.

        function weaveChunkRef(chunkRef) {
          ⟨create unwoven span for weaving chunk reference⟩
          ⟨create id and link to the first chunk⟩
          ⟨replace innerHTML for weaving chunk reference⟩
        }

We create a span element for the unwoven, which stores the original innerHTML before weaving. So that the unwoven is not displayed, we set it to be hidden.

        /* Create unwoven span */
        var unwoven = document.createElement("span");
        unwoven.setAttribute("class", "unwoven");
        unwoven.setAttribute("hidden", true);
        unwoven.innerHTML = chunkRef.innerHTML;

Then, find the first chunk with the referred chunk name, and add an id to it for linking. Since a chunk name may contain whitespace, which is not valid in an id, we replace every space with "-".

        /* Create id and link to the first chunk */
        var id = null;
        var chunkName = chunkRef.innerHTML;
        var chunks = document.getElementsByName(chunkName);
        for (var i = 0; i < chunks.length; i++) {
          if (chunks[i].getAttribute("class") != "chunk") continue;
          /* A string pattern would replace only the first space */
          id = chunkName.replace(/ /g, "-");
          chunks[i].setAttribute("id", id);
          break;
        }
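
One detail of the id creation deserves a small example. In JavaScript, String.prototype.replace with a string pattern replaces only the first match, so a chunk name with two or more spaces needs a regular expression with the g flag. The helper name chunkNameToId below is hypothetical and used only for this sketch.

```javascript
// Turn a chunk name into an id by replacing every whitespace character.
function chunkNameToId(chunkName) {
  return chunkName.replace(/\s/g, "-");
}

console.log("say hello again".replace(" ", "-")); // → say-hello again
console.log(chunkNameToId("say hello again"));    // → say-hello-again
```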

If there is no chunk with the referred chunk name, something must be wrong, so we raise an error.

        if (id == null)
          throw new Error(`[weaveChunkRef] no chunk found for "${chunkName}"`);

Then, we create the link to the first chunk.

        var link = document.createElement("a");
        link.setAttribute("href", `#${id}`);
        link.setAttribute("class", "chunkref-link");
        link.innerHTML = chunkName;

Now, we clean up the chunkRef element and add the link and the unwoven span to it.

        /* Replace HTML */
        chunkRef.innerHTML = null;
        chunkRef.appendChild(document.createTextNode("⟨"));
        chunkRef.appendChild(link);
        chunkRef.appendChild(document.createTextNode("⟩"));
        chunkRef.appendChild(unwoven);

Weaving Inline Chunk

Then, consider inline chunk. Inline chunk is displayed in the format


        <span class="inline-chunk-head">⟨chunk name⟩≡</span>
        <code>... one line code ...</code>

within the original <span class="chunk"> element. We add a span element for the head:

        /* Create chunk head span */
        var chunkHead = document.createElement("span");
        chunkHead.setAttribute("class", "inline-chunk-head");
        chunkHead.innerHTML = `⟨${chunk.getAttribute("name")}⟩≡`;

And the one line code is wrapped in a code element:

        /* Create code */
        var code = document.createElement("code");
        code.innerHTML = chunk.innerHTML;

Again, we shall append an unwoven span to the inline chunk. So, we simply follow the code in weaveChunkRef with a little adaptation.

        function weaveInlineChunk(chunk) {
          /* Create unwoven span */
          var unwoven = document.createElement("span");
          unwoven.setAttribute("class", "unwoven");
          unwoven.setAttribute("hidden", true);
          unwoven.innerHTML = chunk.innerHTML;

          ⟨create head for inline chunk⟩
          ⟨create code for inline chunk⟩

          /* Replace HTML */
          chunk.innerHTML = null;
          chunk.appendChild(chunkHead);
          chunk.appendChild(code);
          chunk.appendChild(unwoven);
        }

Weaving Block Chunk

Finally, consider block chunk. Almost the same as inline chunk, block chunk should be displayed in the format


        <span class="block-chunk-head">⟨chunk name⟩≡</span>
        <pre><code>
          ... code block ...
        </code></pre>

within the original <div class="chunk"> element. In addition, we add a link to the head; clicking it tangles the chunk. Thus, the head span turns into


        ⟨<a onclick="tangle(chunkName)">chunkName</a>⟩≡

For the tangle link, we have:

        /* Create chunk head span */
        var tangleLink = document.createElement("a");
        var chunkName = chunk.getAttribute("name");
        tangleLink.setAttribute("href", "javascript:void(0)");
        tangleLink.setAttribute("onclick", `tangle('${chunkName}')`);
        tangleLink.innerHTML = chunkName;

Then, we create the head and append everything in order.

        var chunkHead = document.createElement("span");
        chunkHead.setAttribute("class", "block-chunk-head");
        chunkHead.appendChild(document.createTextNode("⟨"));
        chunkHead.appendChild(tangleLink);
        chunkHead.appendChild(document.createTextNode("⟩≡"));

The unwoven now becomes a div element. So, the code for weaving a block chunk is simply an adaptation of weaveInlineChunk.

        function weaveBlockChunk(chunk) {
          /* Create unwoven div */
          var unwoven = document.createElement("div");
          unwoven.setAttribute("class", "unwoven");
          unwoven.setAttribute("hidden", true);
          unwoven.innerHTML = chunk.innerHTML;

          ⟨chunk head span for block chunk⟩

          /* Create pre */
          var pre = document.createElement("pre");
          var code = document.createElement("code");
          code.innerHTML = chunk.innerHTML;
          pre.appendChild(code);

          /* Replace HTML */
          chunk.innerHTML = null;
          chunk.appendChild(chunkHead);
          chunk.appendChild(pre);
          chunk.appendChild(unwoven);
        }

Regularize Code Block

A well-known problem is the whitespace in front of a code block. The browser displays this whitespace, which is not what we intend. To solve this, we have to regularize the code block wrapped by a pre element. For example, a code block like this:


        <pre>
          if (i = 0) {
            i++;
          }
        </pre>

has the raw text (\n for newline and \s for whitespace):


        <pre><code>\n\s\s\s\s\s\sif\s(i\s=\s0)\s{\n\s\s\s\s\s\s\s\si++;\n\s\s\s\s\s\s}\n\s\s\s\s</code></pre>

There are six extra whitespace characters in front of each line, and four after the whole code block. There is also an extra newline in front of the code block.

We are going to remove these extra characters.

        function regularizeCodeBlock(text) {
          ⟨define regularization⟩
        }

We need to figure out the indentation (the six extra whitespace characters in front of each line). We use the whitespace in front of the first line of code as the indentation of the whole code block. We also need to count the whitespace in front of each line (whitespaceCount) and subtract indentation from it. Consider the line i++; in the previous example: it has whitespaceCount = 8 and indentation = 6, so this line is indented by 2 whitespace characters (whitespaceCount - indentation). In addition, a flag for the state of counting is helpful.

        var result = "";
        var indentation = 0;
        var whitespaceCount = 0;
        var counting = false;
        for (var i = 0; i < text.length; i++) {
          ⟨walk through text for regularization⟩
        }

Let us go through the text. As long as we have not met any code character, result is empty. In this case, we omit the newlines. If we have not encountered any character of the code block but only whitespace, this may be the first line of code, so we increase indentation.

if ((result == "") && (text[i] == ' ')) { indentation++; }

But it may be an empty line, that is, a line with only whitespace. So, after a newline, we reset indentation and wait for the first line of code.

else if ((result == "") && (text[i] == '\n')) { indentation = 0; }

Now we have covered the case where result is empty and we encounter only newlines and whitespace. If a code character appears, we append it to result.

else if (result == "") { result += text[i]; }

Now result is not empty any more, and thus indentation is fixed. We keep appending code characters to result until we meet a newline. At that point, we set counting to true and start counting whitespace. The counting ends when we encounter a code character. Do not forget to append the newline character itself to result.

else if (text[i] == '\n') { whitespaceCount = 0; counting = true; result += text[i]; }

Then, we continue counting whitespace until we meet a code character.

else if ((counting == true) && (text[i] == ' ')) { whitespaceCount++; }

When we meet a character that is neither a newline nor whitespace, we have found a code character. Then, we stop counting and put the correct number of whitespace characters in front of the line of code. (We also check that whitespaceCount is not less than indentation, as it should be.) Finally, we append the code character to result.

        else if (counting == true) {
          counting = false;
          if (whitespaceCount < indentation)
            throw new Error("[regularizeCodeBlock] indentation error");
          for (var j = 0; j < (whitespaceCount - indentation); j++)
            result += " ";
          result += text[i];
        }

We have dealt with the whitespace in front of a line of code. Until the next newline, all we need to do is append each code character to result.

else { result += text[i]; }

Up to now, we have solved the problem in front of each line of code. The previous example is regularized to


        <pre><code>if\s(i\s=\s0)\s{\n\s\si++;\n}\n</code></pre>

Notice that there is still an extra newline character at the end of the code block, and in other examples there may be several. To remove them, we iterate over result backwards and remove newlines until we encounter another character.

        for (var i = (result.length - 1); i >= 0; i--) {
          if (result[i] == '\n') {
            result = result.substring(0, i);
          } else {
            break;
          }
        }

Now the text is what we intended to write, and it is safe to return the result.

return result;
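
For reference, the pieces above can be put together into one function and tried on the earlier example. The listing below is our own assembly of the fragments in this section (using let instead of var in the loops), shown as a sketch so that the behaviour can be checked outside the browser.

```javascript
function regularizeCodeBlock(text) {
  let result = "";
  let indentation = 0;      // whitespace in front of the first line of code
  let whitespaceCount = 0;  // whitespace in front of the current line
  let counting = false;     // true while reading leading whitespace

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (result == "" && c == " ") {
      indentation++;                 // possibly the first line of code
    } else if (result == "" && c == "\n") {
      indentation = 0;               // it was an empty line, start over
    } else if (result == "") {
      result += c;                   // first code character
    } else if (c == "\n") {
      whitespaceCount = 0;
      counting = true;
      result += c;
    } else if (counting && c == " ") {
      whitespaceCount++;
    } else if (counting) {
      counting = false;
      if (whitespaceCount < indentation)
        throw new Error("[regularizeCodeBlock] indentation error");
      result += " ".repeat(whitespaceCount - indentation);
      result += c;
    } else {
      result += c;
    }
  }

  // Remove trailing newlines.
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i] == "\n") result = result.substring(0, i);
    else break;
  }
  return result;
}

const raw = "\n      if (i = 0) {\n        i++;\n      }\n    ";
console.log(JSON.stringify(regularizeCodeBlock(raw)));
// → "if (i = 0) {\n  i++;\n}"
```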

We have implemented the function for regularizing code in block chunks. Other code blocks in the HTML file, wrapped in <pre><code> elements, also need regularization. So, we regularize both; the order is irrelevant.

        function regularizeAll() {
          ⟨regularize block chunk⟩
          ⟨regularize code in pre⟩
        }

wherein

        var divs = document.getElementsByTagName("div");
        for (var i = 0; i < divs.length; i++) {
          if (divs[i].getAttribute("class") == "chunk") {
            divs[i].innerHTML = regularizeCodeBlock(divs[i].innerHTML);
          }
        }

and

        var pres = document.getElementsByTagName("pre");
        for (var i = 0; i < pres.length; i++) {
          var codes = pres[i].getElementsByTagName("code");
          for (var j = 0; j < codes.length; j++) {
            codes[j].innerHTML = regularizeCodeBlock(codes[j].innerHTML);
          }
        }

Weave All

Now we have all the functions for weaving. After regularization, we weave chunk references and chunks one by one. The order of weaving is irrelevant; here, we weave chunk references first.

        function weaveAll() {
          regularizeAll();
          ⟨weave chunk references⟩
          ⟨weave chunks⟩
        }

Weaving chunk references is straightforward. A plain loop does the job.

        /* Weave chunk references */
        var chunkRefs = document.getElementsByClassName("chunkref");
        for (var i = 0; i < chunkRefs.length; i++) {
          weaveChunkRef(chunkRefs[i]);
        }

Weaving chunks is a little more complicated, since chunks can be inline (a span) or block (a div).

        /* Weave chunks */
        var chunks = document.getElementsByClassName("chunk");
        for (var i = 0; i < chunks.length; i++) {
          var chunk = chunks[i];
          if (chunk.tagName == "SPAN") {
            weaveInlineChunk(chunk);
          } else if (chunk.tagName == "DIV") {
            weaveBlockChunk(chunk);
          } else {
            throw new Error(`[weave] unknown chunk type: "${chunk.tagName}"`);
          }
        }

Tangling

Tangling Each Chunk

The main process of tangling is implemented by a function _tangle, which accepts a string chunkName and returns a string for the tangled code:

        function _tangle(chunkName) {
          var code = "";
          ⟨define _tangle⟩
          return code;
        }

To define _tangle, we first collect all chunks that have the name chunkName. As stated in a previous section, there can be multiple chunks with the same name.

        var chunks = document.getElementsByName(chunkName);
        for (var i = 0; i < chunks.length; i++) {
          var chunk = chunks[i];
          if (chunk.getAttribute("class") != "chunk") continue;
          ⟨tangle each chunk⟩
        }

For each chunk we find, we first unweave it. Since we will replace chunk references in it with the corresponding tangled code, that is, we will manipulate it in-place, we need to make a copy of it, a deep copy.

var unwoven = unweave(chunk).cloneNode(true);

Finding a chunk reference and replacing it with the code in the referred chunk requires care. Replacing the chunk reference with the corresponding code modifies the children of the unwoven chunk, so a for-loop like


        for (var i = 0; i < unwoven.getElementsByClassName("chunkref").length; i++)

leads to mistakes. Instead, we employ a function that gets the first child of the unwoven chunk that represents a chunk reference. After each replacement, the next chunk reference automatically becomes the first one.

        /* Deal with chunk references */
        var chunkRef = getFirstChildByClass(unwoven, "chunkref");
        while (chunkRef != undefined) {
          ⟨replace chunk reference⟩
          chunkRef = getFirstChildByClass(unwoven, "chunkref");
        }

After replacing all chunk references within the unwoven chunk, we append it to the code string. Besides, we also add a newline character between different chunks that share the same chunk name.

code += unwoven.innerHTML; if (i < chunks.length - 1) code += "\n";

If the chunk has an "append-newline" attribute, we append a newline character to the end of the code string according to its value.

/* Append newline */ var appendNewline = chunk.getAttribute("append-newline");

If there is no such attribute, appendNewline will be null. And we set appendNewline to zero (append zero newline).

if (appendNewline == null) { appendNewline = 0; }

And if it has no value (thus appendNewline is an empty string), we append a single newline character.

else if (appendNewline == "") { appendNewline = 1; }

Otherwise, we parse the value as an integer and append that many newline characters.

else if (!isNaN(appendNewline)) { appendNewline = parseInt(appendNewline); }

Finally, if it has incorrect value, we raise an error.

else { throw new Error(`[_tangle] the value of append-newline must be an integer, but got "${appendNewline}"`); }

After determining the value of appendNewline, we append that many newline characters to the end of code.

for (var j = 0; j < appendNewline; j++) code += "\n";
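
The handling of the attribute value can also be collected into a small helper, which makes the four cases easy to check. The function name parseAppendNewline is hypothetical, and the sketch is deliberately stricter than the isNaN test above: isNaN("1.5") and isNaN(" ") are both false in JavaScript, so such values would pass that test, whereas the regular expression below accepts only non-negative integers.

```javascript
// Map the raw attribute value to the number of newlines to append.
// `value` is what getAttribute("append-newline") returns.
function parseAppendNewline(value) {
  if (value === null) return 0;  // attribute absent
  if (value === "") return 1;    // attribute present without a value
  if (/^\d+$/.test(value)) return parseInt(value, 10);
  throw new Error(
    `[_tangle] the value of append-newline must be an integer, but got "${value}"`
  );
}

console.log(parseAppendNewline(null)); // → 0
console.log(parseAppendNewline(""));   // → 1
console.log(parseAppendNewline("2"));  // → 2
```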

The implementation of getFirstChildByClass is straightforward.
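
One possible implementation is sketched here. It is an assumption, not necessarily the code in literate.js: it scans the direct children in order, compares against the className string, and returns undefined when nothing matches, which ends the while loop shown earlier.

```javascript
// Return the first direct child of `parent` that carries `className`,
// or undefined if there is none.
function getFirstChildByClass(parent, className) {
  for (const child of Array.from(parent.children)) {
    const classes = (child.className || "").split(/\s+/);
    if (classes.includes(className)) return child;
  }
  return undefined;
}

// Usage with a stand-in object; in the browser, pass the unwoven chunk.
const unwovenStandIn = {
  children: [
    { className: "other", id: "a" },
    { className: "chunkref", id: "b" },
    { className: "chunkref", id: "c" },
  ],
};
console.log(getFirstChildByClass(unwovenStandIn, "chunkref").id); // → b
```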

Replacing Chunk Reference

For the replacement, we first get the referred chunk name. To distinguish it from the input chunkName, we call it subChunkName.

var subChunkName = unweave(chunkRef).innerHTML;

Then, we get the tangled code by recursion! Just like subChunkName, we call it subCode.

var subCode = _tangle(subChunkName);

Very simple, isn't it? Note that if the referred chunk is a block chunk, which involves multiple lines, we have to indent the second, third, ... lines of subCode so that they share the same indentation. (The first line has the correct indentation, since it is written by you.) So, for a block chunk, we have to determine the indentation and add it to each of these lines.

        /* Block chunk needs indentation */
        if (chunk.tagName == "DIV") {
          var indentation = getIndentation(chunkRef);
          subCode = indent(subCode, indentation);
        }

Everything is ready. Now we replace the chunk reference with the tangled subCode. Because subCode is an HTML string, and replaceWith would insert a plain string as literal text, we wrap subCode in a span element by setting it as the span's innerHTML.

        var subCodeNode = document.createElement("span");
        subCodeNode.innerHTML = subCode;
        chunkRef.replaceWith(subCodeNode);

Display in Browser

Clicking "⟨chunk name⟩≡" triggers the function tangle(chunkName). We pop up a new window to show the tangled code, thus:

        function tangle(chunkName) {
          var code = _tangle(chunkName);
          ⟨create HTML for the tangled code⟩
          ⟨display HTML in a new window⟩
        }

We use the following HTML to display the tangled code:

        var html = `<!doctype html><html>` +
          `<head><meta charset="utf-8"><title>${chunkName}</title>` +
          `<style>body { background-color: #c7edcc; }</style></head>` +
          `<body><pre><code>${code}</code></pre></body>` +
          `</html>`;

This HTML is to be displayed in a new window:

        var win = window.open('about:blank', '_blank');
        win.document.write(html);
        win.document.close();

Utility Functions

Before ending this section, we have to show how the functions getIndentation and indent are implemented. They turn out to be simple. The getIndentation function returns the indentation of the first line of the referred code, that is, of the line that holds the chunk reference. For example, in the following code,


        void main() {
          <span class="chunkref">say hello</span>;
        }

the chunk reference (the second line) has indentation 2. We have to figure it out. To do so, we first extract all the text before the chunk reference, which, in this example, is


        void main() {
          (end here)

This is done by the following function. (We follow this answer. This implementation, however, cannot deal with HTML tags properly: any HTML tags in the returned textBefore are dropped. But since it is called during tangling, when there are no HTML tags in the text before the node, the implementation is still valid. For now, I cannot figure out a better one.)

        function getTextBefore(node) {
          const rangeBefore = document.createRange();
          rangeBefore.setStart(node.parentNode.firstChild, 0);
          rangeBefore.setEndBefore(node);
          const textBefore = rangeBefore.toString();
          return textBefore;
        }

Then we simply count the whitespace between the beginning of the line and the chunk reference.
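
The counting step can be sketched as a pure function on the text returned by getTextBefore. The name countIndentation is hypothetical, and the sketch assumes that the indentation is the run of spaces at the start of the last line of that text.

```javascript
// Count the spaces at the start of the last line of `textBefore`.
function countIndentation(textBefore) {
  // lastIndexOf gives -1 when there is no newline, so lineStart is then 0.
  const lineStart = textBefore.lastIndexOf("\n") + 1;
  const lastLine = textBefore.slice(lineStart);
  let count = 0;
  while (count < lastLine.length && lastLine[count] === " ") count++;
  return count;
}

// In the browser this would be combined with getTextBefore from the text above:
//   function getIndentation(chunkRef) {
//     return countIndentation(getTextBefore(chunkRef));
//   }

console.log(countIndentation("void main() {\n  ")); // → 2
```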

The indent function is also straightforward. We simply prepend indentation whitespace characters to each line.
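
A possible sketch of indent is given below. Following the earlier remark that the first line already carries the right indentation (it sits where the chunk reference was written), this version leaves the first line untouched. It also leaves empty lines empty to avoid trailing whitespace. Both choices are assumptions of this sketch.

```javascript
// Prepend `indentation` spaces to every line of `code` except the first
// line and empty lines.
function indent(code, indentation) {
  const padding = " ".repeat(indentation);
  return code
    .split("\n")
    .map((line, index) => (index === 0 || line === "" ? line : padding + line))
    .join("\n");
}

console.log(JSON.stringify(indent("a\nb\n\nc", 2)));
// → "a\n  b\n\n  c"
```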

On Loading

Congratulations! You are at the end of this journey. All that is left is to weave this HTML file when it is loaded in your browser. (Recall that regularization is included as a part of weaveAll.) So, we could write in literate.js:


        onload = weaveAll;

But what if you want to run more JavaScript functions when the document loads? Who knows what you are going to do. So, it is left to you, the user, to decide what onload should be.
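
If several functions should run on load, one simple option is to combine them into a single handler. The helper runInOrder and the name setupSomethingElse below are hypothetical and serve only as an illustration.

```javascript
// Build one function that calls the given handlers in order.
function runInOrder(...handlers) {
  return function () {
    for (const handler of handlers) handler();
  };
}

// In a page, with weaveAll from literate.js and your own setupSomethingElse:
//   window.onload = runInOrder(weaveAll, setupSomethingElse);
// Alternatively, each handler can be registered separately:
//   window.addEventListener("load", weaveAll);
```

With addEventListener, handlers registered by different scripts do not overwrite each other, which is not the case when assigning to onload.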

Up to now, we have completed the JavaScript code for literate programming.

Style

One more thing: let us add some style to make the document pretty. We use CSS to style the chunks and chunk references.

Inline Code

We add a thin border to inline code and pad it a little to make it stand out.

        code {
          border-style: dashed;
          border-width: 0.1em;
          padding-left: 0.3em;
          padding-right: 0.3em;
        }

Code Block

We put code block into a block, of course. The block is wrapped by a dashed grey border. We also adjust the position of the code, to move it toward the center of the border.

        pre code {
          display: block;
          border-style: dashed;
          border-color: grey;
          border-width: medium;
          overflow-x: auto;
          margin: 1em;
          padding-left: 3em;
          padding-right: 3em;
          padding-top: 1em;
          padding-bottom: 1em;
        }

Chunk Reference

For a chunk reference, we remove the ugly underline of the hyperlink, which links to the first chunk with the referred name, and add a little padding to make it pretty. The original hyperlink color is too bright, so we use dark red instead. (Compared with dark blue, red is more distinguishable.)

        .chunkref a {
          color: darkred;
          text-decoration: none;
          padding-left: 0.1em;
          padding-right: 0.1em;
        }

Chunk

For each chunk, we add a narrow margin to make it appear a little isolated from the content.

div.chunk { margin-top: 1em; margin-bottom: 1em; }

Block Chunk

For block chunk, we make the head bold, indicating a definition (we always make something bold when it is defined).

.block-chunk-head { font-weight: bold; }

For the link in the block chunk, we use the same style as chunk reference.

        .block-chunk-head a {
          text-decoration: none;
          padding-left: 0.1em;
          padding-right: 0.1em;
          color: darkred;
        }

Inline Chunk

For inline chunk, we simply make the head bold, indicating a definition, and add a very narrow padding to isolate it from the content.

        .inline-chunk-head {
          font-weight: bold;
          padding-left: 0.1em;
          padding-right: 0.1em;
        }