diff --git a/changelog b/changelog index 5d04a1e..d2a5e23 100644 --- a/changelog +++ b/changelog @@ -1,3 +1,6 @@ +20111118 wxh src/axiom-website/patches.html 20111118.01.tpd.patch +20111118 tpd src/axiom-website/litprog.html added +20111118 tpd src/axiom-website/documentation.html add litprog.html 20111117 wxh src/axiom-website/patches.html 20111117.01.tpd.patch 20111117 tpd src/interp/c-doc.lisp treeshake compiler 20111117 tpd books/bookvol9 treeshake compiler diff --git a/src/axiom-website/documentation.html b/src/axiom-website/documentation.html index f6a5831..c9cbd91 100644 --- a/src/axiom-website/documentation.html +++ b/src/axiom-website/documentation.html @@ -123,6 +123,8 @@

Why Literate Programming?



+An example literate program in HTML +

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be diff --git a/src/axiom-website/litprog.html b/src/axiom-website/litprog.html new file mode 100644 index 0000000..04b3383 --- /dev/null +++ b/src/axiom-website/litprog.html @@ -0,0 +1,966 @@ + + + + Example of Literate Programming in HTML + + + + + +
+
+ 1.0    Literate Programming - A Gentle Introduction
+  1.1     What is the problem?
+  1.2     Is there a better way?
+  1.3     Literate Programming -- The basic idea
+  1.3.1     Literate Programming is NOT documentation
+  1.3.2     Literate Programming is a change in mindset
+  1.3.3     Literate Programming reduces bugs
+  1.4     Understanding the literate programming tangle tool
+  1.4.1     Motivation, or getchunk
+  1.4.2     All of the literate programming tools
+ 2.0    The Tangle Program
+  2.1     Understanding the story of the program
+  2.2     The main task
+  2.3     A complication with HTML
+  2.4     Finding the chunk
+  2.5     Finding the next line in the buffer
+  2.6     At last, finding the chunk
+  2.7     Printing the chunk we found
+  2.8     The simple case, print the line
+  2.9     Checking chunks for getchunk tags
+  2.10    Getting a new chunk name from the getchunk tag
+  2.11    Finding the end of the chunk we are printing
+  2.12    Final cleanup and housekeeping. The preamble
+ 3.0    The tangle source code
+ 4.0    What have we learned?
+ 
+
+
+ +

1.0 Literate Programming - A Gentle Introduction

+

1.1 What is the problem?

+

+In the 1970s I worked on a PDP11/40, which is the same class of +machine that was used to develop UNIX. My PDP had 8192 bytes of +memory. The editor used 4096 bytes and left the other 4096 bytes for +an edit buffer. +

+This meant that the largest file I could create at any time was 4096 +bytes. As a result, C programs were limited in size. To get around +this problem we used include files, libraries of code, +linkers, and overlay loaders. Segment registers exist on the x86 +processors to help overlay loaders. +

+People program today the same way I programmed in the 1970s. We write +programs that are made of tiny pieces of sand in hundreds of little +files spread across dozens of directories. In order to find anything +we use grep to search files, or we create tools to pre-search +the files such as Eclipse and IntelliJ. +

+Imagine taking the same approach to calculus. Create a directory for +each chapter. Take each equation in the chapter and put it in a file. +Do this for every chapter. Throw away the textbook. Now try to learn +calculus. +

+That's the way we program today, just like I did in the 1970s. +

+We program to communicate to the machine, not to communicate to people.
+That's the problem. +

+ +

1.2 Is there a better way?

+

+Consider the best possible world. You've been hired at a company and +join a team that is already working on a program. They hand you a book, +tell you to go home and read it over the next two weeks. At the end of +the two weeks you can work on the program as effectively anyone on the +team. The team has successfully communicated from one human to another. +

+What is in the book? +

+Remember our calculus textbook? It started from the ideas like limits +and gradually developed the ideas until they could be expressed in +equations. By the time you got to the equations you already understood +the concepts. You could look at the equations and see why they matched +the text. It is the why that is the important part. It is the +part that our programs are missing. +

+The book you took home uses the same method. You started with the problem +in chapter 1. Chapter 2 expresses the ideas needed to solve the problem. +The next few chapters expand on each idea, gradually becoming more specific +until the idea is reduced to code. By the time you get to the code it +should be perfectly clear what the code should look like. Any part of +the code you don't understand means that the book needs some additional +words. +

+

1.3 Literate Programming -- The basic idea

+

+It is rather pointless to have code in a book, right? Someone could +change the code and then the book would be out of date. +

+The way to solve this problem is to extract the code directly from the +book. That way, every time you change the code in the book you change +the actual program. +

+How hard is it to write an extraction program that can find the code in +the book and extract it to some files? Well, lets suppose we name each +section of code and call it a chunk. +

+We need a program that lets us find a chunk in a document and extract it. +The traditional name for such a program is tangle. The tangle +program takes two arguments, the name of the book and the name of the +chunk: +

+   tangle mybookname chunkname
+
+

+With this simple tool you can now write books that contain the actual +source code. +

+

1.3.1 Literate Programming is NOT documentation

+

+The first reaction is that this is a new, painful form of documentation. +It is not. Documentation is a how construct. It is a way of +explaining how a program works or how an application interface should +be used. +

+Literate programming is about why. It explains the ideas. It is +communication from one person to another. +

+The why becomes important when you have to maintain and modify +a program. You can perfectly understand how a subroutine, +module, or class works. You can explain its input and outputs. What +you don't understand is why it exists. Nobody writes that +down. So when you are joining a group and given the directory +containing thousand of pieces of code, you have no idea why the +program is structured the way it is. +

+

1.3.2 Literate Programming is a change in mindset

+

+In order to do literate programming you have to change the way you +think. You have to write english text to communicate ideas. You have +to move from ideas to implementation in a way that make the story +clear. +

+If programmers were writing a James Bond novel they would create a +file for JamesBond, MissMoneyPenny, Q, and TheBadGuys. They would +create a directory for Scenes with subdirectories for Scene1, Scene2, +and so on. Then they would box up the whole tree, send it to the +audience, and tell them +

+   When this program is run, the bad guy dies
+
+

+We live by stories. We tell stories and we remember stories. +Characters do things and the author tries to convey their motivations +(the why). Scenes, flashy cars, trick watches, and other items +are added to solve local problems in the story. The whole story moves +from the beginning to the end. You can remember the story and you can +find places where you think it is weak or cheesy or bad. +

+Subroutines, modules, or classes are like characters in a story. They +need motivation. They need the why. The factory classes, the +singletons, the garbage collection routines, and other "items" exist +to solve local problems in the story of the program. +

+Literate programming is about writing the story of your program. What +are you trying to solve? How are you solving it? Why did you introduce +piece of code? Is it part of the main story or is it an item that helps +solve a local problem? +

+

1.3.3 Literate Programming reduces bugs

+

+We are all conditioned to follow stories. We do it all the time. +If I present you with a piece of code without the story then you +have to struggle to understand it. Take, for example, this: +

+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <fcntl.h>
+
+/* forward reference for the C compiler */
+int getchunk(char *chunkname);
+
+char *chunkbegin = "<pre id=\"";     int chunkbeginlen = 9;
+char *chunkend = "</pre>";           int chunkendlen = 6;
+char *chunkget = "<getchunk id=\"";  int chunkgetlen = 14;
+
+/* a memory mapped buffer copy of the file */
+char *buffer;
+int bufsize;
+
+/* return the length of the next line */
+int nextline(int i) {
+  int j;
+  if (i >= bufsize) return(-1);
+  for (j=0; ((i+j < bufsize) && (buffer[i+j] != '\n')); j++);
+  return(j);
+}
+
+/* output the line we need */
+int printline(int i, int length) {
+  int j; 
+  for (j=0; j<length; j++) { putchar(buffer[i+j]); }
+  printf("\n");
+}
+
+/* handle <pre id="chunkname">            */
+/* is this chunk name we are looking for? */
+int foundchunk(int i, char *chunkname) {
+  if ((strncmp(&buffer[i],chunkbegin,chunkbeginlen) == 0) &&
+      (strncmp(&buffer[i+9],chunkname,strlen(chunkname)) == 0) &&
+      (buffer[i+chunkbeginlen+strlen(chunkname)] == '"') &&
+      (buffer[i+chunkbeginlen+strlen(chunkname)+1] == '>')) return(1);
+  return(0);
+}
+
+/* handle </pre>   */
+/* is it really an end? */
+int foundEnd(int i) {
+  if (strncmp(&buffer[i],chunkend,chunkendlen) == 0) {
+    return(1); 
+  }
+  return(0);
+}
+
+/* handle <getchunk id="chunkname"/> */
+/* is this line a getchunk?          */
+int foundGetchunk(int i, int linelen) {
+  int len;
+  if (strncmp(&buffer[i],chunkget,chunkgetlen) == 0) {
+   for(len=1; ((len < linelen) && (buffer[i+chunkgetlen+len] != '\"')); len++);
+   return(len);
+  }
+  return(0);
+}
+
+/* Somebody did a getchunk and we need a copy of the name */
+/* malloc string storage for a copy of the getchunk name  */
+char *getChunkname(int k, int getlen) {
+  char *result = (char *)malloc(getlen+1);
+  strncpy(result,&buffer[k+chunkgetlen],getlen);
+  result[getlen]='\0';
+  return(result);
+}
+
+/* print lines in this chunk, possibly recursing into getchunk */
+int printchunk(int i, int chunklinelen, char *chunkname) {
+  int j;
+  int k;
+  int linelen;
+  char *getname;
+  int getlen = 0;
+  for (k=i+chunklinelen+1; ((linelen=nextline(k)) != -1); ) {
+    if ((getlen=foundGetchunk(k,linelen)) > 0) {
+       getname = getChunkname(k,getlen);
+       getchunk(getname);
+       free(getname);
+       k=k+getlen+17;
+    } else {
+    if ((linelen >= chunkendlen) && (foundEnd(k) == 1)) {
+      return(k+chunkbeginlen);
+    } else {
+      printline(k,linelen);
+      k=k+linelen+1;
+    }
+  }}
+  return(k);
+}
+
+/* find the named chunk and call printchunk on it */
+int getchunk(char *chunkname) {
+  int i;
+  int j;
+  int linelen;
+  int chunklen = strlen(chunkname);
+  for (i=0; ((linelen=nextline(i)) != -1); ) {
+    if ((linelen >= chunklen+11) && (foundchunk(i,chunkname) == 1)) {
+      i=printchunk(i,linelen,chunkname);
+    } else {
+      i=i+linelen+1;
+    }
+  }
+  return(i);
+}
+
+void fixHTMLcode() {
+  int point = 0;
+  int mark = 0;
+  int i=0;
+  for(point = 0; point < bufsize;) {
+    if ((buffer[point] == '&') &&
+        (strncmp(&buffer[point+1],"lt;",3) == 0)) {
+      buffer[mark++] = 60;
+      point = point + 4;  
+    } else
+    if ((buffer[point] == '&') &&
+        (strncmp(&buffer[point+1],"gt;",3) == 0)) {
+      buffer[mark++] = 62;
+      point = point + 4;  
+    } else
+      buffer[mark++] = buffer[point++]; 
+  }
+  bufsize = mark;
+}
+
+/* memory map the input file into the global buffer and get the chunk */
+int main(int argc, char *argv[]) {
+  int fd;
+  struct stat filestat;
+  if ((argc < 2) || (argc > 3)) { 
+    perror("Usage: tangle filename chunkname");
+    exit(-1);
+  }
+  fd = open(argv[1], O_RDONLY);
+  if (fd == -1) {
+    perror("Error opening file for reading");
+    exit(-2);
+  }
+  if (fstat(fd,&filestat) < 0) {
+    perror("Error getting input file size");
+    exit(-3);
+  }
+  bufsize = (int)filestat.st_size;
+  buffer = (char *)malloc(bufsize);
+  read(fd,buffer,bufsize);
+  fixHTMLcode(); 
+  getchunk(argv[2]);
+  close(fd);
+  return(0);
+}
+
+
+

+Odds are good you just skipped over it, right? It is pretty much just a +long pile of noise. It really doesn't matter if you are a C programmer +or not. It is just noise. +

+

1.4 Understanding the literate programming tangle tool

+

+We write literate programs to communicate to both people and machines. +Since they are combined in a single document we need a markup language +to distinguish human text from machine text as well as a markup language +to write readable english. +

+Literate programs don't care what you use as a documentation markup +language. I prefer latex but for this example we'll use HTML. +

+HTML has a set of markup tags. One of the markup tags is the pre +tag. Anything between the beginning of a <pre> tag and the +trailing </pre> tag is quoted and +printed exactly as written. +

+The pre tag also has an id field which lets us put a +name on the tag. So we can have many named pre tags in a file. +

+The tangle program reads the HTML file and extracts the text in a +pre chunk. That's almost all of the magic. +

+

1.4.1 Motivation, or getchunk

+

+Remember, though, that we wanted to motivate ideas in a program. +Motivations get introduced in an order that humans can understand. +So ideas have to be presented in a particular order for humans and +a different order for machines. We need a way to reorganize the +program into pieces. The trick is getchunk. +

+We introduce a special tag called getchunk. It only occurs +within a pre tag block. The getchunk tag has an +id field which names another chunk. When we find a +getchunk tag, we read the id name and replace the +getchunk tag with the named chunk. +

+This getchunk tag allows us to "clip out" a piece of a +program from the middle of a subroutine, put it into its own chunk, +and wrap some text around it. +

+This is useful if you have some tricky loop in the middle of a large +subroutine that you need to explain. You can clip the loop out, put it +in a pre tag of its own and replace it with a getchunk. +

+

1.4.2 All of the literate programming tools

+

+So we have a single program, called tangle. It works with three +HTML tags, the <pre> tag, the </pre> + and the <getchunk> tag. They +look like: +

+   <pre id="main.c">
+   </pre>
+   <getchunk id="main.c">
+
+

+That's all there is. +

+Now you can embed your subroutines in an HTML file, surround it by +pre tags, replace nasty sections with getchunk tags +so you can explain them separately, and you can extract the whole +set of subroutines using the tangle program. +

+The big loop of program development is now: +

+ do forever {
+    edit the html file
+    tangle thehtmlfile sub1.c >sub1.c
+    tangle thehtmlfile sub2.c >sub2.c
+    tangle thehtmlfile main.c >main.c
+    gcc -o myfile sub1.c sub2.c main.c
+    ./myfile
+ }
+
+

+When you are finished for the day, the week, or the project your +whole program is properly explained as well as properly maintained. +

+If you did it right you can just point someone at the HTML file, +let them go away for a while, and when they return they understand +enough about your program to take over the whole project. +

+But wait, you say... you skipped over the hard part. +

+Indeed, I did. The hard part is that you need to communicate your +ideas to another human. And you have to do it in a way that reduces +your idea to practice. In fact, you need to clearly reduce your +ideas to practice. +

+

2.0 The Tangle Program

+

2.1 Understanding the story of the program

+

+We will try to illustrate literate programming using the tangle +program as an example. However, if you want to see a real +literate program I strongly recommend the book: +

+  Lisp in Small Pieces by Christian Queinnec (ISBN 0-521-54566-8)
+
+

+We would like to write programs embedded in documents which surround +the program text with explanations. Since the compiler cannot +understand the markup language we need a program, which we will call +tangle, that wil extract code from a document in a given markup +language. +

+The tangle program is specific to the markup language used, whether it is +latex, HTML, or some other language. For this example we are assuming +that the markup is HTML. +

+The markup language has to have some sort of verbatim quoting +mechanism which allows us to put code inline. It also has to have a way +to name the quoted pieces of code. We call the quoted pieces of code a +chunk. We call the label the chunkname. +

+HTML uses the pre tag as its quoting mechanism. Items between +the opening pre tag and the closing /pre tag are +unchanged by HTML processors. The pre tag has an id +parameter. We will use the id parameter as a chunk name. +

+The tangle program takes two arguments, the name of the document and +the name of the chunk. It must print the chunk on standard output. +

+

2.2 The main task

+

+The main thing we have to do is find a chunk somewhere in our book. +We specified the book and the chunk name on the command line so we +ought to check that we have the right inputs. +

+There can only be two inputs and they are both requires so first +we check that condition. If there are not two then the user might not +understand what was required so we print the usual usage message. +

+We have huge amounts of memory so we will simply load the file in a +single chunk using the read function. This +requires that we can open the file, which we check. It requires a +handle to the file, which we get from the open call. It requires the +size of the file so it can allocate memory, so we use fstat. +

+Now that we have all of the read parameters we can load the +file into memory. The buffer variable is described as a pointer to +a character which is C-speak for a string. +

+We also need to know the size of the buffer at all times so we +allocate the bufsize variable at global scope. +

+
+
+/* a memory mapped buffer copy of the file */
+char *buffer;
+int bufsize;
+
+
+

+Since the buffer is being used everywhere we make it global by +defining it at the top of the program, outside the scope of the +functions. +

+
+/* memory map the input file into the global buffer and get the chunk */
+int main(int argc, char *argv[]) {
+  int fd;
+  struct stat filestat;
+  if ((argc < 2) || (argc > 3)) { 
+    perror("Usage: tangle filename chunkname");
+    exit(-1);
+  }
+  fd = open(argv[1], O_RDONLY);
+  if (fd == -1) {
+    perror("Error opening file for reading");
+    exit(-2);
+  }
+  if (fstat(fd,&filestat) < 0) {
+    perror("Error getting input file size");
+    exit(-3);
+  }
+  bufsize = (int)filestat.st_size;
+  buffer = (char *)malloc(bufsize);
+  read(fd,buffer,bufsize);
+  fixHTMLcode(); 
+  getchunk(argv[2]);
+  close(fd);
+  return(0);
+}
+
+
+

+If the read succeeds we move on to the problem of finding +the named chunk in the book. +

+

2.3 A complication with HTML

+

+HTML is not clever about ignoring HTML symbols in <pre> +blocks of code. This causes us three complications which do not normally +occur with a tangle program. +

+The first complication can be seen by viewing the source for this page. +We need to replace the angle brackets with the HTML escape codes +everywhere, including in the verbatim sections. So we have to be careful +using certain language constructs. Reasonable markup languages such +as latex do not have this problem. +

+The second complication is that the string search for the pre, +/pre and getchunk tags need to use the escape code +sequence for matching. +

+The third complication is that the tangle program has to reverse +this translation when printing the program. +

+There is no such thing as a simple job. +

+The trick here will be to walk the buffer and replace every use of +the HTML code with the corresponding character, appropriately adjusting +the space. Since HTML codes are always longer than the character they +replace we can just compress the buffer a bit. And since the buffer is +never used for anything else we can even replace the HTML codes in parts +of the buffer where it might be inappropriate. +

+The point variable is where we are looking in the buffer. +The mark variable is where we are writing in the buffer. +Normally we just walk them in parallel. They diverge when they hit an +HTML code. The point is copied to the mark. In the +absence of any codes this is a copy of the buffer to itself. +

+When we hit a code we place the corresponding character at the mark, +bump the point over the HTML code, and continue the copy. +

+When we hit the end of the buffer we rewrite the bufsize variable +to reflect the new, shorter buffer size. +

+There is another really subtle point here. Note that we have to do the +comparison in two pieces otherwise the combined string is a valid HTML +code and we end up destroying our own search string. If we break it +into two parts then the substring "lt;" is not a valid HTML code and +will not be replaced. This is the kind of information that +literate programming preserves. A clever programmer would combine the +two tests into the single strncmp and everything will fail. +

+void fixHTMLcode() {
+  int point = 0;
+  int mark = 0;
+  int i=0;
+  for(point = 0; point < bufsize;) {
+    if ((buffer[point] == '&') &&
+        (strncmp(&buffer[point+1],"lt;",3) == 0)) {
+      buffer[mark++] = 60;
+      point = point + 4;  
+    } else
+    if ((buffer[point] == '&') &&
+        (strncmp(&buffer[point+1],"gt;",3) == 0)) {
+      buffer[mark++] = 62;
+      point = point + 4;  
+    } else
+      buffer[mark++] = buffer[point++]; 
+  }
+  bufsize = mark;
+}
+
+
+

2.4 Finding the chunk

+

+We are given the chunkname as a string and we have to search the +file for the
<pre id="name"> tag. +

+Since the string came from the command line we know that it ends +with a null character so we can just call strlen to get +the length. +

+By design, we require the <pre> tag to start in the first +character of the line. We could remove this restriction if we +felt like being clever but code is usually left-aligned in the +output so left-aligning the tag is reasonable. +

+Given this design decision we can walk the book one line at a time. +We delegate the task of finding the line length linelen to +the nextline function. The "magic number" 11 exists because +the smallest pre tag contains 11 characters: +

+<pre id="">
+
+

+If the line is long enough to contain the <pre> tag and its +associated id tag and we find the named chunk then we +print it. +

+Another design decision is that we will allow multiple chunks with +the same name and we will print them in order. This allows us to +break a big block into smaller blocks and insert text in the middle. +So you can write +

+
+    <pre id="somename">
+       the first part of a function
+    </pre>
+    some explanation
+    <pre id="somename">
+       the next part of the function
+    </pre>
+    more explanation
+
+When you run tangle filename somename you will see the output: +
+  the first part of a function
+  the next part of the function
+
+

+This is trivial to implement. All we have to do is keep searching the +document and printing the same named chunk. +

+
+
+/* find the named chunk and call printchunk on it */
+int getchunk(char *chunkname) {
+  int i;
+  int j;
+  int linelen;
+  int chunklen = strlen(chunkname);
+  for (i=0; ((linelen=nextline(i)) != -1); ) {
+    if ((linelen >= chunklen+11) && (foundchunk(i,chunkname) == 1)) {
+      i=printchunk(i,linelen,chunkname);
+    } else {
+      i=i+linelen+1;
+    }
+  }
+  return(i);
+}
+
+
+

2.5 Finding the next line in the buffer

+

+The nextline function just walks the buffer starting at any +character position and returns the character position after the next +newline. We only have to check that we don't run off the end of the +buffer, otherwise we just keep a count of the character position. +

+
+/* return the length of the next line */
+int nextline(int i) {
+  int j;
+  if (i >= bufsize) return(-1);
+  for (j=0; ((i+j < bufsize) && (buffer[i+j] != '\n')); j++);
+  return(j);
+}
+
+
+ +

2.6 At last, finding the chunk

+

+We are positioned at the beginning of a line so we are looking at +something that looks like: +

+<pre id="somechunkname">
+
+

+We need to check the format of the line. The first 9 characters must be: +

+
+<pre id="
+
+

+The next substring must be the chunkname we seek and the rest of +line must be the +

+">
+
+

+So if we find a properly formed line with the chunk name then we +return a 1 indicating success, otherwise we return a 0 indicating failure. +

+In a fit of hasty generality we allocate the chunkbegin +information at the global scope. In theory we would like to be able to +just change the patterns for different markup languages. +

+char *chunkbegin = "<pre id=\"";     int chunkbeginlen = 9;
+
+
+/* handle <pre id="chunkname">            */
+/* is this chunk name we are looking for? */
+int foundchunk(int i, char *chunkname) {
+  if ((strncmp(&buffer[i],chunkbegin,chunkbeginlen) == 0) &&
+      (strncmp(&buffer[i+9],chunkname,strlen(chunkname)) == 0) &&
+      (buffer[i+chunkbeginlen+strlen(chunkname)] == '"') &&
+      (buffer[i+chunkbeginlen+strlen(chunkname)+1] == '>')) return(1);
+  return(0);
+}
+
+
+ +

2.7 Printing the chunk we found

+

+So we found our chunk and all we have to do is print it. +We just print each line until we find the </pre> tag, +a task we delegate to the foundEnd function. +

+Of course, there is no such thing as a simple job and, indeed, +there is a complication here. We made a design decision +that we will allow a getchunk tag to have special meaning +within the pre tag. +

+We delegate getting the id from the getchunk tag +to the getChunkname function which we will examine next. +

+Recall that getchunk will search the document for the +pre tag named by the getchunk's id label. +This means that we need to +

+  1) stop printing the current chunk
+  2) find the chunk named by getchunk and print it
+  3) continue printing the original chunk where we left off
+
+

+We already have a function to do step 2, the +getchunk function mentioned above. +We can just call that function which handles both finding and +printing a chunk. +

+The magic number 17 occurs because we need to account for all of +the characters in the line which are not part of the id +string in the getchunk. So an empty string, including +the trailing newline is: +
+<getchunk id="">\n   a total of 17 characters
+
+
+/* print lines in this chunk, possibly recursing into getchunk */
+int printchunk(int i, int chunklinelen, char *chunkname) {
+  int j;
+  int k;
+  int linelen;
+  char *getname;
+  int getlen = 0;
+  for (k=i+chunklinelen+1; ((linelen=nextline(k)) != -1); ) {
+    if ((getlen=foundGetchunk(k,linelen)) > 0) {
+       getname = getChunkname(k,getlen);
+       getchunk(getname);
+       free(getname);
+       k=k+getlen+17;
+    } else {
+    if ((linelen >= chunkendlen) && (foundEnd(k) == 1)) {
+      return(k+chunkbeginlen);
+    } else {
+      printline(k,linelen);
+      k=k+linelen+1;
+    }
+  }}
+  return(k);
+}
+
+
+ +

2.8 The simple case, print the line

+In the simple case we simply output the line. +
+/* output the line we need */
+int printline(int i, int length) {
+  int j; 
+  for (j=0; j<length; j++) { putchar(buffer[i+j]); }
+  printf("\n");
+}
+
+
+

2.9 Checking chunks for getchunk tags

+

+Because we treat the getchunk tag as a special case when we +are printing we need a predicate to look for it. We are positioned at +the beginning of a line when we are called so we only need to check +for the string we call chunkget, that is: +

+char *chunkget = "<getchunk id=\"";  int chunkgetlen = 14;
+
+ +This predicate returns the length of the chunkname or 0 if it is not +a getchunk tag. +
+/* handle  */
+/* is this line a getchunk?          */
+int foundGetchunk(int i, int linelen) {
+  int len;
+  if (strncmp(&buffer[i],chunkget,chunkgetlen) == 0) {
+   for(len=1; ((len < linelen) && (buffer[i+chunkgetlen+len] != '\"')); len++);
+   return(len);
+  }
+  return(0);
+}
+
+
+ +

2.10 Getting a new chunk name from the getchunk tag

+

+We have encountered a <getChunkName> tag. We need to strip out +the id string. We malloc a small piece of memory to contain +the name. Note that this is a potential memory leak since we are +letting allocated memory leave this function. The caller must remember +to free it. If we check the caller, the printchunk function +we see that the buffer is freed and memory does not leak. +

+
+/* Somebody did a getchunk and we need a copy of the name */
+/* malloc string storage for a copy of the getchunk name  */
+char *getChunkname(int k, int getlen) {
+  char *result = (char *)malloc(getlen+1);
+  strncpy(result,&buffer[k+chunkgetlen],getlen);
+  result[getlen]='\0';
+  return(result);
+}
+
+
+ +

2.11 Finding the end of the chunk we are printing

+

+The printchunk routine delegated the task of finding the +</pre> tag to foundEnd. If we find the +tag we return 1 otherwise we return 0. +

+The chunkend and chunkendlen variables are allocated +in the global scope. +

+char *chunkend = "</pre>";           int chunkendlen = 6;
+
+ +
+/* handle </pre>   */
+/* is it really an end? */
+int foundEnd(int i) {
+  if (strncmp(&buffer[i],chunkend,chunkendlen) == 0) {
+    return(1); 
+  }
+  return(0);
+}
+
+
+ +

2.12 Final cleanup and housekeeping. The preamble

+ +C compilers aren't smart enough to look up symbols everyone uses so we +have to tell it where to find them. We also need to forward reference +the getchunk function because it is used before it exists. +Compilers in 1970 couldn't handle such complexity. It is terribly +hard, you know. + +
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <fcntl.h>
+
+/* forward reference for the C compiler */
+int getchunk(char *chunkname);
+
+
+ +

3.0 The tangle source code

+

+The big chunk of code at the beginning of this book is a working +tangle program. You can clip it out of a web page, save it as +tangle.c, type +

+   gcc -o tangle tangle.c
+
+and you now have a working tangle program for HTML files. If you want +to improve it just modify this web page and run tangle again to extract +your changes. +

+Alternatively you can get the tangle source code from +here +

+When we want the whole program we output this chunk with the command: +

+
+    tangle litprog.html tangle.c >tangle.c
+
+and it recreates the tangle.c function. + +
+<getchunk id="include">
+<getchunk id="chunkbegin">
+<getchunk id="chunkend">
+<getchunk id="chunkget">
+<getchunk id="buffer">
+<getchunk id="nextline">
+<getchunk id="printline">
+<getchunk id="foundchunk">
+<getchunk id="foundEnd">
+<getchunk id="foundGetchunk">
+<getchunk id="getChunkname">
+<getchunk id="printchunk">
+<getchunk id="getchunk">
+<getchunk id="fixHTMLcode">
+<getchunk id="main">
+
+ +

4.0 What have we learned?

+

+We have walked through a fairly simple program in great detail. You +now understand why HTML causes problems, you understand what the magic +numbers in the program mean, and you understand the design decisions +such as always have pre tags in the first column. +

+We have learned that we only need a simple 160 line tangle program to +handle all of the literate programming tasks. You can make this more +complex if you like. You could create a tangle plugin for Eclipse. You +could add automatic indexing. Many people have done similar things +with their literate tools. +

+Ultimately, though, the important point is that you need to +write programs to communicate with other people. We do not do that +now. The result is that there are tens of thousands of programs on +Sourceforge and Savannah and Github that are never going to be used +because nobody understands them. +

+We can do better. We can make programs that live. We can communicate +our ideas in ways that others can understand without talking to us. +

+We can be better programmers. +

+Tim Daly, November 18, 2011 +

+

+ + diff --git a/src/axiom-website/patches.html b/src/axiom-website/patches.html index fef254b..91bde71 100644 --- a/src/axiom-website/patches.html +++ b/src/axiom-website/patches.html @@ -3686,5 +3686,7 @@ books/bookvol9 treeshake compiler
src/interp/i-spec2.lisp fix AN has sqrt: % -> %
20111117.01.tpd.patch books/bookvol9 treeshake compiler
+20111118.01.tpd.patch +src/axiom-website/litprog.html added