ExcelPOD

This is the system I talked about at LPW.

Slides (Open Office Impress - ODP) 73K

External: video of the talk (if & when available)

Every download below comes with a manual.

Source only (14K)
Compiled only (2,191K)
Source + compiled (2,194K)
Source + example (44K)
Compiled + example (2,221K)
Everything (2,223K)



The compiled version comes in a tree. The lowest level is called "build", which has two children, "bin" and "lib". ExcelPOD.exe lives in "bin". The directory names are created by Cava, and I have not tried changing them.

And here's the manual!



Introduction

Purpose

ExcelPOD is a Perl programme converted to .exe using Cava. It reads through Excel files and generates POD and HTML files from comments in the code.

System Requirements

Excel
A standards compliant browser to read the resulting documentation

Copyright

All aspects of ExcelPOD, including but not restricted to layout, formats, formulae, code and this manual, are my copyright. It may not be distributed without this manual. The icon I have used is described as "Author unknown - free for non-commercial use" and was downloaded from http://icone.goldenweb.it/index_file/l/en/d2/miscellanea/c/misc1/default.html. Using ExcelPOD is free, and any documentation you produce using it is of course your copyright. Apart from this, I don't care what people do with it for fun or internal use, but if anyone is making money from my code or any development they may have made from it, I want my cut. This includes, but is not restricted to, putting it on a disc for which "only a distribution charge" (or similar weasel words) is charged, or which is free if you buy something else, like a magazine. If you put it on a web site for free download (provided always the manual is is with it), that's fine, but if you charge for access, I want my cut, even if you don't charge specifically for ExcelPOD. If in doubt, contact me first. Don't assume that I will accept your payment scales. Don't assume that you can email me - I have aggressive spam blocking in place. If contacting me isn't worth a stamp to you, it's not worth bandwidth and disc space to me. If you don't like any of this, don't use it. I can be contacted at:

 Dr. John Davies
 51 Elephant Lane
 London SE16 4JD

Version History

Changes from version 1.0.2

Fixed a known issue (see the slides) with functions like LOG10 that look like cell addresses.

Changes from version 1.0.1

Fixed a bug in handling local range names used in non-local context - used by SAP.
Fixed a bug in handling continuation lines in multiple files.

Changes from version 1.0.0

Fixed a bug that could cause an infinite loop with certain named ranges.


Usage

Prepare an Excel file with the necessary documentation

ExcelPOD documentation is built into the spreadsheet being documented. It is written as comments in the code modules of the spreadsheet. Every spreadsheet, whether including macros or not, has at least two code modules. Users familiar with writing macros should skip the next two paragraphs.

To open Excel's Interactive Development Environment (IDE), open Excel and press {Alt F11}. Don't be intimidated by the screen that appears. The first thing to do is to look for a section of the screen officially called the "Project Explorer". On my system, it looks like this:

If you can't see it, bring it up with {Ctrl R}. Then click on whichever object (sheet, module etc) you wish to document (expanding the tree by clicking on + signs if necessary), press {F7} and the contents of the relevant "code module" will appear. If this is a module that has no code, it may appear blank. Alternatively, it may appear with "Option Explicit" as the top line. In this case, thank the person who set your machine up for saving you from evil. Don't worry if there is code in the module. It is possible to add documentation with no risk - but it might be better to get the writer to do the documentation of the code himself.

Officially, there are two designators of comments in VBA. They are Rem and ' (a single quote). However, the use of Rem is extremely rare. It is this that ExcelPOD subverts to allow the entry of POD. Any line of any module that begins with Rem is treated by ExcelPOD as containing POD. To force Rem to retain its "comment" meaning, see Pod and Cut.

Official POD Directives

Note: this is only the briefest of introductions to POD. The official documentation (co-written by Tim Toady himself) is at http://search.cpan.org/~jesse/perl/pod/perlpod.pod. POD is intended to be convertible to a number of formats. However, the most useful for documenting accounting spreadsheets (and I am an accountant, writing this for my own purposes) is HTML. Therefore, HTML output has been assumed throughout. Intermediate files are created containing pure POD, and these are left on the machine in case the user wants to process them differently. All official POD directives begin with an = sign, are case sensitive (the extensions are not) and should be in a paragraph by themselves (the extensions need not, but must start a line). Paragraphs have blank POD lines above and below, i.e. lines with "Rem" and nothing else in them.

Headers

Four levels of headers are allowed, =head1, =head2, =head3 and =head4. They are not allowed in HTML blocks or over ... back regions (see Indents and Lists). The text that follows them is put into the index that appears at the start of the HTML document as a hyperlink. It is therefore important that they should be unambiguous. If, for example, there are two forms, each is likely to have a UserForm_Initialize event handler. Each header should therefore have the sheet name in front, so that the hyperlinks go to the correct explanation. This should be done using the ExcelPOD extension SheetName described below. But please read Continuation lines, as you may need to use them for this purpose.

Indents and Lists

=over indents the following paragraph. It may be followed by a number indicating how many spaces are to be used. The default is four. =back undoes this. These commands can be nested.

After at least one =over, a list may be created. There are three formats, and the first used will apply to the whole list until a =back is encountered. The three formats are:

  1. =item 1 text - your number followed by your text
=item text - the text preceded by neither number nor bullet

Formats

POD contains a number of formats. The most useful are:

 Beginning a paragraph with a space or tab causes it to be rendered exactly as it is. This
 is called a verbatim paragraph. No directives will be processed. Lines will break
 wherever they are broken in the
 original text. No matter how long a line is, it will not be wrapped automatically. This means users may have to use the slider at the bottom of the browser to read it all. As a result, your document may be hard to read if you are not careful.
 Such lines won't print properly, either. Traditionally, all lines in such a paragraph
 begin with a space. If this is done to extensions, they too will be ignored. By omitting
 the space, an extension can be embedded within a verbatim paragraph. Since extensions
 must start a line, this may be inconvenient for the formatting of the paragraph.

=begin html ... =end html creates an HTML block in which text is treated as HTML and hyperlinks, embedded graphics and other tools are permitted, but headings and lists are prohibited. Extensions are permitted.

C<text> is officially described as "code". It is a verbatim block (like this) within a standard paragraph.

Z<> is a null formatting string. This is useful if you want to use something that would otherwise be treated as a format or other directive. It confuses the parser into treating what would otherwise be a directive as normal text. It has been used extensively in the spreadsheet used to generate this documentation. Every time a line starts with any of the POD directives or extensions, this string has been needed to make sure that you can read it rather than having the directive or extension executed.

Many other formats are available. To find out about them, RTFM at http://search.cpan.org/~jesse/perl/pod/perlpod.pod

Pod and Cut

Those familiar with POD and those who have read the official documentation carefully may be tempted to insert =pod and =cut commands. While these will do no harm, they are not necessary. The intermediate POD files created by the system never contain anything that is not POD and an initial =pod directive is inserted automatically. The only possible purpose is to delimit Rem statements that are actually intended to be comments rather than POD.

ExcelPOD Extensions

These extensions are what make POD useful for spreadsheets. Without them, it would be easier to write the HTML directly, or simply use MessWord. They must, without fail, come at the start of a line, immediately after the "Rem ". They are not preceded by = signs. They are not case sensitive. They must be separated by one or more spaces from their parameters or any text that follows.

Title

Syntax: Title <text>

This is passed to the POD processor as the title of the HTML document. If it is omitted, the file name is used. There should be as many Title commands as there are documents to be produced. Additional ones will be ignored. All data on the same line will be treated as part of the title.

Order

Syntax: Order <number>

There should be no more than one of these commands per code module. Any additional ones will be ignored. They specify the order in which code modules will appear in the HTML file. If omitted, the modules will be processed in Excel's internal order, which may well differ from the order in which they appear in the project explorer. This command is generous if you are not silly. It will accept negative and fractional numbers but not text, and very complicated numbers may confuse it. No other data should appear on this line.

NameTable

Syntax: NameTable <on|off>

This will be ignored if it restates the current condition. It controls how name commands are processed. When an "on" command is processed, all named ranges are flagged as undocumented. As each is documented, this is flagged. When an "Off" command is processed, all named ranges still undocumented are added to the table with the description <Undocumented>.

Name

Syntax: Name <named range> <description>

The description is optional. If outside a name table, it will return the address of the named range (strictly, its "RefersTo" property) followed by the description, if any. Within a name table, it will precede this with the name itself. If an undefined name is processed, <Undefined> will be substituted for the address.

Docs

Syntax: Docs <number>

All other data on the line will be ignored. This command will be ignored if it is not processed before any commands outputting data to a POD file are processed. Only the first such command will be processed. If not present, only one document will be produced. The command tells the system how many different versions there will be of the documentation produced.

Doc

Syntax: Doc n1 [n2 [n3 ...]]

This states which documents should receive the following documentation. Document numbers greater than that specified in Docs will be ignored. The command will be ignored if no valid Docs command has been processed. It will remain in force until the next Doc statement.

Doc and Docs in Action

These commands are typically used when, for example, technical and user documentation is needed. Much of this documentation will be the same, but there will be differences. The first sheet to be processed would, in this example, begin with the following statements:

 Docs 2
 Doc 1 2
 Title User Documentation for <system>
 Title Technical Documentation for <system>

The introduction, which would normally be the same, would follow. Then, when the documentation became specific to the users, the command "Doc 1" would restrict the following documentation to the user file. Later, "Doc 2" would switch it over. There are no restrictions on how many Doc statements may appear in a file.

It is possible to write something like "Doc 1 1". This would have the effect of putting two copies of the following text into document 1. Note that having two "Doc 1" commands on separate lines would have no more effect than a single command.

SheetName

Syntax: SheetName [Description]

The description is optional. The command returns the name of the current sheet, form or module followed by the description, if any.

Sto

Syntax: Sto <0|Reference>

Sto 0 ends the process of storing. "Sto" followed by anything else records the text up to the next "Sto" in the reference, appending it if the reference already exists. Text being stored will not be sent to any documents automatically; that must be done explicitly by means of a Rcl command. Everything within a sto reference will appear as a single paragraph - blank lines cannot be stored except as single spaces. Directives can be stored, and will retain their usual meanings. Extensions vary, as follows:

Title: Cannot be stored (will be treated as text).
Order: Cannot be stored (will be treated as text).
NameTable: Cannot be stored (will be treated as text).
Name: Can be stored.
Docs: Cannot be stored (will be treated as text).
Doc: Cannot be stored (will be treated as text).
Sheetname: Can be stored, but will return the name of the sheet in which the "Sto" command was issued.
Sto: Ends storage. May start a new storage process if the parameter is not 0.
Rcl: Cannot be stored (will be treated as text).
Deconstruct: Can be stored.

Rcl

Syntax: Rcl <Reference>

Sends to the current documents whatever text has been stored as <Reference>. This must have been stored already. If multiple objects recall the same reference and the order is changed, the user must ensure that the lowest numbered object to recall a reference has the storage to that reference before the recall. It is the processing order that matters, not the apparent order in the project explorer.

Sto and Rcl in Action

 Sto Last
 This appears last
 Sto 0
 This appears first
 Rcl Last

would produce

 This appears first
 This appears last

Another example:

 Sto Last
 This appears last
 Sto 0
 This appears first
 Rcl Middle
 Rcl Last
 Sto Middle
 This should be in the middle, but won't work
 Sto 0

will cause problems, because Rcl Middle appears before anything has been stored in that reference.

But this will work:

 Sto Last
 This appears last
 Sto Middle
 This appears in the middle
 Sto 0
 This appears first
 Rcl Middle
 Rcl Last

Sto will end the current storage operation before starting a new one if it is called when already storing.

Deconstruct

Syntax Deconstruct <Cell> <Range No> [Description]

The description is optional. The cell may be specified either as a reference in A1 format (R1C1 is not permitted for this purpose) or as a range name. Named ranges are advised, as they will move if rows and/or columns are inserted or deleted. The address of the named range can still be included in the documentation by means of the Name command.

A range number of zero will return the entire formula from the specified cell. A number greater than the number of ranges in a formula will return a description of the problem saying how many ranges actually exist. Any number from 1 to the maximum will return that range.

A range will be included if it is a named range in the current file or if it is an Excel reference in A1 format (including multiple cells like A1:B2). If a range appears several times in a formula, it will be returned by all its numbers. A formula like SUM(A1:B2,C3) will return two ranges.

There is no feature for returning references to external files, as these can be changed when the file being documented is closed, meaning that any links will not be maintained. References to ranges in external files will be included in the deconstructed ranges if they are in A1 format or if they have the same name as a range in the deconstructed file (but any attempt to get the address would be likely to return a misleading address). Otherwise, named ranges in external files cannot be identified, and should be documented manually.

A cell can be deconstructed only in the code module of the sheet where it is found. There are two main reasons for this. The first is that Excel's handling of names is very strange, with names being an application object but ranges (which contain the formula) being a sheet object. The code for linking the two is more complicated than I want to consider at this point. The second is that different cells can have the same name, provided that they are on different sheets. Therefore, unless the sheet is specified by some means other than the cell name, it is impossible to know which version of the name (or even the direct reference - consider "A1") is meant. If you want to put deconstructions in another module, you can use the sto and rcl commands. Attempting to deconstruct a cell from a different code module will produce the error "Can't call method "Range" on an undefined value at <path>\ExcelPOD.pl line 196". and an orphaned instance of Excel. This can also be caused by having the word "deconstruct" at the start of a row where it is intended to be used verbatim, not as an extension. To get around this, put Z<> at the start of the line. To find out where the problem occurred, look at the .pod file that is generated. Your problem is immediately after the last text in that file.

Continuation lines

When using =head commands, POD will work normally if the header text is all on the same line or all on the line immediately following. However, it has a problem if the heading text is split over two lines. If a tilde (~) is added to the end of a line, the following line will be appended to it to make a single line. This does not apply when NameTable is on or within HTML blocks.

Process the file

From the command line

If not included in your computer's path, change directory to the one containing ExcelPOD.exe. Type in excelpod followed by the fully qualified file name. "Fully qualified" means including the drive and directory where the file is to be found. If Excel can work out the extension for itself, it will, but it is safer to enter it explicitly. If your computer is set up with the Bill Gates default that hides extensions, that is no excuse. Ignorance isn't strength, Winston, it's just ignorance.

An instance of Excel will be opened briefly. If it stays open for more than a few seconds, you have found a bug and I would like to know more. There will usually be a description in the command shell. It is the top of this that is more likely to be helpful than the bottom, but it's quite unlikely to happen. Assuming that all goes well, the POD file(s) will be created, the Excel instance will be closed and the HTML will be generated using the "Verbose" option. This should ensure that you are shown any problems that the system has had interpreting your POD. If you are producing multiple files from a single Excel file, you will get generation messages for each file produced.

The process leaves behind some files in the same directory as ExcelPOD.exe. This is the intended behaviour, although from what I read, there are plans to change it so that the temporary files are deleted after the HTML files are generated. Files containing pure POD are also left behind in the same directory as the spreadsheet and the resulting HTML files. This is also intended behaviour, but this time it is my intention. If it causes more work than it saves, I may well add code to delete them in a future version. However, I find them useful for finding mistakes that have led to error messages.

On my 1.6GHz Celeron, generating this documentation takes under 3 seconds.

From an icon or short cut

You are advised to use the command line, as there is no danger of the command prompt disappearing automatically without you seeing errors. However, it is acknowledged that many people prefer clicking on icons, and this is possible with ExcelPOD. You will be prompted for a file name, which must, as before, be fully qualified. The danger is that you mis-type it and do not see the error message. Otherwise, the process will be identical to that described in the previous section.

Review it to make sure it looks sensible

Any standards compliant browser should render the documentation. Try playing around with the width of the browser window to make sure that it looks sensible at other screen resolutions.


Tips

VBA is fine for entering code, but it was never intended to be a word processor. Use a real WP and then copy the text into VBA. The WP can be used to do things like check spelling.

If you are using a WP, you may want to keep two versions, one with the Rems and one without. You should be able to write a macro to insert the Rems. Test the copy & paste operation first to be sure that the Rems appear at the start of the lines.

If you want more than one set of documentation, rename the files immediately after they are created.

This documentation has been prepared within an Excel spreadsheet and converted to HTML using ExcelPOD. The Excel file should be with the documentation. Have a look at it to see how I have done things. Everything is in the modPOD module. But don't be limited by what I have done - I really don't know POD all that well.

Although the conversion to HTML will mean that line wrapping within paragraphs is done automatically, you should try to keep each POD line to 96 characters. On my systems, this will not wrap when printed, making any printout of code containing POD easier to read. I apply this to my code, too - if the VBA needs more than 96 characters, I split the lines manually to retain control of indenting.

When deconstructing formulae, give the cell a range name. This means that if rows and/or columns are inserted or deleted, your deconstruction will always point to the correct cell. But the documentation should not give the name, as casual users will find that difficult. The documentation should give the A1 format cell address. This can be extracted from the cell name by using name followed by the name of the range.

While on the subject of named ranges, they generally have two purposes. One is to ease navigation, the other being for macros. Given this, I always start non-navigation range names with z, so that they appear at the end of the list in the name box and don't overburden or confuse casual users. They can see the name they want to navigate to at the top. By the same token, the names I create for deconstructing formulae using ExcelPOD always begin with zPOD. It means that they are all together in an alphabetically sorted name table, so that users can skim over them easily.

Always include a name table in your documentation and review it before you issue the documentation or the spreadsheet. It is very easy to forget to document a named range or to abandon one, yet leave it in the definitions. The name table will give you a single, readable source that will identify such acts and omissions.

Get a friend or colleague to proofread your documentation. It's best if it is someone who uses or will use the spreadsheet you are documenting, as then they will be able to tell you when your documentation is unclear, besides identifying nonsense that has arisen if you have used an extension keyword at the start of a line but want its natural meaning, as well as similar problems.

Test all your hyperlinks - internal and external - on a machine other than the one on which you wrote and processed the documentation.

There is a Perl option that will make the file contain "Back to Top" references automatically. However, these will appear only at =head1 boundaries. If I want to have these, I prefer:

 sto linkback
 L<Back to Index|/__index__> L<Back to Introduction|/Introduction>
 sto 0
 rcl linkback

This allows me several things. First, I can choose what to link back to. "__index__" is automatically generated, but I prefer to give the user a choice. Second, I can put rcl linkback wherever I like, not just where someone else thinks I should like. Third, I can have a series of them, linking back to =head1 commands should I choose. YMMV.