Web Generator

1 Description

This document describes my ideas for an ultimate website generator. I haven’t decided on a name yet.

2 Goals

The whole goal of this system is to make creating correctly-formatted websites and pages as painless as possible, applying the same transformation techniques used by software developers to generate complex pieces of software.

2.1 Automated Features

Computers are all about automation. If you wanted to do things manually, you wouldn’t need a computer.

automatically check for dead links (see linkchecker and other link checking software)
automatically verify correctness of HTML content (see W3C quality assurance tools) or possibly even fix it (see tidy)
potentially create a changelog of some kind - possibly a blog - which indicates what parts of the website(s) have been updated
potentially create index.html files that link to all of the things in the source tree; creating a new project shouldn’t require manually editing HTML indices
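Part of the dead-link check can be done without an external tool. Below is a minimal sketch, assuming the generated site sits in a local staging directory whose root is also the web root. It only checks relative links between generated files and leaves remote URLs to a dedicated checker such as linkchecker. The function name `dead_local_links` is hypothetical.

```python
"""Report relative links in a staging directory that point at missing files."""
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def dead_local_links(staging):
    """Return (page, link) pairs whose local target does not exist."""
    dead = []
    for dirpath, _dirs, files in os.walk(staging):
        for name in files:
            if not name.endswith((".html", ".htm")):
                continue
            page = os.path.join(dirpath, name)
            collector = LinkCollector()
            with open(page, encoding="utf-8") as f:
                collector.feed(f.read())
            for link in collector.links:
                parts = urlsplit(link)
                # Skip remote URLs, mailto: links and pure #fragment links.
                if parts.scheme or parts.netloc or not parts.path:
                    continue
                if parts.path.startswith("/"):
                    # Assumption: the staging directory is the web root.
                    target = os.path.join(staging, unquote(parts.path).lstrip("/"))
                else:
                    target = os.path.join(dirpath, unquote(parts.path))
                if not os.path.exists(target):
                    dead.append((os.path.relpath(page, staging), link))
    return dead
```

Running it after every build and failing on a non-empty result keeps broken internal links from ever reaching the server.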

2.1.1 Program Support

export things (code, possibly HTML) from VC (CVS, svn, hg, git, etc.)
make code tarballs, debian archives, RPMs, etc.
GPG-sign code and possibly even webpages (see gpg-agent)
create convenient links for README and INSTALL files (in txt and HTML formats, maybe others)
extract a DESCRIPTION file and use it as the text for the link to the project on some kind of index page
optionally create mailing lists or VC repositories for the projects
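The DESCRIPTION idea can be sketched in a few lines: scan a directory, link every subdirectory that has a DESCRIPTION file, and use that file's text as the link text. This assumes (hypothetically) that the first line of DESCRIPTION is a one-line summary; `build_index` is an illustrative name, not an existing tool.

```python
"""Build an index page that links every project directory under root."""
import html
import os
from urllib.parse import quote


def build_index(root):
    """Return an HTML list linking each subdirectory that has a DESCRIPTION file."""
    items = []
    for name in sorted(os.listdir(root)):
        desc_path = os.path.join(root, name, "DESCRIPTION")
        if not os.path.isfile(desc_path):
            continue  # not a project; no manual list of projects is needed
        with open(desc_path, encoding="utf-8") as f:
            # Assumption: the first line of DESCRIPTION is a one-line summary.
            summary = f.readline().strip() or name
        items.append('<li><a href="%s/">%s</a></li>'
                     % (html.escape(quote(name)), html.escape(summary)))
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"
```

Adding a project then means creating a directory with a DESCRIPTION file; the index picks it up on the next build.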

2.2 Source Formats

Support static blog generators
Convert from LyX to HTML and PDF (or whatever suits your fancy). This is a little more difficult than it seems, since LyX-to-HTML conversion can produce multiple output files, and tools such as “make” work best when each rule names its output files in advance.
Support other popular markup formats that make life easier than writing raw HTML (e.g. Plain Old Documentation)
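One common workaround for converters that emit an unpredictable set of files is to treat the whole output directory as a single target and record its freshness in a stamp file. The sketch below assumes the stamp lives in a build directory outside the staging area, and that `convert` is a caller-supplied function (for example, one wrapping a LyX export); both names are hypothetical.

```python
"""Rebuild a multi-file output only when its source is newer than a stamp file."""
import os
import shutil
from pathlib import Path


def needs_rebuild(source, stamp):
    """True if the stamp is missing or older than the source."""
    if not os.path.exists(stamp):
        return True
    return os.path.getmtime(source) > os.path.getmtime(stamp)


def rebuild(source, out_dir, stamp, convert):
    """Regenerate out_dir from source via convert(source, out_dir) if it is stale."""
    if not needs_rebuild(source, stamp):
        return False
    shutil.rmtree(out_dir, ignore_errors=True)  # drop files left by the last run
    os.makedirs(out_dir)
    convert(source, out_dir)  # may create any number of files
    Path(stamp).touch()  # the stamp now stands in for the whole output set
    return True
```

Clearing the directory before each conversion also removes pages the converter no longer produces, which a per-file rule would leave behind.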

2.3 Output Formats

It should be capable of generating any kind of file format, really.

It should be capable of creating structured web hierarchies such as Debian/Ubuntu repositories.

It should not put “junk” files (temporary files, etc.) in the staging area; everything it puts there is something that you want to publish. Littering the staging area with things you don’t want published is a privacy hazard, and attempting to filter it (such as rsync’s “-C” cvs-exclude option) simply makes the sync tool more complicated, limits your choice of synchronization tools, and prevents you from generating the files on your web server itself.
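One way to keep scratch files out of the staging area is to build the complete tree in a sibling directory and only rename it into place once it is finished. This is a minimal sketch, assuming the whole site fits in a `{relative_path: bytes}` mapping; `publish_tree` and the `.new`/`.old` suffixes are hypothetical. It is not fully atomic: between the two renames the staging directory briefly does not exist.

```python
"""Populate a staging area so it never holds half-built or temporary files."""
import os
import shutil


def publish_tree(staging, files):
    """Replace the staging directory with exactly the given {relative_path: bytes}."""
    staging = os.path.abspath(staging)
    new_dir = staging + ".new"  # sibling, so it stays out of what gets synced
    old_dir = staging + ".old"
    shutil.rmtree(new_dir, ignore_errors=True)
    os.makedirs(new_dir)
    for rel, data in files.items():
        path = os.path.join(new_dir, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    # Swap the finished tree in; only complete output ever appears in staging.
    shutil.rmtree(old_dir, ignore_errors=True)
    if os.path.exists(staging):
        os.rename(staging, old_dir)
    os.rename(new_dir, staging)
    shutil.rmtree(old_dir, ignore_errors=True)
```

Because the sibling directories sit on the same file system as the staging area, the renames are cheap and no file filtering is needed on the sync side.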

It should allow for arbitrary post-processing of the output files; this is an idiosyncrasy of mine but can be useful if, for example, you want to change certain pieces of data such as embedded email addresses and URLs without going back and altering all the input files.
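The email/URL case might look like the following sketch: a list of regular-expression rules applied to every generated HTML file after the build. The addresses, URLs, and the name `post_process` are placeholders, not part of any existing tool.

```python
"""Apply search-and-replace rules to every generated HTML file after the build."""
import os
import re

# Hypothetical rules: obfuscate one address and move one old URL.
REWRITE_RULES = [
    (re.compile(r"me@example\.com"), "me at example dot com"),
    (re.compile(r"http://old\.example\.org/"), "https://www.example.org/"),
]


def post_process(out_dir, rules=REWRITE_RULES, suffixes=(".html", ".htm")):
    """Rewrite matching files in place; return the list of files that changed."""
    changed = []
    for dirpath, _dirs, files in os.walk(out_dir):
        for name in files:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            new_text = text
            for pattern, replacement in rules:
                new_text = pattern.sub(replacement, new_text)
            if new_text != text:  # only touch files that actually changed
                with open(path, "w", encoding="utf-8") as f:
                    f.write(new_text)
                changed.append(path)
    return changed
```

Leaving unchanged files untouched keeps their modification times stable, so a sync tool has less to transfer.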

2.3.1 Hierarchies

It should be capable of generating web pages, directory hierarchies, web sites, or hierarchies of web sites; in developing something, it is often the case that it starts out as a note or list item, becomes a web page, becomes multiple web pages, becomes a group of related things (often in a directory hierarchy), becomes a homepage, becomes a web site, becomes a hierarchy of virtual domains, etc.

You will note when editing a web page that you will often want to bring a section “up” in the hierarchy, group it with related items (bringing it “down”), or move it around; with HTML, this can be annoying since the heading tags are numbered, and so you must tediously change all the heading tags. Similarly, when editing text documents or source code with nested indentation, you often want to change the level of indentation. A sensible system will make such a common change easy; text editors such as vi and Emacs have commands for changing indentation levels. This web generation system should make restructuring the data in this manner as simple as possible.
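The HTML half of this problem can be reduced to a single operation: shift every heading in a fragment by a fixed number of levels. The sketch below uses a regular expression, which is adequate for generated fragments but is not a general HTML parser; `shift_headings` is a hypothetical name, and levels are clamped to h1..h6, so demoting past h6 collapses distinct levels together.

```python
"""Promote or demote every HTML heading in a fragment by a fixed number of levels."""
import re

# Matches the start of opening and closing heading tags, e.g. <h2 class="x"> or </h2>.
HEADING_TAG = re.compile(r"<(/?)h([1-6])\b", re.IGNORECASE)


def shift_headings(fragment, delta):
    """Add delta to each heading level, clamping the result to h1..h6."""
    def repl(match):
        level = min(6, max(1, int(match.group(2)) + delta))
        return "<%sh%d" % (match.group(1), level)
    return HEADING_TAG.sub(repl, fragment)
```

For example, `shift_headings(section, 1)` moves a section one level “down” when it is grouped under a new parent heading.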

In many cases, Makefiles are a good counter-example; typically, a Makefile will list subdirectories into which it should recurse. If you move a subdirectory to another location, you have to (remember to) edit two Makefiles, or else it doesn’t get built. We should avoid that kind of design.
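The alternative is to discover work by walking the source tree instead of listing subdirectories by hand. This sketch assumes a hypothetical mapping from source suffix to output suffix and mirrors the source layout in the output tree, so moving a directory needs no edits anywhere; `plan_build` and `SUFFIX_RULES` are illustrative names.

```python
"""Find every buildable source under a tree, so moving a directory needs no edits."""
import os

# Hypothetical mapping from source suffix to output suffix.
SUFFIX_RULES = {".pod": ".html", ".lyx": ".html", ".md": ".html"}


def plan_build(src_root, out_root, rules=SUFFIX_RULES):
    """Return sorted (source, target) pairs mirroring src_root under out_root."""
    plan = []
    for dirpath, _dirs, files in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext in rules:
                target = os.path.normpath(
                    os.path.join(out_root, rel_dir, stem + rules[ext]))
                plan.append((os.path.join(dirpath, name), target))
    return sorted(plan)
```

Files with unknown suffixes (images, for instance) are simply ignored here; a fuller version might copy them through unchanged.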

2.4 Transfer Tools

It should emit everything into a staging area where it can be automatically transferred to the server using tools such as rsync or unison.

I have a couple of problems with rsync:

If you have two hierarchies that you want synchronized to different servers, you need to run two separate rsync commands. This doesn’t scale well - as you add different destinations, you have to go back and modify the script which performs the transfers and add additional rsync commands.
If you want to transfer to a web server and have the files owned by user “htdocs”, then you have two choices, neither of which is fully satisfactory:
- make the remote user htdocs, which may not be possible if htdocs doesn’t have a valid login shell, or if its home directory is the webroot, in which case it’s hard to put a .ssh directory there without exposing it
- make the remote user root, in which case rsync attempts to make the remote owner the same as the local user, instead of htdocs
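Both complaints can be eased by driving rsync from a single table rather than a hand-maintained script. In this sketch, each row of a hypothetical `DESTINATIONS` table yields one rsync command, so adding a destination means adding a row. For the ownership problem, it assumes rsync 3.1.0 or later, whose `--chown=USER:GROUP` option maps ownership on the receiving side; that only takes effect when the receiver runs as root. All host names and paths are placeholders.

```python
"""Generate one rsync command per (local tree, destination) pair from a single table."""
import shlex

# Hypothetical site table: adding a destination means adding one row here.
DESTINATIONS = [
    {"source": "staging/www.example.org/",
     "dest": "root@web1.example.org:/var/www/org/",
     "chown": "htdocs:htdocs"},
    {"source": "staging/www.example.net/",
     "dest": "deploy@web2.example.net:/srv/www/"},
]


def rsync_commands(table):
    """Return argv lists; the caller can run each with subprocess.run(argv, check=True)."""
    commands = []
    for entry in table:
        argv = ["rsync", "-a", "--delete"]
        if "chown" in entry:
            # Requires rsync >= 3.1.0 and a root receiver to change ownership.
            argv.append("--chown=" + entry["chown"])
        argv += [entry["source"], entry["dest"]]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in rsync_commands(DESTINATIONS):
        print(shlex.join(argv))
```

Returning argument lists instead of running them keeps the sketch testable and makes a dry run trivial.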

2.5 Free/Open Source

The tool itself should be free/open-source software, but it should be flexible enough to run any command-line tool, free or not, the way “make” can run any Unix program.

It should run on open-source operating systems, especially (but not exclusively) Linux.

2.6 Re-use

This is a huge undertaking, and therefore it should be capable of using existing tools which do part of the job.

It should leverage tools which are already familiar to software developers, such as make, shell scripts, and scripting languages such as python or ruby.

2.7 Flexibility

Since different users will have different desires in terms of input formats and tools, it should be an eminently modular system that can be customized, extended, and used by a wide variety of authors. As such, it should be easy to modify - no more difficult than editing a makefile or script.
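One way to get that modularity is a registry keyed by file suffix, where supporting a new input format means registering one function. The sketch below is a hypothetical design, not an existing API; note that hand-written HTML is just another entry that passes through unchanged.

```python
"""A tiny converter registry: a new input format means registering one function."""
import html
import os

CONVERTERS = {}


def converter(*suffixes):
    """Decorator that registers a function as the converter for the given suffixes."""
    def register(func):
        for suffix in suffixes:
            CONVERTERS[suffix] = func
        return func
    return register


@converter(".txt")
def txt_to_html(text):
    # Plain text is escaped and wrapped in <pre>, so it shows up verbatim.
    return "<pre>%s</pre>\n" % html.escape(text)


@converter(".html", ".htm")
def passthrough(text):
    # Hand-written HTML is published unchanged.
    return text


def convert(filename, text):
    """Convert text according to the (case-insensitive) suffix of filename."""
    suffix = os.path.splitext(filename)[1].lower()
    func = CONVERTERS.get(suffix)
    if func is None:
        raise ValueError("no converter registered for %r" % suffix)
    return func(text)
```

A converter for a new markup format can live in its own file and register itself on import, so the core never needs editing.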

The one constant is change - particularly in software, and particularly in web technologies. Therefore, it should be designed for the future, to adapt to new formats and transformation programs. There is no way to know exactly what the needs of users will be in, say, five years. There is a reason why nobody writes in WordStar format any more, but makefiles and shell scripts are still around.

2.8 Self-Promotion

The marketing term for this might be “branding”, but that word makes me gag. Most people decide what programs to use based on what other people are using, especially people whom they respect and admire, or who are doing similar work. By (optionally) emitting a small, tactful link to this program, people reading web pages/sites generated by this system will become aware of it, saving them from re-inventing the wheel, or using inferior systems for authoring and generation. You’ll note that the tool I use for creating this page does a similar thing in the footer, and I think it’s great.

3 Anti-Goals

It should not be oriented towards GUIs; it is a website generator for software developers, not graphic designers.

It should work as automatically as possible, perhaps from a cron job. Some steps (like PGP/GPG signing and rsync) may require minimal interaction for inputting passphrases; beyond that, it should be a “fire and forget” system.

It should be designed to emit static content: HTML and other files designed to be stored in a file system, not a database. Writing secure server-side software is hard (really really hard), and I am a security weenie.

It should not be a huge, monolithic, hard-to-modify, “closed” system.

It should not involve any kind of domain-specific language or syntax unless the gains in efficiency outweigh the learning curve required.

It should not require writing excessive amounts of code; for example, it should not require mostly-similar Makefiles for every document you want to create. Efficiency is a primary goal, and doing a lot of repetitive work is stupid.

This software shouldn’t assume that you do everything its way. If you want to write raw HTML for part of your site, and the rest of it in something else, fine. People tend to start writing in one source format or with one system and migrate to others over time. This system shouldn’t make you convert all your previous stuff to its way of doing things, and it shouldn’t prevent you from easily migrating to another system in the future. In other words, it shouldn’t be a dick.