Meta/Elixir

This article or section has been flagged for archiving.

Archive candidates contain old and irrelevant information that is of no current use.
Archive material will be preserved for reference purposes.

Reason: Abandoned project?

Note: This is an internal project description. Development of this project is nearly complete; these docs will be updated when Elixir enters Beta.

Introduction

Wikis are a great medium for self-organizing, community-driven documentation. Change and maintenance in a well-designed wiki is easy. For most articles, tutorials, howtos, and even reference documents, wikis are great.

However, there are some types of documentation crucial to TDN's success that wikis are not very good at. Specifically, we need to have automatically updated reference documentation for our source code. Basically, this is an evolution of the automated documentation produced by Doxygen.

First, let's describe the current system used for documentation. Every so often (usually for a new release of the engine), we run Doxygen to produce HTML documentation with images. Someone semi-manually applies some PHP "secret sauce" for authentication, so that only licensees can read the docs. Then the new build is uploaded. This process usually ends up taking several days to complete, as several people each need to interrupt their primary tasks to do a step.

This has several problems. First, it's very hard to make iterative improvements to the docs, since it's a pain to put new versions up. Second, we have no provision for reviewing old versions of the docs, or multiple branches of a single repository, or different repositories. The ONLY docs we provide in this system are for some semi-recent version of TGE - no TSE, no T2D, nor anything for other codebases.

Optimally, we would like to provide archival versions of the documentation, documentation for every branch, and be able to have it rebuilt on, say, a nightly basis, so that the docs are always up to date with the content in the repositories. Furthermore, we need a way to annotate the docs - to allow people to make connections between the docs and relevant information in the wiki proper.

Why Is A Wiki Not A Good Choice?

In particular, there are a few issues that prevent us from using a traditional wiki to solve this problem:

  • Wikis are malleable, but we're generating and, more importantly, regenerating content. This conflicts with the basic nature of a wiki - it means that user edits will get clobbered. Also, we have a hard time distinguishing content that we've created from user-made content. What if we delete a class from the codebase - do we delete the corresponding topic from the wiki? How can we safely insert our documentation into existing user topics? What if a user introduces a conflict - a change that directly affects something from Doxygen?
  • Limited namespace. Because everything in a wiki is pretty much in the same namespace, we have a lot of opportunity for rich interconnections. But how can we distinguish the TGE 1.2 SimObject documentation from the TGE 1.3 or 1.4 or the TSE 1.0 SimObject? What about classes that do wildly different things in different codebases? We would probably have to use some clever naming scheme.
  • Difficulty of searching/browsing. Since the wiki is unaware of the interrelationships, it's a bit tricky to find, say, the EA2 version of SimObject docs in the T2D codebase. Or really, to search for a specific bit of Doxygen documentation at all. We'd like to make it so people can write applications that can search and link into TDN for their help reference, for instance. That means we need a pretty structured environment. Similarly, we can't easily have links that say, "show all instances of this symbol in the docs" or other such things.

Also, the VAST majority of the pages that Doxygen produces for an automated doc run are not richly populated by human-generated content. It produces about 3,000 pages from a run on the TGE codebase. Only a few hundred have significant human content in them. If we are going to allow browsing of multiple codebases, we can easily have ten or twenty thousand pages. This is a huge amount of content to add to the wiki. We intend to have a high SNR; having "real" content outnumbered by autogenerated content by a factor of a thousand is not a good way to further that goal.

ELIXIR: Joy of Man's Desiring

The solution to this problem is a dedicated source browsing engine, which would perform these tasks:

  • Offline Generation. On a regular basis, accept the output from an XML Doxygen run on a codebase and convert it to a rich internal representation.
  • Storage. Store multiple instances of this data in a database for later retrieval. Specifically, the system should be able to handle multiple projects (TGE, TSE, T2D, RTS SK, etc.), multiple versions of each project, and it should also be able to handle separate script and C++ documentation for each project.
  • Browsing. Provide a web interface for browsing this information. It should be loosely similar to the pages Doxygen generates.
  • Authentication. Limit access based on the products that people own. If they try to access a product's docs and they're barred, they should be presented with a link to the purchase page for that product.
  • Search. Users should be able to perform a variety of searches for classes, members, functions in one or more versions of one or more projects.
  • Annotation. A small wiki area on each page that is persistent across content drops allows the community to annotate with links to wiki-hosted documentation, suggest fixes or enhancements to the documentation built into the codebase, and so forth.

The following sections outline in more detail how each of these broad areas should function.

Offline Generation

This is a process which will ultimately be run nightly on our internal build system. Some sort of "cooked" results will be uploaded to the server on which ELIXIR runs, and then an update command issued to ELIXIR, indicating it needs to reimport the relevant data. During this reimport phase, the site should provide a notice to the effect that a lot of data is being loaded into it and it may run slowly for a while.

Basically, the XML content from the Doxygen run needs to be processed into a lot of rows to be placed in the database.
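
As a rough illustration of that processing step, the sketch below flattens a small document shaped like Doxygen's XML output (a compounddef element containing memberdef elements) into dictionaries that could become database rows. The sample XML is hand-written and its descriptions are placeholders; the row layout and the names xml_to_rows and text_of are assumptions made for this example, not part of ELIXIR. It is written in Python purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Doxygen XML output; the ids and
# descriptions are placeholders, not real engine documentation.
SAMPLE_XML = """\
<doxygen>
  <compounddef id="classSimObject" kind="class">
    <compoundname>SimObject</compoundname>
    <briefdescription><para>Placeholder class brief.</para></briefdescription>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="classSimObject_1a01">
        <type>U32</type>
        <name>getId</name>
        <argsstring>() const</argsstring>
        <briefdescription><para>Placeholder member brief.</para></briefdescription>
      </memberdef>
      <memberdef kind="function" id="classSimObject_1a02">
        <type>void</type>
        <name>deleteObject</name>
        <argsstring>()</argsstring>
        <briefdescription></briefdescription>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>
"""


def text_of(element):
    """Flatten an element's text, collapsing whitespace; '' if it is missing."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def xml_to_rows(xml_text, project, version):
    """Convert one Doxygen-style XML document into (compound_rows, member_rows).

    The row layout is hypothetical: one dict per class/file/etc., and one
    dict per member, each tagged with the project and version it came from.
    """
    root = ET.fromstring(xml_text)
    compounds, members = [], []
    for cdef in root.iter("compounddef"):
        compounds.append({
            "project": project,
            "version": version,
            "doxygen_id": cdef.get("id"),
            "kind": cdef.get("kind"),
            "name": text_of(cdef.find("compoundname")),
            "brief": text_of(cdef.find("briefdescription")),
        })
        for mdef in cdef.iter("memberdef"):
            name = text_of(mdef.find("name"))
            members.append({
                "compound_id": cdef.get("id"),
                "doxygen_id": mdef.get("id"),
                "kind": mdef.get("kind"),
                "name": name,
                "signature": "{} {}{}".format(
                    text_of(mdef.find("type")),
                    name,
                    text_of(mdef.find("argsstring")),
                ),
                "brief": text_of(mdef.find("briefdescription")),
            })
    return compounds, members


if __name__ == "__main__":
    compounds, members = xml_to_rows(SAMPLE_XML, "TGE", "1.4")
    for row in compounds + members:
        print(row)
```

Tagging every row with its project and version at import time is what would later let a single set of tables hold several codebases side by side.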

One issue to deal with is what happens to annotations when the corresponding element in the database goes away (for instance, we remove a class).
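
One possible policy, which this document does not settle, is to flag such annotations as orphaned during the reimport rather than delete them, so that a moderator can decide what to do. The sketch below assumes annotations are keyed by a (project, symbol name) pair; the function name and data shapes are hypothetical.

```python
def find_orphaned_annotations(annotations, imported_symbols):
    """Return ids of annotations whose target is absent from the new import.

    annotations: dict mapping annotation id -> (project, symbol name)
    imported_symbols: set of (project, symbol name) present after the reimport
    """
    return sorted(
        annotation_id
        for annotation_id, target in annotations.items()
        if target not in imported_symbols
    )


if __name__ == "__main__":
    annotations = {
        1: ("TGE", "SimObject"),
        2: ("TGE", "RemovedClass"),  # hypothetical class dropped from the code
    }
    imported = {("TGE", "SimObject"), ("TGE", "SimSet")}
    print(find_orphaned_annotations(annotations, imported))
```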

Storage

Storage should be in MySQL. The offline generation should take care of most of the data - the only thing that ELIXIR should be doing is updating wiki tables and maybe tracking statistics.

Caching should be done against the file system for maximum speed, and in general should be done for all Doxygen derived content so that pages may be served with the minimal load.
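
A minimal sketch of such a file-system cache follows, assuming rendered pages are keyed by project, version, and symbol name, and that a reimport invalidates one project/version at a time. The PageCache class, its directory layout, and the render callback are all hypothetical choices for this example.

```python
import hashlib
import shutil
from pathlib import Path


class PageCache:
    """Cache rendered pages as files under root/<project>/<version>/."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, project, version, symbol):
        # Hash the symbol so names like "operator/" are safe as file names.
        digest = hashlib.sha1(symbol.encode("utf-8")).hexdigest()
        return self.root / project / version / (digest + ".html")

    def get(self, project, version, symbol, render):
        """Return the cached page, calling render(...) only on a miss."""
        path = self._path(project, version, symbol)
        if path.exists():
            return path.read_text(encoding="utf-8")
        html = render(project, version, symbol)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(html, encoding="utf-8")
        return html

    def invalidate(self, project, version):
        """Drop every cached page for one project/version, e.g. after a reimport."""
        target = self.root / project / version
        if target.exists():
            shutil.rmtree(target)


if __name__ == "__main__":
    import tempfile

    def render(project, version, symbol):
        return "<h1>{} ({} {})</h1>".format(symbol, project, version)

    with tempfile.TemporaryDirectory() as tmp:
        cache = PageCache(tmp)
        print(cache.get("TGE", "1.4", "SimObject", render))  # rendered and stored
        print(cache.get("TGE", "1.4", "SimObject", render))  # served from disk
```

Because only the annotation area changes between imports, the Doxygen-derived part of a page can stay cached until the next content drop for that project and version.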

The database schema development represents a significant amount of the work for this project. It should reflect closely the information that Doxygen gives in its XML output. Remember, also, that the schema must be able to keep multiple projects/versions in a single set of tables, and that the system should be aware of how versions/projects are related.
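
To make the multi-project, multi-version requirement concrete, here is one possible shape for the core tables. This is a sketch only: it uses Python's built-in SQLite module as a stand-in for MySQL, and every table and column name (including the parent_id link used to record how versions relate) is an assumption for illustration, not the actual ELIXIR schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE version (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    label      TEXT NOT NULL,
    parent_id  INTEGER REFERENCES version(id),  -- version this one derives from
    UNIQUE (project_id, label)
);
CREATE TABLE compound (
    id         INTEGER PRIMARY KEY,
    version_id INTEGER NOT NULL REFERENCES version(id),
    language   TEXT NOT NULL,                   -- 'cpp' or 'script'
    kind       TEXT NOT NULL,                   -- class, file, namespace, ...
    name       TEXT NOT NULL,
    brief      TEXT NOT NULL DEFAULT ''
);
CREATE TABLE member (
    id          INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    kind        TEXT NOT NULL,
    name        TEXT NOT NULL,
    signature   TEXT NOT NULL DEFAULT ''
);
"""


def open_store():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_compound(conn, project, label, language, kind, name):
    """Insert a compound, creating its project and version rows if needed."""
    conn.execute("INSERT OR IGNORE INTO project (name) VALUES (?)", (project,))
    project_id = conn.execute(
        "SELECT id FROM project WHERE name = ?", (project,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO version (project_id, label) VALUES (?, ?)",
        (project_id, label),
    )
    version_id = conn.execute(
        "SELECT id FROM version WHERE project_id = ? AND label = ?",
        (project_id, label),
    ).fetchone()[0]
    cursor = conn.execute(
        "INSERT INTO compound (version_id, language, kind, name) VALUES (?, ?, ?, ?)",
        (version_id, language, kind, name),
    )
    return cursor.lastrowid


def versions_documenting(conn, symbol):
    """List (project, version label) pairs that contain a compound named symbol."""
    rows = conn.execute(
        """
        SELECT p.name, v.label
        FROM compound c
        JOIN version v ON v.id = c.version_id
        JOIN project p ON p.id = v.project_id
        WHERE c.name = ?
        ORDER BY p.name, v.label
        """,
        (symbol,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_store()
    for project, label in [("TGE", "1.3"), ("TGE", "1.4"), ("TSE", "1.0")]:
        add_compound(conn, project, label, "cpp", "class", "SimObject")
    print(versions_documenting(conn, "SimObject"))
```

Keeping the project and version in their own tables, rather than encoding them in symbol names, is what lets the same SimObject name appear once per codebase and release without collisions.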

Browsing

Users should be able to browse:

  • By file. (Show me everything in foo.h)
  • By class hierarchy. (For example, the TGE class hierarchy.)
  • By version. (What did this look like two revisions ago?)
  • By project. (What does this look like in TSE, TGE, T2D?)

The system should be able to display all the information found in a typical Doxygen-generated page, like the one for SimObject. To reduce database load, we will probably want a class overview view and a separate member view, rather than one big page.

In practice, most of the heavy duty documentation from SimObject will get moved into the wiki once we deploy ELIXIR, but it's a good test case.

ELIXIR doesn't need to support the "page" functionality in Doxygen, as any content using that feature will get moved into the wiki.

As much as possible should be linked (i.e., to related symbols). Doxygen already does most of this for you.

Authentication

Authentication will be done against the GG login system. This provides product ownership information, so that viewing content can be restricted by username. Some documentation, like the script docs for TGE, will be public, and not require ownership to browse. In general, if you don't own anything you can't edit. There should be a mechanism to give specific accounts permissions above and beyond what their product ownership implies, so we can, for instance, "bless" non-licensees and allow them to edit script docs.
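
The rules in this section can be sketched as two small checks. The example assumes the login system supplies a set of owned products per account, and it reads "if you don't own anything you can't edit" as requiring ownership of the product in question; both are interpretations, and the Account and DocSet types are hypothetical, not the GG login system's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DocSet:
    """One body of documentation, e.g. the script docs for TGE."""
    product: str
    name: str
    public: bool = False


@dataclass
class Account:
    username: str
    owned_products: set = field(default_factory=set)
    # Extra (product, doc set name) pairs granted beyond ownership ("blessing").
    extra_edit_grants: set = field(default_factory=set)


def can_view(account, docs):
    """Public docs are open to everyone; the rest require product ownership."""
    return docs.public or docs.product in account.owned_products


def can_edit(account, docs):
    """Editing requires ownership or an explicit per-account grant."""
    if (docs.product, docs.name) in account.extra_edit_grants:
        return True
    return docs.product in account.owned_products


if __name__ == "__main__":
    script_docs = DocSet("TGE", "script", public=True)
    cpp_docs = DocSet("TGE", "cpp")
    visitor = Account("visitor")
    blessed = Account("helper", extra_edit_grants={("TGE", "script")})
    print(can_view(visitor, script_docs), can_edit(visitor, script_docs))
    print(can_view(visitor, cpp_docs))
    print(can_edit(blessed, script_docs), can_edit(blessed, cpp_docs))
```

A failed can_view check is the point at which the site would show the link to the product's purchase page instead of the documentation.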

Search

There should be a rich search interface. It should be possible to filter searches by project, version, file, namespace, etc. There should also be a "find best match" feature for tools, so that you can provide a symbol or file and immediately get redirected to appropriate docs.
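
As an illustration of these two features, the sketch below runs a filtered search and a "find best match" lookup over an in-memory list of symbols. The Symbol fields mirror the filters named above. The "best" rule used here (exact name match, then the highest version number) and the assumption that version labels are dotted numbers such as "1.4" are choices made for this example only, and the file name sim.h is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    project: str
    version: str
    file: str
    namespace: str
    name: str


def search(index, text, **filters):
    """Case-insensitive substring search on the name, narrowed by exact-match
    filters such as project="TGE" or version="1.4"."""
    needle = text.lower()
    return [
        symbol
        for symbol in index
        if needle in symbol.name.lower()
        and all(getattr(symbol, key) == value for key, value in filters.items())
    ]


def version_key(label):
    """Turn '1.10' into (1, 10) so versions compare numerically.
    Assumes purely numeric dotted labels."""
    return tuple(int(part) for part in label.split("."))


def find_best_match(index, name, project=None):
    """Return the newest exact match for name, or None if there is none."""
    candidates = [
        symbol
        for symbol in index
        if symbol.name == name and (project is None or symbol.project == project)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda symbol: version_key(symbol.version))


if __name__ == "__main__":
    index = [
        Symbol("TGE", "1.3", "sim.h", "", "SimObject"),
        Symbol("TGE", "1.4", "sim.h", "", "SimObject"),
        Symbol("TGE", "1.4", "sim.h", "", "SimSet"),
        Symbol("TSE", "1.0", "sim.h", "", "SimObject"),
    ]
    print(search(index, "sim", project="TGE", version="1.4"))
    print(find_best_match(index, "SimObject", project="TGE"))
```

An external tool would call something like find_best_match with the symbol under the cursor and redirect the user to the resulting page.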

Annotation

Annotation is how ELIXIR will be anchored with our other content offerings. It gives the community a way to comment and give feedback on Doxygen-driven content, as well as to share knowledge. The annotations are also clearly separated from the Doxygen-driven content, meaning we can easily update the Doxygen content without clobbering annotations.

Why a wiki area instead of a forum thread? Because as we update the documentation we're likely to render big chunks of commentary by the community moot. Say the first twenty posts on a forum thread are on ways a given piece of doc could be improved. Then the docs are improved - suddenly all that commentary becomes noise. It's much easier to simply remove that part of the wiki content for that documentation.

The wiki versioning also ensures that content isn't lost in a deletion, as it would be in a forum system. If there was a nugget in an old revision (or if a vandal blasts something), you can review the history or even revert to an older version.

The annotation system should use one of the open source PHP wiki engine components for formatting. It should be versioned, with recent change tracking (especially via an RSS feed, so that people can easily know when things happen). There needs to be a revert feature.
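
The versioning and revert behavior can be sketched with an append-only revision list, in which a revert adds a new revision copying an old one so that nothing is ever lost. The AnnotationHistory class and its method names are hypothetical and stand in for whatever wiki component is eventually chosen.

```python
class AnnotationHistory:
    """Append-only revision history for one annotation area."""

    def __init__(self):
        self._revisions = []  # (author, text) pairs, oldest first

    @property
    def current(self):
        """Text of the latest revision, or '' if nothing has been written."""
        return self._revisions[-1][1] if self._revisions else ""

    def edit(self, author, text):
        """Store a new revision and return its 1-based revision number."""
        self._revisions.append((author, text))
        return len(self._revisions)

    def revert(self, author, revision):
        """Restore an old revision by appending a copy of it."""
        if not 1 <= revision <= len(self._revisions):
            raise ValueError("no such revision: {}".format(revision))
        _, old_text = self._revisions[revision - 1]
        return self.edit(author, old_text)

    def recent_changes(self, limit=10):
        """Newest-first (revision number, author) pairs, e.g. to feed an RSS view."""
        numbered = list(enumerate(self._revisions, start=1))
        return [(number, author) for number, (author, _) in reversed(numbered)][:limit]


if __name__ == "__main__":
    history = AnnotationHistory()
    history.edit("alice", "See the wiki tutorial on SimObject.")
    history.edit("vandal", "")
    history.revert("moderator", 1)
    print(history.current)
    print(history.recent_changes())
```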

The preferred TDN wiki formatting is basically filtered HTML with some wiki-esque conventions (i.e., implicit paragraph breaking).

In addition, some other behavior should be provided. For instance, older versions of docs should have a big "these docs are outdated" notice so people don't inadvertently use old docs. Old docs should be available, though, because once people start on a big project they may not be able to upgrade immediately to the latest release.

It should also be possible to redirect people to the "canonical" version of a particular piece of documentation; for instance, every project's core Sim code should have a note that says that the canonical version can be found in the TGE project. This keeps comments and improvements focused, rather than scattered across several possible sites.

It should also be possible to lock older versions from being annotated, or at least have a firm warning, so that people don't submit improvements for old docs. Since the docs are in source control, they are versioned along with the code.

General Notes

ELIXIR should be written in PHP and be backed by MySQL. This common denominator enables us to easily find people to maintain or extend it, as well as meshing well with our existing hosting choices.

LXR (http://lxr.linux.no/source/) is the closest package in spirit; it is a tool to allow people to better understand big complex codebases (specifically, you can find examples for Mozilla and the Linux kernel). It deals with multiple interrelated codebases. The interface is fairly intuitive. As it happens, we don't specifically want the source browsing ability in ELIXIR (though it would be a very good 2.0 feature), but the goals of the two applications are similar.