Friday, July 17, 2009

Reworking the Sandcastle Help 1.x Build

As part of the new Sandcastle Assist update we will be uploading soon, we have completely reworked the Help 1.x (CHM file) build process of Sandcastle, mainly for two reasons:
  • Improve its use of memory, and
  • Control the build process to add more options.
For anyone new to the Sandcastle tools, I will give a little background so that you can understand why we wanted to satisfy the above requirements.

Background

Microsoft uses the Sandcastle tools to build its own documentation, and as we all know, this is mainly in Help 2.x format. Where Help 1.x is built, as for the Silverlight documentation, the output is not large.
Sandcastle is therefore designed to produce Help 2.x-compatible HTML, which embeds keywords, attributes, and so on.
To build Help 1.x-compatible HTML, Sandcastle uses two console tools:
  1. ChmBuilder.exe, which converts the HTML files to the Help 1.x format and generates the project, table of contents, and keyword files. See Building CHM using CHMBuilder.
  2. DBCSFix.exe, which is used to fix localization issues with the compiler.
    See CHM Localization and Unicode issues.

Memory Issue

Memory is not an issue for Help 2.x, since that help format supports an easy plug-in system that lets the developer split the help build into parts and so avoid large memory requirements. See Componentization - Building Assembly level HxS using Sandcastle.
Componentization is available in Help 1.x too, but it is not as easy and is not well supported by the current Sandcastle GUI front ends.
Another reason the memory issue may not arise with Help 2.x is that the HTML files generated by Sandcastle already contain the keywords and attributes, so no separate keyword list file is created for that compiler.
In the case of Help 1.x, the memory requirement is high for the following reasons:
  • Both the ChmBuilder.exe and DBCSFix.exe tools use the .NET Directory.GetFiles() to load all the file paths into an array before conversion, and for very large projects that array can be large.
    NOTE: Sandcastle's HxfGeneratorComponent could be used to eliminate this, since it writes the list of generated HTML files to a *.HxF file.
  • To generate the table of contents, ChmBuilder.exe uses a .NET Dictionary to map each topic to its title.
  • ChmBuilder.exe also uses a .NET List of structures to store all the keywords retrieved from the Help 2.x-compatible HTML files during conversion.
Our solutions:
  • Use the Windows API functions FindFirstFile and FindNextFile to iterate over the files in an output directory one at a time. These are the same functions that the .NET Directory.GetFiles() uses internally before packing everything into an array.
  • Use a file-based dictionary, a BTree implementation, to store the topic-file map (we hope it is faster than a complete database; we will be testing this).
  • Allow the Help 1.x compiler to retrieve the keywords from the HTML files, just as the Help 2.x compiler does, by rewriting the keywords in the format used by Microsoft (search for MS-HKWD and MS-HAID). The tests so far produce the same results as the ChmBuilder tool, but we are looking at ways to improve this.
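
To make the first two solutions concrete, here is a minimal sketch of the idea in Python rather than in the .NET code Sandcastle Assist actually uses. It makes two assumptions. First, os.scandir stands in for FindFirstFile and FindNextFile (per PEP 471 it wraps those calls on Windows), so files are visited one at a time instead of being collected into an array. Second, a SQLite file stands in for the BTree-based dictionary, purely because it ships with Python; it is not the library we are testing. The names iter_html_files and TopicTitleStore are hypothetical.

```python
import os
import sqlite3
from typing import Iterator, Optional


def iter_html_files(root: str) -> Iterator[str]:
    """Yield HTML file paths one at a time instead of building a full list."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Descend lazily; nothing is accumulated in memory.
                yield from iter_html_files(entry.path)
            elif entry.name.lower().endswith((".htm", ".html")):
                yield entry.path


class TopicTitleStore:
    """Topic map kept in a file on disk rather than in an in-memory dictionary.

    SQLite is an assumption made for illustration only; it stands in for
    the file-based BTree dictionary described in the list above.
    """

    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS topics "
            "(topic TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )

    def put(self, topic: str, title: str) -> None:
        # A later value for the same topic replaces the earlier one.
        self._conn.execute(
            "INSERT OR REPLACE INTO topics (topic, title) VALUES (?, ?)",
            (topic, title),
        )

    def get(self, topic: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT title FROM topics WHERE topic = ?", (topic,)
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self._conn.commit()
        self._conn.close()
```

A build step would loop over iter_html_files(output_dir), read each title, and call put() as it goes, so memory use stays roughly constant no matter how many topics the project has. The table of contents pass can then call get() for each topic as it is needed.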

Control Over Output

We simply needed more control and wanted to modify the project files output by the ChmBuilder tool, but these are not valid XML or XHTML files. We would have to either rewrite those files or write a parser to modify them.
We used the memory work above as the opportunity to do this :)
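
To illustrate why an XML parser is not enough for these files, here is a small sketch (in Python, for illustration only) of the "write a parser" option using a lenient HTML parser. It assumes the table of contents file follows the usual HTML Help sitemap layout, where each entry is an OBJECT block with unclosed param tags. The sample text and the names SitemapParser and parse_sitemap are made up for this example and are not taken from ChmBuilder's real output.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Hypothetical sample in the HTML Help sitemap style. Note the unclosed
# <LI> and <param> tags, which a strict XML parser would reject.
SAMPLE_HHC = """<UL>
<LI><OBJECT type="text/sitemap">
<param name="Name" value="Overview">
<param name="Local" value="html/overview.htm">
</OBJECT>
<LI><OBJECT type="text/sitemap">
<param name="Name" value="API Reference">
<param name="Local" value="html/api.htm">
</OBJECT>
</UL>"""


class SitemapParser(HTMLParser):
    """Collect (name, local) pairs from text/sitemap OBJECT blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: List[Tuple[str, str]] = []
        self._current: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        if tag == "object" and attributes.get("type") == "text/sitemap":
            self._current = {}
        elif tag == "param" and self._current is not None:
            key = (attributes.get("name") or "").lower()
            self._current[key] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "object" and self._current is not None:
            self.entries.append(
                (self._current.get("name", ""), self._current.get("local", ""))
            )
            self._current = None


def parse_sitemap(text: str) -> List[Tuple[str, str]]:
    """Return the topic entries found in sitemap-style text, in order."""
    parser = SitemapParser()
    parser.feed(text)
    parser.close()
    return parser.entries
```

Calling parse_sitemap(SAMPLE_HHC) gives the title and target file of each entry in document order. Writing the files ourselves, which is the route we took, avoids this parsing step altogether.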

Conclusion

With Sandcastle Assist, we are trying to find ways to improve on the Sandcastle tools. On memory requirements, this is just the first and easier step. The main issue is with the reference link resolving component, ResolveReferenceLinksComponent. That will involve a bit of work and research, and we are working on it.
We have added support for grouping in the build process so that you can separate a documentation project into parts for componentization. We will continue to work on this too.
Thanks for reading; we would love any input on these efforts. May God bless you.

In the beginning...

Sandcastle Assist is an open source effort to enhance and provide easier developer access to the Microsoft Sandcastle product, which is a documentation system for managed class libraries.
The project is currently hosted on CodePlex: Sandcastle Assist.
On this page we will provide information on the development of this library, along with tips and useful information on using Sandcastle in various projects.

Please join us.