WWW94: converting formatted documents to HTML


presented by jon stephenson von tetzchner, norwegian telecom research
the author presented a program called "fm2html" that converts documents from FrameMaker Interchange Format (MIF) into HTML.

document formats

a MIF document can roughly be divided into four sections:

HTML documents contain very little information about the layout of the document, because it is up to the client software to decide how the data is presented.

the conversion process

the conversion process is broken up into the following stages:

  1. convert the FrameMaker file into a FrameMaker MIF file using fmbatch
  2. convert the MIF file into an HTML file and, at the same time, extract the figures and convert them into GIF files. a table of contents is also generated automatically.
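
the automatic table of contents could be built roughly as follows. this is only a minimal sketch, not fm2html's actual code: the function name build_toc, the (level, title) input shape and the "secN" anchor names are all assumptions made for illustration.

```python
from html import escape


def build_toc(headings):
    """Build a nested HTML list from (level, title) pairs in document order.

    Assumption: levels start at 1, and each heading in the body carries a
    matching anchor named "sec<index>". Both are hypothetical conventions.
    """
    lines = ["<ul>"]
    depth = 1
    for index, (level, title) in enumerate(headings):
        # Open nested lists until we reach the heading's level.
        while depth < level:
            lines.append("<ul>")
            depth += 1
        # Close nested lists when moving back up.
        while depth > level:
            lines.append("</ul>")
            depth -= 1
        lines.append(f'<li><a href="#sec{index}">{escape(title)}</a>')
    # Close any lists that are still open, then the outermost one.
    while depth > 1:
        lines.append("</ul>")
        depth -= 1
    lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_toc([(1, "document formats"), (2, "MIF"), (1, "the conversion process")]))
```
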

the conversion of plain text is quite straightforward. one problem is that HTML does not support tabs. tabs are therefore removed, except where a paragraph is mapped to the HTML "preformatted" construct.
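
the tab rule can be sketched like this. the function name clean_paragraph and the boolean flag are hypothetical, and replacing each tab with a single space (rather than deleting it outright, which would join neighbouring words) is an assumption, not something stated in the talk.

```python
from html import escape


def clean_paragraph(text, preformatted):
    """Convert one paragraph of plain text to HTML.

    Tabs survive only inside <pre>, where the client keeps whitespace as is.
    Assumption: elsewhere each tab becomes one space.
    """
    if preformatted:
        return "<pre>" + escape(text) + "</pre>"
    return "<p>" + escape(text.replace("\t", " ")) + "</p>"


if __name__ == "__main__":
    print(clean_paragraph("name\tvalue", preformatted=False))
    print(clean_paragraph("name\tvalue", preformatted=True))
```
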

hypertext links are also converted, except those that are page based: since HTML is not page oriented, these links are ignored.
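
a sketch of that link rule, under stated assumptions: the command names in PAGE_BASED, the function convert_link and the idea that the link target has already been mapped to a URL are all illustrative and not taken from fm2html. a page-based link is "ignored" here by emitting only its text.

```python
from html import escape

# Assumption: these names stand in for the page-based hypertext commands.
PAGE_BASED = {"gotopage", "nextpage", "previouspage"}


def convert_link(command, target, text):
    """Return HTML for one hypertext link, dropping page-based ones."""
    if command in PAGE_BASED:
        # HTML has no notion of pages, so keep the text and drop the link.
        return escape(text)
    return f'<a href="{escape(target, quote=True)}">{escape(text)}</a>'


if __name__ == "__main__":
    print(convert_link("gotolink", "intro.html#top", "introduction"))
    print(convert_link("nextpage", "", "next page"))
```
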

the process of converting figures has seven stages:

  1. extract each figure from the original document into a separate MIF file
  2. convert the figure to postscript format
  3. convert the figure to ppm format
  4. remove excess space
  5. add a border
  6. convert to GIF
  7. include the figure in the resulting HTML document
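
the seven stages above can be sketched as a plan of intermediate files. this does not run any converter: the talk notes do not name the tools used, so the stage names, the file naming scheme and the function plan_figure are assumptions made for illustration only.

```python
# Stages 1 to 6, each with the (hypothetical) suffix of the file it produces.
STAGES = [
    ("extract", ".mif"),
    ("postscript", ".ps"),
    ("ppm", ".ppm"),
    ("crop", ".crop.ppm"),
    ("border", ".border.ppm"),
    ("gif", ".gif"),
]


def plan_figure(doc_stem, number):
    """Return the file-to-file steps for one figure plus its <img> tag.

    Each step is (stage, input file, output file); the output of one stage
    is the input of the next. Stage 7 is the returned <img> tag.
    """
    base = f"{doc_stem}_fig{number}"
    source = f"{doc_stem}.mif"
    steps = []
    for stage, suffix in STAGES:
        target = base + suffix
        steps.append((stage, source, target))
        source = target
    return steps, f'<img src="{source}">'


if __name__ == "__main__":
    steps, tag = plan_figure("report", 1)
    for stage, src, dst in steps:
        print(f"{stage}: {src} -> {dst}")
    print(tag)
```
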

tables are converted using the HTML "preformatted" tag.
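
one way to lay out a table inside a preformatted block is to pad every cell to the width of its column. this is a minimal sketch, assuming all rows have the same number of cells; the function name table_to_pre and the two-space column gap are hypothetical choices, not fm2html's.

```python
from html import escape


def table_to_pre(rows):
    """Render rows of text cells as aligned columns inside <pre>.

    Assumption: every row has the same number of cells.
    """
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    lines = []
    for row in rows:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append("  ".join(cells).rstrip())
    # Pad first, escape afterwards, so the columns line up when rendered.
    return "<pre>\n" + escape("\n".join(lines)) + "\n</pre>"


if __name__ == "__main__":
    print(table_to_pre([["format", "layout info"], ["MIF", "detailed"], ["HTML", "very little"]]))
```
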

in the future, the proposed HTML+ format would allow more accurate automated conversion of formatted documents.


i have no link information for this paper on the web.
3rd_day_fm2html / 13-jun-94 (ra) / reto ambühler