School of Computer Science

Propping Open the Document Trapdoor

Steven Bagley and David Brailsford were recently invited to give a 'Google Tech Talk' at the Google campus in Mountain View, CA. They chose to outline the motivation for their current research into new mechanisms for describing documents, aimed at supporting eBook Readers and other alternative ways of reading documents. Their talk can be found on YouTube at:

http://www.youtube.com/watch?v=aOkxl3g-OuI

Computer document processing often starts with an abstract, structural representation, which then enters a processing pipeline that creates the desired layout and appearance. Unfortunately, the whole system resembles a series of steps in a one-way chemical reaction, or the successive irreversible stages of creating assembler code using a compiler.

This 'one-way function' behaviour is most obvious with PDF, which is tied to a completely fixed appearance once a document passes through a one-way 'trapdoor' like Adobe Distiller. Some formats, such as XHTML, allow for a little more wriggle room but even this breaks down if the appearance changes dramatically (such as displaying a Web page on a large monitor). In essence, any attempt to reflow a document, or view it at some other size, is either frustrating, or simply impossible, without regenerating the document from a more abstract, higher-level representation.

This limitation has not had much effect over the past 25 years, but it is now hitting us hard. In a world of iPhones, eBook Readers, 10" netbooks, laptops and 30" Cinema Displays (not forgetting the humble printed page), it is no longer safe to assume that a document will be viewed in one fixed presentation. 'Repurposing' (without the need for total re-processing) needs to be the watchword for a modern document format. However, this leads us to the heart of the problem: current formats don't lend themselves to having their presentational properties partially unpicked and re-engineered.

In this talk, we outline the current state of the art in document formats, and their limitations when it comes to repurposing. We describe our attempts at making PDF a more repurposable format and we outline some necessary features, and open questions, for future document formats.

Posted on Tuesday 28th February 2012

University of Nottingham
Jubilee Campus
Wollaton Road
Nottingham, NG8 1BB