DAISY Pipeline 2, Presented at the eBook Accessibility Symposium at NFB – December 2011

I am Romain Deltour, I work for the
DAISY Consortium, and I am going to spend the first half hour talking about the DAISY
Pipeline 2 project, which is an open framework for automatic document processing. I am going
to give an introduction to the tool, briefly describe some possible production
workflows, and then, if we have some time left for questions, I’d be happy to answer
them. So let’s first talk about the background – there is an ever-present demand for accessible
content. This demand comes from a wide range of user groups
and content must be published through a wide
variety of distribution channels – online content, CDs, SD cards – and to an increasing
variety of devices. Visually impaired users use different kinds of devices, like Braille
displays, hardware DAISY players, laptops, iPads, tablets, whatever. Because there is
a wide range of user groups and a variety of devices, there are a lot of different output
formats. This project is the follow-up to the Pipeline
1 project, which was started in 2002 by the DAISY Consortium and is now in maintenance
mode. It has been quite successful and used a lot in the DAISY community to produce accessible
content. But at the time we were creating that project, some technologies and standards
were not ready. Now we’ve decided to come up with the Pipeline 2 project and totally
redesign the software to better rely on open standards and new technologies. Our high-level objective is to be efficient
– enable the tool to produce many documents in a short time. When a publisher wants to
produce a newspaper for the next day, it has to be quite efficient. The tool has to be
low cost, which means it is easy to develop, easy to adapt to a new publishing workflow, and easy
to maintain over time. And it needs to be versatile, which means it can adapt
well, interoperate with different systems, and produce
different output formats. Our approach is to come up with a modular
system – why a modular system? Because if your system is modular, based on several components,
it’s easier to extend. If I want to augment an EPUB 3 production with MathML processing,
I just add a MathML-aware module to the system. It must be easy to customize – if my organization has some
special needs in producing the content, I need to be able to tweak the production workflow
to meet those needs. Modularity also makes the tool easier to integrate and to open up to commercial
and non-profit use. The module system is itself a plugin system, so commercial companies
can come up with their own plugins for the tool. The other big item in our approach is to promote
single-source publishing – that’s not a requirement, but it is recommended. What is single-source
publishing? It means that we use an XML master document in order to produce different output
formats. Markus said earlier that when you have a reasonably good level of
structure and good semantic inflections in your document, then you almost
get the accessibility features for free, and that’s what we are talking about now.
If the XML master is rich enough, then with automated production we can transform
it into a variety of accessible output formats such as DAISY digital talking books, EPUB 3
books, Braille content, and large print. Of course, this is neither a requirement nor a limitation of
the tool. It is what we’re suggesting, but the production workflow can be adapted to
different use cases and workflows. We can transform input formats into these XML masters,
then into other output formats, or we can go a totally different route. And as for the
XML master, we are currently focusing on DAISY A.I., also known as the DAISY 4 Authoring and
Interchange standard, which is the successor to the DTBook format. It’s an authoring format
with an XML schema that can be used to describe almost every document. The third big item of our approach is that
we focus on accessibility and quality. A valid EPUB book is not necessarily accessible. We
strive to produce content that is inherently accessible and inherently well-structured,
which makes it a quality publication. Now, as for the architecture, I won’t dive
into the tiniest technical details here; I’ll just give you an overview of the kinds
of technologies we are using. We are relying on W3C standards, notably XProc, XSLT, and
XPath, which are all open recommendations and native XML processing technologies. We
do that because it’s easier to manipulate and produce XML with technologies
that have been made for the job. XProc is the XML pipeline language: a language
developed to orchestrate XML processing steps in a workflow. It has been a recommendation
since May 2010, and there are already many open source and commercial engines available.
Then, on top of this core XProc engine, we’re adding a module system. Again, each step in
the production workflow is an independent cohesive software component, which we call
a module. It can be implemented in several technologies like XSLT, XPath, and Java code,
and it is all orchestrated by XProc. Then we have the runtime framework, which is like
the glue code that ties all these components together. It makes the XProc engine aware
of the modules and makes it possible to run them with the XProc engine. It’s
based on Java technology and the OSGi module system, which helps us take
a service-oriented approach where we can plug in different pieces of functionality:
job management, logging, web services, you name it. So that’s it for the architecture: the core processing technologies are implemented with open standard
recommendations, and everything runs in a Java-based, open source runtime which
implements a module system.
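To make this a bit more concrete, here is a rough sketch of what a single XSLT step amounts to when run from Java with the standard javax.xml.transform API. This is not Pipeline 2’s actual module API; the stylesheet name, file names, and parameter are invented for illustration.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltStepDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical file names, for illustration only.
        File stylesheet = new File("dtbook-to-html5.xsl"); // one step, written in XSLT
        File input = new File("book.xml");                 // the XML master
        File output = new File("book.xhtml");              // this step's result

        // Compile the stylesheet and run it as one self-contained processing step.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer step = factory.newTransformer(new StreamSource(stylesheet));

        // Parameters are how a pipeline can customize a step without editing it.
        step.setParameter("language", "en");

        step.transform(new StreamSource(input), new StreamResult(output));
        System.out.println("Step finished: " + output.getName());
    }
}
```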
So what are the deployment options for our tool? You can use it from the command line; this is already available,
and the command line tool will be revamped early next year. The tool can also be called through a RESTful web service
API; this is already available too, and it’s going to be gradually enriched and improved
based on feedback. We also want to develop a web application for the tool, a web UI that
you can access with your browser. The target release for this web application is June 2012.
And ultimately, we’ll also come up with a lightweight standard desktop UI. It’s going
to be a sequence of dialogs to guide you through the conversion process. The goal is to be
able to embed that into third-party applications. For instance, if you have ever used the Word
Save as DAISY plugin, it calls the Pipeline under the hood and it pops up a sequence of
dialogs to invoke the Pipeline process. So that’s the kind of user interface we’re looking
at. This desktop UI is planned for the first half of 2013.
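As a small illustration of the web service option, a client can talk to the tool with plain HTTP calls. The host, port, and path below are assumptions made for this sketch, not the documented API.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PipelineWsDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; the real paths are documented with the web service.
        URL url = new URL("http://localhost:8181/ws/scripts");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/xml");

        System.out.println("HTTP status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // the XML response body
            }
        }
    }
}
```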
Now, rather than demoing an automated tool (it’s not very interesting: I just start the process, it runs, and it
gives me a file, so there’s no real point in showing that), I’m going to briefly describe
some sample workflows that are available or in the works for the tool. First, I’m going
to talk about EPUB production: how we do that and what it takes. I won’t dive
into every step of this workflow, but basically we have a generic process for
EPUB production: we look at the input file set, we determine the reading order of the
file set, and then, based on this reading order, we process the content to convert it into
HTML5 and possibly add some media overlays. When we’ve done that, we extract the metadata from
the documents, we automatically create a navigation document, and then we package the file set (zip
it).
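To give an idea of the packaging step, here is a minimal sketch in Java; the file names and paths are invented and there is no error handling. The one real constraint it shows is that the EPUB (OCF) container requires a mimetype entry as the very first ZIP entry, stored without compression.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EpubPackagerDemo {
    public static void main(String[] args) throws Exception {
        byte[] mimetype = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);

        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("book.epub"))) {
            // "mimetype" must be the first entry and must be STORED (uncompressed).
            ZipEntry first = new ZipEntry("mimetype");
            first.setMethod(ZipEntry.STORED);
            first.setSize(mimetype.length);
            CRC32 crc = new CRC32();
            crc.update(mimetype);
            first.setCrc(crc.getValue());
            zip.putNextEntry(first);
            zip.write(mimetype);
            zip.closeEntry();

            // The rest of the file set (invented example paths), compressed normally.
            String[] files = {"META-INF/container.xml", "EPUB/package.opf",
                              "EPUB/chapter1.xhtml", "EPUB/nav.xhtml"};
            for (String path : files) {
                zip.putNextEntry(new ZipEntry(path));
                try (FileInputStream in = new FileInputStream(new File("build", path))) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        zip.write(buf, 0, n);
                    }
                }
                zip.closeEntry();
            }
        }
    }
}
```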
What’s interesting here is that we try to keep each of these steps as independent
as possible from the previous ones, which means they are interchangeable (where possible)
and reusable in different production workflows, depending on the input and output. We try
to automate as much as we can, of course. For instance, the navigation creation is fully
automated based on the structure of the content document. We look at the HTML markup, and
if it is well structured, with the proper markup, top-level sections, semantic inflections,
and things like that, we can automatically generate the navigation files.
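As a much-simplified sketch of that idea, here is how well-structured headings can be turned into navigation entries. It produces only a flat list with an invented target file name; the real tool handles nesting, sectioning, and the epub:type semantics.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NavSketch {
    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a content document; the real input is the HTML5 file set.
        String xhtml =
            "<html xmlns='http://www.w3.org/1999/xhtml'><body>" +
            "<section id='ch1'><h1>Chapter 1</h1></section>" +
            "<section id='ch2'><h1>Chapter 2</h1></section>" +
            "</body></html>";

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xhtml)));

        // Collect top-level headings and emit one flat list of links.
        StringBuilder nav = new StringBuilder("<nav epub:type=\"toc\"><ol>\n");
        NodeList headings =
            doc.getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "h1");
        for (int i = 0; i < headings.getLength(); i++) {
            Element h = (Element) headings.item(i);
            String id = ((Element) h.getParentNode()).getAttribute("id");
            nav.append("  <li><a href=\"content.xhtml#").append(id).append("\">")
               .append(h.getTextContent()).append("</a></li>\n");
        }
        nav.append("</ol></nav>");
        System.out.println(nav);
    }
}
```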
I’m now displaying another workflow diagram that shows an instance of this workflow applied to a DAISY 3 to EPUB 3 conversion. It basically
shows how we use some of the components when we have a complete conversion to perform.
For instance, when we have a DAISY 3 file set, we look at the DAISY navigation file
to determine the reading order of the final EPUB 3 publication. When we
want to generate media overlays in an EPUB 3 publication produced from a DAISY
3 DTBook, we take the existing .smil files and audio from the DAISY 3 file set. This next workflow is very high-level: I’m
going to briefly describe how we can use this tool to add some advanced TTS annotations to
an EPUB publication. The workflow goes like this: start with the XML master, for instance
in the DAISY Authoring and Interchange format. Then, depending on the original markup of
the document, it may need to be improved. For instance, I give here an example of a
sentence, which is: “Have you seen the movie ‘La Vita e Bella?'” It’s just a paragraph,
it’s tagged as a paragraph, and “La Vita e Bella” is not marked up. So we can have some
preprocessing tools to enrich this markup and make it better by tagging “La Vita e Bella”
within a name element. This kind of markup enrichment can be either fully automated or
needs human interaction. Sometimes there are things that an automated tool cannot do, and
in those cases we need some human interaction. Here, for instance, to identify movie titles
or proper names, e-mail addresses, whatever, we can query some databases to do that. So
once we have this properly tagged XML master, we transform it into an EPUB. During
this transformation, we can plug in a module that will talk to a remote lexicon,
and the lexicon will know how to pronounce this foreign proper name.
So the module will be able to add pronunciation annotations to the produced EPUB file, saying: I
am using this phonetic alphabet, and the phonetic description of the title is this.
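A minimal sketch of what such an annotation can look like, using EPUB 3’s SSML attributes on the element that wraps the phrase. The lexicon lookup is stubbed out, and the IPA transcription is illustrative only, not the tool’s actual output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PronunciationSketch {
    // Stand-in for a call to an organization-wide remote lexicon service.
    static String lookupPronunciation(String phrase) {
        return "la ˈviːta ɛ ˈbɛlla"; // illustrative IPA transcription
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<p xmlns='http://www.w3.org/1999/xhtml'>"
                     + "Have you seen the movie <span id='t1'>La Vita e Bella</span>?</p>";

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xhtml)));

        // Declare the SSML namespace and annotate the tagged title with
        // a phonetic alphabet and a phonetic description.
        String ssmlNs = "http://www.w3.org/2001/10/synthesis";
        doc.getDocumentElement()
           .setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:ssml", ssmlNs);
        Element title = (Element) doc.getElementsByTagNameNS(
                "http://www.w3.org/1999/xhtml", "span").item(0);
        title.setAttributeNS(ssmlNs, "ssml:alphabet", "ipa");
        title.setAttributeNS(ssmlNs, "ssml:ph", lookupPronunciation(title.getTextContent()));

        // Serialize the annotated fragment back out.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(System.out));
    }
}
```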
The interesting part here is that this is really a good use case for automated production, because
we can automatically query some organization-wide lexicons and data to improve the publication
– either improve the document we intend to archive, or improve the live publication process.
The Pipeline 2 project has built-in support for remote services – we can basically make
some HTTP requests and call some web services. So if an organization is making their lexicons
available to the public or to partners as an online service, that service can be called
from the automated production tool to enrich the outcome of the EPUB 3 production with these
text-to-speech annotations or other accessibility features. It’s particularly interesting for
text-to-speech annotations because building a comprehensive and rich lexicon is very time-consuming
and costly, so users usually maintain large lexicons and gradually
enrich and enhance them over time. In this example I showed how
to add some inline pronunciation hints, but you can also convert the big organization
lexicon into a small subset that you integrate natively within your EPUB. To summarize, the Pipeline 2 project is an
open platform. It’s all based on open source software, either from third-party developers
or from the DAISY Consortium. It’s currently licensed under the LGPL, which is a commercial-friendly
license, but we are also discussing other license
possibilities, such as the Apache License. It’s a collaborative project: it’s maintained
and led by the DAISY Consortium, but it also involves DAISY Consortium members: the National
Library for the Blind of Norway, the Swiss Library for the Blind, RNIB, and other organizations.
It’s a really collaborative project. It also has built-in extensibility and customizability,
which means that we intend to make the tool extensible for special purposes and customizable
to specific organization workflows. What is available today in terms of concrete
conversions? We can go from DTBook (DAISY XML) to DAISY A.I., the new version of DAISY XML.
We can go from DAISY A.I. to EPUB 3. And we can go from DAISY 2.02 (Digital Talking Book)
to EPUB 3, with optional media overlays. But that’s just the first iteration of the
software. We have many things in the works; at the top of the list is improved EPUB 3
support. We’re going to come up with additional input formats that will be transformable into
EPUB books. I’m thinking of other DAISY formats like DAISY 3, as well as HTML, RTF, things like that.
By improved EPUB 3 support, I’m also talking about these TTS annotations. This is not yet
available, but it’s already planned for next year. Also in the works is
Braille production, being able to produce Braille. There will be several prototype solutions
developed by an independent working group working on this Braille topic. We’ll also
work on TTS-based production. Users usually prefer the text to be narrated by a human,
but sometimes there is no time for human narration, for instance when you want
to deliver newspapers. More and more reading devices and reading systems will
have the built-in ability to speak text, with their own TTS engines built
in. At the same time, if you do the TTS processing upfront, you can rely on more server resources,
more processing power, and better lexicons, which all makes for a better TTS-based production.
So we are still targeting TTS-based production. Okay, that’s it. Thank you.
