Standardizing Standards 1: Introducing NISO STS

Webinar Transcript

CHANDI PERERA: Hello. Welcome to the Standardizing Standards webinar.

This is a webinar series presented by Typefi and Inera. It’s the first webinar in the series and we’re going to be introducing NISO STS.

My name is Chandi Perera, and I’m the CEO of Typefi Systems. Typefi Systems is an Australian company, headquartered in Australia, with customers in 35 countries. We have offices in Australia, the US, the UK, the Netherlands, and Sri Lanka, and we have about 10,000 users around the world.

We produce a single-source publishing platform that fully integrates print, online, and mobile. We take single-source content in any format, specifically for this webinar in NISO STS format, and then output it to 30 different formats.

We leverage industry standards for fast implementations and ease of use.

And now I want to introduce you to my co-presenter, Bruce Rosenblum, CEO of Inera Incorporated. Bruce has been working in the standards space for a long time, and he is also the Co-Chair of the NISO STS Working Group.

BRUCE ROSENBLUM: Thank you very much, Chandi. I’m Bruce Rosenblum, CEO of Inera Incorporated, and also Co-Chair of the NISO STS Working Group, a working group that I’ve been honored to co-chair with Rob Wheeler of ASME for the past two years. And it’s that working group that’s created the NISO STS standard that you’ll be hearing more about today.

Inera is located just outside of Boston in Belmont, Massachusetts. We have customers, however, in 27 countries, so we really have a global focus, and our offices are in the US, UK, and Canada.

Inera serves thousands of publications and hundreds of publishers worldwide every single day with our eXtyles and Edifix software.

What we do is provide a suite of editorial and XML tools for Microsoft Word called eXtyles, and we’ve just introduced our brand new product, eXtyles STS, which is a special version of eXtyles for standards publishers.

eXtyles allows publishers to automate the most time-consuming aspects of document cleanup, formatting, and editing, and at the end of doing all of that, they’re able to produce high quality XML—such as XML according to NISO STS—at the click of a mouse.

And like Typefi, eXtyles leverages industry standards, wherever possible, for fast implementation, ease of use and, of course, to have a standardized workflow.

CHANDI: Typefi and Inera have many joint customers in the standards space. We started working with ISO in 2010, in the initial implementation of what has become STS. We have joint customers like BSI and Standards Australia who are national standards bodies, and we also work with many standards development organizations, like IEEE.

In this webinar series, we are going to be covering how to implement either an ISOSTS workflow, or a NISO STS workflow.

So in this webinar, we’re going to tell you a little bit about STS: a brief history, the difference between ISOSTS and NISO STS, and why you should be looking at implementing an XML, and specifically an STS, workflow for your standards publishing.

In the second webinar, Bruce is going to be talking about where to introduce your XML process. What are the different options? How do you convert content to XML? How do you publish it? What are the different options available for doing all of that?

In the third webinar, we are going to be focusing very specifically on covering how to convert Microsoft Word documents to STS, because we understand most standards bodies and their committees use Microsoft Word as their authoring platform. And you’re going to see a quick demonstration of the Inera product set.

In the fourth webinar, we are going to be talking about how to get your STS, which you have now converted from Microsoft Word, into your publication in multiple distributable formats, be it PDF, EPUB, or HTML. We’re also going to be seeing a quick demonstration of Typefi in webinar number four.

Standards work through adoption: quite a few national standards bodies, indeed nearly all standards bodies, will often adopt standards created by other standards bodies. So adoption is a big and important part of publishing standards.

And webinar number five is going to be focused on how adoptions work in an STS workflow; how an XML workflow will simplify and ease your adoptions process.

Webinar number six is going to be focused on what is now possible after you have adopted STS. Right now you might be publishing from Microsoft Word, or InDesign, or some other single output platform, but after going to STS and XML you’ll be able to do more with your content.

Webinars seven and eight are going to be presented by guest presenters from ISO and Standards Finland. Both are going to be case studies.

The first one is going to be a case study on how ISO implemented their XML workflow, and the issues they’ve run across, and the benefits they’ve received.

The second case study is going to talk about publishing multilingual standards from SFS, or Standards Finland.

Over this webinar series we are going to help you understand NISO STS and how you can implement it, what the benefits are, and help you with some case studies so that you can make an informed decision if this is the right way for you to go.

BRUCE: Standards are everywhere. Unless you’re a standards professional, you may not recognize that standards are all around us, whether it’s the size of shipping containers or the thickness of railroad track; they even turn up in fiction.

For example, in the popular Harry Potter series, Percy Weasley in book four is worrying about standardizing the thickness of cauldron bottoms because there have been problems with leaks. So even J.K. Rowling recognized that standards are important in everyday life.

Of course, there’s one place where we haven’t had standards up until now, and that’s actually in the publication of standards.

Sure, standards have standard editorial styles, such as the ISO Directives or the IEEE Standards Style Manual, but those are about how you put the standard together, how it reads and how it’s structured. There has never been a standard for the XML markup of standards.

At least, until 2017 there wasn’t.

But now we have NISO STS. STS stands for Standards Tag Suite and what it is, is a standard for XML markup of standards, or another way to think about that is: How meta have we become?

It’s officially known as ANSI/NISO Z39.102-2017.

Where did this standard for standards come from? Well, the start of it is actually in the ISO project that Chandi mentioned a few slides back. In 2010, ISO still had a very basic publication workflow, which was to take Word files and make PDFs out of them. They had no full-text XML, and they had very slow publication times, often in excess of six months from FDIS to publication for a standard.

And so they started a new XML project that was really very inwardly focused. They were not setting about making a standard. They were just trying to improve their own publication processes and, in the process, also try to have multiple output formats—not just PDF, but HTML, and EPUB, and have XML that they could load into a database.

What came out of that project is what’s known as the ISO XML model, or ISOSTS. When they started they didn’t have any DTD or Schema, so they contracted with Mulberry Technologies, a firm outside of Washington DC, to develop an XML model for them based on JATS.

JATS is the Journal Article Tag Suite, and this is also an ANSI/NISO Standard that originally dates from 2012, but the origins of it go back to 2002.

In fact, JATS has become pervasive in the journal publishing world. It’s pretty much the standard for full text tagging of journal articles.

But you may ask: “Why did ISO decide to base their XML model on JATS?”

Well they actually looked at several alternatives, including some standard models such as DITA, DocBook and TEI, and they also looked at building something from scratch. And when they compared all of these different models, they first of all decided that building something from scratch would be far more expensive, and really not necessary, and second of all, that all of these models had a lot of the structures they needed.

In particular, in addition to the structures that journal articles and standards share, things like sections, tables, figures, and mathematics, JATS had far and away the strongest model for bibliographies and reference lists.

And it’s a very mature foundation; it was already an internationally recognized standard. It was actually already in use by many other standards organizations for publishing their journal material, such as IEEE.

And JATS also has extensive vendor and software support. So it almost became a no-brainer for ISO to go ahead and select JATS as their foundation, and have Mulberry take that foundation and modify it specifically to meet the needs of ISO.

So, with ISOSTS successful for the ISO project, and you’ll hear more about that in a later webinar’s case study about ISO, why was there a need to create NISO STS?

Well, it turns out that ISOSTS has been quite successful for ISO and other national standards bodies, but it’s too limited for certain bodies outside of the national network, for example many of the American standards development organizations, or SDOs.

So what NISO STS provides to a much larger group of standards bodies is a stable standard that they can all use, and much enhanced guidance in the form of great documentation to both tool and conversion vendors about how to use the standard.

More importantly, it becomes a common format for sharing both metadata and full text, and a common XML model that can be used across publication types.

For example, many American SDOs publish not just standards, but journals and books and conference proceedings, and because STS is related to JATS, and also the BITS book model for XML, you can have a common core model that’s in use across all of those, and have a much greater leverage of your XML technology across all publications and groups in the organization.

And what all of this leads to is a lower barrier to entry for XML publication, for any standard body who now wants to get involved with XML.

What were the goals of the NISO STS project?

Well, having established a need first for ISOSTS, with ISO and the national standards bodies, and then for NISO STS, with the greater set of needs of standards development organizations, the working group that created NISO STS moved to align the existing ISO work with the JATS 1.1 standard, because JATS had continued to evolve.

And by aligning more carefully, JATS and STS will be able to stay aligned into the future for updates to both JATS and STS.

What specifically was added to NISO STS? Well first of all, richer metadata structures. A tremendous amount of work went into this part of the project, to make sure that metadata requirements could be covered for just about any standard that we saw. And you’ll understand the breadth of this in a minute when we talk about the working group.

The second is much richer support for adoptions. The original ISO model was a flatter model, and the new NISO STS model has a much richer hierarchical model that can work not just for adoptions from the country level adopting an international or regional standard, but also for two standards organizations working together, where one may adopt the work of another.

And another very important addition is support for Digital Object Identifiers, or DOIs, because many American SDOs use DOIs not only to identify their standards, but also to link between standards.

And, of course, all of this was done with complete backwards compatibility for existing users of ISOSTS so that they can continue to work in a smooth way with the pre-existing model, and transition very smoothly to the new NISO model.

The project ran for two years. We had a call for working group members in August 2015, we had our first committee call two months after that, and we had a tremendous outpouring of interest.

We actually had so many people interested in the project that we broke the project into two groups, a steering group and a technical group, with the steering group setting policy and the technical group actually working out the technical details.

It took a year to get to a committee draft, and then we got tremendous feedback from the committee review of that draft, and ultimately brought the work to public comment in May 2017.

After a month of public comment we had just a few additional comments. We addressed those, brought the standard to NISO vote in September 2017 and, just weeks ago on October sixth, it became an official ANSI/NISO standard.

To give you an idea of the range of interest in this work, this slide shows the members of the steering group—so not the technical committee, just the steering group—and you can see it’s a wide range of people from national standards bodies, standards development organizations, vendors, libraries, other interested parties.

So it really was a tremendous range of people that we had. It wasn’t just a project of a few people working on a small team.

We also had a smaller technical group, and this group really focused in on the technical details of the work. And you’ll notice that the asterisks indicate people who are members of the JATS and BITS working groups as well, so this has created wonderful cross-pollination between these different standards, to make sure that they can move forward properly in parallel.

And you can see that there are already a lot of early adopters, including of course ISO, the organization that was behind ISOSTS. Many national standards bodies, but also many American SDOs, are on this early-adopter list.

And this list is not complete; we are aware of quite a number of other standards organizations that are in the process of moving to either ISOSTS or NISO STS, or have actually started projects already but aren’t yet ready to have their names publicly announced.

CHANDI: There are many benefits of adopting STS. One of the biggest benefits is it will help you streamline your production workflow.

Using STS, many of our customers and the standards organizations we’ve mentioned above have automated their production process.

But also, as part of that automation, they have started validating content much earlier in their production process. So traditionally, validation happens at either an editorial stage, or ideally at a proofreading stage.

So, for example, if there’s a figure citation for Figure 6, and the figures stop at Figure 5, quite often that would only get picked up at the proofreading stage. But when you have your standards in XML you can actually check every figure citation automatically: does the figure it cites actually exist?

Or there might be a citation for an ISO standard, say ISO 1234:2013; when you were drafting the standard you expected the other standard to be published in 2013, but something slipped and it went to 2014. That might not get picked up during the process, especially in the final cleanup phase, but XML validation can tell you very quickly that there is no 2013 edition, only 2014.
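As a minimal sketch of this kind of automated check, the snippet below assumes the JATS-style `<xref>`/`<fig>` markup that STS inherits, and the sample document is invented for illustration. It verifies that every figure citation resolves to a figure that actually exists:

```python
import xml.etree.ElementTree as ET

def check_figure_citations(xml_text):
    """Return the targets of figure citations that don't resolve to a <fig> id.

    Assumes JATS-style markup, which NISO STS inherits: citations are
    <xref ref-type="fig" rid="..."/> and figures carry matching id attributes.
    """
    root = ET.fromstring(xml_text)
    fig_ids = {fig.get("id") for fig in root.iter("fig")}
    missing = []
    for xref in root.iter("xref"):
        if xref.get("ref-type") == "fig" and xref.get("rid") not in fig_ids:
            missing.append(xref.get("rid"))
    return missing

# Hypothetical document: Figure 6 is cited but never defined.
sample = """<body>
  <p>See <xref ref-type="fig" rid="fig5">Figure 5</xref>
     and <xref ref-type="fig" rid="fig6">Figure 6</xref>.</p>
  <fig id="fig5"><caption><p>Flow diagram</p></caption></fig>
</body>"""

print(check_figure_citations(sample))  # → ['fig6']
```

A real workflow would layer many such checks (tables, clauses, normative references) on top of basic DTD or Schema validation.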

A lot more of this is going to be explained in webinar number three when we’re talking about converting to STS, but you can do a lot more validation much earlier in the process by putting content into STS and actually validating that semantically.

You can reduce your time to publish. We mentioned that ISO significantly reduced their time from FDIS to publication: it used to be about six to eight months, and now they’ve got it down to two weeks, with most publications taking under one week. So that’s a significant time saving.

And it’s a significant benefit to the people who are using the standards, because standards are getting published faster.

There are improved tools. So not every single standards body can publish to every single platform out there. Fifteen to twenty years ago you published print—doesn’t matter where you were in the world, you got the same page size, same document.

Today there are many, many platforms. Obviously there is still print, but there is online PDF with different options, there’s HTML, there’s e-book. On top of e-book there are platforms like iBooks and Kindle, there are proprietary platforms, there are many distribution systems, and not every standards body can afford, or be up to speed with, all these different platforms.

But by going through to something like STS, you can actually publish to all of these using a set of tools, and do it at a lower cost than you could do by using a proprietary system or a system that is going to a single output.

By using a standard, it also means you’re sharing your risk of your workflow across multiple other organizations. So if you have a workflow that is proprietary to you, you are going to be fully invested in that, you are going to carry the full risk of that.

But if you use a workflow that’s used across the industry—a standard, for example—that means your implementation is going to be very similar to the implementation of other similar organizations. That means you can use the tools that are developed across all of these organizations, and therefore share the risk and the costs from these tools and the vendors.

STS is also going to help you with new products. Again, there’s going to be a later webinar that’s going to be covering this in much more detail, but going to XML, overall, will significantly reduce your effort and cost and complexity of producing HTML, EPUB, and mobile output.

XML and STS are also going to significantly help in producing accessible content, be it accessible PDF, DAISY, EPUB, or any other accessible format.

It’s also going to help you develop new products. I would recommend after this webinar you go to your favourite search engine and type in ISO Online Browsing Platform, and go and take a look at that.

We are going to cover this browsing platform, the OBP, in a little more detail in a later webinar, but it is something you can look at that is possible because of XML and STS, and which would be very difficult without an underlying XML framework for your content.

It’s also going to help your customers, especially organizations that are going to be using standards from different organizations. So, right now, if you’re a customer and one person’s sending you PDF, another person’s sending you Microsoft Word, somebody else is sending you HTML, it is going to be difficult for you to have these standards in a single repository, a single interface.

By having a common standard across all standards bodies, it is going to help your customers access the standards and discover them much more easily.

Co-publishing standards and adopting standards are going to be a lot easier with ISOSTS or NISO STS. You will see this in a later webinar.

And it’s going to help you interchange standards with your partners. So right now if you’re distributing standards in a proprietary format, or a closed format, you and your partner need to agree on exactly what’s being sent. Am I sending a Word file? Am I sending a PDF file? Am I sending an HTML file?

With STS there’s a defined set of metadata that you interchange, and it is a well understood standard.
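To give a flavour of that defined metadata, here is a simplified, illustrative fragment loosely modelled on the ISOSTS/NISO STS metadata elements. The element names (`title-wrap`, `std-ident`, `std-ref`, and so on) are drawn from the publicly documented tag set, but the values are invented for illustration, and you should consult the tag library at niso-sts.org for exact usage:

```xml
<iso-meta>
  <title-wrap xml:lang="en">
    <main>Estimation of cooking quality of alimentary pasta</main>
  </title-wrap>
  <std-ident>
    <originator>ISO</originator>
    <doc-number>7304</doc-number>
    <part-number>1</part-number>
    <edition>2</edition>
  </std-ident>
  <std-ref type="dated">ISO 7304-1:2016</std-ref>
</iso-meta>
```

Because both sender and receiver know exactly which elements to expect and what they mean, this metadata can be loaded, indexed, and cross-linked automatically, with no guessing about file formats.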

And, of course, inter-standard linking is going to be very useful. Not only will you be able to follow links within the document, so that wherever you see ‘see Figure 5’ or ‘see Table 9’ you’ll be able to click through and jump to the actual object, be it the table or the figure.

You will also be able to link between standards, and those will be validated live links, which is going to be very useful. Again, there’s going to be another webinar on new products, but I want to stress that going to STS will help you build new products faster.

If you’re a standards body that does a lot of adoptions, or your standards get adopted, using STS is going to significantly improve your adoption workflow.

You’re going to deliver consistent metadata to the partners who are adopting your standards, and if you’re the organization adopting a standard, you’re going to know exactly what you’re going to get, and it’s going to be really quick for you to adopt the standard and publish the adopted standard compared to dealing with proprietary formats.

Indexes and site-crawling are going to benefit because you’re going to have known metadata in known locations.

And it’s going to encourage adoption across industries that are currently perhaps not thinking of adopting because it is a much harder workflow. If you are sending a Word file, you need to know which version of Word file to send, and whether the recipient has the right fonts, because if they don’t have exactly the same set of fonts, they open that Word file and everything reflows.

None of this is going to be an issue if you get an XML file and render it based on your own set of requirements.

A much bigger benefit for the wider community, and for you as a standards publisher, is that it’s going to make your standards more discoverable. You will be able to deliver your products to many more devices. Today you might be producing a PDF file that works very well on a big screen or a laptop, but probably does not work very well on a phone.

Or with new platforms coming you will be able to deliver your content either in a DRM format or a DRM-free format specific to that platform, using STS. So that means people will discover your standards from many many different platforms, not just having to come to your website and search there.

It’s also going to make aggregation of standards much simpler: there are systems and portals that you can subscribe to, or put your content on, which will aggregate your standards and cross-link them.

So, for example, if you’re from Standards Development Agency A, and you create a standard that refers to a standard from Standards Development Agency B, right now the only way to follow that reference is to go to your favourite search engine, find the URL for the other standards agency, go there, find that standard, and try to get hold of it.

Now imagine if these standards are cross-linked. A, it’s going to be much more useful for the user of the standard, but B, it’s going to be much more commercially attractive to the standards agencies, because anybody reading the first standard might want to buy the second standard. So there is that benefit.

So, how can this impact your business? Well, it’s really up to you. STS is not a silver bullet, but it is enabling technology. It is going to help you update your workflow, modernize your workflow, it’s going to help you create new products, and it’s going to create a common new platform for vendors and users of your standards.

It is going to significantly increase the interoperability between your standards and your customers’ internal workflows, and your cooperating agencies and other people who produce and use standards.

But, at the end of the day, STS does not drive your business decisions. It is up to you how you want to implement it, and you will see some case studies later on standards bodies who have implemented STS, and their experiences and the benefits they have gained.

But, at the end of the day, it is a business decision you need to make on how to update your workflow, what new products you need to produce, and what your customers are demanding from you.

Typefi and Inera stand here ready to help you with making that decision.

BRUCE: So if you’re intrigued about NISO STS, where can you go to learn more about it? Well there are a variety of really really good sources.

First, the official standard has been published on the NISO site at a quite simple URL: www.niso.org/standards/z39.102-2017.

In addition, the STS Working Group has a public area on the NISO site, at workrooms/sts, and all of the minutes of the STS Working Group meetings have been made publicly available, so anyone who’s interested can follow the progress of the working group.

Finally, and actually most importantly, all the non-normative materials, including the DTDs, Schemas, RELAX NG models, and wonderful documentation, are available at www.niso-sts.org. This site is a wealth of information.

In addition to all of this there is best-practice information, and there are sample documents that show how to take advantage of STS, including a sample adoption. There’s actually even a copy of STS in STS XML on that site.

So I recommend that you check out all of these resources, and at the niso-sts.org site you’ll also find links to a couple of full-day symposia that NISO has hosted about NISO STS.

So one of the great things about this is yes, it’s a brand new standard, but because it derives from work that’s already been publicly used in the standards world since 2010, and is derived from a pre-existing standard, JATS, that goes back to 2002, all of this work is mature. There’s a tremendous amount of information and best practice information about how to use this standard already.

So, have at it, and go use the standard.

CHANDI: Thank you for taking the time to watch this webinar. You can see the rest of the webinar series at the URL below.

I will be presenting some of the webinars, Bruce will be presenting some of them, and some of our consultants will be presenting the other webinars. This will help you understand the STS world.

If you have any feedback, please send it to us via the URL below. Thank you.

Standardizing Standards 2: XML Workflow Choices

Webinar Transcript

BRUCE ROSENBLUM: In this session, we’re going to be talking about workflow choices and when and where in your workflow you can introduce XML.

My name is Bruce Rosenblum. I’m the CEO of Inera Incorporated and Co-Chair of the NISO STS Working Group.

Transformative technologies such as the iPhone, the Kindle and the iPad have changed the way that we interact with content.

Instead of just working with paper, or a PDF on a large screen, we’re now reading our content on many many different devices and all kinds of different screen sizes.

These transformative technologies require new product features to take advantage of the capabilities of the technology. Responsive design so that, for example, we can read text efficiently on a small screen such as a smartphone; automatic reflowable text; richly hyperlinked content; content that’s dynamically updated; and we can add in now accessibility for the visually impaired.

Reading a static PDF is just no longer good enough when we have all these new technological capabilities. Users expect a much more dynamic experience from our information.

Let me show you a brief example of what I mean by this. This is the ISO Online Browsing Platform, where ISO now hosts all of their standards, and they’ve created a completely new user experience.

If you go to iso.org/obp, which is a freely available website, you can try this out for yourself. In this example, we’re searching for standards about pasta. And very quickly after the search we discover that there are standards that have the word ‘pasta’ in them, including one that has the word ‘pasta’ in the title, and there are thirteen additional standards that mention the word ‘pasta’.

If we click through on that first standard you can see that we now have a table of contents for the standard, and we can see the foreword—all of this freely available.

Within that table of contents, though, you’ll see that some items are black—the scope, normative references, and terms and definitions—but everything after that is in grey.

This is actually by design, because what ISO is doing is giving you more information for free than they could previously do—the information that will help you make an informed buying decision. But the heart of the standard, they’re keeping until you actually pay them for it. So, when we get to the end of Section 3, everything else isn’t available until we get to the bibliography.

This is a wonderful freemium model, and a perfect example of the kind of thing that you can do with high-quality XML, because with that XML you can choose to expose more information without exposing all of it. Imagine trying to do this with a PDF.
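The freemium idea is easy to express once the content is in structured XML. As a rough sketch, assuming a JATS-style `<body>` containing `<sec>` elements (this is an illustration of the idea, not ISO’s actual implementation), you might keep only the first few body sections for the free preview:

```python
import xml.etree.ElementTree as ET

def freemium_preview(xml_text, free_sections=3):
    """Keep the first few top-level body sections and drop the rest,
    producing a free preview while the full text stays behind the
    paywall. A sketch of the idea, not ISO's actual code."""
    root = ET.fromstring(xml_text)
    body = root.find("body")
    for sec in body.findall("sec")[free_sections:]:
        body.remove(sec)
    return ET.tostring(root, encoding="unicode")

# A hypothetical five-section standard:
sample = "<standard><body>" + "".join(
    f'<sec id="sec_{i}"><title>Section {i}</title></sec>' for i in range(1, 6)
) + "</body></standard>"

print(freemium_preview(sample))  # sections 4 and 5 are dropped
```

The same source XML, filtered differently, drives the paid full-text view, which is exactly the kind of reuse a PDF-only workflow cannot offer.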

Within the standard itself, because we’re in an HTML view, we can have richly hyperlinked content, so each one of these red circles is actually a clickable link, both within the standard for things like 6.11 or Annex A, and also external. So, we can link from this standard, ISO 7304, to other standards such as ISO 24333 with just a simple click of the mouse.

So, you can see this has now opened a new tab in their search interface so that you can see this other ISO standard. This is an incredibly effective use of XML. You can’t get this in a workflow where you’re only generating PDFs.

So, the foundation of all of this—not just the online browsing platform, but premium content distribution, responsive design, automatic reflowable text, rich hyperlinks, dynamic updating, and accessibility—the foundation for all of this is XML.

How do you get to XML? That requires thoughtful choices.

XML doesn’t just happen. It’s not something that you can wave your magic wand and the next day you’ll have it. What XML does require is re-engineering your publication workflow, new software tools to go with that re-engineered workflow, and additional production training. So, to get to XML it requires deliberate and thoughtful choices.

But the good news is that there’s already a lot of history of how to do this well, and how to do it efficiently, so that you don’t have to invent a whole lot of new wheels in order to make this happen for your organization.

If you want to add XML into your workflow, there are four key steps where you can add it.

The first is at the authoring or drafting stage of a document.

The second is after the authoring and drafting is done, but before you do any copyediting.

The third is after you’ve done your editing of the material, but before you do your page layout.

And the last is post-publication, meaning you take your final PDF file and convert it to XML.

Each of these points has pros and cons. There is no perfect solution, but there are solutions that have more advantages than others.

So, let’s start with the original XML dream. In this model, committees would create XML documents natively, editors would edit those documents in XML, and then you’d have XML that you could use for what’s called “single-source publication”.

You could very easily take that same XML file, you could make a print or PDF version, an HTML version, you could make your e-books, you could do metadata feeds, and you could create derivative products.

All this, of course, requires you to work in an XML environment.

But the reality is that authors don’t work in an XML environment. Yes, some of you may say, “Well isn’t DOCX from Microsoft an XML format for Microsoft Word?” And the answer is, “Yes, under the hood it’s XML, but it’s XML about preserving the format and layout of the document, not about preserving the semantics and the structure of the document.”

And so most people are using Microsoft Word in a way that is focused only on format and not on structure, and that’s not good enough for the kind of XML that can drive something like the ISO Online Browsing Platform.
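To make that format-versus-structure distinction concrete, here is a simplified, illustrative comparison (the WordprocessingML is heavily abbreviated, and the clause text is invented) of how the same clause heading looks in DOCX, which records appearance, versus NISO STS, which records structure:

```xml
<!-- DOCX (WordprocessingML): a paragraph that merely *looks* like a heading -->
<w:p>
  <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
  <w:r><w:t>1 Scope</w:t></w:r>
</w:p>

<!-- NISO STS: the same heading captured as semantic structure -->
<sec id="sec_1">
  <label>1</label>
  <title>Scope</title>
  <p>This document specifies ...</p>
</sec>
```

In the DOCX form, software can only tell that some text is styled “Heading1”; in the STS form, it knows this is clause 1, titled “Scope”, with its own content, which is what makes tables of contents, clause-level linking, and partial previews possible.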

Most people today do their authoring in Microsoft Word. A few still use WordPerfect (there are those lingering souls who are having trouble giving it up). We do know of some standards that are created with FrameMaker, and finally some people are actually starting to move to Google Docs, because it’s an environment where they can easily work collaboratively.

But whichever of these tools you use, what you’re going to run into is an ‘author reality’. First of all, most authors don’t ‘think’ structure. They think about the information in which they are the world’s experts.

Furthermore, most authors don’t like production tasks. They just want to write up the text and maybe add a little formatting, but then getting it published? That’s not my problem.

What these authors are, are brilliant subject matter experts, often the smartest people in the world in their particular subject matter or their specialty. But that also makes them hard to train and support, and even harder to control. Because it turns out that the more brilliant the author, the less sophisticated they often are about how to use Microsoft Word, or, to put it a better way, the more creative they are about how to use Microsoft Word.

Let me give you a quick example. We saw an author one day who knew that he needed to put a minus sign in front of negative numbers, and knew that the hyphen wasn’t the same as a minus sign, but he couldn’t figure out how to insert a minus sign. So, he finally took an underscore character on his keyboard and made it superscript, and that was his minus sign.

Did it look right? Absolutely! Was it the correct thing from a semantic perspective? Absolutely not.
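The distinction that author was wrestling with is real and machine-checkable. A minimal Python sketch (the cleanup rule here is a simplified illustration, not how any particular product actually works):

```python
import re
import unicodedata

# The hyphen a keyboard produces and the true minus sign are different
# characters, even though they can look alike on the page.
hyphen, minus = "-", "\u2212"
print(unicodedata.name(hyphen))  # HYPHEN-MINUS
print(unicodedata.name(minus))   # MINUS SIGN

def fix_negative_numbers(text: str) -> str:
    """Oversimplified rule: treat a hyphen directly before a digit as a minus.

    A real editorial rule would be far more context-sensitive (ranges like
    '5-10' and dates must not be touched); this only illustrates the idea.
    """
    return re.sub(r"-(?=\d)", minus, text)

print(fix_negative_numbers("cooled to -40 degrees"))
```

The superscripted underscore in the anecdote would pass a visual proof but fail any check like this, which is exactly why semantic correctness has to be verified separately from appearance.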

So, these kinds of creative authors will, in fact, get in your way when it comes to trying to put them into an XML environment.

Some organizations such as ISO have actually tried to help the authors or the committees by creating Word macros. They’re really helpful, so long as they’re used properly. But they’re actually really hard to write, in part because smart authors will always try to outsmart idiot-proof macros.

So, what happens is you have an arms race where you try and make the macros more and more sophisticated, to make them easier and easier to use. And the more sophisticated you make them, the more complex they are, and we find that authors just find more and more ways to break them.

In addition, you’ve got to support multiple versions of Word—not just Word for Windows in five or six different versions, but also the Macintosh version of Microsoft Word.

And finally, macros—as active code—are getting harder and harder to install on users’ systems because of IT security requirements.

Some people have taken a different approach by trying to sidestep Microsoft Word altogether, and looked at online tools for authoring directly in XML. Is this the wave of the future? We’ve certainly seen some progress in tools that provide an HTML-like experience that have XML under the hood.

But what we’ve also seen in those attempts is that authors can break the structure in those environments just as easily as they can in Microsoft Word.

Also, these tools require continual online access—getting them to work equally well online and offline is a real challenge. And the math editors, for those standards that have any kind of display math in them, the math editors are still somewhat immature.

But finally, we come back to the same problem we had with Word: the tool may give you the words and some formatting, but is the result structurally correct? The same thing can happen in online authoring. It might give you XML, but that XML may not be structurally correct, no matter how much guidance you give the authors.

So, at least for now, we think that online XML authoring will continue to have some fairly large challenges.

Let me move to the back end, and look at the concept of post-publication workflows for XML. In this, you actually keep pretty much the same workflow you have today.

The committee submits a draft Word document, it’s edited in Word, it’s typeset—most standards organizations are doing their typesetting in Word; a few might use InDesign or FrameMaker because you can get a better-looking result. But a lot of organizations have stayed in Word because ultimately, once the standard is published, they have to give a Word file back to the committee to work on the next revision, with the exact same text as the published version.

If you keep everything in Word and just make a PDF from Word, it becomes easy to give the file back to the committee, though it doesn’t look as good, because Word is not the kind of page layout program that InDesign or FrameMaker is.

You then—whichever model you use—proof the pages and typeset your corrections, publish the print version if you’re still doing print, and the PDF, and then you can make XML from that PDF file or sometimes from the Word file itself.

If you used InDesign or FrameMaker, at the end you have to convert the final typeset file back to Word, which is a fairly daunting prospect with both FrameMaker and InDesign—neither of them has good export capabilities back to Word. And finally, you return the Word file to the committee.

What works in this workflow? Well the biggest advantage is you have no workflow changes. But what are the disadvantages? There are many.

The biggest and first one is that the quality of the XML is unchecked. What do I mean by that? You may say, “I’m checking my XML quality.”

Well, if you are making the XML from your PDF, if you don’t sit there and check character-for-character—every single character in that XML—against the PDF, then your XML is unchecked, meaning you have no guarantee that it absolutely matches the PDF. That can be a huge liability.

Second, this workflow adds extra production time and cost, because it’s something you’re bolting on after the fact.

Third, errors can be discovered in creating the XML. For example, if you have a cross-reference to Section 5.9, but only at the point that you’re creating the XML do you discover that you only have Sections 5.1 through 5.8, you now have a content error and it may be too late to fix it.

XML production is actually great at catching those kinds of errors that might have been overlooked, even by the best of copy editors.
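That Section 5.9 example is easy to picture as code. Here is a minimal sketch of that kind of check, using illustrative STS-style element names (`<sec>`, `<xref>`); the fragment and ids are invented for the example:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: sections 5.1 and 5.8 exist, but a paragraph
# cross-references 5.9, which does not.
doc = ET.fromstring("""
<standard>
  <sec id="sec_5.1"><title>Scope of test</title></sec>
  <sec id="sec_5.8"><title>Reporting</title></sec>
  <p>Repeat the procedure in <xref rid="sec_5.9">5.9</xref>.</p>
</standard>
""")

# Collect every declared section id, then list every cross-reference
# target that points at nothing.
section_ids = {sec.get("id") for sec in doc.iter("sec")}
dangling = [x.get("rid") for x in doc.iter("xref") if x.get("rid") not in section_ids]
print(dangling)  # ['sec_5.9']
```

The point is that a check like this runs the moment the XML is built; in a post-publication workflow it runs only after the error is already in print.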

Ultimately, it’s not an integrated workflow, it’s almost essential to outsource, and what you’re left with is the choice of poor-quality typesetting from Microsoft Word, or doing some sort of a post-composition conversion back to Microsoft Word.

We tend to think of this workflow as putting a dinosaur behind the wheel of a car. Because what you’re not doing is looking at the best of what modern technology can do to solve a new technological problem.

The other point we need to make is that in this kind of a workflow it’s almost essential to outsource the XML conversion to a vendor. But vendors, like employees, need to be managed. If you don’t tell them exactly what you want them to do, and then check their work, the result may differ from your expectations.

We’ve seen many many cases over the years where firms outsourced an XML conversion to a vendor, and then what came back was never actually further reviewed by the publisher that outsourced the work.

And then they eventually look at the work four or five years later or, in some cases, they were doing it to futureproof themselves, to protect themselves for the future, and they discover when they finally open it up that all of the work is unusable, because they didn’t provide enough guidance upfront and they didn’t do any quality checking.

And we’ve unfortunately seen that story happen multiple times through the years.

So how do you manage these vendors for a successful project if you want to use this workflow?

First, you need to develop XML markup standards. You can’t just assume that the vendor knows what’s best for your content.

Second, you need to test several vendors: give them all the same document to tag, compare the results that come back, and select your final vendor based on quality, not on cost.

You then need to develop quality assurance tools and provide those to your vendors, but you also need to constantly recheck the quality of what the vendors are doing for you.

Finally, for those of you who have a back-content conversion for which you’d like to use an outside vendor, I strongly recommend reading this paper, ‘Beware of the laughing horse’, which is available at this URL.

It was presented last spring at the annual JATS-Con meeting, and it’s an excellent story about how to make a successful offshore conversion project.

The third of these four workflows is an XML-first workflow. In this, the committee drafts the standard in Microsoft Word, they convert the standard immediately on entry to the editorial workflow to XML, and then they do all of the editing in XML, typeset from that XML, and keep everything in XML to the very end.

And only once the standard has been published do you convert that XML back to Microsoft Word and give it back to the committee for the next revision.

What are the advantages of this workflow? Well the biggest is that the file is continually validated against what’s called the DTD, or Document Type Definition, in this case the NISO STS DTD.

What are the disadvantages? First, it requires XML editing software for all editors. Training for that can be expensive, as all your editors now have to learn to work in XML. Freelance editors may not be practical, because you have to provide them the same XML software and, in some countries, providing software to a freelance copy editor can actually jeopardize their freelance status.

Editors have to work amidst the XML tags or, in some cases, you can customize the editing environment to minimize how much the editors are seeing of the tags, but that customization can be very expensive.

And finally, you have to do an extra conversion at the end to convert the XML back to Microsoft Word.

The fourth and last of these workflows is what we call the XML-middle workflow. In this workflow, the committee submits the standard as a Word document, it’s cleaned up in Microsoft Word and paragraph styling is applied if the committee hasn’t used a template that you may have provided to them. Then you edit in Microsoft Word, and you only convert the Word document to XML just before you’re going to do typesetting.

Here’s the beautiful part—you now typeset from the XML. So instead of creating a PDF from Word, you create the PDF from that XML. Then you proof that PDF that was created from the XML.

If the PDF is right, you know that the XML was right because the PDF has been created automatically from that XML.

If you need to make corrections you do them in Word, you regenerate new XML, and finally once you’ve proofed the PDF and blessed it, you can create your EPUB and your other derivative formats.

But ultimately, because you have a workflow that goes from Word to XML to PDF, and you know that the PDF is right because it was made from XML which is made from Word, you have a Word file you can give back to the committee and you know it has exactly the right content. So, you don’t have to worry about converting anything back to Microsoft Word at the end.

What are the advantages and disadvantages of this workflow?

In this workflow, you keep the editors working in Microsoft Word, which is an environment that virtually every editor knows and is comfortable with. You lower your training costs because you’re not introducing a lot of XML technology to most of the editors, and freelance editors are practical because they can still work on the document in Microsoft Word.

Structure is enforced prior to the final pages—what I mean by that is you’re creating XML before the content is final, so you’ll catch something like that error where you’re trying to cross-reference Section 5.9 and it doesn’t exist.

And ultimately the final content is ready in Word format for the next update by the committee.

What’s the disadvantage of this workflow? It typically requires some sort of an application in-house to create the XML.

So those are your four workflow choices. Authoring in XML, post-publication conversion, conversion to XML as soon as the document arrives in the editorial process, or conversion just in time for typesetting—the XML-middle workflow.

As I said, all of these have their advantages and disadvantages, and hopefully the pros and cons of each are quite clear now.

I want to talk for a few minutes before closing about XML quality.

XML doesn’t come for free. As you’ve seen with all of these workflows, you have to have some sort of investment—whether it’s in software, whether it’s in an outsourced vendor to do your conversion. So, the XML isn’t free, but the quality doesn’t come for free either.

With a PDF-only output for your workflow, it’s simple. You create the PDF, you proof it, and publish it. XML is a little more complicated, because you create the XML, you proof it, and then you publish from it; or in this case, what you do is publish the PDF from the XML and you proof that.

The XML-first and XML-middle workflows definitely facilitate XML quality, and any workflow where you have the PDF created from the XML is a much more robust workflow.

This really is a huge liability with the post-publication conversion to XML, in that you just have no guarantee that that XML matches the final published content in the PDF.

But you may want to up the game a little bit more, and take a look at what we call XML quality plus. Because the content that’s in between the XML tags is important, but what’s often even more important is the metadata about the document, which may not be visible in the document but may be part of the XML for that document.

So, doing checks on this kind of information requires more quality checks. There are a variety of methods, and I won’t discuss them in detail, but the two most common are false-color proofing and Schematron. In false-color proofing, you create a sort of HTML view that puts different kinds of text in different colors, so the tagging stands out and you can make sure that it is correct in each of those cases.
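A toy sketch of the false-color idea, in Python with only the standard library (the tag-to-color map and element names are hypothetical examples, not any product’s actual proofing sheet):

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-to-color map; real proofing sheets cover far more tags.
COLOURS = {"std-ref": "red", "xref": "blue"}

def false_colour(xml_text: str) -> str:
    """Render an XML fragment as HTML, wrapping mapped tags in colored spans."""
    root = ET.fromstring(xml_text)
    parts = []

    def walk(el):
        colour = COLOURS.get(el.tag)
        if colour:
            parts.append(f'<span style="color:{colour}">')
        parts.append(el.text or "")
        for child in el:
            walk(child)
            parts.append(child.tail or "")
        if colour:
            parts.append("</span>")

    walk(root)
    return "".join(parts)

html = false_colour('<p>See <std-ref>ISO 9001</std-ref> and <xref rid="s1">Clause 1</xref>.</p>')
print(html)
```

A proofreader scanning such a view spots at a glance when, say, a standard designation was never tagged, because it fails to light up in the expected color.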

The other is a technology called Schematron, a rule-based schema language in which you write assertions that proof your XML.
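For illustration, a minimal Schematron pattern might look like this (the element names follow NISO STS conventions, but the rule itself is a hypothetical example, not taken from any published rule set):

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="xref">
      <assert test="@rid = //sec/@id">
        Every cross-reference must point to an existing section id.
      </assert>
    </rule>
  </pattern>
</schema>
```

A Schematron processor evaluates each assertion against every matching element and reports the ones that fail, which makes it well suited to exactly the metadata and consistency checks described here.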

Ultimately though, you should run your quality tools on every single XML file. And if you do use an outside vendor for any of your XML work, you need to make sure that you provide your quality tools to those vendors, that you require your vendors to run those tools, and then, finally, that you rerun the tools when they submit the XML to you, following the old Russian proverb: “Trust, but verify.”

You can assume they’re going to do a good job, but it’s best to verify they actually have.

So, at this point, if you’re wondering where to go next, we have a few recommendations.

If you’re going to move ahead with an XML project, the first is to evaluate and set business goals. Make sure you understand, from a business perspective, why you want to bring XML into your organization, and what the business value is that you’re going to derive from it.

We certainly believe that there are many business values and you can build a strong business case for bringing XML in, but each organization is different, and so we recommend that you build your own business case.

Once you’ve built that business case, then you can start driving technology decisions, but never never let technology decisions drive your business requirements. It should always come from your business requirements.

Second, learn more about XML. There are a number of places you can learn about XML, and more about the STS standard for markup of standards. Mulberry Technologies in Rockville, Maryland, has all kinds of really great courses where you can learn more about XML, and they were part of the authoring group for the STS standard so they know STS inside out. If you’re going to get outside help to learn about XML, there’s absolutely no better place to do it.

Also, there are two key sites to learn more about STS itself. First is at the NISO site, where there’s a workroom for the STS project, and you can go there and see information about the standard, including all of the committee minutes.

But, more importantly, the NISO STS documentation is at niso-sts.org, and that’s where—if you’re implementing an STS project—you can learn about every last bit of detail that you need to know on this standard, including common markup practices and best practices.

Third, talk to XML-savvy standards publishers. Lots of our publishers have now been down this road, and they can give you a lot of insight into what their experiences were, what their challenges were, and they can help you out. Most of the ones that we know are more than happy to share information with you so you don’t learn the same lessons they did the hard way.

Finally, if you don’t have an XML expert in-house, we recommend that you either hire one or hire a consultant who can help guide you through the process. There’s a lot of history already on how to do this well. Don’t try to reinvent the wheel without getting outside advice.

So, to get an XML project started once you’ve gone through those steps, select which XML workflow you’re going to use, and do this based on your business goals. Develop and document what your XML markup standards will be, and again, for this you’ll either need an in-house XML expert, or you’ll want to talk to an outside consultant.

Build some XML quality assurance tools that will work for your environment, and you may be able to use some tools that are starting to show up, including some that are available for free from ISO.

Start a pilot project. Don’t just step back and say, “We’re going to do the whole thing and, on Tuesday, we’re going to switch to doing everything in XML.” Start with a small pilot project, evaluate the results, fine-tune, re-evaluate and fine-tune.

We typically recommend that you allow at least a few months for a series of fine-tuning steps on a pilot project, to make sure that you’ve got the XML exactly right for your organization’s requirements.

And finally, when you do have everything fine-tuned, then you can start your XML workflow. But you should always review that workflow once or twice a year so that you can continue to refine and improve, especially as you want to introduce new products based on that XML.

So, standards publishers! The good news is that we now can have standard XML. And what this will do is bring new production efficiencies, it’ll help you bring new products to your customers, it will bring you new business opportunities, and the solutions for all of this are improving daily.

So now is the time for you to move to XML.

If you have any questions please feel free to contact us at Inera Incorporated, and our information is on the slide. Thank you very much for listening.

Standardizing Standards 3: Getting Your Content from Word to STS XML

Transcript

BRUCE ROSENBLUM: In this session we’re going to talk about getting your content from Microsoft Word to NISO STS XML.

My name is Bruce Rosenblum, I’m the CEO of Inera Incorporated and Co-Chair of the NISO STS Working Group.

Inera Incorporated has been around since 1992. Our focus is developing editorial and XML systems that automate the process of getting from Microsoft Word to XML.

We have a combination of expertise in software, technology, publishing, and workflow, and we apply that to help publishers develop single-source workflows to publish their content, usually using XML as a key part of that single-source workflow.

Ultimately what we’re trying to do is help our customers gain efficiencies through the publishing process, and leverage industry standards such as JATS, BITS, STS, and MathML.

eXtyles is a suite of editorial and XML tools for Microsoft Word. It facilitates not just converting from Word to XML, but a lot of the editorial preparation that you have to do in your document whether or not you’re creating XML.

eXtyles helps to clean up and mould Word documents into a standard visual and editorial style. It helps to automatically edit references to external content, integrate metadata into the file, check and link normative references and their designations to other databases, correct content in the document by looking up information in databases, validate URLs, and check internal cross-references. And finally, after all of that has been done as an aid to the editor, all of that work also helps to convert the document from Word to valid XML according to NISO STS and other schemas.

And of course, all of this can be completely customized for any publisher requirements. So if you have unique requirements, eXtyles is an open framework that can be customized to meet those needs.

We have customers in 26 different countries on six different continents. This is a small sampling; you can see that we work with quite a number of standards organizations, including organizations like ISO and IEEE.

We also work with very high-profile publishers of content in other realms such as journal publishing, where we work with many of the top journals around the world, and also governments such as the US Government Printing Office.

Fundamentally we have three key design principles behind eXtyles.

The single most important one is that the content authors are subject matter experts, and that’s all. They aren’t editorial experts, they aren’t production experts, they aren’t going to do your production and publication work for you.

They know Microsoft Word, they’re used to working in Word—actually they’re often used to working badly in Word, so often the documents you get in from them are not exactly the best-constructed documents in Microsoft Word—but they’re not going to do the editorial and production work for you.

So, you’re going to have to do that yourself and, most importantly, they’re not going to give you XML.

But interestingly, the next one of our key design principles is that most editors actually prefer working in Microsoft Word over specialized XML tools. Again, just like the authors, they know Word, they’re used to it, and they can focus on the content rather than focusing on the underlying technology that you’ll be using in the publication process.

And our final key design principle is that what we’re trying to do is automate any repetitive process in the editorial and production stage, but to do so safely. Anything that risks making an incorrect change in the content is well beyond the scope of what we’re trying to do.

But what we are trying to do is help the editors get through a lot of the technical editing of the content so they can focus on making sure that the content is readable and correct. And ultimately, that’s what’s most important to the consumers of your content.

What our proven approach has shown is that it increases productivity, improves quality of the published output, and lowers overall costs. Many of our customers have seen their time to publication drop by more than half once they’ve implemented an XML workflow with eXtyles.

So now let’s move onto the fun part, which is a demonstration of eXtyles.

We have a Microsoft Word document that was published by CEN a couple of years ago. CEN is the European Committee for Standardization, the standards organization of the European Union.

This one is about cereals and, like most standards documents, it has a foreword, scope, normative references, terms and definitions. Then we get into the body of the standard, and then in the back it has a bibliography, it has four annexes, A through D, so it’s a very typical standards document.

When eXtyles is installed in the environment, you have an extra ribbon within Microsoft Word that gives you the eXtyles options. And initially all of them are greyed out, because the main thing we want you to focus on first is the Activate and Normalization step. This is the first step in preparing the document.

And for CEN what we’ve actually done is automated much of their metadata collection. So what I can do is actually fill in, in this dialogue, the work item number for this document, and then I can click the Get Metadata button and what this is actually doing is looking up this metadata on a server located in Brussels at CEN and automatically adding it to the document.

So I click OK, and this information is now being stored with the document, and in fact there’s additional information that we didn’t even show onscreen. But this avoids rekeying of information—this avoids errors that can come with rekeying information—and provides a much easier mechanism for updating much of the front matter metadata in the document.

Once this is done, we can then move on to the next step of preparing the document, which is document cleanup.

Oh, and you’ll notice even before I move on, that all of a sudden the font has changed in this document because CEN changed styles a few years ago from using Arial for their documents to using Cambria, and automatically eXtyles has loaded and applied the new Cambria-based template.

So this is just a small example of how eXtyles can automate much of the formatting of your documents.

The next step is a Cleanup step, where we have a large variety of options, including white space cleanup. We can control whether or not we’re removing white space from sections of the document that might have computer code—clearly we wouldn’t want to do that there.

And we can remove some of Word’s typographic controls because InDesign has much better typographic controls. We can do a whole bunch of other cleanup operations that help prepare this document.

I’m just going to click OK with this group of settings, and this will take just a moment to run on the document. In addition to all of that, there’s one item that’s not a checkbox item, and that’s Recovery of Special Characters.

One of the biggest problems we’ve seen publishers have over the years, particularly if they move content from Word to InDesign, is loss of special characters because of oddball font issues.

eXtyles works to make sure that every character is automatically recognized and reinserted into the document with a Unicode value, while still visually appearing as the correct special character. So you’ll see that where we have any accented letters, such as these e-acutes, they are visually correct.

But more importantly they are structurally correct so that you’ll always have the correct special character when you go to XML or when you go to InDesign, or to any other environment.

Having done that Cleanup step, the next step is the single most critical one for preparing the document for XML.

If we change to Draft view in Microsoft Word, you can see that we have some styles here, but some paragraphs in fact aren’t styled correctly. And if we were to start with the foreword we could, if we wanted to, go to Microsoft’s Home ribbon and use Microsoft style controls to actually set up styles for this document.

So, we can go looking through here for a Foreword Title, and there’s my Foreword Title, but that’s actually a challenging interface to use.

And so in fact what we do instead is provide the end users with a palette organized by logical sections of the document, or logical clusterings of styles such as title styles, ten-point body styles, and so on. And each organization can have its own set of styles.

Each time you click a button it applies a style and automatically highlights the next paragraph. And so this palette makes it much much easier to accurately and quickly apply styles.

So here we have a Heading 1, here we have Body Text, another Body Text paragraph, four List Continue paragraphs, Body Text again, another Heading 1, more Body Text. So you can see the Body Text paragraphs have actually gone from Normal to Body Text. This paragraph is actually a normative reference, and so we have a special style for that.

And what we’re doing by adding these styles is providing all of the critical structural information that we ultimately need to create the XML.

Here we have terms and definitions, and within those we have notes. By the way, there are hotkeys for all of these styles, so by pressing N here I also get a Note style applied.

I’ve actually pre-styled the rest of this document but you can see that this is a very easy and intuitive process, and in fact many of your organizations may have already been doing this kind of styling to prepare a document in Microsoft Word to make PDF anyhow.

So this is actually a natural extension of it, but it’s better because the styles are organized in a more logical fashion, number one. And number two, you may have occasionally through the years seen a case where you had a bit of italic or bold in the middle of a paragraph and applying a style with Microsoft Word tools actually obliterated that font change.

That’s actually a bug in Microsoft Word; Microsoft refused to fix that bug. But using the eXtyles palette you will never experience that bug because we’ve managed to program around it.

So this provides you a much better model for templating a document.

As I said, the rest of the document I’ve pre-styled so you don’t need to watch me do the entire document, but you can imagine that most documents would actually go through this styling process fairly quickly.

Now, once we have those styles we actually have the foundation to create XML, but there’s much much more that eXtyles can do before we create the XML.

The first thing, as we move from left to right across our menu or our ribbon, is we have a feature called Auto-Redact. And the idea behind Auto-Redact is that it’s a large Find and Replace with thousands of rules pre-programmed to your editorial style.

It goes through the document without replacing the copy editor, because copyediting is a human task which can’t be replaced by a computer even with artificial intelligence, but it does a lot of technical cleanup on the document.

I won’t dive into all of these rules, but I’ll just click OK and let this run and describe, for example, that we have rules that can convert American English spellings to British English if that’s your style. Or the other way around, if you’re working with an international working group but you use standardized spellings.

We have rules that can clean up callouts to figures and tables. We have rules that can clean up units of measure so, for example, if your standard is to always abbreviate ‘Hour’ as ‘H’ we can automatically go through and do that.

But here’s the cool thing. We can do it in a context-sensitive fashion, so we would only do it if there’s a number preceding the unit of measure. So the expression ‘the experiment took 3 hours’ or ‘the test will take 3 hours’ can be converted to ‘the test will take 3H’. But a sentence that has ‘the test should take several hours’ will not be converted because that would be a nonsensical change.

So eXtyles can be very very context-sensitive in these rules. It can also be sensitive to specific paragraph styles when applying these rules. So it’s almost like having regular expressions that you can apply to the document.
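The kind of context-sensitive rule described here can be sketched with ordinary regular expressions. The rules below are simplified illustrations of the technique, not eXtyles’ actual rules:

```python
import re

RULES = [
    # Abbreviate 'hours' only when a number precedes it (context-sensitive):
    # 'the test will take 3 hours' changes, 'several hours' does not.
    (re.compile(r"(\d+)\s+hours?\b"), r"\1H"),
    # Continental style: a non-breaking space between a number and a percent sign.
    (re.compile(r"(\d)\s*%"), "\\1\u00a0%"),
]

def auto_redact(text: str) -> str:
    """Apply each find-and-replace rule in order, like a rule-driven cleanup pass."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(auto_redact("the test will take 3 hours"))          # the test will take 3H
print(auto_redact("the test should take several hours"))  # unchanged
print(auto_redact("a tolerance of 5 %"))
```

A production rule set would also be conditioned on paragraph style, as the transcript notes, so that, for example, code samples and quoted material are left alone.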

Now for those of you who worry about what’s going on with automated changes, we actually make a backup copy of the document immediately before we run Auto-Redact, and then we can use Word’s Compare feature and we can see exactly what’s changed in the document.

So here, for example, you can see a non-breaking space has been added in the middle of the designation. And in fact you’ll see that throughout this document, that wherever we have a designation, a non-breaking space has been added just to make sure that you don’t get a bad break across lines.

We’ve actually added non-breaking spaces in lots of places. Per European style or Continental style, non-breaking spaces have been added between numbers and percent signs wherever those appear.

Here’s more of an editorial type of change: ‘Fig’, abbreviated, has been changed to ‘Figure’, spelled out.

This is an interesting one that CEN asked us to make. Any time they have words like ‘recommend’, ‘must’, ‘shall’, or ‘may’, those are all highlighted in yellow, so that the editor’s attention is drawn to those expressions and they can make sure the words are used correctly.

That’s absolutely critical with standards, that it’s clear what’s a recommendation versus what’s a requirement, and by doing this automatically throughout the document we guarantee that the copy editor will actually see all of those cases.

We have lots more cases of non-breaking spaces being added; that seems to be a lot of the work that’s been done here. We have spaces added around equals signs, so we’re cleaning up the content around mathematical operators.

And that looks like it’s about it, so not a huge number of changes, but you can see a lot of bulk cleanup being done. And if you can imagine with all of these non-breaking spaces having been put in, it automatically saves time during the typesetting phase of the operation because you don’t have to go in and manually start putting in non-breaking spaces everywhere you need them in order to avoid having bad line breaks.

Once we’ve reviewed the results from Auto-Redact, we can actually turn our attention to some other parts of the document, for example, the normative references and the bibliography.

What we have in eXtyles is actually an open hook for advanced processes, and you can do all kinds of really cool things with the editorial content.

The first one is Advanced Processing for Bibliographic and Normative References, and what this is going to do is go through and identify the normative references and bibliographic entries primarily by the paragraph style that was applied.

And then it’s going to both do some content cleanup—although in this case this reference was already formatted correctly—and also indicate what type the reference is, and add these character styles which are later used for creating XML that can be used for linking through to external sources.

So you can see in the bibliography as well that we’ve not only taken care of marking up and cleaning up these references to other standards, but also we’ve handled this reference to a book completely correctly, even recognizing the organizational author in it.

Once we’ve gone through and dealt with the normative references and the bibliography, the next thing we can do is check all of the in-text citations.

So, for example, FprEN, that’s an in-text reference to another standard, and we actually will go through and mark up all of those so that we know where we need to potentially have external links when this content might be posted online.

In addition we do some cross-checks of all of these items, and we’ll warn using Word comments if there are problems.

The first cross-check we do is to make sure that every item in the normative reference list is, in fact, cited at least once from within the body of the document. This standard passed that check, because we didn’t receive a warning.

However, we did receive a different kind of warning. Let me just quickly close the Word Style palette so it’s easier to read the warning. And what we have here is that inline citation checking or matching detected that we have a reference to an object within a standard, but it’s an undated reference.

This is the kind of thing an editor might overlook but it’s actually quite important. If you’re referring to an object within a standard it should always be a dated reference so, for example, ISO 5223:2008 or whatever the year is. Because what Table 4 is may change over time if a new table is inserted or a table is deleted, and you want to make sure you’re referring to the right Table 4.
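
That kind of check can be sketched as a pattern that flags a reference to an object within a standard when the designation has no year (illustrative Python with a deliberately small set of prefixes, not eXtyles’ actual matching logic):

```python
import re

# Flag 'Table 4 of ISO 5223' (undated) but not 'Table 4 of ISO 5223:2008'.
PATTERN = re.compile(
    r'(?:Table|Figure|Clause|Annex)\s+\S+\s+(?:of|in)\s+'
    r'(?:ISO|IEC|EN)\s+\d+\b(?!:\d{4})'
)

def undated_object_refs(text):
    return [m.group(0) for m in PATTERN.finditer(text)]

print(undated_object_refs('see Table 4 of ISO 5223 for details'))
# ['Table 4 of ISO 5223']
```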

So these are the kinds of warnings that eXtyles can give during the editorial process that actually help make sure that the content is as accurate as possible.

We can actually go a step further with all of this: for all of the ISO and CEN standards, we can now take these references and check them against databases located at ISO and CEN to make sure that these are, in fact, valid and current references to those standards.

So we’re actually going through right now and taking every reference to an external standard in this document, and doing a web service query against an ISO server in Geneva and a CEN server in Brussels to make sure that, in fact, these are correct and valid references to standards.

And it’s done, it says it’s checked thirteen standards, and it’s found problems with five. Let’s see, we have on this first one an invalid reference. If you look at the date this is 2013, and the Fpr means that this is a final proof, not the final standard, so in fact this should be modified, probably to be EN 16378. And the same right here.

So immediately we’ve been able to detect some problems with this and, of course, I could double check against the database; I do happen to know that that is final. And then we can delete the comment very quickly and easily once we’ve resolved the problem.

But again we have other kinds of problems. This is a reference to a withdrawn standard. In this case there’s a newer version, the 2011 version, and it’s warning us ‘gee, maybe the newer version should be cited’. So this is a case where as an editor you might actually want to go back to the working group and make sure that they specifically meant to cite the older version rather than the newer version.
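
In outline, that registry check works along these lines. This is only a sketch: the record format, the lookup function, and the designations are stand-ins, not the actual ISO or CEN web-service interfaces.

```python
def check_references(designations, lookup):
    """Report problems with standard designations.

    `lookup` returns a record such as {'status': 'withdrawn'} for a known
    designation, or None for an unknown one.
    """
    problems = {}
    for ref in designations:
        record = lookup(ref)
        if record is None:
            problems[ref] = 'invalid reference'
        elif record.get('status') == 'withdrawn':
            problems[ref] = 'withdrawn; a newer edition may exist'
    return problems

# A stub registry standing in for the web-service call:
registry = {'EN 16378:2013': {'status': 'current'},
            'ISO 5223:1995': {'status': 'withdrawn'}}
print(check_references(['EN 16378:2013', 'ISO 5223:1995', 'FprEN 16378:2013'],
                       registry.get))
```

In production, `lookup` would be the web-service query Bruce describes, and each problem would become a Word comment for the editor.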

So again, this is a great way that you can work with your working group with these automated tools to make sure that you’ve got the most current version of the citation available.

Another example here is that the reference came back with a different title. The year was missing in the original title, which we preserve in the comment, and the 2009 was added here, so this is a way in which the published document that’s in front of us is being made more accurate before it’s published.

And again a warning on this last one that we saw earlier in-text, that this is a reference to an out-of-date standard.

So this is where adding this kind of automation can really help with the process, to save editorial time and make sure your content’s as accurate as possible.

We do have a feature for going through the document and validating that all of the URLs are pointing to current and valid websites, that you’re not going to get any 404 errors. I’m not going to bother running it on this document because we don’t have any URLs.
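
A link checker along those lines can be built with the Python standard library alone; this sketch uses HEAD requests and treats any status of 400 or above, or any network error, as broken. It’s a simplification for illustration, not eXtyles’ actual implementation.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_url(url, timeout=10):
    """Return an HTTP status code, or an error string for network failures."""
    try:
        # HEAD avoids downloading the body; some servers require GET instead.
        resp = urlopen(Request(url, method='HEAD'), timeout=timeout)
        return resp.status
    except HTTPError as err:   # e.g. 404 Not Found
        return err.code
    except URLError as err:    # DNS failure, connection refused, ...
        return str(err.reason)

def is_broken(status):
    return isinstance(status, str) or status >= 400
```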

The last thing I’m going to do before I create the XML, is a Citation Matching check, where we check all of the internal cross-references. So, for example, if we have Tables 1, 2, 3, and 4, is each table cited at least once and does any citation to a table resolve correctly?

So if we have four tables, but there’s a citation for Table 5, that will give us a warning that there’s a problem that we have a citation for Table 5 but in fact we haven’t found the right matching point for it.
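
The matching logic amounts to a two-way set comparison, sketched here in Python (the real feature also handles the other object types mentioned, such as figures, sections, equations, and bibliographic references):

```python
def cross_check(targets, citations):
    """Compare citable objects in the document with the in-text citations."""
    targets, citations = set(targets), set(citations)
    return {'uncited': sorted(targets - citations),      # objects never cited
            'unresolved': sorted(citations - targets)}   # citations with no target

report = cross_check(
    targets={'Table 1', 'Table 2', 'Table 3', 'Table 4'},
    citations={'Table 1', 'Table 2', 'Table 3', 'Table 4', 'Table 5'})
print(report)   # {'uncited': [], 'unresolved': ['Table 5']}
```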

This cross-checks not only figures and tables but also bibliographic references, and sections and equations. And it has found a few problems so we’ll go looping through our comments again here.

And we find a warning: ‘No section matches the in-text citation 5.9. Please supply the missing section or delete the citation.’

So if we go backwards here we can see we’re in Section 7, but in fact Section 5 only had sections up through 5.8.

So we very quickly caught a problem again that, in this case, you would have to go back to the committee and say, “Gee, which section should this be pointing to, or do we need to revise the text?”

And you can see all of the other section citations have been correctly marked up, and this markup will in fact be used when creating the XML.

Let’s see what other warnings we have here. Annex B hasn’t been cited—that may be OK but again you might want to check with the working group. Figure B.1 hasn’t been cited.

By the way, you’ll notice that in CEN’s workflow we have just the names of the external figures here; we don’t in fact have the figures in the Word document, because in an XML workflow those figures ultimately have to be separate files so that they can be called in as images when you go to make PDF, when you go to the web, and so on.

But we also do support workflows where the images are embedded in Microsoft Word.

Again we have a warning that Figure D.1 hasn’t been cited, and D.2 hasn’t been cited, and D.3 and D.4. Again, these may be benign warnings, but better to have the warnings and be able to cross-check than to not have these automated warnings at all.

So having done all of the processes on the Advanced Processing menu, we’ve actually now helped the copy editor quite a bit, and at this point the document could be sent out for further copy editing, or at this point we’ve done enough that we can actually make XML.

So in order to make XML we come over to the Export menu, and we choose the CEN-specific XML export, because CEN has some specific metadata requirements that we’ve customized for.

And this’ll take just a few moments to convert the Word document over to XML using not just the paragraph styles that we added earlier, but also the character styles that have been added through the automatic processing.

And what’s really cool with all this, is that we haven’t had to go in and turn our copy editors into XML taggers. With the exception of applying paragraph styles, which you may already be doing in either Microsoft Word or InDesign to format your document today, we haven’t done anything more than that but we’ve added a lot of extra granularity.

And we have a message: congratulations, our file is valid according to the Document Type Definition, or DTD. That means the document conforms to the XML rules that you’ve laid out for it.
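
Python’s standard library can check the first half of that, well-formedness; validating against the NISO STS DTD itself would need a validating parser such as lxml’s `etree.DTD`. A minimal well-formedness check:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_string):
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False

print(is_well_formed('<standard><front/></standard>'))   # True
print(is_well_formed('<standard><front></standard>'))    # False: mismatched tag
```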

And what we have here in the XML is, first of all, all of the metadata. So we have the title in multiple languages, you can see that accented letters are represented as numeric Unicode entities. We could also do UTF-8, we can do ISO entities, so we have a lot of flexibility in how we set all of this up.
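
The difference between those encodings is easy to see in Python; the title fragment here is illustrative, not from the demo document:

```python
title = 'Qualité de l’eau'   # accented letter and a curly apostrophe

# UTF-8 carries the characters as multi-byte sequences:
print(title.encode('utf-8'))

# Numeric character references produce pure-ASCII XML, as in the demo:
print(title.encode('ascii', 'xmlcharrefreplace'))
# b'Qualit&#233; de l&#8217;eau'
```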

We have a lot more metadata that came, if you remember, from the Document Information dialog at the beginning of this session, where we loaded this metadata from the server located in Brussels.

All of this metadata came in through that, including the various metadata dates included in this document, the permission statement, and the entire title page. This actually was never even visible, but when we did that web services call, this title page information also came in.

And now we finally have information from the Word document itself. We have the foreword which, if I go back to the Word document, you’ll see the foreword here. And where we have the reference to the external standard, we’ve actually marked that up as a standard reference.

That’s really cool because then CEN can take this and make it a hyperlink when creating HTML or any other organization can.

And then we get into the body of the document, and we have sections. And this is where things also start to be really neat, because if you remember in the Word document, we don’t have anything special here in terms of the number 1 in this heading, or the bullets at the beginning of the list.

And yet, in the XML, we’re able to automatically make this a label and separate it from the title, which gives tremendous flexibility when formatting for online or for PDF, because these are in separate elements.

We preserve the italic markup from Word. Again with the lists, we’ve automatically separated out the bullet, as the label, from the rest of the paragraph, and you can see our non-breaking spaces in here.

And then where we have a cross-reference to Annex D—we’ve marked that up with an internal cross-reference, which means that when making PDF or making HTML, these can all be hyperlinks.

So what that means is now, instead of doing a lot of manual work to get internal cross-references as hyperlinks in your PDF, whether you’re setting up in Word or InDesign, all of this will happen automatically when you use Typefi to make your PDF.

Here’s our normative reference list, we have a special attribute on this, and if you were to look at the ISO Online Browsing Platform we’d see how they take advantage of these attributes.

We have the terms and definitions section marked up in TBX markup which is highly granular, highly structured, so that again you can do all kinds of flexible things in terms of pulling out the definitions, perhaps creating databases of them.

We also will cherry pick a few other things. We have the tables in the document. We preserve the column width information as well as the table width information, which can help with automatic layout.

More importantly, we preserve things such as column spans and row spans, so the entire structure of the table can be faithfully re-rendered online or in PDF, so no manual intervention is necessary.

You’ll also notice this scope attribute sitting in the table. This is for Section 508 accessibility. So by going to XML and using eXtyles to create that XML, you can start making it much easier to meet accessibility requirements, whether you’re concerned with Section 508 in the United States, for example, or with the Marrakesh Treaty, which requires accessible content in many countries around the world.

So we have tables, we have figures in the document where again we parsed out the label at the beginning of the figure from the rest of the title. We’ve actually kept any author queries such as ‘gee, this figure hasn’t been cited’ as processing instructions. Those could actually be put onto a PDF proof if you wanted to.

And finally we have MathML, where we have converted all of the equation objects into MathML, and that can faithfully be rendered as well.

So we have very rich XML that nobody has to then go and edit by hand. So what that means is you can keep your team working very effectively in Microsoft Word without having much, if any, knowledge of XML.

They can very easily produce this kind of high quality XML, very granular highly-structured XML, according to the NISO STS DTD, and that can then be flowed with Typefi into InDesign to make PDFs automatically, it can be flowed onto a web page, it can be used to make EPUB, and it truly becomes the core of a single-source workflow.

So I hope in these last few minutes, I’ve shown you how easy it can be to take the Word content that comes from your committees or your working groups, and bring it through a process that gets you not only to NISO STS XML, but in addition can actually help your editorial team by making things faster and smoother through the whole process.

And ultimately get you to the point where you can produce wonderful documents in multiple formats, and actually do it faster, more easily, more accurately, and at lower cost than when you were creating just a PDF.

Thank you for listening, and we hope you’ll join us for the rest of the webinars in this series.

Standardizing Standards 4: Publishing Standards Using STS

Transcript

ERIC DAMITZ: Hello, and welcome to “Standardizing the Standards: Publishing Standards Using STS.”

My name’s Eric Damitz, I’m a Senior Solutions Consultant with Typefi. I’ve been with Typefi since 2014. Before that I worked in educational publishing for about 20 years, so my area of expertise is on the production end of both print and digital outputs for various publication types.

In the last webinar, you saw how to get your content into STS. So in this one we’re going to show you how to take that STS and produce different deliverable formats that you can give to your customers. And we’re going to do that using Typefi.

So what Typefi does is take various types of input files. We can take XML that’s created using eXtyles, we can take STS XML created in any authoring platform, or we can take Microsoft Word, or a variety of other authored documents.

We then convert that input format to what we call Content XML, which is an intermediary XML format that our system uses to produce all the different kinds of outputs that you might want.

We can take that CXML and send it to InDesign, and using InDesign we can produce InDesign files that can be published to EPUB or PDF or HTML.

We can also go directly from that Content XML to any other kind of XML you might need. We can go directly to EPUB, or directly to HTML. And we can also publish to the DAISY format.

So, using one source file, you can get as many different outputs as you would require.

What we’ll do now is look at what that source XML looks like. This is an STS XML file—it was created using eXtyles so the authors worked in Word and they used eXtyles to do all their editorial work, and then when they were finished with the file they exported it and we get this nice XML file.

We have all of our tags, we have all our metadata, and we have all of our content in here. We have links, and images, and math, and all kinds of stuff that you would expect to find in a file like this.

I can then take this XML file and publish it using Typefi.

So now we’ll move over to Typefi, and before I do any of this I’ll just let you know that I’m running this demo on a laptop computer, so I’m not using some super-computer or anything, this is just my laptop.

I’m using Typefi Desktop which is one of the ways that you can use Typefi; we also have server-based products that would run much more quickly than what you’re about to see on my laptop here.

This is the Typefi web interface, and I have this project set up. I have XSLT transforms, I have some InDesign templates, and then the content and images are also on the server here. And then I have this workflows folder.

I have two workflows set up, and these are completely customisable—you can set up whatever kinds of workflows you like. And a workflow’s job is to take a source file and turn it into whatever output file you want.

This PDF workflow is set up to take that source XML file we just looked at, and produce an InDesign PDF from it. If I check that box there and hit Run Workflow it’ll ask me which source file I should use, so I’ll just click my little folder here, and I can either browse my local file system to grab that, or if I happen to want to store it on the server I can grab it from here as well.

So here’s the file that we were looking at, and I’ll hit Run.

What will happen is that source XML file is converted to Typefi’s Content XML, and it’s sent to InDesign. An InDesign template will be opened and then all of the content of that source file will be placed in InDesign, and it’ll be styled according to how the template was set up.

Some other things that are going on as the pagination happens—the math is being brought in as MathType images, those are going to be placed correctly so that they sit on the baseline of the text. They’ll be centred on that baseline if they’re taller than a line, and they won’t crash into the paragraphs above or below.

We’re also running Typefitter, which is a plug-in that we sell. Typefitter can do typographical tweaks to the content, and they’re based on situations that you would like to fix as you’re running the pages. This one is set to get rid of widows or short lines at the end of paragraphs. And when we open this file after it’s done running I can show you where it’s actually done that.

You can see it’s just putting some math in there now, and then there’s a figure. Whatever kinds of images InDesign can handle can be run through Typefi.

We’re basically using InDesign, and kind of controlling it and laying out the pages automatically.

Something else that’s happening here—if a table spans a number of pages we have a script running that adds the table continuation text at the top, so if you have long tables that go page after page, all of those can be automatically laid out.

So that is our InDesign file. It’s now going to open that InDesign file again to produce a PDF, and once that PDF is created we can take a look at it.

The reason we’re going with InDesign here, is it’s essentially the most used page layout software out there. Pretty much everything you’re going to buy at a newsstand or a bookstore is going to be created using InDesign.

There’s a huge user community, there are a lot of people, a lot of designers, that know how to use InDesign, there’s plenty of training available. InDesign is, like I said, the most widely used software for page layout, so that’s why we use that.

Here’s our InDesign file, and I just wanted to show you right here, this orange highlight, this is non-printing so you won’t see this in the output, but it’s an indicator that Typefitter has run this paragraph up. There was just a small amount of text on the next line here, so it ran that back so it’s in the spec of this design. Same thing with this one.

If we look at the PDF that was created, I’ll go to my job folder. This is again the Typefi server; after the job is done you can click on this job folder and all the files that were created during the production of this job are here, so you get everything, we’re not hiding anything.

Here’s the source XML file, here are those intermediary XML formats I was talking about before, here’s the InDesign file, and then here’s our final PDF. So I’ll open that.

And we’ll take a tour through this PDF, but before we do that I’m going to run this EPUB workflow in the background, so it’ll be finished by the time we’re done looking at the PDF.

Same idea, I’m going to take that same source file, but now I’m running it in a workflow that will produce an EPUB instead of a PDF.

So our PDF has quite a few little features in it. This is actually an accessible PDF, and you can see we have bookmarks, so these are all automatically populated in here. These are, of course, all clickable, they’ll take you to whatever page that these different objects are on.

If we go back to the table of contents, these are also clickable links in the table of contents, and then all of the URLs in here are clickable, all the cross-references to other sections of the document are clickable.

This one goes to the references, another cross-reference to a document item. So everything has been automatically added here to make this a very usable and interactive PDF.

We’ll zoom in on the math a little bit here. You can see that it’s been set correctly so that it’s centred on this baseline, rather than smacking into all of the equations above and below it. All of that spacing has been managed through the system to make sure that the output looks good.

And then we can also see, as I mentioned while the job was running, we had these table continuation notices. So this table goes to the second page here. This text here and the table title is repeated automatically, so you don’t have to go in and manually add all of that sort of thing.

This one has some custom table rules, and we can also support custom shading of cells so if you have specific cells that you need shaded in a table, all of that can be done based on the source file coming in.

I can show you some of the accessible features as well. This is the tagged text, so all of this information is tagged properly for accessibility purposes, so each of these stories is actually tagged right.

And then all of the document metadata has been set—so you can see, for example, the initial view has been set, and then all of the different metadata has been set up.

If I run the accessibility checker here we get a report back, and it passes accessibility. There are a few issues that you’ll have to look at; some of these are things that you would have to check manually anyway.

So just a pretty automatic way to set up a file and get that accessible PDF and interactive PDF out.

Our EPUB is done now. Again, here’s my output folder, and I have all my files, and here’s my EPUB. I’m going to open this one on my Mac, so if I go to this folder here, I had this job copy the file onto my laptop.

If I grab that and open it up on my e-book reader, you can see we have a nice-looking EPUB from that same source XML that we made the PDF with. These, of course, are all clickable as well.

And then some of the benefits of an EPUB. Of course, you can do a search on this, you can change the type size if you want to. You can highlight. And then you can add bookmarks as well. So, of course that’s going to depend on which e-reader you’re using, all of those features, but the EPUB itself, that format allows you to do those sorts of things.

And you can see we have the nice-looking math, and here’s our figure, and then here’s those tables in the end. They don’t have any rules because that’s the design of this particular EPUB.

So those are two of the formats that we can pretty easily create using STS XML as the source, and then using Typefi to create those outputs.

I’ll show you another one really quickly. This is a different standard that we ran through a different job. This is a DAISY file, so what Typefi will do is output all of these different components of that DAISY file.

Again, we’re starting with that same XML, or an XML file, and running it through a DAISY export to create all of these different files. And then when I open this up in the DAISY reader…

SCREEN READER: Sidebar has focus. Section. Design for access and mobility part 4.2. Means to assist the orientation of people with vision impairment. Wayfinding signs. Paused.

ERIC DAMITZ: So you can see this tool that we’re looking at here is actually sort of a DAISY QA tool. The actual users of this have devices that they can use, sort of like play and pause buttons, to hear the reading of the book.

All the data from this content is brought through, set up in this format, and all of this is now readable. It’s in the correct reading order, and then if there are images in here, it will actually read the alt text for this image.

You can see I’m hovering over it here; you can see all of that alt text in there. That would be read to the person who’s using this file.

So again, it’s another output that can be created from that source XML.

Now we’ll take a quick look at how all that worked. The first thing we need to create a file like that is a template. To create the PDF file we have an InDesign template, and I’ll flip over to that one now.

Here’s the standards template that we used. Of course there are no pages in here yet; this is a changelog here. We use the same template to create every standard, in this case the standards that ISO produces.

And as for the features of this template, you’ll notice we have all of these different kinds of covers. All of the different kinds of documents are produced through this one template.

So you don’t need hundreds and hundreds of templates if you have different document types; as long as the base body text and the design is similar between them all, you can use one template to produce all these different things.

Another feature—we have all these different layers for different languages. And we have a JavaScript that turns on and off these layers depending on what language you’re producing. Again, you have one template that can have all these different possible outputs running through it.

And then, of course, because it’s a design-based tool, we have all of our styles—so all these different paragraph styles, this controls the look of the text. We also have table styles, so if you have different kinds of tables with different borders or different shading or different specs around them, all of those can be added here.

Of course, none of this is Typefi-specific, these are all just things that InDesign does anyway. So if you’re already in an InDesign workflow, you could take the template you already have and then add Typefi markup to make it an automated template.

And some of that Typefi markup—we have Typefi Designer, which is our InDesign plug-in, and that lets you set up things like Typefi sections, and you’ll see all these different cover sections, and then we have our main content section and so on.

This is how we determine which master pages to use, and what the pagination should be. Should it start on a right-hand page? How does the page numbering look? All of that information is stored here.

We also have Typefi elements, and an element is a smaller-than-the-whole-page type thing. If we look at this template, we’ll see that there’s some element pages in here.

We have all of these cover note elements, so different types of covers have these different notes that can be placed on them. They’re all placed in the template here, and as a particular job is run, if a specific one of these is needed it’ll be grabbed from this page and put onto the actual page that is published through the job.

We have a figure element. This is just an empty frame here, and depending on the size of the image that goes into that as the job is running, this frame will kind of shrink to fit that image and be placed correctly. And if the image is larger than this frame, the image can actually be resized so that it doesn’t stick off the side of the page. So there’s a lot of control over how the design looks using these elements.

This one actually is also set up for redline. If you run a redline job through this, this green box will be sized and placed above any additional content, and this one will be placed on top of content that’s been removed. I don’t have a demo for redlining, but that is definitely something that this system can do.

Again, if there’s a new figure it’ll have a green line around it, and if there’s one that’s been deleted, it’ll have the x through it. These are just brought up and placed on the actual layout pages as the job is running.

You use the Typefi plug-in as well to add all of these elements to the template. And again, these are all customisable, you can name them whatever you need to, and you can have as many or as few as you need, it’s completely up to you.

So that’s our InDesign template, and once that’s created and everything’s working, you can run any job through it and get the output that you’re looking for through that template.

I’ll go back to the Typefi system here and show you what this workflow contains.

This is our PDF workflow and we’re bringing in our STS file here—it’s asking me which one, that’s why that screen popped up and said which file do you want—and we’re creating a CXML output, and we’re using this transform.

So the transform will take a certain kind of XML and turn it into a different kind of XML. There’s a basic STS transform and then you can tweak it as needed, depending on how you implement STS.
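
The real transform is an XSLT stylesheet; a toy equivalent in Python shows the idea of walking one vocabulary and emitting another. The element names here are illustrative, not Typefi’s actual CXML tag set.

```python
import xml.etree.ElementTree as ET

# A toy mapping from STS-like element names to a CXML-like vocabulary.
RENAME = {'standard': 'document', 'sec': 'section', 'title': 'heading'}

def transform(elem):
    out = ET.Element(RENAME.get(elem.tag, elem.tag), dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        out.append(transform(child))
    return out

src = ET.fromstring('<standard><sec><title>Scope</title></sec></standard>')
print(ET.tostring(transform(src), encoding='unicode'))
# <document><section><heading>Scope</heading></section></document>
```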

Each one of these little boxes is called an action, and I have a representation up here in the toolbar. When I add one to here, I can reorder them, and I can add whichever ones I need. This is adding some tags, so these were those blue boxes in the job folder.

That’s the EPUB, I’ll go back to the PDF one.

Then we have another step where we’re applying conditions, so the XML has certain sections that should be omitted in certain cases, so if you want to omit those you’d turn that on and run the job, and they’ll be left out of the file.

After that step, we’re creating the InDesign document, and this is the actual software that runs InDesign. Here’s the template that we just looked at, so we’re telling it which template to use, and we’re saying what version of InDesign.

And then in this tab we have what we call event scripts, so these are additional little programs that change the pagination based on your needs. So, at each of these steps in the process of paging the document, at the start of a document, at the start of each section, at the start of a spread or a page, or at the end of all those, you can kind of pause what the Typefi layout engine is doing and inject a little additional programming to make that layout exactly what you want it to be.

And then once that InDesign file’s created we send it to the Export to PDF action and that creates our web-ready PDF based on the Adobe preset, and of course these are also customisable—you can have whatever ones you need.

And then we have a final step where we update the metadata, and this is to make it fully accessible, so here we’re adding the metadata that accessible PDFs require, and we can also set the initial view when you open the PDF, right here.

So that’s a pretty simple workflow. The EPUB one is very similar. Again, we’re taking that source file, adding a few tags, and doing our omit condition.

In this case we’re doing another transform, because we need to change a few things; an EPUB is very different from a PDF, of course. The code behind it is different and the usage of it is different, it’s reflowable. So we’re changing a few things in the CXML.

In this case we are creating an InDesign document again using a different template, so this is optimized for EPUB output. And then we have an Export to EPUB, so we’re kind of using InDesign’s EPUB Export tool, filling in all the metadata that you would use in InDesign, and we have a lot of different things to set up here, a lot of different options you can do.

So you set up all of this, it’ll generate the EPUB, and then the last step is the Cleanup EPUB. InDesign makes notoriously questionable EPUB code, and this helps clean that up to make the files smaller, make them a little nicer as far as the way that they’re coded goes.

And then my last step here was just to copy that file onto my other computer.

So that’s another fairly simple workflow. I could’ve combined these into one workflow—so maybe I have a final workflow where I produce both of these at the same time, that’s totally doable.

In fact, we can have fairly complicated workflows if they’re needed. Here’s an example of one. You can see this one has quite a few steps in it.

This is a specific action that this particular customer wanted to be done, so we made them a custom action—only they get to use this one. Their input file is actually a ZIP file that contains XML and images and all the supporting documents that they need to produce their output.

This specific action here takes a ZIP file as its input, and then it gives you an XML file as its output.

The next step then is similar to what we saw before, where we’re turning that into CXML using this transform. Adding a tag. In this case we’re editing some metadata, so you can manage metadata in the XML file, and if you need to override things during a job you can do that here, or you can just allow the values to pass through from the XML, and that’s how this one is set up.

Then we have conditions here again that omit, that we saw before. We’re creating InDesign. This is a specific kind of InDesign document—this is content with the DOIs included—and then we’re exporting to PDF, and this is the one with the DOIs, it’s accessible.

Then we’re going to update the metadata, again, to make it accessible.

So this file that we’ve created here, this PDF, is going to have live DOIs, and it’s going to be accessible, and it’s going to be optimized for viewing on the web, and then again have those live DOI links.

We’re going to take that same source file that we imported at the beginning. Now we’re going to create a version that doesn’t have the DOIs. We’re going to go through that metadata step again, apply conditions again.

We’ll create a new InDesign document, this doesn’t have DOI links in it so the pagination might change slightly—we just don’t want those in there.

So, new InDesign file, another export to PDF, another update metadata, so now we have our web-optimized accessible PDF file that doesn’t have those links in it.

And then we’re going to run a script on that no-DOI InDesign file that we created, and that’s also going to be exported to PDF, and this is the one that goes to press.

And then we’re going to run another script on that same file, produce another PDF, and this is the version of record of that particular PDF.

So we’re creating four PDF files from this one source input. The only thing the user has to do is tell it which job to run, which input file to use. It’ll run through all of these steps automatically, and then at the end they have another custom action that zips all of those up and puts them on a specific server via FTP.

So you can have a very automated workflow where you have lots of different outputs all running at the same time. They can be brought in from a server, you can do it manually—however you want to get them into the system.

And then once all those outputs are created, you can put them anywhere you want, or you can download them individually; however you want to work that, we can set up a workflow to manage that.

So that is the extent of our demo.

Some of the benefits of a system like this are that we can take those STS source files and use them to create all of the different outputs you want. Because we have set up the templates and all of these transforms beforehand, once that XML is available you can get those publications out much more quickly.

And you can get them out simultaneously. It’s no more creating the print version, then sending it to someone, waiting a while, and getting back the electronic version, whether it’s an accessible PDF or an EPUB or a DAISY file. It’s all done at the same time.

You can produce those multiple outputs, again including EPUB, HTML, redline versions, DAISY, XML, PDF, whatever you need, all at the same time from the same source file.

And you can continue to use the same tools that you’re already using. So Microsoft Word, either alone, or Microsoft Word with eXtyles to create XML. Whatever authoring tool you’re using, we can take those files and create the outputs for you.

And it does significantly simplify the workflow. Instead of having a multitude of vendors or those documents changing lots of hands, going through different departments, you can set it up so that you have one input file, one set of images, run them through, and then you get all of your outputs from the same place.

Thanks very much for watching. You can see our web address down there for more information on other webinars in this series, and have a great day! Thank you.

Standardizing Standards 5: Working with Standard Adoptions

Transcript

GABRIEL POWELL: Hello and welcome to “Standardizing the Standards: Working with adoptions.”

My name is Gabriel Powell and I’m the Senior Solutions Consultant for Typefi Systems. I’ve been with the company since 2010, and it’s my privilege today to introduce you to what it looks like to adopt standards.

We also have Bruce with us.

BRUCE ROSENBLUM: Hi Gabriel, I’m Bruce Rosenblum, thank you for asking me to join this session. I’m the CEO of Inera Incorporated. I’m also Co-Chair of the NISO STS Working Group.

Inera is located just outside of Boston, and I’ve been with Inera for 20 years, and have been working with standards for a good part of that time.

GABRIEL: In this webinar, we’re going to discuss how national standards bodies can use an XML workflow to simplify the process of adopting and publishing standards that are produced by ISO, and European standards.

That would include the European Committee for Standardization, CEN, as well as the European Committee for Electrotechnical Standardization, also known as CENELEC.

I’m going to begin with an overview of what an adoption is, and then I’ll demonstrate how Typefi and Inera eXtyles are used to facilitate the adoption and publishing process for standards.

So what is an adoption, exactly? Likely, if you’re watching this, you know what an adoption is, but I’ll begin with a brief overview for those who are unfamiliar with the process.

The International Organization for Standardization, known as ISO, develops and publishes international standards. In fact, ISO has already published over 20,000 international standards. These standards are established within a broad range of industries and business sectors.

A standard is developed through a consensus process, and experts from all over the world work together to develop the standards that are required by their sector. These are really subject matter experts.

Once a standard is developed by ISO, an ISO member country can sell and adopt that standard nationally.

In addition, ISO standards are also adopted and sold as European standards. And a European standard is a document that has been ratified by one of the three European standardization organizations—that would be CEN, CENELEC, and ETSI.

Let me give you a brief overview of what an ISO adoption looks like.

Here we’re looking at, on the left side, the cover page of an international standard. This happens to be ISO 20160. And then on the right we have an adoption from Denmark. They adopted that ISO file, and let me take you into the core of this so you can see what that adoption looks like in the final published PDF.

This is the Danish Standards adoption of that ISO document. You can see they put their own cover page on this, along with their own designation, which becomes a DS/ISO standard.

If I move through, they’ll have an inside cover page with their own metadata specific for their needs, and then you’ll see the international standard cover shown on the right, and this is the same cover page as the ISO file itself. In fact, it’s the same copyright from ISO, as well as the same table of contents, and body throughout.

You’ll notice that Denmark places their designation at the top in the running header, so that you can see this is a DS/ISO adoption of this ISO standard. But the rest of this document is the same, from every page all the way to the back. Sometimes some national bodies will actually even append their own back page.

So that’s an idea of what an ISO adoption looks like. And if you were looking at a European adoption, it would be very much the same, where here we have the CEN cover page and then the actual adoption.

Let me just open up that document.

So here you can see that the Danish Standards cover is here with the designation DS/EN. And moving through, we have the same title page but this time we have a CEN cover with all of the CEN metadata and information. And then the table of contents, and throughout, we have all the content.

So it’s very similar to an ISO adoption, except that the body is from CEN.

I’d like to introduce you to the various input formats that can be used for creating an adoption.

We have three possible methods. One is to use ISO STS XML, which has been in use for quite a few years now; many standards since around 2011 have been produced in this format. ISO has also gone back and created a back catalogue of XML files, which allows you to publish adopted standards going back quite a few years.

Then we have a new upcoming format that Bruce will introduce you to at the end of this webinar called NISO STS XML.

Then, alternatively, some standards just are not available in an XML format, and that would include some CEN standards, and maybe if you’re adopting an IEC international standard. If there is a back catalogue file that you want to publish and you want to use the PDF source instead of the ISO XML source, that’s also a possibility.

Let’s take a look at the pros and cons of the XML input format vs the PDF source input format.

If you’re using either ISO STS or NISO STS as your input format, the workflow looks as follows. On the left you will see that you have several STS XML files. You will have one or more depending on what you are adopting, and I will show you here in a moment an actual XML package.

But the benefit of using XML as your input is that you can generate multiple output formats. You can generate EPUB, PDF, HTML, DAISY, and more.

If you use PDF source as your input format, you get reliability: you accurately represent that adoption or that standard. However, because you’re using a PDF input file, it is not possible to generate any output format other than PDF, so keep that in mind.

Let me just move over and I’d like to show you what the input packages look like, comparing an XML input package to a PDF body input package.

So, we’ll begin with an ISO adoption. Inside of an ISO adoption package (this is the input package), notice that we have the original ISO XML file, and this is in the ISO STS format.

It has a front section with the ISO metadata, as well as the front matter, foreword, and introduction, the body, starting from the scope all the way to the end of the document just before the annex sections if they exist, and then finally the bibliography.
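As a rough sketch, an ISO STS file with that shape looks something like this (content elided with comments; a real file carries much more detail):

```xml
<standard xml:lang="en">
  <front>
    <iso-meta><!-- ISO metadata --></iso-meta>
    <sec sec-type="foreword"><!-- foreword --></sec>
    <sec sec-type="intro"><!-- introduction --></sec>
  </front>
  <body>
    <sec sec-type="scope"><!-- scope --></sec>
    <!-- remaining clauses, up to just before the annexes -->
  </body>
  <back>
    <app-group><!-- annexes, if any --></app-group>
    <ref-list><!-- bibliography --></ref-list>
  </back>
</standard>
```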

This is all the original ISO file, and then to create the adoption, all we need to do is create an extra XML input file for the national adoption itself.

In this case we only have a few parts. We have the national metadata section, carrying all that metadata for the national components which includes the metadata for the cover page and the title page.

And notice the body section is empty, because it doesn’t actually have anything. The body is going to come from the ISO input file.

So that is a typical package for an ISO adoption.
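The national file for a plain adoption is even simpler. Sketched in ISO STS (values elided), it carries only the national metadata and an empty body:

```xml
<standard xml:lang="en">
  <front>
    <!-- National metadata used for the cover and title page -->
    <nat-meta originator="DS">
      <title-wrap xml:lang="en"><!-- national title --></title-wrap>
      <doc-ident><!-- national designation --></doc-ident>
    </nat-meta>
  </front>
  <!-- Empty: the body comes from the ISO input file -->
  <body/>
</standard>
```

For the multilingual adoption described next, the only structural difference is that this `body` element is populated with the translated sections instead of being empty.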

Interestingly, it’s also possible to publish a multilingual adoption, so let me show you how that looks.

In a multilingual adoption you would have the same files. We have the same ISO input file with no changes, and then in addition to that we have the Danish standard or the national component, and in this case the body is not empty.

In this case the body has the translated text—so that international standard has been translated into the Danish language and it appears within the XML file from that national body, and that would include every section.

So that’s how we would create a multilingual adoption.

And just to show you as well what that would look like: when we publish the multilingual edition we get two options, a web-optimized view and a print-optimized view.

I’ll start by opening up the web edition. It starts off like any other standard—we have the national body cover page as well as the front matter—and what makes this web-optimized is that there is a language code to the right of every heading in this document.

If I click on that heading, that language code, it actually takes me to the corresponding language, so that’s a really handy way to navigate between the languages in this document, and you’ll see that all the Heading Level 1 and 2s have a language code, so it’s very easy to compare and contrast and move between languages. That makes this web-optimized.

Furthermore, we have a print-optimized version. And for print, you don’t want to have to read one language before you get to the next language, so it’s really much more convenient to view the languages side by side, and that’s exactly what this print version offers.

You’ll see that the English language is on the left, and the Danish in this case is on the right. So it’s a really handy way when viewing this in print to read this document and compare and contrast along the way.

So you’ve seen an introduction to what an XML input package looks like when using ISO STS. Let me show you what an input package looks like when you have a PDF source.

So here is the final PDF output and we’ll take a look at that. I would like you to notice that although this PDF originates from a PDF source, when you go through Typefi you do not lose the bookmarks. So if bookmarks are in that PDF source file, they’re actually preserved. If the TOC entries are clickable, they remain clickable. So this is still a web-optimized and highly interactive document, as long as the input PDF file, that source PDF, contains those elements.

So that was the final output; let’s take a look at the input package.

In the input package, you’ll notice that instead of referencing another XML file that would contain the ISO or CEN standard, in this case we have PDF source files, and we have one control file which is what the national body will create.

In this case, Danish Standards has created this. It contains a national metadata section, again used for creating the cover page and their title page.

And then in the body section, instead of actually having XML content to create the body of the document, we have what is called a PDF rendition section, which references the name of a source PDF. These are listed in the order in which the source PDFs should appear in the output.

It’s really that simple. You just need the control file with a national metadata section and one or more PDF rendition sections, and when you run that through Typefi it will generate the final result, again complete with clickable bookmarks, TOC entries, and hyperlinks.
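A control file of this kind might be sketched as follows. The rendition element name and the file names shown here are illustrative (the exact vocabulary in a given workflow may differ), but the shape is as described: national metadata up front, then references to the source PDFs in reading order:

```xml
<standard xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <nat-meta originator="DS">
      <!-- metadata for the national cover and title page -->
    </nat-meta>
  </front>
  <body>
    <!-- One entry per source PDF, in the order they should
         appear in the final output -->
    <pdf-rendition xlink:href="cen-cover.pdf"/>
    <pdf-rendition xlink:href="iso-body.pdf"/>
  </body>
</standard>
```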

I’d like to hand it over to Bruce, who will introduce us to how eXtyles is used to facilitate adoptions.

BRUCE: Thank you very much, Gabriel. eXtyles facilitates preparing these adoptions as well. If you’ve already watched the session about getting to XML in this series, then you know a bit about how eXtyles works. And eXtyles can work with adoptions in the same way that you work with preparing a full standard.

What you do, simply, when you’re working with an adoption, is you prepare the front and the back matter of the adoption in Word, and then you convert that to XML. Sometimes that may require having Inera add a few extra paragraph styles to your set in your palette, but otherwise the process works in a similar manner to preparing a full standard.

There are, as Gabriel pointed out, two ways of having the information prepared. The first is to have just your local information in XML, but the standard that you’re adopting in PDF. In this case you would use eXtyles to prepare a Word file with local front matter and your local annexes, and you would use the paragraph styles that have been set up for national standards bodies.

There’s also allowance for various custom metadata elements as necessary, to meet your local requirements for an adoption.

Then you also insert a reference to the PDF file into the Word file, and then you create XML using your standard export processes in eXtyles.

Typefi, as Gabriel showed, can then take that XML and combine it with the ISO PDF, or if you’re a national body adopting a European standard, a CEN PDF, and that all becomes a single PDF in the final result.

You can also do what we call a whole document adoption, where you take the final ISO Word file, you can add your local front matter and annexes to that Word file, and then you can create a single XML file that has your adopted material plus the standard embedded within it, and then Typefi can create a single PDF file from that XML file.

And now I’ll hand it back to Gabriel and he’ll give a demonstration of how all of this works.

GABRIEL: Thanks Bruce. Now I’d like to demonstrate how we can use Typefi to generate an ISO adoption. And this time I’m going to show off an adoption from Sweden, from Swedish Standards.

Here we’re looking at an input package. This is actually just a ZIP package, I’ve opened it up. In this case we have two input XML files. Again, one is the original ISO XML file, which was produced from eXtyles, and then we have the national XML file and it contains the national metadata section used to produce the cover, and then we have another section which is used on the copyright page.

The body is empty because that content comes from ISO. Then we also have all of the graphics, and that would include the math equations and any figures, and those would be in the package as well.

So to run this through Typefi, I would just need to locate a workflow in the Typefi server, and here you can see a workflow is indicated with this space rocket icon. We’ll produce a web PDF.

This is the actual workflow. I’ll give you a very brief introduction to the components. You have seen this in a previous webinar in much more detail.

It begins by ingesting that input file, so in this particular case there are two parts. One, this component, the SS multi component, will actually unpack the ZIP package and copy all the resources, the images and the XML files, and it will merge those two XML files into one, and then we actually convert those to another form of XML called Content XML using this SS component.

In the end, that input file is passed through a series of actions to create an Adobe InDesign document, and then finally export it to PDF, and finally to update the metadata so that all the metadata is added to the PDF and some other settings are applied to it to make it accessible.

I’ll go ahead and run that workflow. And here I’m just going to choose an input file. And I’ll go ahead and pick up this package, and go ahead and run it.

Now you’ve just watched me manually run a job. It is possible to connect another system on the front end, such as a MarkLogic system, which can actually send the input file via an API connection to Typefi so that you don’t manually run jobs. You can automate the running of the jobs as well.

In this case the InDesign template opens up and you can see all the metadata from that national metadata section showing up on the page to create the cover. And then some additional metadata. Now we’re building the table of contents, the front matter, and now we’re working into the body of the document.

Keep in mind that all the content we see being placed on the page at this moment is coming from the ISO XML input file.

And here you can see the Typefi engine working to apply formatting to all of this XML content, and it’s working out the size of the table, and if tables continue across to another page, even the continuation notice appears automatically.

And at this moment cross-references throughout the documents are being resolved. So wherever you see internal references to tables or figures, those are cross-references, and those are being resolved at the moment. And once they’re resolved, the rest of this document will be very quickly paginated.

Here you can see the rest of the document being laid out. There are various tables and graphics being placed. There are a lot of rules going on here, in fact, there are several keep options that keep figures together so that they don’t straddle a page.

And finally the bibliography and a couple of notes pages are added, and then followed by some static back matter and the back page itself.

So that’s what it looks like to use Typefi to work with an input package, run it through Adobe InDesign, and generate that final PDF which is being created as we speak.

When the job is complete, and you see the green check mark, you can just click on that job and collect all the output. And here you see the final web PDF that was produced from this workflow.

So now I’d like to hand it over to Bruce, who’s going to introduce you to the new upcoming NISO STS model for adoptions.

BRUCE: Thank you very much, Gabriel. As you mentioned, there is a new model for adoptions in the new NISO STS standard that just came out in October of 2017.

In this model we can have two different top level elements. Previously in ISO STS, we could only have standard as a top level element, and now we can have either standard or adoption as a top level element.

Of course, the NISO STS model is fully backwards compatible with ISO STS, but the new model gives you more flexibility.

Before we look at the new model though, let’s look at the existing ISO STS model. It’s a relatively flat model. You can have a front matter section with one or more metadata blocks and special front matter sections, you can have one body section, and then you can have multiple back matter sections; for example, ISO annexes followed by your local adopted annexes.

But as a flat model it’s limited in terms of flexibility.

With the new NISO STS adoption model, it’s a hierarchical model. So you can start with an adoption at the top level, you can nest adoptions within adoptions, and then you can have a standard inside of that.

At the very inside portion of this, you can see we have a standard element and that could be an ISO standard. The inner adoption element would be a CEN adoption that has standard document metadata for that standard as well as the CEN annexes just after the standard element.

And then at the outermost level, you could have the national adoption; for example, a Danish or Swedish adoption. And that adoption would have its own standard doc meta element, and its own back matter as well.

Now one of the interesting things about this model is that the back matter for the national adoption could appear just after the standard doc meta rather than after the adopted information.

This gives you tremendous flexibility in terms of representing, in your XML, the reading order, because some national bodies put all of their adopted material up front, and some national bodies have only their metadata up front and their annexes at the back after the adopted material.
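The nesting Bruce describes can be sketched like this (metadata and content elided; see the NISO STS documentation for the exact content models):

```xml
<adoption>                          <!-- national level, e.g. Danish or Swedish -->
  <adoption-front>
    <std-doc-meta><!-- national metadata --></std-doc-meta>
  </adoption-front>
  <adoption>                        <!-- regional level, e.g. CEN -->
    <adoption-front>
      <std-doc-meta><!-- CEN metadata --></std-doc-meta>
    </adoption-front>
    <standard><!-- the ISO standard itself --></standard>
    <back><!-- CEN annexes --></back>
  </adoption>
  <!-- The national back matter can instead be placed just after
       the national std-doc-meta, to match the reading order of
       bodies that put their adopted material up front -->
  <back><!-- national annexes --></back>
</adoption>
```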

Take another look at how this works. This is actually straight from the NISO STS documentation, where you have an adoption with adoption front matter.

And then, in this case, rather than incorporating all of the XML in, you can have a standard xref element that says this is an adoption, that now pulls in by reference an ISO standard.

This is really really cool, and leads to part of the flexibility that we have with this new model for adoptions.

Not only can we include, as shown in the third bullet point, by standard xref, but we can also include standards with XInclude, for those of you who like using that XML model, or you can have the entire standard or, of course, as Gabriel showed before, you can just incorporate a PDF and you could do that using the standard xref model.
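Two sketches of inclusion by reference, one using the standards cross-reference element the documentation describes and one using XInclude (attribute and file-name values here are illustrative):

```xml
<!-- By reference, via a standards cross-reference -->
<adoption>
  <adoption-front><!-- national metadata --></adoption-front>
  <std-xref type="adopted-from">
    <std-ref>ISO 20160</std-ref>
  </std-xref>
</adoption>

<!-- Or with XInclude, pulling in the standard's full XML -->
<adoption xmlns:xi="http://www.w3.org/2001/XInclude">
  <adoption-front><!-- national metadata --></adoption-front>
  <xi:include href="iso-standard.xml"/>
</adoption>
```

In both cases the file stays in reading order: the reference sits exactly where the adopted standard will be rendered.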

So this model that allows multiple inclusion formats is incredibly flexible, because it allows you to have what the working group calls “reading order” XML; in other words, the order in which you prepare your XML file reflects the order in which the content will be rendered into PDF or HTML or whatever reading format you’re going to render it to.

It’s a recursive model, and it allows multiple level adoptions, much much more flexibly than the ISO STS model. And this means that you can have three or even, if necessary, four levels of an adoption as you go through the adoption process.

There’s lots more information about this model. We’ve included two URLs here if you want to learn more. The first one points to the adoption documentation at the niso-sts.org website, and there’s full documentation including some limited examples of how to do an adoption.

The other is from STS4i, a group that’s promoting best practices for the use of both ISO and NISO STS. They have a repository on GitHub, and they actually have a sample of a DIN/EN/ISO adoption that shows all of the layering and shows exactly how you can take advantage of this tagging when you’re doing an entire standard as an adoption.

We hope you’ve enjoyed learning about all of this, and I’m going to hand it back to Gabriel now.

GABRIEL: All right, thanks everybody for watching. If you’d like to watch other webinars in this same series, take a look at the URL below, typefi.com/standardizing-standards, and you’ll see a full listing of previous webinars that you might have missed, and upcoming webinars that you might be interested in.

Standardizing Standards 6: Increasing the Value of Your STS Adoption

Transcript

GUY VAN DER KOLK: Hello, and welcome to “Standardizing the Standards: Increasing adoption value.”

My name is Guy van der Kolk and I’m a Senior Solutions Consultant with Typefi. I have been with Typefi since about 2013. Before that I worked as a graphical production specialist and a trainer at various companies.

Since I started at Typefi I’ve been able to travel the world and assist customers in implementing multilingual, multi-format, accessible workflows. I’m currently living in The Hague in the Netherlands.

This is the sixth webinar in our series on implementing a NISO STS workflow.

So far you have seen a brief history of using XML, and specifically STS, for your standards publishing. We’ve told you at which point it is best to introduce XML into your workflow, how to convert Word files to STS XML, and how to use that XML to generate PDF, EPUB or HTML.

In the last webinar we talked about adoptions in an XML workflow, and now we are going to discuss some of the other formats you can produce from ISO STS.

In this webinar I will start by demonstrating some variations on PDF and showing you what else is possible.

After that we will change focus and talk about accessibility. We’ll discuss why you would want to make your content accessible, and what accessible formats you could produce using STS XML.

In previous webinars we’ve been talking a lot about producing a PDF output from ISO STS. But it is important to address the fact that different kinds of PDFs can be produced depending on your requirements.

As an example, I would like to talk about the workflow at the British Standards Institution. BSI starts by producing one or more proof PDFs with a watermark during the initial proofing stages.

Once the proof has been given the final stamp of approval, they use Typefi’s modular actions combined into a final workflow that produces four kinds of PDFs: a press-ready PDF with the appropriate settings for printing; a Version of Record (or VOR) PDF for archival purposes; an accessible web PDF that does not have Digital Object Identifier (or DOI) links; and an accessible web PDF with DOI links.

Let me show you how that looks.

So here we have the VOR PDF open, and I would first like to demonstrate some basic features of a PDF produced by Typefi using ISO STS XML as an input.

For one, you will observe the bookmarks panel along the left side that offers an easy way of navigating. We also, a few pages down, have a regular table of contents with clickable hyperlinks, and if I click on Section 6 Examples, you will notice that we also have clickable cross-references to other sections, to other clauses, to figures, and even to bibliographic references.

These features are shared among all PDFs except the print version which doesn’t need clickable links.

If I go to page one in this VOR and I compare it to the link PDF or the web PDF that I also have open, you will notice that there are differences in the way that the covers are produced.

The first one doesn’t have a cover because it’s a Version of Record, while the other two PDFs do have nice colorful covers. The only difference between the web and link versions of the PDF is that one has clickable hyperlinks to DOI information, and the other one does not.

In this case you will observe that the text BS ISO is not clickable, so this does not have a DOI link, whereas in the web version you will observe that the hyperlink is clickable and it takes you to the DOI website.

All of this is produced from the same input file, and all exactly to the specifications that BSI provided, with no additional effort.

Now that I’m here in Acrobat, let me also talk about preview PDFs. Preview PDFs allow you to provide your customers with a sneak peek at the content without actually providing them the full document.

Here we have an example of a full standard that has all of the information; you can see that from the bookmarks, and you can see if I scroll a couple of pages down to the table of contents, this is the full PDF with all of the information in there.

But we’ve also produced a preview PDF, and if I click on the preview PDF you will observe that on the left-hand side there are a lot fewer bookmarks, and if we go to the table of contents a few pages down, you will observe that all of the entries in the table of contents that are black are clickable and allow you to go to that specific page, but all of the other entries in the table of contents are greyed out.

This is a great way to allow someone to see what the standard contains before they make a purchasing decision.

Producing a PDF adoption is one thing, but dealing with change is something completely different. Change is a given and there are several tools that help you track your changes, but almost all of them rely on humans actually enabling tracked changes when they are working. And this is not exactly foolproof!

So what if there was a way to compare two XML files for content and semantics, and provide you with an overview of the difference in a very visual way? With ISO Redline PDFs, there is.

So how does it work? We start out with two XML files that we want to compare. Those are run through a DeltaXML action and combined into a single resultant XML file with all the differences marked up. That resultant XML file is then run through Typefi to produce a redline PDF.
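To give a rough feel for the idea, here is a minimal sketch of a paragraph-level XML comparison. This is not DeltaXML’s actual output format, which is far richer; the `change` attribute is invented purely for illustration.

```python
# Minimal sketch of an XML "redline" comparison: paragraph texts from two
# versions are diffed, and each paragraph is tagged as unchanged, deleted,
# or added. DeltaXML's real comparison is far more sophisticated and its
# markup differs; the change="..." attribute here is invented.
import difflib
import xml.etree.ElementTree as ET

def diff_paragraphs(old_xml, new_xml):
    old = [p.text or "" for p in ET.fromstring(old_xml).iter("p")]
    new = [p.text or "" for p in ET.fromstring(new_xml).iter("p")]
    result = []
    for line in difflib.ndiff(old, new):
        if line.startswith("- "):
            result.append('<p change="deleted">%s</p>' % line[2:])
        elif line.startswith("+ "):
            result.append('<p change="added">%s</p>' % line[2:])
        elif line.startswith("  "):
            result.append("<p>%s</p>" % line[2:])
    return result

old = "<sec><p>Scope</p><p>Old requirement</p></sec>"
new = "<sec><p>Scope</p><p>New requirement</p></sec>"
for p in diff_paragraphs(old, new):
    print(p)
```

A real redline workflow diffs the full element tree, not just paragraph text, but the principle is the same: the marked-up result can then be typeset with deletions struck through and insertions highlighted.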

Let’s take a look at an example of a redline PDF.

Here I have the PDF open onscreen, and if I go to the next page you’ll observe that there is a legend on the inside cover that explains some of the things that you’re going to see in this PDF.

We can see here in the table of contents that there are significant differences between the input XML files; almost every section has a modification that is indicated by the yellow highlights.

As we scroll through the pages we can clearly see the added and the deleted text marked up. This makes the changes very visible and almost impossible to miss.

This sample document doesn’t have any figures, but it does have a couple of tables, which are treated in the same way figures would be. If I open up my bookmarks panel and go down to Annex A, I know that there’s an example of some tables here.

Here onscreen we can see that the table has a big red cross through it, visually indicating that this table has been deleted from the content. Using a single red cross, rather than marking every line of the table as deleted, makes it very clear that the entire table has been removed, not just some text within it modified.

If I go a few pages down, you will see an example of a table that has been inserted, which has a green border. Some text has also been added, and we can see that the date above the table has been changed.

This is a very visual and highly useful method of showing changes between XML files.

Let’s take a moment and talk about a format you may be less familiar with: EPUB.

EPUB is a file format that is mostly used for digital books, and perhaps some of you watching this webinar have read a book on a Kindle, Nook or any other digital reading device.

The format is commonly referred to as “reflowable”, and it shares some features with PDF, like clickable hyperlinks and a navigational table of contents that is similar to bookmarks.

While a PDF is focused on the presentational aspect of your content—in other words, items have a specific position on the page and design is important—EPUB instead focuses on your content.

This format uses HTML at its core and it really shines in the text-heavy and structured world of standards content.

Let me show you an example of what a standard published as an EPUB looks like. I will switch to my iPad that I’m projecting on the screen here. Almost every digital device has an EPUB reader on board, and in my case I’m going to show you iBooks on my iPad.

Here you can see my library in iBooks, and the cover of this standard is well-represented, it’s the first one at the top left. So let me open that up for you.

In any e-book you have a navigational table of contents—in this case, the icon with three lines and dots at the top left takes me directly to it. The bookmarks we saw in the PDFs earlier play the same role that the navigational table of contents plays in an EPUB.

Here it takes me directly to the foreword, but I’m actually going to scroll through the pages a little bit, and here you can see that not only do we have a navigational table of contents that is controlled by the digital device that you’re reading your EPUB on, but we also still have a regular table of contents in the actual content itself.

If I swipe a little further left, you can see the copyright page, which is present in the EPUB as well. Here ISO has chosen to add some tips that greatly improve the e-book reading experience, because not everybody is familiar with what’s possible in e-books.

What’s important to notice is that this specific content is only visible in the EPUB. It would not be beneficial in the PDF, so to make it appear only in the EPUB and not in the PDF, we have the ability to conditionalize some content—making sure that certain things only appear in certain kinds of output.
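One simple way to sketch this kind of conditional content is to strip elements that are not flagged for the format being produced. The `output` attribute name here is my own invention, not an actual Typefi or NISO STS mechanism.

```python
# Hypothetical sketch of conditional content: elements carrying an
# "output" attribute (an invented name, not part of NISO STS or Typefi)
# are kept only when producing that format.
import xml.etree.ElementTree as ET

def filter_for_output(xml, fmt):
    root = ET.fromstring(xml)
    for parent in list(root.iter()):
        for child in list(parent):
            only = child.get("output")
            if only is not None and only != fmt:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

doc = ('<front><p>Copyright notice</p>'
       '<p output="epub">Tip: tap any entry to jump to that section.</p>'
       '</front>')
print(filter_for_output(doc, "pdf"))   # the EPUB-only tip is dropped
print(filter_for_output(doc, "epub"))  # the tip is kept
```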

As I scroll through this EPUB on my device, you can see that, as with the PDF, we have clickable hyperlinks. Links to websites and cross-references, like the ones we saw in the PDF earlier, all work just as well in the EPUB.

One of the reasons EPUB is referred to as reflowable becomes clear when you compare it with PDF: a PDF page simply shrinks to fit the size of the device you’re viewing it on. So, if you’re looking at a PDF on your phone, there’s going to be a lot of zooming in and out and panning around on the page.

Whereas, with an EPUB, depending on the size of the screen, the content automatically adapts and gives you the best possible size and viewing experience. So with EPUB, as I said earlier, your content is more important than the visual representation of it.

On top of that, an EPUB allows me to increase the font size, something which you cannot do in a PDF, because a PDF is more about the design.

If I click on the two letters that are at the top right of my screen, you can see here that by clicking on the A, the text in the background proportionally increases, which makes it easier for me to read this.

And, if I want to, perhaps because the font being used is not comfortable for me to read, I can even choose a completely different font to make this content easier to read.

This is what makes an EPUB unique, and different to a PDF.

Let us switch focus and talk about accessibility. Accessibility enables everyone—regardless of disability or special needs—to read, hear, and interact with content.

The goal of accessibility is to create an inclusive society for people with visual, auditory, physical, or cognitive disabilities.

Impairments for the end user may be as simple as color blindness, or as significant as blindness, deafness, or other physical impairments.

I myself am color blind and although I have no problem at all at the stoplights, content that uses red text in a big sea of black, or tables that use tints of green and red to highlight cells, are very difficult for me to view.

Granted, the challenges are minimal compared to someone who is blind, but it’s still something to think about nonetheless.

The UN Convention on the Rights of Persons with Disabilities, which entered into force in 2008, specifically recognizes that access to information, communications, and services is a human right.

The Marrakesh VIP treaty is a treaty allowing for copyright exceptions to facilitate the creation of accessible versions of books and other copyrighted works for visually impaired people. 51 countries signed the treaty and 34 countries have ratified it as of October 2017. And the treaty itself came into effect on the 30th of September 2016.

Countries around the world are addressing digital access issues through legislation. In the UK, the Equality Act of 2010 makes it illegal to discriminate against people with disabilities. In the US, Section 508 of the Rehabilitation Act requires Federal agencies to make their electronic and information technology accessible to people with disabilities. The Australian Government ratified the UN Convention on the Rights of Persons with Disabilities in 2008.

We’ve said this before in some of our webinars—technology should not drive business decisions. So once you’ve chosen to produce accessible content, the actual production in an XML-based workflow can be achieved with comparably little effort and, as a direct consequence, it increases your revenue potential.

Now that we’ve talked about the right people have to accessible content, let’s switch focus to accessible formats. The main accessible formats, in no particular order, are PDF, HTML, EPUB, and DAISY.

It’s important to note that PDF is by no means the most accessible of the formats that we’re talking about today. However, if you combine the content already present in the ISO source XML, such as cross-references and alternative text for images, with the production and post-processing of the PDF by Typefi, the end result is a PDF with greatly improved accessibility.

This combination produces a PDF with the proper tagging order so it can be used by read-aloud software, and it greatly improves the ability for Google to index your content. By extension, this means it will be easier for people to find your content, if that’s something that’s important to you.

Alternative text provides a means for the visually impaired to get an impression of the graphics in your documents. And cross-references and bookmarks, as we saw earlier in this webinar, greatly increase the possibilities of navigation through your PDF documents.

I have demonstrated some of these accessible features like bookmarks and rich linking with cross-references earlier in this webinar, so now I just want to show you an example of alternative text in a document.

If we look at the document that’s onscreen now—I’ve already gone to the specific page that has a graphic on it—you can see that if you can see this graphic, it makes perfect sense. But if you can’t see the graphic, then the alternative text, which appears as I hover my mouse over the image, at least gives you a word-based description of the graphic you’re looking at.

This means that when this PDF is read aloud by read-aloud software, it will speak the description you see here rather than a much more generic message like “This is a figure.” That gives a visually impaired reader a much more meaningful impression of what the graphic represents.

Where PDF is, by its very nature, a presentational format that normally focuses on design exclusively, HTML is at its core a structural format that separates content from how it looks. It is platform-agnostic and, unlike PDF, its main focus is on your content.

What I am saying here may not match your experience of HTML, but that is because you are used to dealing with websites. Websites have HTML at the core, which is the content, and then use CSS (Cascading Style Sheets) and JavaScript for styling and functionality.

Bruce talked about ISO’s Online Browsing Platform in webinar two and I would like to reuse his example to prove my point.

I’ve previously opened my browser here, as you can see, to iso.org/obp, which is the URL for ISO’s Online Browsing Platform.

And in Bruce’s example he used the search to find pasta. I love pasta like anybody else, but I was more interested to see if there is a standard about LEGO. So let me do a search for that.

Now I don’t doubt that LEGO actually is pretty standardized, but there is no standard about LEGO on the ISO site, to my great disappointment. But let’s click on the very first standard that we see because in the end this is about the HTML and not so much about LEGOs.

As you can see here, in this website we’ve got a visual representation, so this looks really nice and really good.

If I go up here and I temporarily disable my CSS styles, the presentation of this website changes dramatically. It no longer looks pretty, there’s no longer a hierarchical or visual representation of the content.

But regardless of that, even with no styling at all, you can still get an impression of the content. And once we actually reach the actual standard itself, and none of the visual things around it, we can see that there is a very clear and readable hierarchy to this content.

We can see the foreword, we’ve got bullets, and everything else, so even with no visual representation, the structured nature of HTML allows us to see this content, and still make sense of it.

I’m going to re-enable my styles, just because it looks better that way. Earlier we talked about the preview PDF, and the fact that we can create a teaser version of your content in PDF form.

In this particular case, that same thing is done with the HTML format. So on the left hand side you can see the table of contents, and you can see that some of these are greyed out.

Those sections are not available to you, but the content that is available gives you an impression of what the standard contains. So again, even with HTML, the workflow can ensure that only the content you want to make available—enough for someone to decide whether to purchase your standard—is actually shown.

Now, I also want to show you something on my phone, so I’m quickly going to switch to a different device.

I’ve gone to the same iso.org Online Browsing Platform, and I’ve opened the same standard on my phone. And the reason why this is relevant is that studies show that more and more customers consume content on mobile devices first, before making a purchasing decision.

HTML is invaluable in making your content accessible and easily viewed on multiple devices. As you can see here, we’re looking at the same content but we’re looking at it on a much smaller screen than we were looking at previously in the browser. This is one of the big benefits of HTML.

We discussed EPUB as a format earlier in this webinar, where I also showed you the ability to increase the font size and choose a different font. From the perspective of a person who has a visual impairment but is not completely blind, you can imagine the benefits that that offers.

If you offer both PDF and EPUB options to a person with this kind of impairment, they are more likely to choose EPUB as a format, because it allows them to consume your content in a way that is suitable for them.

DAISY is managed by the DAISY Consortium and is another HTML-based format. Like EPUB, the HTML is wrapped specifically for DAISY readers, which are usually dedicated hardware devices. As a format, it is still widely used by visually impaired people for whom simply increasing the font size has no benefit; these people rely solely on having the content read aloud by their DAISY devices.

Let’s take a moment to summarize accessibility. There are solid arguments for making your content accessible, both on the legislative and on the inclusiveness side of things, and as we have demonstrated so far there are several options for making your content accessible.

You can choose one or more of these formats and determine how far you are willing to go from a business perspective. Many, though not all, ISO STS input files come with accessible features so you can benefit from those when you are producing your adoptions.

But the point I really want to drive home is that with these workflows and using STS XML as an input file, producing these other formats can be achieved with relatively little effort and as such, making your content accessible really is “the cherry on top.”

Our purpose for this webinar was to give you an impression of what’s possible, all within the premise of taking a single source ISO STS XML file and producing different kinds of outputs.

Typefi is a versatile tool and we can help you produce all of these formats, or only a selection of them. Or perhaps you have a different format or a different requirement that we haven’t mentioned here or in this webinar series. Please don’t hesitate to reach out to us so we can see how we can assist you.

My email address is here onscreen. And if you’ve just jumped into this webinar and haven’t seen any of the previous ones, the URL typefi.com/standardizing-standards, which you can also see onscreen, will take you to a page with links to all of the previous webinars.

Thank you so much for your attention, and have a great day.

Standardizing Standards 7: Multilingual Standards Publishing with XML

Transcript

ANTTI SAARI: Hello, my name’s Antti Saari, I’m from the Finnish Standards Association SFS, here to tell you about the way we do multilingual standards publishing with STS XML.

The Finnish Standards Association is a national standards body in Finland, one of three. We do most of our standardization work in ISO and CEN, and work with the other national standards bodies in Finland to also publish and sell national adoptions and translations from CENELEC and IEC.

We also do some Finnish national standards—those are not a big priority for us, but we do publish a few and they are included in the numbers that I’ll show you in our later slides.

Since the topic of this presentation is multilingual standards, I think that we should first define what we mean by a multilingual standard.

In our context, most of what we publish is adoptions of CEN and ISO standards. We publish those in English, the original English text of the standard. We are bound by CEN rules to publish everything or at least adopt everything that they publish, so that accounts for something around a thousand individual standards per year.

Now, we look through those standards and try to find the ones that are particularly interesting to Finnish industry, Finnish stakeholders, and provide a Finnish translation of those CEN and ISO standards.

The Finnish translation that we do does not have the same status as the original text of the standard. We have a disclaimer on the cover page of every translation that we publish, saying that if anything in our translation differs from or doesn’t seem to match the original, the original text is the one that applies—follow the original.

So, for customers who buy the translation, we have to provide the original text as well. What we call a multilingual standard consists of the Finnish translation and the English original standard in one product—right now, one PDF.

And it has to be made in a way that makes it easy for the user to read through the text in one language and then cross-check with the original text, and then get back to the translation and continue reading.

In terms of numbers—these are from last year—we published around 250 of this kind of multilingual standard. The top row says 268, but as I mentioned before, that includes our national standards as well, which account for a dozen or so, leaving about 250 translations published as multilingual standards.

Of those, we did a bit over 200 in STS XML using a workflow including eXtyles and Typefi, and the rest were done in our old workflow that I’ll cover in the next slide.

The reason we’re not doing everything with STS XML is because sometimes the original standard, the English original coming from ISO, CEN, CENELEC, whoever, is not made in STS XML. And if it’s not made in STS XML then well, we’re not going to do it either.

Our idea, when we’re making these multilingual standards, is to use the English XML we get from the international organization as is, and then just add our translation to it. So if we don’t get the English XML, then we just don’t produce an XML publication from it.

The percentage of XML to non-XML is roughly similar to what CEN itself is doing. CEN manages to publish XML with over 80 percent of their publications, and we’re in the same ballpark.

Now, we’ve been doing this kind of multilingual standard since way before STS XML was a thing. I’m not sure about the exact year we started, but way before my time.

The way we used to do those, and the way we still do anything that we can’t get in XML, is with Adobe FrameMaker and a custom plug-in that we have for that.

I think the main reason this is interesting is that the workflow we had for FrameMaker was, and is, very similar to what we are doing now with eXtyles and Typefi.

So we had a Word file from CEN and ISO. Since we are a member we get all of their output—we get their PDF files, their Word files, and their XML files if they produce them.

We take the Word file, and have it translated so we get another Word file. Someone would go through those Word files and style everything according to our specification, using just Word’s built-in tools.

Then we’d have a plug-in that would read through the Word file, convert it to XML, apply structure, and our editors could work through it in FrameMaker.

The reason we were using FrameMaker is because it allows you to have multiple text flows, among other things. When we started doing standards like this, we were mostly thinking about people reading standards as a physical book where you have actual binding and you have to turn a page, and stuff like that.

So, the idea was, in that book we’d have always the left hand side page in Finnish, and the right hand side page would be in English. FrameMaker lets you do that pretty easily, and our plug-in made it easier still.

The only thing our editors really had to do was to make sure that the standard would proceed in both languages at the same rate. So normally Finnish takes a bit more space than the English does, and they would have to make the Finnish take up less space by using layout tricks like reducing the spacing between paragraphs or between words, or even between letters.

Now, when we moved to our current publishing process with STS XML using eXtyles and Typefi, we also wanted to rethink the end product we have since years have passed and those book standards are not so important anymore.

More people are reading standards on computer screens or even tablets, and if you’re reading a standard on an iPad you can’t really fit more than one page on the screen at a time.

So if you had that kind of book layout we had before, that would mean that every other page is in Finnish and every other page is in English, and that would be really annoying to read if you could only see one page at a time and you were trying to read through it in one language.

So, when we moved to eXtyles and Typefi, we took the opportunity to rethink that.

Also, the kind of adjustments our editors used to make to compress the space the Finnish text takes up are quite hard to automate. At least, that’s how we felt back in 2014.

I’ve heard now that some of Typefi’s customers are doing the same kind of thing, using Typefi automation to basically make that kind of side by side layout automatically, but in 2014 it seemed difficult for us.

So, the way we’re publishing standards now with eXtyles and Typefi, we have one process that we use for everything that we publish, including our national stuff, and standards from CEN and ISO.

The way the process works—well first of all someone has to decide that ok, we want to translate the standard. So, we send the translator a Word file. We’re still dealing with Word files for the translation part.

And they just type their translation on top of the Word file, or if they have access to special software like Trados, they might import the Word file into Trados and re-export it back into the same structure.

As it happens, the translated Word file we receive will already be styled with the exact styles that we need for eXtyles to be able to export XML from it. That is because the Word file we send to the translator has already been treated by CEN or ISO and their editors.

And while it’s possible that the translator might make a bit of a mess with some of Word’s built-in styles—do something unexpected—I’d say 90 percent of the styling work that eXtyles is used for in this process has already been done in the translation. We really only have to do some minor cleanup at this point.

The reason we are using Word files for translation, even though XML is available, is that the dedicated software that you’d need to do XML translations is kind of expensive. Not all of our translators have access to that software, so that’s an issue.

Also, if you know there’s a mistake in the translation, and the translation only exists in XML, then the only person who can fix it is whoever has access to the translation software and can re-export the XML.

The only alternative is really to manually fix the XML file and we are trying to avoid that—that’s not something our editors are generally comfortable doing.

So, the part where Typefi comes into this process is where we already have two XML files. The first one is the one we’ve produced in-house—it has all of the translated content of the standard in Finnish, and all of our metadata is in there.

And then there’s the second file. It’s the same XML file as ISO or CEN created and let their members download. It’s the same file, we do not add anything to that file.

We feed those two files to Typefi, and it merges the contents in the order that we want them to be.

So there’s our cover page, then it gets the cover page from CEN coming from the CEN XML file, then comes all the Finnish contents from our XML file, followed by the English contents from the CEN XML file.

Something like that. There are some variations to this, but that’s the general idea.

While Typefi is doing this, every time it sees a section heading coming from the Finnish XML file, it creates a cross-reference to the section with the exact same ID attribute value in the English XML file.

Typefi doesn’t check whether that kind of section really exists, it’s just assumed that there will be a corresponding section with this exact same ID attribute. The link is created, and we ensure that there will be a target for that link.
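The linking rule described above can be sketched in a few lines. The element names follow NISO STS usage (`<sec>`, `<title>`, `<xref>` with `rid`), but the link text and the helper itself are my own simplification, not Typefi’s implementation.

```python
# Sketch of the bilingual linking rule: for every section in the Finnish
# XML, emit a cross-reference that points at the *same* ID, assumed to
# exist in the English XML. Nothing here verifies that the target exists,
# which is exactly why the two files' ID schemes must match.
import xml.etree.ElementTree as ET

def heading_links(finnish_xml):
    links = []
    for sec in ET.fromstring(finnish_xml).iter("sec"):
        sec_id = sec.get("id")
        title = sec.findtext("title", default="")
        links.append('<xref rid="%s">%s (EN)</xref>' % (sec_id, title))
    return links

fi = ('<body><sec id="sec_3"><title>Termit ja määritelmät</title></sec>'
      '<sec id="sec_4"><title>Vaatimukset</title></sec></body>')
for link in heading_links(fi):
    print(link)
```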

At this point I’ll just quickly show what a PDF file like this looks like.

You’re looking at an SFS cover page of an ISO standard that has been adopted by CEN and then again adopted and translated by us.

So really, this actually has contents from three XML files, one from us, one from CEN, and one from ISO. The general idea is the same.

Now, the first actual section in the standard is the foreword coming from CEN. And right next to the section heading you see a hyperlink. When I click the hyperlink it takes me to the same section in English.

I can scroll through the standard in English, I get the terms and definitions, and there’s a cross-reference to the terms and definitions section in Finnish—so I could do it like this, and now I’m at terms and definitions in Finnish.

That’s basically all there is to it. It’s stupidly simple, even. But it’s kind of convenient I think, it’s an easy way to cross-reference between languages, and you could apply the same idea to online platforms as well.

We’re doing it with PDF files for now, but it’s the same kind of idea behind how you create links and allow the customer to read through the standard in different languages and check the other language.

If that’s the kind of thing you want to do, this is a surprisingly easy way to do it.

Now, being able to do it like this is not a given, I have to stress that. It’s easy for us because when we started our project with eXtyles and Typefi, we copied everything that the ISO had done.

ISO had already started a project, they were publishing everything in XML by that time.

CEN had already started as well. They weren’t quite finished, in the sense that they were still developing parts of their project, but all the eXtyles templates and customizations were already in place. So we could just march in and get all their stuff.

We wanted our stuff to be as similar as possible to what they were doing, so we just told eXtyles and Typefi, don’t make any customizations, just give us what they’re doing and we’ll work with that.

What that leads to is that all of our sections have an ID attribute. That’s something that ISO and CEN have decided, and something that we’ve decided to follow them on, so it’s easy for us to always assume that, OK, we have a section heading, we can create this kind of hyperlink, because the ID attribute will be there.

Similarly, the ID attribute values need to match because, like I said, we are not actually trying to look for a matching section in the other language, we’re just assuming that a section with the exact same ID will be there. So that means that the section ID schemes must be the same in the translation and in the original text.

I’m bringing this up because the official NISO STS documentation for the ID attribute—and I have to speak here from the documentation—shows an ID attribute with the value s6.7.1.5.

I suppose that’s a good way to identify your section 6.7.1.5 but that is different from what ISO is actually doing, and what CEN is doing, and what we’re doing.

So, if someone were to look at the NISO STS documentation and decide that, OK, this is how the section IDs work in the examples, so we’re going to do exactly that—that would make sense for them.

But if we wanted to produce a translation and a bilingual standard using XML from that organization, we would be in trouble, because our ID attribute schemes would not match. Well, it wouldn’t be hard to fix, but we would have to create some kind of mapping to be able to match them.
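A simple consistency check could catch such a mismatch before production. This is a hypothetical helper; as noted, the real process simply assumes the IDs match.

```python
# Hypothetical pre-flight check: compare the section ID sets of the
# translation and the original. Any ID present in only one file would
# break the assumed cross-language links.
import xml.etree.ElementTree as ET

def section_ids(xml):
    return {sec.get("id") for sec in ET.fromstring(xml).iter("sec")}

def unmatched_section_ids(translation_xml, original_xml):
    # Symmetric difference: IDs that do not appear in both files.
    return section_ids(translation_xml) ^ section_ids(original_xml)

fi = '<body><sec id="sec_6.7.1.5"/></body>'   # one ID scheme
en = '<body><sec id="s6.7.1.5"/></body>'      # the NISO STS example style
print(unmatched_section_ids(fi, en))  # both IDs flagged: a mapping is needed
```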

All in all, these are not huge issues, we have some custom sections there to handle national content and we’ve created workarounds for them, but all in all this is a pretty easy thing to do.

Now, how has this all worked out for us? This shows how many pages per year we’re publishing. Our production of STS XML started in full in the middle of 2015, and in 2016 we were using STS XML / eXtyles / Typefi as our main publishing process.

The page count in those two years was pretty ok, normal. The amount of pages isn’t really limited by what we can publish, it’s more limited by what our translators are able to produce, so if they produce 17,000 pages of translations then that’s what we publish. No more, and no less.

The thing to take away from this is that the page count is normal, it’s the same as it was before.

Which takes me to the next slide. On top, the line shows how many days it took from when a draft translation was first submitted to our publishing department, to when there was a ready translation, ready to be sold to our customers.

You can see a fairly dramatic drop in there, almost a month compared to 2015. But in 2015 we had a lot of issues because, you know, we were starting this whole new publication process that took a lot of resources.

But even compared to the previous years when we were still completely using the old system, we’ve managed to decrease the amount of time it takes to create a publication by quite a lot.

And, furthermore, we’re doing it with a lot less effort required. The bars or columns on the bottom show how many person-years we’ve worked in our publishing department per year. So it’s not just creating those translations, it’s everything that our publishing department does, including creating advertisement materials and publishing the thousand adoptions that we do, all that stuff.

The amount of work, you can see, has gone down a lot, by what, one and a half person-years compared to when we were only using the XML method.

Compare it to 2014 when we produced a very comparable amount of pages, also 17,000, but we had over eight people working to create those pages.

So we’re doing a lot of pages, we’re doing it faster than before, and we’re doing it with less work required than before.

That’s all from me, thank you for listening.

Standardizing Standards 8: Implementing and Leveraging an XML Workflow at ISO

Transcript

SERGE JUILLERAT: Hello, and welcome to implementing and leveraging an XML workflow.

I took this photo to express the fact that our journey wasn’t completed in one day, like this natural rocky circus in Switzerland which is called the Creux de Van.

My name is Serge Juillerat and I am Senior Developer for the ISO Central Secretariat. I’ve been with the organisation since 2008 and I’m really happy to introduce you to my journey on the ISO publication chain.

We also have Brian Stanton.

BRIAN STANTON: My name is Brian Stanton, and I am Editorial Group Manager for ISO Central Secretariat. I’m responsible for a team of English and French language technical editors who assist in the development of international standards following the XML workflow.

ISO is an independent non-governmental international organisation with a membership of 162 national standards bodies.

Through its members, it brings together experts to share knowledge, and develop voluntary, consensus-based, market-relevant international standards that support innovation and that provide solutions to global challenges.

As of January 2017, ISO has published over 21,000 international standards and standards-type documents, comprising nearly one million pages. Each year, experts representing over 160 standards bodies take part in more than 1500 technical meetings all around the world.

144 ISO staff coordinate the worldwide activities of ISO from its offices in Geneva.

In 2010, our goal was to improve our production processes and to bring them in line with modern publishing practices. Before 2010, our publishing workflow consisted simply of producing a PDF from a Microsoft Word document.

ISO’s main goals were to improve speed to market, and to streamline ISO production processes.

SERGE: In 2010, we started our journey by implementing Typefi Writer and Typefi Designer. This resolved one bottleneck, typesetting, as Typefi Writer was directly used by editors. This change led to the disbanding of our composition department.

We knew that this setup was temporary and that we needed to decide which flavour of XML we would choose: would it be XML-first or XML-last?

In the end, we chose XML-middle. All the pieces came together. We chose eXtyles for the purposes of editing and producing XML, Typefi for our publishing platform, and the central piece, the NLM DTD, as a base to create the ISOSTS DTD.

This choice made sense, as all the tools were already very familiar with the NLM DTD.
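To give a flavour of what this XML looks like, here is a hypothetical fragment loosely following the NISO STS tag set. The document content is invented, and element details should be checked against the published tag library:

```xml
<!-- Hypothetical fragment, loosely following the NISO STS tag set;
     the content is invented for illustration only. -->
<standard>
  <front>
    <std-meta>
      <title-wrap xml:lang="en">
        <full>Safety requirements for widgets</full>
      </title-wrap>
      <std-ident>
        <originator>ISO</originator>
        <doc-number>99999</doc-number>
        <edition>1</edition>
      </std-ident>
    </std-meta>
  </front>
  <body>
    <sec id="sec_1">
      <label>1</label>
      <title>Scope</title>
      <p>This document specifies safety requirements for widgets.</p>
    </sec>
  </body>
</standard>
```

The same structured metadata and sections drive all of the outputs mentioned later: PDF, HTML on the Online Browsing Platform, and EPUB.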

Before setting up this production environment, the back-catalogue conversion of over 30,000 standards was the acid test.

The ISO collection contained documents going back to the late 1960s—a mix of Microsoft Word and scanned PDFs. The project was planned over two years with a high level of accuracy.

A few milestones for 2010 and 2011. After implementing Typefi Writer in 2010, we started 2011 with the selection of the XML schema. eXtyles was presented to the ISO editors in the course of the year. And ISOSTS version 0.6 arrived at the end of 2011.

We focused on three main areas, content, structure, and layout.

Our content was defined by technical committees using Microsoft Word.

For structure, eXtyles. Working with eXtyles was easy, as it is integrated into Microsoft Word’s ribbon. eXtyles provides powerful editorial tools, and can be customised if you need to add your own processes.

One of our colleagues coined the term ‘machine-generated XML’, which makes a lot of sense, as every export to XML from a particular Microsoft Word document will give you the same XML outputs.
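The point of ‘machine-generated XML’ is determinism: the same styled input always produces the same XML output. The real eXtyles export is far more sophisticated; this hypothetical Python sketch only illustrates the idea of a style-driven, repeatable conversion (the style names and mapping are invented):

```python
# Hypothetical illustration of deterministic, style-driven XML export.
# Real tools like eXtyles are far more sophisticated; this only shows
# why the same styled input always yields the same XML output.

from xml.sax.saxutils import escape

# Assumed mapping from Word paragraph styles to STS-like elements.
STYLE_TO_TAG = {
    "Heading 1": "title",
    "Body Text": "p",
}

def export_xml(paragraphs):
    """paragraphs: list of (style_name, text) tuples."""
    out = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")  # fall back to <p>
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(out)

doc = [("Heading 1", "Scope"), ("Body Text", "This document specifies...")]

# Identical input always yields identical output:
assert export_xml(doc) == export_xml(doc)
print(export_xml(doc))
```

Because the mapping is fixed, there is no typesetter interpretation step: any variation in the XML can be traced back to a variation in the source document.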

For typesetting, Typefi. We use Typefi to generate the products that we make available to our standards users.

A single source allows you to produce multiple outputs. It is user-friendly, generally requires no post-production work, and reinforces our publication chain with brand-compliant publications.

BRIAN: We are creatures of habit, and change is tough. The move to XML publishing is a major task, and it is important to be vigilant, and to be ready to provide support and help to those directly impacted in the change.

Microsoft Word skills are key to mastering complex documents.

In September 2012, we started a pilot to handle surplus production with three additional editors in Manila. The collaboration worked well, so we expanded the team to five, then 11, and finally 14 editors. At one point they were handling the majority of the Final Draft International Standards.

A few milestones for 2012. In May there was the release of the ISOSTS version 1.0 DTD, and at the same time we made available the Online Browsing Platform. We then started the back-catalogue mass conversion. eXtyles and Typefi were deployed in production in the summer.

2013 was largely devoted to completing the back-catalogue conversion. We started with batches of simple documents and then ramped up to more complex documents. As the project went on, we were also able to gain more confidence in the technology.

Outsourcing is a sensitive topic. It is complex to set up, and depends on the IT environment you are able to put in place. Our goal was to focus on flexibility and business continuity.

We also started thinking about accessibility in our PDF documents. As stated in the WHO Disability Report, 10% of the world’s population experience some form of disability.

The technology has enabled us to consistently produce PDF documents with accessibility features.

A few milestones for 2013. We started working on larger and more complex documents such as those produced by ISO/IEC JTC 1.

We kicked off the redline project, which compares two editions of a standard, and at the end of the year we deployed eXtyles SI to automate the submission of documents to ISO.

SERGE: 2014 was dedicated to experimentation. We chose DeltaXML to compare the XML files and to create redline versions of our standards in PDF and HTML on the Online Browsing Platform.
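DeltaXML performs a structural, XML-aware comparison; its API is not shown here. As a toy sketch of the underlying redline idea, comparing paragraph text between two editions with Python’s standard difflib (the sample sentences are invented):

```python
# Toy illustration of a "redline" comparison between two editions of a
# standard. ISO uses DeltaXML for this; the sketch below only shows the
# underlying idea, using Python's standard difflib on paragraph text.

import difflib

old_edition = [
    "The test shall be performed at 20 degrees C.",
    "Results shall be recorded in the test report.",
]
new_edition = [
    "The test shall be performed at 23 degrees C.",
    "Results shall be recorded in the test report.",
]

# unified_diff marks deleted lines with "-" and inserted lines with "+",
# the raw material a redline renderer would turn into strike-through
# and underline formatting.
for line in difflib.unified_diff(old_edition, new_edition, lineterm=""):
    print(line)
```

A real redline tool works on the XML tree rather than plain lines, so it can distinguish a changed requirement from a mere renumbering of clauses.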

We also experimented with different layouts and eye-catching colour covers for standards. As extra help for the users of ISO standards, we created handbooks for high-profile standards in A5 format.

The implementation of eXtyles SI was in response to a request from ISO members to have XML available at Draft International Standard stage. Using eXtyles SI allows us to save resources and time at this stage, although there is still some manual processing to do afterwards, as our documents come in many different flavours.

A few milestones for 2014. In April we completed the back-catalogue conversion. In June, ISO was contacted by Robert Wheeler from ASME to kick off the NISO STS project. And the end of the year was devoted to the Typefi 7 upgrade.

BRIAN: Even though some vendors claimed 100% quality and 100% automation, we knew this was just not realistic.

When we read through the contracts, one thing caught our attention: it was stipulated that the quality check had to be done at the vendor’s end, and nothing was mentioned about any internal quality check. We were even told, ‘It’s digital to digital, what can go wrong? No need to check!’

Don’t believe that. In response to this, we added internal quality control, and shared many best practices with the vendor. Not only was the full budget spent, but also a 20% contingency reserve.

Compared with what we estimated in our contract, we processed four times as many tables, 12 times as many equations, and almost twice as many images. The rule of thumb is, don’t underestimate the complexity of your content.

Because of a perceived drop in the quality of ISO deliverables since the start of XML processing, the ISO Technical Management Board (the TMB) set up a task force in April 2015 to gather feedback.

The majority of issues identified were related to Microsoft Word files, and not the other products. However, the number of ISO corrigenda remained unchanged.

A few milestones for 2015, which was a particularly busy year. At the beginning of the year we started using eXtyles SI on Draft International Standards for automatic styling of documents.

In the course of the year, we implemented an enhanced table model to add flexibility to our workflow. And in October, we started the NISO STS project.

After five years of constantly moving forward, in 2016 we decided it was time to stop, and look back at what we had achieved.

In 2016 we focused on consolidation, and enhanced our production tools with new features like enhanced table processing, and automating the generation of EPUB.

We spent a lot of time with our end users to understand their frustration with the technology, to identify any potential issues, and to see how we could best resolve them.

A few milestones for 2016. The NISO STS project was our ongoing project for the year. In August we opted for Citrix as our main production environment, and the end of the year was spent on the Typefi 8 migration project.

SERGE: In the course of 2017, we set the stage for our future IT environment. The Typefi upgrade project offered more automation and a more robust production environment.

One conclusion of the TMB Task Force on quality was that the Word file provided back to committees was not good enough. We started looking into how we could create a Microsoft Word document from the XML file.

We also added more statistics on our publication chain so we could flag bottlenecks and give management a clearer view of production.

And finally, on the ninth of October, the NISO STS standard was published.

Just two words to conclude: It works!

Our processing time has been reduced and we are publishing more standards each year. The ISOSTS DTD was adopted as the basis of the NISO STS DTD, and we are already seeing new initiatives like STS4i.

We chose the right model to work with, and the ISOSTS DTD covers all our needs. It has been a great journey, and you realise how important it is to work with people who know their stuff.

We’d like to thank everyone who supported the project, with a special mention to Laurent Galichet and Holger Apel. And a huge thank you to Debbie and Tommie, Bruce and team, Chandi and team, Nette, Diwa, Ella and team, Sadhik, Rizwan, Saumya and team.

Thank you very much for listening. We hope this may inspire other companies and organisations to also take the plunge to implement and leverage an XML workflow.