eXtyles and ORCID

ORCID is “an open, non-profit, community-based effort to provide a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers” (http://about.orcid.org/). Many partners in the scholarly research endeavor have been quick to show support for the initiative, including individual researchers, research organizations, research funders, professional and learned societies, publishers, and abstracting and indexing organizations.

The ORCID registry was officially launched in October 2012, and, by the end of that year, over 40,000 individuals had registered for iDs. The latest figures from ORCID (July 2013) show that there are now almost 200,000 registered ORCID iDs (of which about 65% are associated with a verified email address), and the number of registered works is already over 1.5 million. Free services that are currently available include the ability to register, maintain, and share an ORCID identifier and associated record data; the ability to search ORCID records and view public data; and access to the Public API, the ORCID source code, and tiered documentation.

In addition, ORCID already has 65 members, organizations that have paid a membership fee in order to support the development of ORCID and also to allow them to access the member API. Services provided by the member API include integrating authenticated ORCID iDs into manuscript-submission systems and grant application processes, retrieving ORCID data into internal systems such as institutional repositories, linking ORCID identifiers to other IDs and registry systems, and, for employing organizations, the ability to create ORCID records on behalf of their employees or affiliates.

Some use cases are very clear: research funders and institutions want to track the research output of their grant-holders or faculty. ORCID gives a way to index this information across an individual’s career, but importantly it also supports unique identification of researchers, which is particularly helpful for researchers with common names, name changes, or names that use special characters. Funders are already acting on this. The US National Institutes of Health are releasing their SciENcv platform this summer, a grant application platform that incorporates ORCID identifiers, and the US Department of Energy has integrated ORCID identifiers into their eAPPs system. The Wellcome Trust, the National Institute for Health Research, and the Japan Science and Technology Agency are all starting the process of integrating ORCID identifiers into their grant application processes, a clear indication of the value that funders place on this information. It seems clear that researchers will increasingly need to register for an ORCID iD when they apply for grant funding. The ORCID iD seems destined to become a central part of the mechanics of academic research.

Unsurprisingly, there is a lot of interest in ORCID in the scholarly publishing community. At the November 2012 eXtyles User Group meeting, ORCID was one of the hottest topics of conversation among eXtyles customers. The Editorial Manager online manuscript-submission system, from ORCID member Aries Systems, already accepts ORCID iDs for authors and allows submitters of manuscripts to generate iDs for co-authors who do not yet have them. With the launch of JATS 1.0, the latest version of what was previously the NLM DTD, there is now support for contributor IDs in a widely adopted and supported XML format. Many eXtyles customers are keen to incorporate ORCID iDs into their eXtyles workflow.

However, what is less clear at the moment is when and how publishers will incorporate ORCID identifiers in a publication workflow. There is clearly something of a stampede going on, but where exactly is it headed?

Opportunities and challenges for publishers of scholarly content

Benefits for researchers

There are several benefits for researchers as the ORCID identifier becomes widely adopted and is supported by the manuscript-submission systems and publishing programs of the publications to which they submit their work.

It is already possible to envision a situation in which an author will need to provide only their ORCID iD when they use a manuscript-submission system, removing the need to maintain several profiles across the multiple publisher sites that a typical researcher will use. If they get a new email address or move departments or institutions, all they need to do is update their ORCID record and that change should flow into all the manuscript-submission systems with which they interact.

Some publishers, including Nature and Hindawi, are passing ORCID iDs through the entire publication process into their final published XML content. In these instances, the online version of the article contains an actionable ORCID iD, expressed as a URL, and the publisher also deposits the XML to CrossRef including ORCID iDs. CrossRef has been receiving ORCID iDs since March 2013, and PubMed received its first ORCID iD in April. This means the author may no longer need to update their publication record on the ORCID system by hand: a publisher deposit to CrossRef that includes ORCID iDs could, in principle, update an author’s ORCID record automatically.

Benefits for publishers

How much time is spent in the average editorial office chasing down authors whose contact details have changed? Whether it’s unreturned proof corrections, a missing copyright form, or even trying to follow up an accusation of publishing malpractice after publication, trying to track down authors can be a thankless task.

As authors increasingly see an incentive to keep their ORCID record up to date, so publishers may find it much easier to contact them. They will simply require each author to provide an iD, and their tracking systems can be set up to use the iD to provide the latest contact information.

It will also be easier for publishers to track where their submissions are coming from, to see which institutions, disciplines, countries, or regions are over- or underrepresented, and to use that information for business development and marketing efforts.

But … who’s going to do the work, and how?

All this sounds very exciting and a great way to save time and effort all around and to provide a better service to authors and readers, as well as funders and institutions. However, there are some limitations to implementing the ORCID iD in an end-to-end XML workflow today. Let’s do a thought experiment and imagine the publication cycle of a typical paper.

Suppose an author submits a manuscript using an online submission system, and includes her ORCID iD as required. Even better, she may provide ORCID iDs for those co-authors who already have them and create new iDs for those not yet registered. The paper goes out for peer review, maybe gets revised, and eventually is accepted for publication. Depending on the discipline, this revision cycle can last several months or even a year or more. In that time, authors may have been added to or removed from the paper, but the lead author may update only the author list on the manuscript, and not the list in the submission system, so the metadata with the ORCID iDs is now incomplete.

Once the paper has been accepted for publication, what happens next is critical. The ORCID iDs are stored in the manuscript-submission system, but how do they find their way into the manuscript itself? Best practices call for automatic merging rather than rekeying of metadata, especially long strings of opaque numbers like ORCID iDs, but many publishers (or their suppliers) do not have systems to automatically merge metadata. The problem is even more complex when considering the potential for mismatches between the author list in the accepted paper and that held in the submission system. Because authentication is a key part of ORCID, any system to merge the iDs will need an authentication process.

Additionally, many publishers have not decided what role, if any, an ORCID iD will have in the manuscript itself. In our discussions with publishers, we found some who plan to include the iD in the author line, but others who do not want to include it at all in the manuscript file. However, if the ORCID iD is not included with the manuscript, adding it later may be problematic.

At the level of least automation, ORCID iDs could be added to the manuscript by hand, either by the author or by the publisher. eXtyles could be modified to support this workflow by adding a character style for the iD, which could then be exported to XML.
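To make the XML end of that workflow concrete, here is a minimal Python sketch of what an exported author entry might look like in JATS 1.0, which carries contributor IDs in a contrib-id element. This is an illustration only, not a description of actual eXtyles output: the function name build_contrib and the author "Jane Smith" are hypothetical, the iD is the placeholder value from ORCID's own display guidance, and the element order reflects our reading of the JATS 1.0 contrib model.

```python
import xml.etree.ElementTree as ET


def build_contrib(surname, given_names, orcid=None):
    """Build a JATS-style <contrib> element for one author.

    Hypothetical helper for illustration; not part of eXtyles.
    """
    contrib = ET.Element("contrib", {"contrib-type": "author"})
    if orcid:
        # JATS 1.0 contributor identifier, typed as an ORCID iD
        # and expressed in the URI form.
        contrib_id = ET.SubElement(
            contrib, "contrib-id", {"contrib-id-type": "orcid"}
        )
        contrib_id.text = "http://orcid.org/" + orcid
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given_names
    return contrib


# Hypothetical author with the placeholder iD.
element = build_contrib("Smith", "Jane", "0000-0001-2345-6789")
print(ET.tostring(element, encoding="unicode"))
```

An author without an iD simply produces a contrib element with no contrib-id child, so the same export path would serve both cases.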

If an ORCID iD is not included as part of the manuscript as it enters production, it could potentially be added to the manuscript at a later stage. For example, an eXtyles Advanced Process could be developed to look up iDs based on each author’s name and affiliation. However, there is currently no method in the ORCID registry to submit a query with an author name and affiliation and retrieve an iD. ORCID is working to develop this feature.

Another possibility might be to add ORCID iDs with an author/affiliation lookup after the XML has been created. This is a particularly attractive solution when considering the vast amounts of legacy content that have been mounted online in the last decade, most of which already has XML headers, if not full text. This has the potential to generate the ORCID “Big Bang”, with huge numbers of links between already published articles and their authors being available overnight.

However, as already mentioned, the author/affiliation lookup feature is not yet available from ORCID. And links to content published when the author was at a previous institution would presumably only be possible if the author had already listed that affiliation on their ORCID record. Authentication of automatically added iDs could also prove challenging.

Yet another possibility for new content would be for submission-system vendors to build APIs that allow XML file submission. The XML file returned to the publisher would merge the iDs from the submission system and/or report an error if an alignment conflict, as described above, was found between the author list in the submission record and the final XML. Such errors would require manual intervention by the publisher.
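The merge-and-report step described here can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not any vendor's API: the names MergeReport, normalize, and merge_orcid_ids are hypothetical, the authors are invented, and matching on a normalized name string is a deliberate simplification that a real system would need to replace with something more robust and with an authentication step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MergeReport:
    """Result of aligning manuscript authors with submission records."""
    merged: Dict[str, str] = field(default_factory=dict)  # author -> iD
    missing_in_submission: List[str] = field(default_factory=list)
    missing_in_manuscript: List[str] = field(default_factory=list)


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace (a naive matching key)."""
    return " ".join(name.lower().split())


def merge_orcid_ids(manuscript_authors: List[str],
                    submission_records: Dict[str, str]) -> MergeReport:
    """Attach iDs from the submission system to the manuscript's authors
    and record every alignment conflict for manual follow-up."""
    by_key = {normalize(n): (n, orcid) for n, orcid in submission_records.items()}
    report = MergeReport()
    matched = set()
    for author in manuscript_authors:
        key = normalize(author)
        if key in by_key:
            report.merged[author] = by_key[key][1]
            matched.add(key)
        else:
            # Author appears in the final manuscript but not in the
            # submission record (for example, added during revision).
            report.missing_in_submission.append(author)
    for key, (name, _orcid) in by_key.items():
        if key not in matched:
            # Author is in the submission record but no longer in the
            # manuscript (for example, removed during revision).
            report.missing_in_manuscript.append(name)
    return report


# Hypothetical data: one author added and one removed during revision.
report = merge_orcid_ids(
    ["Jane Smith", "Wei Chen"],
    {"jane  smith": "0000-0001-2345-6789", "Omar Haddad": "0000-0002-1825-0097"},
)
print(report)
```

Anything that lands in either of the two "missing" lists is exactly the kind of conflict that would require manual intervention by the publisher.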

As we can see from these examples, there are still some challenges to overcome to integrate ORCID iDs into an XML publishing workflow.

What does a published ORCID iD look like?

The simple answer is, it’s too early to tell. Presumably, online hosting platforms are planning to implement the ORCID iD. In the online environment, one implementation might be to have each author’s name be a link to their ORCID record. The first article in Nature Immunology to contain actionable ORCID iDs was published online on 2 June 2013, and Hindawi has since followed suit.

Is there going to be a print rendering of the iD? Many publishers include DOIs as plain text (which may also be a link) in their reference lists, mainly for the benefit of readers of a print instance of the article (including a PDF printout), so perhaps there will also be a need for a format for the ORCID iD in print. On 21 February 2013, ORCID recommended that iDs should be represented as a URI, in the form “http://orcid.org/0000-0001-2345-6789”, analogous to the default print representation of DOIs. Quite a bulky piece of text, particularly on a genome article with hundreds of authors; the “limited space” recommendation, dropping the “http://” prefix, is only a marginal improvement in such cases.
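For anyone building the two display forms into a workflow, the following Python sketch renders an iD either as the full URI or in the "limited space" form, and checks it first. The function names are our own, for illustration. The check character is not discussed above; it comes from ORCID's published identifier structure, in which the final character is an ISO 7064 MOD 11-2 checksum (a digit or "X") over the preceding 15 digits.

```python
import re

# Four hyphen-separated groups; the last character may be a digit or X.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def checksum_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string has the ORCID iD shape and a correct checksum."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return checksum_char(digits[:15]) == digits[15]


def render_orcid(orcid: str, limited_space: bool = False) -> str:
    """Render an iD as a URI, or without the scheme where space is tight."""
    if not is_valid_orcid(orcid):
        raise ValueError("not a valid ORCID iD: " + repr(orcid))
    prefix = "orcid.org/" if limited_space else "http://orcid.org/"
    return prefix + orcid


print(render_orcid("0000-0001-2345-6789"))
print(render_orcid("0000-0001-2345-6789", limited_space=True))
```

Validating before rendering is a cheap guard against the rekeying errors that long opaque strings invite, although it cannot confirm that an iD belongs to the named author.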

However, it is interesting that none of ScienceDirect, Atypon, HighWire Press, or SilverChair gives any prominence on their websites to what they are doing to support the ORCID iD. Presumably this indicates that online hosts have not yet finalized their implementation plans. Even so, with some journals beginning to include ORCID iDs in their published content, it’s likely that the other hosting platforms have their plans well advanced.

Call to arms: Who is going to come up with the publisher use case?

There’s no denying that there is a buzz about ORCID. In particular, funders and institutions are understandably excited at the prospect of being able to track the publications of their grantees and staff more easily. In the UK, a broad group of funders and academic management bodies, including the Wellcome Trust, have signed a joint statement expressing their support for the ORCID initiative. A similar statement has been issued by a group of funders and universities in Sweden.

Editorial Manager was the first mainstream manuscript-submission system to provide support for ORCID iDs, and, in November 2012, Manuscript Central also announced support for iDs in a press release. eJournalPress also now supports ORCID iDs. So, the manuscript-submission systems are increasingly able to support ORCID, and funders and institutions are keen to start harvesting the invaluable data that they see ORCID delivering.

Authors and/or publishers have the means to get ORCID iDs into the publication process right up front, and there is a clear demand to get them out again at the end. What’s missing now is a clear path for how publishers can harvest ORCID iDs from authors, validate them, incorporate them into author manuscripts, pass them through editorial and into production, and finally render them in print and online environments. No doubt this will be a key topic of discussion at the next ORCID Outreach Meeting, taking place in Washington, DC, on 30 October 2013. Inera is keen to work with customers as that path becomes clear, in order to add appropriate tools in eXtyles to help manage and propagate ORCID iDs in the publishing workflow.

Acknowledgment: We are grateful to ORCID Executive Director Laure Haak for her helpful comments on an earlier draft of this piece.
