Persistent Identifier Meeting
A workshop was held on persistent identifiers, hosted by JISC, on 3 February 2010.
An overview summary of the meeting can be found here.
The full report of the meeting has been made available via JISCPress in order to facilitate comment and discussion around the various topics covered. The report can be navigated via the Table of Contents on the left from sections 1 through to 10, and is also available for download and printing as a PDF.
Other Pages
Some additional pages have been added to the main document to provide background and move the debate forward in the context of other JISC activities. These are:
- What are Persistent Identifiers?
- Why should we care about Persistent Identifiers?
- Some current approaches
- Persistent Identifiers and the Web
- Persistent Identifiers and the URI-dependent institution
Feedback and comments on these are encouraged.
Useful Links
Links to other information of interest following the February meeting:
- Topsy link to #JISCPID tag on Twitter
- Lorna Campbell’s CETIS blog post which provided a good summary and some extracts
- This CrossTech blog post is part of a useful thread on DOIs, linked data and HTTP URIs
Submitted by: Martin Dow and Stephen Bayliss, Acuity Unlimited. Tel: 020 7100 5625. Email: martin.dow@acuityunlimited.co.uk, stephen.bayliss@acuityunlimited.co.uk
This document represents a summary of the outcomes from a workshop on persistent identifiers, hosted by JISC on 3 February 2010. The overall background and context for the meeting is given in the Briefing Paper and in Chris Awre's introductory presentation, with the primary objectives being: to identify common features of the landscape where JISC investment might yield benefits; and to identify what may be needed or required for the UK landscape. Taking an institutional perspective, C [...]
The primary objectives of providing this summary of the meeting are: to validate the essential points and the common themes distilled from the meeting; to solicit feedback from meeting participants; and to invite additional comments not elicited in the meeting.
Discussions focused on specific domain areas established that stakeholder concerns and needs do not revolve solely around persistent identifiers per se. Central themes for debate also revolved around resource preservation, metadata about resources and their relationships, and ensuring persistence of access to resources. These three recurrent high-level themes, which represent the overall context of persistence, can be summarised as follows: Preservation: The physical location and stor [...]
A candidate set of statements of common agreement was proposed as a basis for moving beyond discussions about the merits of particular identifier regimes. These were: We accept that we are dealing with heterogeneous identifier environments. Well-defined contexts may have their own schemes. Where dominant schemes are used in a particular context (e.g. DOI) then we should encourage their ubiquity. Where there are not dominant schemes, then we would encourage the adoption of HTTP URIs. W [...]
The plenary sessions revolved around identifying common features of the landscape, and identifying where JISC investment might be useful in terms of what is required for the UK landscape. The overall outcomes of the meeting are presented here under some proposed high-level categories, which are: providing services around persistent identifiers; providing advice, toolkits and frameworks; and engagement with the community and increasing awareness.
Primarily it was recommended that JISC should not be involved in building a new persistent identifier management service, responsible for centrally minting and resolving identifiers. However, it was also noted that JISC might play some role in the establishment of a national shared persistent identifier infrastructure service; this might be, for example, a "service of last resort" for cases where "all else fails", whether due to policy or technical reasons. It was noted that the PILIN project [...]
It was suggested that JISC should help create facilities, not authorities – decentralised services that use independently-managed identifiers rather than centralised services that create and manage them. In general JISC should play a role in enabling HEIs to provide the services that researchers want, using identifiers. This could perhaps be in the area of preservation services, although the actual implementation may well be at the institutional level, facilitated by JISC's provision [...]
Although there is existing work in the area of persistent identifiers, it was felt that the advice and guidelines available were sometimes contradictory, perhaps a consequence of the overall heterogeneity of the features of the identifier landscape. A good solution in one context might be a poor choice in another. It was suggested that JISC should have a role in gathering evidence of where persistent identifiers worked in practice across a range of different contexts to identify, dissemin [...]
Where there are successful persistent identifier schemes, their features should be examined; NASA-originated numbers assigned to aeronautical research reports and the Linnaean taxonomy were given as examples. Although neither of these schemes has an associated automated identifier resolution system, their identifier referents can always be found. HTTP URIs can be and are being used successfully as persistent identifiers, and Southampton University's use of Eprints for bibliographic data w [...]
Case studies addressing the following were suggested:
- Illustrate the benefits in “closing the gap” between those with an incentive to maintain identifiers and those whose responsibility it is to do so
- Illustrate costs and benefits by articulating workflows in specific HEIs, and identifying stakeholders and value in each context
- Provide a cost assessment of getting a system running using various persistent identifier schemes and infrastructures
- Illustrate how existing [...]
Cost, and perceived cost, was identified as an issue, and was raised explicitly by the Cultural Heritage group. To address this, it was suggested that JISC develop and provide toolkits and frameworks to assist institutions in the assessment, adoption and implementation choices pertaining to persistent identifier schemes and services.
JISC should draw a line under long-running arguments about particular persistent identifier schemes and instead should focus its efforts on enabling HEIs to choose and implement schemes appropriate to their needs. JISC's focus should be on points of agreement. Clear, easy-to-understand advice should be provided on how an HEI might choose between identifier schemes based on their own needs and contexts. The pros and cons of various approaches in different circumstances, for different purp [...]
The heterogeneity of the factors involved in persistence of identification, particularly in the context of the needs of the various stakeholders, became apparent during the day. In supplying materials and advice, these heterogeneous factors and their relevance to the various stakeholder groups merit consideration. Determining “what makes a good identifier” was expressed as a desirable goal at the meeting. The headings below represent approximate groupings of statements into what are a [...]
A wide variety of entities were identified, with diverse artefacts across the Research Papers, Research Data, Learning Materials and Cultural Heritage groups. The Research Papers, Learning Materials and Administrative Information groups all identified a need for real-world entity identifiers, particularly people identifiers. The Learning Materials group identified a range of other related real-world entities such as modules, courses and lessons whilst the Administrative Information group has [...]
There was evidence that there are a wide variety of preservation, curation and access responsibilities, and that these responsibilities can change. The Research Papers group identified a need to deal with resources where the responsibility for curation largely lies outside the control of this group, particularly grey literature and self-published literature. These currently have little by way of curation policy, where publication rather than preservation is seen as the primary objective, de [...]
Scalability refers to the number of resources to be identified, which has an impact on systems involved in persistent identification. This was seen as an issue for the Research Data group. The Administrative Information group suggested that a selection and appraisal process is required, as identifiers cannot be assigned for every conceivable thing, and capturing metadata to describe the contextual setting of certain information may be impractical.
The need to identify component parts of an entity was noted as a requirement by the Research Data group, for whom it is important that the data structures involved, their relationships and derivations, are represented to a sufficient level of granularity. Aggregation of entities into identifiable composites was identified as a use case by the Cultural Heritage group. Learning Materials identified a similar use case in dealing with complex, composite learning materials that can be multi-level, wi [...]
Knowing the kind of entity being identified in a certain context, for example the role a person may have, was a requirement expressed by the Research Papers and Administrative Information groups, with the latter noting that identifying the entity type is necessary for parties to reach agreement about something, as they need to know whether they are referring to the same thing. Metadata describing the context in which an identified entity is used was identified as an underlying requirement [...]
Clarity about rights (for example, database rights) in identifiers, aggregations of identifiers, etc. was expressed as a need, and it was noted that this relates to the development of viable business models for the sector. The Digital Economy Bill may have implications for persistent identifiers in relation to linking to publicly-available material. As the bill goes through the parliamentary process, the implications or otherwise of this will become clearer.
It was felt that JISC should play a useful role in raising awareness and engaging with those various communities and stakeholders who have interests and needs around persistent identifiers, both within the HEI community and with external organisations. It was noted that persistent identification issues are not merely technical but also organisational, and that by raising awareness and engaging with the community JISC can start to impact on practice. Where possible, existing channels such a [...]
Awareness should be raised of existing work in the field of persistent identifiers, and the materials produced by this work should be more widely disseminated. Specifically noted was the work done on services that use identifiers by the National Archives[1] and the RIDIR project ("Lost resource finder"). It was stated that the community lacks widely understood policies, and that there are cases where there is a lack of awareness of how to best use existing identifier schemes. A parti [...]
JISC should identify the communities and people for whom persistent identifiers really matter and work with them. JISC's role should be one of facilitation rather than intervention. A cultural change programme with the Universities and Colleges Information Systems Association (UCISA) and the Society of College, National and University Libraries (SCONUL) etc could be initiated to encourage people in HEIs to care about identifiers. JISC should develop a "group of common statements in which [...]
Chris Awre set the scene for the day with a presentation entitled: The need for “persistent identifiers”, What are we talking about – “persistence of identifiers”? - Chris Awre. Chris's presentation can be found at: http://docs.google.com/leaf?id=0B93tJ2TZe3khYWQxNmQ0ZWYtNzcwZi00ZDZhLWIwNjMtNzc3YTk4Y2IzYzQ0&hl=en Chris's presentation provided an introduction and an overall context for the meeting, examining why the meeting was taking place and the background initiatives that h [...]
Henry Thompson's presentation can be found at: http://www.ltg.ed.ac.uk/~ht/JISC_2010/ Henry examined the HTTP URI approach for persistent identifiers, presenting some basic assumptions on the sharing of resources and what URIs are for and exploring what it is to name something. He presented definitions of desirable characteristics in persistent identification. He identified that FRBR is a useful starting point for thoughtful ontology in the issue, and concluded with some concrete recommen [...]
Paul Walk's presentation can be found at: http://bit.ly/bgrhWM. Paul examined the Handle / DOI approach, stating that DOI is a business proposition. He gave an overview of what DOIs are, the syntax, metadata and services around DOIs. The business model was examined, and the current coverage of DOIs was presented in terms of the entities they are currently minted for. He examined the general requirements in the UKHE sector and how DOIs might represent a value proposition, concluding that [...]
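As a rough illustration of the DOI syntax mentioned above, the following Python sketch splits a DOI name into its prefix and suffix and builds an HTTP URI for it. It is not taken from Paul's presentation: the function names are hypothetical, the resolver base `https://doi.org/` is assumed to be the public DOI proxy, and `10.1000/182` is used only as a sample DOI name.

```python
from typing import Tuple
from urllib.parse import quote

# Assumption: the public DOI proxy is used as the resolver base.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.strip().partition("/")
    # A DOI prefix starts with the directory indicator "10." and the
    # suffix must be non-empty.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def doi_to_http_uri(doi: str) -> str:
    """Build an HTTP URI that a DOI proxy could redirect to the resource."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters that are unsafe in a URI path,
    # leaving any "/" inside the suffix untouched.
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"


print(doi_to_http_uri("10.1000/182"))
```

The sketch only checks the overall shape of a DOI name; it does not confirm that the DOI is registered or that it resolves.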
Martin Dow and Steve Bayliss's presentation can be found at: http://docs.google.com/fileview?id=0B93tJ2TZe3khMTc1YzJmNTktMjJiYi00ZGIwLTkwNjItOWY2NDVhMjU1YTVj&hl=en Steve gave an overview of the RIDIR project, identifying that the main objectives were to engage with the identifier and repositories communities to understand their requirements and to build a fully working demonstrator, and to raise awareness of persistent identifier interoperability issues. He stated that the project was n [...]
Andrew Treloar's presentation can be found at: http://docs.google.com/fileview?id=0B93tJ2TZe3khMzcxMmMwZGYtYjE4Ny00MmZkLTk3YmEtYzcyOGQxYWZmNTA1&hl=en Andrew's presentation was on the ANDS "Identify My Data" service and related issues. He explained that the function of ANDS was to establish the Australian Research Data Commons, with a vision of more researchers re-using more data more often. It builds on the work of PILIN through the guides produced, the pilot infrastructure, the softw [...]
Bas Cordewener's presentation can be found at: http://docs.google.com/leaf?id=0B93tJ2TZe3khYjE3YmFjM2UtNzc0YS00N2UxLTljYmMtOWIxZmRkODUzZjQ4&hl=en Bas presented an overview of the persistent identifier activities for SURF and Knowledge Exchange. The SURFShare programme (2007-2010) is based on the Digital Academic Repositories Project (DAREnet) in the areas of interoperability, communication, registration, sustainability and dynamic archiving. The programme is aimed at apparent changes i [...]
Hugh Glaser's presentation can be found at: http://eprints.ecs.soton.ac.uk/18452/ Hugh's presentation was based on the notion of distributed authority – what to do when authorities collide, what is the granularity of an authority, what to do when we have PIDs and how to manage the infrastructure. Hugh established that each metadata owner publishes its own identifiers, and that is the right thing to do as their workflow depends on it, they own the concept of the entity, and there would be [...]
Informational and non-informational resources
The W3C Technical Architecture Group (TAG) identifies two top-level types of entities that need identifiers on the web – informational resources, such as web pages, documents, and data accessible on the web; and non-informational resources, real-world entities such as people and organisations.
Resolving identifiers
A fundamental difference between identifiers for informational and non-informational resources is what they resolve to. Inform [...]
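One common convention for this difference in resolution, assuming the W3C TAG's httpRange-14 guidance (which the text visible here does not spell out), is that a 2xx response to an HTTP GET indicates an informational resource, while a 303 See Other redirect leaves the nature of the resource open and points to a description of it. The Python sketch below encodes only that rule; the function name and the returned labels are hypothetical.

```python
def classify_by_status(status: int) -> str:
    """Interpret an HTTP status code under the assumed httpRange-14 rule."""
    if 200 <= status < 300:
        # A successful response carries a representation of the resource,
        # so the URI is taken to identify an informational resource.
        return "informational resource"
    if status == 303:
        # "See Other": the URI may identify anything, including a person
        # or organisation; the redirect target describes it.
        return "any resource; the redirect target describes it"
    # Other responses (for example 404) say nothing about the kind of resource.
    return "unknown"


print(classify_by_status(303))
```

A real client would also have to follow redirect chains and handle content negotiation, which this sketch leaves out.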
A full snapshot of the Twitter backchannel can be found at: http://docs.google.com/fileview?id=0B93tJ2TZe3khNTM1NmMwZmUtYTY0OC00MTI1LWI1ZDctMmExOTE2NmE2N2Zh&hl=en
Presented below are the main (unattributed) tweets captured during and shortly after the meeting, in chronological order.
- CrossRef was an enabling factor for DOI
- DOI is not perfect, but it won't die
- Desire for more than a re-hash of Cool URIs
- DOI is an implementation of an HTTP redirect mechanism, that is its technica [...]
This page summarises the outcomes from the JISC Persistent Identifiers (“JISCPID”) Meeting on 3 February 2010. The meeting was the focus of an activity to address an increasing awareness of the importance of persistent identifiers, partly as repository usage matures, identifier systems and schemes proliferate, standardised identifiers for entities such as organisations and people become more important, and interest rises in the use of web identifiers for open and linked data. The scope and [...]
This material was prepared as part of the JISC Persistent Identifiers Activity. One of its objectives is to stimulate discussion around these topics, so please add your thoughts and comments. There are a great many resources available on the subject of Persistent Identifiers and their scope. Some points of reference were created for the Persistent Identifier Meeting [1], and a glossary drafted [2]; recommended introductions are available[3][4]. This page intends to give a flavour of the topic. [...]
If you publish information on the web, you want to ensure that people can find it, and when they go back to URLs they have previously bookmarked, you want that URL to work. If your information is referenced by other people you want to ensure that someone following up that reference in years to come can still access the information you published. Persistence is not just about identifiers. Persistence of access to the resource is important – where the resource is stored and what happens i [...]
Studies[1] have compared various existing schemes by various criteria. A full and up-to-date list of commonly-considered PID schemes is given at http://repinf.pbworks.com/Persistent-identifiers . Amongst the categories most relevant to the PID meeting discussions were those underlying notions of trustworthiness, robustness and reliability:
- Governance – use of established identifier schemes, conventions, practices and policies
- Reliability – that an identifier can always be used to [...]
This section discusses how developments on the web might impinge on the more preservation-oriented practices around persistent identifiers. It puts forward a viewpoint that these developments require acknowledgement and action.
Web terms and processes
URLs and HTTP URIs: A Uniform Resource Locato [...]
Identifiers should deliver value through curation
Every institution today mints identifiers on the web, and their web servers resolve them to digital resources. Every institution has an actionable URI space. As the web as a medium becomes increasingly pervasive and reaches into institutional and pr [...]