Reliable Electronic Citations

Reliable Electronic Citations are methods for citing documents within HTML and XML, whether as direct links, as metadata, or both.

Using Reliable Electronic Citations

To cite legislation or similar documents reliably, consider the following checklist:

  • Unless it is impossible, the citation should include an actual link (URL) to an authoritatively published version of the document.
  • When the authoritative publisher provides multiple formats, link to the one that best represents the document electronically. Avoid invalid, non-well-formed, inaccessible, and binary formats (e.g., TIFF, JPEG) when more data-friendly formats are available.
  • When available, use anchors to cite the specific section.
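As a rough illustration of the checklist, the sketch below picks which authoritative URL to cite. It assumes a hypothetical `PublishedFormat` record and a hypothetical preference order (structured text first, binary images last); the example.gov URLs are placeholders and do not describe any publisher's actual offerings.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical preference order, most data-friendly first. Binary image
# formats (TIFF, JPEG) come last, matching the checklist's advice.
FORMAT_PREFERENCE = ["xml", "xhtml", "html", "txt", "pdf", "tiff", "jpeg"]


@dataclass(frozen=True)
class PublishedFormat:
    """One format offered by the authoritative publisher (hypothetical record)."""
    fmt: str                # e.g. "html", "pdf"
    url: str                # URL at the authoritative publisher
    supports_anchors: bool  # can a #fragment point into a section?


def rank(f: PublishedFormat) -> int:
    # Unknown formats sort after every known format.
    if f.fmt in FORMAT_PREFERENCE:
        return FORMAT_PREFERENCE.index(f.fmt)
    return len(FORMAT_PREFERENCE)


def best_citation(formats: List[PublishedFormat],
                  section_anchor: Optional[str] = None) -> str:
    """Return the URL to cite: the most data-friendly authoritative format,
    plus a section anchor when that format supports one."""
    if not formats:
        raise ValueError("no authoritative version is available to cite")
    best = min(formats, key=rank)
    if section_anchor and best.supports_anchors:
        return f"{best.url}#{section_anchor}"
    return best.url


offered = [
    PublishedFormat("pdf", "https://publisher.example.gov/act.pdf", False),
    PublishedFormat("html", "https://publisher.example.gov/act.html", True),
]
print(best_citation(offered, "sec-101"))
# https://publisher.example.gov/act.html#sec-101
```

When only a format without anchors is offered, the sketch falls back to citing the whole document rather than inventing a section link.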

When no document posted on the Internet is available to cite by URL, avoid the temptation to create one's own or to use a non-authoritative version. In some cases, the authority may have decided to use non-resolvable URIs or URNs for electronic citations. Even then, always look for an alternative version of the document that the authority has posted, and cite both.

Conditions for Using Non-Authoritative Electronic Citations

Although non-authorities should avoid creating an alternative electronic citation system, it is still possible to create an alternative copy of a document and cite that version. In this case, it is important to make the relationship between the authoritative electronic document and the alternate version clear by always including a link from the alternative version to an authoritative version. That way, if someone else incorrectly decides to cite the alternative version, the actual version is still only one simple link away. This also fits the emerging understanding of Linked Data, where such explicit links help with the automated connecting of data. (Note: the authoritative publisher refers to the body or person that is the author, owner, and/or official publisher or distributor. Government entities probably have clear rules on which bodies are designated, with some historical exceptions.)

The main reason to cite a non-authoritative version is to refer to information that is not in the authoritative version, such as annotations, comments, related links, additional related material, additional metadata, and applications based on the document. Even when knowingly citing a non-authoritative version, it is recommended to also link to the most authoritative version.
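One way to make sure the authoritative link always travels with a non-authoritative citation is to generate both links together. The sketch below renders such a pair as HTML; the function name, the wording "authoritative version:", and the example URLs are illustrative assumptions, while `class="cite"` is borrowed from the examples at the end of this page.

```python
from html import escape


def paired_citation(label: str, copy_url: str,
                    official_label: str, official_url: str) -> str:
    """Render a link to a non-authoritative copy that always includes
    a second link to the authoritative version."""
    # escape() with its default quote=True makes text and URLs safe inside
    # element content and double-quoted attributes (& becomes &amp;, etc.).
    return (
        '<span class="cite">'
        f'<a href="{escape(copy_url)}">{escape(label)}</a> '
        '(authoritative version: '
        f'<a href="{escape(official_url)}">{escape(official_label)}</a>)'
        '</span>'
    )


print(paired_citation(
    "Sec. 101 (annotated copy)", "https://copy.example.org/act#sec-101",
    "Official text", "https://publisher.example.gov/act.html",
))
```

Because the function takes both URLs as required arguments, a caller cannot produce the alternative citation without also supplying the authoritative one.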

Suggestions for Non-Authoritative Electronic Documents and Their Citations

First, make sure the copy of the document(s) is as true to the original as possible, and include a link to the authoritative version wherever possible. Also document how the copy was transformed from the original (e.g., OCR, XSLT, hand-coded). Make sure that the copy is either valid XHTML or XML with a good XML Schema, plus XSLT or CSS for human readability. Avoid JavaScript-generated text for the document text itself. Always make sure that the document or web page is fully accessible and, if copying U.S. government documents, fully compliant with Section 508.
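Some of these checks can be automated. The sketch below uses Python's standard `xml.etree.ElementTree` to test that a copy is well-formed XHTML, links to the authoritative version, records how it was transformed, and contains no `<script>` element. The `transformation-method` meta name is a hypothetical convention; schema validation and Section 508 review are outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML = "{http://www.w3.org/1999/xhtml}"
# Hypothetical <meta> name for recording how the copy was produced (OCR, XSLT, ...).
PROVENANCE_META = "transformation-method"


def check_copy(xhtml: str, official_prefix: str) -> List[str]:
    """Return problems found in a non-authoritative XHTML copy (empty list if none).

    Checks well-formedness only, not schema validity. Named entities such as
    &nbsp; are not predefined in XML, so prefer numeric references (&#160;)."""
    try:
        root = ET.fromstring(xhtml)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    hrefs = [a.get("href", "") for a in root.iter(XHTML + "a")]
    if not any(h.startswith(official_prefix) for h in hrefs):
        problems.append("no link to the authoritative version")
    metas = [m.get("name") for m in root.iter(XHTML + "meta")]
    if PROVENANCE_META not in metas:
        problems.append("transformation method is not documented")
    if next(root.iter(XHTML + "script"), None) is not None:
        problems.append("contains <script>; document text should not depend on JavaScript")
    return problems


copy = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Copy</title>'
        '<meta name="transformation-method" content="XSLT"/></head>'
        '<body><p><a href="https://publisher.example.gov/act.html">Official text</a></p>'
        '</body></html>')
print(check_copy(copy, "https://publisher.example.gov/"))  # []
```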

Since the electronic citation will be a URL, make it track as closely as possible what the written citation would indicate. A good test is whether an average person could understand how the URL is composed and could find the document elsewhere using the information embedded in the URL. The best practice is to use or create an XML Schema for the constituent variables of the citation and/or a Repository Schema (which includes URL discovery). For example, a U.S. bill document is identified by four variables (and several more for the parts of the document): the congress, an integer that corresponds loosely to two-year periods starting in 1789; the bill type, a simple enumeration; the bill number, an integer starting at one; and the bill version, also an enumeration. Then, with a simple URL templating system, publish the document(s) at URLs that are as simple as possible. In addition, consider using regular expressions both to validate URLs and to extract the variables back out of them. Beware of leading zeros, which let one document acquire several URLs, and of variables that are not separated from one another: "1111" could mean the 111th Congress, bill 1, or the 11th Congress, bill 11.
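The bill example can be sketched as a template paired with a regular expression that reverses it. The URL layout `/bills/{congress}/{bill_type}/{number}/{version}` is a hypothetical choice, the two enumerations are only subsets of the lowercase codes of the kind GPO uses (e.g., "hr" for a House bill, "enr" for the enrolled version), and `congress_for_year` is an approximation that ignores exact convening dates.

```python
import re
from dataclasses import dataclass

# Subsets of the two enumerations; a real system would list every value
# in its XML Schema.
BILL_TYPES = {"hr", "s", "hres", "sres", "hjres", "sjres", "hconres", "sconres"}
BILL_VERSIONS = {"ih", "is", "rh", "rs", "eh", "es", "enr"}

# Hypothetical URL template: one path segment per variable.
TEMPLATE = "/bills/{congress}/{bill_type}/{number}/{version}"

# Reverse of the template. Numbers may not start with 0, so "1" and "001"
# cannot become two URLs for the same bill, and the slashes keep
# congress 111 + bill 1 apart from congress 11 + bill 11.
URL_PATTERN = re.compile(
    r"/bills/(?P<congress>[1-9][0-9]*)/(?P<bill_type>[a-z]+)"
    r"/(?P<number>[1-9][0-9]*)/(?P<version>[a-z]+)"
)


@dataclass(frozen=True)
class BillCitation:
    congress: int
    bill_type: str
    number: int
    version: str


def validate(c: BillCitation) -> None:
    if c.congress < 1 or c.number < 1:
        raise ValueError(f"congress and bill number start at 1: {c}")
    if c.bill_type not in BILL_TYPES or c.version not in BILL_VERSIONS:
        raise ValueError(f"unknown bill type or version: {c}")


def to_url(c: BillCitation) -> str:
    validate(c)
    return TEMPLATE.format(congress=c.congress, bill_type=c.bill_type,
                           number=c.number, version=c.version)


def from_url(url: str) -> BillCitation:
    m = URL_PATTERN.fullmatch(url)
    if m is None:
        raise ValueError(f"not a bill citation URL: {url!r}")
    c = BillCitation(int(m["congress"]), m["bill_type"],
                     int(m["number"]), m["version"])
    validate(c)
    return c


def congress_for_year(year: int) -> int:
    """Approximate: each Congress spans two years starting in 1789."""
    return (year - 1789) // 2 + 1


recovery_act = BillCitation(congress_for_year(2009), "hr", 1, "enr")
print(to_url(recovery_act))                             # /bills/111/hr/1/enr
print(from_url("/bills/111/hr/1/enr") == recovery_act)  # True
```

Running every published URL through `from_url` and back through `to_url` is a cheap way to confirm that each document has exactly one URL.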

Be careful when deciding to divide an integral document across multiple URLs or web pages. First try to use anchors, divs, and/or other block elements to divide the parts of a document. Avoid using variables that do not appear in the human-readable form of the citation.
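One way to keep section anchors in human-readable form is to derive each fragment id directly from the section's printed designator. This is a hypothetical scheme, not an established standard: letters and digits are kept in order and joined by hyphens, a `sec-` prefix is added when the result would start with a digit, and collisions are rejected rather than silently renumbered.

```python
import re
from typing import Dict, List


def section_fragment(designator: str) -> str:
    """Derive a fragment id from a human-readable designator,
    e.g. 'Sec. 1512(c)(2)' -> 'sec-1512-c-2'."""
    # Keep the letters and digits a reader sees, in order, separated by hyphens.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", designator)
    if not parts:
        raise ValueError(f"nothing usable in {designator!r}")
    frag = "-".join(p.lower() for p in parts)
    # HTML 4 requires ids to begin with a letter, and XML ids cannot begin
    # with a digit, so prefix a (hypothetical) "sec-" when needed.
    return frag if frag[0].isalpha() else "sec-" + frag


def fragments(designators: List[str]) -> Dict[str, str]:
    """Map each designator to its fragment id, refusing collisions."""
    seen: Dict[str, str] = {}
    for d in designators:
        frag = section_fragment(d)
        if frag in seen.values():
            raise ValueError(f"{d!r} collides with another section as {frag!r}")
        seen[d] = frag
    return seen


print(fragments(["Title I", "Sec. 1512(c)(2)", "1.3"]))
# {'Title I': 'title-i', 'Sec. 1512(c)(2)': 'sec-1512-c-2', '1.3': 'sec-1-3'}
```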

TheRecoveryAct.org Example of Non-Authoritative Citations

Many groups and individuals would like to publish data and comments related to specific sections of the Recovery Act. Unfortunately, there is no standard document format for legislation that allows for specific electronic citations. Since the Thomas site has no clear formats for legislation as of 2009, there is no reliable way to cite a bill version or part of a bill.

Fortunately, there is a Handle System link to a summary page that in turn links to the actual bill versions. (It would take several pages to explain the inconsistencies in the use of XML, HTML, and text, as well as URLs that disappear.) The GPO is an authoritative publisher of legislation and public laws, but for some reason it publishes only plain text and PDF, a binary format (with no embedded XML). It appears impossible to link to a section of the text or PDF version. The digital signature in the PDF also inadvertently raises questions about the other versions hosted by the GPO and Thomas. Since there is no reason to distrust documents in other formats hosted by the GPO and Thomas, any other republisher should simply link back to the most authoritative version (i.e., the one hosted at Thomas or the GPO).

AdvocateHOPE is experimenting with an alternative version of the Recovery Act in simple XHTML with anchors. This will allow people to link directly to a section of the bill as hosted at TheRecoveryAct.org. Because that version is admittedly not the authoritative version of the legislation, each citation will be a link that also shows a link to the most authoritative version. Admittedly, it is difficult to know what the citation should be, because that depends on which version is being referred to: the version introduced in the House or Senate, the enrolled version, an amendment to the legislation, or one of several other versions. In the case of the Recovery Act, one can cite one of the H.R. 1 versions or, more correctly, Public Law 111-5 (and eventually some of it will end up in the U.S. Code).

Visitors will be informed that TheRecoveryAct.org is non-authoritative and asked to use both the alternative citation, for applications that require section-level information, and the authoritative citation. The hope is that the U.S. Congress will eventually publish authoritative XML/XHTML documents for all versions of all legislation. In addition, the site will support Topic Tags: personalized tags, links, or cites that can be used alongside the authoritative cites to convey opinions, annotations, and results of the Act itself. The site will also provide usable URLs for anyone using the Open Dialog Coalition's Open Public Integrated Architecture to "tag" their information related to parts of the Recovery Act.

Examples:

Cite Example Using Anchor on Title

This is a section of this page that should be cited.

Cite Example Using Div and Span Tags with ID Attribute

This is a paragraph that can be cited with a div id attribute for the whole paragraph. This sentence contains a list of items that use span id attributes for specific reference and a class attribute to classify them as objects to be cited: $143, Virginia. Note that a URL fragment identifier (the part after #) can target either an anchor's name attribute or an element's id attribute, so both can serve as internal document links.

Cite Example Using LI Tag with ID attribute

  • 1.3.  This section of the law is referenced within an LI tag.

Same examples as above in HTML:

<p><a name="cite-ex-anchor" rel="subsection">Cite Example Using Anchor on Title</a></p>
<p>
This is a section of this page that should be cited.</p>
<p>&nbsp;</p>
<div id="cite-ex-div-span" class="cite">
<p>Cite Example Using Div and Span Tags with ID Attribute</p>
<p>This is a paragraph that can be cited with a div id attribute for the whole paragraph. This sentence contains a list of items that use span id attributes for specific reference and a class attribute to classify them as objects to be cited: <span id="cite-ex-div-span.1.143" class="cite">$143</span>, <span id="cite-ex-div-span.1.virginia" class="cite"><a href="http://www.virginia.gov">Virginia</a></span>. Note that a URL fragment identifier (the part after #) can target either an anchor's name attribute or an element's id attribute, so both can serve as internal document links.</p>
</div>

<p>&nbsp;Cite Example Using LI Tag with ID attribute</p>
<ul><li id="1.3" class="cite">1.3.&nbsp; This section of the law is referenced within an LI tag.<br /></li></ul>
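To check which parts of a page are actually citable, the fragment targets can be harvested from the markup. The sketch below uses Python's standard `html.parser` to collect every `id` attribute and every `<a name>`, then builds citation URLs from them; the page URL is a placeholder, not this page's real address.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import quote


class CiteCollector(HTMLParser):
    """Collect every fragment target on a page: id attributes and <a name>."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for value in (a.get("id"), a.get("name") if tag == "a" else None):
            if value and value not in self.targets:
                self.targets.append(value)


def citation_urls(page_url: str, html: str) -> List[str]:
    collector = CiteCollector()
    collector.feed(html)
    collector.close()
    return [f"{page_url}#{quote(t, safe='')}" for t in collector.targets]


# Markup from the examples above, with the prose shortened.
SAMPLE = """
<p><a name="cite-ex-anchor" rel="subsection">Cite Example Using Anchor on Title</a></p>
<div id="cite-ex-div-span" class="cite">
<p><span id="cite-ex-div-span.1.143" class="cite">$143</span>,
<span id="cite-ex-div-span.1.virginia" class="cite"><a href="http://www.virginia.gov">Virginia</a></span>.</p>
</div>
<ul><li id="1.3" class="cite">1.3.&nbsp; This section of the law is referenced within an LI tag.<br /></li></ul>
"""

for url in citation_urls("https://www.example.org/reliable-citations", SAMPLE):
    print(url)
```

The last target, `1.3`, works in browsers, but HTML 4.01 and XHTML 1.0 do not accept it as an id value because it begins with a digit; HTML5 does accept it.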
