Repository Schema + XML Schema = Object Database
One of the great things about the web is that it is an object database. However, querying, mashing up, aggregating, or reusing the data in web documents is difficult when there is no systematic way to grab multiple documents and query them. To sidestep this problem, publishers often build APIs that pull the information from a database instead. Repository Schemas are a way to make such APIs unnecessary by providing a simple map to the data.
A slide presentation of the Repository Schema idea is available.
There is also some important work on what XQueries would look like if a Repository Schema existed, at http://en.wikibooks.org/wiki/XQuery/Link_gathering
One additional piece to connect Repositories is called Rosetta Stone Documents. More on that soon.
Outline of Repository Schema
One portion of the Repository Schema allows for URL discovery in two ways: by recreating URLs from variables (which XML Schemas can define), and by XPath (finding URLs in navigation pages, indexes, and site maps, as well as in search results, Atom/RSS lists, etc). The other pieces of the Repository Schema map out the components, objects, and subobjects (content divs, microformatted objects, RDFa objects, etc) along with their metadata. A quick outline of all of the parts follows below.
Note that a Repository Schema can be built by anyone, not just the repository publisher (as opposed to most APIs, which are publisher generated). This allows for realtime queries and mashups, especially using XQuery. It also provides transparency by creating an audit trail back to the raw data (again, especially with the open standard XQuery).
Important: the Repository Schema will have a standard XSL to make it human readable and perhaps to help with building the URLs and starting searches.
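The two kinds of URL discovery described above can be sketched roughly as follows. This is only an illustration in Python, not part of the Repository Schema itself: the function names build_urls and discover_urls, the example.org URLs, the {year}/{name} template syntax, and the sample index document are all assumptions made up for the example. The path lookup uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module.

```python
import itertools
import xml.etree.ElementTree as ET


def build_urls(template, variables):
    """Recreate URLs by filling a template with every combination of values.

    `template` uses {name} placeholders (a hypothetical syntax for this sketch);
    `variables` maps each placeholder name to its list of allowed values.
    """
    names = sorted(variables)
    urls = []
    for combo in itertools.product(*(variables[n] for n in names)):
        urls.append(template.format(**dict(zip(names, combo))))
    return urls


def discover_urls(xml_text, path, attribute="href"):
    """Discover URLs by selecting elements with a path expression.

    Returns the value of `attribute` for each matching element that has it.
    """
    root = ET.fromstring(xml_text)
    return [el.get(attribute) for el in root.findall(path) if el.get(attribute)]


# A made-up navigation/index document standing in for a real repository page.
SAMPLE_INDEX = """<index>
  <nav>
    <a href="http://example.org/reports/2009/a.xml">Report A</a>
    <a href="http://example.org/reports/2009/b.xml">Report B</a>
  </nav>
  <a>No link here</a>
</index>"""

if __name__ == "__main__":
    # Discovery by recreating URLs from variables.
    for url in build_urls(
        "http://example.org/reports/{year}/{name}.xml",
        {"year": ["2008", "2009"], "name": ["a"]},
    ):
        print(url)

    # Discovery by path: every <a> inside <nav> that carries an href.
    for url in discover_urls(SAMPLE_INDEX, ".//nav/a"):
        print(url)
```

In a real Repository Schema the allowed variable values would presumably come from the XML Schema definitions, and the path expressions would be run against live navigation pages or feeds rather than an inline string.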
The outline of the Repository Schema (thanks to Chris Wallace, who has been very helpful):
The Repository Schema is used to define a set of documents, preferably XML, and the objects or useful information contained in them: