
Theory of the Web as One Big Database

If you posit that the web is an object database, then you can start using it like one. With ReST, XML, XQuery, and URLs as unique IDs, you can envision the web as a collection of uniquely identified objects reachable with a simple tool set. If web sites are set up with this in mind, others can write database applications that use those sites like database files. It is up to the creators of content on the web to produce well-designed unique IDs and well-formed objects (web pages/files). Sprinkle in some standards within the documents, and web database applications can be written that pull from across the entire web.

The Theory

Just imagine the World Wide Web as one big database. Specifically, it can be viewed as an object database: web addresses are the unique IDs, web documents are mainly XHTML/XML, and with a native query language, XQuery, you have a functional database. Although many web sites are generated dynamically from hidden relational databases, the generated site can still function as part of a larger database that is the World Wide Web. And the more simply and predictably a site is constructed, especially using XHTML (rather than HTML), the richer that site is as part of the One Big Database. What is driving this move to treat the web as one big database is a realization that the architecture of the World Wide Web already provides much of the underpinning of a database system.

The cliche goes that it is hard to see the forest for the trees, and with the web it is easier to see pages and sites than the larger structures that exist in the web. To draw out the analogy, each person tending their own tree often fails to see themselves as part of a forest, except to the extent that they can see other trees nearby. Links and HTML are the sight lines and the rough structure that drive web site construction. But once we step back and see the forest, we can better tend our own trees within it.

There has been a recent remarkable insight into how the web functions, called ReST (Representational State Transfer), that is helping some to see how sites are related and can be better built within a greater Internet ecology. ReST is an almost too simple realization that can sound trivial, like exclaiming E=mc^2 before it was tested and accepted. Yet this insight has helped a growing number of web site and application designers build simple and powerful systems and tools: sites that are less likely to break, are easier to maintain, and can function like mini-databases for other sites without any extra effort.

I feel strongly that it may be easier to convey this revolution as allowing individual sites to be part of a larger, comprehensive world wide database. By making it clear that such sites remain just as easy for people to read (and for those with disabilities, easier), that a site can be copied onto any other drive, that it becomes easier to grab content and features from other sites, and that their own site will be easier to find, proponents of this new information technology revolution can help non-technical people get the importance without confusing them with jargon.

Please read below and the other Tech Tidbits sections with the idea of the "One Big Database" in mind, as each can help you make your web sites and web pages better. Then you can put an OBD-compliant logo on your site. Most importantly, your site will be accessible in ways that make it much more valuable, usable, and searchable, and it can be used as a web service with no additional effort. And you can use other OBD-compliant web sites in the same way.

Components of the One Big Database

  • Each URL is unique and requesting one usually results in getting back a web page or file. This is the record or object in the One Big Database.
  • As there is no "key index" or comprehensive list of all URLs, querying the One Big Database depends on having a list of URLs. Lists of URLs can be gleaned from RSS feeds, site indexes, search results from search engines, etc. In some cases sites might have URLs with predictable patterns that can be generated by others. NEW- Work is being done on two new standards to make it much easier to get related URLs: Repository Schemas and Rosetta Stone documents.
  • Pieces of web pages and web documents in XML-compliant format can be addressed with simple statements (see XPath/XQuery). Note that Google Docs Spreadsheets has a function for exactly this: importXML(url, xpath).
  • Web pages and web documents in XML-compliant format can be queried with very efficient code (see XQuery).
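The "predictable patterns" point above can be sketched in code: when a site publishes records under a regular URL scheme, a client can generate candidate record keys itself. A minimal Python illustration, where the base URL and the year/month archive pattern are entirely hypothetical:

```python
# Generate candidate URLs for a hypothetical site that archives
# pages under a predictable /archive/YEAR/MONTH pattern.
def archive_urls(base, years, months=range(1, 13)):
    """Yield one URL per year/month combination."""
    for year in years:
        for month in months:
            yield "%s/archive/%d/%02d" % (base, year, month)

# Each generated URL is a candidate "record key" in the One Big Database.
urls = list(archive_urls("http://example.org", [2006], [1, 2, 3]))
```

Each URL in the list can then be requested like a record lookup, with a 404 response simply meaning "no such record."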

This Page as an Example

  • This page itself is part of the One Big Database. It has a unique identifier: its URL.
  • The page is an object, in this case a text document in xhtml format (in relational database jargon, an object is roughly equivalent to a record or row).
  • The page is a record that shares the same parts as all other xhtml documents on the web and can be queried jointly with other xhtml pages on the web, to the extent that they have a shared structure.
  • This page shares almost all major elements with other pages on this same web site and can be queried across multiple pages (for example, a query could request the meat of each page by asking for the <div> element that has an "id" equal to "content").
  • Because the page is in xhtml format, each tagged element can be addressed specifically one by one or as a group (in relational database jargon each tagged element is roughly equivalent to a field, but not quite). For example, each link is an element.
  • This page is a complex object that is itself a mini-database (see the XQuery examples below, which can be pasted into an XQuery processor).
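The element-as-field idea in the list above can be sketched with Python's standard-library ElementTree, which supports a small subset of XPath. The XHTML fragment below is a made-up stand-in for a page like this one, not this site's actual markup:

```python
import xml.etree.ElementTree as ET

# A made-up XHTML fragment standing in for a page on this site.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="content">
      <p>Intro text with a <a href="/tech-tidbits">link</a>.</p>
      <a href="http://example.org/">external</a>
    </div>
  </body>
</html>"""

NS = "{http://www.w3.org/1999/xhtml}"
root = ET.fromstring(XHTML)

# Address the "content" div -- the record's main field ...
content = root.find(".//%sdiv[@id='content']" % NS)

# ... then each link element inside it, one "sub-field" per <a> tag.
hrefs = [a.get("href") for a in content.iter(NS + "a")]
```

Because the page is well-formed XML, every tagged element can be pulled out this way, individually or as a group.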


Example queries of this Web site:

(not necessary to read or understand the code example to get the point)

The query:

declare namespace q = "http://www.w3.org/1999/xhtml";
let $nav := doc("")//q:ul[@class="portletNavigationTree navTreeLevel0"]
for $hrefs in $nav//@href
return data($hrefs)

results in a list of all the links to subsections of this web site by grabbing the links in the navigation box on the left side of this web page.

The almost identical query:

declare namespace q = "http://www.w3.org/1999/xhtml";
let $content := doc("")//q:div[@id="content"]
for $hrefs in $content//@href
return data($hrefs)

results in the list of links within the main content section of the page.


Essentially combining the two queries:


declare namespace q = "http://www.w3.org/1999/xhtml";
let $nav := doc("")//q:ul[@class="portletNavigationTree navTreeLevel0"]
for $hrefs in $nav//@href
let $pages := data($hrefs)
let $content := doc($pages)//q:div[@id="content"]
for $links in $content//q:a[@title="external-link"]
return data($links/@href)

brings back all external links from the content sections of all the pages in the Tech Tidbits section of this web site.
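The same two-step join can be sketched outside XQuery as well. The Python below simulates doc() with an in-memory table of pages, so all URLs and markup here are illustrative only:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"

# In-memory stand-in for doc(): URL -> XHTML source (all made up).
PAGES = {
    "/tidbits": """<html xmlns="http://www.w3.org/1999/xhtml"><body>
        <ul class="portletNavigationTree navTreeLevel0">
          <li><a href="/tidbits/rest">ReST</a></li>
          <li><a href="/tidbits/xquery">XQuery</a></li>
        </ul></body></html>""",
    "/tidbits/rest": """<html xmlns="http://www.w3.org/1999/xhtml"><body>
        <div id="content">
          <a title="external-link" href="http://example.org/rest">REST</a>
        </div></body></html>""",
    "/tidbits/xquery": """<html xmlns="http://www.w3.org/1999/xhtml"><body>
        <div id="content">
          <a title="external-link" href="http://example.org/xq">XQuery</a>
          <a href="/tidbits">home</a>
        </div></body></html>""",
}

def doc(url):
    """Stand-in for XQuery's doc(): parse the page at a URL."""
    return ET.fromstring(PAGES[url])

# Step 1: the navigation box yields the section URLs (the record keys).
nav = doc("/tidbits").find(
    ".//%sul[@class='portletNavigationTree navTreeLevel0']" % NS)
section_urls = [a.get("href") for a in nav.iter(NS + "a")]

# Step 2: look up each section and keep only its external links.
external = []
for url in section_urls:
    content = doc(url).find(".//%sdiv[@id='content']" % NS)
    external += [a.get("href") for a in content.iter(NS + "a")
                 if a.get("title") == "external-link"]
```

The first pass supplies the list of keys, and the second pass treats each page as a record to query -- exactly the join the combined XQuery performs.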


Example Query of Wikipedia

An XQuery statement similar to this example can dynamically pull Wikipedia content--in this case, paragraphs explaining XQuery--into your own web site.

declare namespace q = "http://www.w3.org/1999/xhtml";
let $content := doc("")//q:div[@id = "bodyContent"]/q:p
return $content
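As an offline sketch of the same idea in Python: the fragment below is shaped like Wikipedia's bodyContent div but is made up, and a real application would fetch the live page (whose structure may differ) before parsing:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1999/xhtml}"

# Made-up fragment shaped like a Wikipedia article body; a live page
# would be fetched over HTTP and may be structured differently.
SNIPPET = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <div id="bodyContent">
    <p>XQuery is a query language for XML.</p>
    <p>It uses FLWOR expressions.</p>
  </div></body></html>"""

root = ET.fromstring(SNIPPET)
body = root.find(".//%sdiv[@id='bodyContent']" % NS)
# Collect each paragraph of the article body, one "field" per <p>.
paras = [p.text for p in body.findall(NS + "p")]
```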

