University of Malta

Department of Computer Science and Artificial Intelligence


Simon Scerri

semantExplorer
A Browser for the Semantic Web

 

Introduction

The Semantic Web, as envisioned by Tim Berners-Lee, will be the keystone in the creation of machine-accessible domains of information scattered around the globe. All information on the World Wide Web would be semantically enhanced with metadata that makes sense to intelligent information agents, through the use of the Resource Description Framework (RDF) and the associated schema language (RDFS). Because it specifically describes the contents of web documents, this hidden metadata deserves attention and a means of presenting it to web users. It should also lead to more accessible data. In the semantic web, classes of objects and their relationships are described in accessible ontologies. In turn, resources in a web document are defined as instances of the objects in the applicable ontologies. Relationships between resources can be created with OWL, a Web Ontology Language built on top of RDF/RDFS and XML. The ultimate goal of the Semantic Web is a semantically enabled World Wide Web, achieved by annotating each and every online document and service with semantic meaning. In this way it will be possible to relate every defined object on the web to every other, making it easier for agents to understand the content of the web, and ultimately for people to access concept-oriented data.

Web page annotation using domain-specific ontologies is the basis of the semantic web and, besides RDF and RDFS, OWL has emerged as the most common language for defining relationships between resources in a web page. A Semantic Web Browser deals with the annotations embedded in the head of a web page's HTML.

Currently there are two approaches to creating Semantic Web Browsers. A Semantic Web Browser has been described (in a paper discussing semantic web browsers in relation to the Haystack project) as a browser that explores the semantic web in its own right, and is able to relate and aggregate information about resources located in different web documents. On the other hand, a Semantic Web Browser can be described as a special web browser that augments standard web browsers with the ability to visualize hidden metadata. While approaches like the latter would be based almost entirely on present WWW resource-sharing technologies, other approaches could involve new ideas. In particular, special repositories could collect RDF triples from various accessed locations over time. Such triple stores could greatly improve the efficiency of locating information on a resource of interest.

 

System Overview
semantExplorer is based on the following architectural components:

1. Extractor

This component extracts the HEAD container from a valid HTML file. If successful, the head is then scanned for any available RDF descriptions. If this is in turn successful, the named RDF descriptions are passed on to the parser.

2. Parser

This component parses RDF descriptions to obtain a number of RDF triples, namespaces etc. The parser involved is RDF Drive.

3. Knowledge Base

This is a single remote database, to which users contribute freshly discovered RDF triples or updates to existing ones. When the Parser obtains such triples, they are stored in the database. This triple store caters for provenance (the original source of each triple is stored).

4. Table Builder

This component gathers information in a document concerning a single (selected) resource, and provides different/simplified ways of displaying it to the user.

5. Graph Builder

The graph builder processes information as in the table builder, with the difference that such information is then passed on to a graph renderer (QuickGraph) and subsequently hooked to a graph visualization unit (.NET version of GraphViz) to display information to the user as a colour-coded graph. This component also provides simplified ways of displaying such information, as well as an option to extract further relevant background data.

6. Lens Collector / Builder

The lens builder component extracts data related to the resource of interest from the current document, as well as from the underlying triple store. This data is then displayed to the user as a collection of 'lenses', and the user can view each lens separately as a graph similar to the one generated by the graph builder, with some differences.

7. Triple Processor

Most triples will include blank nodes, namespaces or the fragment symbol '#' and some triples are irrelevant to the average user. Therefore the user has the option to 'clean' and simplify the triples before they are used by the Table, Graph or Lens Builder for output.

8. Collection Saver

The collector component will save selected resources for future use. When the user reselects such a resource, the table, graph and lens builders will present the relevant data.

9. Cache

Although simple, the cache speeds up processing by saving each visited document to disk, together with its RDF parse files (if any).

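semantExplorer attempts to bridge the gap between the two approaches described in the introduction: the 'Semantic Web' browser, which explores the semantic web in its own right, and the semantic 'Web Browser', which augments a standard web browser with the ability to visualize hidden metadata. While the Table and Graph Builders contribute to the latter, the Lens Builder, by enabling navigation to related data, contributes to the former.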

Figure 1: semantExplorer Architectural Components
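
As a rough illustration of the first two components, the sketch below pulls the head out of an HTML page, collects any RDF blocks found in it, and turns them into triples. It is a minimal Python sketch, not the C# implementation: the function names, the regular expressions and the sample page are assumptions made for this example, and `naive_triples` is only a tiny stand-in for a real RDF parser such as the one semantExplorer uses.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical patterns for this sketch: one for the head container,
# one for RDF blocks embedded inside it.
HEAD_RE = re.compile(r"<head\b[^>]*>(.*?)</head>", re.IGNORECASE | re.DOTALL)
RDF_RE = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def extract_rdf_blocks(html):
    """Return the raw <rdf:RDF> blocks found inside the HTML head, if any."""
    head = HEAD_RE.search(html)
    if head is None:
        return []  # no head container: nothing to hand to the parser
    return RDF_RE.findall(head.group(1))


def naive_triples(rdf_xml):
    """Very small stand-in for a real RDF parser.

    Handles only <rdf:Description rdf:about="..."> elements whose children
    carry either an rdf:resource attribute or literal text.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


# Made-up sample page with one annotated resource.
page = """<html><head><title>CSAI</title>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/people#SimonScerri">
    <rdf:type rdf:resource="http://example.org/terms#Student"/>
    <ex:name>Simon Scerri</ex:name>
  </rdf:Description>
</rdf:RDF>
</head><body></body></html>"""

blocks = extract_rdf_blocks(page)
triples = [t for block in blocks for t in naive_triples(block)]
```

Keeping extraction and parsing as separate steps mirrors the component split above: a page without a head, or a head without RDF, simply yields an empty list and nothing reaches the parser.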

 

Application Overview
semantExplorer includes a Navigation Panel (Back, Forward, Stop, Refresh and the Address Bar) that provides standard document navigation for '.html' web pages, and '.rdf' and '.owl' ontologies.

Once a document is loaded successfully, any RDF content is extracted and parsed. Resources described through this metadata are listed in the 'Defined Objects' List. In the case of Ontologies, defined classes and properties will also be listed.

The browser includes four views:

  • Web Browser

This view displays standard HTML documents, as well as RDF and OWL ontologies, and provides standard navigation via hyperlinks.

Figure 2: semantExplorer after navigation to a web document containing annotated items (shown in the list on the left). The tab selected is the Web Browser.

  • Item Description

When an item from the 'Defined Objects' list is selected, the triples describing it are processed and displayed to the user in a simple way. The subject of these triples (the selected item in the list) is given alongside a list of its characteristics (predicates) and their values (objects). The user can then navigate to any of these objects (provided they are not literal or datatype values). If 'SimonScerri' is defined to be an instance of a concept 'Student', having a name and an address, this information will be displayed in the item description, and the user can subsequently navigate to the concept 'Student' for further information.

Figure 3: The resource 'DepartmentOfCSAI' was selected. The Item Description tab is selected to give hidden information to the user.

  • Graph Viewer

This view is based on the previous one. However, the information is presented in a simpler way, by visualizing it as a colour-coded graph. Although the graph does not provide any navigation per se (such navigation is possible through the item description), it can extend the amount of information displayed on a resource. In the semantic web, information on resources will take the form of trees, with branches opening up to a number of other branches. For a user to arrive at the conclusion that 'SimonScerri' is a 'Person' in the previous example, a navigation to the resource 'Student' would need to be performed. This navigation would then show that 'Student' has been defined as a kind of 'Person'. Through the Graph Viewer's Level Selection, this navigation is not necessary. If the level is set to '1' rather than zero then, besides extracting information on the resource 'SimonScerri', the graph viewer will go one step further and extract information about every new resource included in the graph. To keep graphs manageable, level selection is limited to '2'.

Figure 4: The Graph Viewer tab is selected to display the hidden information on the selected resource to the user. The level set is zero (hence displaying what the item description already did, but in a visual graph form).
  • Lens Viewer

This view attempts to aggregate data relating to a single resource and display it to the user. Lenses widen the view of a particular resource by using additional (known) information on it. The view is based on a single remote triple store, which stores triples as semantExplorer users encounter RDF data on the web. The viewer classifies lenses into four categories. The user can then select lenses (focus them) to view information displayed in a similar fashion to that of the graph viewer, with some differences. In particular, triples having the selected resource as an object are also included in the graph.

When an item in the Item Description List is clicked, any information related to it by the RDFS predicate 'seeAlso' is shown to the user under the 'Other References' lens category.

Through checking with the triple store, any URL (be it a web document or an ontology) containing information about the selected item is listed in the 'Located Information' category. When the user focuses such a lens, information from that location is displayed in the lens viewer. 

If the selected item is an instance of a class, other instances of that class present in the triple store, together with the selected instance, are shown to the user as 'Instances'. If the selected item is itself a class, instances of that class are shown in this category. For the 'SimonScerri' example, this category will include other instances of the concept 'Student'.

Since concepts in the semantic web will be related to other concepts (via subclassing, etc.), such related classes/properties are displayed under the 'Related Concepts' category. For the example above, this category would include the class 'Person' and any other directly related class.

Figure 5: The Lens Viewer tab provides three lens categories (out of four possible ones). The first category provides links to web documents containing any information on the selected resource. The second category contains a number of other universities, since the selected resource is one. The third category provides two lenses for concepts related to the selected resource: the concept University itself, and the concept Building (shown in the viewer), since Building is defined somewhere to be the superclass of University.
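
The four lens categories described above can be sketched as a simple grouping over the triple store. This is a minimal Python sketch under stated assumptions, not semantExplorer's own code: the `Quad` record (a triple plus its provenance URL), the prefixed predicate names, the rule used to decide whether a resource is a class, and the sample data are all hypothetical.

```python
from collections import namedtuple

# A triple plus the URL it was collected from (provenance).
Quad = namedtuple("Quad", "s p o source")

TYPE = "rdf:type"
SEE_ALSO = "rdfs:seeAlso"
SUBCLASS = "rdfs:subClassOf"


def build_lenses(store, item):
    """Group what the store knows about `item` into the four lens categories."""
    # Assumption: a resource is treated as a class if anything is typed by it
    # or it takes part in a subclass relation; otherwise its rdf:type values
    # are taken as its classes.
    is_class = any(
        (q.p == TYPE and q.o == item) or (q.p == SUBCLASS and item in (q.s, q.o))
        for q in store
    )
    classes = {item} if is_class else {
        q.o for q in store if q.s == item and q.p == TYPE
    }

    related = set(classes)
    for q in store:
        if q.p == SUBCLASS:
            if q.s in classes:
                related.add(q.o)  # direct superclass
            elif q.o in classes:
                related.add(q.s)  # direct subclass

    return {
        "Other References": sorted(
            {q.o for q in store if q.s == item and q.p == SEE_ALSO}
        ),
        "Located Information": sorted(
            {q.source for q in store if item in (q.s, q.o)}
        ),
        "Instances": sorted(
            {q.s for q in store if q.p == TYPE and q.o in classes}
        ),
        "Related Concepts": sorted(related),
    }


# Made-up store contents for the 'SimonScerri' example.
store = [
    Quad("ex:SimonScerri", TYPE, "ex:Student", "http://example.org/staff.html"),
    Quad("ex:SimonScerri", SEE_ALSO, "http://example.org/simon.rdf",
         "http://example.org/staff.html"),
    Quad("ex:JaneDoe", TYPE, "ex:Student", "http://example.org/students.html"),
    Quad("ex:Student", SUBCLASS, "ex:Person", "http://example.org/onto.owl"),
]

lenses = build_lenses(store, "ex:SimonScerri")
```

With this sample data, the 'Instances' lens for 'ex:SimonScerri' lists both students, and 'Related Concepts' holds 'ex:Student' together with its direct superclass 'ex:Person'.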

Triples can sometimes be difficult to interpret. Often, a group of triples relating to a single item will include a number of blank nodes, particularly when using RDF containers (rdf:Bag etc.) and OWL complex classes. semantExplorer caters for this problem by offering to process the triples for such RDF/RDFS and OWL constructs so as to remove blank nodes. As an example, when parsing data describing that some university department has an 'AcademicStaff' consisting of ten academics, parsers will generate 12 triples, 10 of which will have blank nodes as their subject. With these RDF/RDFS/OWL fixes, such a statement is reduced to just one line. Another option, Tag Fix, enables resources to be capitalized and namespaces to be hidden. Furthermore, since graphs in the Lens and Graph Viewers will often include information that is irrelevant to the average user (for example that 'SimonScerri' is an owl:Thing, or that 'Student' is an rdfs:Class), another option hides such constructs.
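
One way to picture the blank-node fix is as a pass that folds an RDF container back into the statement that points at it. The sketch below is an assumption-laden Python illustration, not the tool's actual algorithm: it only handles numbered container members (rdf:_1, rdf:_2, ...), triples are plain tuples, blank nodes are assumed to be written with a leading '_:', and the department data is made up.

```python
import re

# Container membership properties: rdf:_1, rdf:_2, ...
MEMBER = re.compile(r"^rdf:_\d+$")


def is_blank(node):
    """Blank nodes are assumed to be written as '_:name' in this sketch."""
    return node.startswith("_:")


def collapse_containers(triples):
    """Replace each blank-node container by a tuple of its members."""
    # First pass: collect the numbered members hanging off each blank node.
    members = {}
    for s, p, o in triples:
        if is_blank(s) and MEMBER.match(p):
            members.setdefault(s, []).append((int(p[5:]), o))

    # Second pass: drop the container bookkeeping and inline the members.
    simplified = []
    for s, p, o in triples:
        if s in members:
            continue  # rdf:type rdf:Bag and rdf:_n triples of the container
        if o in members:
            o = tuple(value for _, value in sorted(members[o]))
        simplified.append((s, p, o))
    return simplified


# Made-up data: a department whose staff is an rdf:Bag of three academics.
raw = [
    ("ex:DepartmentOfCSAI", "ex:hasAcademicStaff", "_:b1"),
    ("_:b1", "rdf:type", "rdf:Bag"),
    ("_:b1", "rdf:_2", "ex:AcademicB"),
    ("_:b1", "rdf:_1", "ex:AcademicA"),
    ("_:b1", "rdf:_3", "ex:AcademicC"),
]

fixed = collapse_containers(raw)
```

Here five parsed triples collapse into a single statement whose object is the ordered tuple of staff members, which is the kind of one-line reading the fix aims for.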

Once a resource has been navigated to, it can be saved as a Collection, much like favourites are stored in standard web browsers. They are called collections because, once they are re-selected and navigated to, the web browser, item description and graph viewer will provide information on the resource, while the lens viewer will aggregate data that is known to be related to it. Such collections can be added, modified or deleted. semantExplorer also includes local caching of visited locations to speed up navigation. Such caching covers both the source of the document and its RDF parse files, if any. The cache can be turned off or set to a custom limit.

Other Screenshots

Fig 6: This is the same as figure 3; however, the OWL and RDF constructs have been dealt with, hiding blank nodes and providing a simpler interpretation. Additionally, the tags are fixed to hide namespaces, and user-defined shortcuts (all instances of some string are replaced by a user-defined string) are used to further simplify the output.

Fig 7: In this graph viewer output, the level is set to 1, therefore gathering additional RDF data from namespace-linked documents. Level 1 Data is attached using dashed rather than solid lines.

Fig 8: In this graph viewer output, the level is set to 2. However, graph fixing is performed, drastically reducing the number of nodes (and blank nodes) and increasing readability. Without such fixing, the graph would be larger than the one in figure 7 and graph interpretability would be close to nil. Level 2 data is attached using dotted rather than solid (level 0) or dashed (level 1) lines.

Figure 9: Fixing is also applied to this lens view for the selected resource 'UniversityOfMalta', previously shown in figure 5, removing namespaces and absolute URI paths (which would otherwise be compulsory in lens viewing).
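
The Graph Viewer's level selection, with its solid, dashed and dotted edges for levels 0, 1 and 2, can be pictured as a depth-limited traversal over the known triples. The following Python sketch is illustrative only: it assumes all triples are already in memory (the tool itself gathers level 1 and 2 data from linked documents), and the function names, the `STYLE` table and the sample triples are hypothetical. The output is plain GraphViz DOT text.

```python
MAX_LEVEL = 2  # level selection is capped to keep graphs manageable

# Edge style per level, following the convention in the figure captions.
STYLE = {0: "solid", 1: "dashed", 2: "dotted"}


def expand(triples, resource, level=0):
    """Collect (triple, depth) pairs for `resource` up to the chosen level."""
    level = min(level, MAX_LEVEL)
    by_subject = {}
    for t in triples:
        by_subject.setdefault(t[0], []).append(t)

    edges = []
    seen = {resource}
    frontier = [resource]
    for depth in range(level + 1):
        next_frontier = []
        for node in frontier:
            for s, p, o in by_subject.get(node, []):
                edges.append(((s, p, o), depth))
                if o not in seen:
                    # Every new resource is described one level deeper.
                    seen.add(o)
                    next_frontier.append(o)
        frontier = next_frontier
    return edges


def to_dot(edges):
    """Render the collected edges as GraphViz DOT text."""
    lines = ["digraph resource {"]
    for (s, p, o), depth in edges:
        lines.append(f'  "{s}" -> "{o}" [label="{p}", style={STYLE[depth]}];')
    lines.append("}")
    return "\n".join(lines)


# Made-up triples for the 'SimonScerri' example.
triples = [
    ("SimonScerri", "rdf:type", "Student"),
    ("SimonScerri", "name", "Simon Scerri"),
    ("Student", "rdfs:subClassOf", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
]

level0 = expand(triples, "SimonScerri", 0)
level1 = expand(triples, "SimonScerri", 1)
dot_text = to_dot(level1)
```

At level 0 only the two statements about 'SimonScerri' are collected; at level 1 the statement that 'Student' is a kind of 'Person' is added as a dashed edge, so the user no longer has to navigate to 'Student' to see it.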


Conclusion

semantExplorer has been implemented using the .NET framework and the C# language, thus showing that the Semantic Web does not belong entirely to Java. It is an ongoing project, and the first version will soon be available for download together with a user manual. Like all applications, semantExplorer has its limitations. In particular, RDF data needs to be annotated inline in the document or via an anchor link to an RDF file. The application caters for the RDF, RDFS and OWL languages. Other languages (e.g. DAML) will not be recognized. Such data will only be extracted from .html, .rdf and .owl documents/ontologies. The transfer protocol must be HTTP.

Proposed future work will include the improvement of cache file generation and the inclusion of a reasoner within the Lens builder.

A user manual for semantExplorer is provided here.

semantExplorer: A Semantic Web Browser, Simon Scerri, Charlie Abela, Matthew Montebello, accepted at the IADIS International Conference WWW/Internet 2005, Lisbon Portugal, October 2005 (pdf)

Any queries and/or suggestions regarding this tool are more than welcome and can be e-mailed to:

Simon Scerri: irrecs at hotmail dot com
Charlie Abela: