Confused Development

I develop software and I often get confused in the process. I usually find the answers after a while, but a month later I can't remember them. So from now on, I will write them down here.

Thursday, October 11, 2007

The Sesame 2 console: Creating Repositories, Loading Data with Context, SPARQL Querying with Context

I have started using the Sesame 2 (beta 5) RDF store to host the conference metadata for ISWC+ASWC2007 at http://data.semanticweb.org. The software is still in beta and a lot of the interface functionality is missing, but once you figure out how everything works (which I did, with a lot of help from Aduna's OpenRDF forum), it seems to do the job. Even though the Web interface is lacking a lot at the moment, Aduna supplies a console that can do most of what I need. Here is what I did:

Creating Repositories

After having installed Sesame as a Web application in Tomcat (5.5 works for me), I installed the console in a different location. I start it like this:
$ bin/start-console.sh 
OpenRDF Sesame console 2.0-beta5
Using data dir: /home/knumoe/.aduna/openrdf-sesame-console

The following repositories are available:
+----------
|SYSTEM ("System configuration repository")
+----------

Commands end with '.' at the end of a line
Type 'help.' for help
>
At the beginning, the console (which has its own data and doesn't yet know about the Sesame Web app) knows only one repository: its own SYSTEM repository. To get started, we have to let the console know about the Web app's SYSTEM repository. This is done with the create command:
> create remote.
Please specify values for the following variables:
Sesame server location [http://localhost:8080/openrdf-sesame]: http://localhost:8080/openrdf-http-server-2.0-beta5 
Remote repository ID [SYSTEM]:
Local repository ID [SYSTEM@localhost]: 
Repository title [SYSTEM repository @localhost]: 
Repository created
>
Here is the first pitfall: the default server location is http://localhost:8080/openrdf-sesame, whereas, at least in this release of Sesame, the server location is actually http://localhost:8080/openrdf-http-server-2.0-beta5. So we need to change the default here (and if we don't, the console won't even complain). The rest of the default values are fine, so we leave them as they are. The console now knows two repositories:
> show r.
+----------
|SYSTEM ("System configuration repository")
|SYSTEM@localhost ("SYSTEM repository @localhost")
+----------
>
Note that we haven't actually created a new repository on the server; we have just made sure the console knows about the server's SYSTEM repository. Every repository that the console knows can be opened and manipulated. We open SYSTEM@localhost:
> open SYSTEM@localhost.
Opened repository 'SYSTEM@localhost'
SYSTEM@localhost> 
The SYSTEM@localhost repository is for internal housekeeping, so we don't want to add any actual data to it. Instead we now create a new repository, which I will call test. Because we have previously opened the server's SYSTEM repository, the new repository will be created on the server. We can create a native, a memory, or a memory-rdfs store. I choose native:
SYSTEM@localhost> create native.
WARNING: You are about to add a repository configuration to repository SYSTEM@localhost
Proceed? (yes|no) [yes]:    
Please specify values for the following variables:
Repository ID [native]: test
Repository title [Native store]: Test store
Triple indexes [spoc,posc]: 
Repository created
SYSTEM@localhost>
Now, even though we have just created a new repository on the server, the console doesn't know about it (strange, yes):
SYSTEM@localhost> show r.
+----------
|SYSTEM ("System configuration repository")
|SYSTEM@localhost ("SYSTEM repository @localhost")
+----------
SYSTEM@localhost>
To let the console know about it, we need to close SYSTEM@localhost and create another remote repository as a placeholder for the console:
SYSTEM@localhost> close.
Closed repository 'SYSTEM@localhost'
> create remote.
Please specify values for the following variables:
Sesame server location [http://localhost:8080/openrdf-sesame]: http://localhost:8080/openrdf-http-server-2.0-beta5
Remote repository ID [SYSTEM]: test
Local repository ID [SYSTEM@localhost]: test@localhost
Repository title [SYSTEM repository @localhost]: Test Repository
Repository created
> show r.
+----------
|SYSTEM ("System configuration repository")
|SYSTEM@localhost ("SYSTEM repository @localhost")
|test@localhost ("Test Repository")
+----------
> 
Now we are finally done with this bit: we have created a placeholder for the server's SYSTEM repository, created a repository for our data on the server, and created another placeholder for this new data repository.

Loading Data with Context

Loading data from the Web interface in Tomcat is easy. However, it has one drawback: you cannot specify contexts (or named graphs, if you like). So if you need named graphs, you have to resort to the console again. It's not hard, though. The first thing we do is open our newly created remote repository placeholder:
> open test@localhost.
Opened repository 'test@localhost'
test@localhost> 
Now we can load data from a URL using the load command. Used on its own, the command loads the data into the repository without any specific context. To add a context for the new data, we can use the -c option. I want to load Eyal's foaf file. For the context, I just make up a URI (not good practice, I know):
test@localhost> load -c http://context.next/eyalsfoaf http://eyaloren.org/foaf.rdf.
Loading data...
Data has been added to the repository (12736 ms)
test@localhost> 
So now, all the triples from http://eyaloren.org/foaf.rdf have been added to the test repository on the server, using http://context.next/eyalsfoaf as a name for the context (the "next" was a spelling error which I didn't bother to correct). Note that with the current version of Sesame (2b5), if you explore your repository with the Web interface, the contexts don't show up. Don't worry though, they're there. You can see the contexts in the console (in the meantime, I have added another foaf file with another context):
test@localhost> show c.
+----------
|http://context.net/knudsfoaf
|http://context.next/eyalsfoaf
+----------
test@localhost>

Querying with Context

Now that we have data in two different named graphs (Sesame has the concept of contexts internally, but they are the same as named graphs for my purposes), we can also make use of this in SPARQL queries. To do this, we can use SPARQL's GRAPH construct. E.g., if I simply want to find all instances of foaf:Person in the repository, I can use this query, which is just the basic SPARQL that every self-respecting SW nerd knows:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT $person
WHERE {
   $person a foaf:Person
}
This gives me six different instances, ignoring context. (I ran the queries in the Web interface; you can also run them in the console, but that is kind of unwieldy.)

[Figure: Simple SPARQL query result in Sesame2 Web interface]

The following query, which uses the GRAPH construct, also tells us which named graph/context each instance comes from:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT $person $g
WHERE {
   GRAPH $g {
      $person a foaf:Person
   }
}
The result of this query also shows us that one instance comes from the named graph http://context.net/knudsfoaf and five from http://context.next/eyalsfoaf.
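
To double-check per-graph counts like these outside the Web interface, a small script can tally the result rows by graph. The following is only a sketch: it assumes the query results have been obtained in the W3C SPARQL JSON results format (this post does not confirm that the beta 5 Web interface can export them that way), and the person URIs in the sample data are hypothetical placeholders, not the real instances.

```python
from collections import Counter


def count_by_graph(results, graph_var="g"):
    """Count result rows per named graph in a SPARQL JSON results document."""
    counts = Counter()
    for binding in results["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        counts[binding[graph_var]["value"]] += 1
    return dict(counts)


def make_row(person_uri, graph_uri):
    """Build one result row binding $person and $g (helper for the sample)."""
    return {
        "person": {"type": "uri", "value": person_uri},
        "g": {"type": "uri", "value": graph_uri},
    }


# Hypothetical sample mirroring the counts described above:
# one instance from the first context, five from the second.
sample = {
    "head": {"vars": ["person", "g"]},
    "results": {
        "bindings": [make_row("http://example.org/person/0",
                              "http://context.net/knudsfoaf")]
        + [
            make_row("http://example.org/person/%d" % i,
                     "http://context.next/eyalsfoaf")
            for i in range(1, 6)
        ]
    },
}

if __name__ == "__main__":
    for graph, n in sorted(count_by_graph(sample).items()):
        print(graph, n)
```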

[Figure: SPARQL query result with named graphs in Sesame2 Web interface]

Finally, we can rewrite the query so that we only get those instances that come from a particular graph:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT $person
WHERE {
   GRAPH <http://context.next/eyalsfoaf> {
      $person a foaf:Person
   }
}
The result set of this query now contains those five instances that are in the http://context.next/eyalsfoaf named graph. Hooray!

[Figure: SPARQL query result from a particular named graph in Sesame2 Web interface]
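
The three queries in this section differ only in whether, and how, the triple pattern is wrapped in a GRAPH block. As a rough sketch, that difference can be captured in a small helper that generates the query text. The function name person_query and its graph parameter are my own invention for illustration and are not part of Sesame.

```python
FOAF_PREFIX = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>"
PATTERN = "$person a foaf:Person"


def person_query(graph=None):
    """Build the foaf:Person query text.

    graph=None      -> plain query, ignoring context
    graph="$g"      -> also select the named graph as a variable
    graph=<a URI>   -> restrict results to that named graph
    """
    select_vars = ["$person"]
    if graph is None:
        body = ["   " + PATTERN]
    else:
        if graph.startswith(("$", "?")):
            # A variable: report which graph each instance comes from.
            select_vars.append(graph)
            term = graph
        else:
            # A URI: no space is allowed between "<" and the URI itself.
            term = "<" + graph + ">"
        body = ["   GRAPH " + term + " {", "      " + PATTERN, "   }"]
    lines = [FOAF_PREFIX, "SELECT " + " ".join(select_vars), "WHERE {"]
    return "\n".join(lines + body + ["}"])


if __name__ == "__main__":
    print(person_query())
    print(person_query("$g"))
    print(person_query("http://context.next/eyalsfoaf"))
```

The printed text can be pasted into the query form of the Web interface, just like the hand-written queries.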

1 Comment:

At 3:20 pm, Blogger Knud Möller said...

Funnily, just a day or so after I had posted this, Aduna published a new version of Sesame (2b6 now), which improves the console a lot. So, all the stuff about remote repositories and loading with context is now a bit different. Look here.

 
