Module 5 Project

Authors: Luis Sánchez Fernández, Carlos Delgado Kloos, Vicente Luque Centeno, José Jesús García Rueda, Norberto Fernández García
Using Jena to manage RDF content (Semantic Web Information Management lab)

Module 5: Using Jena to manage RDF data


Goals

  • Practice the concepts learnt in the theoretical lessons, in particular the capabilities of SPARQL as an RDF query language
  • Discover the possibilities of Jena, a Java tool for RDF storage and querying
  • Learn how to transform data represented in one RDF vocabulary into a different one using SPARQL
  • Combine the inference features provided by Jena with the SPARQL querying capabilities in order to obtain richer information

Installing Jena

As indicated on its homepage, Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL, and includes a rule-based inference engine. Jena is open source and grew out of work with the HP Labs Semantic Web Programme.

The Jena Framework includes:

  • An RDF API
  • Reading and writing RDF in RDF/XML, N3 and N-Triples
  • An OWL API
  • In-memory and persistent storage
  • SPARQL query engine

The source code and binary distributions of Jena can be downloaded from SourceForge but, for your convenience, a local copy of Jena 2.5.6 is provided. Download the tgz file and decompress it. If everything works as expected, you should see a new folder named Jena-2.5.6. Inside that folder there is a directory named lib that contains a set of Java archive (JAR) files. Add all of these files to your CLASSPATH environment variable.

Within the software distribution provided to you, you will find two scripts, test.sh and test.bat, that you can use to test the Jena installation on Linux and Windows respectively. They are located in the root folder of Jena.

Exercises

Exercise 1. Querying with SPARQL

An RDF dataset is provided to you in RDF/XML format. This dataset contains the route planning of a hypothetical airline. Download the file and open it in an editor. You will see that it contains a number of entries like the one shown below:

    <rdf:Description rdf:about="http://www.snee.com/ns/flights#SX0101">
        <fl:flightFromCityName>Aberdeen GB</fl:flightFromCityName>
        <fl:flightFromApCode rdf:resource="http://www.daml.ri.cmu.edu/ont/AirportCodes.daml#ABZ"/>
        <fl:flightToCityName>London-Gatwick GB</fl:flightToCityName>
        <io:miles>426</io:miles>
        <io:depart>6:40a</io:depart>
        <io:arrive>8:15a</io:arrive>
        <io:flight>SX0101</io:flight>
        <io:aircraft>737</io:aircraft>
        <fl:stops>0</fl:stops>
        <io:meals>M</io:meals>
        <io:duration>1:35</io:duration>
    </rdf:Description>

Each of these entries describes a flight, defining properties such as the names of the origin and destination cities (fl:flightFromCityName, fl:flightToCityName), the length of the route (io:miles), the scheduled departure (io:depart) and arrival (io:arrive) times, the kind of aircraft (io:aircraft), and so on.

Apart from the dataset, the source code of a Java client for Jena is provided to you. Download and compile this source code. Do not forget that the CLASSPATH should be properly configured so that the Jena libraries are included.

The main method of the class provided to you can receive two arguments:

  1. The first argument is the path of a text file that contains the SPARQL query (a SELECT) to be carried out. This argument is compulsory.
  2. The second argument is an optional path to an RDF file. If it is provided, the query is run against this dataset. If it is not provided, the query itself must specify at least one dataset using the FROM or FROM NAMED keywords.

The code simply carries out the query over the specified dataset(s) and shows the results in standard output.
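The two-argument convention described above can be pictured with a small, Jena-free sketch. It is not the client distributed with the lab: the class name, the `Args` holder and the helper methods are all hypothetical, and the step where the query is handed to Jena is only indicated in comments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch of the client's argument handling; not the class distributed with the lab. */
final class QueryClientArgsSketch {

    /** Holds the two command-line arguments; dataPath is null when the second one is omitted. */
    static final class Args {
        final String queryPath;
        final String dataPath;

        Args(String queryPath, String dataPath) {
            this.queryPath = queryPath;
            this.dataPath = dataPath;
        }
    }

    /** Validates the arguments: the query file is compulsory, the RDF file is optional. */
    static Args parse(String[] argv) {
        if (argv.length < 1 || argv.length > 2) {
            throw new IllegalArgumentException("usage: <query-file> [rdf-file]");
        }
        return new Args(argv[0], argv.length == 2 ? argv[1] : null);
    }

    /** Reads the whole SPARQL query file into a string. */
    static String readQuery(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] argv) throws IOException {
        Args args;
        try {
            args = parse(argv);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            return;
        }
        String queryText = readQuery(args.queryPath);
        if (args.dataPath != null) {
            // Here the real client would run the query against the model loaded from this file.
            System.out.println("Dataset taken from file: " + args.dataPath);
        } else {
            // No file given: the query text itself must name its data with FROM / FROM NAMED.
            System.out.println("Dataset expected in the query's FROM / FROM NAMED clauses");
        }
        System.out.println(queryText);
    }
}
```

Keeping the optional path as a possibly-null field makes the two cases explicit: with a file, the data comes from the command line; without it, the query text has to name its own sources.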

Using the dataset and the Java client provided to you, answer the following questions:

  • What is the purpose of the following SPARQL query?
        PREFIX  fl:   <http://www.snee.com/ns/flights#>
        PREFIX  acode: <http://www.daml.ri.cmu.edu/ont/AirportCodes.daml#>
        PREFIX  io:   <http://www.daml.org/2001/06/itinerary/itinerary-ont#>
        SELECT  ?flight
        WHERE   {
                    ?flight  fl:flightFromCityName  "Bangkok TH" .
                }
    
    Execute it and check whether the results are as expected.
  • Write a SPARQL query to find the origin and destination of all flights.
  • Write a query to find the origin and destination of all flights that have their destination in Spain (ES).
  • Write a query to find the origin and destination of all domestic Spanish flights, that is, those that have both origin and destination in Spain.
  • Write a query to find all flights longer than 7500 miles. Show the id and the length in miles of each such flight. You will need a mechanism to cast the text in the literal value to an integer. Look for such a mechanism in the SPARQL specification.
  • Select with a SPARQL query the distinct airport codes of all the airports in Spain that are the origin of a flight. Show for each airport its name and code. Order the results by airport name.
  • Select the distinct names of the airports in the US that are the destination of a flight. Order them by name, showing at most 20 results.
  • Using a SPARQL query, show the origin and destination of all domestic Spanish flights, including the information about meals when it is available.
  • Select the distinct airplane names used by the company. Take into account that two different properties are used for this purpose, io:airplane and fl:plane, so the RDF dataset contains flights with both definitions, with only one of them, or with neither. Due to a mistake in the data generation process, some of the values of the property io:airplane are wrong and contain an hour in the format XX:XX instead of an airplane name. Filter out those wrong entries.
  • Using as RDF data sources the FOAF profiles of the teaching staff, available from the course Web site, write a SPARQL query that processes the profiles and shows the names of the people known by each staff member. Each result should consist of a person name and the URI of the RDF graph where the information was found.
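For the airplane-name question in the list above, it can help to prototype the pattern for the malformed values in plain Java before writing it into a SPARQL FILTER. The sketch below assumes that a wrong value consists only of one or two digits, a colon and two digits; the class name and the sample values are invented for illustration. SPARQL's regex() follows XPath regular-expression syntax, which may differ from java.util.regex in corner cases, so treat this as a way of checking the idea rather than the final filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for trying out the "hour instead of airplane name" pattern. */
final class AirplaneNameFilterSketch {

    // Assumed shape of the malformed values: H:MM or HH:MM and nothing else.
    // The anchors matter: like SPARQL's regex(), find() matches anywhere in the string.
    static final Pattern HOUR_LIKE = Pattern.compile("^\\d{1,2}:\\d{2}$");

    /** True when the whole value looks like an hour rather than an airplane name. */
    static boolean looksLikeHour(String value) {
        return HOUR_LIKE.matcher(value).find();
    }

    /** Keeps only the values that do not look like an hour, preserving their order. */
    static List<String> keepAirplaneNames(List<String> values) {
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            if (!looksLikeHour(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the dataset.
        List<String> sample = Arrays.asList("737", "1:35", "A320", "12:05");
        System.out.println(keepAirplaneNames(sample)); // prints [737, A320]
    }
}
```

Once the pattern behaves as intended on a few values, the same expression can be tried inside a negated regex() test in the query.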

Exercise 2. Using SPARQL for RDF transformation

Using the Jena API documentation, modify the code provided to you so that it can execute CONSTRUCT queries instead of SELECT queries. With a SPARQL CONSTRUCT query, we can transform information expressed in one RDF vocabulary into a different one.

A small subset of the original airline RDF dataset is provided to you. Transform this data using a SPARQL query, with the aim of obtaining a list of flights, each one represented by an RDF snippet like the following:

    <flightID> <http://example.org/from> "Madrid ES" .
    <flightID> <http://example.org/to> "Barcelona ES" .

The result should be a new RDF dataset, written to standard output in RDF/XML format.

Exercise 3. Reasoning with Jena

If you look at the RDF data provided for the previous two exercises, you will see that the vocabulary used to describe the airline flights includes a property named fl:flightFromApCode. This property associates a unique identifier, the IATA code, with each airport that is the origin of a flight. However, the vocabulary does not define a property that associates an identifier with the airports that are destinations of flights. To address this, we will use the reasoning capabilities provided by Jena.

A detailed description of the reasoning capabilities provided by Jena can be found in the Jena documentation. As you can see there, the platform implements several types of reasoners, which support different degrees of expressivity. For instance, both an RDFS reasoner and an OWL reasoner are available. The platform can also be extended with external reasoner implementations, so it can be used, for instance, with Pellet.

Here we will use a third kind of reasoner: the general-purpose rule engine, which allows users to define their own inference rules. The following rule (named destCode) is needed:

    [destCode:  (?flight1 fl:flightFromCityName ?name),
        (?flight1 fl:flightFromApCode ?code),
        (?flight2 fl:flightToCityName ?name) -> (?flight2 fl:flightToApCode ?code)]

Can you explain the behaviour of this rule? A new property is defined in order to represent the IATA code of the destination airport. Can you say which one?
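One way to check your reading of the rule is to trace it by hand on a few triples. The plain-Java sketch below does exactly that without using Jena's rule engine: it joins the three body patterns on the city name and emits the head triple. The class and method names are hypothetical, flight SX0102 is invented for the example, and the code assumes a single airport code per city name, whereas a real rule engine would fire once for every matching combination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java trace of the destCode rule on toy data; Jena's rule engine is not used here. */
final class DestCodeRuleSketch {

    /** A triple with every term kept as a plain string, for illustration only. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    /** Returns the triples the rule would add, assuming one airport code per city name. */
    static List<Triple> applyDestCode(List<Triple> data) {
        // First two body patterns: for each flight, pair its origin city name with its origin code.
        Map<String, String> originName = new HashMap<>();
        Map<String, String> originCode = new HashMap<>();
        for (Triple t : data) {
            if (t.p.equals("fl:flightFromCityName")) originName.put(t.s, t.o);
            if (t.p.equals("fl:flightFromApCode")) originCode.put(t.s, t.o);
        }
        Map<String, String> codeByCity = new HashMap<>();
        for (Map.Entry<String, String> e : originName.entrySet()) {
            String code = originCode.get(e.getKey());
            if (code != null) codeByCity.put(e.getValue(), code);
        }
        // Third body pattern and head: a flight arriving at a known city gets that city's code.
        List<Triple> inferred = new ArrayList<>();
        for (Triple t : data) {
            String code = t.p.equals("fl:flightToCityName") ? codeByCity.get(t.o) : null;
            if (code != null) inferred.add(new Triple(t.s, "fl:flightToApCode", code));
        }
        return inferred;
    }

    public static void main(String[] args) {
        List<Triple> data = Arrays.asList(
            new Triple("fl:SX0101", "fl:flightFromCityName", "Aberdeen GB"),
            new Triple("fl:SX0101", "fl:flightFromApCode", "acode:ABZ"),
            new Triple("fl:SX0101", "fl:flightToCityName", "London-Gatwick GB"),
            // Hypothetical return flight, invented for this example.
            new Triple("fl:SX0102", "fl:flightFromCityName", "London-Gatwick GB"),
            new Triple("fl:SX0102", "fl:flightFromApCode", "acode:LGW"),
            new Triple("fl:SX0102", "fl:flightToCityName", "Aberdeen GB"));
        for (Triple t : applyDestCode(data)) System.out.println(t);
    }
}
```

Note one consequence that the trace makes visible: a destination city that never appears as the origin of some flight receives no code, because the first two body patterns have nothing to match.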

The Java source code of a class that uses the rule above to infer new knowledge is provided. The inference process is carried out in a method named reason, whose code is shown below:

    private static Model reason(Model input) {

        // Register a namespace prefix to be used in the rules
        String flUri = "http://www.snee.com/ns/flights#";
        PrintUtil.registerPrefix("fl", flUri);

        // Create an (RDF) specification of a hybrid reasoner which loads its rules from an external file
        Model m = ModelFactory.createDefaultModel();
        Resource configuration = m.createResource();
        configuration.addProperty(ReasonerVocabulary.PROPruleMode, "hybrid");
        configuration.addProperty(ReasonerVocabulary.PROPruleSet, "file.rules");

        // Create an instance of such a reasoner
        Reasoner reasoner = GenericRuleReasonerFactory.theInstance().create(configuration);

        // Infer new knowledge from the input model, generating a new one
        InfModel infmodel = ModelFactory.createInfModel(reasoner, input);

        return infmodel;
    }

The method receives as input a Jena representation of an RDF model (interface Model) containing the input data and performs reasoning on it. The result is a new model, extended with any newly inferred triples, which is returned for later use.

The code reads the rules from a text file. Can you say which one? Write the rule provided above in the required file.

The main method of the code provided to you receives two parameters. The first one is the path to a file that contains a SPARQL query, which is executed over the model obtained after inference. Write a SPARQL query that lists the flight IDs and the airport codes of the destinations of those flights. The second parameter is the path to an RDF file that contains the input dataset. You can use the airline RDF subset as input RDF data.

References

Places where you can find more RDF data...
