A dataset is a data structure used to store a collection of sensor or processed data. Usually, this is data that has been collected together during an exploration mission. For instance, it can be a collection of images, but it can also be the result of processing data, like a collection of salient points. Datasets are handled by the kDBDatasets library (TODO link to API dox).

In kDB, a dataset is stored as part of an RDF graph. Each RDF graph can contain multiple datasets. As part of the kDBDatasets library, there is a Datasets class (TODO link to API dox) which allows creating and retrieving a specific dataset. The Dataset class (TODO link to API dox) gives access to the meta information of a specific dataset.

Enabling kDBDatasets

kDBDatasets is provided as an extension, and needs to be enabled in the store before it is used. This needs to be done only once after creating the store. Once an extension has been enabled, it is automatically available to all connections (current and future).

  • connection.enableExtension("kDBDatasets");
    
  • connection.enableExtension("kDBDatasets")
    
  • connection.enableExtension("kDBDatasets")
    

How to create a dataset

To create a dataset, it is first necessary to select the collection of datasets to which the dataset belongs.

We assume that you have a connection to a database (check the Getting started with a store). The following examples show how to access a collection:

  • #include <kDBDatasets/Datasets.h>
    
    // This accesses the set of private datasets.
    kDBDatasets::Datasets dss = kDBDatasets::Datasets::get(
        connection, "http://askco.re/graphs#private_datasets"_kCu
    );
    
  • import kDBDatasets
    
    # This accesses the set of private datasets.
    dss = kDBDatasets.Datasets.get(
        connection, knowCore.Uri("http://askco.re/graphs#private_datasets")
    )
    
  • require 'kDBDatasets'
    
    # This accesses the set of private datasets.
    dss = KDBDatasets::Datasets.get(
        connection, kCu("http://askco.re/graphs#private_datasets")
    )
    

http://askco.re/graphs#private_datasets refers to a collection of private datasets, that are not supposed to be shared with other agents. The URI can be replaced by any other URI for managing a different collection of datasets.

Then it is possible to create a dataset with:

  • #include <knowGIS/GeometryObject.h>
    
    #include <kDB/Repository/GraphsManager.h>
    #include <kDB/Repository/TriplesStore.h>
    
    #include <kDBDatasets/Dataset.h>
    #include <kDBDatasets/Datasets.h>
    
    // Uri for the dataset.
    knowCore::Uri my_dataset_uri("http://askco.re/examples#my_dataset");
    
    // Polygon area corresponding to the dataset.
    knowGIS::GeometryObject geometry = knowGIS::GeometryObject::fromWKT("POLYGON((16.680529 57.761379, 16.682289 57.760258, 16.680336 57.758964, 16.677911 57.760693, 16.680529 57.761379))");
    
    // Create a dataset of salient regions, covering the area defined by geometry
    // and with the given URI.
    kDBDatasets::Dataset ds = dss.createDataset(
        "http://askco.re/sensing#salient_region"_kCu, geometry, my_dataset_uri
    );
    
    // It is also possible to let kDB generate a unique URI for the dataset.
    kDBDatasets::Dataset ds_auto_uri = dss.createDataset(
        "http://askco.re/sensing#salient_region"_kCu, geometry);
    
  • import knowCore
    import knowGIS
    import kDBDatasets
    
    # Uri for the dataset.
    my_dataset_uri = knowCore.Uri("http://askco.re/examples#my_dataset")
    
    # Polygon area corresponding to the dataset.
    geometry = knowGIS.GeometryObject.fromWKT("POLYGON((16.680529 57.761379, 16.682289 57.760258, 16.680336 57.758964, 16.677911 57.760693, 16.680529 57.761379))")
    
    # Create a dataset of salient regions, covering the area defined by geometry
    # and with the given URI.
    ds = dss.createDataset(
        knowCore.Uri("http://askco.re/sensing#salient_region"), geometry, my_dataset_uri
    )
    
    # It is also possible to let kDB generate a unique URI for the dataset.
    ds = dss.createDataset(
        knowCore.Uri("http://askco.re/sensing#salient_region"), geometry)
    
  • require 'knowGIS'
    require 'knowCore'
    require 'kDBDatasets'
    
    # Uri for the dataset.
    my_dataset_uri = kCu("http://askco.re/examples#granso")
    
    # Polygon area corresponding to the dataset.
    geometry = KnowGIS::GeometryObject.fromWKT("POLYGON((16.680529 57.761379, 16.682289 57.760258, 16.680336 57.758964, 16.677911 57.760693, 16.680529 57.761379))")
    
    # Create a dataset of salient regions, covering the area defined by geometry
    # and with the given URI.
    ds = dss.createDataset(
        kCu("http://askco.re/sensing#salient_region"), geometry, my_dataset_uri
    )
    
    # This sets the status of the dataset to completed.
    ds.setStatus(KDBDatasets::Dataset::Status::Completed)
    
    # This indicates that the current connection has a copy of the dataset.
    ds.associate(connection.serverUri())
    
    # It is also possible to let kDB generate a unique URI for the dataset.
    ds = dss.createDataset(
        kCu("http://askco.re/sensing#salient_region"), geometry)
    

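The polygon passed to fromWKT in the examples above is a closed ring: its first and last coordinates are identical. The following is a minimal sketch, independent of kDB, of how such a string could be built from a list of corner points. The helper name `polygon_wkt` is hypothetical and introduced only for illustration; the coordinate pairs are written in the same order as in the example above.

```python
def polygon_wkt(points):
    """Build a WKT POLYGON string from (x, y) pairs, closing the ring if needed."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    ring = list(points)
    # WKT requires the ring to end on its starting point.
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


# Corner points taken from the example above (the closing point is added automatically).
corners = [
    (16.680529, 57.761379),
    (16.682289, 57.760258),
    (16.680336, 57.758964),
    (16.677911, 57.760693),
]
wkt = polygon_wkt(corners)
```

The resulting string can then be given to fromWKT as in the examples above.
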
The URI http://askco.re/sensing#salient_region is specific for creating a dataset of salient regions. At present the following types are accepted:

  • http://askco.re/sensing#salient_region for datasets of salient regions
  • http://askco.re/sensing#image_frame for datasets of images
  • http://askco.re/sensing#point_cloud for datasets of point clouds
  • http://askco.re/sensing#lidar_scan for datasets of 2D lidar scans
  • http://askco.re/sensing#lidar3d_scan for datasets of 3D lidar scans
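
When the dataset type comes from configuration or user input, it can help to keep the accepted type URIs in one place. The following sketch is plain Python and does not call kDB; `SENSING_NS`, `DATASET_TYPES` and `dataset_type_uri` are hypothetical names introduced for illustration.

```python
SENSING_NS = "http://askco.re/sensing#"

# Short names mapped to the dataset type URIs listed above.
DATASET_TYPES = {
    name: SENSING_NS + name
    for name in (
        "salient_region",
        "image_frame",
        "point_cloud",
        "lidar_scan",
        "lidar3d_scan",
    )
}


def dataset_type_uri(name):
    """Return the type URI for a short name, or raise if the type is not listed."""
    try:
        return DATASET_TYPES[name]
    except KeyError:
        raise ValueError(f"unknown dataset type: {name!r}") from None
```

The returned string can be wrapped in knowCore.Uri and passed to createDataset, as in the Python example above.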

Dataset metainformation

Some of the metainformation of a dataset is considered static and should not be changed after the dataset is created, such as the timestamp or the geometry. Other metainformation is dynamic and can change, such as the status of the dataset or the list of agents who have a local copy of the information. The status is used, among other things, to indicate if the dataset is complete or under construction.

  • // This sets the status of the dataset to completed
    ds.setStatus(kDBDatasets::Dataset::Status::Completed);
    
    // This indicates that the current connection has a copy of the dataset
    ds.associate(connection.serverUri());
    
  • # This sets the status of the dataset to completed
    ds.setStatus(kDBDatasets.Dataset.Status.Completed)
    
    # This indicates that the current connection has a copy of the dataset
    ds.associate(connection.serverUri())
    
  • # This sets the status of the dataset to completed
    ds.setStatus(KDBDatasets::Dataset::Status.Completed)
    
    # This indicates that the current connection has a copy of the dataset
    ds.associate(connection.serverUri())
    

The general API for accessing or changing a property of a dataset is property and setProperty, respectively.

  • // Get the value of a property.
    knowCore::Value value = ds.property("..."_kCu).expectSuccess();
    
    // Set the value of a property.
    ds.setProperty("..."_kCu, value);
    
  • # Get the value of a property.
    value = ds.property(knowCore.Uri("..."))
    
    # Set the value of a property.
    ds.setProperty(knowCore.Uri("..."), value)
    
  • # Get the value of a property.
    value = ds.property(kCu("..."))
    
    # Set the value of a property.
    ds.setProperty(kCu("..."), value)
    

The list of possible properties is dependent on the type of data and is defined in kDB/extensions/kDBDatasets/data/datasets_shacl.ttl.

How to query for datasets

Information regarding datasets is stored in triple stores. As such, it can be queried using SPARQL. However, kDB also provides a high-level API for accessing datasets and querying them according to their properties.

If the URI of the dataset is known, it can easily be retrieved using the kDBDatasets::Datasets::dataset function:

  • kDBDatasets::Dataset ds = dss.dataset("..."_kCu).expectSuccess();
    
  • ds = dss.dataset(knowCore.Uri("..."))
    
  • ds = dss.dataset(kCu("..."))
    
  • The following can be used to return the list of all datasets:

    kdb datasets list --path path/to --port 1242

    The following can be used to return the list of all datasets in a specific collection:

    kdb datasets list --path path/to --port 1242 "uri of the collection"

kDBDatasets also provides an advanced query mechanism that allows querying according to the different properties. Below is an example for retrieving point cloud datasets with a density above 20 points/m^2 from the last 30 minutes.

  • // Create the constraints used to query for the dataset
    QList<QPair<knowCore::Uri, knowCore::ConstrainedValue>> constraints = 
    {
      {
        // Set the uri of the property
        "askcore_sensing:point_density"_kCu,
        // Set a constraint > 20
        knowCore::ConstrainedValue().apply(20, knowCore::ConstrainedValue::Type::Superior)
      },
      {
        // Set the uri of the property
        "http://www.w3.org/2006/time#hasBeginning"_kCu,
        // Set a constraint > now - 30 minutes
        knowCore::ConstrainedValue().apply(
          knowCore::Timestamp::now() - knowCore::Timestamp::from<knowCore::Minutes>(30),
          knowCore::ConstrainedValue::Type::Superior)
      }
    };
    
    // Query using the previously defined constraints
    QList<kDBDatasets::Dataset> list_of_ds = dss.datasets(constraints).expectSuccess();
    
  • import datetime
    import knowCore
    # Create the constraints used to query for the dataset
    constraints = {
      # Set the uri of the property
      knowCore.Uri("askcore_sensing:point_density"):
      # Set a constraint > 20
        knowCore.ConstrainedValue().apply(20, knowCore.ConstrainedValue.Type.Superior),
      # Set the uri of the property
      knowCore.Uri("http://www.w3.org/2006/time#hasBeginning"):
      # Set a constraint > now - 30 minutes
        knowCore.ConstrainedValue().apply(
          knowCore.Timestamp.fromDateTime(datetime.datetime.now() - datetime.timedelta(minutes=30)),
          knowCore.ConstrainedValue.Type.Superior)
    }
    
    # Query using the previously defined constraints
    list_of_ds = dss.datasets(constraints)
    
  • require 'knowCore'
    # Create the constraints used to query for the dataset
    constraints = {
      # Set the uri of the property
      kCu("askcore_sensing:point_density") =>
      # Set a constraint > 20
        KnowCore::ConstrainedValue.new().apply(20, KnowCore::ConstrainedValue::Type::Superior),
      # Set the uri of the property
      kCu("http://www.w3.org/2006/time#hasBeginning") =>
      # Set a constraint > now - 30 minutes
        KnowCore::ConstrainedValue.new().apply(
          KnowCore::Timestamp.fromTime(Time.new - 30*60),
          KnowCore::ConstrainedValue::Type::Superior)
    }
    
    # Query using the previously defined constraints
    list_of_ds = dss.datasets(constraints)
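
To make the meaning of the two constraints concrete, the following sketch applies the same "greater than" filters (following the `>` comments in the examples above) to plain Python dictionaries. It is only an in-memory illustration and not how kDBDatasets evaluates queries; the record layout, the shortened property names, and the `superior` and `matches` helpers are all assumptions.

```python
import datetime


def superior(threshold):
    """Return a predicate that is true for values strictly greater than threshold."""
    return lambda value: value > threshold


def matches(record, constraints):
    """A record matches when every constrained property is present and passes its predicate."""
    return all(
        prop in record and predicate(record[prop])
        for prop, predicate in constraints.items()
    )


# Fixed reference time so that the example is reproducible.
now = datetime.datetime(2024, 1, 1, 12, 0, 0)

constraints = {
    "point_density": superior(20),
    "hasBeginning": superior(now - datetime.timedelta(minutes=30)),
}

records = [
    {"uri": "a", "point_density": 35, "hasBeginning": now - datetime.timedelta(minutes=10)},
    {"uri": "b", "point_density": 12, "hasBeginning": now - datetime.timedelta(minutes=5)},
    {"uri": "c", "point_density": 50, "hasBeginning": now - datetime.timedelta(hours=2)},
]

# Only "a" is both dense enough and recent enough.
selected = [record["uri"] for record in records if matches(record, constraints)]
```

All constraints are combined with a logical "and" in this sketch: a dataset is kept only if it satisfies every entry.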
    

How to access the content of a dataset

The kDBDatasets library provides iterators for accessing the content of a dataset. The iterators are defined in the kDBDatasets::DataInterfaceRegistry class. There are three types of iterators: insert, extract and value. insert and extract are used for copying datasets. value is used to access individual data points (e.g. images, salient regions…).

  • #include <kDBDatasets/DataInterfaceRegistry.h>
    
    // Create an iterator to access the values
    kDBDatasets::ValueIterator it = kDBDatasets::DataInterfaceRegistry::createValueIterator(connection, ds).expectSuccess();
    
    // Iterate while it still has more values
    while(it.hasNext())
    {
      // Get the next value
      knowCore::Value value = it.next().expectSuccess();
    
      // The value needs to be converted to its relevant C++ class for use.
      // This is specific to each type of data and demonstrated in the next tutorials.
      ...
    }
    
    
  • # Create an iterator to access the values
    it = kDBDatasets.DataInterfaceRegistry.createValueIterator(connection, ds)
    
    # Iterate while it still has more values
    while it.hasNext():
      # Get the next value
      value = it.next()
    
      # How to use the value is specific to each type of data and demonstrated in the next tutorials.
      ...
    
  • # Create an iterator to access the values
    it = KDBDatasets::DataInterfaceRegistry.createValueIterator(connection, ds)
    
    # Iterate while it still has more values
    while it.hasNext()
      # Get the next value
      value = it.next()
    
      # How to use the value is specific to each type of data and demonstrated in the next tutorials.
      ...
    end
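
In Python, an iterator with hasNext() and next() methods can be wrapped in a generator so that it works with a regular for loop. The sketch below assumes only the two methods used in the Python example above; `iterate` is a hypothetical helper, and `FakeIterator` is a stand-in so that the snippet runs without a database.

```python
def iterate(it):
    """Yield values from an object exposing hasNext() and next()."""
    while it.hasNext():
        yield it.next()


class FakeIterator:
    """Stand-in for a value iterator, backed by a list (for illustration only)."""

    def __init__(self, values):
        self._values = list(values)
        self._index = 0

    def hasNext(self):
        return self._index < len(self._values)

    def next(self):
        value = self._values[self._index]
        self._index += 1
        return value


# With a real dataset, FakeIterator(...) would be replaced by the value iterator.
values = [value for value in iterate(FakeIterator(["v0", "v1", "v2"]))]
```

The generator is lazy, so values are only fetched from the underlying iterator as the loop consumes them.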
    

Import/Export Datasets

It is possible to export a dataset from a database to a file, and then to import it into a different database. The exported file contains the metadata and the data. When importing, if the dataset metadata is not in the database, it will be added to the collections given as arguments to the import function, or to the collection of private datasets.

  • #include <QFile>
    
    #include <kDBDatasets/DataInterfaceRegistry.h>
    
    // Open a file for writing
    QFile output_file("filename.kdb_dataset");
    output_file.open(QIODevice::WriteOnly);
    
    // Export to the file
    kDBDatasets::DataInterfaceRegistry::exportTo(connection, ds, &output_file);
    output_file.close();
    
    // Open a file for reading
    QFile input_file("filename.kdb_dataset");
    input_file.open(QIODevice::ReadOnly);
    
    // Import from the file.
    // The {} can be replaced by a list of collections. When given an empty
    // list, the metadata will be added to the collection of private
    // datasets.
    kDBDatasets::DataInterfaceRegistry::importFrom(connection, {}, &input_file);
    
  • # Export to the file
    kDBDatasets.DataInterfaceRegistry.exportTo(connection, ds, "filename.kdb_dataset")
    
    # Import from the file.
    # The [] can be replaced by a list of collections. When given an empty
    # list, the metadata will be added to the collection of private
    # datasets.
    kDBDatasets.DataInterfaceRegistry.importFrom(connection, [], "filename.kdb_dataset")
    
  • # Export to the file
    KDBDatasets::DataInterfaceRegistry.exportTo connection, ds, "filename.kdb_dataset"
    
    # Import from the file.
    # The [] can be replaced by a list of collections. When given an empty
    # list, the metadata will be added to the collection of private
    # datasets.
    KDBDatasets::DataInterfaceRegistry.importFrom connection, [], "filename.kdb_dataset"
    
  • # Export to the file.
    kdb datasets export --path path/to --port 1242 --filename filename.kdb_dataset "http://askco.re/examples#dataset_uri"
    
    # Import from the file. Add metadata to private collection.
    kdb datasets import --path path/to --port 1242 --filename filename.kdb_dataset
    
    # Import from the file. Add metadata to a collection called 'http://askco.re/examples#datasets_collection_uri'.
    kdb datasets import --path path/to --port 1242 --filename filename.kdb_dataset "http://askco.re/examples#datasets_collection_uri"
    

    Note, the database needs to be running, for instance using the kdb store command described in Getting Started Tutorial.
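
The export command can also be driven from a script. The sketch below assembles the argument list for `kdb datasets export` using only the flags shown above; the function name `export_command` is hypothetical, and the command is only built here, not executed.

```python
def export_command(store_path, port, filename, dataset_uri):
    """Assemble the argument list for `kdb datasets export`."""
    return [
        "kdb", "datasets", "export",
        "--path", str(store_path),
        "--port", str(port),
        "--filename", str(filename),
        dataset_uri,
    ]


cmd = export_command(
    "path/to", 1242, "filename.kdb_dataset", "http://askco.re/examples#dataset_uri"
)
```

Once the store is running, the list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting issues with the `#` in the URI.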

Next

Tutorials specific to the different types of data are available:

  • The salient regions tutorial covers an API for storing salient regions in an RDF Graph, which is very similar to the map idea presented in this tutorial.

  • The query images tutorial shows how to query images, display them and process them using OpenCV.

Tutorials covering the exchange of data between several instances are available:

  • RDF Graph Synchronisation covers how to synchronise RDF Graphs between agents. This can be used for the synchronisation of dataset metainformation between agents using ROS.
  • The Dataset Transfer tutorial presents how to exchange large datasets of sensor data between two platforms.