Datasets
WARNING auKsys/5 is still under development and subject to change. This documentation has not been fully updated with changes since auKsys/4.
A dataset is a data structure used to store a collection of sensor or processed data. Usually, this is data that has been collected during an exploration mission, for instance a collection of images. It can also be the result of processing data, such as a collection of salient points. Datasets are handled by the kDBDatasets library (TODO link to API dox).
In kDB, a dataset is stored as part of an RDF graph. Each RDF graph can contain multiple datasets. As part of the kDBDatasets library, there is a Datasets class (TODO link to API dox) which allows creating and retrieving a specific dataset. The Dataset class (TODO link to API dox) allows accessing the meta information of a specific dataset.
Enabling kDBDatasets
kDBDatasets is provided as an extension and needs to be enabled in the store before it is used.
This needs to be done only once, after creating the store.
Once an extension has been enabled, it is automatically available to all connections (current and future).
-
connection.enableExtension("kDBDatasets");
-
connection.enableExtension("kDBDatasets")
-
connection.enableExtension("kDBDatasets")
How to create a dataset
To create a dataset, it is first necessary to select the collection of datasets to which the dataset belongs.
We assume that you have a connection to a database (see the Getting started with a store tutorial).
The following examples show how to access a collection:
-
#include <kDBDatasets/Datasets.h>

// This accesses the set of private datasets.
kDBDatasets::Datasets dss = kDBDatasets::Datasets::get(
  connection, "http://askco.re/graphs#private_datasets"_kCu);
-
import knowCore
import kDBDatasets

# This accesses the set of private datasets.
dss = kDBDatasets.Datasets.get(
    connection, knowCore.Uri("http://askco.re/graphs#private_datasets"))
-
require 'kDBDatasets'

# This accesses the set of private datasets.
dss = KDBDatasets::Datasets.get(
  connection, kCu("http://askco.re/graphs#private_datasets"))
http://askco.re/graphs#private_datasets
refers to a collection of private datasets that are not meant to be shared with other agents. The URI can be replaced by any other URI to manage a different collection of datasets.
Then it is possible to create a dataset with:
-
#include <knowGIS/GeometryObject.h>
#include <kDB/Repository/GraphsManager.h>
#include <kDB/Repository/TriplesStore.h>
#include <kDBDatasets/Dataset.h>
#include <kDBDatasets/Datasets.h>

// Uri for the dataset.
knowCore::Uri my_dataset_uri("http://askco.re/examples#my_dataset");

// Polygon area corresponding to the dataset.
knowGIS::GeometryObject geometry = knowGIS::GeometryObject::fromWKT(
  "POLYGON((16.680529 57.761379, 16.682289 57.760258, 16.680336 57.758964, 16.677911 57.760693, 16.680529 57.761379))");

// Create a dataset of salient regions, covering the area defined by geometry
// and with the given URI.
kDBDatasets::Dataset ds = dss.createDataset(
  "http://askco.re/sensing#salient_region"_kCu, geometry, my_dataset_uri);

// It is also possible to let kDB generate a unique URI for the dataset.
kDBDatasets::Dataset ds2 = dss.createDataset(
  "http://askco.re/sensing#salient_region"_kCu, geometry);
-
import knowCore
import knowGIS
import kDBDatasets

# Uri for the dataset.
my_dataset_uri = knowCore.Uri("http://askco.re/examples#my_dataset")

# Polygon area corresponding to the dataset.
geometry = knowGIS.GeometryObject.fromWKT(
    "POLYGON((16.680529 57.761379, 16.682289 57.760258, 16.680336 57.758964, 16.677911 57.760693, 16.680529 57.761379))")

# Create a dataset of salient regions, covering the area defined by geometry
# and with the given URI.
ds = dss.createDataset(
    knowCore.Uri("http://askco.re/sensing#salient_region"), geometry, my_dataset_uri)

# It is also possible to let kDB generate a unique URI for the dataset.
ds = dss.createDataset(
    knowCore.Uri("http://askco.re/sensing#salient_region"), geometry)
-
require 'knowGIS'
require 'knowCore'
require 'kDBDatasets'

# Uri for the dataset.
my_dataset_uri = kCu("http://askco.re/examples#my_dataset")

# Polygon area corresponding to the dataset.
geometry = KnowGIS::GeometryObject.fromWKT(
  "POLYGON((16.680529 57.761379, 16.682289 57.760258, 16.680336 57.758964, 16.677911 57.760693, 16.680529 57.761379))")

# Create a dataset of salient regions, covering the area defined by geometry
# and with the given URI.
ds = dss.createDataset(
  kCu("http://askco.re/sensing#salient_region"), geometry, my_dataset_uri)

# It is also possible to let kDB generate a unique URI for the dataset.
ds = dss.createDataset(kCu("http://askco.re/sensing#salient_region"), geometry)
The URI http://askco.re/sensing#salient_region
is specific to creating a dataset of salient regions. At present, the following types are accepted:
- http://askco.re/sensing#salient_region for datasets of salient regions
- http://askco.re/sensing#image_frame for datasets of images
- http://askco.re/sensing#point_cloud for datasets of point clouds
- http://askco.re/sensing#lidar_scan for datasets of 2D Lidar scans
- http://askco.re/sensing#lidar3d_scan for datasets of 3D Lidar scans
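As an illustration only, the accepted type URIs above can be kept in a small lookup table and checked before calling createDataset. This helper is not part of the kDB API, just a sketch:

```python
# Hypothetical helper, not part of kDBDatasets: map the accepted dataset
# type URIs to a short description, and validate a requested type.
ACCEPTED_DATASET_TYPES = {
    "http://askco.re/sensing#salient_region": "salient regions",
    "http://askco.re/sensing#image_frame": "images",
    "http://askco.re/sensing#point_cloud": "point clouds",
    "http://askco.re/sensing#lidar_scan": "2D Lidar scans",
    "http://askco.re/sensing#lidar3d_scan": "3D Lidar scans",
}

def describe_dataset_type(uri: str) -> str:
    """Return a description of the dataset type, or raise for unknown URIs."""
    if uri not in ACCEPTED_DATASET_TYPES:
        raise ValueError(f"unsupported dataset type: {uri}")
    return ACCEPTED_DATASET_TYPES[uri]
```

Checking the type up front gives a clearer error than a failed createDataset call.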
Dataset metainformation
Some of the meta information of a dataset is considered static and should not be changed after the dataset creation, such as the timestamp or the geometry. Some meta information is dynamic and can change, such as the status of the dataset or the list of agents who have a local copy of the data. The status is used, among other things, to indicate whether the dataset is complete or under construction.
-
// This sets the status of the dataset to completed.
ds.setStatus(kDBDatasets::Dataset::Status::Completed);

// This indicates that the current connection has a copy of the dataset.
ds.associate(connection.serverUri());
-
# This sets the status of the dataset to completed.
ds.setStatus(kDBDatasets.Dataset.Status.Completed)

# This indicates that the current connection has a copy of the dataset.
ds.associate(connection.serverUri())
-
# This sets the status of the dataset to completed.
ds.setStatus(KDBDatasets::Dataset::Status::Completed)

# This indicates that the current connection has a copy of the dataset.
ds.associate(connection.serverUri())
The general API for accessing or changing a property of a dataset is to use property and setProperty, respectively.
-
// Get the value of a property.
knowCore::Value value = ds.property("..."_kCu).expectSuccess();

// Set the value of a property.
ds.setProperty("..."_kCu, value);
-
# Get the value of a property.
value = ds.property(knowCore.Uri("..."))

# Set the value of a property.
ds.setProperty(knowCore.Uri("..."), value)
-
# Get the value of a property.
value = ds.property(kCu("..."))

# Set the value of a property.
ds.setProperty(kCu("..."), value)
The list of possible properties depends on the type of data and is defined in kDB/extensions/kDBDatasets/data/datasets_shacl.ttl.
How to query for datasets
Information regarding datasets is stored in triple stores. As such, it can be queried using SPARQL; however, kDB also provides a high-level API for accessing datasets and querying them according to their properties.
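Since dataset metadata lives in RDF graphs, a plain SPARQL query is always an option. The sketch below only illustrates the idea; the class name sensing:point_cloud reuses the type URIs from this tutorial, but the exact triples kDB stores for a dataset are an assumption here (the actual vocabulary is defined in the kDBDatasets SHACL/ontology files):

```python
# Illustrative SPARQL query selecting all point cloud datasets.
# The triple pattern below is an assumption, not the confirmed kDB schema.
query = """
PREFIX sensing: <http://askco.re/sensing#>

SELECT ?dataset
WHERE {
  ?dataset a sensing:point_cloud .
}
"""
```

Such a query could be run through the store's SPARQL endpoint; for property-based filtering, the high-level API below is usually more convenient.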
If the URI of the dataset is known, it can easily be retrieved using the kDBDatasets::Datasets::dataset function:
-
kDBDatasets::Dataset ds = dss.dataset("..."_kCu).expectSuccess();
-
ds = dss.dataset(knowCore.Uri("..."))
-
ds = dss.dataset(kCu("..."))
-
The following can be used to return the list of all datasets:
kdb datasets list --path path/to --port 1242
The following can be used to return the list of all datasets in a specific collection:
kdb datasets list --path path/to --port 1242 "uri of the collection"
kDBDatasets also provides an advanced query mechanism that allows querying according to the different properties. Below is an example of retrieving point cloud datasets with a density of at least 20 points/m^2 from the last 30 minutes.
-
// Create the constraints used to query for the dataset.
QList<QPair<knowCore::Uri, knowCore::ConstrainedValue>> constraints = {
  {
    // Set the uri of the property.
    "askcore_sensing:point_density"_kCu,
    // Set a constraint > 20.
    knowCore::ConstrainedValue().apply(20, knowCore::ConstrainedValue::Type::Superior)
  },
  {
    // Set the uri of the property.
    "http://www.w3.org/2006/time#hasBeginning"_kCu,
    // Set a constraint > now - 30 minutes.
    knowCore::ConstrainedValue().apply(
      knowCore::Timestamp::now() - knowCore::Timestamp::from<knowCore::Minutes>(30),
      knowCore::ConstrainedValue::Type::Superior)
  }
};

// Query using the previously defined constraints.
QList<kDBDatasets::Dataset> list_of_ds = dss.datasets(constraints).expectSuccess();
-
import datetime
import knowCore

# Create the constraints used to query for the dataset.
constraints = {
    # Set the uri of the property.
    knowCore.Uri("askcore_sensing:point_density"):
        # Set a constraint > 20.
        knowCore.ConstrainedValue().apply(20, knowCore.ConstrainedValue.Type.Superior),
    # Set the uri of the property.
    knowCore.Uri("http://www.w3.org/2006/time#hasBeginning"):
        # Set a constraint > now - 30 minutes.
        knowCore.ConstrainedValue().apply(
            knowCore.Timestamp.fromDateTime(
                datetime.datetime.now() - datetime.timedelta(minutes=30)),
            knowCore.ConstrainedValue.Type.Superior),
}

# Query using the previously defined constraints.
list_of_ds = dss.datasets(constraints)
-
require 'knowCore'

# Create the constraints used to query for the dataset.
constraints = {
  # Set the uri of the property.
  kCu("askcore_sensing:point_density") =>
    # Set a constraint > 20.
    KnowCore::ConstrainedValue.new.apply(20, KnowCore::ConstrainedValue::Type::Superior),
  # Set the uri of the property.
  kCu("http://www.w3.org/2006/time#hasBeginning") =>
    # Set a constraint > now - 30 minutes.
    KnowCore::ConstrainedValue.new.apply(
      KnowCore::Timestamp.fromTime(Time.new - 30*60),
      KnowCore::ConstrainedValue::Type::Superior)
}

# Query using the previously defined constraints.
list_of_ds = dss.datasets(constraints)
How to access the content of a dataset
The kDBDatasets library provides iterators for accessing the content of a dataset. The iterators are defined in the kDBDatasets::DataInterfaceRegistry class. There are three types of iterators: insert, extract and value. insert and extract are used for copying datasets. value is used to access individual data points (i.e. images, salient regions…).
-
#include <kDBDatasets/DataInterfaceRegistry.h>

// Create an iterator to access the values.
kDBDatasets::ValueIterator it =
  kDBDatasets::DataInterfaceRegistry::createValueIterator(connection, ds).expectSuccess();

// Iterate while it still has more values.
while(it.hasNext())
{
  // Get the next value.
  knowCore::Value value = it.next().expectSuccess();
  // The value needs to be converted to its relevant C++ class for use.
  // This is specific to each type of data and demonstrated in the next tutorials.
  ...
}
-
# Create an iterator to access the values.
it = kDBDatasets.DataInterfaceRegistry.createValueIterator(connection, ds)

# Iterate while it still has more values.
while it.hasNext():
    # Get the next value.
    value = it.next()
    # How to use the value is specific to each type of data and demonstrated
    # in the next tutorials.
    ...
-
# Create an iterator to access the values.
it = KDBDatasets::DataInterfaceRegistry.createValueIterator(connection, ds)

# Iterate while it still has more values.
while it.hasNext()
  # Get the next value.
  value = it.next()
  # How to use the value is specific to each type of data and demonstrated
  # in the next tutorials.
  # ...
end
Import/Export of Datasets
It is possible to export a dataset from a database to a file, and then to import it into a different database. The exported file contains both the metadata and the data. When importing, if the dataset metadata is not already in the database, it will be added to the collections given as arguments to the import function, or to the collection of private datasets.
-
#include <kDBDatasets/DataInterfaceRegistry.h>

// Open a file for writing.
QFile output("filename.kdb_dataset");
output.open(QIODevice::WriteOnly);

// Export to the file.
kDBDatasets::DataInterfaceRegistry::exportTo(connection, ds, &output);

// Open a file for reading.
QFile input("filename.kdb_dataset");
input.open(QIODevice::ReadOnly);

// Import from the file.
// The {} can be replaced by a list of collections; when given an empty
// list, the metadata will be added to the collection of private
// datasets.
kDBDatasets::DataInterfaceRegistry::importFrom(connection, {}, &input);
-
# Export to the file.
kDBDatasets.DataInterfaceRegistry.exportTo(connection, ds, "filename.kdb_dataset")

# Import from the file.
# The [] can be replaced by a list of collections; when given an empty
# list, the metadata will be added to the collection of private
# datasets.
kDBDatasets.DataInterfaceRegistry.importFrom(connection, [], "filename.kdb_dataset")
-
# Export to the file.
KDBDatasets::DataInterfaceRegistry.exportTo(connection, ds, "filename.kdb_dataset")

# Import from the file.
# The [] can be replaced by a list of collections; when given an empty
# list, the metadata will be added to the collection of private
# datasets.
KDBDatasets::DataInterfaceRegistry.importFrom(connection, [], "filename.kdb_dataset")
-
# Export to the file.
kdb datasets export --path path/to --port 1242 --filename filename.kdb_dataset "http://askco.re/examples#dataset_uri"

# Import from the file. Add metadata to the private collection.
kdb datasets import --path path/to --port 1242 --filename filename.kdb_dataset

# Import from the file. Add metadata to a collection called 'http://askco.re/examples#datasets_collection_uri'.
kdb datasets import --path path/to --port 1242 --filename filename.kdb_dataset "http://askco.re/examples#datasets_collection_uri"
Note that the database needs to be running, for instance using the kdb store command described in the Getting Started tutorial.
Next
Tutorials specific to the different types of data are available:
- The salient regions tutorial covers an API for storing salient regions in an RDF Graph, which is very similar to the map idea presented in this tutorial.
- The query images tutorial shows how to query images, display them and process them using OpenCV.
Tutorials covering the exchange of data between several instances are available:
- RDF Graph Synchronisation covers how to synchronise RDF Graphs between agents. This can be used for the synchronisation of datasets metainformation between agents using ROS.
- The Dataset Transfer tutorial presents how to exchange large datasets of sensory data between two platforms.