This page demonstrates the process of importing data from RDF file(s) into an Ultipa graphset.
Ultipa Transporter supports data import from RDF files in the N-Triples, Turtle, and RDF/XML formats.
The following steps are demonstrated using a Turtle file. See the example file.
Generate Configuration File
Execute the following command in your command line tool and select rdf.
./ultipa-importer --sample
The import.sample.rdf.yml file will be generated in the same directory as ultipa-importer.exe. If an import.sample.rdf.yml file already exists in that directory, it will be overwritten.
Modify Configuration File
The configuration file consists of several sections. Modify the configuration file according to your needs.
# Mode options: csv/json/jsonl/rdf/graphml/bigQuery/sql/kafka/neo4j/salesforce; only one mode can be used
# SQL supports mysql/postgresSQL/sqlserver/snowflake/oracle
mode: rdf

# RDF configurations
rdf:
  # Specify the file path
  file: "./dataset/account.ttl"
  # Specify the rdf format as "ntriples", "turtle" or "rdfxml"
  format: "turtle"

# Ultipa server configurations
server:
  # Host IP/URI and port
  # If it is a cluster, separate hosts with commas, i.e., "<ip1>:<port1>,<ip2>:<port2>,<ip3>:<port3>"
  host: "10.11.22.33:1234"
  username: "admin"
  password: "admin12345"
  # The new or existing graphset where data will be imported
  graphset: "myGraph"
  # If the above graphset is new, specify the shards where it will be stored
  shards: "1,2,3"
  # If the above graphset is new, specify the partition function (Crc32/Crc64WE/Crc64XZ/CityHash64) used for sharding
  partitionBy: "Crc32"
  # Path of the certificate file for TLS (optional)
  crt: ""

# Global settings
settings:
  # Define the path to output the log file
  logPath: "./logs"
  # Number of rows included in each insertion batch
  batchSize: 10000
  # Import mode supports insert/overwrite/upsert
  importMode: insert
  # Automatically create missing end nodes for edges (applicable only when importing edges)
  createNodeIfNotExist: false
  # Stops the importing process when error occurs
  stopWhenError: false
  # The maximum threads
  threads: 32
  # The maximum size (in MB) of each packet
  maxPacketSize: 40
  # Timezone for the timestamp values
  # timeZone: "+0200"
  # Timestamp value unit, support ms/s
  timestampUnit: s
Configuration Items
RDF settings
Field | Type | Description |
---|---|---|
file | String | Path of the RDF file to be imported. Multiple files can be specified. |
format | String | Supported formats are ntriples, turtle and rdfxml, corresponding to the RDF file extensions .ntl, .ttl and .xml, respectively. Ensure the format you specify matches the RDF file to be imported; otherwise, errors may occur during parsing. |
To specify multiple files, configure in the following way:
- file: "./test_data/file1.ttl"
  format: "turtle"
- file: "./test_data/file2.ntl"
  format: "ntriples"
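Because a mismatch between the declared format and the actual file can cause parsing errors, it may be worth checking the entries before running the import. The following is a minimal sketch, not part of Ultipa Transporter: the helper names `expected_format` and `find_mismatches` are hypothetical, and the extension mapping is taken from the table above.

```python
from pathlib import Path

# Extension-to-format mapping as listed in the RDF settings table.
EXTENSION_TO_FORMAT = {".ntl": "ntriples", ".ttl": "turtle", ".xml": "rdfxml"}


def expected_format(path: str) -> str:
    """Return the format name implied by the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"unrecognized RDF file extension: {suffix!r}")
    return EXTENSION_TO_FORMAT[suffix]


def find_mismatches(entries):
    """Return the entries whose declared format differs from the extension's format."""
    return [e for e in entries if expected_format(e["file"]) != e["format"]]


# The two entries from the multi-file example above.
entries = [
    {"file": "./test_data/file1.ttl", "format": "turtle"},
    {"file": "./test_data/file2.ntl", "format": "ntriples"},
]
mismatched = find_mismatches(entries)
```

An empty `mismatched` list means every declared format agrees with its file extension. The check only looks at file names, so it cannot detect a file whose content does not match its extension.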
Ultipa server configurations
Field | Type | Description |
---|---|---|
host | String | IP address or URI, with port, of the Ultipa server. For a cluster, separate the hosts with commas. |
username | String | Database username. |
password | String | Password of the above user. |
graphset | String | Name of the target graphset for RDF file import. If the specified graphset does not exist, it will be created automatically. |
shards | String | If the graphset is new, specifies the shards where it will be stored. |
partitionBy | String | If the graphset is new, specifies the partitioning algorithm used for sharding: Crc32, Crc64WE, Crc64XZ or CityHash64. |
crt | String | Path to the certificate (CRT) file used for TLS encryption. Optional. |
Global settings
Field | Type | Default | Description |
---|---|---|---|
logPath | String | "./logs" | The path to save the log file. |
batchSize | Integer | 10000 | Number of nodes or edges to insert per batch. |
importMode | String | upsert | Specifies how the data is inserted into the graph: overwrite, insert or upsert. When updating nodes or edges, use the upsert mode to prevent overwriting existing data. |
createNodeIfNotExist | Bool | false | Whether missing nodes are automatically created when inserting edges. true: the system automatically creates nodes that do not exist. false: the related edges will not be imported. |
stopWhenError | Bool | false | Whether to stop the import process when an error occurs. |
threads | Integer | 32 | The maximum number of threads. 32 is suggested. |
maxPacketSize | Integer | 40 | The maximum size of data packets in MB that can be sent or received. |
timestampUnit | String | s | The unit of measurement for timestamp data. Supported units are ms (milliseconds) and s (seconds). |
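To make the effect of timestampUnit concrete, the sketch below shows how the same instant is written under each unit. It is an illustration only and assumes that numeric timestamps are counted from the Unix epoch; the function name `to_utc` is hypothetical and is not part of Ultipa Transporter.

```python
from datetime import datetime, timezone


def to_utc(value: int, unit: str = "s") -> datetime:
    """Interpret a numeric timestamp as seconds ("s") or milliseconds ("ms")."""
    if unit == "s":
        seconds = value
    elif unit == "ms":
        seconds = value / 1000
    else:
        raise ValueError(f"unsupported timestamp unit: {unit!r}")
    # Assumption: values are offsets from the Unix epoch (1970-01-01T00:00:00Z).
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The same instant expressed in each unit.
in_seconds = to_utc(1_000, "s")
in_milliseconds = to_utc(1_000_000, "ms")
```

Here `in_seconds` and `in_milliseconds` are equal. Declaring the wrong unit would shift every imported timestamp by a factor of 1000, so the setting should match the source data.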
Execute Import
The import process uses the configuration file specified by the --config parameter to import data from the RDF files into the target server and represent it in the Ultipa graph structure.
./ultipa-importer --config import.sample.rdf.yml
Mapping Rules
- Subjects are mapped to nodes.
- Predicates are mapped:
  - To node property names when the objects are literals.
  - To edges when the objects act as subjects in other triples.
- Objects are mapped to property values if they are not subjects in other triples.
- Node schemas are set according to the subject prefix, with the following considerations:
  - Blank subjects, treated as blank nodes, are inserted into the "default" schema.
  - Subjects without a prefix are assigned schema names starting from ns0, with subsequent schemas incrementing sequentially (e.g., ns1, ns2, etc.).
- Edge schemas are set according to the predicate prefix.
- If the schema name is shorter than 2 characters, the system will duplicate it for schema creation (e.g., the schema "a" is duplicated as "aa").
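The rules above can be sketched in a few lines of Python. This is an illustration of the rules as stated, not the importer's actual implementation: the functions `schema_name` and `map_triples` and the 4-tuple triple representation are hypothetical, property names are kept as the full prefixed predicate for simplicity, and subjects without a prefix (the ns0, ns1, ... case) are left out.

```python
from collections import defaultdict


def schema_name(term: str) -> str:
    """Derive a schema name from a prefixed term such as "ex:subject1"."""
    if term.startswith("_:"):
        return "default"  # blank nodes go into the "default" schema
    prefix = term.split(":", 1)[0]
    # Names shorter than 2 characters are duplicated, e.g. "a" becomes "aa".
    return prefix * 2 if len(prefix) < 2 else prefix


def map_triples(triples):
    """Split (subject, predicate, object, object_is_literal) tuples into nodes and edges."""
    subjects = {s for s, _, _, _ in triples}
    nodes = defaultdict(lambda: {"schema": None, "props": {}})
    edges = []
    for s, p, o, is_literal in triples:
        nodes[s]["schema"] = schema_name(s)  # every subject becomes a node
        if not is_literal and o in subjects:
            # The object is a subject elsewhere, so the predicate becomes an edge.
            edges.append((schema_name(p), s, o))
        else:
            # Otherwise the predicate names a property and the object is its value.
            nodes[s]["props"][p] = o
    return dict(nodes), edges


# A few triples modeled on the example file in this page.
TRIPLES = [
    ("ultipaInd:pythonSDK", "ultipaVoc:name", "SDK", True),
    ("ultipaInd:pythonSDK", "ultipaVoc:runsOn", "ultipaInd:UltipaGraph432", False),
    ("ultipaInd:UltipaGraph432", "ultipaVoc:name", "UltipaGraph", True),
    ("_:blankNode1", "ex:predicate3", "string literal for blank node", True),
]
nodes, edges = map_triples(TRIPLES)
```

With these inputs, `edges` holds a single `ultipaVoc` edge from `ultipaInd:pythonSDK` to `ultipaInd:UltipaGraph432`, the two `ultipaInd` subjects become nodes in the `ultipaInd` schema, and `_:blankNode1` lands in the `default` schema.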
Example File
In this example, an RDF file in Turtle (.ttl) format is imported.
@prefix ultipaVoc: <http://ultipa.com/vocab/sw#> .
@prefix ultipaInd: <http://ultipa.com/ind#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .

ultipaInd:pythonSDK ultipaVoc:name "SDK" ;
    a ultipaVoc:UltipaTool ;
    ultipaVoc:version "4.3.0.1" ;
    ultipaVoc:releaseDate "2023-03-06" ;
    ultipaVoc:runsOn ultipaInd:UltipaGraph432 .

ultipaInd:transporter233 ultipaVoc:name "Transporter" ;
    a ultipaVoc:UltipaTool ;
    ultipaVoc:version "4.3.1" ;
    ultipaVoc:releaseDate "2023-07-20" ;
    ultipaVoc:runsOn ultipaInd:UltipaGraph432 .

ultipaInd:manager3028 ultipaVoc:name "UltipaManager" ;
    a ultipaVoc:UltipaTool ;
    ultipaVoc:version "4.3.0.2" ;
    ultipaVoc:releaseDate "2023-05-29" ;
    ultipaVoc:runsOn ultipaInd:UltipaGraph432 .

ultipaInd:UltipaGraph432 ultipaVoc:name "UltipaGraph" ;
    a ultipaVoc:GraphPlatform , ultipaVoc:InspiringPlatform ;
    ultipaVoc:version "4.3.2" .

# Normal nodes
ex:subject1 ex:predicate1 "normal string literal" .
ex:subject2 ex:predicate2 "another string literal" .

# Blank nodes
_:blankNode1 ex:predicate3 "string literal for blank node" .
ex:subject3 ex:predicate4 _:blankNode2 .

# Literals
ex:subject4 ex:predicate5 "365"^^xsd:integer .
ex:subject5 ex:predicate6 "true"^^xsd:boolean .
ex:subject6 ex:predicate7 "3.14"^^xsd:float .
ex:subject7 ex:predicate8 "2024-08-21T00:00:00Z"^^xsd:dateTime .