Overview
A distributed projection resides in the memory of the shard servers where the graphset's data is persistently stored. It can hold all or part of a graphset's data and supports running graph algorithms, but it cannot be used to execute graph queries.
Managing Distributed Projections
Showing Distributed Projections
Retrieves information about all distributed projections of the current graphset:
show().project()
It returns a table _projectList with the following fields:
| Field | Description |
|---|---|
| project_name | Name of the projection. |
| project_type | Type of the projection, which is pregel for all distributed projections. |
| graph_name | Name of the current graphset from which the data was loaded. |
| status | Current state of the projection, which can be DONE, CREATING, FAILED, or UNKNOWN. |
| stats | Node and edge statistics per shard, including the address of the leader replica of the current graphset, edge_in_count, edge_out_count, and node_count. |
| config | Configurations for the distributed projection. |
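The status field is usually the first thing a script checks. The sketch below assumes the _projectList table has already been fetched and converted into a list of dictionaries keyed by the field names above. The helper name projections_not_ready and the sample rows are hypothetical placeholders, not output from a real server.

```python
# Status values listed in the field table above.
KNOWN_STATUSES = {"DONE", "CREATING", "FAILED", "UNKNOWN"}


def projections_not_ready(rows):
    """Return (project_name, status) for every row whose status is not DONE."""
    pending = []
    for row in rows:
        status = row.get("status", "UNKNOWN")
        # Treat any unexpected value as UNKNOWN rather than guessing.
        if status not in KNOWN_STATUSES:
            status = "UNKNOWN"
        if status != "DONE":
            pending.append((row["project_name"], status))
    return pending


# Hypothetical rows for illustration only.
sample_rows = [
    {"project_name": "distGraph", "project_type": "pregel", "status": "DONE"},
    {"project_name": "distGraph_1", "project_type": "pregel", "status": "CREATING"},
]

print(projections_not_ready(sample_rows))
```

A projection reported here would typically be rechecked later before running an algorithm on it.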
Creating a Distributed Projection
The create().project() clause creates an in-memory projection of the current graphset on its shard servers. Projection creation is executed as a job; run show().job() afterward to verify that it succeeded.
create().project(
"<projectName>",
["nodeSchema_1?", "nodeSchema_2?", ...],
["edgeSchema_1?", "edgeSchema_2?", ...],
{
node_properties: ["nodeProperty_1?", "nodeProperty_2?", ...],
edge_properties: ["edgeProperty_1?", "edgeProperty_2?", ...],
orientation: "<edgeDirection?>",
load_id: <boolean?>
}
)
| Method | Param | Description | Optional |
|---|---|---|---|
| project() | <projectName> | Name of the projection. Each distributed projection name within a database must be unique and cannot duplicate the name of any HDC projection of the same graphset. | No |
| | nodes | Specifies nodes to project based on schemas. Only the system properties _uuid and _id are loaded by default; _id can be excluded by setting load_id to false in the config map. Set to "*" to specify all schemas, or leave it blank to exclude nodes entirely. | No |
| | edges | Specifies edges to project based on schemas. Only system properties are loaded by default. Set to "*" to specify all schemas, or leave it blank to exclude edges entirely. | No |
| Config map | node_properties | Specifies properties to project for the selected node schemas. Set to "*" to load all available properties; it defaults to an empty list. | Yes |
| | edge_properties | Specifies properties to project for the selected edge schemas. Set to "*" to load all available properties; it defaults to an empty list. | Yes |
| | orientation | Each edge is physically stored twice: as an incoming edge with its destination node and as an outgoing edge with its source node. You can project only incoming edges with in, only outgoing edges with out, or both with undirected. Note that in or out restricts graph traversal during computation to the specified direction. | No |
| | load_id | Set to false to project nodes without _id values to save memory; it defaults to true. | Yes |
Examples
To project the entire current graphset to its shard servers as distGraph:
create().project("distGraph", ["*"], ["*"], {
node_properties: ["*"],
edge_properties: ["*"],
orientation: "undirected",
load_id: true
})
To project @account and @movie nodes with selected properties and incoming @rate edges in the current graphset to its shard servers as distGraph_1, while omitting nodes' _id values:
create().project("distGraph_1", ["account", "movie"], ["rate"], {
node_properties: ["name", "year", "gender"],
edge_properties: ["*"],
orientation: "in",
load_id: false
})
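The two statements above differ only in their arguments, so a client script could assemble them from parameters. The following is a minimal sketch of such a builder; build_create_project is a hypothetical helper (not part of any Ultipa SDK), it only produces the UQL text and does not send it to a server, and the two-space indentation of the config map is a formatting choice.

```python
import json

# Orientation values documented in the parameter table above.
VALID_ORIENTATIONS = {"in", "out", "undirected"}


def build_create_project(name, node_schemas, edge_schemas, orientation,
                         node_properties=(), edge_properties=(), load_id=True):
    """Return a create().project() statement as a string (hypothetical helper)."""
    if orientation not in VALID_ORIENTATIONS:
        raise ValueError(f"orientation must be one of {sorted(VALID_ORIENTATIONS)}")

    def uql_list(items):
        # json.dumps yields double-quoted strings, matching the examples above.
        return "[" + ", ".join(json.dumps(item) for item in items) + "]"

    lines = [
        f"create().project({json.dumps(name)}, {uql_list(node_schemas)}, "
        f"{uql_list(edge_schemas)}, {{",
        f"  node_properties: {uql_list(node_properties)},",
        f"  edge_properties: {uql_list(edge_properties)},",
        f"  orientation: {json.dumps(orientation)},",
        f"  load_id: {'true' if load_id else 'false'}",
        "})",
    ]
    return "\n".join(lines)


statement = build_create_project(
    "distGraph_1", ["account", "movie"], ["rate"], "in",
    node_properties=["name", "year", "gender"],
    edge_properties=["*"],
    load_id=False,
)
print(statement)
```

Validating orientation up front mirrors the fact that the option is required and accepts only in, out, or undirected.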
Dropping a Distributed Projection
You can drop any distributed projection of the current graphset from the shard servers using the drop().project() clause.
The following example deletes the distributed projection named distGraph_1:
drop().project("distGraph_1")
Example Graph and Projection
To create the graph, execute each of the following UQL queries sequentially in an empty graphset:
create().node_schema("entity").edge_schema("link")
create().edge_property(@link, "weight", float)
insert().into(@entity).nodes([{_id:"A"},{_id:"B"},{_id:"C"},{_id:"D"}])
insert().into(@link).edges([{_from:"A", _to:"B", weight:1},{_from:"A", _to:"C", weight:1.5},{_from:"A", _to:"D", weight:0.5},{_from:"B", _to:"C", weight:2},{_from:"C", _to:"D", weight:0.5}])
To create a distributed projection distGraph of the entire graph:
create().project("distGraph", ["*"], ["*"], {
node_properties: ["*"],
edge_properties: ["*"],
orientation: "undirected",
load_id: true
})
Executing Algorithms
Distributed projections run distributed algorithms. Distributed algorithms run in File and DB writeback modes with the syntax algo().params().write(). In the params() method, you must include the parameter project to specify the name of the projection.
File Writeback
The following runs the Degree Centrality algorithm on distGraph to compute the out-degree of all nodes and writes the results to a file degree.txt:
algo(degree).params({
project: "distGraph",
return_id_uuid: "id",
direction: "out"
}).write({
file: {
filename: "degree.txt"
}
})
Result:
C,1
A,3
B,1
D,0
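The values in this file can be checked by hand against the example graph. The sketch below recomputes out-degrees in plain Python from the five @link edges inserted earlier; it is an independent check of the numbers, not a description of how the server computes them, and the row order in the real file may differ.

```python
from collections import Counter

# Nodes and edges of the example graph: (source, destination, weight).
NODES = ["A", "B", "C", "D"]
EDGES = [
    ("A", "B", 1.0),
    ("A", "C", 1.5),
    ("A", "D", 0.5),
    ("B", "C", 2.0),
    ("C", "D", 0.5),
]


def out_degree(nodes, edges):
    """Count outgoing edges per node; nodes without any get 0."""
    counts = Counter(src for src, _dst, _weight in edges)
    return {node: counts.get(node, 0) for node in nodes}


for node, degree in out_degree(NODES, EDGES).items():
    print(f"{node},{degree}")
```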
DB Writeback
The following runs the Degree Centrality algorithm on distGraph to compute the out-degree of all nodes and writes the results to the node property degree:
algo(degree).params({
project: "distGraph",
return_id_uuid: "id",
direction: "out"
}).write({
db: {
property: "degree"
}
})
Graph Traversal Direction
If a distributed projection is created with the orientation option set to in or out, graph traversal is restricted to incoming or outgoing edges, respectively. Algorithms that attempt to traverse in the missing direction throw errors or yield empty results.
To create a distributed projection distGraph_in_edges of the graph with nodes and incoming edges only:
create().project("distGraph_in_edges", ["*"], ["*"], {
node_properties: ["*"],
edge_properties: ["*"],
orientation: "in",
load_id: true
})
When the Degree Centrality algorithm computes the out-degree of all nodes on distGraph_in_edges, every value is 0 because the projection contains no outgoing edges:
algo(degree).params({
project: "distGraph_in_edges",
return_id_uuid: "id",
direction: "out"
}).write({
file: {
filename: "degree.txt"
}
})
Result:
C,0
A,0
D,0
B,0
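Why every out-degree is 0 follows from the two-copy storage described for the orientation option. The sketch below is a simplified model of that idea, assuming each node keeps separate lists for its incoming and outgoing edge copies; the function names project and degree are illustrative, not Ultipa APIs.

```python
NODES = ["A", "B", "C", "D"]
# (source, destination) pairs of the example graph.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")]


def project(nodes, edges, orientation):
    """Build per-node adjacency, keeping only the edge copies allowed by orientation."""
    projection = {node: {"in": [], "out": []} for node in nodes}
    for src, dst in edges:
        if orientation in ("out", "undirected"):
            projection[src]["out"].append(dst)  # outgoing copy, stored with the source
        if orientation in ("in", "undirected"):
            projection[dst]["in"].append(src)   # incoming copy, stored with the destination
    return projection


def degree(projection, direction):
    """Count the edge copies available in the requested direction."""
    return {node: len(adjacency[direction]) for node, adjacency in projection.items()}


in_only = project(NODES, EDGES, "in")
print(degree(in_only, "out"))  # no outgoing copies were loaded
print(degree(in_only, "in"))
```

In this model the in-degrees remain available on the same projection, which is the direction an in-oriented projection is meant to serve.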
Exclusion of Node IDs
If a distributed projection is created with the load_id option set to false, it does not contain the _id values of nodes. Algorithms that reference _id throw errors or yield empty results. In algorithm writeback files, _uuid values are written in place of _id values.
To create a distributed projection distGraph_no_id of the graph without nodes' _id values:
create().project("distGraph_no_id", ["*"], ["*"], {
node_properties: ["*"],
edge_properties: ["*"],
orientation: "undirected",
load_id: false
})
The Degree Centrality algorithm computes the degree of all nodes on distGraph_no_id and writes the results to a file degree.txt, in which nodes' _id values are replaced with _uuid values:
algo(degree).params({
project: "distGraph_no_id",
return_id_uuid: "id"
}).write({
file: {
filename: "degree.txt"
}
})
Result:
12033620403357220866,1
10016007770295238657,3
288232575174967298,0
3530824306881724417,1
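The substitution of _uuid for _id in the file can be pictured with a small model. The sketch below assumes a projection that simply omits the _id key when load_id is false and a writer that falls back to _uuid; the _uuid values 1 to 4 are made-up placeholders (real values are large integers like those above), and the degree values reuse the out-degrees of the example graph.

```python
# Example nodes with placeholder _uuid values (hypothetical, for illustration).
SOURCE_NODES = [
    {"_uuid": 1, "_id": "A"},
    {"_uuid": 2, "_id": "B"},
    {"_uuid": 3, "_id": "C"},
    {"_uuid": 4, "_id": "D"},
]
# Out-degrees of the example graph, keyed by the placeholder _uuid.
DEGREE_BY_UUID = {1: 3, 2: 1, 3: 1, 4: 0}


def project_nodes(nodes, load_id):
    """Copy nodes into the projection, dropping _id when load_id is False."""
    projected = []
    for node in nodes:
        kept = {"_uuid": node["_uuid"]}
        if load_id:
            kept["_id"] = node["_id"]
        projected.append(kept)
    return projected


def writeback_lines(projected, degree_by_uuid):
    """Format one 'identifier,degree' line per node, preferring _id when present."""
    return [
        f"{node.get('_id', node['_uuid'])},{degree_by_uuid[node['_uuid']]}"
        for node in projected
    ]


print(writeback_lines(project_nodes(SOURCE_NODES, load_id=True), DEGREE_BY_UUID))
print(writeback_lines(project_nodes(SOURCE_NODES, load_id=False), DEGREE_BY_UUID))
```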
Exclusion of Properties
If a distributed projection is created without certain properties, algorithms that reference those properties throw errors or yield empty results.
To create a distributed projection distGraph_no_weight of the graph containing nodes and only the system properties of edges:
create().project("distGraph_no_weight", ["*"], ["link"], {
node_properties: ["*"],
edge_properties: [],
orientation: "undirected",
load_id: true
})
When the Degree Centrality algorithm computes the degree of all nodes weighted by the edge property @link.weight on distGraph_no_weight, an error occurs because the weight property is missing from the projection:
algo(degree).params({
project: "distGraph_no_weight",
edge_property: "@link.weight"
}).write({
file: {
filename: "degree.txt"
}
})
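The failure can be reproduced in a small model of property projection. The sketch below assumes that a projection copies an edge's custom properties only when they are listed in edge_properties (or "*" is given), and that a weighted degree sums the property over both endpoints of each edge; the function names and the KeyError are illustrative choices, and the property is referenced as weight rather than @link.weight for brevity.

```python
# Edges of the example graph with their weight property.
EDGES = [
    {"_from": "A", "_to": "B", "weight": 1.0},
    {"_from": "A", "_to": "C", "weight": 1.5},
    {"_from": "A", "_to": "D", "weight": 0.5},
    {"_from": "B", "_to": "C", "weight": 2.0},
    {"_from": "C", "_to": "D", "weight": 0.5},
]


def project_edges(edges, edge_properties):
    """Keep system fields always; copy custom properties only if requested."""
    projected = []
    for edge in edges:
        kept = {"_from": edge["_from"], "_to": edge["_to"]}
        for key, value in edge.items():
            if key.startswith("_"):
                continue  # system fields were already copied
            if "*" in edge_properties or key in edge_properties:
                kept[key] = value
        projected.append(kept)
    return projected


def weighted_degree(edges, prop):
    """Sum prop over every edge touching a node, in either direction."""
    totals = {}
    for edge in edges:
        if prop not in edge:
            raise KeyError(f"edge property '{prop}' is not loaded in the projection")
        for endpoint in (edge["_from"], edge["_to"]):
            totals[endpoint] = totals.get(endpoint, 0.0) + edge[prop]
    return totals


print(weighted_degree(project_edges(EDGES, ["*"]), "weight"))

try:
    weighted_degree(project_edges(EDGES, []), "weight")
except KeyError as error:
    print("error:", error)
```

Loading the property, for example with edge_properties: ["*"] as in distGraph, is what makes the weighted computation possible.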