Overview
In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. In the graph, specifying N numeric properties (features) of nodes to indicate the location of the node in an N-dimensional Euclidean space.
Concepts
Euclidean Distance
In 2-dimensional space, the formula to compute the Euclidean distance between points A(x1, y1) and B(x2, y2) is:
In 3-dimensional space, the formula to compute the Euclidean distance between points A(x1, y1, z1) and B(x2, y2, z2) is:
Generalize to N-dimensional space, the formula to compute the Euclidean distance is:
where xi1 represents the i-th dimensional coordinates of the first point, xi2 represents the i-th dimensional coordinates of the second point.
The Euclidean distance ranges from 0 to +∞; the smaller the value, the more similar the two nodes.
Normalized Euclidean Distance
Normalized Euclidean distance scales the Euclidean distance into range from 0 to 1; the closer to 1, the more similar the two nodes.
Ultipa adopts the following formula to normalize the Euclidean distance:
Considerations
- Theoretically, the calculation of Euclidean distance between two nodes does not depend on their connectivity.
Example Graph
To create this graph:
// Runs each row separately in order in an empty graphset
create().node_schema("product")
create().node_property(@product, "price", int32).node_property(@product, "weight", int32).node_property(@product, "width", int32).node_property(@product, "height", int32)
insert().into(@product).nodes([{_id:"product1", price:50, weight:160, width:20, height:152}, {_id:"product2", price:42, weight:90, width:30, height:90}, {_id:"product3", price:24, weight:50, width:55, height:70}, {_id:"product4", price:38, weight:20, width:32, height:66}])
Creating HDC Graph
To load the entire graph to the HDC server hdc-server-1
as hdc_sim_prop
:
CALL hdc.graph.create("hdc-server-1", "hdc_sim_prop", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
})
hdc.graph.create("hdc_sim_prop", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
}).to("hdc-server-1")
Parameters
Algorithm name: similarity
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
ids |
[]_id |
/ | / | No | Specifies the first group of nodes for computation by their _id ; computes for all nodes if it is unset. |
uuids |
[]_uuid |
/ | / | No | Specifies the first group of nodes for computation by their _uuid ; computes for all nodes if it is unset. |
ids2 |
[]_id |
/ | / | No | Specifies the second group of nodes for computation by their _id ; computes for all nodes if it is unset. |
uuids2 |
[]_uuid |
/ | / | No | Specifies the second group of nodes for computation by their _uuid ; computes for all nodes if it is unset. |
type |
String | euclideanDistance , euclidean |
cosine |
No | Specifies the type of similarity to compute; use euclideanDistance to compute Euclidean Distance, and use euclidean to compute Normalized Euclidean Distance. |
node_schema_property |
[]"<@schema.?><property> " |
/ | / | No | Numeric node properties to form a vector for each node; all specified properties must belong to the same label (schema). |
return_id_uuid |
String | uuid , id , both |
uuid |
Yes | Includes _uuid , _id , or both to represent nodes in the results. |
order |
String | asc , desc |
/ | Yes | Sorts the results by similarity . |
limit |
Integer | ≥-1 | -1 |
Yes | Limits the number of results returned; -1 includes all results. |
top_limit |
Integer | ≥-1 | -1 |
Yes | Limits the number of results returned for each node specified with ids /uuids in selection mode; -1 includes all results with a similarity greater than 0. This parameter is invalid in pairing mode. |
The algorithm has two calculation modes:
- Pairing: When both
ids
/uuids
andids2
/uuids2
are configured, each node inids
/uuids
is paired with each node inids2
/uuids2
(excluding self-pairing), and pairwise similarities are computed. - Selection: When only
ids
/uuids
is configured, pairwise similarities are computed between each target node and all other nodes in the graph. The results include all or a limited number of nodes with a similarity > 0 to the target node, ordered in descending similarity.
File Writeback
Computes similarities in pairing mode:
CALL algo.similarity.write("hdc_sim_prop", {
params: {
return_id_uuid: "id",
ids: "product1",
ids2: ["product2", "product3", "product4"],
node_schema_property: ["price", "weight", "width", "height"],
type: "euclideanDistance"
},
return_params: {
file: {
filename: "euclideanDistance"
}
}
})
algo(similarity).params({
project: "hdc_sim_prop",
return_id_uuid: "id",
ids: "product1",
ids2: ["product2", "product3", "product4"],
node_schema_property: ["price", "weight", "width", "height"],
type: "euclideanDistance"
}).write({
file: {
filename: "euclideanDistance"
}
})
Result:
_id1,_id2,similarity
product1,product2,94.3822
product1,product3,143.962
product1,product4,165.179
Full Return
CALL algo.similarity("hdc_sim_prop", {
params: {
return_id_uuid: "id",
ids: ["product1","product2"],
ids2: ["product2","product3","product4"],
node_schema_property: ["price", "weight", "width", "height"],
type: "euclideanDistance"
},
return_params: {}
}) YIELD distance
RETURN distance
exec{
algo(similarity).params({
return_id_uuid: "id",
ids: ["product1","product2"],
ids2: ["product2","product3","product4"],
node_schema_property: ["price", "weight", "width", "height"],
type: "euclideanDistance"
}) as distance
return distance
} on hdc_sim_prop
Result:
_id1 | _id2 | similarity |
---|---|---|
product1 | product2 | 94.382202 |
product1 | product3 | 143.961807 |
product1 | product4 | 165.178696 |
product2 | product3 | 54.304695 |
product2 | product4 | 74.135010 |
Stream Return
CALL algo.similarity("hdc_sim_prop", {
params: {
return_id_uuid: "id",
ids: ["product1", "product3"],
node_schema_property: ["price", "weight", "width", "height"],
type: "euclideanDistance",
top_limit: 1
},
return_params: {
stream: {}
}
}) YIELD top
RETURN top
exec{
algo(similarity).params({
return_id_uuid: "id",
ids: ["product1", "product3"],
node_schema_property: ["price", "weight", "width", "height"],
type: "euclideanDistance",
top_limit: 1
}).stream() as top
return top
} on hdc_sim_prop
Result:
_id1 | _id2 | similarity |
---|---|---|
product1 | product4 | 165.178696 |
product3 | product1 | 143.961807 |