Overview
ArticleRank is derived from PageRank and measures the influence of journal articles.
- J. Li, P. Willett, ArticleRank: a PageRank-based Alternative to Numbers of Citations for Analysing Citation Networks (2009)
Concepts
ArticleRank
Like links between webpages, citations between articles (books, reports, etc.) signal authority and quality. It is normally assumed that the more citations an article receives, the greater its impact within its research area.
However, not all articles are equally important, so a raw citation count treats every citation as if it carried the same weight. Hence, this PageRank-based approach was proposed to rank articles.
ArticleRank retains the basic PageRank methodology with one modification. When an article passes its rank along its forward links, the rank is divided not by the article's out-degree alone, but by the sum of that out-degree and the average out-degree of all articles. The rank of article u after one iteration is:

$$AR(u) = (1 - d) + d \sum_{v \in B_u} \frac{AR(v)}{N_v + \bar{N}}$$

where $B_u$ is the backlink set of u, d is the damping factor, $N_v$ is the out-degree of article v, and $\bar{N}$ is the average out-degree of all articles. Adding the average out-degree to the denominator reduces the bias whereby an article with a very small out-degree makes a disproportionately large contribution to each of its forward links.
The denominator of Ultipa's ArticleRank differs from the one in the original paper, but the core idea is the same.
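To make the effect of the modified denominator concrete, the following minimal Python sketch compares how much rank an article passes along each forward link under plain PageRank and under the ArticleRank denominator described above. The function names and the numbers (out-degrees of 1 and 10, average out-degree of 5) are hypothetical and chosen only for illustration; they are not taken from Ultipa's implementation.

```python
def pagerank_share(rank: float, out_degree: int) -> float:
    """Rank passed along each forward link under plain PageRank."""
    return rank / out_degree


def articlerank_share(rank: float, out_degree: int, avg_out_degree: float) -> float:
    """Rank passed along each forward link with the ArticleRank denominator."""
    return rank / (out_degree + avg_out_degree)


# Hypothetical values: two articles with the same rank, citing 1 and 10
# articles respectively, in a network whose average out-degree is 5.
rank, avg = 1.0, 5.0

# How many times more rank the low-out-degree article passes per link.
pr_ratio = pagerank_share(rank, 1) / pagerank_share(rank, 10)
ar_ratio = articlerank_share(rank, 1, avg) / articlerank_share(rank, 10, avg)

print(round(pr_ratio, 6))  # 10.0
print(round(ar_ratio, 6))  # 2.5
```

Under these assumed numbers, the article with a single citation passes 10 times more rank per link than the article with 10 citations in PageRank, but only 2.5 times more with the ArticleRank denominator.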
Considerations
Compared with the World Wide Web, citation networks have some distinctive features that must be considered:
- An article cannot cite itself, i.e., there are no self-loops in the network.
- Two articles cannot cite each other, i.e., an article cannot be both a forward link and a backlink of another article.
- The citations in a published article do not change, i.e., the forward links of an article are fixed.
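The first two properties can be checked on an edge list before running the algorithm. The sketch below is one possible way to do this in Python; the helper `citation_violations` is a hypothetical name, and edges are assumed to be `(citing, cited)` pairs. The third property concerns changes over time and cannot be verified from a single snapshot of the graph.

```python
def citation_violations(edges):
    """Return (self_loops, mutual_pairs) found in a list of (citing, cited) pairs."""
    edge_set = set(edges)
    # An article citing itself.
    self_loops = sorted((s, t) for s, t in edge_set if s == t)
    # Two articles citing each other; each pair is reported once, as (smaller, larger).
    mutual_pairs = sorted(
        (s, t) for s, t in edge_set if s < t and (t, s) in edge_set
    )
    return self_loops, mutual_pairs


# A small hypothetical citation list with one violation of each kind.
sample = [("a", "b"), ("b", "c"), ("c", "b"), ("d", "d")]
print(citation_violations(sample))  # ([('d', 'd')], [('b', 'c')])
```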
Example Graph
To create this graph:
// Run each statement separately, in order, in an empty graphset
create().node_schema("book").edge_schema("cite")
insert().into(@book).nodes([{_id:"book1"}, {_id:"book2"}, {_id:"book3"}, {_id:"book4"}, {_id:"book5"}, {_id:"book6"}, {_id:"book7"}])
insert().into(@cite).edges([{_from:"book1", _to:"book4"}, {_from:"book1", _to:"book5"}, {_from:"book2", _to:"book4"}, {_from:"book3", _to:"book4"}, {_from:"book4", _to:"book5"}, {_from:"book4", _to:"book6"}])
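As a cross-check, the rank update described in the Concepts section can be sketched in plain Python on this example graph. This is a minimal sketch and not Ultipa's implementation: it assumes that every `cite` edge is treated as directed from the citing book to the cited book, that all ranks are updated synchronously in each round, and that the rank of a node is (1 - damping) plus the damped sum of rank/(out-degree + average out-degree) over its backlinks. The function name `article_rank` and its signature are hypothetical; the parameter names mirror those used on this page.

```python
def article_rank(nodes, edges, init_value=0.2, loop_num=5, damping=0.8):
    """Iterate the ArticleRank update for a fixed number of rounds."""
    out_degree = {n: 0 for n in nodes}
    backlinks = {n: [] for n in nodes}
    for citing, cited in edges:
        out_degree[citing] += 1
        backlinks[cited].append(citing)
    avg_out_degree = len(edges) / len(nodes)

    rank = {n: init_value for n in nodes}
    for _ in range(loop_num):
        # Synchronous update: every new rank is computed from the previous round.
        rank = {
            u: (1 - damping) + damping * sum(
                rank[v] / (out_degree[v] + avg_out_degree) for v in backlinks[u]
            )
            for u in nodes
        }
    return rank


nodes = ["book1", "book2", "book3", "book4", "book5", "book6", "book7"]
edges = [
    ("book1", "book4"), ("book1", "book5"), ("book2", "book4"),
    ("book3", "book4"), ("book4", "book5"), ("book4", "book6"),
]

rank = article_rank(nodes, edges, init_value=1, loop_num=50, damping=0.8)
top3 = sorted(rank.items(), key=lambda kv: kv[1], reverse=True)[:3]
for node, value in top3:
    print(node, round(value, 6))
# book4 0.428308
# book5 0.375926
# book6 0.319926
```

Under these assumptions, the three highest ranks agree with the HDC results listed later on this page, and books without any citations settle at 1 - damping = 0.2.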
Running on HDC Graphs
Creating HDC Graph
To load the entire graph to the HDC server hdc-server-1 as hdc_article_rank:
CALL hdc.graph.create("hdc-server-1", "hdc_article_rank", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
})
hdc.graph.create("hdc_article_rank", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: false
}).to("hdc-server-1")
Parameters
Algorithm name: page_rank
Name | Type | Spec | Default | Optional | Description |
---|---|---|---|---|---|
init_value | Float | >0 | 0.2 | Yes | The initial rank assigned to all nodes. |
loop_num | Integer | ≥1 | 5 | Yes | The maximum number of iteration rounds. The algorithm terminates after completing all rounds. |
damping | Float | (0,1) | 0.8 | Yes | The damping factor. |
weaken | Integer | 1, 2 | 1 | Yes | Set to 2 to run ArticleRank; setting it to 1 runs PageRank. |
return_id_uuid | String | uuid, id, both | uuid | Yes | Includes _uuid, _id, or both to represent nodes in the results. |
limit | Integer | ≥-1 | -1 | Yes | Limits the number of results returned; -1 includes all results. |
order | String | asc, desc | / | Yes | Sorts the results by rank. |
File Writeback
CALL algo.page_rank.write("hdc_article_rank", {
params: {
return_id_uuid: "id",
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc"
},
return_params: {
file: {
filename: "article_rank"
}
}
})
algo(page_rank).params({
project: "hdc_article_rank",
return_id_uuid: "id",
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc"
}).write({
file: {
filename: "article_rank"
}
})
Result:
_id,rank
book4,0.428308
book5,0.375926
book6,0.319926
book2,0.2
book3,0.2
book7,0.2
book1,0.2
DB Writeback
Writes the rank values from the results to the specified node property. The property type is float.
CALL algo.page_rank.write("hdc_article_rank", {
params: {
loop_num: 50,
weaken: 2
},
return_params: {
db: {
property: "rank"
}
}
})
algo(page_rank).params({
project: "hdc_article_rank",
loop_num: 50,
weaken: 2
}).write({
db:{
property: 'rank'
}
})
Full Return
CALL algo.page_rank("hdc_article_rank", {
params: {
return_id_uuid: "id",
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc",
limit: 3
},
return_params: {}
}) YIELD AR
RETURN AR
exec{
algo(page_rank).params({
return_id_uuid: "id",
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc",
limit: 3
}) as AR
return AR
} on hdc_article_rank
Result:
_id | rank |
---|---|
book4 | 0.428308 |
book5 | 0.375926 |
book6 | 0.319926 |
Stream Return
CALL algo.page_rank("hdc_article_rank", {
params: {
return_id_uuid: "id",
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc",
limit: 3
},
return_params: {
stream: {}
}
}) YIELD AR
RETURN AR
exec{
algo(page_rank).params({
return_id_uuid: "id",
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc",
limit: 3
}).stream() as AR
return AR
} on hdc_article_rank
Result:
_id | rank |
---|---|
book4 | 0.428308 |
book5 | 0.375926 |
book6 | 0.319926 |
Running on Distributed Projections
Creating Distributed Projection
To project the entire graph to its shard servers as dist_article_rank:
create().project("dist_article_rank", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true
})
Parameters
Algorithm name: page_rank
Name | Type | Spec | Default | Optional | Description |
---|---|---|---|---|---|
init_value | Float | >0 | 0.2 | Yes | The initial rank assigned to all nodes. |
max_iterations | Integer | ≥1 | 5 | Yes | The maximum number of iteration rounds. The algorithm terminates after completing all rounds. |
damping | Float | (0,1) | 0.8 | Yes | The damping factor. |
weaken | Integer | 1, 2 | 1 | Yes | Set to 2 to run ArticleRank; setting it to 1 runs PageRank. |
limit | Integer | ≥-1 | -1 | Yes | Limits the number of results returned; -1 includes all results. |
order | String | asc, desc | / | Yes | Sorts the results by rank. |
File Writeback
CALL algo.page_rank.write("dist_article_rank", {
params: {
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc"
},
return_params: {
file: {
filename: "article_rank"
}
}
})
algo(page_rank).params({
project: "dist_article_rank",
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 2,
order: "desc"
}).write({
file: {
filename: "article_rank"
}
})
Result:
_id,rank
book4,0.5999999999999999778
book5,0.52000000000000001776
book6,0.44000000000000000222
book7,0.2000000000000000111
book3,0.2000000000000000111
book2,0.2000000000000000111
book1,0.2000000000000000111
DB Writeback
Writes the rank values from the results to the specified node property. The property type is double.
CALL algo.page_rank.write("dist_article_rank", {
params: {
loop_num: 50,
weaken: 2
},
return_params: {
db: {
property: "rank"
}
}
})
algo(page_rank).params({
project: "dist_article_rank",
loop_num: 50,
weaken: 2
}).write({
db:{
property: 'rank'
}
})