Commit 4a12d01

lennessyy, parkererickson-tg, YimingPan1997, alexthomasTG, and wyattjoynertg authored
[ALGOS-74] feat(algos): added yml descriptions to GDS algorithms; (#8)
* feat(improve fastrp): rand function improvement for 14x speedup
* added yml descriptions
* updated modified queries
* update modified queires
* add yml file for cycle detection batch
* Update tg_sub_estimated_diameter.yml
* Update tg_category_topological_link_prediction.yml
* Update tg_category_topological_link_prediction.yml
* Update tg_algo_closeness_cent_approx.yml
  removed extra period
* Update tg_algo_closeness_cent.yml
  rephrase schema constraints
* Update tg_algo_degree_cent.yml
  rephrase schema constraints
* Update tg_algo_eigenvector.yml
  rephrase schema constraints and description
* Update tg_algo_harmonic_cent.yml
  update schema constraints
* Update tg_algo_influence_maximization_CELF.yml
* Update tg_sub_pagerank.yml
* Update tg_algo_greedy_graph_coloring.yml
* Update tg_sub_k_nearest_neighbors.yml
* Update tg_category_classification.yml
* incorporate feedback
* Update tg_algo_astar.yml
  Currently, remove this algo from graph studio
* docs(knn_cosine): fix minor type
* docs(knn_all): add schema constraints
* docs(knn_cosine_cv): add schema constraints
* docs(knn_cosine_ss): add schema constraints

Co-authored-by: Parker Erickson <[email protected]>
Co-authored-by: Yiming Pan <[email protected]>
Co-authored-by: a-m-thomas <[email protected]>
Co-authored-by: a-m-thomas <[email protected]>
Co-authored-by: wyatt-joyner-tg <[email protected]>
Co-authored-by: [email protected] <[email protected]>
1 parent 2016312 commit 4a12d01

89 files changed, +231 -205 lines changed

Some content is hidden: large commits have some content hidden by default.

algorithms/.DS_Store

-6 KB
Binary file not shown.

algorithms/Centrality/article_rank/tg_algo_article_rank.yml

+1 -1
@@ -10,7 +10,7 @@
 name: Article Rank
 filename: "tg_article_rank.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: ArticleRank is an algorithm that has been derived from the PageRank algorithm to measure the influence of journal articles. PageRank assumes that relationships originating from low-degree nodes have a higher influence than relationships from high-degree nodes. Article Rank modifies the formula in such a way that it retains the basic PageRank methodology but lowers the influence of low-degree nodes.
+description: "Measures the influence of vertices in a graph. ArticleRank retains the basic PageRank methodology but lowers the influence of low-degree nodes."
 schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type.
 version: lib3.0
 include: true

algorithms/Centrality/betweenness/tg_algo_betweenness_cent.yml

+1 -1
@@ -10,7 +10,7 @@
 name: Betweenness Centrality
 filename: "tg_betweenness_cent.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: The Betweenness Centrality of a vertex is defined as the number of shortest paths that pass through this vertex, divided by the total number of shortest paths.
+description: "Calculates the betweenness centrality of vertices in a graph."
 schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type.
 version: lib3.0
 include: true
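The removed description defines betweenness through shortest paths. In the standard formulation, a vertex's score sums, over every ordered pair of other vertices, the fraction of shortest paths between them that pass through it. A minimal Python sketch of Brandes' algorithm for unweighted graphs follows. It is not the GSQL in `tg_betweenness_cent.gsql`; it assumes the graph is a dict that maps every vertex to a list of its out-neighbors, so an undirected edge is stored in both directions and each unordered pair is counted twice.

```python
from collections import deque


def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted graph {vertex: [out-neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) and record predecessors
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating each vertex's dependency on s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path a - b - c stored in both directions: b lies on the a-to-c and c-to-a shortest paths.
print(betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))  # {'a': 0.0, 'b': 2.0, 'c': 0.0}
```

Halving the scores gives the usual undirected convention in which each pair is counted once.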

algorithms/Centrality/closeness/approximate/tg_algo_closeness_cent_approx.yml

+1 -1
@@ -10,6 +10,6 @@
 name: Approximate Closeness Centrality
 filename: "tg_closeness_cent_approx.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: The Approximate Closeness Centrality algorithm (based on Cohen et al. 2014) calculates the approximate closeness centrality score for each vertex by combining two estimation approaches - sampling and pivoting. This hybrid estimation approach offers near-linear time processing and linear space overhead within a small relative error. It runs on graphs with unweighted edges (directed or undirected).
+description: "Calculates the approximate closeness centrality score for each vertex. This algorithm offers near-linear time processing and linear space overhead within a small relative error."
 version: lib3.0
 include: true

algorithms/Centrality/closeness/exact/tg_algo_closeness_cent.yml

+2 -2
@@ -10,7 +10,7 @@
 name: Closeness Centrality
 filename: "tg_closeness_cent.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: TigerGraph’s closeness centrality algorithm uses multi-source breadth-first search (MS-BFS) to traverse the graph and calculate the sum of a vertex’s distance to every other vertex in the graph, which vastly improves the performance of the algorithm.
-schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type.
+description: "Calculates the exact closeness centrality of an algorithm. This algorithm might be time-consuming on large graphs when compared to the approximate version."
+schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, the target vertex type must have a FLOAT attribute.
 version: lib3.0
 include: true
@@ -1,2 +1,2 @@
 ---
-description: The closeness centrality score is calculated as the inverse of the average of distances from each vertex to every other vertex in the graph. TigerGraph offers an exact and approximate version.
+description: "Calculates the closeness centrality of vertices in a graph. TigerGraph offers different algorithms that calculate approximate or exact closeness centrality."
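The definition in the removed line (the inverse of the average distance from a vertex to every other vertex) can be sketched in Python with one breadth-first search per source. This is a plain single-source BFS, not TigerGraph's multi-source BFS, and it assumes an unweighted graph stored as a dict of neighbor lists. For disconnected graphs it averages only over reachable vertices, which is one of several conventions; the function name `closeness` is illustrative.

```python
from collections import deque


def closeness(adj):
    """Closeness = 1 / (average BFS distance to the vertices reachable from each vertex)."""
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        reached = len(dist) - 1      # other vertices reachable from s
        total = sum(dist.values())   # sum of their distances (s itself adds 0)
        scores[s] = reached / total if total else 0.0
    return scores


# Path a - b - c: the middle vertex b scores 1.0, the endpoints score 2/3.
print(closeness({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))
```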

algorithms/Centrality/degree/tg_algo_degree_cent.yml

+2 -2
@@ -10,7 +10,7 @@
 name: Degree Centrality
 filename: "tg_degree_cent.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: Degree centrality is defined as the number of edges incident upon a node (i.e., the number of ties that a node has). The degree can be interpreted in terms of the immediate risk of a node for catching whatever is flowing through the network (such as a virus, or some information).
-schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type.
+description: "Calculates the degree centrality of vertices in a graph."
+schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, the target vertex type must have a FLOAT attribute.
 version: lib3.0
 include: true

algorithms/Centrality/eigenvector/tg_algo_eigenvector.yml

+2 -2
@@ -10,7 +10,7 @@
 name: Eigenvector Centrality
 filename: "tg_eigenvector_cent.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: Eigenvector centrality (also called eigencentrality or prestige score) is a measure of the influence of a vertex in a network. Relative scores are assigned to all vertices in the network based on the concept that connections to high-scoring vertices contribute more to the score of the vertex in question than equal connections to low-scoring vertices. A high eigenvector score means that a vertex is connected to many vertices who themselves have high scores.
-schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type.
+description: "Calculates the eigenvector centrality of vertices in a graph. A high eigenvector centrality score means that a vertex is connected to many vertices that themselves have high scores."
+schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, the target vertex type must have a FLOAT attribute.
 version: lib3.0
 include: true
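The idea that links from high-scoring vertices count for more defines a fixed point that power iteration can find. The Python sketch below assumes a graph stored as a dict of out-neighbor lists, lets each vertex pass its score to the vertices it points to, and L2-normalizes after each round. Iterating with the adjacency matrix plus the identity is a standard trick, not taken from TigerGraph's query, that keeps bipartite graphs from oscillating without changing the dominant eigenvector.

```python
import math


def eigenvector_centrality(adj, max_iter=200, tol=1e-10):
    """Power iteration on (A + I) for a graph {vertex: [out-neighbors]}; L2-normalized scores."""
    n = len(adj)
    x = {v: 1.0 / math.sqrt(n) for v in adj}
    for _ in range(max_iter):
        # Start from x itself (the "+ I" shift), then let every vertex pass its
        # current score to each vertex it points to.
        nxt = dict(x)
        for u, outs in adj.items():
            for v in outs:
                nxt[v] += x[u]
        norm = math.sqrt(sum(val * val for val in nxt.values())) or 1.0
        nxt = {v: val / norm for v, val in nxt.items()}
        converged = sum(abs(nxt[v] - x[v]) for v in adj) < tol
        x = nxt
        if converged:
            break
    return x


# Star graph: hub c linked to three leaves (edges stored in both directions).
star = {"c": ["l1", "l2", "l3"], "l1": ["c"], "l2": ["c"], "l3": ["c"]}
x = eigenvector_centrality(star)
print(round(x["c"], 3), round(x["l1"], 3))  # 0.707 0.408
```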

algorithms/Centrality/harmonic/tg_algo_harmonic_cent.yml

+2 -2
@@ -10,7 +10,7 @@
 name: Harmonic Centrality
 filename: "tg_harmonic_cent.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: The Harmonic Centrality algorithm calculates the harmonic centrality of each vertex in the graph. Harmonic Centrality is a variant of Closeness Centrality. In a (not necessarily connected) graph, the harmonic centrality reverses the sum and reciprocal operations in the definition of closeness centrality.
-schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type.
+description: "Calculates the harmonic centrality of each vertex in the graph. Harmonic centrality is a variant of closeness centrality."
+schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, the target vertex type must have a FLOAT attribute.
 version: lib3.0
 include: true
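Closeness takes the reciprocal of a sum of distances, while harmonic centrality sums the reciprocals, so an unreachable vertex contributes 0 instead of an infinite distance. A minimal Python sketch, assuming an unweighted graph stored as a dict of neighbor lists; the scores are left unnormalized, and whether TigerGraph's query normalizes them is not stated in this commit.

```python
from collections import deque


def harmonic_centrality(adj):
    """Sum of 1 / distance to every other vertex; unreachable vertices contribute 0."""
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        # Start the sum at 0.0 so isolated vertices get a float score
        scores[s] = sum((1.0 / d for v, d in dist.items() if v != s), 0.0)
    return scores


# Path a - b - c plus an isolated vertex d.
print(harmonic_centrality({"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}))
# {'a': 1.5, 'b': 2.0, 'c': 1.5, 'd': 0.0}
```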

algorithms/Centrality/influence_maximization/CELF/tg_algo_influence_maximization_CELF.yml

+1 -1
@@ -10,7 +10,7 @@
 name: Cost Effective Lazy Forward (CELF) Influence Maximization
 filename: "tg_influence_maximization_CELF.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: Influence maximization is the problem of finding a small subset of vertices in a social network that could maximize the spread of influence. There are two versions of the Influence Maximization algorithm. Both versions find k vertices that maximize the expected spread of influence in the network. The CELF version improves upon the efficiency of the greedy version and should be preferred in analyzing large networks.
+description: "This version is more efficient than the greedy version and should be preferred in analyzing large networks."
 schema_constraints: This algorithm also requires a FLOAT attribute on the target edge types representing weight or influence.
 version: lib3.0
 include: false

algorithms/Centrality/influence_maximization/greedy/tg_algo_influence_maximization_greedy.yml

+1 -1
@@ -10,7 +10,7 @@
 name: Greedy Influence Maximization
 filename: "tg_influence_maximization_greedy.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: Influence maximization is the problem of finding a small subset of vertices in a social network that could maximize the spread of influence. There are two versions of the Influence Maximization algorithm. Both versions find k vertices that maximize the expected spread of influence in the network.
+description: "This version is more time-consuming than the CELF version."
 schema_constraints: This algorithm also requires a FLOAT attribute on the target edge types representing weight or influence.
 version: lib3.0
 include: false
@@ -1,2 +1,2 @@
 ---
-description: The library is currently under construction! Descriptions will be added soon.
+description: "Influence Maximization algorithms find a specified number of vertices that maximize the expected spread of influence in a network. The CELF version improves upon the efficiency of the greedy version and should be preferred in analyzing large networks."
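The difference between the two versions can be sketched in Python. Real implementations estimate influence spread by simulation (for example, Monte Carlo runs of an independent cascade model); this sketch swaps in a hypothetical coverage count, `spread()`, over a precomputed `reach` dict. Like real spread functions it is monotone and submodular, which is the property CELF's lazy re-evaluation relies on. The function names are illustrative, not the library's.

```python
import heapq


def spread(seeds, reach):
    """Hypothetical stand-in spread: number of distinct vertices reached by the seeds."""
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


def greedy_im(reach, k):
    """Plain greedy: re-evaluate the marginal gain of every candidate in every round."""
    seeds = []
    for _ in range(k):
        base = spread(seeds, reach)
        candidates = [v for v in reach if v not in seeds]
        if not candidates:
            break
        best = max(candidates, key=lambda v: spread(seeds + [v], reach) - base)
        seeds.append(best)
    return seeds


def celf_im(reach, k):
    """CELF: keep possibly stale gains in a max-heap and re-evaluate only the top entry."""
    heap = [(-spread([v], reach), v, 0) for v in reach]  # (-gain, vertex, round computed)
    heapq.heapify(heap)
    seeds, current = [], 0
    while heap and len(seeds) < k:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is up to date; by submodularity no stale gain below it can be larger
            seeds.append(v)
            current -= neg_gain
        else:
            gain = spread(seeds + [v], reach) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds


reach = {"a": {"a", "b", "c", "d"}, "b": {"b", "c"}, "c": {"c", "e", "f"}, "d": {"d"}, "e": {"e"}}
print(greedy_im(reach, 2), celf_im(reach, 2))  # ['a', 'c'] ['a', 'c']
```

Both pick the same seeds; CELF simply skips recomputing gains that cannot win the current round.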
@@ -1,2 +1,2 @@
 ---
-description: The library is currently under construction! Descriptions will be added soon.
+description: "In the global versions, the imaginary user can start browsing from any page as opposed to a specific set of pages."

algorithms/Centrality/pagerank/global/unweighted/tg_algo_pagerank.yml

+1 -1
@@ -10,7 +10,7 @@
 name: Pagerank
 filename: "tg_pagerank.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: "The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex’s influence is based on the influence of the vertices which refer to it. A vertex’s influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear."
+description: "Measures the influence of each vertex on every other vertex in a graph with unweighted edges"
 schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type.
 version: lib3.0
 include: true
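The recursive definition in the removed description (a vertex is influential if influential vertices refer to it) is usually computed by power iteration. This Python sketch assumes a dict of out-neighbor lists, uses the normalized form in which scores sum to 1, and spreads the rank of dead-end vertices evenly over the graph; TigerGraph's GSQL query may scale scores or handle dead ends differently.

```python
def pagerank(adj, damping=0.85, max_iter=200, tol=1e-12):
    """Power-iteration PageRank for {vertex: [out-neighbors]}; scores sum to 1."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(max_iter):
        # Random jump plus the rank held by dead-end vertices, spread evenly
        dangling = sum(rank[v] for v in adj if not adj[v])
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in adj}
        # Each vertex splits its rank equally among the vertices it refers to
        for u, outs in adj.items():
            for v in outs:
                nxt[v] += damping * rank[u] / len(outs)
        diff = sum(abs(nxt[v] - rank[v]) for v in adj)
        rank = nxt
        if diff < tol:
            break
    return rank


r = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
# c is referred to by both a and b, so it ends up with the highest score
print(max(r, key=r.get))  # c
```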

algorithms/Centrality/pagerank/global/weighted/tg_algo_pagerank_wt.yml

+2 -2
@@ -10,7 +10,7 @@
 name: Weighted Pagerank
 filename: "tg_pagerank_wt.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: "The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex’s influence is based on the influence of the vertices which refer to it. A vertex’s influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear. The only difference between weighted PageRank and standard PageRank is that edges have weights, and the influence that a vertex receives from an in-neighbor is multiplied by the weight of the in-edge."
-schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, you must have a FLOAT attribute on the target vertex type. This algorithm also requires a FLOAT attribute on the target edge types representing weight or influence.
+description: "Measures the influence of each vertex on every other vertex in a graph with weighted edges. Multiplies a vertex's received influence by the weight of the in-edge."
+schema_constraints: If you want to write the results of this algorithm (FLOAT) back to the vertices, the target vertex type must have a FLOAT attribute. This algorithm also requires a FLOAT attribute on the target edge types representing weight or influence.
 version: lib3.0
 include: true
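For the weighted variant, the sketch below assumes a dict that maps each vertex to `{out_neighbor: weight}` and hands out rank in proportion to each out-edge's share of the vertex's total out-weight. That normalization is this sketch's assumption, chosen so that equal weights reproduce unweighted PageRank and scores still sum to 1; it is not taken from `tg_pagerank_wt.gsql`.

```python
def pagerank_weighted(wadj, damping=0.85, max_iter=200, tol=1e-12):
    """Weighted PageRank for {vertex: {out_neighbor: weight}}; scores sum to 1."""
    n = len(wadj)
    rank = {v: 1.0 / n for v in wadj}
    out_weight = {u: sum(outs.values()) for u, outs in wadj.items()}
    for _ in range(max_iter):
        # Vertices with no outgoing weight behave like dead ends
        dangling = sum(rank[v] for v in wadj if out_weight[v] == 0)
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in wadj}
        # Rank flows along each out-edge in proportion to the edge's weight
        for u, outs in wadj.items():
            if out_weight[u] > 0:
                for v, w in outs.items():
                    nxt[v] += damping * rank[u] * w / out_weight[u]
        diff = sum(abs(nxt[v] - rank[v]) for v in wadj)
        rank = nxt
        if diff < tol:
            break
    return rank


# a links to b with weight 3 and to c with weight 1, so b receives more of a's influence.
r = pagerank_weighted({"a": {"b": 3.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}})
print(r["b"] > r["c"])  # True
```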

algorithms/Centrality/pagerank/personalized/all_pairs/tg_algo_pagerank_pers_ap_batch.yml

+1 -1
@@ -10,6 +10,6 @@
 name: Personalized Pagerank (All Pairs, Batch)
 filename: "tg_pagerank_pers_ap_batch.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: "The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex’s influence is based on the influence of the vertices which refer to it. A vertex’s influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear. In the original PageRank, the damping factor is the probability of the surfer continues browsing at each step. The surfer may also stop browsing and start again from a random vertex. In personalized PageRank, the surfer can only start browsing from a given set of source vertices both at the beginning and after stopping. "
+description: "Calculates the personalized PageRank score starting from each vertex to every other vertex."
 version: lib3.0
 include: false

algorithms/Centrality/pagerank/personalized/multi_source/tg_algo_pagerank_pers.yml

+1 -1
@@ -10,6 +10,6 @@
 name: Personalized Pagerank
 filename: "tg_pagerank_pers.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: "The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex’s influence is based on the influence of the vertices which refer to it. A vertex’s influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear. In the original PageRank, the damping factor is the probability of the surfer continues browsing at each step. The surfer may also stop browsing and start again from a random vertex. In personalized PageRank, the surfer can only start browsing from a given set of source vertices both at the beginning and after stopping. "
+description: "Calculates the personalized PageRank score starting from a specific set of source vertices."
 version: lib3.0
 include: false
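In personalized PageRank the surfer starts, and restarts after stopping, only from a given set of source vertices, as the removed description explains. The sketch below changes the random jump of ordinary PageRank accordingly. It assumes a dict of out-neighbor lists and a non-empty `sources` list, and it sends the rank of dead-end vertices back to the sources, which is one common choice rather than a documented property of `tg_pagerank_pers.gsql`.

```python
def personalized_pagerank(adj, sources, damping=0.85, max_iter=200, tol=1e-12):
    """PageRank where the surfer starts, and restarts, only from the given source vertices."""
    srcs = set(sources)
    rank = {v: (1.0 / len(srcs) if v in srcs else 0.0) for v in adj}
    for _ in range(max_iter):
        # The restart probability plus the rank stuck at dead ends returns to the sources
        dangling = sum(rank[v] for v in adj if not adj[v])
        restart = (1 - damping) + damping * dangling
        nxt = {v: (restart / len(srcs) if v in srcs else 0.0) for v in adj}
        for u, outs in adj.items():
            for v in outs:
                nxt[v] += damping * rank[u] / len(outs)
        diff = sum(abs(nxt[v] - rank[v]) for v in adj)
        rank = nxt
        if diff < tol:
            break
    return rank


# d points at a, but d is never reached from source a, so its score stays 0.
r = personalized_pagerank({"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}, ["a"])
print(r["d"])  # 0.0
```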
@@ -1,2 +1,3 @@
 ---
-description: The library is currently under construction! Descriptions will be added soon.
+description: "Measures the influence of each vertex on every other vertex.
+PageRank influence is defined recursively: a vertex’s influence is based on the influence of the vertices which refer to it."
@@ -1,2 +1,2 @@
 ---
-description: Centrality algorithms calculate the 'importance' of each vertex given a particular metric. These metrics generally revolve around density of a vertex's connectivity or the importance of that vertex to the general connectivity of the entire graph. Some widely used examples include Betweenness Centrality, which produces scores for vertices based on the number of shortest paths that they appear in and Closeness Centrality, which measures importance inversely proportional to how 'far' the vertex is away from every other vertex.
+description: "Centrality algorithms calculate the 'importance' of each vertex given a particular metric. These metrics generally revolve around density of a vertex's connectivity or the importance of that vertex to the general connectivity of the entire graph. Some widely used examples include Betweenness Centrality, which produces scores for vertices based on the number of shortest paths that they appear in and Closeness Centrality, which measures importance inversely proportional to how 'far' the vertex is away from every other vertex."

algorithms/Classification/greedy_graph_coloring/tg_algo_greedy_graph_coloring.yml

+1 -1
@@ -10,6 +10,6 @@
 name: Greedy Graph Coloring
 filename: "tg_greedy_graph_coloring.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: This algorithm assigns a unique integer value known as its color to the vertices of a graph such that no neighboring vertices share the same color. The reason why this is called color is that this task is equivalent to assigning a color to each nation on a map so that no neighboring nations share the same color.
+description: "Assigns a unique integer value, known as 'color', to the vertices of a graph such that no neighboring vertices share the same color."
 version: lib3.0
 include: false
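Despite the word "unique", a color may be shared by any vertices that are not neighbors; the only constraint is that adjacent vertices differ. The usual greedy strategy visits vertices in some order and gives each the smallest color not already taken by a neighbor. A Python sketch, assuming a dict of neighbor lists with undirected edges stored both ways and sorted vertex order by default; the result depends on the visiting order and is not guaranteed to use the minimum number of colors.

```python
def greedy_coloring(adj, order=None):
    """Give each vertex the smallest color (0, 1, 2, ...) not used by an already-colored neighbor."""
    color = {}
    for v in (order if order is not None else sorted(adj)):
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c
    return color


# Triangle a-b-c plus a pendant vertex d attached to a.
g = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
print(greedy_coloring(g))  # {'a': 0, 'b': 1, 'c': 2, 'd': 1}
```

Here b and d share color 1 because they are not adjacent.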

algorithms/Classification/k_nearest_neighbors/all_pairs/tg_algo_knn_cosine_all.yml

+2 -1
@@ -10,7 +10,8 @@
 name: K Nearest Neighbors (All Pairs)
 filename: "tg_knn_cosine_all.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: The k-Nearest Neighbors (kNN) algorithm is one of the simplest classification algorithms. It assumes that some or all the vertices in the graph have already been classified. The classification is stored as an attribute called the label. The goal is to predict the label of a given vertex, by seeing what are the labels of the nearest vertices. This algorithm is a batch version of the k-Nearest Neighbors, Cosine Neighbor Similarity, single vertex. It makes a prediction for every vertex whose label is not known (i.e., the attribute for the known label is empty), based on its k nearest neighbors' labels.
+description: "This algorithm makes a prediction for every vertex whose label is not known based on its k nearest neighbors' labels."
+schema_constraints: "This algorithm requires a FLOAT attribute on the target edge types representing weight, and a STRING attribute representing the label."
 version: lib3.0
 include: false
 dependencies:
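A minimal Python sketch of the batch prediction, assuming that "near" means cosine similarity between the weighted neighbor vectors of two vertices (so only vertices that share neighbors count), a dict of `{neighbor: weight}` per vertex, and a separate dict of known labels. The names `cosine`, `knn_predict`, and `knn_all` are hypothetical, ties in the vote are broken arbitrarily, and this is not the GSQL in `tg_knn_cosine_all.gsql`.

```python
import math
from collections import Counter


def cosine(wadj, u, v):
    """Cosine similarity of the weighted neighbor vectors of u and v."""
    dot = sum(w * wadj[v][x] for x, w in wadj[u].items() if x in wadj[v])
    norm_u = math.sqrt(sum(w * w for w in wadj[u].values()))
    norm_v = math.sqrt(sum(w * w for w in wadj[v].values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(wadj, labels, v, k):
    """Majority label among the k labeled vertices most similar to v (similarity > 0)."""
    scored = sorted(((cosine(wadj, u, v), u) for u in labels if u != v), reverse=True)
    nearest = [u for sim, u in scored if sim > 0][:k]
    votes = Counter(labels[u] for u in nearest)
    return votes.most_common(1)[0][0] if votes else None


def knn_all(wadj, labels, k):
    """Predict a label for every vertex that does not have one yet."""
    return {v: knn_predict(wadj, labels, v, k) for v in wadj if v not in labels}


# Users u1..u5 linked to items x, y, z with edge weights; u4 and u5 are unlabeled.
wadj = {
    "u1": {"x": 1.0, "y": 1.0}, "u2": {"x": 1.0}, "u3": {"z": 1.0},
    "u4": {"x": 1.0, "y": 0.5}, "u5": {"z": 2.0},
    "x": {"u1": 1.0, "u2": 1.0, "u4": 1.0}, "y": {"u1": 1.0, "u4": 0.5},
    "z": {"u3": 1.0, "u5": 2.0},
}
labels = {"u1": "A", "u2": "A", "u3": "B"}
preds = knn_all(wadj, labels, k=2)
print(preds["u4"], preds["u5"])  # A B
```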

algorithms/Classification/k_nearest_neighbors/cross_validation/tg_algo_knn_cosine_cv.yml

+2 -1
@@ -10,7 +10,8 @@
 name: K Nearest Neighbors (Cross Validation)
 filename: "tg_knn_cosine_cv.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: k-Nearest Neighbors (kNN) is often used for machine learning. You can choose the value for topK based on your experience, or using cross-validation to optimize the hyperparameters. In our library, Leave-one-out cross-validation for selecting optimal k is provided. Given a k value, we run the algorithm repeatedly using every vertex with a known label as the source vertex and predict its label. We assess the accuracy of the predictions for each value of k, and then repeat for different values of k in the given range. The goal is to find the value of k with highest predicting accuracy in the given range, for that dataset.
+description: "This algorithm runs the single source version repeatedly using every vertex with a known label as the source vertex and predicts its label. It assesses the accuracy of the predictions for each value of k, and then repeats for different values of k in the given range. The goal is to find the value of k with highest predicting accuracy in the given range, for that dataset."
+schema_constraints: "This algorithm requires a FLOAT attribute on the target edge types representing weight, and a STRING attribute representing the label."
 version: lib3.0
 include: false
 dependencies:
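The leave-one-out procedure described above can be sketched independently of how predictions are made: `loocv_best_k` takes any `predict(vertex, k, known_labels)` function, hides one known label at a time, and keeps the k with the highest accuracy. The function names and the number-line predictor used to exercise it are hypothetical stand-ins for the cosine-similarity kNN.

```python
from collections import Counter


def loocv_best_k(labels, predict, k_values):
    """Leave-one-out CV: hide each known label in turn, predict it, and score each k."""
    best_k, best_acc = None, -1.0
    for k in k_values:
        correct = 0
        for v, true_label in labels.items():
            others = {u: lab for u, lab in labels.items() if u != v}
            if predict(v, k, others) == true_label:
                correct += 1
        acc = correct / len(labels)
        if acc > best_acc:  # strict '>' keeps the earliest k among equally accurate ones
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical toy predictor: vertices sit on a number line, and the k nearest
# labeled vertices (ties broken by name) vote on the label.
POS = {"a": 0, "b": 1, "c": 2, "g": 1.5, "d": 10, "e": 11, "f": 12}


def predict_on_line(v, k, known):
    nearest = sorted(known, key=lambda u: (abs(POS[u] - POS[v]), u))[:k]
    return Counter(known[u] for u in nearest).most_common(1)[0][0]


labels = {"a": "L", "b": "L", "c": "L", "g": "R", "d": "R", "e": "R", "f": "R"}
best_k, best_acc = loocv_best_k(labels, predict_on_line, [1, 3])
print(best_k)  # 3 (k = 1 is fooled by the outlier g sitting among the L vertices)
```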

algorithms/Classification/k_nearest_neighbors/single_source/tg_algo_knn_cosine_ss.yml

+2 -1
@@ -10,6 +10,7 @@
 name: K Nearest Neighbors (Single Source)
 filename: "tg_knn_cosine_ss.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: The library is currently under construction! Descriptions will be added soon.
+description: "This algorithm calculates the distance between a single source vertex and all other vertices and selects the k vertices that are nearest. "
+schema_constraints: "This algorithm requires a FLOAT attribute on the target edge types representing weight, and a STRING attribute representing the label."
 version: lib3.0
 include: false
@@ -1,2 +1,2 @@
 ---
-description: The library is currently under construction! Descriptions will be added soon.
+description: "Predicts the label of a given vertex based on the labels of its nearest vertices. The label is a vertex attribute that stores the classification of a vertex. This algorithm assumes that the vertices have already been classified."

algorithms/Classification/maximal_independent_set/deterministic/tg_algo_maximal_indep_set.yml

+1 -1
@@ -10,6 +10,6 @@
 name: Maximal Independent Set (Deterministic)
 filename: "tg_maximal_indep_set.gsql"
 sha_id: ed6ea869749977cc0f3df71225d7325fb81c9767
-description: "An independent set of vertices does not contain any pair of vertices that are neighbors, i.e., ones which have an edge between them. A maximal independent set (MIS) is the largest independent set that contains those vertices; you cannot improve upon it unless you start over with a different independent set. However, the search for the largest possible independent set is an NP-hard problem: there is no known algorithm that can find that answer in polynomial time. So we settle for the maximal independent set. The deterministic version makes sure that you get the same results every time."
+description: "The deterministic version of the MIS algorithm returns the same results every time it runs."
 version: lib3.0
 include: true
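A maximal independent set cannot be extended by adding any vertex, but it need not be the largest independent set. A sequential greedy pass in a fixed vertex order is one simple deterministic way to build such a set; the Python sketch below assumes a dict of neighbor lists with undirected edges stored both ways, and it is not necessarily how `tg_maximal_indep_set.gsql` works.

```python
def maximal_independent_set(adj):
    """Sequential greedy MIS: scan vertices in a fixed order and keep any vertex
    none of whose neighbors has been kept yet."""
    chosen = set()
    for v in sorted(adj):  # a fixed scan order makes the result repeatable
        if not any(u in chosen for u in adj[v]):
            chosen.add(v)
    return chosen


# Star with hub a: the scan keeps {a}, which is maximal, although {b, c, d} is larger.
star = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
print(sorted(maximal_independent_set(star)))  # ['a']
```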

0 commit comments