Modern Graph Query Language – GSQL

This post makes the case for GSQL as an answer to the need for a modern graph query language.



By Dr. Yu Xu, CEO of TigerGraph

Graph database technology is the fastest growing category in all of data management, according to consultancy DB-Engines.com. A recent Forrester survey shows more than half of global data and analytics technology decision-makers employ graph databases today.

Today, enterprises use graph technology as a competitive edge for customer analytics, fraud detection, risk assessment and other complex data challenges. The technology offers the ability to quickly and efficiently explore, discover and predict relationships to reveal critical patterns and insights to support business goals.

It’s Time for a Modern Graph Query Language

With the widespread adoption of graph databases, the time is right for a standard graph query language. Enterprises are successfully using a range of query languages today, such as Neo4j’s Cypher and Apache TinkerPop’s Gremlin, but these are not without their own limitations.

While Cypher is high-level and user-friendly, it is not Turing-complete. Many graph algorithms and business logic rules cannot be implemented using Cypher. An example is a PageRank or label propagation style algorithm, variants of which are important for verticals such as power flow computation and optimization, risk analytics and more.

Gremlin is Turing-complete but low level. The language works well for very simple queries, but when it comes to real-life business problems, advanced programming skills are needed, and the resulting query can be hard to understand and maintain. A technical comparison of graph query languages is available here.

As data and use cases grow and become more complex, it’s clear organizations need both the ability to scale up and out and the performance required for big data, for complex analytics such as machine learning and AI, and for real-time operations.

An international standard will also help solve pervasive challenges in the graph market, which include the difficulty of finding proficient graph developers and the need to speed up enterprise adoption. Lowering the barrier to learning and implementing graph applications, via a high-level graph query language, makes it easier for more developers to bridge the gap between asking high-level, real-life questions and crafting a graph-based solution.

A modern graph query language, along with a super-fast and scalable system to support it, is also key to helping enterprises achieve digital transformation at scale.

What Should a Modern Graph Query Language Look Like? 

Let's take a step back and understand why users are choosing graph databases over RDBMS, key-value, document store and other types of databases. Typically it is because they want to solve complex problems with satisfactory performance. Non-graph databases have shortcomings such as the following:

- Extreme difficulty expressing real-world business problems. For example, how are entities such as accounts or people connected in various, often previously unknown, ways (especially to known bad actors)?

- Dismal real-time performance when accessing tens to hundreds of millions of entities and their relationships. Speed is of the essence when it comes to applications such as real-time personalized recommendations, fraud detection and more.

A graph database provides the platform for solving these problems, but users still need the appropriate query language: to define graph schemas to model complex entities and relationships, to easily map all kinds of data sources to a graph, to load data to a graph quickly, and to be expressive enough (Turing-complete) to model and solve real-life business problems in all kinds of industries. These are key to enabling graph technology to cross the chasm for more widespread enterprise adoption.

Given this, experts have called out eight key characteristics important to a modern graph query language: 1) schema-based with the capability of dynamic schema change, 2) high-level expression of graph traversal, 3) fine control of graph traversal, 4) built-in parallel semantics to guarantee high performance, 5) a highly expressive loading language, 6) data security and privacy, 7) support for queries calling queries (recursively) and 8) friendliness to SQL users. More detail about the need for these requirements can be found here.

Introducing GSQL

GSQL is a user-friendly, highly expressive and Turing-complete graph query language. Unlike other options, GSQL supports the real-life business requirements organizations already experience and is designed from the ground up to meet the criteria above.

In fact, development of GSQL was sparked after an unsuccessful search for a modern graph query language that could adequately address real business needs. When TigerGraph was founded six years ago, we decided to put in the time (several years) to create GSQL to alleviate this pain point. We engineered it from square one to support parallel loading, querying and updates for real-time transactions and ultra-fast analytics.

Since TigerGraph’s public launch late last year, customer feedback has been incredible. We continually hear from customers gaining data insights they previously thought were impossible. In other words, GSQL has enabled them to make the impossible possible. GSQL is in use by some of the largest companies, including Alipay, which runs the world’s largest graph deployment, with over two billion real-time transactions per day.

Users also report GSQL’s ease of use for creating queries over big data. Speed has been another benefit, as GSQL queries deliver results in seconds, compared to hours using solutions such as Cypher or Gremlin (that is, if the query is even possible at all). In short, GSQL satisfies what matters most in a graph query language: performance, expressiveness and ease of use. To learn more about TigerGraph and GSQL, visit www.tigergraph.com.

I encourage the industry to campaign for an international standard to maximize their graph database investments. It will benefit everyone. However, key considerations need to be made in selecting a language designed to meet modern business requirements.

Example GSQL Query 1: PageRank, which computes the relative authority of nodes, is essential for many applications such as community clustering, impact analysis, power flow convergence and label propagation for predictive analytics in many verticals. Being able to implement and customize PageRank-style graph algorithms (iterative, with full control over the number of iterations and the terminating conditions) purely in a high-level graph query language is critical for such use cases. Conversely, a hard-coded PageRank algorithm does not provide users with the expressive power to solve real-life problems.

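CREATE QUERY pageRank (FLOAT diffLimit, INT maxIter, FLOAT damping) FOR GRAPH G
{
  # @@ = global accumulator; @ = per-vertex accumulator

  MaxAccum<FLOAT> @@maxDiff = 9999;  # max score change in an iteration
  SumAccum<FLOAT> @rcvd_score = 0;   # sum of scores each vertex gets from neighbors
  SumAccum<FLOAT> @score = 1;        # initial score for every vertex is 1

  Vertices = {ANY};                  # start from every vertex in the graph

  # Repeat until the largest score change drops to diffLimit or maxIter rounds have run.
  # ACCUM: each vertex s splits its score evenly among its out-neighbors t.
  # POST-ACCUM: s takes its new score from what it received, then resets the sum;
  #             s.@score' is the value of s.@score before this round's update.
  WHILE @@maxDiff > diffLimit LIMIT maxIter DO
    @@maxDiff = 0;
    Vertices = SELECT s FROM Vertices:s-(:e)->:t
               ACCUM t.@rcvd_score += s.@score/(s.outdegree())
               POST-ACCUM s.@score = (1-damping) + damping * s.@rcvd_score,
                          s.@rcvd_score = 0,
                          @@maxDiff += abs(s.@score - s.@score');
  END;
  PRINT Vertices;
}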
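
For readers who want to trace the update rule outside a graph database, here is a minimal Python sketch of the same iterative scheme. It is an illustration only, not TigerGraph's implementation: the function name page_rank, the edge-list input and the default parameter values are assumptions made for this example. Unlike the query, it updates every vertex in each round rather than only the selected source vertices.

```python
from collections import defaultdict


def page_rank(edges, diff_limit=0.001, max_iter=25, damping=0.85):
    """Iterative PageRank over a directed edge list of (source, target) pairs.

    Hypothetical helper for illustration; parameter names loosely follow
    the diffLimit, maxIter and damping arguments of the GSQL query.
    """
    out_neighbors = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out_neighbors[src].append(dst)
        vertices.update((src, dst))

    score = {v: 1.0 for v in vertices}  # plays the role of @score = 1
    max_diff = float("inf")             # plays the role of @@maxDiff = 9999
    iterations = 0

    while max_diff > diff_limit and iterations < max_iter:
        # ACCUM-like phase: each vertex splits its score among its out-neighbors.
        received = defaultdict(float)   # plays the role of @rcvd_score
        for src, targets in out_neighbors.items():
            share = score[src] / len(targets)
            for dst in targets:
                received[dst] += share

        # POST-ACCUM-like phase: apply the damped update and track the largest change.
        max_diff = 0.0
        for v in vertices:
            new_score = (1 - damping) + damping * received[v]
            max_diff = max(max_diff, abs(new_score - score[v]))
            score[v] = new_score
        iterations += 1

    return score


if __name__ == "__main__":
    # Small made-up graph: "c" is pointed to by both "a" and "b".
    demo_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
    for vertex, value in sorted(page_rank(demo_edges).items()):
        print(vertex, round(value, 3))
```

The loop stops on whichever comes first, a maximum score change at or below diff_limit or max_iter rounds, which is the kind of control over iterations and terminating conditions that the query exposes through its WHILE ... LIMIT clause.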

Bio: Dr. Yu Xu is the founder and CEO of TigerGraph, the world’s first native parallel graph database. Dr. Xu received his Ph.D. in Computer Science and Engineering from the University of California San Diego. He is an expert in big data, parallel database systems and graph databases. He has 26 patents in parallel data management and optimization. Prior to founding TigerGraph, Dr. Xu worked on Twitter’s data infrastructure for massive data analytics. Before that, he worked as Teradata’s Hadoop architect, where he led the company’s big data initiatives.
