Nnaemeka Daniel John


Mastering the percentileCont Function in Apache AGE

Apache AGE (A Graph Extension) is a graph extension for the PostgreSQL database. Alongside its graph features, it provides a set of statistical and mathematical functions for analyzing data. In this blog post, we will look at one of them, the percentileCont() function, which calculates percentiles in a dataset. Let's explore this function in detail and understand when to use it.

What are Percentiles?

Before we dive into the specifics of the function, let us briefly understand the concept of percentiles. A percentile is a statistical measure that indicates the relative position of a particular value within a dataset. It represents the value below which a given percentage of observations in the dataset falls. For example, the 75th percentile (also known as the third quartile) indicates that 75% of the values in the dataset are below that specific value.


The percentileCont Function

The percentileCont() function in Apache AGE calculates a continuous percentile: it estimates the value at a given percentile by linear interpolation between adjacent values in the dataset. It is evaluated over a group of values and takes two arguments, the expression to evaluate and a percentile from 0.0 to 1.0, and it returns a float.
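
To make the idea of linear interpolation concrete, here is a minimal Python sketch. It is not AGE's implementation; the function name percentile_cont and the rank formula percentile * (n - 1) are assumptions chosen to illustrate the technique.

```python
import math


def percentile_cont(values, percentile):
    """Continuous percentile by linear interpolation (illustrative sketch only)."""
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0")
    ordered = sorted(values)
    if not ordered:
        return None
    # Assumed rank formula: fractional position in the sorted list.
    rank = percentile * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    # Weight of the upper neighbour; 0 when the rank is a whole number.
    weight = rank - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


# The median of four values falls halfway between the two middle ones.
print(percentile_cont([10, 20, 30, 40], 0.5))  # 25.0
```

The result, 25.0, is not one of the input values, which is exactly what "continuous" means here.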

Query Syntax
Given this dataset:

demo=# SELECT * FROM cypher('percentile', $$
demo$# CREATE (:Person {name: 'Paul', age: 20}), (:Person {name: 'Mark', age: 22}),
demo$# (:Person {name: 'Peter', age: 42}), (:Person {name: 'Bob', age: 12}),
demo$# (:Person {name: 'Robin', age: 30}), (:Person {name: 'Grace', age: 24}),
demo$# (:Person {name: 'Martha', age: 32}), (:Person {name: 'Keith', age: 28})
demo$# $$) AS (a agtype);
 a
---
(0 rows)

demo=# SELECT * FROM cypher('percentile', $$
demo$# MATCH (n:Person)
demo$# RETURN percentileCont(n.age, 0.4)
demo$# $$) as (percentile_cont_age agtype);
 percentile_cont_age
---------------------
 23.6
(1 row)

In this case, 0.4 requests the 40th percentile (the median would be 0.5). The query returns the 40th percentile of the values in the age property, calculated as a weighted average of the two nearest values.
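
The 23.6 in the output can be reproduced by hand. The Python sketch below walks through the arithmetic for the eight ages created earlier; the rank formula 0.4 * (n - 1) is an assumption about the interpolation method, though it does match the result AGE returned.

```python
import math

ages = [20, 22, 42, 12, 30, 24, 32, 28]

ordered = sorted(ages)            # [12, 20, 22, 24, 28, 30, 32, 42]
rank = 0.4 * (len(ordered) - 1)   # fractional position, about 2.8
lower = math.floor(rank)          # index 2, which holds the value 22
weight = rank - lower             # about 0.8 of the way toward index 3 (24)

# Weighted average of the two neighbouring values: 22 + 0.8 * (24 - 22)
result = ordered[lower] + weight * (ordered[lower + 1] - ordered[lower])
print(round(result, 1))           # 23.6
```

So the 40th percentile sits 80% of the way from 22 to 24, even though no person in the dataset is 23.6 years old.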

Things to consider when using this function:

  1. Null values will be excluded from the calculation.
  2. percentileCont(null, percentile) will return null.
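
The two null rules above can be modelled in a few lines of Python. This is only a sketch of the described behaviour, with None standing in for null; the helper name percentile_cont_skip_nulls is hypothetical and is not part of AGE.

```python
def percentile_cont_skip_nulls(values, percentile):
    """Model of the null rules: drop None first, return None if nothing is left."""
    present = sorted(v for v in values if v is not None)
    if not present:
        return None
    # Assumed rank formula: fractional position in the sorted, non-null list.
    rank = percentile * (len(present) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(present) - 1)
    return present[lower] + (rank - lower) * (present[upper] - present[lower])


# The None entry is ignored, so the median is taken over [12, 20, 22].
print(percentile_cont_skip_nulls([12, None, 20, 22], 0.5))  # 20.0
# With only nulls there is nothing to interpolate, so the result is None.
print(percentile_cont_skip_nulls([None, None], 0.5))        # None
```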

Things to Note:

  1. The percentile argument is a numeric value between 0 and 1, representing the desired percentile. For example, 0.5 represents the 50th percentile (median).
  2. The expression argument defines the dataset from which the percentile will be calculated.
  3. The percentileCont() function returns a continuous value, even if the percentile falls between two existing values in the dataset.
  4. The percentileCont() function assumes a continuous distribution of data and performs linear interpolation to estimate the percentile value.

Use Cases

  1. The percentileCont() function is useful when you need to calculate percentile values within a continuous dataset.
  2. This function is often used in scenarios where you need a precise estimate of the percentile, even if it falls between two actual data points.

Conclusion

Understanding the percentileCont() function in Apache AGE is important for accurate percentile calculations. It provides a continuous estimate through linear interpolation, so the result may be a value that does not appear in the data. Whether to use it depends on the nature of your data and the level of precision required in your analysis. By considering the characteristics of your dataset and the desired outcome, you can use this function effectively to derive meaningful insights from your data.

