Nnaemeka Daniel John

Mastering the percentileDisc Function in Apache AGE

Apache AGE (A Graph Extension) is a graph extension for PostgreSQL. It provides a set of statistical and mathematical functions for analyzing and manipulating graph data.

In the last blog post, we explored the features and syntax of the percentileCont() function in Apache AGE.

Today we look at a closely related function in Apache AGE, percentileDisc(), which also calculates percentiles in a dataset. Let's explore it in detail and understand how and when to use it.

What are Percentiles?

Before we dive into the specifics of the function, let us briefly understand the concept of percentiles. A percentile is a statistical measure that indicates the relative position of a particular value within a dataset. It represents the value below which a given percentage of observations in the dataset falls. For example, the 75th percentile (also known as the third quartile) indicates that 75% of the values in the dataset are below that specific value.


The percentileDisc Function

The percentileDisc() function in Apache AGE returns the percentile of a given expression over a group, where the percentile is specified as a number from 0.0 to 1.0. It uses a rounding method and returns the data point in the dataset that is nearest to the requested percentile. Unlike percentileCont(), it does not perform any interpolation.
Like percentileCont(), the function takes two arguments (the expression and the percentile) and returns a float.

Query Syntax

Given this dataset:

demo=# SELECT * FROM cypher('percentile', $$
demo$# CREATE (:Person {name: 'Paul', age: 20}), (:Person {name: 'Mark', age: 22}),
demo$# (:Person {name: 'Peter', age: 42}), (:Person {name: 'Bob', age: 12}),
demo$# (:Person {name: 'Robin', age: 30}), (:Person {name: 'Grace', age: 24}),
demo$# (:Person {name: 'Martha', age: 32}), (:Person {name: 'Keith', age: 28})
demo$# $$) AS (a agtype);
 a
---
(0 rows)

demo=# SELECT * FROM cypher('percentile', $$
demo$# MATCH (n:Person)
demo$# RETURN percentileDisc(n.age, 0.5)
demo$# $$) as (percentile_disc_age agtype);
 percentile_disc_age
---------------------
 24.0
(1 row)

In this case, the 50th percentile of the values in the age property is returned.
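
To see where 24.0 comes from, the calculation can be sketched outside the database. The snippet below is a minimal Python illustration, not AGE's actual implementation: the helper name percentile_disc is hypothetical, and it assumes the rule PostgreSQL documents for its own percentile_disc aggregate (pick the first sorted value whose position reaches the requested fraction).

```python
import math


def percentile_disc(values, fraction):
    """Return the first sorted value whose position reaches `fraction`.

    Hypothetical helper: assumes the rule documented for PostgreSQL's
    percentile_disc, which may differ in detail from AGE's internals.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    ordered = sorted(values)
    if not ordered:
        return None
    # 1-based rank of the chosen row; a fraction of 0 still picks the first row.
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


ages = [20, 22, 42, 12, 30, 24, 32, 28]  # the ages created in the dataset above
median_age = percentile_disc(ages, 0.5)
print(median_age)  # 24
```

With eight ages, a fraction of 0.5 selects the fourth value in sorted order, which is 24.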


Things to Note:

  1. The percentile argument is a numeric value between 0 and 1, representing the desired percentile. For example, 0.5 represents the 50th percentile (median).
  2. If the percentile falls between two data points, this function returns the value from the dataset that is closest to the requested percentile.
  3. The percentileDisc() function returns a discrete value that exists in the dataset, not an interpolated estimate.
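
Points 2 and 3 are easiest to see next to an interpolating calculation. The following Python sketch is illustrative only: both helpers are hypothetical and assume the definitions PostgreSQL documents for percentile_disc and percentile_cont, so treat the numbers as a model of the behaviour rather than a statement about AGE's internals.

```python
import math


def percentile_disc(values, fraction):
    # Discrete: pick an existing value by rank (assumes a non-empty input).
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def percentile_cont(values, fraction):
    # Continuous: interpolate linearly between the two neighbouring values.
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


ages = [12, 20, 22, 24, 28, 30, 32, 42]
disc = percentile_disc(ages, 0.5)
cont = percentile_cont(ages, 0.5)
print(disc, disc in ages)  # 24 True
print(cont, cont in ages)  # 26.0 False
```

The discrete result is one of the stored ages, while the interpolated result falls halfway between 24 and 28 and does not appear in the data.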

Use Cases

  1. The percentileDisc() function is appropriate when you need to find the closest actual value in the dataset for a given percentile.
  2. The percentileDisc() function is commonly used when dealing with discrete or categorical data, where linear interpolation does not make sense.
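
As a rough illustration of the second use case, consider survey answers stored as integer codes. The Python sketch below is an assumption-laden example: the LABELS mapping, the sample codes, and the percentile_disc helper are all hypothetical, and the helper follows the rank rule documented for PostgreSQL's percentile_disc rather than anything confirmed about AGE's internals.

```python
import math

# Hypothetical ordinal scale: each integer code stands for a survey answer.
LABELS = {1: "poor", 2: "fair", 3: "good", 4: "excellent"}


def percentile_disc(values, fraction):
    # Pick an existing value by rank (assumes a non-empty input).
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


codes = [3, 1, 4, 2, 3, 2]  # made-up responses
median_code = percentile_disc(codes, 0.5)
# An interpolated median of these codes would be 2.5, which has no label.
print(median_code, LABELS[median_code])  # 2 fair
```

Because the result is always one of the stored codes, it can be mapped straight back to a label.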

Conclusion

Understanding the difference between the percentileCont() and percentileDisc() functions in Apache AGE is crucial for accurate percentile calculations. While percentileCont() provides a continuous estimate through linear interpolation, percentileDisc() returns the closest actual value from the dataset. Which function to choose depends on the nature of your data and the level of precision your analysis requires. By considering the characteristics of your dataset and the desired outcome, you can use these functions effectively to derive meaningful insights from your data.

