There are situations where we want to query Table data, such as analysing load test results.
Since the maximum number of entities* that are returned in a single query with LINQ Take operator is 1,000 (Ref - Returning the Top n Entities), you may need to code more to retrieve what you want.
*Entities are sets of properties and can be thought of like rows in a database.
LINQ has an upper limit of 1000, while Data Explorer with Kusto allows to query large numbers of entities.
In this article, how to query with Kusto in Azure Data Explorer (ADX) will be explained in particular.
- Example table data
- Kusto queries to calculate processing time
First of all, you need to complete the steps to ingest data from Table Storage into Data Explorer via Data Factory to prepare to query large numbers of entities with Kusto.
Please refer to the post "Ingest data from Azure Table Storage into Data Explorer".
Once done with the steps above, you're ready to query data ingested from Azure Table Storage with Kusto.
In this article, I will use an example from the post "Cost comparison between Azure services to determine architecture" to illustrate how to query Table data with Kusto.
The purpose of querying the table data here is to make sure that one of the requirements, the processing time between device and storage is less than 10 seconds, is met.
Let's see how an example table data looks like. It is assumed that there are around 1.2 million entities adding into a single table.
Table Name : telemetry202108180820
timestamp in column Data is the time when telemetry message is sent from each device. Column Timestamp is the time when telemetry message is ingested into Table Storage after processing it with Function App.
These timestamp data will be used to calculate processing time between device and table storage.
Here is an example of Kusto queries to calculate processing time between device and table storage.
telemetry202108180820 | project data = parse_json(Data), ingestedTime = Timestamp | project generatedTime = todatetime(data, timestamp), ingestedTime | project diff = datetime_diff("Millisecond", ingestedTime, generatedTime) | summarize avg(diff), max(diff), min(diff), percentiles(diff, 5, 90, 99)
What each step is doing will be explained as the following.
The statement starts with a reference to a table. In this article, the table 'telemetry202108180820' is being used as shown in the example table data section above.
| project data = parse_json(Data), // Interpret column Data as a JSON and rename it to data ingestedTime = Timestamp // Rename column Timestamp to ingestedTime
'project' is an operator to select the columns to include, rename or drop, and insert new computed columns.
In the example here, Data is a string and it needs to be interpreted as a JSON value to extract just some properties from the JSON column later.
Also, column Timestamp is renamed to ingestedTime to clarify the difference from another timestamp.
| project generatedTime = todatetime(data, timestamp), // Convert timestamp in column data to datetime scalar and rename it to generatedTime ingestedTime // Include ingestedTime
'todatetime' is a function to convert input to datetime scalar. Timestamp in column data is renamed to generatedTime after converting with 'todatetime' function.
ingestedTime which was renamed above is also included again.
| project diff = datetime_diff("Millisecond", ingestedTime, generatedTime) // Calculates calendarian difference (millisecond) between two datetime values: ingestedTime and generatedTime, and rename it to diff
'datetime_diff' is a function to calculates calendarian difference between two datetime values.
By calculating the difference in millisecond between the ingestedTimeand (when telemetry message is ingested into Table Storage) and the geteratedTime (when telemetry message is sent from each device), the processing time can be obtained. The value is renamed to diff.
| summarize avg(diff), max(diff), min(diff), percentiles(diff, 5, 50, 90, 99) // Produce a table including the average, maximum, minimum, percentile approximate of diff which is calendarian difference (millisecond) between ingestedTime and generatedTime
'summarize' is an operator to produce a table that aggregates the content of the input table.
As an example, the following four aggregation functions are used.
- avg : Returns an average value across the group
- max : Returns the maximum value across the group
- min : Returns the minimum value across the group
- percentiles : Returns the percentile approximate of the group
Here is the result of the Kusto queries explained ealier. The average is about 5,023 milliseconds which means 5.023 seconds.
The result shows that the requirement which the processing time between device and storage should be less than 10 seconds.
This is just an example of how to query table data in Azure Data Explorer with Kusto to analyse load test results.
If you want to retrieve/query large numbers of entities from Table Storage, much more than 1000, one way to do it is to use Kusto instead of LINQ.