loading...

Analyzing Google Cloud Egress Usage with VPC Logs, BigQuery and Grafana

bzon profile image Bryan Sazon Updated on ・4 min read

It is essential to have a dashboard that shows how much data are sent within and outside of your infrastructure because:

  • Data sent between two different zones has cost.
    E.g. Regional GKE clusters.

  • Data sent between two regions has cost.
    E.g. Inter-region communicating GKE clusters.

  • Data sent outside (internet) has a cost.
    E.g. APIs responses at a high rate can incur a lot of egress cost.

See https://cloud.google.com/compute/network-pricing.

Solution

Aggregate the VPC Flow logs to BigQuery and visualize it in Grafana.

Enable VPC Flow Logs

Read and follow the simple guide from here https://cloud.google.com/vpc/docs/using-flow-logs#enabling_vpc_flow_logging.

Summary:

If you are using terraform, this is what you may need:

resource "google_compute_subnetwork" "my_subnet" {
  #
  # ... some code here ...
  #

  enable_flow_logs = true
  log_config {
    aggregation_interval = "INTERVAL_15_MIN"
    flow_sampling        = 0.9
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

Go to Stackdriver logs and wait for the VPC logs to show using this filter.

resource.type="gce_subnetwork"
logName="projects/<YOUR GOOGLE CLOUD PROJECT ID>/logs/compute.googleapis.com%2Fvpc_flows"

Export Logs to BigQuery

Create a new Logs router sink and name it vpc_flow_logs.

Alt Text

Choose the option to automatically create a BigQuery data set. If you follow the example, it will name the data set vpc_flow_logs.

Use the same logs filter for the sink.

resource.type="gce_subnetwork"
logName="projects/<YOUR GOOGLE CLOUD PROJECT ID>/logs/compute.googleapis.com%2Fvpc_flows"

Using BigQuery

Go to BigQuery console, and copy-paste the following example queries.

A little SQL knowledge is required. If the SQL queries are not working, please let me know in the comments section!

Query 1: How much internet bandwidth was transmitted from a VPC to a continent since the date 2020-04-01?.

SELECT CONCAT(jsonPayload.src_vpc.vpc_name, " vpc to ", jsonPayload.dest_location.continent)  AS vpc_dest,
       sum(CAST (jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE TIMESTAMP >= "2020-04-01 00:00:00 UTC"
  AND jsonPayload.dest_location.continent IS NOT NULL
  AND jsonPayload.dest_location.continent != ""
GROUP BY vpc_dest
ORDER BY bytes_sent

Example result

vpc_dest bytes_sent
foo vpc to Europe 133674777320
bar vpc to Asia 3013921321
baz vpc to America 934811223

Query 2: How much infrastructure bandwidth internally was transmitted from one region or zone to another region or zone since the date 2020-04-01?

SELECT 
      TIMESTAMP_TRUNC(TIMESTAMP, HOUR) AS timestamp,
      CONCAT(jsonPayload.src_vpc.vpc_name, " ", jsonPayload.src_instance.zone, " to ", jsonPayload.dest_vpc.vpc_name, " ", jsonPayload.dest_instance.zone)  AS route,
       sum(CAST (jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE TIMESTAMP >= "2020-04-01 00:00:00 UTC"
  AND jsonPayload.src_instance.zone != ""
  AND jsonPayload.dest_instance.zone != ""
  AND jsonPayload.src_location IS null
  AND jsonPayload.dest_location IS null
GROUP BY 
timestamp, route
ORDER BY
timestamp ASC

Example result

route bytes_sent
foo europe-west1-b to foo europe-west1-a 2132121
bar us-west1-a to foo europe-west1-b 765756
baz us-west1-a to bar europe-west1-b 634334

For column names reference, see vpc flow record format and stackdriver log format.

Using Grafana

It is required that you install and configure the BigQuery Datasource plugin first before proceeding.

Building the Graphs

Here, you need to change the example queries and make use of Grafana SQL Macros.

Changes:

  • $__timeGroupAlias(TIMESTAMP, 1h) is used to aggregate the TIMESTAMP column with 1 hour resolution.
  • The column name that will be aliased as metric will be the label that is shown or grouped within the Grafana graph.
  • $__timeFilter(TIMESTAMP) is used to use the Grafana time range filter as the TIMESTAMP value.
  • Numbers that represent the column order are used in GROUP BY and ORDER BY for simplification.

Build the Internet Egress graph

SELECT 
      $__timeGroupAlias(TIMESTAMP, 1h),
      CONCAT(jsonPayload.src_vpc.vpc_name, " vpc to ", jsonPayload.dest_location.continent)  AS metric,
       sum(CAST (jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE $__timeFilter(TIMESTAMP)
  AND jsonPayload.dest_location.continent IS NOT NULL
  AND jsonPayload.dest_location.continent != ""
GROUP BY 
1, 2
ORDER BY
1, 2

Alt Text

Build the Inter VPC Egress graph

SELECT 
      $__timeGroupAlias(TIMESTAMP, 1h),
      CONCAT(jsonPayload.src_vpc.vpc_name, " ", jsonPayload.src_instance.zone, " to ", jsonPayload.dest_vpc.vpc_name, " ", jsonPayload.dest_instance.zone)  AS metric,
       sum(CAST (jsonPayload.bytes_sent AS INT64)) AS bytes_sent
FROM `vpc_flow_logs.compute_googleapis_com_vpc_flows_*`
WHERE $__timeFilter(TIMESTAMP)
  AND jsonPayload.src_instance.zone != ""
  AND jsonPayload.dest_instance.zone != ""
  AND jsonPayload.src_location IS null
  AND jsonPayload.dest_location IS null
GROUP BY 
1, 2
ORDER BY
1, 2

Alt Text

The End

Ensure that you watch your Stackdriver logs ingestion rate to prevent unwanted cost!

Thanks for reading!

In my next post, I will write a guide on how to visualize GCP Billing data in Grafana.

References:

Posted on by:

bzon profile

Bryan Sazon

@bzon

8 years in the biz. Backend Engineer (Go). System Administrator (Kubernetes, Linux, Cloud).

Discussion

markdown guide