
Export AWS/GCP cost and billing data to BigQuery for analytics

Jinwook Baek ・ 9 min read

The original post has a better layout, with images.


Intro


I am currently using both GCP and AWS for a certain project. While each cloud provider offers very nice tools and visualizations for its own spending (Cloud Billing for GCP and Cost & Usage Reports for AWS), I wanted to consolidate both providers' usage reports and visualize them in a single BI tool. The report does not need to be real-time; I only need daily granularity. (No streaming.)

So I had to spin something up, and these were a couple of options:

  • House the data in AWS using Athena + QuickSight (or another BI tool)
  • House the data in GCP using BigQuery + Data Studio (or another BI tool)
  • Spend more time searching for third-party tools

I am a big fan of BigQuery and I prefer ELT over ETL on BQ. Moreover, since I was already using BigQuery for the project (with Holistics.io), I chose to house the usage and cost data in BigQuery.

Move data to BQ

GCP Cloud Billing

Setup

This is easy; Cloud Billing natively supports BigQuery export.

  • Enable billing for the GCP project.

    I assume that you already have a billing account set up. (You wouldn't need cost analytics unless you were spending money.) If you need to enable billing for the project, refer to the following link.

    Modify a project's billing settings | Cloud Billing | Google Cloud

  • Create a dataset in BigQuery within the GCP project you desire. I named it billing_export (a bq CLI sketch follows this list).

    Creating datasets | BigQuery | Google Cloud

  • You need the appropriate permissions to set up the export:

    • Billing Account Administrator
    • BigQuery Admin for the project
    • resourcemanager.projects.update permission
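
If you prefer the CLI over the console, here is a minimal sketch of the same dataset creation with the bq tool (<project-id> and the location are placeholders; adjust them to your setup):

# Create the dataset that will receive the Cloud Billing export.
# billing_export matches the dataset name used above.
bq --location=US mk --dataset \
  --description="Cloud Billing export" \
  <project-id>:billing_export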


  • Enable Cloud Billing export to the BigQuery dataset
    • Go to the Billing menu in the Navigation menu (https://console.cloud.google.com/billing/)
    • Go to the linked billing account (this should already be set; if not, you need to add payment info)
    • Select Billing export → select the BigQuery export tab
    • Select Edit settings on Daily cost detail
      • Select the project and dataset where you want the data to land. (Note: the BigQuery API is required to export data to BigQuery. If the project you selected doesn't have the BigQuery API enabled, you will be prompted to enable it. Click Enable BigQuery API and it will be enabled for you; a gcloud one-liner for doing this ahead of time is shown below.)
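
If you would rather enable the BigQuery API ahead of time instead of waiting for the prompt, a single gcloud call does it (this assumes the gcloud CLI is installed and authenticated against the right project):

# Enable the BigQuery API on the project that hosts the export dataset.
gcloud services enable bigquery.googleapis.com --project=<project-id>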

Once everything is set, you will see a screen like this.

When you first enable the daily cost detail export to BigQuery, it might take a few hours before your Google Cloud cost data starts showing up. A table named gcp_billing_export_v1_<some_hash> will be generated and automatically partitioned by day (you can query specific partitions to save cost).


Schema

After a while you will notice data being exported to the table. You can click the table and use Preview, or run a query:

SELECT usage_start_time, location.region, cost, currency, usage.amount, usage.unit, credits
FROM `<project>.billing_export.gcp_billing_export_v1_<hash>`
WHERE _PARTITIONTIME >= "2020-12-01 00:00:00"


If you need specific information about each column, refer to the link below; it explains what each column of the data means.

Understanding the Cloud Billing data tables in BigQuery | Google Cloud

Pay special attention to the nested credits field if you are using the free tier; an example query that folds credits back into the cost is shown below.
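
Here is a rough sketch of a daily net-cost query that folds the credits back into the cost. The UNNEST pattern follows the Cloud Billing export docs, and the table name is the same placeholder as above:

# Daily gross vs. net cost; credits are stored as (usually negative) amounts in a repeated field.
bq query --use_legacy_sql=false '
SELECT
  DATE(usage_start_time) AS usage_date,
  SUM(cost) AS gross_cost,
  SUM(cost) + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)) AS net_cost
FROM `<project>.billing_export.gcp_billing_export_v1_<hash>`
WHERE _PARTITIONTIME >= TIMESTAMP("2020-12-01")
GROUP BY usage_date
ORDER BY usage_date'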

That's it for GCP! Now on to the harder part.

AWS Cost & Usage Report

I didn't want to reinvent the wheel, so, as all good developers do, I googled. I found a couple of great articles on this subject, but they seemed complicated and outdated. Luckily, there is already a natively supported S3 transfer in the BigQuery Data Transfer Service, and it supports CSV and Parquet files.

(Architecture diagram from one of those articles; it seemed too complicated.)

Amazon S3 transfers | BigQuery Data Transfer Service | Google Cloud

Let's save the Cost and Usage Report to S3 first. Go to AWS Cost and Usage Reports in the console and create a report with the following settings (a CLI equivalent is sketched after the list).

AWS Cost and Usage Reports


  • Report name - daily_report_gzip
  • Include resource IDs - check
  • Data refresh settings - uncheck
  • S3 bucket - create a bucket or select a bucket
  • prefix - billing/daily_report_gzip
  • Time granularity - Daily
  • Report versioning - overwrite
  • Compression type - gzip
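
If you'd rather script the report creation, the AWS CLI can define the same report. This is only a sketch: it assumes your credentials can call cur:PutReportDefinition and that the bucket policy already allows billingreports.amazonaws.com to deliver to the bucket (the console handles that part for you):

# Define the same daily, gzip-compressed CUR with resource IDs.
# The CUR API lives in us-east-1; bucket and region are placeholders.
# The prefix plus the report name form the S3 path shown below.
aws cur put-report-definition --region us-east-1 --report-definition '{
  "ReportName": "daily_report_gzip",
  "TimeUnit": "DAILY",
  "Format": "textORcsv",
  "Compression": "GZIP",
  "AdditionalSchemaElements": ["RESOURCES"],
  "S3Bucket": "<bucket>",
  "S3Prefix": "billing",
  "S3Region": "us-east-1",
  "RefreshClosedReports": false,
  "ReportVersioning": "OVERWRITE_REPORT"
}'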

After creating the report, it takes a couple of hours for it to show up in S3. Once the report is generated, you will see a gz file exported under the following URI (a CLI listing to verify this is shown after the list).

s3://<bucket>/billing/daily_report_gzip/20201201-20210101/

  • data - daily_report_gzip-00001.csv.gz
  • metadata - daily_report_gzip-Manifest.json
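
You can verify that the objects landed with a quick listing (assuming the aws CLI is configured with read access to the bucket):

# List the generated report objects; expect the .csv.gz file and the Manifest.json.
aws s3 ls s3://<bucket>/billing/daily_report_gzip/ --recursive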


Schema

We need to know the schema in order to house the data in a BigQuery table. If you want to know what each field represents, consult the Data dictionary linked below.

Data dictionary

When you open up the daily_report_gzip-Manifest.json file, you will see how each column is structured. Note the variety of types: String, OptionalBigDecimal, DateInterval, and so on.

The manifest's columns array lists each column's category, name, and type.

Converting manifest to bq table schema

In order to transfer the data, the BQ table needs the correct schema to house the CSV data. BQ uses different types, and there are too many fields to copy and paste by hand. You can configure the table schema with Edit as text (e.g. Field1:TYPE,Field2:TYPE), so let's use jq to extract the column types as a BQ table schema.

I also need to map some field types to BigQuery-compatible types:

  • BigDecimal, OptionalBigDecimal → BigNumeric
  • Interval → String
  • DateTime → Timestamp
  • OptionalString → String

Download the JSON manifest file and pipe it through the following jq command:

cat daily_report_gzip-Manifest.json \
  | jq -jr '.columns[] | .category+"_"+.name+":"+ .type +","' \
  | sed s/.$// \
  | sed 's/OptionalString/String/g' \
  | sed 's/\<Interval\>/String/g' \
  | sed 's/OptionalBigDecimal/BigNumeric/g' \
  | sed 's/BigDecimal/BigNumeric/g' \
  | sed 's/DateTime/Timestamp/g'
  • jq -jr '.columns[] | .category+"_"+.name+":"+ .type +","'
    • this extracts each item of the columns array and formats it as <category>_<name>:<type>. The category is prepended because there are duplicate names across categories
    • -j joins the output without newlines, -r prints raw strings
  • sed s/.$//
    • removes the trailing ,
  • sed 's/<original type>/<bq type>/g'
    • replaces each type with a BQ-compatible type

This command results in the following text:

identity_LineItemId:String,identity_TimeInterval:String,bill_InvoiceId:String,bill_BillingEntity:String,bill_BillType:String,bill_PayerAccountId:String,bill_BillingPeriodStartDate:Timestamp,bill_BillingPeriodEndDate:Timestamp,lineItem_UsageAccountId:String,lineItem_LineItemType:String,lineItem_UsageStartDate:Timestamp,lineItem_UsageEndDate:Timestamp,lineItem_ProductCode:String,lineItem_UsageType:String,lineItem_Operation:String,lineItem_AvailabilityZone:String,lineItem_ResourceId:String,lineItem_UsageAmount:BigNumeric,lineItem_NormalizationFactor:BigNumeric,lineItem_NormalizedUsageAmount:BigNumeric,lineItem_CurrencyCode:String,lineItem_UnblendedRate:String,lineItem_UnblendedCost:BigNumeric,lineItem_BlendedRate:String,lineItem_BlendedCost:BigNumeric,lineItem_LineItemDescription:String,lineItem_TaxType:String,lineItem_LegalEntity:String,product_ProductName:String,product_accountAssistance:String,product_alarmType:String,product_architecturalReview:String,product_architectureSupport:String,product_availability:String,product_baseProductReferenceCode:String,product_bestPractices:String,product_cacheEngine:String,product_capacitystatus:String,product_caseSeverityresponseTimes:String,product_clientLocation:String,product_clockSpeed:String,product_computeFamily:String,product_computeType:String,product_cputype:String,product_currentGeneration:String,product_customerServiceAndCommunities:String,product_databaseEngine:String,product_deploymentOption:String,product_description:String,product_dominantnondominant:String,product_durability:String,product_ecu:String,product_endpointType:String,product_engineCode:String,product_enhancedNetworkingSupported:String,product_eventType:String,product_feeDescription:String,product_fromLocation:String,product_fromLocationType:String,product_group:String,product_groupDescription:String,product_includedServices:String,product_insightstype:String,product_instanceFamily:String,product_instanceType:String,product_instanceTypeFamily:String,product_intelAvx2Available:String,product_intelAvxAvailable:String,product_intelTurboAvailable:String,product_launchSupport:String,product_licenseModel:String,product_location:String,product_locationType:String,product_logsDestination:String,product_maxIopsBurstPerformance:String,product_maxIopsvolume:String,product_maxThroughputvolume:String,product_maxVolumeSize:String,product_memory:String,product_memoryGib:String,product_memorytype:String,product_messageDeliveryFrequency:String,product_messageDeliveryOrder:String,product_minVolumeSize:String,product_networkPerformance:String,product_normalizationSizeFactor:String,product_operatingSystem:String,product_operation:String,product_operationsSupport:String,product_physicalProcessor:String,product_preInstalledSw:String,product_pricingUnit:String,product_proactiveGuidance:String,product_processorArchitecture:String,product_processorFeatures:String,product_productFamily:String,product_programmaticCaseManagement:String,product_queueType:String,product_region:String,product_requestDescription:String,product_requestType:String,product_resourcePriceGroup:String,product_routingTarget:String,product_routingType:String,product_servicecode:String,product_servicename:String,product_sku:String,product_storage:String,product_storageClass:String,product_storageMedia:String,product_storageType:String,product_technicalSupport:String,product_tenancy:String,product_thirdpartySoftwareSupport:String,product_tiertype:String,product_toLocation:String,product_toLocationType:String,product_trafficDirection:String,product_training:String,product_transferType:String,product_usagetype:String,product_vcpu:String,product_version:String,product_volumeApiName:String,product_volumeType:String,product_whoCanOpenCases:String,pricing_LeaseContractLength:String,pricing_OfferingClass:String,pricing_PurchaseOption:String,pricing_RateId:String,pricing_currency:String,pricing_publicOnDemandCost:BigNumeric,pricing_publicOnDemandRate:String,pricing_term:String,pricing_unit:String,reservation_AmortizedUpfrontCostForUsage:BigNumeric,reservation_AmortizedUpfrontFeeForBillingPeriod:BigNumeric,reservation_EffectiveCost:BigNumeric,reservation_EndTime:String,reservation_ModificationStatus:String,reservation_NormalizedUnitsPerReservation:String,reservation_NumberOfReservations:String,reservation_RecurringFeeForUsage:BigNumeric,reservation_ReservationARN:String,reservation_StartTime:String,reservation_SubscriptionId:String,reservation_TotalReservedNormalizedUnits:String,reservation_TotalReservedUnits:String,reservation_UnitsPerReservation:String,reservation_UnusedAmortizedUpfrontFeeForBillingPeriod:BigNumeric,reservation_UnusedNormalizedUnitQuantity:BigNumeric,reservation_UnusedQuantity:BigNumeric,reservation_UnusedRecurringFee:BigNumeric,reservation_UpfrontValue:BigNumeric,savingsPlan_TotalCommitmentToDate:BigNumeric,savingsPlan_SavingsPlanARN:String,savingsPlan_SavingsPlanRate:BigNumeric,savingsPlan_UsedCommitment:BigNumeric,savingsPlan_SavingsPlanEffectiveCost:BigNumeric,savingsPlan_AmortizedUpfrontCommitmentForBillingPeriod:BigNumeric,savingsPlan_RecurringCommitmentForBillingPeriod:BigNumeric

Create BQ Table

Let's create a BQ table. You should already have a dataset if you set up the Cloud Billing export; if not, create one now.

Once the dataset is created, click Create table.

  • table from - Empty table
  • Destination - search for a project
  • Project name - project you created a dataset
  • Dataset name - select one you created
  • Table name - aws
  • For Schema, toggle Edit as text and paste the schema you generated with jq command
  • Partitioning - you can skip this or select a timestamp field that you will commonly filter on. I chose bill_BillingPeriodStartDate, partitioned by day (a bq CLI sketch of the same table creation follows this list).
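
The same table can be created from the command line. This sketch assumes you saved the jq output above into a file named schema.txt (a file name I am introducing here for illustration):

# Create the destination table, day-partitioned on the billing period start.
bq mk --table \
  --time_partitioning_field=bill_BillingPeriodStartDate \
  --time_partitioning_type=DAY \
  <project-id>:billing_export.aws \
  "$(cat schema.txt)"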


Transfer Prereq

Once the table is ready, let's create a transfer.

You need the following permissions first:

  • bigquery.transfers.update
  • bigquery.datasets.update

These permissions are included in the predefined bigquery.admin role.

AWS credential

Create an IAM user whose credentials the transfer will use. You only need read access to the specific bucket you created, but if you are lazy, just attach the AmazonS3ReadOnlyAccess AWS managed policy, which covers all buckets. If you don't know how to create an IAM user, refer to the following link; a minimal bucket-scoped policy sketch is shown after it.

Amazon S3 - Creating an IAM user
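
If you want the tighter, bucket-scoped variant instead of the broad managed policy, an inline policy like the following sketch should be enough (the user name and policy name are hypothetical):

# Attach a minimal read-only policy scoped to the billing bucket.
aws iam put-user-policy \
  --user-name bq-transfer \
  --policy-name bq-billing-s3-read \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": "arn:aws:s3:::<bucket>"},
      {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::<bucket>/*"}
    ]
  }'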

Configure Transfer

On the BigQuery console, click the Transfers menu in the left pane. You will be redirected to the transfer configuration page. Click the + Create Transfer button and fill in the following (a CLI equivalent is sketched after the list).

  • Source - Amazon S3
  • Display Name - aws-billing-export
  • Schedule options - starts now, daily
  • Destination

    • Dataset ID - dataset you created
    • Destination table - aws
    • Amazon S3 URI - s3://<bucket>/billing/daily_report_gzip/*/*.gz
    • Access key ID
    • Secret access key
    • File format - CSV
      • BQ supports gzip-compressed CSV natively
    • Transfer option
      • number of errors allowed - 0
      • ignore unknown values - check
      • Field delimiter - ,
      • Header rows to skip - 1
    • Notification
      • you can create a new Pub/Sub topic and configure notifications here, but this can also be done later, so leave it blank for now.

    Click Save!
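
The same transfer can be created from the command line. Treat this as a sketch: the --params keys below are my reading of the S3 transfer docs, so double-check them against the current documentation:

# Create the S3 -> BigQuery transfer; it runs on the default daily schedule.
bq mk --transfer_config \
  --data_source=amazon_s3 \
  --display_name=aws-billing-export \
  --target_dataset=billing_export \
  --params='{
    "destination_table_name_template": "aws",
    "data_path": "s3://<bucket>/billing/daily_report_gzip/*/*.gz",
    "access_key_id": "<access key id>",
    "secret_access_key": "<secret access key>",
    "file_format": "CSV",
    "field_delimiter": ",",
    "skip_leading_rows": "1",
    "max_bad_records": "0",
    "ignore_unknown_values": "true"
  }'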


You will see the following green check mark if the transfer succeeds.


If you see an error with a red marker, examine the error logs in the right pane. Common errors include:

  • Wrong S3 URI
  • Wrong AWS credentials
  • Wrong table schema
  • No objects

Cleaning Data

You now have two tables populated with billing data! It's time to do some analytics and visualization.

Let's review fields

AWS

Data dictionary

GCP

Understanding the Cloud Billing data tables in BigQuery | Google Cloud

You can use the BQ console to query and test the data for integrity, to make sure the load is intact. I won't go into detail on the SQL queries.

Once the data is Extracted and Loaded, I do the Transform step in Holistics.io, leaving the raw data intact. But you can also create a view, or use BigQuery's scheduled query feature, to minimize, clean, and consolidate the data into a single table; a sketch of such a view is shown below.
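
Here is a rough sketch of such a consolidated view. The view name is mine, it ignores currencies and GCP credits, and the table names are the placeholders used earlier in this post:

# One row per provider / day / service, so a single chart can plot both clouds.
bq query --use_legacy_sql=false '
CREATE OR REPLACE VIEW `billing_export.daily_cost_unified` AS
SELECT
  "gcp" AS provider,
  DATE(usage_start_time) AS usage_date,
  service.description AS service,
  SUM(cost) AS cost
FROM `billing_export.gcp_billing_export_v1_<hash>`
GROUP BY usage_date, service
UNION ALL
SELECT
  "aws" AS provider,
  DATE(lineItem_UsageStartDate) AS usage_date,
  lineItem_ProductCode AS service,
  CAST(SUM(lineItem_UnblendedCost) AS FLOAT64) AS cost
FROM `billing_export.aws`
GROUP BY usage_date, service'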

Visualize data

After inspecting the data with a couple of queries (I am just a newbie in the SQL world), I use Data Studio to explore it further.

When you use BigQuery, you can connect the data to Data Studio without any extra configuration.

You can also connect other BI tools to BQ to visualize and analyze your data from here.

Extra

  • [ ] Check data integrity with daily and monthly roll-over (a sketch of a monthly reconciliation query is shown below)
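
A sketch of one such check: sum the AWS table per billing month and eyeball the result against Cost Explorer (the same idea works on the GCP side against the Billing reports page):

# Monthly total of unblended cost per billing period, for comparison with the AWS console.
bq query --use_legacy_sql=false '
SELECT
  DATE_TRUNC(DATE(bill_BillingPeriodStartDate), MONTH) AS billing_month,
  ROUND(SUM(lineItem_UnblendedCost), 2) AS unblended_cost
FROM `billing_export.aws`
GROUP BY billing_month
ORDER BY billing_month'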

Conclusion

Since BQ supports S3 transfers, it was a lot easier for me to move the data around, but there was still some work to be done.

Please let me know if you find a more concrete solution for consolidating billing data in a single place!

Thank you for reading! (Head over to the original post for some pictures of Kokos.)


Reference

Export Cloud Billing data to BigQuery | Google Cloud

Overview of Amazon S3 transfers | BigQuery Data Transfer Service

Using runtime parameters in transfers | BigQuery Data Transfer Service
