
QuickTip: Ingesting Google Analytics API with Apache NiFi

By Timothy Spann. Originally published at datainmotion.dev.


Design your query / test the API here:

https://ga-dev-tools.appspot.com/query-explorer/

Building this NiFi flow is trivial.

Add your URL with tokens from the Query Explorer console.
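As a rough sketch of what that URL looks like, the Python snippet below assembles a Core Reporting API v3 query from the same parameters that appear in the example results later in this post. The helper name `build_ga_url` and the placeholder token are illustrative assumptions, as is passing the token as an `access_token` query parameter. In practice, copy the URL that the Query Explorer generates for you.

```python
from urllib.parse import urlencode

# Endpoint taken from the "selfLink" field of the example response in this post.
GA_V3_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"


def build_ga_url(ids, metrics, start_date, end_date, access_token):
    """Assemble a query URL (hypothetical helper, not part of NiFi or any Google SDK)."""
    params = {
        "ids": ids,
        "metrics": ",".join(metrics),
        "start-date": start_date,
        "end-date": end_date,
        # Assumption: the token travels as a query parameter, as in the
        # URL the Query Explorer can produce.
        "access_token": access_token,
    }
    # Leave ':' and ',' unescaped so the result reads like the Query Explorer URL.
    return GA_V3_ENDPOINT + "?" + urlencode(params, safe=":,")


if __name__ == "__main__":
    print(
        build_ga_url(
            ids="ga:33",
            metrics=["ga:users", "ga:percentNewSessions", "ga:sessions"],
            start_date="30daysAgo",
            end_date="yesterday",
            access_token="YOUR_TOKEN",  # placeholder, not a real token
        )
    )
```

The resulting string is what you would paste into the URL property of the processor that makes the HTTP call (InvokeHTTP is assumed here; the post does not name the processor).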

You will need to reference the JRE that NiFi is using and its cacerts file if you don't want to build your own trust store. The default cacerts password for JDK 8 is changeit. No, really.
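To make the trust store step more concrete, here is a small sketch that looks for `cacerts` under a Java home directory. The two relative paths reflect the usual JDK 8 layout (nested `jre/`) and the standalone JRE or later JDK layout. The helper name and the reliance on `JAVA_HOME` are assumptions, so check which JRE your NiFi instance actually runs on.

```python
import os
from pathlib import Path

# Usual locations of the bundled trust store relative to the Java home.
# JDK 8 nests a JRE under jre/; a standalone JRE and JDK 9+ use lib/security directly.
CACERTS_RELATIVE_PATHS = (
    Path("jre") / "lib" / "security" / "cacerts",
    Path("lib") / "security" / "cacerts",
)


def find_cacerts(java_home):
    """Return the first existing cacerts path under java_home, or None (hypothetical helper)."""
    for relative in CACERTS_RELATIVE_PATHS:
        candidate = Path(java_home) / relative
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME", "")
    found = find_cacerts(java_home) if java_home else None
    print(found or "cacerts not found; point JAVA_HOME at the JRE NiFi uses")
```

The path it prints, together with the password changeit and the type JKS, is what would go into the Truststore Filename, Truststore Password, and Truststore Type properties of an SSL context service, assuming you use NiFi's StandardSSLContextService.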

Here are our results in clean JSON.

Here are some attributes NiFi shows.

Example JSON Results

{
  "kind": "analytics#gaData",
  "id": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "query": {
    "start-date": "30daysAgo",
    "end-date": "yesterday",
    "ids": "ga:33",
    "metrics": [
      "ga:users",
      "ga:percentNewSessions",
      "ga:sessions"
    ],
    "start-index": 1,
    "max-results": 1000
  },
  "itemsPerPage": 1000,
  "totalResults": 0,
  "selfLink": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:33&metrics=ga:users,ga:percentNewSessions,ga:sessions&start-date=30daysAgo&end-date=yesterday",
  "profileInfo": {
    "profileId": "333",
    "accountId": "333",
    "webPropertyId": "UA-333-3",
    "internalWebPropertyId": "33",
    "profileName": "monitorenergy.blogspot.com/",
    "tableId": "ga:33"
  },
  "containsSampledData": false,
  "columnHeaders": [
    {
      "name": "ga:users",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    },
    {
      "name": "ga:percentNewSessions",
      "columnType": "METRIC",
      "dataType": "PERCENT"
    },
    {
      "name": "ga:sessions",
      "columnType": "METRIC",
      "dataType": "INTEGER"
    }
  ],
  "totalsForAllResults": {
    "ga:users": "0",
    "ga:percentNewSessions": "0.0",
    "ga:sessions": "0"
  }
}

You should have a lot more data, depending on what you have Google Analytics pointing to. From here you can use QueryRecord or another record processor to automatically convert, query, or route this data. You can infer a schema, or build a permanent one and store it in Cloudera Schema Registry. I recommend the latter if this is a frequent process.
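For a quick look at the payload outside NiFi, the sketch below casts the `totalsForAllResults` values using the `dataType` declared in `columnHeaders`, and zips any `rows` into records. The example response above has no `rows` field because `totalResults` is 0; the assumption here is that a populated response carries `rows` as a list of string lists in column order. The function names are illustrative only, and this is not a substitute for a record reader with a proper schema.

```python
import json

# Casts for the dataType values seen in the example response.
# Any other dataType is left as a string.
CASTS = {"INTEGER": int, "PERCENT": float}


def typed_totals(response):
    """Cast totalsForAllResults values according to columnHeaders (hypothetical helper)."""
    casts = {h["name"]: CASTS.get(h["dataType"], str) for h in response["columnHeaders"]}
    return {
        name: casts.get(name, str)(value)
        for name, value in response["totalsForAllResults"].items()
    }


def rows_to_records(response):
    """Turn each row into a dict keyed by column name; returns [] when rows is absent."""
    headers = response["columnHeaders"]
    return [
        {h["name"]: CASTS.get(h["dataType"], str)(value) for h, value in zip(headers, row)}
        for row in response.get("rows", [])
    ]


if __name__ == "__main__":
    # Trimmed copy of the example response from this post.
    sample = json.loads("""
    {
      "columnHeaders": [
        {"name": "ga:users", "columnType": "METRIC", "dataType": "INTEGER"},
        {"name": "ga:percentNewSessions", "columnType": "METRIC", "dataType": "PERCENT"},
        {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"}
      ],
      "totalsForAllResults": {
        "ga:users": "0",
        "ga:percentNewSessions": "0.0",
        "ga:sessions": "0"
      }
    }
    """)
    print(typed_totals(sample))
    print(rows_to_records(sample))
```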

Download a reference NiFi flow here:

https://github.com/tspannhw/flows

References:

https://developers.google.com/analytics/devguides/reporting/core/v4

https://developers.google.com/analytics

Posted on Jan 27 by Timothy Spann (@tspannhw)

I am a Principal Field Engineer for Data in Motion at Cloudera. I work with Apache NiFi, Apache Kafka, Apache Spark, Apache Flink, IoT, MXNet, DLJ.AI, Deep Learning, Machine Learning, Streaming...
