I'm starting to learn Python while updating a data pipeline, and I had to upload some JSON files to Google BigQuery. Hope this helps anyone who needs it!
See the GCP documentation (which walks through a CSV example).
Steps before running the script:
- Create a Google service account with BigQuery permissions.
- Download the JSON key. Do not commit it into git! Use .gitignore if needed.
- Add the path to the key to your .env file. It will get loaded via the load_dotenv function (from the python-dotenv library). Again, do not commit .env into git!
Example .env:
GOOGLE_APPLICATION_CREDENTIALS=your-gcp-project-name-aaa333111aaa.json
- Create the dataset via the GCP Console, in the BigQuery section (or programmatically, as shown in the sketch after this list).
- Run the script!
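If you prefer to create the dataset from Python instead of the Console, here is a minimal sketch; it assumes the same .env setup as below, and 'DatasetName' and the location are placeholders you should replace with your own values:

from google.cloud import bigquery
from dotenv import load_dotenv

load_dotenv()  # picks up GOOGLE_APPLICATION_CREDENTIALS from .env

client = bigquery.Client()

# 'DatasetName' is a placeholder; use your own dataset name.
dataset = bigquery.Dataset(client.dataset('DatasetName'))
dataset.location = "europe-west1"  # Must match the location you will load into.

dataset = client.create_dataset(dataset)  # API request
print("Created dataset {}.{}".format(client.project, dataset.dataset_id))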
upload_json_to_bq.py script:
from google.cloud import bigquery
from google.oauth2 import service_account  # only needed if you pass credentials explicitly (see sketch below)
from dotenv import load_dotenv

# Load GOOGLE_APPLICATION_CREDENTIALS from the .env file so the client can find the key.
load_dotenv()

client = bigquery.Client()

filename = '/path/to/file/in/nd-format.json'
dataset_id = 'DatasetName'
table_id = 'TableName'

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True  # Let BigQuery infer the schema from the data.

with open(filename, "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location="europe-west1",  # Must match the destination dataset location.
        job_config=job_config,
    )  # API request

job.result()  # Waits for the table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))
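By the way, the service_account import isn't strictly needed here: when GOOGLE_APPLICATION_CREDENTIALS is set, bigquery.Client() picks up the key from the environment automatically. If you'd rather pass the credentials explicitly, a minimal sketch could look like this (the key path comes from your .env):

import os
from google.cloud import bigquery
from google.oauth2 import service_account
from dotenv import load_dotenv

load_dotenv()

# Build credentials explicitly from the key file referenced in .env.
credentials = service_account.Credentials.from_service_account_file(
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)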
Note: the file must be a newline-delimited JSON (NDJSON) file, i.e. one JSON object per line.
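If your source file is a regular JSON array instead, a quick conversion with the standard library could look like this (input.json and nd-format.json are placeholder filenames):

import json

# Convert a JSON array file into newline-delimited JSON (one object per line).
with open('input.json') as f:
    records = json.load(f)  # expects a top-level list of objects

with open('nd-format.json', 'w') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')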