gfluent’s documentation 0.1.13

This is a wrapper on Google Cloud Platform Python SDK client library. It provides a fluent-style to call the methods, here is an example,

from gfluent import BQ

project_id = "here-is-you-project-id"
bq = BQ(project_id, table="mydataset.table")

count = (
   bq.mode("WRITE_APPEND")
     .sql("SELECT name, age from dataset.tabble")
     .query()
   )

print(f"{count} rows loaded")

API Reference

class gfluent.BQ(project: str, **kwargs)[source]

The fluent-style BigQuery client for chaining calls

Example:

# run the query and save to the table dataset.name
bq = BQ(project='you-project-id', table='dataset.name')
num_rows = bq.mode('CREATE_TRUNCATE').sql('select * from table').query()

bq = BQ(project='you-project-id')

rows = bq.sql('select id, name from abc.tab').query()
for row in rows:
    print(row.id, row.name)

Allowed additional arguments,

table: The BigQuery full tablename with dataset,
gcs: The GCS location with gs:// prefix,
sql: SQL Statement should start with SELECT,
schema: The BigQuery standard Schema structure,
mode: override or append mode,
create_mode: create or never create
Parameters
  • project_id (str) – The GCP Project id

  • kwargs (dict) – Additional arguments

table(table: str)[source]

Specify the table name with dataset.name format

Parameters

table (str) – BigQuery full table name without project id

format(format_: str)[source]

Specify the format of import/export files, default NEWLINE_DELIMITED_JSON

  • AVRO Specifies Avro format.

  • CSV Specifies CSV format.

  • DATASTORE_BACKUP Specifies datastore backup format

  • NEWLINE_DELIMITED_JSON Specifies newline delimited JSON format.

  • ORC Specifies Orc format.

  • PARQUET Specifies Parquet format.

Parameters

format (str) – [description]

gcs(gcs: str)[source]

Specify the GCS location, single file or wildcard

Parameters

gcs (str) – must start with gs://

sql(sql: str)[source]

Specify the SQL statement

Only one statement is allowed, and only support SELECT and WITH as of now

Parameters

sql (str) – must start with select

schema(schema: List[google.cloud.bigquery.schema.SchemaField])[source]

Specify the table schema

Parameters

schema (List[bigquery.SchemaField]) – A list of SchemaField definition

mode(mode: str)[source]

Set the bigquery write_disposition parameter, default WRITE_APPEND

  • WRITE_EMPTY This job should only be writing to empty tables.

  • WRITE_TRUNCATE This job will truncate table data and write from the beginning.

  • WRITE_APPEND This job will append to a table.

Parameters

mode (str) – must be one of above value

Raises

ValueError – when the value is not allowed

create_mode(create_mode: str)[source]

Set the bigquery create_disposition parameter, default CREATE_IF_NEEDED

  • CREATE_NEVER This job should never create tables.

  • CREATE_IF_NEEDED This job should create a table if it doesn’t already exist.

Parameters

create_mode (str) – must be one of above value

query()[source]

Run the given sql query, return rows or save to table

If the table attribute is set, it will save the query result to that table, otherwise it returns the BigQuery rows

load(location: str = 'US')int[source]

Run the LoadJob, and return number of rows loaded

.table(), .gcs() must be called to run this method. .schema() is optional, if not specified, using autodetect

.mode(), .create_mode() and .format() are optional, as they have default values.

Parameters

location (str) – must be same as your dataset, default US

export()[source]

Not implemented yet

truncate()[source]

Delete all rows in the given table

.table() must be called before calling this method to speicfy which table to be truncated

create(ok_exists=False)[source]

Create the table with given schema

.schema() and .table() must be called before invoke this method The schema should be a list of bigquery.SchemaField()

drop()[source]

Alias of .delete()

delete()[source]

Drop the given table

.table() must be called before calling this method to speicfy which table to be dropped. No error will be raised if the table is not found.

create_dataset(dataset: str, location='US', timeout=30)[source]

Create the given dataset

Parameters
  • dataset (str) – The dataset id without project id

  • location (str, optional) – A BigQuery location, defaults to “US”

  • timeout (int, optional) – The timeout in second, defaults to 30

delete_dataset(dataset: str)[source]

Delete (or drop) the given dataset

Parameters

dataset (str) – the dataset id, without project_id

is_exist()bool[source]

Check if a given table exists

.table() must be called before calling this method to speicfy which table to be checked.

class gfluent.GCS(project: str, **kwargs)[source]
local(path: str, suffix: Optional[str] = None)[source]

Specify the local path, could be a directory or a file

Parameters
  • path (str, Optional) – directory or file

  • path – the suffix of included files

Raises

ValueError – if path not found as a file or directory

bucket(bucket: str)[source]

Specify the bucket name without gs://

Parameters

bucket (str) – bucket name without gs://

prefix(prefix: str)[source]

Specify the blob prefix

Parameters

prefix (str) – without the ending /

upload()[source]

Upload file(s) to GCS with given prefix

download()[source]

Download file from the given prefix to local folder

The prefix of the blob object will be ignored,

gs://bucket/folder1/abc.txt will be downloaded to /var/temp/abc.txt if the .local('var/temp') is set.

class gfluent.Sheet(credential_or_path: Union[google.oauth2.service_account.Credentials, str], **kwargs)[source]

The fluent-style Google Sheet for chaining class

This Sheet class provides the interface to load Spreadsheet data to Bigquery table even in one line. The destiniation table must be a new table, and not exist in the same dataset.

Examples:

# use the headers from spread sheet and auto detect the type
(
    Sheet('google-sa-credential-or-path')
    .sheet_id('your-sheet-id')
    .worksheet('sheet_name!A1:B100')    # provide the range in one-go
    .bq(BQ(projec_id='project-id', table='dataset.table')
).load()

# use given schema definition
schemas = [
    bigquery.SchemeField(...),
    bigquery.SchemeField(...),
    bigquery.SchemeField(...),
]
(
    Sheet('google-sa-credential-or-path')
    .url('google-sheet-url') # the sheet id will be extracted automatically
    .worksheet('sheet_name')
    .range('A1:B100') # provide the range in separate call
    .bq(BQ(projec_id='project-id', table='dataset.table')
    .schema(schemas)
).load()
Parameters

credential_or_path (Union[Credentials, str]) – the ``service_account.Credentials` object or file path

sheet_id(sheet_id: str)[source]

Specify the UID of Google Sheet

Parameters

sheet_id (str) – The UID of Google Spreadsheet

url(url: str)[source]

Pass the Google sheet URL

Parameters

url (str) – The full URL of Google Sheet

schema(schema: List[google.cloud.bigquery.schema.SchemaField])[source]

Set the schema for desitnation table

Parameters

schema (List[bigquery.SchemaField]) – The list of fields

worksheet(worksheet: str)[source]

Specify the worksheet name with or without range

The first row in the range is considered as header row, and it is not able to be skipped.

Valid values: sheet_name!A1:B3 or just sheet_name

Parameters

worksheet (str) – The worksheet with range or only worksheet name

range(range: str)[source]

Specify the worksheet data range in A1:B4 form

The client library doesn’t check if the range is valid, if any syntax error with the range, Google Sheet will raise the exception

The first row in the range is considered as header row, and it is not able to be skipped.

bq(bq: gfluent.bq.BQ)[source]

use project id and other params to initial bq object

Parameters

bq – The BQ instance

Type

gfluent.BQ

load(location: str = 'US')[source]

Load the Data to BigQuery table

Please use .bq.table() to set the destination table name, and the table must not exists, otherwise the Conflict exception will be raised.

Parameters

location – The BigQuery location, default is US

Type

str

Raises

google.api_core.exceptions.Conflict – table already exists exception.

Indices and tables