gfluent’s documentation 0.1.13¶

This is a wrapper on Google Cloud Platform Python SDK client library. It provides a fluent-style to call the methods, here is an example,

from gfluent import BQ

project_id = "here-is-you-project-id"
bq = BQ(project_id, table="mydataset.table")

count = (
   bq.mode("WRITE_APPEND")
     .sql("SELECT name, age from dataset.tabble")
     .query()
   )

print(f"{count} rows loaded")

API Reference¶

class gfluent.BQ(project: str, **kwargs)[source]¶

The fluent-style BigQuery client for chaining calls

Example:

# run the query and save to the table dataset.name
bq = BQ(project='you-project-id', table='dataset.name')
num_rows = bq.mode('CREATE_TRUNCATE').sql('select * from table').query()

bq = BQ(project='you-project-id')

rows = bq.sql('select id, name from abc.tab').query()
for row in rows:
    print(row.id, row.name)

Allowed additional arguments,

table: The BigQuery full tablename with dataset,
gcs: The GCS location with gs:// prefix,
sql: SQL Statement should start with SELECT,
schema: The BigQuery standard Schema structure,
mode: override or append mode,
create_mode: create or never create

Parameters

project_id (str) – The GCP Project id
kwargs (dict) – Additional arguments

table(table: str)[source]¶

Specify the table name with dataset.name format

Parameters: table (str) – BigQuery full table name without project id

format(format_: str)[source]¶

Specify the format of import/export files, default NEWLINE_DELIMITED_JSON

AVRO Specifies Avro format.
CSV Specifies CSV format.
DATASTORE_BACKUP Specifies datastore backup format
NEWLINE_DELIMITED_JSON Specifies newline delimited JSON format.
ORC Specifies Orc format.
PARQUET Specifies Parquet format.

Parameters: format (str) – [description]

gcs(gcs: str)[source]¶

Specify the GCS location, single file or wildcard

Parameters: gcs (str) – must start with gs://

sql(sql: str)[source]¶

Specify the SQL statement

Only one statement is allowed, and only support SELECT and WITH as of now

Parameters: sql (str) – must start with select

schema(schema: List[google.cloud.bigquery.schema.SchemaField])[source]¶

Specify the table schema

Parameters: schema (List[bigquery.SchemaField]) – A list of SchemaField definition

mode(mode: str)[source]¶

Set the bigquery write_disposition parameter, default WRITE_APPEND

WRITE_EMPTY This job should only be writing to empty tables.
WRITE_TRUNCATE This job will truncate table data and write from the beginning.
WRITE_APPEND This job will append to a table.

Parameters: mode (str) – must be one of above value
Raises: ValueError – when the value is not allowed

create_mode(create_mode: str)[source]¶

Set the bigquery create_disposition parameter, default CREATE_IF_NEEDED

CREATE_NEVER This job should never create tables.
CREATE_IF_NEEDED This job should create a table if it doesn’t already exist.

Parameters: create_mode (str) – must be one of above value

query()[source]¶

Run the given sql query, return rows or save to table

If the table attribute is set, it will save the query result to that table, otherwise it returns the BigQuery rows

load(location: str = 'US') → int[source]¶

Run the LoadJob, and return number of rows loaded

.table(), .gcs() must be called to run this method. .schema() is optional, if not specified, using autodetect

.mode(), .create_mode() and .format() are optional, as they have default values.

Parameters: location (str) – must be same as your dataset, default US

export()[source]¶: Not implemented yet

truncate()[source]¶

Delete all rows in the given table

.table() must be called before calling this method to speicfy which table to be truncated

create(ok_exists=False)[source]¶

Create the table with given schema

.schema() and .table() must be called before invoke this method The schema should be a list of bigquery.SchemaField()

drop()[source]¶: Alias of .delete()

delete()[source]¶

Drop the given table

.table() must be called before calling this method to speicfy which table to be dropped. No error will be raised if the table is not found.

create_dataset(dataset: str, location='US', timeout=30)[source]¶

Create the given dataset

Parameters

dataset (str) – The dataset id without project id
location (str, optional) – A BigQuery location, defaults to “US”
timeout (int, optional) – The timeout in second, defaults to 30

delete_dataset(dataset: str)[source]¶

Delete (or drop) the given dataset

Parameters: dataset (str) – the dataset id, without project_id

is_exist() → bool[source]¶

Check if a given table exists

.table() must be called before calling this method to speicfy which table to be checked.

class gfluent.GCS(project: str, **kwargs)[source]¶

local(path: str, suffix: Optional[str] = None)[source]¶

Specify the local path, could be a directory or a file

Parameters

path (str, Optional) – directory or file
path – the suffix of included files

Raises

ValueError – if path not found as a file or directory

bucket(bucket: str)[source]¶

Specify the bucket name without gs://

Parameters: bucket (str) – bucket name without gs://

prefix(prefix: str)[source]¶

Specify the blob prefix

Parameters: prefix (str) – without the ending /

upload()[source]¶: Upload file(s) to GCS with given prefix

download()[source]¶

Download file from the given prefix to local folder

The prefix of the blob object will be ignored,

gs://bucket/folder1/abc.txt will be downloaded to /var/temp/abc.txt if the .local('var/temp') is set.

class gfluent.Sheet(credential_or_path: Union[google.oauth2.service_account.Credentials, str], **kwargs)[source]¶

The fluent-style Google Sheet for chaining class

This Sheet class provides the interface to load Spreadsheet data to Bigquery table even in one line. The destiniation table must be a new table, and not exist in the same dataset.

Examples:

# use the headers from spread sheet and auto detect the type
(
    Sheet('google-sa-credential-or-path')
    .sheet_id('your-sheet-id')
    .worksheet('sheet_name!A1:B100')    # provide the range in one-go
    .bq(BQ(projec_id='project-id', table='dataset.table')
).load()

# use given schema definition
schemas = [
    bigquery.SchemeField(...),
    bigquery.SchemeField(...),
    bigquery.SchemeField(...),
]
(
    Sheet('google-sa-credential-or-path')
    .url('google-sheet-url') # the sheet id will be extracted automatically
    .worksheet('sheet_name')
    .range('A1:B100') # provide the range in separate call
    .bq(BQ(projec_id='project-id', table='dataset.table')
    .schema(schemas)
).load()

Parameters: credential_or_path (Union[Credentials, str]) – the ``service_account.Credentials` object or file path

sheet_id(sheet_id: str)[source]¶

Specify the UID of Google Sheet

Parameters: sheet_id (str) – The UID of Google Spreadsheet

url(url: str)[source]¶

Pass the Google sheet URL

Parameters: url (str) – The full URL of Google Sheet

schema(schema: List[google.cloud.bigquery.schema.SchemaField])[source]¶

Set the schema for desitnation table

Parameters: schema (List[bigquery.SchemaField]) – The list of fields

worksheet(worksheet: str)[source]¶

Specify the worksheet name with or without range

The first row in the range is considered as header row, and it is not able to be skipped.

Valid values: sheet_name!A1:B3 or just sheet_name

Parameters: worksheet (str) – The worksheet with range or only worksheet name

range(range: str)[source]¶

Specify the worksheet data range in A1:B4 form

The client library doesn’t check if the range is valid, if any syntax error with the range, Google Sheet will raise the exception

The first row in the range is considered as header row, and it is not able to be skipped.

bq(bq: gfluent.bq.BQ)[source]¶

use project id and other params to initial bq object

Parameters: bq – The BQ instance
Type: gfluent.BQ

load(location: str = 'US')[source]¶

Load the Data to BigQuery table

Please use .bq.table() to set the destination table name, and the table must not exists, otherwise the Conflict exception will be raised.

Parameters: location – The BigQuery location, default is US
Type: str
Raises: google.api_core.exceptions.Conflict – table already exists exception.

gfluent’s documentation 0.1.13¶

API Reference¶

Indices and tables¶

gfluent

Navigation

Related Topics