gfluent’s documentation 0.1.8

This is a wrapper on Google Cloud Platform Python SDK client library. It provides a fluent-style to call the methods, here is an example,

from gfluent import BQ

project_id = "here-is-you-project-id"
bq = BQ(project_id, table="mydataset.table")

count = (
   bq.mode("WRITE_APPEND")
     .sql("SELECT name, age from dataset.tabble")
     .query()
   )

print(f"{count} rows loaded")

API Reference

class gfluent.bq.BQ(project: str, **kwargs)[source]

The fluent-style BigQuery client for chaining calls

Example:

# run the query and save to the table dataset.name
bq = BQ(project='you-project-id', table='dataset.name')
num_rows = bq.mode('CREATE_TRUNCATE').sql('select * from table').query()

bq = BQ(project='you-project-id')

rows = bq.sql('select id, name from abc.tab').query()
for row in rows:
    print(row.id, row.name)

Allowed additional arguments,

table: The BigQuery full tablename with dataset,
gcs: The GCS location with gs:// prefix,
sql: SQL Statement should start with SELECT,
schema: The BigQuery standard Schema structure,
mode: override or append mode,
create_mode: create or never create
Parameters
  • project_id (str) – The GCP Project id

  • kwargs (dict) – Additional arguments

table(table: str)[source]

Specify the table name with dataset.name format

Parameters

table (str) – BigQuery full table name without project id

format(format_: str)[source]

Specify the format of import/export files, default NEWLINE_DELIMITED_JSON

  • AVRO Specifies Avro format.

  • CSV Specifies CSV format.

  • DATASTORE_BACKUP Specifies datastore backup format

  • NEWLINE_DELIMITED_JSON Specifies newline delimited JSON format.

  • ORC Specifies Orc format.

  • PARQUET Specifies Parquet format.

Parameters

format (str) – [description]

gcs(gcs: str)[source]

Specify the GCS location, single file or wildcard

Parameters

gcs (str) – must start with gs://

sql(sql: str)[source]

Specify the SQL statement

Only one statement is allowed, and only support SELECT as of now

Parameters

sql (str) – must start with select

schema(schema: List[google.cloud.bigquery.schema.SchemaField])[source]

Specify the table schema

Parameters

schema (List[bigquery.SchemaField]) – A list of SchemaField definition

mode(mode: str)[source]

Set the bigquery write_disposition parameter, default WRITE_APPEND

  • WRITE_EMPTY This job should only be writing to empty tables.

  • WRITE_TRUNCATE This job will truncate table data and write from the beginning.

  • WRITE_APPEND This job will append to a table.

Parameters

mode (str) – must be one of above value

Raises

ValueError – when the value is not allowed

create_mode(create_mode: str)[source]

Set the bigquery create_disposition parameter, default CREATE_IF_NEEDED

  • CREATE_NEVER This job should never create tables.

  • CREATE_IF_NEEDED This job should create a table if it doesn’t already exist.

Parameters

create_mode (str) – must be one of above value

query()[source]

Run the given sql query, return rows or save to table

If the table attribute is set, it will save the query result to that table, otherwise it returns the BigQuery rows

load(location: str = 'US')int[source]

Run the LoadJob, and return number of rows loaded

.table(), .gcs() must be called to run this method. .schema() is optional, if not specified, using ``autodetect`

.mode(), .create_mode() and .format() are optional, as they have default values.

Parameters

location (str) – must be same as your dataset, default US

export()[source]

Not implemented yet

truncate()[source]

Delete all rows in the given table

.table() must be called before calling this method to speicfy which table to be truncated

create()[source]

Not implemented yet

delete()[source]

Drop the given table

.table() must be called before calling this method to speicfy which table to be dropped. No error will be raised if the table is not found.

create_dataset(dataset: str, location='US', timeout=30)[source]

Create the given dataset

Parameters
  • dataset (str) – The dataset id without project id

  • location (str, optional) – A BigQuery location, defaults to “US”

  • timeout (int, optional) – The timeout in second, defaults to 30

delete_dataset(dataset: str)[source]

Delete (or drop) the given dataset

Parameters

dataset (str) – the dataset id, without project_id

is_exist()bool[source]

Check if a given table exists

.table() must be called before calling this method to speicfy which table to be checked.

class gfluent.gcs.GCS(project: str, **kwargs)[source]
local(path: str, suffix: Optional[str] = None)[source]

Specify the local path, could be a directory or a file

Parameters
  • path (str, Optional) – directory or file

  • path – the suffix of included files

Raises

ValueError – if path not found as a file or directory

bucket(bucket: str)[source]

Specify the bucket name without gs://

Parameters

bucket (str) – bucket name without gs://

prefix(prefix: str)[source]

Specify the blob prefix

Parameters

prefix (str) – without the ending /

upload()[source]

Upload file(s) to GCS with given prefix

download()[source]

Download file from the given prefix to local folder

The prefix of the blob object will be ignored,

gs://bucket/folder1/abc.txt will be downloaded to /var/temp/abc.txt if the .local('var/temp') is set.

Indices and tables