gfluent’s documentation 0.1.8

This is a wrapper around the Google Cloud Platform Python SDK client libraries. It provides a fluent-style interface for calling their methods. Here is an example:

from gfluent import BQ

project_id = "your-project-id"
bq = BQ(project_id, table="mydataset.table")

count = (
    bq.mode("WRITE_APPEND")
    .sql("SELECT name, age FROM dataset.table")
    .query()
)

print(f"{count} rows loaded")
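The chaining in the example above works because each setter stores its value and returns the client object itself. The sketch below illustrates that pattern with a hypothetical, stdlib-only `FluentQuerySketch` class. It is not gfluent's implementation and does not contact BigQuery; its `describe()` method is an invented stand-in for `query()` that simply reports the collected settings.

```python
from typing import Optional


class FluentQuerySketch:
    """Hypothetical builder illustrating the fluent style (not gfluent's code)."""

    def __init__(self, project: str, **kwargs: str) -> None:
        self.project = project
        # Keyword arguments pre-fill the same settings the chained setters change
        self._table: Optional[str] = kwargs.get("table")
        self._mode: str = kwargs.get("mode", "WRITE_APPEND")
        self._sql: Optional[str] = kwargs.get("sql")

    def mode(self, mode: str) -> "FluentQuerySketch":
        self._mode = mode
        return self  # returning self is what makes chaining possible

    def sql(self, sql: str) -> "FluentQuerySketch":
        self._sql = sql
        return self

    def describe(self) -> dict:
        # Stand-in for query(): report the collected settings instead of running a job
        return {"project": self.project, "table": self._table,
                "mode": self._mode, "sql": self._sql}


settings = (
    FluentQuerySketch("my-project", table="mydataset.table")
    .mode("WRITE_APPEND")
    .sql("SELECT name, age FROM dataset.table")
    .describe()
)
print(settings["mode"])  # WRITE_APPEND
```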

API Reference

class gfluent.bq.BQ(project: str, **kwargs)

A fluent-style BigQuery client whose methods can be chained

Example:

# run the query and save the result to the table dataset.name
bq = BQ(project='your-project-id', table='dataset.name')
num_rows = bq.mode('WRITE_TRUNCATE').sql('select * from table').query()

bq = BQ(project='your-project-id')

rows = bq.sql('select id, name from abc.tab').query()
for row in rows:
    print(row.id, row.name)

Allowed additional keyword arguments:

table: the full BigQuery table name, including the dataset (dataset.table),
gcs: the GCS location, with the gs:// prefix,
sql: the SQL statement, which must start with SELECT,
schema: the BigQuery table schema (a list of SchemaField),
mode: the write disposition (for example, overwrite or append),
create_mode: the create disposition (create if needed, or never create)

Parameters

project (str) – The GCP project ID
kwargs (dict) – Additional keyword arguments, as listed above

table(table: str)

Specify the table name in dataset.table format

Parameters: table (str) – the full BigQuery table name, without the project ID
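Because the table name is given as dataset.table without the project ID, a project-qualified name has one part too many. A minimal, hypothetical check for that format is sketched below; `split_table_name` is an invented helper, and gfluent's own validation (if any) may differ.

```python
from typing import Tuple


def split_table_name(table: str) -> Tuple[str, str]:
    """Split 'dataset.table' into its parts (hypothetical helper, not part of gfluent)."""
    parts = table.split(".")
    if len(parts) != 2 or not all(parts):
        # A project-qualified name such as 'proj.dataset.table' has three parts
        raise ValueError(f"expected 'dataset.table', got {table!r}")
    dataset, name = parts
    return dataset, name


print(split_table_name("mydataset.table"))  # ('mydataset', 'table')
```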

format(format_: str)

Specify the format of import/export files. Defaults to NEWLINE_DELIMITED_JSON.

AVRO Specifies Avro format.
CSV Specifies CSV format.
DATASTORE_BACKUP Specifies Datastore backup format.
NEWLINE_DELIMITED_JSON Specifies newline-delimited JSON format.
ORC Specifies ORC format.
PARQUET Specifies Parquet format.

Parameters: format_ (str) – one of the formats listed above

gcs(gcs: str)

Specify the GCS location: a single file or a wildcard pattern

Parameters: gcs (str) – must start with gs://
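The gs:// requirement can be checked before any job is submitted. The hypothetical `parse_gcs_uri` helper below enforces the prefix and splits the location into bucket and object path (which may contain a wildcard); it is an illustration, not gfluent's API.

```python
from typing import Tuple


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path' into (bucket, path); hypothetical helper, not gfluent's API."""
    if not uri.startswith("gs://"):
        raise ValueError(f"GCS location must start with gs://, got {uri!r}")
    # Everything up to the first '/' after the scheme is the bucket name
    bucket, _, path = uri[len("gs://"):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, path


print(parse_gcs_uri("gs://my-bucket/exports/part-*.json"))
# ('my-bucket', 'exports/part-*.json')
```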

sql(sql: str)

Specify the SQL statement

Only one statement is allowed, and only SELECT statements are supported for now.

Parameters: sql (str) – must be a SELECT statement
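The two rules above (a single statement, and it must be a SELECT) can be approximated with plain string checks. The hypothetical `check_select` below is deliberately naive: a semicolon inside a string literal would be misread as a second statement, and a query starting with WITH would be rejected. It is not gfluent's validation logic.

```python
def check_select(sql: str) -> str:
    """Naive single-SELECT check (hypothetical; not gfluent's implementation)."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        # Another ';' after removing trailing ones suggests multiple statements
        raise ValueError("only one statement is allowed")
    if not stmt.lower().startswith("select"):
        raise ValueError("only SELECT statements are supported")
    return stmt


print(check_select("select id, name from abc.tab;"))  # select id, name from abc.tab
```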

schema(schema: List[google.cloud.bigquery.schema.SchemaField])

Specify the table schema

Parameters: schema (List[bigquery.SchemaField]) – a list of SchemaField definitions

mode(mode: str)

Set the BigQuery write_disposition parameter. Defaults to WRITE_APPEND.

WRITE_EMPTY This job should only write to empty tables.
WRITE_TRUNCATE This job will truncate table data and write from the beginning.
WRITE_APPEND This job will append to a table.

Parameters: mode (str) – must be one of the values above
Raises: ValueError – when the value is not allowed
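The ValueError above amounts to a membership test against the three write dispositions. A minimal sketch, assuming a hypothetical module-level set `ALLOWED_WRITE_MODES` and an invented `validate_mode` helper:

```python
ALLOWED_WRITE_MODES = {"WRITE_EMPTY", "WRITE_TRUNCATE", "WRITE_APPEND"}


def validate_mode(mode: str) -> str:
    """Return mode unchanged if it is a valid write disposition (hypothetical helper)."""
    if mode not in ALLOWED_WRITE_MODES:
        raise ValueError(
            f"mode must be one of {sorted(ALLOWED_WRITE_MODES)}, got {mode!r}"
        )
    return mode


print(validate_mode("WRITE_TRUNCATE"))  # WRITE_TRUNCATE
```

Note that a create disposition such as CREATE_IF_NEEDED fails this check; it belongs to create_mode(), not mode().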

create_mode(create_mode: str)

Set the BigQuery create_disposition parameter. Defaults to CREATE_IF_NEEDED.

CREATE_NEVER This job should never create tables.
CREATE_IF_NEEDED This job should create a table if it doesn’t already exist.

Parameters: create_mode (str) – must be one of the values above

query()

Run the given SQL query and either return the rows or save them to a table

If the table attribute is set, the query result is saved to that table; otherwise, the BigQuery rows are returned.

load(location: str = 'US') → int

Run the LoadJob and return the number of rows loaded

.table() and .gcs() must be called before running this method. .schema() is optional; if it is not specified, schema autodetection is used.

.mode(), .create_mode() and .format() are optional, as they have default values.

Parameters: location (str) – must match your dataset's location. Defaults to US.
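load() combines the required settings with the documented defaults: NEWLINE_DELIMITED_JSON, WRITE_APPEND, CREATE_IF_NEEDED, and schema autodetection when no schema is given. The stdlib-only sketch below resolves those defaults into a plain dict whose keys mirror attribute names of google-cloud-bigquery's `LoadJobConfig` (`source_format`, `write_disposition`, `create_disposition`, `autodetect`, `schema`). The function `resolve_load_settings` is hypothetical, and gfluent may assemble its job differently.

```python
from typing import Any, Dict, List, Optional


def resolve_load_settings(
    table: Optional[str],
    gcs: Optional[str],
    format_: str = "NEWLINE_DELIMITED_JSON",
    mode: str = "WRITE_APPEND",
    create_mode: str = "CREATE_IF_NEEDED",
    schema: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical: resolve load settings the way the docs describe them."""
    if not table or not gcs:
        raise ValueError(".table() and .gcs() must be set before load()")
    settings: Dict[str, Any] = {
        "source_format": format_,
        "write_disposition": mode,
        "create_disposition": create_mode,
    }
    if schema is None:
        settings["autodetect"] = True  # no schema given: let BigQuery infer it
    else:
        settings["schema"] = schema
    return settings


print(resolve_load_settings("mydataset.table", "gs://my-bucket/data/*.json"))
```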

export(): Not implemented yet

truncate()

Delete all rows in the given table

.table() must be called before calling this method to specify which table to truncate.

create(): Not implemented yet

delete()

Drop the given table

.table() must be called before calling this method to specify which table to drop. No error is raised if the table is not found.

create_dataset(dataset: str, location='US', timeout=30)

Create the given dataset

Parameters

dataset (str) – The dataset ID, without the project ID
location (str, optional) – A BigQuery location, defaults to “US”
timeout (int, optional) – The timeout in seconds, defaults to 30

delete_dataset(dataset: str)

Delete (or drop) the given dataset

Parameters: dataset (str) – the dataset ID, without the project ID

is_exist() → bool

Check whether the given table exists

.table() must be called before calling this method to specify which table to check.

class gfluent.gcs.GCS(project: str, **kwargs)

local(path: str, suffix: Optional[str] = None)

Specify the local path, which can be a directory or a file

Parameters

path (str) – directory or file
suffix (str, optional) – only files with this suffix are included

Raises

ValueError – if path not found as a file or directory
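Resolving a file-or-directory path with an optional suffix filter needs only the standard library. In the sketch below, the helper name `collect_local_files` is hypothetical, and the choice to scan only the top level of a directory (no recursion) is an assumption, not documented gfluent behaviour.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def collect_local_files(path: str, suffix: Optional[str] = None) -> List[Path]:
    """Hypothetical: resolve a file or directory into the files to upload."""
    p = Path(path)
    if p.is_file():
        files = [p]
    elif p.is_dir():
        # Assumption: only the top level of the directory is scanned
        files = sorted(child for child in p.iterdir() if child.is_file())
    else:
        raise ValueError(f"{path!r} is neither a file nor a directory")
    if suffix is not None:
        files = [f for f in files if f.name.endswith(suffix)]
    return files


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.json", "b.json", "notes.txt"):
        (Path(tmp) / name).write_text("{}")
    json_names = [f.name for f in collect_local_files(tmp, suffix=".json")]

print(json_names)  # ['a.json', 'b.json']
```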

bucket(bucket: str)

Specify the bucket name without gs://

Parameters: bucket (str) – bucket name without gs://

prefix(prefix: str)

Specify the blob prefix

Parameters: prefix (str) – without the trailing /

upload(): Upload file(s) to GCS under the given prefix
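Since the prefix is given without a trailing /, the object names for uploaded files are formed by joining the prefix and each file's name with a single /. The sketch below assumes files keep their base names; the helper `blob_names_for_upload` is hypothetical and not part of gfluent.

```python
import posixpath
from pathlib import PurePath
from typing import Iterable, List


def blob_names_for_upload(files: Iterable[str], prefix: str) -> List[str]:
    """Hypothetical: map local files to object names under prefix (no trailing '/')."""
    # posixpath.join always uses '/', as GCS object names do, regardless of local OS
    return [posixpath.join(prefix, PurePath(f).name) for f in files]


names = blob_names_for_upload(["/tmp/out/a.json", "/tmp/out/b.json"], "exports/daily")
print(names)  # ['exports/daily/a.json', 'exports/daily/b.json']
```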

download()

Download files under the given prefix to the local folder

The blob's prefix is ignored: gs://bucket/folder1/abc.txt will be downloaded to /var/temp/abc.txt if .local('/var/temp') is set.
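The flattening rule for downloads (drop the blob's folder path, keep only its final name) fits in a few lines. This is a hypothetical sketch of the rule using an invented `local_target` helper, not gfluent's code.

```python
import posixpath
from pathlib import Path


def local_target(blob_name: str, local_dir: str) -> Path:
    """Hypothetical: drop the blob's folders and keep only its final name."""
    # GCS object names use '/' separators, so posixpath handles them on any OS
    return Path(local_dir) / posixpath.basename(blob_name)


# gs://bucket/folder1/abc.txt has the object name 'folder1/abc.txt'
print(local_target("folder1/abc.txt", "/var/temp"))  # /var/temp/abc.txt on POSIX systems
```

One consequence of this rule is that two blobs with the same file name in different folders would map to the same local path.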
