gfluent’s documentation 0.1.13¶
This is a wrapper on Google Cloud Platform Python SDK client library. It provides a fluent-style to call the methods, here is an example,
from gfluent import BQ
project_id = "here-is-you-project-id"
bq = BQ(project_id, table="mydataset.table")
count = (
bq.mode("WRITE_APPEND")
.sql("SELECT name, age from dataset.tabble")
.query()
)
print(f"{count} rows loaded")
API Reference¶
-
class
gfluent.BQ(project: str, **kwargs)[source]¶ The fluent-style BigQuery client for chaining calls
Example:
# run the query and save to the table dataset.name bq = BQ(project='you-project-id', table='dataset.name') num_rows = bq.mode('CREATE_TRUNCATE').sql('select * from table').query() bq = BQ(project='you-project-id') rows = bq.sql('select id, name from abc.tab').query() for row in rows: print(row.id, row.name)
Allowed additional arguments,
table: The BigQuery full tablename with dataset, gcs: The GCS location with gs:// prefix, sql: SQL Statement should start with SELECT, schema: The BigQuery standard Schema structure, mode: override or append mode, create_mode: create or never create
- Parameters
project_id (str) – The GCP Project id
kwargs (dict) – Additional arguments
-
table(table: str)[source]¶ Specify the table name with dataset.name format
- Parameters
table (str) – BigQuery full table name without project id
-
format(format_: str)[source]¶ Specify the format of import/export files, default NEWLINE_DELIMITED_JSON
AVROSpecifies Avro format.CSVSpecifies CSV format.DATASTORE_BACKUPSpecifies datastore backup formatNEWLINE_DELIMITED_JSONSpecifies newline delimited JSON format.ORCSpecifies Orc format.PARQUETSpecifies Parquet format.
- Parameters
format (str) – [description]
-
gcs(gcs: str)[source]¶ Specify the GCS location, single file or wildcard
- Parameters
gcs (str) – must start with
gs://
-
sql(sql: str)[source]¶ Specify the SQL statement
Only one statement is allowed, and only support
SELECTandWITHas of now- Parameters
sql (str) – must start with
select
-
schema(schema: List[google.cloud.bigquery.schema.SchemaField])[source]¶ Specify the table schema
- Parameters
schema (List[bigquery.SchemaField]) – A list of
SchemaFielddefinition
-
mode(mode: str)[source]¶ Set the bigquery
write_dispositionparameter, default WRITE_APPENDWRITE_EMPTY This job should only be writing to empty tables.
WRITE_TRUNCATE This job will truncate table data and write from the beginning.
WRITE_APPEND This job will append to a table.
- Parameters
mode (str) – must be one of above value
- Raises
ValueError – when the value is not allowed
-
create_mode(create_mode: str)[source]¶ Set the bigquery
create_dispositionparameter, default CREATE_IF_NEEDEDCREATE_NEVER This job should never create tables.
CREATE_IF_NEEDED This job should create a table if it doesn’t already exist.
- Parameters
create_mode (str) – must be one of above value
-
query()[source]¶ Run the given sql query, return rows or save to table
If the
tableattribute is set, it will save the query result to that table, otherwise it returns the BigQuery rows
-
load(location: str = 'US') → int[source]¶ Run the
LoadJob, and return number of rows loaded.table(),.gcs()must be called to run this method..schema()is optional, if not specified, usingautodetect.mode(),.create_mode()and.format()are optional, as they have default values.- Parameters
location (str) – must be same as your dataset, default
US
-
truncate()[source]¶ Delete all rows in the given table
.table()must be called before calling this method to speicfy which table to be truncated
-
create(ok_exists=False)[source]¶ Create the table with given schema
.schema()and.table()must be called before invoke this method The schema should be a list ofbigquery.SchemaField()
-
delete()[source]¶ Drop the given table
.table()must be called before calling this method to speicfy which table to be dropped. No error will be raised if the table is not found.
-
create_dataset(dataset: str, location='US', timeout=30)[source]¶ Create the given dataset
- Parameters
dataset (str) – The dataset id without project id
location (str, optional) – A BigQuery location, defaults to “US”
timeout (int, optional) – The timeout in second, defaults to 30
-
class
gfluent.GCS(project: str, **kwargs)[source]¶ -
local(path: str, suffix: Optional[str] = None)[source]¶ Specify the local path, could be a directory or a file
- Parameters
path (str, Optional) – directory or file
path – the suffix of included files
- Raises
ValueError – if path not found as a file or directory
-
-
class
gfluent.Sheet(credential_or_path: Union[google.oauth2.service_account.Credentials, str], **kwargs)[source]¶ The fluent-style Google Sheet for chaining class
This
Sheetclass provides the interface to load Spreadsheet data to Bigquery table even in one line. The destiniation table must be a new table, and not exist in the same dataset.Examples:
# use the headers from spread sheet and auto detect the type ( Sheet('google-sa-credential-or-path') .sheet_id('your-sheet-id') .worksheet('sheet_name!A1:B100') # provide the range in one-go .bq(BQ(projec_id='project-id', table='dataset.table') ).load() # use given schema definition schemas = [ bigquery.SchemeField(...), bigquery.SchemeField(...), bigquery.SchemeField(...), ] ( Sheet('google-sa-credential-or-path') .url('google-sheet-url') # the sheet id will be extracted automatically .worksheet('sheet_name') .range('A1:B100') # provide the range in separate call .bq(BQ(projec_id='project-id', table='dataset.table') .schema(schemas) ).load()
- Parameters
credential_or_path (Union[Credentials, str]) – the ``service_account.Credentials` object or file path
-
sheet_id(sheet_id: str)[source]¶ Specify the UID of Google Sheet
- Parameters
sheet_id (str) – The UID of Google Spreadsheet
-
url(url: str)[source]¶ Pass the Google sheet URL
- Parameters
url (str) – The full URL of Google Sheet
-
schema(schema: List[google.cloud.bigquery.schema.SchemaField])[source]¶ Set the schema for desitnation table
- Parameters
schema (List[bigquery.SchemaField]) – The list of fields
-
worksheet(worksheet: str)[source]¶ Specify the worksheet name with or without range
The first row in the range is considered as
header row, and it is not able to be skipped.Valid values:
sheet_name!A1:B3or justsheet_name- Parameters
worksheet (str) – The worksheet with range or only worksheet name
-
range(range: str)[source]¶ Specify the worksheet data range in A1:B4 form
The client library doesn’t check if the range is valid, if any syntax error with the range, Google Sheet will raise the exception
The first row in the range is considered as
header row, and it is not able to be skipped.
-
bq(bq: gfluent.bq.BQ)[source]¶ use project id and other params to initial bq object
- Parameters
bq – The
BQinstance- Type
-
load(location: str = 'US')[source]¶ Load the Data to BigQuery table
Please use
.bq.table()to set the destination table name, and the table must not exists, otherwise theConflictexception will be raised.- Parameters
location – The BigQuery location, default is
US- Type
str
- Raises
google.api_core.exceptions.Conflict – table already exists exception.