gfluent’s documentation 0.1.8
gfluent is a wrapper around the Google Cloud Platform Python SDK client libraries. It provides a fluent-style interface for calling their methods. Here is an example:
    from gfluent import BQ

    project_id = "here-is-your-project-id"
    bq = BQ(project_id, table="mydataset.table")

    # Run the query and append the result to mydataset.table.
    count = (
        bq.mode("WRITE_APPEND")
        .sql("SELECT name, age FROM dataset.table")
        .query()
    )
    print(f"{count} rows loaded")
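Chaining of this kind works when each setter returns the object it was called on. The following is a minimal sketch of that pattern in plain Python, not gfluent's actual source; the class `FluentJob` and its `describe` method are hypothetical names used only for illustration.

```python
class FluentJob:
    """Hypothetical stand-in that illustrates fluent-style chaining."""

    def __init__(self, project: str):
        self.project = project
        self.settings = {}

    def mode(self, mode: str) -> "FluentJob":
        self.settings["mode"] = mode
        return self  # returning self is what makes chaining possible

    def sql(self, sql: str) -> "FluentJob":
        self.settings["sql"] = sql
        return self

    def describe(self) -> str:
        # Terminal call: ends the chain and produces a result.
        return f"{self.project}: {self.settings}"


job = FluentJob("my-project").mode("WRITE_APPEND").sql("SELECT 1")
print(job.describe())
```

Setters only record state; the work happens in the terminal call, which in gfluent's case is a method such as `query()`.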
API Reference
-
class gfluent.bq.BQ(project: str, **kwargs)
The fluent-style BigQuery client for chaining calls.
Example:
    # Run the query and save the result to the table dataset.name.
    bq = BQ(project='your-project-id', table='dataset.name')
    num_rows = bq.mode('WRITE_TRUNCATE').sql('select * from table').query()

    # Without a table, query() returns the rows.
    bq = BQ(project='your-project-id')
    rows = bq.sql('select id, name from abc.tab').query()
    for row in rows:
        print(row.id, row.name)
Allowed additional arguments:
table: the BigQuery full table name, including the dataset
gcs: the GCS location, with the gs:// prefix
sql: the SQL statement, which should start with SELECT
schema: the BigQuery standard schema structure
mode: the write mode (overwrite or append)
create_mode: the create mode (create or never create)
- Parameters
project (str) – The GCP project ID
kwargs (dict) – Additional arguments
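The additional arguments can be collected in a dictionary before they are handed to the constructor. The sketch below checks such a dictionary against the documented argument names. `check_bq_kwargs` is a hypothetical helper, and whether gfluent itself rejects unknown keys is not stated in this reference.

```python
# The argument names documented for the BQ constructor.
ALLOWED_BQ_KWARGS = {"table", "gcs", "sql", "schema", "mode", "create_mode"}


def check_bq_kwargs(kwargs: dict) -> dict:
    """Return kwargs unchanged, or raise if a key is not a documented argument."""
    unknown = sorted(set(kwargs) - ALLOWED_BQ_KWARGS)
    if unknown:
        raise ValueError(f"unknown BQ arguments: {unknown}")
    return kwargs


options = check_bq_kwargs({"table": "dataset.name", "mode": "WRITE_TRUNCATE"})
# With gfluent installed, these could then be passed as keyword arguments,
# for example BQ(project="your-project-id", **options).
```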
-
table(table: str)
Specify the table name in dataset.name format.
- Parameters
table (str) – BigQuery full table name without project id
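As an illustration of the expected dataset.name shape, here is a small sketch that splits such a name and rejects anything else, including names that carry a project id. `split_table_name` is a hypothetical helper and is not part of gfluent.

```python
def split_table_name(table: str) -> tuple:
    """Split 'dataset.name' into (dataset, name); reject other shapes."""
    parts = table.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'dataset.name', got {table!r}")
    return parts[0], parts[1]


print(split_table_name("mydataset.table"))
```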
-
format(format_: str)
Specify the format of import/export files, default NEWLINE_DELIMITED_JSON.
AVRO Specifies Avro format.
CSV Specifies CSV format.
DATASTORE_BACKUP Specifies datastore backup format.
NEWLINE_DELIMITED_JSON Specifies newline delimited JSON format.
ORC Specifies Orc format.
PARQUET Specifies Parquet format.
- Parameters
format_ (str) – must be one of the values above
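When the file names are known, the format name can be derived from the file suffix before it is passed to `.format()`. The following is a rough sketch under that assumption; `guess_format` and the suffix mapping are hypothetical and not part of gfluent, and only the format names themselves come from the list above.

```python
from pathlib import PurePosixPath

# Assumed mapping from file suffix to the documented format names.
SUFFIX_TO_FORMAT = {
    ".avro": "AVRO",
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",
    ".jsonl": "NEWLINE_DELIMITED_JSON",
    ".orc": "ORC",
    ".parquet": "PARQUET",
}


def guess_format(filename: str) -> str:
    """Pick a format name from the file suffix, falling back to the documented default."""
    suffix = PurePosixPath(filename).suffix.lower()
    return SUFFIX_TO_FORMAT.get(suffix, "NEWLINE_DELIMITED_JSON")


print(guess_format("gs://bucket/export/part-*.csv"))
```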
-
gcs(gcs: str)
Specify the GCS location, either a single file or a wildcard.
- Parameters
gcs (str) – must start with gs://
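The prefix requirement can be checked up front. Below is a minimal sketch that validates a location and splits it into bucket and object path; `parse_gcs_uri` is a hypothetical helper, and the exception gfluent raises for a bad prefix is not stated in this reference.

```python
def parse_gcs_uri(uri: str) -> tuple:
    """Split 'gs://bucket/path' into (bucket, path); raise if the prefix is missing."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"GCS location must start with {prefix}: {uri!r}")
    bucket, _, blob = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket, blob


print(parse_gcs_uri("gs://my-bucket/exports/part-*.json"))
```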
-
sql(sql: str)
Specify the SQL statement.
Only one statement is allowed, and only SELECT is supported as of now.
- Parameters
sql (str) – must start with select
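The two constraints, a single statement that starts with SELECT, can be approximated with a simple string check. This is a deliberately naive sketch, not gfluent's own validation; `is_single_select` is a hypothetical name, and the check does not parse SQL.

```python
def is_single_select(sql: str) -> bool:
    """Naive check: one statement only, and it starts with SELECT."""
    # Ignore surrounding whitespace and a single trailing semicolon.
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # A remaining semicolon suggests more than one statement.
        # This also rejects semicolons inside string literals.
        return False
    return statement.lower().startswith("select")


print(is_single_select("select id, name from abc.tab"))
```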
-
schema(schema: List[google.cloud.bigquery.schema.SchemaField])
Specify the table schema.
- Parameters
schema (List[bigquery.SchemaField]) – A list of SchemaField definitions
-
mode(mode: str)
Set the BigQuery write_disposition parameter, default WRITE_APPEND.
WRITE_EMPTY This job should only be writing to empty tables.
WRITE_TRUNCATE This job will truncate table data and write from the beginning.
WRITE_APPEND This job will append to a table.
- Parameters
mode (str) – must be one of the values above
- Raises
ValueError – when the value is not allowed
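The validation described here can be pictured as a membership test against the three documented values. The sketch below is an assumption about the general shape of such a check, not gfluent's source; `validate_mode` is a hypothetical name.

```python
WRITE_MODES = ("WRITE_EMPTY", "WRITE_TRUNCATE", "WRITE_APPEND")


def validate_mode(mode: str) -> str:
    """Return mode if it is a documented write disposition, else raise ValueError."""
    if mode not in WRITE_MODES:
        raise ValueError(f"mode must be one of {WRITE_MODES}, got {mode!r}")
    return mode


try:
    validate_mode("OVERWRITE")  # not one of the documented values
except ValueError as exc:
    print(exc)
```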
-
create_mode(create_mode: str)
Set the BigQuery create_disposition parameter, default CREATE_IF_NEEDED.
CREATE_NEVER This job should never create tables.
CREATE_IF_NEEDED This job should create a table if it doesn’t already exist.
- Parameters
create_mode (str) – must be one of the values above
-
query()
Run the given SQL query, then either return the rows or save them to a table.
If the table attribute is set, the query result is saved to that table; otherwise the BigQuery rows are returned.
-
load(location: str = 'US') → int
Run the LoadJob and return the number of rows loaded.
.table() and .gcs() must be called before this method. .schema() is optional; if it is not specified, autodetect is used. .mode(), .create_mode() and .format() are optional, as they have default values.
- Parameters
location (str) – must be the same as your dataset's location, default US
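The required and optional settings for a load can be summarized in a small sketch that fills in the documented defaults and checks that a table and a GCS location are present. `resolve_load_settings` is a hypothetical helper; how gfluent stores these settings, and which exception it raises when one is missing, are assumptions.

```python
# Defaults taken from the method descriptions in this reference.
LOAD_DEFAULTS = {
    "format": "NEWLINE_DELIMITED_JSON",
    "mode": "WRITE_APPEND",
    "create_mode": "CREATE_IF_NEEDED",
    "location": "US",
}


def resolve_load_settings(**settings) -> dict:
    """Fill in documented defaults and check the two required settings."""
    missing = [name for name in ("table", "gcs") if not settings.get(name)]
    if missing:
        raise ValueError(f"load() needs these to be set first: {missing}")
    resolved = {**LOAD_DEFAULTS, **settings}
    # Without an explicit schema, the load relies on autodetection.
    resolved["autodetect"] = resolved.get("schema") is None
    return resolved


resolved = resolve_load_settings(table="dataset.name", gcs="gs://bucket/data/*.json")
print(resolved["format"], resolved["autodetect"])
```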
-
truncate()
Delete all rows in the given table.
.table() must be called before this method to specify which table to truncate.
-
delete()
Drop the given table.
.table() must be called before this method to specify which table to drop. No error is raised if the table is not found.
-
create_dataset(dataset: str, location='US', timeout=30)
Create the given dataset.
- Parameters
dataset (str) – The dataset id without project id
location (str, optional) – A BigQuery location, defaults to “US”
timeout (int, optional) – The timeout in seconds, defaults to 30
-
class gfluent.gcs.GCS(project: str, **kwargs)
-
local(path: str, suffix: Optional[str] = None)
Specify the local path, which can be a directory or a file.
- Parameters
path (str) – directory or file
suffix (str, optional) – the suffix of included files
- Raises
ValueError – if path not found as a file or directory
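To make the file-or-directory behavior concrete, here is a sketch of how a local path and an optional suffix could be resolved to a list of files. It is not gfluent's implementation: `collect_local_files` is a hypothetical name, and listing only the top level of a directory (no recursion) is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def collect_local_files(path: str, suffix: Optional[str] = None) -> List[Path]:
    """List the files a local path refers to, optionally filtered by suffix."""
    target = Path(path)
    if target.is_file():
        return [target]
    if target.is_dir():
        # Top-level files only, in a stable order.
        files = sorted(p for p in target.iterdir() if p.is_file())
        if suffix is not None:
            files = [p for p in files if p.name.endswith(suffix)]
        return files
    raise ValueError(f"{path} is not a file or directory")


# Demonstrate with a temporary directory holding two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.json").write_text("{}")
    (Path(tmp) / "b.csv").write_text("x")
    json_names = [p.name for p in collect_local_files(tmp, suffix=".json")]
    all_names = [p.name for p in collect_local_files(tmp)]
print(json_names, all_names)
```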
-