# Google Cloud Platform

# BigQuery Analytics Hub

Analytics Hub is a mechanism for sharing datasets between BigQuery users. Google's official product documentation for Analytics Hub is [here](https://cloud.google.com/bigquery/docs/analytics-hub-introduction#subscriber_workflow)

### Subscriber Workflow

- A dataset publisher creates and shares a Linked Dataset with the subscriber - this essentially becomes a symbolic link to the original dataset that the subscriber can access.
- In this model the person querying the data pays for it.

# BigQuery

Google BigQuery is a data warehouse platform.

### Schema

You can define table schemas via JSON documents which get ingested at the same time as your data using the `bq` tool.

#### Data Types

Data Types List from [google documentation](https://cloud.google.com/bigquery/docs/schemas#standard_sql_data_types)

<table id="bkmrk-name-data-type-descr"><thead><tr><th>Name</th><th>Data type</th><th>Description</th></tr></thead><tbody><tr><td>[Integer](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_type)</td><td>`INT64`</td><td>Numeric values without fractional components</td></tr><tr><td>[Floating point](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types)</td><td>`FLOAT64`</td><td>Approximate numeric values with fractional components</td></tr><tr><td>[Numeric](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#numeric_type)</td><td>`NUMERIC`</td><td>Exact numeric values with fractional components</td></tr><tr><td>[BigNumeric](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#bignumeric_type)</td><td>`BIGNUMERIC`</td><td>Exact numeric values with fractional components</td></tr><tr><td>[Boolean](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type)</td><td>`BOOL`</td><td>TRUE or FALSE (case-insensitive)</td></tr><tr><td>[String](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type)</td><td>`STRING`</td><td>Variable-length character (Unicode) data</td></tr><tr><td>[Bytes](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#bytes_type)</td><td>`BYTES`</td><td>Variable-length binary data</td></tr><tr><td>[Date](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#date_type)</td><td>`DATE`</td><td>A logical calendar date</td></tr><tr><td>[Date/Time](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#datetime_type)</td><td>`DATETIME`</td><td>A year, month, day, hour, minute, second, and subsecond</td></tr><tr><td>[Time](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_type)</td><td>`TIME`</td><td>A time, independent of a specific date</td></tr><tr><td>[Timestamp](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type)</td><td>`TIMESTAMP`</td><td>An absolute point in time, with microsecond precision</td></tr><tr><td>[Struct (Record)](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type)</td><td>`STRUCT`</td><td>Container of ordered fields each with a type (required) and field name (optional)</td></tr><tr><td>[Geography](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#geography_type)</td><td>`GEOGRAPHY`</td><td>A pointset on the Earth's surface (a set of points, lines and polygons on the [WGS84](http://earth-info.nga.mil/GandG/update/index.php) reference spheroid, with geodesic edges)</td></tr><tr><td>[JSON](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#json_type)</td><td>`JSON`</td><td>Represents JSON, a lightweight data-interchange format</td></tr></tbody></table>

####  

#### Differences between JSON and Record/Struct

JSON type allows you to ingest JSON without pre-defining the schema whereas a record/struct must be pre-defined and all the fields must be known in advance.

JSON fields are more fiddly to query and work with in general. It seems like you can't do things like [UNNEST](https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays) them.

#### Nested/Repeated Columns

To allow a column (or object) to repeat (e.g. to have an array of values) you must use `mode: NESTED` in your schema.

[See Nested and repeated columns documentation](https://cloud.google.com/bigquery/docs/nested-repeated)

# BigQuery Working with Array Structures

On a companies table where you can have multiple websites per company:

```sql
SELECT * 
	FROM 
    	project_namespace.dataset.companies as companies, 
    	UNNEST (companies.websites) as website 
    LIMIT 1000
```

The above would produce a flattened table structure where company id is repeated per website.