# GCS Object Storage

{% tabs %}
{% tab title="Installing" %}
We recommend following the installation instructions in the UI. You can find them on app.lariatdata.com by clicking the Integrations tab and selecting "Add new integration", as outlined on the [Installation & Configuration](/fundamentals/installation-and-configuration.md) page.

To describe what the install entails, this section outlines how you could run it outside of the UI. If you are interested in the code that powers the install, take a look at the [install-gcp-gcs-agent repository](https://github.com/lariat-data/install-gcp-gcs-agent).

#### What do you need for the install?

* Docker
* Your Organization ID
* Your Project ID
* A `gcs_agent.yaml` configuration file in your working directory (the installation command mounts it; see the Configuring tab for its structure)
* Your application-default access token in a file called `gcloud_access_token`
  * `gcloud auth application-default print-access-token > gcloud_access_token`

Note: If you are running the install outside of the UI, you will also need your Lariat API key, your Lariat application key, and the Lariat-generated cloud access keys (e.g. an AWS access key and secret key).

#### Installation command

If using the UI, copy the installation command and fill in the unpopulated fields.

Here is what the command looks like:

{% code overflow="wrap" %}

```bash
docker run -it --pull=always \
  --mount type=bind,source=$PWD/gcs_agent.yaml,target=/workspace/gcs_agent.yaml,readonly \
  --mount type=bind,source=$PWD/gcloud_access_token,target=/workspace/gcloud_access_token,readonly \
  -e LARIAT_API_KEY={YOUR_API_KEY} \
  -e LARIAT_APPLICATION_KEY={YOUR_APPLICATION_KEY} \
  -e LARIAT_TMP_AWS_ACCESS_KEY_ID={PREFILLED_BY_UI} \
  -e LARIAT_TMP_AWS_SECRET_ACCESS_KEY={PREFILLED_BY_UI} \
  -e GCP_ORGANIZATION_ID={YOUR_GCP_ORGANIZATION_ID} \
  -e GCP_REGION={GCP_REGION} \
  -e GCP_PROJECT_ID={YOUR_PROJECT_ID} \
  lariatdata/install-gcp-gcs-agent:latest install
```

{% endcode %}

If you do not have access to the UI and need `LARIAT_TMP_AWS_ACCESS_KEY_ID` and `LARIAT_TMP_AWS_SECRET_ACCESS_KEY`, reach out to <support@lariatdata.com>.
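
If you script the install rather than pasting the command, the argument list can be assembled programmatically. The following is a minimal sketch, not part of the Lariat tooling: the helper name `build_install_command` and the `ENV_KEYS` tuple are hypothetical, and the values in the demo are placeholders. Only the mounts, environment variable names, and image name are taken from the command above.

```python
import shlex
from pathlib import Path

# Environment variables passed with -e in the documented command.
ENV_KEYS = (
    "LARIAT_API_KEY",
    "LARIAT_APPLICATION_KEY",
    "LARIAT_TMP_AWS_ACCESS_KEY_ID",
    "LARIAT_TMP_AWS_SECRET_ACCESS_KEY",
    "GCP_ORGANIZATION_ID",
    "GCP_REGION",
    "GCP_PROJECT_ID",
)


def build_install_command(workdir, env):
    """Return the docker argv for the install (hypothetical helper)."""
    missing = [key for key in ENV_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")

    workdir = Path(workdir)
    argv = ["docker", "run", "-it", "--pull=always"]
    # Both files are bind-mounted read-only into /workspace.
    for name in ("gcs_agent.yaml", "gcloud_access_token"):
        argv += [
            "--mount",
            f"type=bind,source={workdir / name},target=/workspace/{name},readonly",
        ]
    # Pass every setting as an environment variable, in a stable order.
    for key in ENV_KEYS:
        argv += ["-e", f"{key}={env[key]}"]
    argv += ["lariatdata/install-gcp-gcs-agent:latest", "install"]
    return argv


# Demo with placeholder values; print the command instead of running it.
demo_env = {key: "PLACEHOLDER" for key in ENV_KEYS}
print(shlex.join(build_install_command(Path.cwd(), demo_env)))
```

Returning a list rather than a single string lets you hand the result to `subprocess.run` without shell quoting problems when the working directory contains spaces.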
{% endtab %}

{% tab title="Configuring" %}
The config below matches a GCS path like this:

`gcs://gcs-bucket-prefix/partition_seperated_val_1=1010/partition_seperated_val_2=some_source/fixed_value_1/2024-10/day=05`

The config is stored in your object storage at a path with the following format: `lariat-gcs-config` followed by a timestamp prefix.

{% code overflow="wrap" %}

```yaml
your_bucket_name:
  - prefix: "gcs-bucket-prefix"
    key_val_partition_separator: "="
    suffix_template: "{partition_seperated_var_1}/{partition_seperated_var_2}/fixed_value_1/<unpartitioned_var_1>/{partition_seperated_var_3}"
    file_type: avro
    name: my_dataset_name
    columns:
      string:
        - string_column_for_stats_1
        - string_column_for_stats_2
      number:
        - num_column_for_stats_1
        - num_column_for_stats_2
    dimensions:
      - dimension_field_1
      - dimension_field_2
    timestamp:
      timestamp_col_name_1:
        column: "{unpartitioned_var_1}-{partition_seperated_var_2}"
        format: "%Y-%m-%d"
        timezone: "UTC"
      timestamp_col_name_2:
        column: timestamp_column
        format: "unixtime"
        primary: True
      timestamp_col_name_3:
        column: timestamp_column_other
        format: "ISO8601"
        primary: True
source_id: unique-source-id-name
```

{% endcode %}

The above is the structure of the configuration YAML.
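
As a rough illustration of the three `format` values that appear in the structure above (`%Y-%m-%d`, `unixtime`, and `ISO8601`), the sketch below parses one sample value for each using Python's standard library. It is not Lariat's parser: the function names `combine` and `parse_timestamp` are hypothetical, the sample values are made up, and treating naive results as UTC is an assumption based on the `timezone: "UTC"` entry.

```python
from datetime import datetime, timezone


def combine(template, variables):
    """Fill a `column` template such as "{a}-{b}" from captured path variables."""
    return template.format(**variables)


def parse_timestamp(value, fmt):
    """Parse one raw value according to a `format` entry (illustrative only)."""
    if fmt == "unixtime":
        # Integer seconds since the Unix epoch.
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    if fmt == "ISO8601":
        # Note: before Python 3.11, fromisoformat accepts only a subset of ISO 8601.
        parsed = datetime.fromisoformat(value)
    else:
        # Any other value is treated as a strptime format-code string.
        parsed = datetime.strptime(value, fmt)
    # Assumption: values without an offset are interpreted as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


# A timestamp built by combining two captured variables.
raw = combine(
    "{partition_seperated_var_1}-{partition_seperated_var_2}",
    {"partition_seperated_var_1": "2023-02", "partition_seperated_var_2": "10"},
)
print(parse_timestamp(raw, "%Y-%m-%d"))
print(parse_timestamp("1728086400", "unixtime"))
print(parse_timestamp("2024-10-05T12:30:00+00:00", "ISO8601"))
```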

Below are descriptions of each field.

* `your_bucket_name`: The name of the bucket tracked by the monitoring agent.
* `prefix`: Only objects under this prefix are monitored, i.e. objects whose path follows the pattern `gcs://your-bucket-name/gcs-bucket-prefix/{partitions}`.
* `key_val_partition_separator`: The separator used in object keys to denote key-value pairs. Any character is allowed here. For example, if you have a GCS path like `gcs://my-bucket/partner_id=1234`, the separator for `{partner_id}` would be "=".  \
  **N.B.: This isn't the "/" file directory separator.**
* `suffix_template`: The variable patterns that you want to track objects for, including any partition variables that need to be captured.
  * `{partition_seperated_var_1}`: When wrapped in braces, the pattern being matched is a key-value partition pair separated by `key_val_partition_separator`. The variable is named `partition_seperated_var_1` and can be referred to by that name when naming columns and defining timestamps.
  * `fixed_value_1`: When there are no braces or angle brackets, this part of the suffix represents a static part of the object key.
  * `<unpartitioned_var_1>`: When wrapped in angle brackets, this represents a variable that doesn't have a partition key but that you still want to name. It can be referred to by that name when naming columns and defining timestamps.
* `file_type`: The following file types are currently supported:
  * `jsonl` - Line-separated JSON (every line is a JSON record)
  * `json` - JSON file
  * `csv` - CSV file
  * `parquet` - Parquet file
  * `avro` - Avro file
* `name`: The name used to refer to this dataset.
* `columns`: The columns in the dataset for which statistics are tracked.
  * `string` - The columns listed here are the columns of type string in the object.
  * `number` - The columns listed here are the columns of type number in the object.
* `dimensions`: The granularity at which the above statistics are computed. These are the columns you want to group and filter by.
* `timestamp`: Time is a first-class citizen in the Lariat platform. You can construct a time field so that you can view statistics as time series.
  * `timestamp_col_name_1`: The name of the timestamp column being defined. You can construct it either by combining existing fields or by using a field directly.
    * `column`: The definition of the column. There are two ways to define it:
      * Column name: put in a column name directly. You can specify a timezone and a format.
      * Combined columns: combine columns like so: `"{partition_seperated_var_1}-{partition_seperated_var_2}"`, and specify a format. For example, if the value of `partition_seperated_var_1` is 2023-02 and the value of `partition_seperated_var_2` is 10, you can specify a format of `%Y-%m-%d` to construct a time field.
    * `format`: The time format, written with the format codes from the 1989 C standard. Details can be found in the [Python format codes reference](https://docs.python.org/3/library/datetime.html#format-codes). A special "unixtime" format that parses an integer Unix time is also supported.
    * `primary`: If specified and set to True, this is the primary timestamp used for dashboarding and downstream analytics. If not set, the first defined timestamp is treated as primary.
* `source_id`: The unique source ID representing the agent. It is used to make sure data is not conflated across regions or permission boundaries. ***N.B.: Make sure no other GCS object storage configuration shares the same `source_id`.***
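
To make the brace and angle-bracket rules for `suffix_template` concrete, here is a minimal sketch of how such a template could be matched against an object key. It is one interpretation of the rules above, not the agent's actual implementation. The function names are hypothetical, the sample key and the variable names `month` and `day` are invented, and it assumes the key in each key-value pair equals the variable name, as in the `partner_id=1234` example.

```python
import re


def template_to_regex(suffix_template, separator="="):
    """Translate a suffix_template into a regex with one named group per variable."""
    parts = []
    for segment in suffix_template.split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            # {name}: a "name<separator>value" pair; capture the value.
            name = segment[1:-1]
            parts.append(f"{re.escape(name)}{re.escape(separator)}(?P<{name}>[^/]+)")
        elif segment.startswith("<") and segment.endswith(">"):
            # <name>: a bare path segment with no key; capture it whole.
            name = segment[1:-1]
            parts.append(f"(?P<{name}>[^/]+)")
        else:
            # Anything else is a static part of the object key.
            parts.append(re.escape(segment))
    # Allow trailing segments (for example the file name) after the template.
    return re.compile("/".join(parts) + r"(?:/.*)?")


def extract_variables(object_key, prefix, suffix_template, separator="="):
    """Return the captured variables, or None if the key does not match."""
    if not object_key.startswith(prefix):
        return None
    remainder = object_key[len(prefix):].lstrip("/")
    match = template_to_regex(suffix_template, separator).fullmatch(remainder)
    return match.groupdict() if match else None


# Hypothetical object key under the example prefix.
variables = extract_variables(
    "gcs-bucket-prefix/partner_id=1234/fixed_value_1/2024-10/day=05/part-0000.avro",
    prefix="gcs-bucket-prefix",
    suffix_template="{partner_id}/fixed_value_1/<month>/{day}",
)
print(variables)
```

The captured names are what a `timestamp` column definition such as `"{month}-{day}"` would refer to in this sketch.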
{% endtab %}
{% endtabs %}

