S3 Object Storage
Instructions for installing and configuring the S3 Object Storage agent
We recommend that you follow the installation instructions in the UI. You can find them on app.lariatdata.com by clicking the Integrations tab and going to "Add new integration", as outlined on the Installation & Configuration page.
To describe what the install entails, we outline how you could do this outside of the UI. If you are interested in the code that powers the install, please take a look at https://github.com/lariat-data/install-aws-s3-agent.
What do you need for the install?
docker
Your Cloud Account ID and region
Your own Cloud account keys
Note: If you are running the install outside of the UI, you will need your Lariat API Key, your Lariat Application Key, and the Lariat-generated Cloud access keys (e.g. an AWS access key and secret key).
Installation command
If using the UI, copy the installation command and fill in the unpopulated fields.
Here is what the command looks like:
docker run -it \
--mount type=bind,source=/local/path/to/config/s3_agent.yaml,target=/workspace/s3_agent.yaml,readonly \
-e AWS_REGION={YOUR_AWS_REGION} \
-e AWS_ACCOUNT_ID={YOUR_AWS_ACCOUNT_ID} \
-e AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id) \
-e AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key) \
-e AWS_SESSION_TOKEN=$(aws configure get aws_session_token) \
-e LARIAT_TMP_AWS_ACCESS_KEY_ID={PREFILLED_BY_UI} \
-e LARIAT_TMP_AWS_SECRET_ACCESS_KEY={PREFILLED_BY_UI} \
-e LARIAT_API_KEY={PREFILLED_BY_UI} \
-e LARIAT_APPLICATION_KEY={PREFILLED_BY_UI} \
-e LARIAT_EVENT_NAME=sns_s3_trigger \
-e LARIAT_PAYLOAD_SOURCE=s3 \
lariatdata/install-aws-s3-agent:latest install
If you do not have access to the UI and need LARIAT_TMP_AWS_ACCESS_KEY_ID and LARIAT_TMP_AWS_SECRET_ACCESS_KEY, you will have to reach out to [email protected].
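Before running the command, it can help to check that the prerequisites listed earlier are in place. The following is a minimal sketch, not part of the Lariat tooling: the `preflight` helper and the `REQUIRED_VALUES` list are hypothetical, the variable names are copied from the command above, and treating every one of them as mandatory is an assumption (`AWS_SESSION_TOKEN` is left out because it is often empty for long-lived credentials).

```python
import os
import shutil

# Names copied from the install command; which ones are strictly
# required by the installer is an assumption made for this sketch.
REQUIRED_VALUES = [
    "AWS_REGION",
    "AWS_ACCOUNT_ID",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "LARIAT_TMP_AWS_ACCESS_KEY_ID",
    "LARIAT_TMP_AWS_SECRET_ACCESS_KEY",
    "LARIAT_API_KEY",
    "LARIAT_APPLICATION_KEY",
]


def preflight(values, config_path):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker was not found on PATH")
    if not os.path.isfile(config_path):
        problems.append(f"config file not found: {config_path}")
    for name in REQUIRED_VALUES:
        value = values.get(name, "")
        # A value that still starts with "{" is an unfilled placeholder
        # such as "{YOUR_AWS_REGION}".
        if not value or value.startswith("{"):
            problems.append(f"{name} is missing or still a placeholder")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: read the values from the current environment.
    for problem in preflight(dict(os.environ), "/local/path/to/config/s3_agent.yaml"):
        print(problem)
```

The check only inspects local state; it does not validate the keys against AWS or Lariat.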
The config below matches an s3 path like so:
s3://s3-bucket-prefix/partition_seperated_val_1=1010/partition_seperated_val_2=some_source/fixed_value_1/2024-10/day=05
The config is stored in your object storage at a path with the following format: lariat-s3-default-config followed by a timestamp prefix.
The above is the structure of the configuration YAML.
Below are descriptions for each field, and a demonstrative fully filled in YAML.
your_bucket_name: The name of the bucket tracked by the monitoring agent.
s3-bucket-prefix: Only monitor objects under this prefix (i.e. objects that follow the pattern s3://your-bucket-name/s3-bucket-prefix).
key_val_partition_separator: The separator used in object keys to denote key-value pairs. Any character is allowed here. (E.g. if you have an s3 path like s3://my-bucket/partner_id=1234, the partition_separator for {partner_id} would be "=".) N.B.: This isn't the "/" file directory partition separator.
suffix_template: The variable patterns that you want to track objects for, including any partition variables that need to be captured.
  {partition_seperated_var_1}: When wrapped in braces, the pattern being matched is a key-value partition pair that is separated by partition_separator. The variable will be named partition_seperated_var_1 and can be referred to as such when naming columns and defining timestamps.
  fixed_value_1: When there are no braces or angle brackets, this part of the suffix represents a static part of the object key.
  <unpartitioned_var_1>: When wrapped in angle brackets, this represents a variable that doesn't have a partition key but that we still want to assign a name to. It can be referred to by that name when naming columns and defining timestamps.
file_type: We currently support the following file_types:
  jsonl: Line-separated JSON (every line is a JSON record)
  json: JSON file
  csv: CSV file
  parquet: Parquet file
name: Name to refer to this dataset by.
columns: The columns expected in the dataset that we want to track statistics for.
  string: The columns listed here are the columns of type string in the object.
  number: The columns listed here are the columns of type number in the object.
dimensions: The granularity at which we want the above statistics to be computed. They represent the columns we want to group and filter by.
timestamp: Time is a first-class citizen in the Lariat platform. You can construct a time field so that you can see statistics as a time series.
  timestamp_col_name_1: The name of the timestamp column to be defined. You can construct it either by combining existing fields or by using a field directly.
    column: Definition of the column. There are two ways to define a column:
      col_name: Put in a column name directly. You will be able to specify a timezone and format.
      combine columns: Combine columns like so "{partition_seperated_var_1}-{partition_seperated_var_2}" and specify a format. So if the value of partition_seperated_var_1 is 2023-02 and the value of partition_seperated_var_2 is 10, you can specify a format of %Y-%m-%d to construct a time field.
    format: A time format code that extends the 1989 C standard. Details can be found here. We also support a special "unixtime" format that parses an integer Unix time.
    primary: If specified and set to True, this will be the primary timestamp used for dashboarding and downstream analytics. If not set, the first defined timestamp is treated as primary.
source_id: The unique source_id representing the agent. This is used to make sure we don't conflate data across regions or permission boundaries. N.B.: Make sure no other S3 object storage configuration shares the same source_id.
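To make the suffix_template and timestamp descriptions above concrete, here is a minimal Python sketch of how a template could be matched against the example path and how two captured values could be combined into a time field. It illustrates the rules as described and is not the agent's actual implementation. The `template_to_regex` and `parse_timestamp` helpers, the `TEMPLATE` string, and the `day` variable name are hypothetical, and the sketch assumes the key in each key-value pair equals the variable name inside the braces.

```python
import re
from datetime import datetime, timezone


def template_to_regex(template, separator="="):
    """Build a regex with one named group per template variable."""
    parts = []
    for segment in template.strip("/").split("/"):
        braces = re.fullmatch(r"\{(\w+)\}", segment)
        angles = re.fullmatch(r"<(\w+)>", segment)
        if braces:
            # {var}: a key-value pair such as day=05 (assumes key == var name)
            name = braces.group(1)
            parts.append(f"{re.escape(name)}{re.escape(separator)}(?P<{name}>[^/]+)")
        elif angles:
            # <var>: a named value that has no partition key
            parts.append(f"(?P<{angles.group(1)}>[^/]+)")
        else:
            # no braces or angle brackets: a static part of the object key
            parts.append(re.escape(segment))
    return re.compile("/".join(parts))


def parse_timestamp(value, fmt):
    """Parse with a strptime format, or as integer Unix time for "unixtime"."""
    if fmt == "unixtime":
        return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return datetime.strptime(value, fmt)


# Hypothetical template for the example path shown earlier.
TEMPLATE = (
    "{partition_seperated_val_1}/{partition_seperated_val_2}"
    "/fixed_value_1/<unpartitioned_var_1>/{day}"
)
# The part of the example object key that follows the bucket prefix.
SUFFIX = (
    "partition_seperated_val_1=1010/partition_seperated_val_2=some_source"
    "/fixed_value_1/2024-10/day=05"
)

variables = template_to_regex(TEMPLATE).fullmatch(SUFFIX).groupdict()

# Combine two captured values and parse them with a format string.
combined = "{unpartitioned_var_1}-{day}".format(**variables)
timestamp = parse_timestamp(combined, "%Y-%m-%d")
print(variables)
print(timestamp.isoformat())
```

Here the value captured by `<unpartitioned_var_1>` (2024-10) and the `day` partition (05) are joined into 2024-10-05 and parsed with `%Y-%m-%d`, mirroring the "combine columns" option.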
Once setup is complete, and the configuration has been defined, you can start to inspect events on the platform.
At app.lariatdata.com, select the "Object Events" option from the sidebar. The screenshot below shows what the sidebar looks like.
