Core Features
Lariat helps you build, track and alert on health metrics across your data stack
The Lariat platform collects health metrics via lightweight agents. These agents are always open source and run in the customer's cloud.
Agents capture non-PII information such as time, value, and user-specified dimensions (e.g. event_time, distinct count of users, partner_id, country_code, city_name). This information is sent to the Lariat platform for visualization, alerting, and other analyses.
All of our agents are designed to install within 5 minutes. They are open source, and the installer sets up the agent against the data source you specify. For example, running the S3 monitoring agent install sets the agent up to track metrics on a limited subset of specified buckets and prefixes.
One of the first actions of the agent is to collect schema information for the data being tracked. It does this for datasets ranging from S3 objects to Postgres tables.
A user can immediately leverage this schema information to define an indicator (a health metric). In addition, the platform keeps track of schema modifications over time.
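To make the idea of tracking schema modifications concrete, here is a minimal sketch of comparing two schema snapshots. It is not Lariat's implementation: the `diff_schemas` function, the `{column: type}` snapshot shape, and the sample column types are all assumptions made for illustration.

```python
def diff_schemas(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Compare two {column: type} snapshots and report what changed."""
    added = sorted(c for c in new if c not in old)
    removed = sorted(c for c in old if c not in new)
    # Columns present in both snapshots whose declared type differs.
    retyped = sorted(
        (c, old[c], new[c]) for c in old if c in new and old[c] != new[c]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots taken on two consecutive days.
yesterday = {"event_time": "bigint", "partner_id": "text", "country_code": "text"}
today = {"event_time": "bigint", "partner_id": "integer", "city_name": "text"}

changes = diff_schemas(yesterday, today)
print(changes)
```

Keeping each snapshot as a plain mapping makes it cheap to store one per collection run and diff any two of them later.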
Indicators are scalar metrics tracked over time.
Indicators can either be set up during configuration through a quickstart menu or custom-built with SQL. In either case, the Indicator is backed by a plain ANSI SQL representation that supports joins, aggregations, grouping, and more.
The screenshot below showcases how you can fast-add common metrics like Count and Distinct to a variety of columns within a dataset.
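As an illustration of what a SQL-backed Indicator might look like, the sketch below runs a distinct-count query grouped by a dimension. The `events` table, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` module stands in for a real data source; this is not how Lariat executes queries.

```python
import sqlite3

# Hypothetical indicator: distinct users per country.
INDICATOR_SQL = """
SELECT country_code, COUNT(DISTINCT user_id) AS distinct_users
FROM events
GROUP BY country_code
ORDER BY country_code
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_time INTEGER, user_id TEXT, country_code TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1700000000, "u1", "US"),
        (1700000060, "u1", "US"),  # repeat user, counted once
        (1700000120, "u2", "US"),
        (1700000180, "u3", "DE"),
    ],
)

# Each row is one scalar value tagged with its dimension.
rows = conn.execute(INDICATOR_SQL).fetchall()
print(rows)
```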
Focus on building features and only worry about data quality when you receive an alert about missing subsets of data or an SLA violation.
We support both automatic alerting and manual alert creation.
Alerts can be configured at the dimension level. For example, if you have an indicator that is the distinct count of uuid grouped by country_code and source_id, alerts can be configured to fire per country and source.
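A dimension-level alert can be thought of as a threshold looked up per dimension combination. The sketch below assumes a simple "value fell below a minimum" rule; the `Reading` type, the `breached` function, and the thresholds are hypothetical and do not reflect Lariat's alerting API.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    country_code: str
    source_id: str
    value: float  # e.g. distinct count of uuid for this slice


def breached(readings, minimums, default_minimum=0.0):
    """Return the readings whose value is below the minimum for their slice."""
    alerts = []
    for r in readings:
        floor = minimums.get((r.country_code, r.source_id), default_minimum)
        if r.value < floor:
            alerts.append(r)
    return alerts


readings = [
    Reading("US", "web", 950.0),
    Reading("US", "mobile", 120.0),
    Reading("DE", "web", 40.0),
]
# Assumed per-(country, source) minimums; slices not listed use the default.
minimums = {("US", "web"): 900.0, ("US", "mobile"): 200.0}

alerts = breached(readings, minimums)
print(alerts)
```

Only the ("US", "mobile") slice is flagged here, which shows why a single global threshold on the indicator would be too coarse.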
Get notified over email or via external tools like PagerDuty & Slack.
Add health metrics to a single dashboard to get a snapshot of a variety of your datasets in one place. You can also curate dashboards for other members of your team.
A dataset is any table or type of file that you want to monitor. This could be a database table, a Google Analytics dataset that is ingested on a daily basis, or a Kafka topic.
If there are computed columns that you rely on a lot, you can create a virtual dataset (like a view) to represent columns that you can then build data quality metrics on. For example, if you have Price and Quantity in your dataset but want to track the standard deviation of Revenue as a data quality metric, you could create a virtual dataset in Lariat that has a column called revenue. You can do this either via the visual editor below or with ANSI SQL, defining a new schema exactly the way you would when creating a view.
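The revenue example can be sketched with an ordinary SQL view. The `orders` table, the `orders_with_revenue` view, and the sample rows are hypothetical; `sqlite3` is used only so the example runs locally, and since SQLite has no built-in standard deviation aggregate, the metric is computed in Python here.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, price REAL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, 2), (2, 5.0, 4), (3, 20.0, 2)],
)

# The "virtual dataset": a view that adds the derived revenue column.
conn.execute(
    """
    CREATE VIEW orders_with_revenue AS
    SELECT order_id, price, quantity, price * quantity AS revenue
    FROM orders
    """
)

revenues = [
    row[0]
    for row in conn.execute(
        "SELECT revenue FROM orders_with_revenue ORDER BY order_id"
    )
]
# Population standard deviation of revenue, the data quality metric.
revenue_stddev = statistics.pstdev(revenues)
print(revenues, revenue_stddev)
```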
Indicators are schema-aware, and let you measure data quality on any collection of fields discovered by Lariat integrations. They are designed with a few properties in mind that make them easy to layer on top of your data stack.
Indicator Queries run:
On a schedule you define
Over a window of data you define
With customizable offsets that let you account for upstream lag
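One way to picture how a schedule, a window, and an offset interact is to compute the time range a single run would cover. This is a sketch under assumed semantics (the window ends `offset` before the scheduled run time); the `evaluation_window` function and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def evaluation_window(run_at: datetime, window: timedelta, offset: timedelta):
    """Return (start, end) of the data window as integer unix epoch seconds.

    Assumption: the window ends `offset` before the run time, which leaves
    room for upstream data that arrives late.
    """
    end = run_at - offset
    start = end - window
    return int(start.timestamp()), int(end.timestamp())


# A run scheduled at noon UTC, looking at one hour of data with 15 minutes of lag.
run_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = evaluation_window(
    run_at, window=timedelta(hours=1), offset=timedelta(minutes=15)
)
print(start, end)
```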
An indicator execution fetches a result set and computes a numeric value from that result set without modifying any data.
This value may be tagged with any dimensions retrievable from the same result set.
If multiple Indicators are defined on top of overlapping result sets, Lariat will attempt to compose them into the lowest possible number of queries to execute against your datasets.
This lets you express Indicator queries at a level of simplicity that is meaningful to your team, rather than overloading them into complex SQL statements.
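The composition idea can be sketched as grouping indicators that share a dataset and a set of grouping dimensions, then emitting one SELECT per group. This is a simplified illustration, not Lariat's actual planner; the `compose` function and the indicator dictionaries are hypothetical.

```python
from collections import defaultdict


def compose(indicators):
    """Merge indicators that share a dataset and grouping into one SELECT each."""
    grouped = defaultdict(list)
    for ind in indicators:
        key = (ind["dataset"], tuple(ind["group_by"]))
        grouped[key].append(f'{ind["expr"]} AS {ind["name"]}')

    queries = []
    for (dataset, dims), exprs in grouped.items():
        select = ", ".join(list(dims) + exprs)
        sql = f"SELECT {select} FROM {dataset}"
        if dims:
            sql += " GROUP BY " + ", ".join(dims)
        queries.append(sql)
    return queries


indicators = [
    {"name": "row_count", "dataset": "events",
     "expr": "COUNT(*)", "group_by": ["country_code"]},
    {"name": "distinct_users", "dataset": "events",
     "expr": "COUNT(DISTINCT user_id)", "group_by": ["country_code"]},
    {"name": "avg_price", "dataset": "orders",
     "expr": "AVG(price)", "group_by": []},
]

queries = compose(indicators)
for q in queries:
    print(q)
```

Three simple indicator definitions collapse into two queries, so each team member can keep their own definition small while the data source sees less load.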
Common Usability Note: In Lariat, the timestamp in a schema needs to be an integer unix epoch timestamp. If your dataset doesn't have one, either use the configuration menu shown below to create a derived column that converts your timestamp field to an integer unix epoch timestamp, or define a virtual dataset with the appropriate column.
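For reference, this is what converting a timestamp to an integer unix epoch value looks like in Python. The `to_unix_epoch` helper is hypothetical, and treating timestamps without a timezone as UTC is an assumption you should check against your own data.

```python
from datetime import datetime, timezone


def to_unix_epoch(ts: str) -> int:
    """Convert an ISO 8601 timestamp string to integer unix epoch seconds."""
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an explicit offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(to_unix_epoch("1970-01-02T00:00:00"))        # naive, treated as UTC
print(to_unix_epoch("1970-01-01T01:00:00+01:00"))  # explicit offset
```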
More information about Indicators, including steps for custom creation, can be found in the Lariat documentation.