Quickstart

Kaskada Architecture Diagram

With Kaskada, you can connect directly to your event-based data and choose to calculate aggregated feature values at any point in time to train models without the risk of leakage. When you’re ready, you can compute the current value of the same features to make predictions using new data and a live model in production.
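The idea of computing a feature "as of" a point in time is the key to leakage-free training data. The sketch below illustrates that idea in plain pandas (it is an analogy, not Kaskada's API): the same feature definition is evaluated at a historical cutoff for training and at a later time for serving. The column names and values are hypothetical.

```python
# A minimal pandas sketch (not Kaskada's API) of point-in-time features:
# aggregate only the events visible at a training cutoff so no future
# events leak into the training value, then recompute the same feature
# later with all events for serving.
import pandas as pd

purchases = pd.DataFrame({
    "customer_id": ["alice", "alice", "alice", "bob"],
    "purchase_time": pd.to_datetime(
        ["2019-11-02", "2019-12-20", "2020-02-14", "2019-12-01"]),
    "amount": [10.0, 25.0, 40.0, 7.0],
})

def total_spend_as_of(events: pd.DataFrame, cutoff: str) -> pd.Series:
    """Sum each customer's purchases using only events at or before `cutoff`."""
    visible = events[events["purchase_time"] <= pd.Timestamp(cutoff)]
    return visible.groupby("customer_id")["amount"].sum()

train_features = total_spend_as_of(purchases, "2020-01-01")  # alice: 35.0
serve_features = total_spend_as_of(purchases, "2020-12-31")  # alice: 75.0
```

The same definition produces different values at different times; Kaskada's temporal queries generalize this so you never have to hand-write the cutoff logic.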

This article details the first three steps to get started:

  1. Install the Kaskada Python client package and connect to the Kaskada service

  2. Bring your data - Create a table and load data

  3. Use queries to explore your data and build up your features and model context

1. Installation is as fast as:

pip install kaskada
export KASKADA_CLIENT_ID="..."
export KASKADA_CLIENT_SECRET="..."

The next step is to connect to Kaskada. To do this, you'll need to obtain an API key by logging in to Kaskada's admin page (if you don't have an account, contact us to sign up) and set up your environment:

import kaskada as kda
from kaskada import compute
client = kda.client

2. Bring your data - create a table and load data

All data that Kaskada uses is described by tables, regardless of where the actual data is stored. Tables consist of multiple rows, and each row is a value of the same type. Using the Kaskada client you can create tables and point them to data stored in a variety of locations.

In this process, you’ll need to define how each row should be interpreted by describing two mandatory fields: the time and the entity key associated with each row.

The code snippet below creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time and a field named customer_id.

tables.create_table(
  table_name = "Purchase",
  time_column_name = "purchase_time",
  entity_key_column_name = "customer_id",
)

Now that we've created a table, we're ready to load some data into it.

tables.upload_file("Purchase", "/path/to/a/file/to/load.parquet")

Now we're ready to make some queries.

3. Iterate on features and build up your model context

Features are composed using Fenl, a feature engineering query language designed for authoring and sharing feature definitions.

Fenl expressions are temporal - they describe how the result of a computation changes over time rather than just the current result. Temporal queries make it easy to reconstruct the information available at arbitrary times in the past.

The rich time-traveling tools provided by Fenl make it easy to build training datasets free of knowledge of the future.
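To make the semantics of a point-in-time lookup concrete, here is a pandas analogy (not Fenl itself, and hypothetical data) for a query like Purchase | last() | at("2020-01-01"): for each entity, take the latest event visible at the chosen time and ignore everything after it.

```python
# A pandas analogy (not Fenl) for last()-at-a-time semantics: for each
# entity, keep only events at or before the chosen time, then take the
# most recent one.
import pandas as pd

purchases = pd.DataFrame({
    "customer_id": ["alice", "alice", "bob"],
    "purchase_time": pd.to_datetime(
        ["2019-11-02", "2019-12-20", "2020-02-14"]),
    "amount": [10.0, 25.0, 40.0],
})

def last_at(events: pd.DataFrame, when: str) -> pd.DataFrame:
    """Latest visible event per entity as of `when`."""
    visible = events[events["purchase_time"] <= pd.Timestamp(when)]
    return (visible.sort_values("purchase_time")
                   .groupby("customer_id")
                   .tail(1))

snapshot = last_at(purchases, "2020-01-01")
# bob's only purchase is in the "future" relative to the snapshot,
# so bob does not appear at all.
```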

Before diving deeper into Fenl, start with a simple "hello world" query to make sure your data was uploaded correctly. Continuing with the example table, you can ask for each entity's last Purchase as of a fixed date. The query results below are computed and returned as a Pandas dataframe:

import pandas

resp = compute.query(expression = 'Purchase | last() | at("2020-01-01")')
pandas.read_parquet(resp.parquet.path)

As your queries get more advanced, you’ll want to check out a few examples and read more about:

  • Building complex queries

  • Data-dependent time selection

Next Steps

Congratulations, you've completed this quickstart and now know how to install Kaskada, create a simple table, load data, and execute a simple query.

  • If you’re using IPython, be sure to check out our quickstart for Jupyter.

  • If you’re interested in setting up more tables and entities, you’ll want to check out our docs for more on connecting and mapping data.

  • And if you love understanding the philosophy behind a new language before you use it, you’ll want to take a quick detour to the Fenl language guide on our doc site.