Quickstart

All data that Kaskada uses is described by tables, regardless of where the actual data is stored. Tables consist of multiple rows, and each row is a value of the same type. Using a typical CRUD interface, you can create tables and point them at data stored in a variety of locations.

When you create a table, you'll need to describe two fundamental attributes of every event: the time at which it occurred and the default entity it relates to.

Creating a Table

When creating a table, you must provide some information about how each row should be interpreted. You must describe:

  • The time associated with each row. The time should refer to when the event occurred.

  • An initial entity associated with each row. The entity should identify a thing in the world that each event is associated with. Don't worry too much about picking the "right" value here - it's easy to change the entity later.

kli table create \
  --table-name Purchase \
  --time-column-name purchase_time \
  --group-column-name customer_id

Now that we've created a table, we're ready to load some data into it.
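
As an illustration of the two attributes described above, the following Python sketch checks that every row carries a parseable event time and a non-empty entity value before the data is loaded. It does not call Kaskada. The helper find_invalid_rows and the sample rows are hypothetical, and the ISO 8601 timestamp format is an assumption; only the column names purchase_time and customer_id come from the command above.

```python
from datetime import datetime

# Column names taken from the `kli table create` command above.
TIME_COLUMN = "purchase_time"
ENTITY_COLUMN = "customer_id"


def find_invalid_rows(rows):
    """Return indexes of rows lacking a usable event time or entity value."""
    invalid = []
    for index, row in enumerate(rows):
        try:
            # Assumes ISO 8601 timestamps; real data may use another format.
            datetime.fromisoformat(row[TIME_COLUMN])
        except (KeyError, TypeError, ValueError):
            invalid.append(index)
            continue
        entity = row.get(ENTITY_COLUMN)
        if entity is None or entity == "":
            invalid.append(index)
    return invalid


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"purchase_time": "2022-01-01T10:00:00", "customer_id": "c1", "amount": 5},
    {"purchase_time": "not-a-time", "customer_id": "c2", "amount": 7},
    {"purchase_time": "2022-01-02T09:30:00", "customer_id": "", "amount": 3},
]

print(find_invalid_rows(sample_rows))
```

A check like this is optional, but it can surface rows that have no clear event time or entity before they reach the table.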

What is an Entity?

Entities are how Kaskada organizes data for use in feature engineering. They describe the particular objects that are being represented in the system.

Entities represent objects or "nouns" related to individual events. Common examples of entities are "Users" or "Vendors".

If something can be given a name or other unique identifier, it can probably be expressed as an entity. In a relational database, an entity would be anything that is identified by the same key in a set of tables.

What is an Entity Key?

While an Entity represents a category of thing, an "Entity Key" is the field of a table that contains the Entity instances. An Entity instance is one specific item of that category, identified by a value in that field. Below is a table with some example Entities and specific Entity instances.

Example Entity and specific Entity Key instances

To demonstrate how entities affect Fenl expressions, we'll start with a simplified dataset consisting of two tables. The Purchase table describes purchase transactions.

Purchase :: record<customer_id: string, time: datetime,
   product_id: string, amount: number>

Table describing purchase transactions

The ProductReview table describes customers' ratings of products they've purchased.

ProductReview :: record<customer_id: string, time: datetime,
   product_id: string, stars: number>

Table describing customers' ratings of products they've purchased
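
To make the role of the entity key concrete, here is a rough Python sketch (not Fenl, and not a description of how Kaskada works internally) that models the two record types above and groups events from both tables by customer_id. The function group_by_entity and the sample events are hypothetical, and mapping the schema's number type to float is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Purchase:
    customer_id: str
    time: datetime
    product_id: str
    amount: float  # assumption: "number" modeled as float


@dataclass
class ProductReview:
    customer_id: str
    time: datetime
    product_id: str
    stars: float  # assumption: "number" modeled as float


def group_by_entity(events, key="customer_id"):
    """Group events by the value of their entity key, ordered by time."""
    grouped = defaultdict(list)
    for event in events:
        grouped[getattr(event, key)].append(event)
    for entity_events in grouped.values():
        entity_events.sort(key=lambda e: e.time)
    return dict(grouped)


# Hypothetical events, for illustration only.
purchases = [
    Purchase("alice", datetime(2022, 1, 2), "p1", 10.0),
    Purchase("bob", datetime(2022, 1, 1), "p2", 4.0),
    Purchase("alice", datetime(2022, 1, 1), "p3", 6.0),
]
reviews = [ProductReview("alice", datetime(2022, 1, 3), "p1", 5.0)]

by_customer = group_by_entity(purchases + reviews)
for customer, events in by_customer.items():
    print(customer, [type(e).__name__ for e in events])
```

Each distinct customer_id value is one entity instance. Because both tables share that key, events from either table can be lined up for the same customer.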

Loading a File

Most Kaskada API clients provide convenient helpers for loading a file from your local disk into a table. The file is transferred to Kaskada and added to the table.

kli upload --table Purchase /path/to/a/file/to/load.parquet

Inspecting the Table's Contents

To verify the file was loaded as expected, you can use the table list endpoint to see all the tables defined for your organization and the files loaded into each:

kli table list

The above command returns something similar to:

{
  "tables": [
    {
      "tableId": "31112aca11d0e9e6eb7db96f317dda49",
      "tableName": "Purchase",
      "timeColumnName": "purchase_time",
      "groupColumnName": "customer_id",
      "fileNames": [
        "file.parquet"
      ]
    }
  ]
}
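
If you want to check such a response programmatically, a minimal Python sketch might look like the following. It assumes the output is JSON shaped like the example above and that you have already captured it as text; how you capture it is not shown. The helper files_for_table is hypothetical and is not part of any Kaskada client.

```python
import json

# Response text shaped like the example output above.
response_text = """
{
  "tables": [
    {
      "tableId": "31112aca11d0e9e6eb7db96f317dda49",
      "tableName": "Purchase",
      "timeColumnName": "purchase_time",
      "groupColumnName": "customer_id",
      "fileNames": ["file.parquet"]
    }
  ]
}
"""


def files_for_table(response, table_name):
    """Return the file names listed for `table_name`, or None if absent."""
    for table in response.get("tables", []):
        if table.get("tableName") == table_name:
            return table.get("fileNames", [])
    return None


response = json.loads(response_text)
loaded = files_for_table(response, "Purchase")

# True when the uploaded file appears under the Purchase table.
print("file.parquet" in (loaded or []))
```

Returning None for an unknown table keeps "table not found" distinct from "table exists but has no files".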

Next Steps:

  • Check out our docs for a version of this quickstart with copyable code blocks

  • Check out our examples for specific bite-sized problems and solutions

  • Check out Kaskada in action on industry-specific solutions and try it yourself!