Quick Start

This section covers the minimum needed to get started with Kaskada: installing the package and running a first query.

Install

Install the latest version. The version constraint kaskada>=0.6.0-a.3 ensures that pip selects the pre-release version. Quote the requirement so the shell does not treat > as an output redirection.

pip install "kaskada>=0.6.0-a.3"

See the section on installation to learn more about installing Kaskada.
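
To confirm that the install picked up the expected version, a check along the following lines may help. It is a minimal sketch that uses only the Python standard library; installed_version is a hypothetical helper name, not part of Kaskada.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "kaskada" is the distribution name used in the pip command.
print(installed_version("kaskada"))
```

If this prints None, the package is not installed in the active Python environment.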

Write a query

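The following Python code imports the Kaskada library, creates a session, and loads some CSV data from a string. It then runs a query and returns the result as a Pandas DataFrame. Because the example uses a top-level await, run it in a Jupyter notebook or IPython session; in a plain script, wrap the code in an async function and run it with asyncio.run.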

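import asyncio
import kaskada as kd

# Start the Kaskada session before creating any sources.
kd.init_session()

# Sample CSV data; empty fields are missing values.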
content = "\n".join(
    [
        "time,key,m,n",
        "1996-12-19T16:39:57,A,5,10",
        "1996-12-19T16:39:58,B,24,3",
        "1996-12-19T16:39:59,A,17,6",
        "1996-12-19T16:40:00,A,,9",
        "1996-12-19T16:40:01,A,12,",
        "1996-12-19T16:40:02,A,,",
    ]
)
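
# Load the CSV string, using "time" as the time column and "key" as the key column.
source = await kd.sources.CsvString.create(
    content, time_column="time", key_column="key"
)

# Select columns m and n, add the sum of m as sum_m, and collect the result.
source.select("m", "n").extend({"sum_m": source.col("m").sum()}).to_pandas()
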
                _time _key  sum_m     m     n
0 1996-12-19 16:39:57    A      5   5.0  10.0
1 1996-12-19 16:39:58    B     24  24.0   3.0
2 1996-12-19 16:39:59    A     22  17.0   6.0
3 1996-12-19 16:40:00    A     22   NaN   9.0
4 1996-12-19 16:40:01    A     34  12.0   NaN
5 1996-12-19 16:40:02    A     34   NaN   NaN
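
The sum_m values in this output are consistent with a running total of m kept separately for each key, where a missing m leaves the total unchanged. The plain-Python sketch below reproduces that column from the same CSV text. It illustrates the observed behavior and is not Kaskada's implementation: running_sum_by_key is a hypothetical helper, and it assumes integer values and that the first row of every key has a value for m.

```python
import csv
import io
from collections import defaultdict

content = "\n".join(
    [
        "time,key,m,n",
        "1996-12-19T16:39:57,A,5,10",
        "1996-12-19T16:39:58,B,24,3",
        "1996-12-19T16:39:59,A,17,6",
        "1996-12-19T16:40:00,A,,9",
        "1996-12-19T16:40:01,A,12,",
        "1996-12-19T16:40:02,A,,",
    ]
)


def running_sum_by_key(csv_text, value_column="m", key_column="key"):
    """Return, for each row in order, the running total of value_column for that row's key."""
    totals = defaultdict(int)  # one running total per key
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[value_column]
        if value != "":  # an empty field is a missing value and is skipped
            totals[row[key_column]] += int(value)
        result.append(totals[row[key_column]])
    return result


print(running_sum_by_key(content))
```

The printed list matches the sum_m column row by row: key B keeps its own total of 24, while key A accumulates 5, 22, and 34 and repeats the previous total on rows where m is missing.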