
Time travel with Kaskada
With Kaskada, you can calculate all feature values at any single point in time, or calculate the feature values for each entity at the time an event occurred. For instance, calculate all features at the exact time a user made a large purchase, when a customer churned 30 days after their planned subscription date, or at the time of a fraudulent transaction.
Use these point-in-time and event-driven feature values to train models without risk of leakage. When you're ready, you can compute the same feature values with a time of "now" to make new predictions using a live model in production.
Time travel for feature engineering without leakage requires:
Historical feature value generation
Compute directly from event-based data to try new features
Quickly try ideas on historical data by computing the prediction and label times for each training example directly from event times and fields. True time travel allows each training example's time to differ from the others and to be selected by predicates.
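As a rough illustration of the idea above, the following sketch builds one training example per event that satisfies a predicate (a large purchase). The feature uses only data before the prediction time, and the label is read at a later label time. This is plain Python, not Kaskada's API; the event data, the `LARGE_PURCHASE` threshold, and the 30-day label window are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical purchase events: (user, event time, amount).
EVENTS = [
    ("alice", datetime(2023, 1, 5), 20.0),
    ("alice", datetime(2023, 1, 20), 500.0),
    ("alice", datetime(2023, 2, 10), 35.0),
    ("bob", datetime(2023, 1, 8), 700.0),
    ("bob", datetime(2023, 3, 30), 15.0),
]

LARGE_PURCHASE = 400.0             # assumed predicate that selects prediction times
LABEL_WINDOW = timedelta(days=30)  # assumed gap between prediction and label time


def training_examples(events):
    """Build one example per large purchase, using only data known at that time."""
    examples = []
    for user, t_pred, amount in events:
        if amount < LARGE_PURCHASE:
            continue
        # Feature: spend strictly before the prediction time (no future data).
        prior_spend = sum(a for u, t, a in events if u == user and t < t_pred)
        # Label: did the user purchase again by the label time?
        t_label = t_pred + LABEL_WINDOW
        returned = any(u == user and t_pred < t <= t_label for u, t, _ in events)
        examples.append({
            "user": user,
            "prediction_time": t_pred,
            "prior_spend": prior_spend,
            "returned": returned,
        })
    return examples


examples = training_examples(EVENTS)
for row in examples:
    print(row)
```

Each example has its own prediction time, taken from the data itself instead of a single global cutoff.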
Expressive time selection
Specify your model context iteratively
Iteration enables exploration and discovery. True time travel lets you specify feature definitions and time selection independently during feature engineering and selection. Learn more about iterative time selection in this retention example.
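One way to picture this separation is to keep feature definitions and time selections as independent pieces that can be combined freely. The sketch below is plain Python with hypothetical names (`FEATURES`, `at_each_event`, `at_fixed_time`, `evaluate`), not Kaskada syntax, and covers a single entity to stay short.

```python
from datetime import datetime

# Hypothetical events for one entity: (user, event time, amount).
EVENTS = [
    ("alice", datetime(2023, 1, 5), 20.0),
    ("alice", datetime(2023, 1, 20), 500.0),
    ("alice", datetime(2023, 2, 10), 35.0),
]

# Feature definitions: functions of the events visible so far.
FEATURES = {
    "purchase_count": lambda history: len(history),
    "total_spend": lambda history: sum(amount for _, _, amount in history),
}


# Time selections: functions that pick when the features are evaluated.
def at_each_event(events):
    return [t for _, t, _ in events]


def at_fixed_time(when):
    return lambda events: [when]


def evaluate(events, features, select_times):
    """Evaluate every feature at every selected time, using events up to that time."""
    rows = []
    for when in select_times(events):
        history = [e for e in events if e[1] <= when]
        row = {"time": when}
        row.update({name: fn(history) for name, fn in features.items()})
        rows.append(row)
    return rows


per_event = evaluate(EVENTS, FEATURES, at_each_event)
snapshot = evaluate(EVENTS, FEATURES, at_fixed_time(datetime(2023, 2, 1)))
```

Swapping `at_each_event` for `at_fixed_time(...)` changes when values are computed without touching the feature definitions, which is the property that makes iteration cheap.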
Ordered processing to prevent leakage
Flexible time selection requires support for temporal joins
Feature values need to be calculated and joined in time order. A join against a relational database returns the current values, not the values that held at the relevant times in the past. True time travel enables joining values between different entities at precise times, without leakage. Learn more in this case study.
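The difference between a temporal (as-of) join and a join against current values can be sketched in a few lines. The example below is plain Python, not Kaskada's implementation; the price history, the purchases, and the integer timestamps are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical price history for one product: (time, price), sorted by time.
PRICE_CHANGES = [(1, 10.0), (5, 12.0), (9, 15.0)]
# Hypothetical purchases of that product: (time, quantity).
PURCHASES = [(0, 1), (3, 2), (5, 1), (10, 4)]

# What a lookup in a relational table of current values would return.
CURRENT_PRICE = PRICE_CHANGES[-1][1]

_CHANGE_TIMES = [t for t, _ in PRICE_CHANGES]


def price_as_of(when):
    """Return the latest price at or before `when`, or None if none existed yet."""
    i = bisect_right(_CHANGE_TIMES, when)
    return PRICE_CHANGES[i - 1][1] if i else None


# Temporal join: each purchase sees the price that held at its own time.
as_of = [(t, qty, price_as_of(t)) for t, qty in PURCHASES]
# Leaky join: every purchase sees today's price, including future information.
leaky = [(t, qty, CURRENT_PRICE) for t, qty in PURCHASES]
```

In the as-of result, the purchase at time 3 is joined with price 10.0, while the leaky join assigns it 15.0, a value that did not exist until time 9.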
Eliminating data discrepancies in production
Shared feature definitions to power live models
Training a model on examples computed directly from event-based data requires the same data shape to be available in production. Writing different code for production risks model degradation. True time travel for machine learning allows feature definitions to be shared. Learn how in this quickstart guide.
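A minimal sketch of a shared feature definition, assuming a hypothetical `spend_last_30_days` feature: the same function is evaluated at a historical time for training and at "now" for serving. This is plain Python, not Kaskada's API, and the "now" value is fixed here only so the example is reproducible.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical purchase events for one user: (event time, amount).
EVENTS = [
    (datetime(2023, 1, 5, tzinfo=UTC), 20.0),
    (datetime(2023, 1, 20, tzinfo=UTC), 500.0),
    (datetime(2023, 2, 10, tzinfo=UTC), 35.0),
]


def spend_last_30_days(events, as_of):
    """Single feature definition shared by training and serving."""
    start = as_of - timedelta(days=30)
    return sum(amount for t, amount in events if start < t <= as_of)


# Training: evaluate at a historical prediction time.
training_value = spend_last_30_days(EVENTS, datetime(2023, 1, 25, tzinfo=UTC))

# Serving: evaluate the same definition at "now".
# Fixed for this example; in production this would be datetime.now(UTC).
now = datetime(2023, 3, 1, tzinfo=UTC)
serving_value = spend_last_30_days(EVENTS, now)
```

Because only the `as_of` argument changes between the two calls, there is no second code path in which training and production logic could drift apart.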