Parquet

class sources.Parquet

Source reading data from Parquet files.

sources.Parquet.add_file(file)

Add the data in a Parquet file to the source.

sources.Parquet.create(file=None, *, time_column, key_column, schema=None, subsort_column=None, grouping_name=None, time_unit=None)

Create a Parquet source.

Parameters:
  • file (Optional[str], default: None)

    The URL or path of the Parquet file to add. Paths should be relative to the current working directory or absolute. URLs may describe local file paths or object-store locations.

  • time_column (str)

    The name of the column containing the time.

  • key_column (str)

    The name of the column containing the key.

  • schema (Optional[Schema], default: None)

    The schema to use. If not provided, it will be inferred from the input.

  • subsort_column (Optional[str], default: None)

    The name of the column containing the subsort.

    If not provided, the subsort will be assigned by the system.

  • grouping_name (Optional[str], default: None)

    The name of the group associated with each key.

    This is used to ensure implicit joins are only performed between data grouped by the same entity.

  • time_unit (Optional[TimeUnit], default: None)

    The unit of the time column. One of ns, us, ms, or s.

    If not specified here or in the data, nanoseconds are assumed.
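
The rules for `file` and `time_unit` above can be mirrored in plain Python. The sketch below does not call the library and is not its implementation. The helper names (`classify_location`, `resolve_local_path`, `effective_time_unit`, `to_nanoseconds`) are hypothetical. Treating every URL scheme other than `file` as an object-store location is an assumption, as is letting an explicit `time_unit` take precedence over a unit declared in the data.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

# Multipliers from each documented unit (ns, us, ms, s) to nanoseconds.
NANOS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def classify_location(file: str) -> str:
    """Classify a `file` argument as 'local' or 'object-store' (hypothetical helper)."""
    scheme = urlparse(file).scheme
    if scheme in ("", "file"):
        # Plain paths and file:// URLs both refer to the local file system.
        return "local"
    # Assumption: any other scheme (for example s3 or gs) names an object store.
    return "object-store"


def resolve_local_path(file: str) -> Path:
    """Resolve a plain path: absolute paths are kept, relative ones use the cwd."""
    path = Path(file)
    return path if path.is_absolute() else Path.cwd() / path


def effective_time_unit(time_unit: Optional[str], data_unit: Optional[str]) -> str:
    """Pick the unit used to read the time column.

    Assumed precedence: explicit argument, then the unit declared in the data,
    then the documented default of nanoseconds.
    """
    unit = time_unit or data_unit or "ns"
    if unit not in NANOS_PER_UNIT:
        raise ValueError(f"time_unit must be one of {sorted(NANOS_PER_UNIT)}, got {unit!r}")
    return unit


def to_nanoseconds(value: int, unit: str) -> int:
    """Convert an integer timestamp in `unit` to nanoseconds."""
    return value * NANOS_PER_UNIT[unit]
```

Under these assumptions, an integer time column holding epoch seconds with no unit declared anywhere would be read as nanoseconds unless `time_unit` is set to `s`, so it is worth passing the unit explicitly in that case. Windows drive-letter paths would need extra handling in `classify_location`, because `urlparse` reads the drive letter as a URL scheme.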