pplkit.data#

pplkit.data.io#

class pplkit.data.io.DataIO[source]#

Bases: ABC

Bridge class that unifies the file I/O for different data types.

fextns: tuple[str, ...] = ('',)#

The supported file extensions. When loading a file, they are used to check that the file's extension matches.

dtypes: tuple[Type, ...] = (<class 'object'>,)#

The supported data types. When dumping data, they are used to check that the object's type matches.

load(fpath, **options)[source]#

Load data from the given path.

Parameters:
  • fpath (str | Path) – Provided file path.

  • options – Extra arguments for the load function.

Raises:

ValueError – Raised when the file extension doesn’t match.

Returns:

Data loaded from the given path.

Return type:

Any

dump(obj, fpath, mkdir=True, **options)[source]#

Dump data to the given path.

Parameters:
  • obj (Any) – Provided data object.

  • fpath (str | Path) – Provided file path.

  • mkdir (bool) – If true, it will automatically create the parent directory. The default is true.

  • options – Extra arguments for the dump function.

Raises:

TypeError – Raised when the given data object type doesn’t match.
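The extension check in load and the type check in dump can be illustrated without pplkit. The following is a minimal standard-library sketch, not pplkit's implementation: the class names `SketchIO` and `SketchJSONIO` and the `_load`/`_dump` hook names are assumptions made for this example.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchIO(ABC):
    """Hypothetical stand-in for a DataIO-like base class."""

    fextns: tuple[str, ...] = ("",)
    dtypes: tuple[type, ...] = (object,)

    @abstractmethod
    def _load(self, fpath: Path, **options) -> Any:
        """Format-specific reader (assumed hook name)."""

    @abstractmethod
    def _dump(self, obj: Any, fpath: Path, **options) -> None:
        """Format-specific writer (assumed hook name)."""

    def load(self, fpath, **options) -> Any:
        fpath = Path(fpath)
        # Reject files whose extension this handler does not support.
        if fpath.suffix not in self.fextns:
            raise ValueError(f"File extension must be one of {self.fextns}.")
        return self._load(fpath, **options)

    def dump(self, obj, fpath, mkdir=True, **options) -> None:
        fpath = Path(fpath)
        # Reject objects whose type this handler cannot serialize.
        if not isinstance(obj, self.dtypes):
            raise TypeError(f"Data must be an instance of {self.dtypes}.")
        if mkdir:
            fpath.parent.mkdir(parents=True, exist_ok=True)
        self._dump(obj, fpath, **options)


class SketchJSONIO(SketchIO):
    fextns = (".json",)
    dtypes = (dict, list)

    def _load(self, fpath, **options):
        with open(fpath) as f:
            return json.load(f, **options)

    def _dump(self, obj, fpath, **options):
        with open(fpath, "w") as f:
            json.dump(obj, f, **options)


with tempfile.TemporaryDirectory() as tmp:
    json_io = SketchJSONIO()
    target = Path(tmp) / "nested" / "config.json"
    json_io.dump({"alpha": 1}, target)  # the parent directory is created first
    roundtrip = json_io.load(target)
```

Keeping the checks in the public methods means a subclass only has to supply the two format-specific hooks and the two class attributes.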

class pplkit.data.io.CSVIO[source]#

Bases: DataIO

fextns: tuple[str, ...] = ('.csv',)#

The supported file extensions. When loading a file, they are used to check that the file's extension matches.

dtypes: tuple[Type, ...] = (<class 'pandas.core.frame.DataFrame'>,)#

The supported data types. When dumping data, they are used to check that the object's type matches.

class pplkit.data.io.PickleIO[source]#

Bases: DataIO

fextns: tuple[str, ...] = ('.pkl', '.pickle')#

The supported file extensions. When loading a file, they are used to check that the file's extension matches.

class pplkit.data.io.YAMLIO[source]#

Bases: DataIO

fextns: tuple[str, ...] = ('.yml', '.yaml')#

The supported file extensions. When loading a file, they are used to check that the file's extension matches.

dtypes: tuple[Type, ...] = (<class 'dict'>, <class 'list'>)#

The supported data types. When dumping data, they are used to check that the object's type matches.

class pplkit.data.io.ParquetIO[source]#

Bases: DataIO

fextns: tuple[str, ...] = ('.parquet',)#

The supported file extensions. When loading a file, they are used to check that the file's extension matches.

dtypes: tuple[Type, ...] = (<class 'pandas.core.frame.DataFrame'>,)#

The supported data types. When dumping data, they are used to check that the object's type matches.

class pplkit.data.io.JSONIO[source]#

Bases: DataIO

fextns: tuple[str, ...] = ('.json',)#

The supported file extensions. When loading a file, they are used to check that the file's extension matches.

dtypes: tuple[Type, ...] = (<class 'dict'>, <class 'list'>)#

The supported data types. When dumping data, they are used to check that the object's type matches.

class pplkit.data.io.TOMLIO[source]#

Bases: DataIO

fextns: tuple[str, ...] = ('.toml',)#

The supported file extensions. When loading a file, they are used to check that the file's extension matches.

dtypes: tuple[Type, ...] = (<class 'dict'>,)#

The supported data types. When dumping data, they are used to check that the object's type matches.

pplkit.data.io.dataio_dict: dict[str, pplkit.data.io.DataIO] = {'.csv': CSVIO(), '.json': JSONIO(), '.parquet': ParquetIO(), '.pickle': PickleIO(), '.pkl': PickleIO(), '.toml': TOMLIO(), '.yaml': YAMLIO(), '.yml': YAMLIO()}#

Instances of the DataIO classes, organized in a dictionary keyed by the file extensions that each class supports.
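A registry keyed by file extension reduces handler selection to one dictionary lookup on `Path.suffix`. The sketch below shows the idea with plain standard-library functions. `sketch_registry` and `pick_loader` are hypothetical names for this illustration; the documented `dataio_dict` holds DataIO instances rather than functions.

```python
import json
import pickle
from pathlib import Path


def load_json(fpath):
    with open(fpath) as f:
        return json.load(f)


def load_pickle(fpath):
    with open(fpath, "rb") as f:
        return pickle.load(f)


# Several extensions may map to the same loader, as '.pkl' and '.pickle' do.
sketch_registry = {
    ".json": load_json,
    ".pkl": load_pickle,
    ".pickle": load_pickle,
}


def pick_loader(fpath):
    """Return the loader registered for the file's extension."""
    suffix = Path(fpath).suffix
    if suffix not in sketch_registry:
        raise KeyError(f"No loader registered for extension {suffix!r}.")
    return sketch_registry[suffix]
```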

pplkit.data.interface#

class pplkit.data.interface.DataInterface(**dirs)[source]#

Bases: object

Data interface that stores important directories and automatically reads and writes data in the stored directories according to the data type.

Parameters:

dirs (dict[str, str | pathlib.Path]) – Directories to manage, passed as keyword arguments: each argument's name is the directory's name and its value is the directory's path.

dataio_dict: dict[str, pplkit.data.io.DataIO]#

A dictionary that maps file extensions to the corresponding DataIO instances. It refers to the module-level variable pplkit.data.io.dataio_dict.

add_dir(key, value, exist_ok=False)[source]#

Add a directory to the instance. If the key already exists, the outcome depends on exist_ok.

Parameters:
  • key (str) – Directory name.

  • value (str | Path) – Directory path.

  • exist_ok (bool) – If exist_ok=False and key already exists in the current instance, a ValueError is raised. Otherwise the path corresponding to the key is overwritten.

Raises:

ValueError – Raised when exist_ok=False and key already exists.

Return type:

None
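The contract stated under Raises (a ValueError when exist_ok=False and the key is already present) can be sketched as follows. `DirRegistry` is a hypothetical container written for this illustration, and storing the paths in a `dirs` dictionary is an assumption, not a description of how DataInterface keeps them.

```python
from pathlib import Path


class DirRegistry:
    """Hypothetical container that mimics the documented add_dir contract."""

    def __init__(self, **dirs):
        self.dirs = {}
        for key, value in dirs.items():
            self.add_dir(key, value)

    def add_dir(self, key, value, exist_ok=False):
        # Refuse to overwrite an existing key unless the caller opts in.
        if key in self.dirs and not exist_ok:
            raise ValueError(f"{key} already exists.")
        self.dirs[key] = Path(value)


registry = DirRegistry(raw="/tmp/raw")
registry.add_dir("raw", "/tmp/raw_v2", exist_ok=True)  # overwrites the path
```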

remove_dir(key)[source]#

Remove a directory from the current set of directories.

Parameters:

key (str) – Directory name.

Return type:

None

get_fpath(*fparts, key='')[source]#

Get the file path from the name of the directory and the sub-parts under the directory.

Parameters:
  • fparts (tuple[str, ...]) – Subdirectories or the file name.

  • key (str) – The name of the directory stored in the class.

Return type:

Path
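Path assembly of this kind maps naturally onto `pathlib`. The sketch below is a standalone illustration with hypothetical names (`sketch_dirs`, `sketch_get_fpath`). How the real method treats the default key='' is not stated here, so resolving an empty key relative to the current directory is an assumption.

```python
from pathlib import Path

# Hypothetical named directories, standing in for those stored on the instance.
sketch_dirs = {"raw": Path("/data/raw")}


def sketch_get_fpath(*fparts, key=""):
    """Join the directory registered under `key` with the given sub-parts."""
    # Assumption: an empty key means "relative to the current directory".
    base = sketch_dirs[key] if key else Path()
    return base.joinpath(*fparts)


table_path = sketch_get_fpath("2024", "table.csv", key="raw")
```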

load(*fparts, key='', **options)[source]#

Load data from the given directory.

Parameters:
  • fparts (tuple[str, ...]) – Subdirectories or the file name.

  • key (str) – The name of the directory stored in the class.

  • options (dict[str, Any]) – Extra arguments for the load function.

Returns:

Data loaded from the given path.

Return type:

Any

dump(obj, *fparts, key='', mkdir=True, **options)[source]#

Dump data to the given directory.

Parameters:
  • obj (Any) – Provided data object.

  • fparts (str) – Subdirectories or the file name.

  • key (str) – The name of the directory stored in the class.

  • mkdir (bool) – If true, it will automatically create the parent directory. The default is true.

  • options (dict[str, Any]) – Extra arguments for the dump function.
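The dump and load signatures combine a named directory, optional sub-parts, and pass-through options. The following JSON-only sketch imitates that flow end to end. `MiniInterface` is a hypothetical class written for this illustration, not pplkit code, and its handling of an empty key is an assumption.

```python
import json
import tempfile
from pathlib import Path


class MiniInterface:
    """Hypothetical, JSON-only imitation of the documented load/dump flow."""

    def __init__(self, **dirs):
        self.dirs = {key: Path(value) for key, value in dirs.items()}

    def get_fpath(self, *fparts, key=""):
        # Assumption: an empty key means "relative to the current directory".
        base = self.dirs[key] if key else Path()
        return base.joinpath(*fparts)

    def dump(self, obj, *fparts, key="", mkdir=True, **options):
        fpath = self.get_fpath(*fparts, key=key)
        if fpath.suffix != ".json":
            raise ValueError("This sketch only handles '.json' files.")
        if mkdir:
            fpath.parent.mkdir(parents=True, exist_ok=True)
        with open(fpath, "w") as f:
            json.dump(obj, f, **options)  # options go to the writer

    def load(self, *fparts, key="", **options):
        fpath = self.get_fpath(*fparts, key=key)
        if fpath.suffix != ".json":
            raise ValueError("This sketch only handles '.json' files.")
        with open(fpath) as f:
            return json.load(f, **options)  # options go to the reader


with tempfile.TemporaryDirectory() as tmp:
    dataif = MiniInterface(output=tmp)
    dataif.dump({"n": 3}, "run1", "summary.json", key="output", indent=2)
    summary = dataif.load("run1", "summary.json", key="output")
    created = (Path(tmp) / "run1" / "summary.json").exists()
```

Here `indent=2` reaches `json.dump` through `**options`, in the same way the documented methods forward extra arguments to the underlying load or dump function.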