pplkit.data

pplkit.data.io

class pplkit.data.io.DataIO

Bases: ABC

Bridge class that unifies the file I/O for different data types.

fextns: tuple[str, ...] = ('',)

File extensions handled by this class. When loading a file, its extension is checked against this tuple.

dtypes: tuple[Type, ...] = (<class 'object'>,)

Data types handled by this class. When dumping data, the object's type is checked against this tuple.

load(fpath, **options)

Load data from given path.

Parameters:

fpath (str | Path) – Provided file path.
options – Extra arguments for the load function.

Raises:

ValueError – Raised when the file extension doesn’t match.

Returns:

Data loaded from the given path.

Return type:

Any

dump(obj, fpath, mkdir=True, **options)

Dump data to given path.

Parameters:

obj (Any) – Provided data object.
fpath (str | Path) – Provided file path.
mkdir (bool) – If true, it will automatically create the parent directory. The default is true.
options – Extra arguments for the dump function.

Raises:

TypeError – Raised when the given data object type doesn’t match.
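The checks described for load and dump can be sketched as follows. This is a minimal illustration of the pattern, not pplkit's implementation: the names SketchDataIO, SketchJSONIO, _load, _dump, and roundtrip_demo are hypothetical, and only the public signatures and the two documented exceptions are taken from the text above.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchDataIO(ABC):
    """Hypothetical stand-in for the DataIO pattern; not pplkit's code."""

    fextns: tuple[str, ...] = ("",)
    dtypes: tuple[type, ...] = (object,)

    @abstractmethod
    def _load(self, fpath: Path, **options: Any) -> Any:
        """Format-specific read (hypothetical hook name)."""

    @abstractmethod
    def _dump(self, obj: Any, fpath: Path, **options: Any) -> None:
        """Format-specific write (hypothetical hook name)."""

    def load(self, fpath, **options: Any) -> Any:
        fpath = Path(fpath)
        # Reject paths whose extension this class does not handle.
        if fpath.suffix not in self.fextns:
            raise ValueError(f"File extension must be in {self.fextns}.")
        return self._load(fpath, **options)

    def dump(self, obj: Any, fpath, mkdir: bool = True, **options: Any) -> None:
        fpath = Path(fpath)
        # Reject objects whose type this class does not handle.
        if not isinstance(obj, self.dtypes):
            raise TypeError(f"Data must be an instance of {self.dtypes}.")
        if mkdir:
            # Create missing parent directories before writing.
            fpath.parent.mkdir(parents=True, exist_ok=True)
        self._dump(obj, fpath, **options)


class SketchJSONIO(SketchDataIO):
    fextns = (".json",)
    dtypes = (dict, list)

    def _load(self, fpath: Path, **options: Any) -> Any:
        with open(fpath, "r") as f:
            return json.load(f, **options)

    def _dump(self, obj: Any, fpath: Path, **options: Any) -> None:
        with open(fpath, "w") as f:
            json.dump(obj, f, **options)


def roundtrip_demo() -> Any:
    """Dump a dict into a not-yet-existing subdirectory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "nested" / "cfg.json"
        SketchJSONIO().dump({"a": 1}, target)  # mkdir=True creates "nested"
        return SketchJSONIO().load(target)
```

In this sketch both checks run before any file access, so a mismatched extension or type fails without touching the disk.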

class pplkit.data.io.CSVIO

Bases: DataIO

fextns: tuple[str, ...] = ('.csv',)

File extensions handled by this class. When loading a file, its extension is checked against this tuple.

dtypes: tuple[Type, ...] = (<class 'pandas.core.frame.DataFrame'>,)

Data types handled by this class. When dumping data, the object's type is checked against this tuple.

class pplkit.data.io.PickleIO

Bases: DataIO

fextns: tuple[str, ...] = ('.pkl', '.pickle')

File extensions handled by this class. When loading a file, its extension is checked against this tuple.

class pplkit.data.io.YAMLIO

Bases: DataIO

fextns: tuple[str, ...] = ('.yml', '.yaml')

File extensions handled by this class. When loading a file, its extension is checked against this tuple.

dtypes: tuple[Type, ...] = (<class 'dict'>, <class 'list'>)

Data types handled by this class. When dumping data, the object's type is checked against this tuple.

class pplkit.data.io.ParquetIO

Bases: DataIO

fextns: tuple[str, ...] = ('.parquet',)

File extensions handled by this class. When loading a file, its extension is checked against this tuple.

dtypes: tuple[Type, ...] = (<class 'pandas.core.frame.DataFrame'>,)

Data types handled by this class. When dumping data, the object's type is checked against this tuple.

class pplkit.data.io.JSONIO

Bases: DataIO

fextns: tuple[str, ...] = ('.json',)

File extensions handled by this class. When loading a file, its extension is checked against this tuple.

dtypes: tuple[Type, ...] = (<class 'dict'>, <class 'list'>)

Data types handled by this class. When dumping data, the object's type is checked against this tuple.

class pplkit.data.io.TOMLIO

Bases: DataIO

fextns: tuple[str, ...] = ('.toml',)

File extensions handled by this class. When loading a file, its extension is checked against this tuple.

dtypes: tuple[Type, ...] = (<class 'dict'>,)

Data types handled by this class. When dumping data, the object's type is checked against this tuple.

pplkit.data.io.dataio_dict: dict[str, pplkit.data.io.DataIO] = {'.csv': CSVIO(), '.json': JSONIO(), '.parquet': ParquetIO(), '.pickle': PickleIO(), '.pkl': PickleIO(), '.toml': TOMLIO(), '.yaml': YAMLIO(), '.yml': YAMLIO()}

DataIO instances organized in a dictionary keyed by file extension, with one entry for each extension listed in a DataIO class's fextns.
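Keying handlers by file extension lets a caller pick the right reader from the path alone. The sketch below illustrates that lookup with standard-library formats only. The names sketch_loaders and load_by_suffix are hypothetical, and raising ValueError for an unknown extension is a choice made for this sketch, not documented pplkit behavior.

```python
import json
import pickle
from pathlib import Path
from typing import Any, Callable


def _load_json(fpath: Path) -> Any:
    with open(fpath, "r") as f:
        return json.load(f)


def _load_pickle(fpath: Path) -> Any:
    with open(fpath, "rb") as f:
        return pickle.load(f)


# Hypothetical registry: several extensions may map to the same kind of handler.
sketch_loaders: dict[str, Callable[[Path], Any]] = {
    ".json": _load_json,
    ".pickle": _load_pickle,
    ".pkl": _load_pickle,
}


def load_by_suffix(fpath) -> Any:
    """Pick a loader from the path's suffix and apply it."""
    fpath = Path(fpath)
    loader = sketch_loaders.get(fpath.suffix)
    if loader is None:
        # Sketch choice: unknown extensions are reported as ValueError.
        raise ValueError(f"Unsupported file extension: {fpath.suffix!r}")
    return loader(fpath)
```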

pplkit.data.interface

class pplkit.data.interface.DataInterface(**dirs)

Bases: object

Data interface that stores important directories and automatically reads and writes data in those directories based on the data types.

Parameters:

dirs (dict[str, str | pathlib.Path]) – Directories to manage. Each keyword argument's name is the directory's name, and its value is the directory's path.

dataio_dict: dict[str, pplkit.data.io.DataIO]

A dictionary that maps file extensions to the corresponding DataIO instances. It is the module-level variable pplkit.data.io.dataio_dict.

add_dir(key, value, exist_ok=False)

Add a directory to the instance. If a directory with the same name already exists, the outcome depends on exist_ok.

Parameters:

key (str) – Directory name.
value (str | Path) – Directory path.
exist_ok (bool) – If exist_ok=False and key already exists in the current instance, a ValueError is raised. Otherwise the path stored under key is overwritten.

Raises:

ValueError – Raised when exist_ok=False and key already exists.

Return type:

None

remove_dir(key)

Remove a directory from the current set of directories.

Parameters:

key (str) – Directory name.

Return type:

None
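The add_dir and remove_dir behavior can be illustrated with a small stand-in registry. This is a sketch under assumptions, not pplkit's code: SketchDirs and its _dirs dictionary are hypothetical, and letting remove_dir raise KeyError for an unknown name is a choice made here because the text does not say what the library does in that case. The ValueError condition follows the Raises entry of add_dir.

```python
from pathlib import Path


class SketchDirs:
    """Hypothetical registry of named directories; not pplkit's code."""

    def __init__(self, **dirs):
        self._dirs: dict[str, Path] = {}
        for key, value in dirs.items():
            self.add_dir(key, value)

    def add_dir(self, key: str, value, exist_ok: bool = False) -> None:
        # Refuse to replace an existing entry unless the caller opts in.
        if not exist_ok and key in self._dirs:
            raise ValueError(f"Directory name {key!r} already exists.")
        self._dirs[key] = Path(value)

    def remove_dir(self, key: str) -> None:
        # Sketch choice: an unknown name raises KeyError.
        del self._dirs[key]
```

With this sketch, `SketchDirs(input="/data/in").add_dir("input", "/data/v2")` raises ValueError, while the same call with `exist_ok=True` replaces the stored path.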

get_fpath(*fparts, key='')

Get the file path from the name of the directory and the sub-parts under the directory.

Parameters:

fparts (tuple[str, ...]) – Subdirectories or the file name.
key (str) – The name of the directory stored in the class.

Return type:

Path
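Resolving a path from a directory name plus sub-parts amounts to joining the parts onto the stored directory. The function below is a minimal sketch of that idea. sketch_get_fpath and its dirs argument are hypothetical, and treating an empty key as "no stored directory" (so the result is a relative path) is an assumption, since the text does not describe what the default key='' resolves to.

```python
from pathlib import Path


def sketch_get_fpath(dirs: dict, *fparts: str, key: str = "") -> Path:
    """Join fparts onto the directory stored under key (hypothetical helper)."""
    # Assumption: an empty key means no stored directory is used,
    # so the parts are joined into a relative path.
    base = Path(dirs[key]) if key else Path()
    return base.joinpath(*fparts)
```

For example, with `dirs = {"input": "/data/in"}`, the call `sketch_get_fpath(dirs, "raw", "table.csv", key="input")` yields the path `/data/in/raw/table.csv`.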

load(*fparts, key='', **options)

Load data from given directory.

Parameters:

fparts (tuple[str, ...]) – Subdirectories or the file name.
key (str) – The name of the directory stored in the class.
options (dict[str, Any]) – Extra arguments for the load function.

Returns:

Data loaded from the given path.

Return type:

Any

dump(obj, *fparts, key='', mkdir=True, **options)

Dump data to the given directory.

Parameters:

obj (Any) – Provided data object.
fparts (str) – Subdirectories or the file name.
key (str) – The name of the directory stored in the class.
mkdir (bool) – If true, it will automatically create the parent directory. The default is true.
options (dict[str, Any]) – Extra arguments for the dump function.
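Taken together, load and dump resolve a path from the stored directory and the sub-parts, then hand the file to a format-specific reader or writer. The sketch below walks through that flow with a stand-in class that handles JSON only. SketchInterface and interface_demo are hypothetical names, the method signatures mirror the ones documented above, and the handling of an empty key is an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


class SketchInterface:
    """Hypothetical stand-in for the load/dump flow; handles JSON only."""

    def __init__(self, **dirs):
        self.dirs = {key: Path(value) for key, value in dirs.items()}

    def get_fpath(self, *fparts: str, key: str = "") -> Path:
        # Assumption: an empty key yields a relative path.
        base = self.dirs[key] if key else Path()
        return base.joinpath(*fparts)

    def load(self, *fparts: str, key: str = "", **options: Any) -> Any:
        fpath = self.get_fpath(*fparts, key=key)
        if fpath.suffix != ".json":
            raise ValueError(f"Unsupported file extension: {fpath.suffix!r}")
        with open(fpath, "r") as f:
            return json.load(f, **options)

    def dump(self, obj: Any, *fparts: str, key: str = "",
             mkdir: bool = True, **options: Any) -> None:
        fpath = self.get_fpath(*fparts, key=key)
        if fpath.suffix != ".json":
            raise ValueError(f"Unsupported file extension: {fpath.suffix!r}")
        if mkdir:
            # Create missing parent directories before writing.
            fpath.parent.mkdir(parents=True, exist_ok=True)
        with open(fpath, "w") as f:
            json.dump(obj, f, **options)


def interface_demo() -> Any:
    """Write settings under a named directory, then read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        dif = SketchInterface(output=tmp)
        # Extra keyword arguments (here indent) are passed to the writer.
        dif.dump({"lr": 0.1}, "run1", "settings.json", key="output", indent=2)
        return dif.load("run1", "settings.json", key="output")
```

The caller names a directory once at construction and afterwards refers to files by key and relative parts, which keeps absolute paths out of the rest of the code.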