pplkit.data#
pplkit.data.io#
- class pplkit.data.io.DataIO[source]#
Bases: ABC
Bridge class that unifies the file I/O for different data types.
- fextns: tuple[str, ...] = ('',)#
The file extensions. When loading a file, it will be used to check if the file extension matches.
- dtypes: tuple[Type, ...] = (<class 'object'>,)#
The data types. When dumping the data, it will be used to check if the data type matches.
- load(fpath, **options)[source]#
Load data from given path.
- Parameters:
fpath (str | Path) – Provided file path.
options – Extra arguments for the load function.
- Raises:
ValueError – Raised when the file extension doesn’t match.
- Returns:
Data loaded from the given path.
- Return type:
Any
- dump(obj, fpath, mkdir=True, **options)[source]#
Dump data to given path.
- Parameters:
obj (Any) – Provided data object.
fpath (str | Path) – Provided file path.
mkdir (bool) – If true, it will automatically create the parent directory. The default is true.
options – Extra arguments for the dump function.
- Raises:
TypeError – Raised when the given data object type doesn’t match.
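The extension check in load and the type check in dump can be sketched as follows. This is a minimal illustration under stated assumptions, not pplkit's source: SketchDataIO, SketchJSONIO, and the private _load/_dump hooks are hypothetical names, and the real class may organize these checks differently.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchDataIO(ABC):
    """Hypothetical stand-in for DataIO (assumed structure, not pplkit code)."""

    fextns: tuple[str, ...] = ("",)
    dtypes: tuple[type, ...] = (object,)

    @abstractmethod
    def _load(self, fpath: Path, **options) -> Any:
        """Format-specific reader (hypothetical hook)."""

    @abstractmethod
    def _dump(self, obj: Any, fpath: Path, **options) -> None:
        """Format-specific writer (hypothetical hook)."""

    def load(self, fpath, **options):
        fpath = Path(fpath)
        # Reject files whose extension is not listed in fextns.
        if fpath.suffix not in self.fextns:
            raise ValueError(f"File extension must be in {self.fextns}.")
        return self._load(fpath, **options)

    def dump(self, obj, fpath, mkdir=True, **options):
        fpath = Path(fpath)
        # Reject objects whose type is not listed in dtypes.
        if not isinstance(obj, self.dtypes):
            raise TypeError(f"Data must be an instance of {self.dtypes}.")
        # Create the parent directory first when requested.
        if mkdir:
            fpath.parent.mkdir(parents=True, exist_ok=True)
        self._dump(obj, fpath, **options)


class SketchJSONIO(SketchDataIO):
    """Hypothetical JSON subclass using only the standard library."""

    fextns = (".json",)
    dtypes = (dict, list)

    def _load(self, fpath, **options):
        with open(fpath, "r") as f:
            return json.load(f, **options)

    def _dump(self, obj, fpath, **options):
        with open(fpath, "w") as f:
            json.dump(obj, f, **options)
```

In this sketch a subclass only declares its extensions and types and supplies the two format-specific hooks; the shared checks live in the base class.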
- class pplkit.data.io.CSVIO[source]#
Bases: DataIO
- fextns: tuple[str, ...] = ('.csv',)#
The file extensions. When loading a file, it will be used to check if the file extension matches.
- dtypes: tuple[Type, ...] = (<class 'pandas.core.frame.DataFrame'>,)#
The data types. When dumping the data, it will be used to check if the data type matches.
- class pplkit.data.io.PickleIO[source]#
Bases: DataIO
- fextns: tuple[str, ...] = ('.pkl', '.pickle')#
The file extensions. When loading a file, it will be used to check if the file extension matches.
- class pplkit.data.io.YAMLIO[source]#
Bases: DataIO
- fextns: tuple[str, ...] = ('.yml', '.yaml')#
The file extensions. When loading a file, it will be used to check if the file extension matches.
- dtypes: tuple[Type, ...] = (<class 'dict'>, <class 'list'>)#
The data types. When dumping the data, it will be used to check if the data type matches.
- class pplkit.data.io.ParquetIO[source]#
Bases: DataIO
- fextns: tuple[str, ...] = ('.parquet',)#
The file extensions. When loading a file, it will be used to check if the file extension matches.
- dtypes: tuple[Type, ...] = (<class 'pandas.core.frame.DataFrame'>,)#
The data types. When dumping the data, it will be used to check if the data type matches.
- class pplkit.data.io.JSONIO[source]#
Bases: DataIO
- fextns: tuple[str, ...] = ('.json',)#
The file extensions. When loading a file, it will be used to check if the file extension matches.
- dtypes: tuple[Type, ...] = (<class 'dict'>, <class 'list'>)#
The data types. When dumping the data, it will be used to check if the data type matches.
- class pplkit.data.io.TOMLIO[source]#
Bases: DataIO
- fextns: tuple[str, ...] = ('.toml',)#
The file extensions. When loading a file, it will be used to check if the file extension matches.
- dtypes: tuple[Type, ...] = (<class 'dict'>,)#
The data types. When dumping the data, it will be used to check if the data type matches.
- pplkit.data.io.dataio_dict: dict[str, pplkit.data.io.DataIO] = {'.csv': CSVIO(), '.json': JSONIO(), '.parquet': ParquetIO(), '.pickle': PickleIO(), '.pkl': PickleIO(), '.toml': TOMLIO(), '.yaml': YAMLIO(), '.yml': YAMLIO()}#
Instances of the data IO classes, organized in a dictionary keyed by the file extensions of each DataIO class.
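Dispatching on the file extension through a dictionary of this shape can be sketched as below. The names loaders, load_json, load_pickle, and load_any are hypothetical and use only the standard library; they stand in for the DataIO instances and do not call pplkit.

```python
import json
import pickle
from pathlib import Path


def load_json(fpath):
    """Read a JSON file (hypothetical helper)."""
    with open(fpath, "r") as f:
        return json.load(f)


def load_pickle(fpath):
    """Read a pickle file (hypothetical helper)."""
    with open(fpath, "rb") as f:
        return pickle.load(f)


# Several extensions may point at the same handler, as '.pkl' and
# '.pickle' do in dataio_dict.
loaders = {
    ".json": load_json,
    ".pkl": load_pickle,
    ".pickle": load_pickle,
}


def load_any(fpath):
    """Pick a loader by the path's suffix and call it."""
    fpath = Path(fpath)
    try:
        loader = loaders[fpath.suffix]
    except KeyError:
        raise ValueError(f"Unsupported extension: {fpath.suffix!r}") from None
    return loader(fpath)
```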
pplkit.data.interface#
- class pplkit.data.interface.DataInterface(**dirs)[source]#
Bases: object
Data interface that stores important directories and automatically reads and writes data in those directories based on the data types.
- Parameters:
dirs (dict[str, str | pathlib.Path]) – Directories to manage. The name of each keyword argument is the directory’s name, and its value is the directory’s path.
- dataio_dict: dict[str, pplkit.data.io.DataIO]#
A dictionary that maps file extensions to the corresponding data IO instances. This is the module-level variable pplkit.data.io.dataio_dict.
- add_dir(key, value, exist_ok=False)[source]#
Add a directory to the instance. If the directory name already exists and exist_ok=False, a ValueError is raised.
- Parameters:
key (str) – Directory name.
value (str | Path) – Directory path.
exist_ok (bool) – If exist_ok=False and key already exists in the current instance, an error is raised. Otherwise the path corresponding to key is overwritten.
- Raises:
ValueError – Raised when exist_ok=False and key already exists.
- Return type:
None
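The behavior stated under Raises can be sketched with a small registry. DirRegistry is a hypothetical stand-in written for illustration; it follows the documented signature of add_dir but is not pplkit's implementation.

```python
from pathlib import Path


class DirRegistry:
    """Hypothetical stand-in that mimics the documented add_dir contract."""

    def __init__(self, **dirs):
        self.dirs = {}
        for key, value in dirs.items():
            self.add_dir(key, value)

    def add_dir(self, key, value, exist_ok=False):
        # Refuse to replace an existing name unless the caller opts in.
        if not exist_ok and key in self.dirs:
            raise ValueError(f"{key} already exists.")
        self.dirs[key] = Path(value)


registry = DirRegistry(raw="/data/raw")
# Passing exist_ok=True overwrites the stored path for an existing name.
registry.add_dir("raw", "/data/raw_v2", exist_ok=True)
```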
- remove_dir(key)[source]#
Remove a directory from the current set of directories.
- Parameters:
key (str) – Directory name.
- Return type:
None
- get_fpath(*fparts, key='')[source]#
Get the file path from the name of the directory and the sub-parts under the directory.
- Parameters:
fparts (tuple[str, ...]) – Subdirectories or the file name.
key (str) – The name of the directory stored in the class.
- Return type:
Path
- load(*fparts, key='', **options)[source]#
Load data from given directory.
- Parameters:
fparts (tuple[str, ...]) – Subdirectories or the file name.
key (str) – The name of the directory stored in the class.
options (dict[str, Any]) – Extra arguments for the load function.
- Returns:
Data loaded from the given path.
- Return type:
Any
- dump(obj, *fparts, key='', mkdir=True, **options)[source]#
Dump data to the given directory.
- Parameters:
obj (Any) – Provided data object.
fparts (str) – Subdirectories or the file name.
key (str) – The name of the directory stored in the class.
mkdir (bool) – If true, it will automatically create the parent directory. The default is true.
options (dict[str, Any]) – Extra arguments for the dump function.
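The call shape shared by get_fpath, load, and dump (path parts first, then the directory name as key) can be sketched with a JSON-only stand-in. MiniInterface is hypothetical and assumes the path is formed by joining the stored directory with fparts; the handling of the default key='' is not specified above, so this sketch requires key explicitly.

```python
import json
from pathlib import Path


class MiniInterface:
    """Hypothetical JSON-only stand-in for DataInterface."""

    def __init__(self, **dirs):
        self.dirs = {key: Path(value) for key, value in dirs.items()}

    def get_fpath(self, *fparts, key):
        # Assumption: the stored directory is joined with the sub-parts.
        return self.dirs[key].joinpath(*fparts)

    def dump(self, obj, *fparts, key, mkdir=True, **options):
        fpath = self.get_fpath(*fparts, key=key)
        if fpath.suffix != ".json":
            raise ValueError("This sketch only handles .json files.")
        # Create missing parent directories before writing.
        if mkdir:
            fpath.parent.mkdir(parents=True, exist_ok=True)
        with open(fpath, "w") as f:
            json.dump(obj, f, **options)

    def load(self, *fparts, key, **options):
        fpath = self.get_fpath(*fparts, key=key)
        if fpath.suffix != ".json":
            raise ValueError("This sketch only handles .json files.")
        with open(fpath, "r") as f:
            return json.load(f, **options)
```

With an instance created as MiniInterface(out="results"), a call such as dump({"n": 3}, "runs", "a.json", key="out") would write to results/runs/a.json, and load("runs", "a.json", key="out") would read it back.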