PythonBasicTools

pythonbasictools.experiment_utils package

Submodules

pythonbasictools.experiment_utils.metadata_file module

class pythonbasictools.experiment_utils.metadata_file.MetadataFile(output_dir: str | Path, filename: str = 'METADATA', data: dict | None = None, save_every_set: bool = True, **kwargs)

Bases: RunOutputFile

DEFAULT_FILENAME: str = 'METADATA'
EXT: str = '.meta.json'
__init__(output_dir: str | Path, filename: str = 'METADATA', data: dict | None = None, save_every_set: bool = True, **kwargs)
property env
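The listing above gives only the class shape, so here is a minimal stdlib sketch of what a metadata file of this form might look like on disk. The `METADATA.meta.json` name follows `DEFAULT_FILENAME` and `EXT` above; the helper names and the dict-to-JSON round trip are assumptions, not the library's implementation.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in: write a metadata dict to "<output_dir>/METADATA.meta.json",
# mirroring DEFAULT_FILENAME + EXT from the class above.
def write_metadata(output_dir, data: dict) -> Path:
    path = Path(output_dir) / "METADATA.meta.json"
    path.write_text(json.dumps(data, indent=2))
    return path

def read_metadata(path) -> dict:
    return json.loads(Path(path).read_text())

with tempfile.TemporaryDirectory() as tmp:
    p = write_metadata(tmp, {"seed": 42, "model": "baseline"})
    assert p.name == "METADATA.meta.json"
    assert read_metadata(p) == {"seed": 42, "model": "baseline"}
```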

pythonbasictools.experiment_utils.output_folder module

class pythonbasictools.experiment_utils.output_folder.ExperimentState(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

FAILED = 'FAILED'
FINISHED = 'FINISHED'
RUNNING = 'RUNNING'
UNKNOWN = 'UNKNOWN'
WAITING = 'WAITING'
class pythonbasictools.experiment_utils.output_folder.ExperimentStateFile(folder: str | Path, state: ExperimentState = ExperimentState.UNKNOWN)

Bases: object

FILENAME = 'state'
__init__(folder: str | Path, state: ExperimentState = ExperimentState.UNKNOWN)
append(new_text: str)
property path: Path
read() str
property state: ExperimentState
write(new_text: str)
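The `ExperimentStateFile` API above (a plain-text file named `state` with `read`, `write`, `append`, and a `state` property) can be pictured with a self-contained stdlib sketch. This mirrors the documented method names but is an assumption about the behaviour, including the fallback to `UNKNOWN` for unparseable content.

```python
import enum
import tempfile
from pathlib import Path

# Hypothetical stand-in mirroring the API above: a plain-text "state" file
# whose content names one ExperimentState member.
class ExperimentState(enum.Enum):
    FAILED = "FAILED"
    FINISHED = "FINISHED"
    RUNNING = "RUNNING"
    UNKNOWN = "UNKNOWN"
    WAITING = "WAITING"

class StateFile:
    FILENAME = "state"

    def __init__(self, folder, state=ExperimentState.UNKNOWN):
        self.path = Path(folder) / self.FILENAME
        if not self.path.exists():
            self.write(state.value)

    def write(self, new_text: str):
        self.path.write_text(new_text)

    def append(self, new_text: str):
        with self.path.open("a") as f:
            f.write(new_text)

    def read(self) -> str:
        return self.path.read_text()

    @property
    def state(self) -> ExperimentState:
        # Assumed fallback: unreadable content maps to UNKNOWN.
        try:
            return ExperimentState(self.read().strip())
        except ValueError:
            return ExperimentState.UNKNOWN

with tempfile.TemporaryDirectory() as tmp:
    sf = StateFile(tmp)
    sf.write(ExperimentState.RUNNING.value)
    assert sf.state is ExperimentState.RUNNING
```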
class pythonbasictools.experiment_utils.output_folder.OutputFolder(path: str | Path, metadata_file: MetadataFile | None = None, data_file: RunOutputFile | None = None)

Bases: object

__init__(path: str | Path, metadata_file: MetadataFile | None = None, data_file: RunOutputFile | None = None)
property data_file: RunOutputFile
gather_files(pattern: str = '*') List[Path]

Gather files in the output folder matching the given pattern.

Parameters:

pattern – The glob pattern to match files.

Returns:

A list of file paths matching the pattern.
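The behaviour described for `gather_files` can be reproduced with `pathlib.Path.glob`; the sketch below assumes that equivalence rather than quoting the library's implementation.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "run.out.json").touch()
    (folder / "METADATA.meta.json").touch()
    (folder / "notes.txt").touch()

    # Equivalent of gather_files(pattern="*.json"): glob inside the folder.
    matches = sorted(p.name for p in folder.glob("*.json"))
    assert matches == ["METADATA.meta.json", "run.out.json"]
```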

classmethod gather_output_folders(root_folder: str | Path, metadata_ext: str = '.meta.json', data_ext: str = '.out.json') List[OutputFolder]
property metadata_file: MetadataFile
property path: Path
classmethod root_folder_to_dataframe(root_folder: str | Path, metadata_ext: str = '.meta.json', data_ext: str = '.out.json') Tuple[DataFrame, DataFrame]

Gather all the metadata files and data files and build two dataframes: one for the metadata and one for the data. Each row in a dataframe corresponds to the contents of one file. The two dataframes can be merged using the _output_folder column.

Example:

>>> metadata_df, data_df = OutputFolder.root_folder_to_dataframe("./data/root")
>>> full_df = pd.merge(metadata_df, data_df, on="_output_folder")
property state: ExperimentState
property state_file: ExperimentStateFile
update_data(other: dict | None = None, **kwargs)
update_metadata(other: dict | None = None, **kwargs)

Updates the instance's metadata with the entries of the provided dictionary and any additional keyword arguments. If no dictionary is provided, an empty dictionary is used.

Parameters:
  • other (Optional[dict]) – Optional initial dictionary of metadata to update. Defaults to None.

  • kwargs – Additional metadata key-value pairs to update.

Returns:

None
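The merge of `other` plus keyword arguments described above follows plain `dict.update` semantics; a small sketch (the helper name is hypothetical):

```python
def merged_update(current: dict, other=None, **kwargs) -> dict:
    # Mirror the documented behaviour: fall back to an empty dict,
    # then apply `other` first and the keyword arguments second.
    updated = dict(current)
    updated.update(other or {})
    updated.update(kwargs)
    return updated

meta = merged_update({"run": 1}, {"status": "RUNNING"}, seed=7)
assert meta == {"run": 1, "status": "RUNNING", "seed": 7}
```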

pythonbasictools.experiment_utils.run_output_file module

class pythonbasictools.experiment_utils.run_output_file.RunOutputFile(output_dir: str | Path, filename: str = 'run_output', data: dict | None = None, save_every_set: bool = True, **kwargs)

Bases: object

This object is used to save and load data of a script run to a JSON file. The data is saved as a dictionary and can be accessed as an attribute of the object. The data is saved to a JSON file in the output directory of the script. It is useful when you want to store some data or state of the script run to a file and load it later.

Example

```python
from pythonbasictools.experiment_utils.run_output_file import RunOutputFile

output_file = RunOutputFile("output_dir", save_every_set=True)
output_file.update({"status": "STARTING"})
print("Doing some work...")
work = 1 + 1
output_file.update({"status": "WORKING", "work": work})
print("Doing some other stuff...")
stuff = 2 + 2
output_file.update({"status": "WORKING", "stuff": stuff})
print("Done working.")
output_file.update({"status": "DONE"})
```

Parameters:
  • output_dir (Union[str, Path]) – The directory where the output file will be saved.

  • filename (str) – The name of the output file. Default is "run_output".

  • data (dict) – The initial data to be saved to the output file.

  • save_every_set (bool) – If True, the data will be saved to the output file every time an item is added or updated.

  • kwargs – Additional keyword arguments

DEFAULT_FILENAME: str = 'run_output'
EXT: str = '.out.json'
RAVEL_DICT_KEY_SEP = '.'
__init__(output_dir: str | Path, filename: str = 'run_output', data: dict | None = None, save_every_set: bool = True, **kwargs)
as_dataframe(add_path: bool = True) DataFrame
as_series(add_path: bool = True) Series
property exists: bool
classmethod from_file(path: str | Path, **kwargs)
get(key, default=None)
get_raveled_state(key_sep: str | None = None)
legacy_load()
load()
load_if_exists()
log(msg: str, level=20, print_msg: bool = True, **kwargs)
classmethod parse_results_from_dir_to_dataframe(root_dir: str | Path, requires_columns: List[str] | None = None, file_column: str = '_file', mp_kwargs: Dict[str, Any] | None = None) DataFrame

Parse the results from a root directory. If requires_columns is not None, the rows that do not contain all the required columns will be removed.

Parameters:
  • root_dir (str) – The directory containing the results.

  • requires_columns (Optional[List[str]]) – The columns that are required in the DataFrame. If None, no columns are required. If a row does not contain all the required columns, it will be removed from the final DataFrame.

  • file_column (str) – The name of the column that contains the file path.

  • mp_kwargs (Optional[Dict[str, Any]]) – The keyword arguments to pass to the multiprocessing function.

Returns:

The DataFrame containing the results.
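The required-columns filtering described above can be pictured with a stdlib sketch over plain dicts instead of a DataFrame. The function name and row representation are hypothetical; only the drop-rows-missing-required-columns rule comes from the documentation.

```python
def filter_rows(rows, requires_columns=None):
    # Keep only the rows that contain every required column, as described above.
    if not requires_columns:
        return list(rows)
    return [r for r in rows if all(c in r for c in requires_columns)]

rows = [
    {"_file": "a.out.json", "loss": 0.1, "acc": 0.9},
    {"_file": "b.out.json", "loss": 0.2},  # missing "acc": dropped
]
kept = filter_rows(rows, requires_columns=["loss", "acc"])
assert [r["_file"] for r in kept] == ["a.out.json"]
```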

property path: Path
print_logs(level=20, sep: str = '\n')
property raveled_state
classmethod raveled_state_from_file(path: str | Path, raise_on_error: bool = False, **kwargs) dict

Get the raveled state of the data in the file.

Parameters:
  • path (str) – The path to the file.

  • raise_on_error (bool) – If True, raise an error if an error occurs while loading the file. If False, return an empty dictionary if an error occurs.

  • kwargs – Additional keyword arguments.

Returns:

The raveled state of the data in the file.

Return type:

dict
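Given `RAVEL_DICT_KEY_SEP = '.'` above, the "raveled state" is presumably the nested data dict flattened into dotted keys. A sketch of that transformation (the function name is hypothetical; the library's actual flattening may differ):

```python
def ravel_dict(data: dict, key_sep: str = ".", prefix: str = "") -> dict:
    # Flatten nested dicts into a single level of dotted keys,
    # e.g. {"a": {"b": 1}} -> {"a.b": 1}.
    flat = {}
    for key, value in data.items():
        full_key = f"{prefix}{key_sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(ravel_dict(value, key_sep, full_key))
        else:
            flat[full_key] = value
    return flat

state = {"metrics": {"loss": 0.1, "acc": 0.9}, "status": "DONE"}
assert ravel_dict(state) == {"metrics.loss": 0.1, "metrics.acc": 0.9, "status": "DONE"}
```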

save()
save_if_save_every_set()
update(other: Dict[str, Any], print_updated: bool = True, print_header: str = 'New Data')

Module contents