Missing or incorrect parameters initialized in `run_importance_sampler` are reset in `preprocessing_setup`.
`preprocessing_setup` also preprocesses the data according to the specified parameter configuration. The parameter settings in Table 2 of the P-CIT Toolbox Manual show that the predictor variable can be z-scored and that outliers can be dropped. We can also generate bootstrap data, scramble the dependent variable, scale the predictor variable between 0 and 1 (a mandatory step), and restrict the analysis to one or more categories while leaving out data from irrelevant categories. Trials whose predictor variable is NaN are filtered out (their rows removed) before z-scoring, outlier removal and scaling, and these rows are appended back to the data matrix after those steps. The order of preprocessing depends on the type of analysis.
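A minimal NumPy sketch of the NaN handling described above, assuming the data matrix is a plain 2-D array with the predictor in a known column. The helper names `preprocess_non_nan_rows` and `scale_predictor` are hypothetical and not part of pcitpy. Rows with a NaN predictor are set aside, the transform runs on the remaining rows, and the set-aside rows are stacked back at the end.

```python
import numpy as np

def preprocess_non_nan_rows(data, predictor_col, transform):
    """Hypothetical helper: apply `transform` only to rows with a non-NaN
    predictor, then append the NaN-predictor rows after the processed rows."""
    data = np.asarray(data, dtype=float)
    is_nan = np.isnan(data[:, predictor_col])
    processed = transform(data[~is_nan].copy(), predictor_col)  # z-score/outliers/scaling go here
    return np.vstack([processed, data[is_nan]])                 # NaN rows re-appended at the end

def scale_predictor(rows, col):
    """Example transform: min-max scale the predictor column into [0, 1]."""
    x = rows[:, col]
    rows[:, col] = (x - x.min()) / (x.max() - x.min())
    return rows

# Columns (assumed layout): subject id, predictor (NaN on the second trial).
data = [[1, 2.0], [1, np.nan], [2, 4.0], [2, 6.0]]
result = preprocess_non_nan_rows(data, predictor_col=1, transform=scale_predictor)
# result rows: [1, 0.0], [2, 0.5], [2, 1.0], [1, nan]
```

Because the NaN rows never enter the transform, they cannot distort the mean, standard deviation, minimum or maximum used by the preprocessing steps.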
Simple data analysis (both think/no-think and simulated data) uses this order of preprocessing:
- Filter out irrelevant category data entries (rows) from the data matrix
- Drop outliers in the predictor variable, if `drop_outliers` > 0
- Z-score predictor-variable data within subjects, if `zscore_within_subjects` is TRUE
- Scale predictor variable between 0 and 1
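The four steps for simple data analysis can be sketched as below. This is an illustration under stated assumptions, not pcitpy's code: all function names are hypothetical, the `drop_outliers` setting is treated as a standard-deviation threshold (consistent with the `> 0` switch above, but still an assumption), z-scoring uses the sample standard deviation, and NaN-predictor rows are assumed to have been set aside already.

```python
import numpy as np

def category_mask(category, keep):
    """Boolean mask of rows whose category is one of the analysed categories."""
    return np.isin(category, keep)

def outlier_mask(x, n_sd):
    """Mask of rows within n_sd sample SDs of the mean; n_sd <= 0 keeps every row.
    Assumption: the threshold is computed over all remaining rows."""
    if n_sd <= 0:
        return np.ones(x.shape, dtype=bool)
    return np.abs(x - x.mean()) <= n_sd * x.std(ddof=1)

def zscore_by_subject(subject_ids, x):
    """Z-score the predictor separately for each subject (sample SD, ddof=1)."""
    z = np.empty_like(x)
    for s in np.unique(subject_ids):
        m = subject_ids == s
        z[m] = (x[m] - x[m].mean()) / x[m].std(ddof=1)
    return z

def scale_0_1(x):
    """Min-max scale the predictor into [0, 1] (the mandatory step)."""
    return (x - x.min()) / (x.max() - x.min())

def simple_preprocess(subject_ids, category, x, keep, n_sd=0, zscore=False):
    """Apply the steps in the order listed for simple data analysis."""
    subject_ids, category, x = (np.asarray(a) for a in (subject_ids, category, x))
    x = x.astype(float)
    m = category_mask(category, keep)        # 1. drop irrelevant categories
    subject_ids, x = subject_ids[m], x[m]
    m = outlier_mask(x, n_sd)                # 2. drop outliers (if n_sd > 0)
    subject_ids, x = subject_ids[m], x[m]
    if zscore:                               # 3. optional within-subject z-score
        x = zscore_by_subject(subject_ids, x)
    return subject_ids, scale_0_1(x)         # 4. mandatory scaling

ids, scaled = simple_preprocess([1, 1, 1, 2, 2, 2], [1, 1, 2, 1, 1, 1],
                                [1, 2, 9, 3, 5, 7], keep=[1], zscore=True)
```

The helpers return masks rather than filtered arrays so that subject IDs and predictor values stay aligned after every row-dropping step.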
Bootstrap data analysis uses this order of preprocessing:
- Generate bootstrap data from the original data matrix (see the "Nonparametric statistical tests" section of the main paper, and Section 4.8 of the Manual).
- Filter out irrelevant category data entries (rows) from the data matrix
- Drop outliers in the predictor variable, if `drop_outliers` > 0
- Z-score predictor-variable data within subjects, if `zscore_within_subjects` is TRUE
- Scale predictor variable between 0 and 1
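Because bootstrapping comes first, every later step operates on the resampled data. Section 4.8 of the Manual describes the actual procedure; the sketch below assumes a subject-level bootstrap (subjects drawn with replacement and relabelled so that a subject drawn twice counts as two subjects), which may not match pcitpy exactly. `bootstrap_subjects` is a hypothetical name.

```python
import numpy as np

def bootstrap_subjects(subject_ids, rng):
    """Hypothetical subject-level bootstrap: returns the row indices of one
    resample plus new subject labels, so repeated draws stay distinct."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    drawn = rng.choice(subjects, size=subjects.size, replace=True)
    rows, new_ids = [], []
    for new_label, s in enumerate(drawn):
        idx = np.flatnonzero(subject_ids == s)       # all trials of the drawn subject
        rows.append(idx)
        new_ids.append(np.full(idx.size, new_label))  # fresh label for this draw
    return np.concatenate(rows), np.concatenate(new_ids)

rng = np.random.default_rng(0)
subject_ids = np.array([3, 3, 5, 5, 5, 7])
rows, new_ids = bootstrap_subjects(subject_ids, rng)
# data[rows] would be the bootstrap data matrix, with new_ids as its subject column
```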
Scramble data analysis uses this order of preprocessing:
- Filter out irrelevant category data entries (rows) from the data matrix
- Drop outliers in the predictor variable, if `drop_outliers` > 0
- Z-score predictor-variable data within subjects, if `zscore_within_subjects` is TRUE
- Scale predictor variable between 0 and 1
- Scramble the dependent variable depending on the scrambling technique (see the "Nonparametric statistical tests" section of the main paper, and Section 4.8 of the Manual).
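The three orderings share the same four core steps and differ only in where resampling happens: bootstrapping precedes them and scrambling follows them. A small sketch encoding that, with hypothetical step names (the outlier and z-scoring steps still run only when their settings enable them):

```python
CORE_STEPS = ["filter_categories", "drop_outliers", "zscore_within_subjects", "scale_0_1"]

def preprocessing_order(analysis):
    """Return the preprocessing steps, in order, for a given analysis type."""
    if analysis == "simple":
        return list(CORE_STEPS)
    if analysis == "bootstrap":
        return ["bootstrap"] + CORE_STEPS                # resample before preprocessing
    if analysis == "scramble":
        return CORE_STEPS + ["scramble_dependent_variable"]  # scramble after scaling
    raise ValueError(f"unknown analysis type: {analysis!r}")
```

Scrambling last only permutes the dependent variable, so it leaves the preprocessed predictor untouched.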
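The function can be run on its own, outside the toolbox pipeline, in a single call. For example:

```python
from pcitpy.run_importance_sampler import run_importance_sampler
from pcitpy.preprocessing_setup import preprocessing_setup

# run_sampler=False sets up the data and settings without running the sampler
preprocessing_setup(run_importance_sampler(run_sampler=False))
```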
However, this is not a typical use case: the `importance_sampler` module calls `preprocessing_setup` directly at the start of execution, so running it manually is redundant.
While `preprocessing_setup` is hard to demonstrate in isolation, its helper function `scramble_dependent_variable` is straightforward to illustrate. Because scrambling is random, the output below is only one possible result:
```python
>>> scramble_dependent_variable([1, 0, 1, 0, 0, 0, 1], [3, 5, 3, 7, 7, 5, 8])
array([1., 1., 1., 0., 0., 1., 0.])
```
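The example output is consistent with the second argument grouping trials into clusters (labelled 3, 5, 7 and 8) and the dependent-variable values being permuted across clusters rather than across individual trials: both trials in cluster 3 receive the same value, as do both trials in cluster 5 and both in cluster 7. The sketch below reproduces that behavior under the assumption that every trial in a cluster shares one dependent-variable value; `scramble_by_cluster` is a hypothetical name, not pcitpy's implementation.

```python
import numpy as np

def scramble_by_cluster(dependent_var, clusters, rng=None):
    """Permute one dependent-variable value per cluster across clusters.
    Assumes every trial in a cluster has the same dependent-variable value."""
    rng = np.random.default_rng() if rng is None else rng
    dv = np.asarray(dependent_var, dtype=float)
    clusters = np.asarray(clusters)
    labels, first_idx = np.unique(clusters, return_index=True)
    cluster_values = dv[first_idx]               # one value per cluster
    permuted = rng.permutation(cluster_values)   # shuffle values across clusters
    out = np.empty_like(dv)
    for label, value in zip(labels, permuted):
        out[clusters == label] = value           # copy back to every trial in the cluster
    return out

scrambled = scramble_by_cluster([1, 0, 1, 0, 0, 0, 1], [3, 5, 3, 7, 7, 5, 8],
                                rng=np.random.default_rng(1))
```

Permuting at the cluster level keeps the number of clusters with each outcome fixed, which the documented output also satisfies (two clusters with 1 and two with 0).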