Missing or incorrect parameters initialized in run_importance_sampler are reset in preprocessing_setup, which also preprocesses the data according to the specified parameter configuration. From the parameter settings in Table 2 of the P-CIT Toolbox Manual, the predictor variable can be z-scored and outliers can be dropped. We can also generate bootstrap data, scramble the dependent variable, scale the predictor variable between 0 and 1 (a mandatory step), and perform the analysis on one or more categories while leaving out data from irrelevant categories. Trials where the predictor variable is set to NaN are filtered out (rows removed) for the purposes of z-scoring, dropping outliers, and scaling; the filtered rows are appended back to the data matrix after those pre-processing steps.

For simple data analysis (which includes both think/no-think and simulated data), the order of pre-processing is as follows (a rough sketch of these steps in code appears after the list):

  1. Filter out irrelevant category data entries (rows) from the data matrix
  2. Drop outliers in the predictor variable, if drop_outliers > 0
  3. Z-score predictor-variable data within subjects, if zscore_within_subjects = True
  4. Scale predictor variable between 0 and 1
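
Concretely, a minimal NumPy sketch of these four steps might look like the following. The column indices, the outlier rule (discard rows whose predictor lies more than drop_outliers standard deviations from the mean), and the name simple_preprocess are illustrative assumptions rather than pcitpy's exact behavior; NaN filtering and re-appending are omitted for brevity.

import numpy as np

SUBJECT_COL, CATEGORY_COL, PREDICTOR_COL = 0, 2, 3  # assumed column layout

def simple_preprocess(data, categories, drop_outliers=3.0, zscore_within_subjects=True):
    data = np.array(data, dtype=float)  # work on a float copy

    # 1. Keep only rows whose category is under analysis
    data = data[np.isin(data[:, CATEGORY_COL], categories)]

    # 2. Drop outliers in the predictor variable (assumed rule)
    if drop_outliers > 0:
        pred = data[:, PREDICTOR_COL]
        data = data[np.abs(pred - pred.mean()) <= drop_outliers * pred.std()]

    # 3. Z-score the predictor variable separately within each subject
    if zscore_within_subjects:
        for subject in np.unique(data[:, SUBJECT_COL]):
            rows = data[:, SUBJECT_COL] == subject
            vals = data[rows, PREDICTOR_COL]
            data[rows, PREDICTOR_COL] = (vals - vals.mean()) / vals.std()

    # 4. Scale the predictor variable between 0 and 1 (mandatory)
    pred = data[:, PREDICTOR_COL]
    data[:, PREDICTOR_COL] = (pred - pred.min()) / (pred.max() - pred.min())
    return data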

For bootstrap data analysis, the order of pre-processing is:

  1. Generate bootstrap data from the original data matrix (see the "Nonparametric statistical tests" section of the main paper, and Section 4.8 of the Manual; one possible resampling scheme is sketched after this list).
  2. Filter out irrelevant category data entries (rows) from the data matrix
  3. Drop outliers in the predictor variable, if drop_outliers > 0
  4. Z-score predictor-variable data within subjects, if zscore_within_subjects = True
  5. Scale predictor variable between 0 and 1
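
To make step 1 concrete, here is a minimal sketch of one common bootstrap scheme: resampling whole subjects with replacement. The subject-level scheme, the column index SUBJECT_COL, and the name generate_bootstrap_data are assumptions for illustration; Section 4.8 of the Manual describes the toolbox's actual procedure.

import numpy as np

SUBJECT_COL = 0  # assumed: first column holds subject IDs

def generate_bootstrap_data(data, rng=None):
    rng = np.random.default_rng(rng)
    subjects = np.unique(data[:, SUBJECT_COL])
    # Draw as many subjects as the original sample, with replacement
    drawn = rng.choice(subjects, size=subjects.size, replace=True)
    blocks = []
    for new_id, subject in enumerate(drawn):
        # Copy the drawn subject's rows and relabel them, so a subject
        # drawn twice contributes two distinct "subjects"
        block = data[data[:, SUBJECT_COL] == subject].copy()
        block[:, SUBJECT_COL] = new_id
        blocks.append(block)
    return np.vstack(blocks)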

For scramble data analysis, the order of pre-processing is:

  1. Filter out irrelevant category data entries (rows) from the data matrix
  2. Drop outliers in the predictor variable, if drop_outliers > 0
  3. Z-score predictor-variable data within subjects, if zscore_within_subjects = True
  4. Scale predictor variable between 0 and 1
  5. Scramble the dependent variable depending on the scrambling technique (see the "Nonparametric statistical tests" section of the main paper, and Section 4.8 of the Manual).

preprocessing_setup[source]

preprocessing_setup(data, analysis_settings)

Performs sanity checks on the input data and the algorithm parameter struct, then massages the data (e.g. drops outliers, z-scores the predictor variable).

Arguments:

  • data: Input data matrix (total number of trials x 6 columns)
  • analysis_settings: Struct with algorithm parameters

Returns:

  • data: Input data matrix after pre-processing (outlier-free, z-scored, restricted to the relevant categories, etc., as applicable)
  • analysis_settings: Struct with algorithm parameters; some additional parameters are added to this struct as well
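
For orientation, a toy illustration of the inputs' shape follows. The column layout named in the comment and the settings keys are assumptions based on the Manual; a dict this sparse would not pass the sanity checks, so in practice the struct comes from run_importance_sampler.

import numpy as np

# Assumed layout: subject, item, category, predictor variable,
# dependent variable, net effect cluster
toy_data = np.array([
    [1, 1, 1, 0.42, 1, 1],
    [1, 2, 1, 0.87, 0, 2],
    [2, 1, 1, 0.13, 0, 3],
    [2, 2, 1, 0.65, 1, 4],
])

# Hypothetical subset of the settings; the real struct carries many more
# required fields (see Table 2 of the Manual)
toy_settings = {'drop_outliers': 3, 'zscore_within_subjects': True,
                'category': [1], 'bootstrap': False, 'scramble': False}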

The function can be run in isolation from the toolbox pipeline quite concisely. For example:

from pcitpy.preprocessing_setup import preprocessing_setup  # module path assumed to mirror the import below
from pcitpy.run_importance_sampler import run_importance_sampler

# run_importance_sampler(run_sampler=False) presumably returns the
# (data, analysis_settings) pair it would otherwise pass on, so unpack
# it into preprocessing_setup's two arguments
preprocessing_setup(*run_importance_sampler(run_sampler=False))

However, this isn't a typical use case for the function. The importance_sampler module calls preprocessing_setup directly at the start of execution, making manual invocation of preprocessing_setup redundant.

scramble_dependent_variable[source]

scramble_dependent_variable(target_dependent_variables, target_net_effect_clusters, testing=False)

Takes a dependent variable vector and scrambles it such that the net effect cluster groupings are NOT violated.

Arguments:

  • target_dependent_variables: The vector you would like scrambled
  • target_net_effect_clusters: The groupings that must NOT be violated; see the example below

Returns:

  • A scrambled vector

While preprocessing_setup is hard to demonstrate in isolation, its helper function scramble_dependent_variable is straightforward to illustrate:

scramble_dependent_variable([1, 0, 1, 0, 0, 0, 1], [3, 5, 3, 7, 7, 5, 8])

array([1., 1., 1., 0., 0., 1., 0.])
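
In the output, values move between clusters but every trial within a cluster keeps an identical value: positions 0 and 2 share cluster 3 and stay equal, as do positions 1 and 5 (cluster 5) and positions 3 and 4 (cluster 7). A minimal sketch of that idea, assuming within-cluster values are identical to begin with and making no claim about pcitpy's actual implementation:

import numpy as np

def cluster_preserving_scramble(dependent, clusters, rng=None):
    # Hypothetical re-implementation for illustration only
    rng = np.random.default_rng(rng)
    dependent = np.asarray(dependent, dtype=float)
    clusters = np.asarray(clusters)
    unique = np.unique(clusters)
    # One representative value per cluster (within-cluster values are
    # assumed identical), shuffled across clusters
    values = np.array([dependent[clusters == c][0] for c in unique])
    rng.shuffle(values)
    # Broadcast each cluster's shuffled value back to all of its trials
    scrambled = np.empty_like(dependent)
    for c, v in zip(unique, values):
        scrambled[clusters == c] = v
    return scrambled

# Yields one cluster-preserving permutation of the example above
cluster_preserving_scramble([1, 0, 1, 0, 0, 0, 1], [3, 5, 3, 7, 7, 5, 8])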