Missing or incorrect parameters initialized in run_importance_sampler are reset in preprocessing_setup, which also preprocesses the data according to the specified parameter configuration. From the parameter settings in Table 2 of the P-CIT Toolbox Manual, the predictor variable can be z-scored and outliers can be dropped. We can also generate bootstrap data, scramble the dependent variable, scale the predictor variable between 0 and 1 (a mandatory step), and perform the analysis on one or more categories while leaving out data from irrelevant categories. Trials where the predictor variable is set to NaN are filtered out (rows removed) for the purposes of z-scoring, dropping outliers, and scaling; the filtered rows are appended back to the data matrix after those pre-processing steps.

For simple data analysis (which includes both think/no-think and simulated data), the order of pre-processing is as follows (a rough sketch of these steps in code appears after the list):

  1. Filter out irrelevant category data entries (rows) from the data matrix
  2. Drop outliers in the predictor variable, if drop_outliers > 0
  3. Z-score predictor-variable data within subjects, if zscore_within_subjects = True
  4. Scale predictor variable between 0 and 1
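
Concretely, a minimal NumPy sketch of these four steps might look like the following. The column indices, the outlier rule (discard rows whose predictor lies more than drop_outliers standard deviations from the mean), and the name simple_preprocess are illustrative assumptions rather than pcitpy's exact behavior; NaN filtering and re-appending are omitted for brevity.

import numpy as np

SUBJECT_COL, CATEGORY_COL, PREDICTOR_COL = 0, 2, 3  # assumed column layout

def simple_preprocess(data, categories, drop_outliers=3.0, zscore_within_subjects=True):
    data = np.array(data, dtype=float)  # work on a float copy

    # 1. Keep only rows whose category is under analysis
    data = data[np.isin(data[:, CATEGORY_COL], categories)]

    # 2. Drop outliers in the predictor variable (assumed rule)
    if drop_outliers > 0:
        pred = data[:, PREDICTOR_COL]
        data = data[np.abs(pred - pred.mean()) <= drop_outliers * pred.std()]

    # 3. Z-score the predictor variable separately within each subject
    if zscore_within_subjects:
        for subject in np.unique(data[:, SUBJECT_COL]):
            rows = data[:, SUBJECT_COL] == subject
            vals = data[rows, PREDICTOR_COL]
            data[rows, PREDICTOR_COL] = (vals - vals.mean()) / vals.std()

    # 4. Scale the predictor variable between 0 and 1 (mandatory)
    pred = data[:, PREDICTOR_COL]
    data[:, PREDICTOR_COL] = (pred - pred.min()) / (pred.max() - pred.min())
    return data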

For bootstrap data analysis, the order of pre-processing is:

  1. Generate bootstrap data from the original data matrix (see the "Nonparametric statistical tests" section of the main paper, and Section 4.8 of the Manual; one possible resampling scheme is sketched after this list).
  2. Filter out irrelevant category data entries (rows) from the data matrix
  3. Drop outliers in the predictor variable, if drop_outliers > 0
  4. Z-score predictor-variable data within subjects, if zscore_within_subjects = True
  5. Scale predictor variable between 0 and 1
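
To make step 1 concrete, here is a minimal sketch of one common bootstrap scheme: resampling whole subjects with replacement. The subject-level scheme, the column index SUBJECT_COL, and the name generate_bootstrap_data are assumptions for illustration; Section 4.8 of the Manual describes the toolbox's actual procedure.

import numpy as np

SUBJECT_COL = 0  # assumed: first column holds subject IDs

def generate_bootstrap_data(data, rng=None):
    rng = np.random.default_rng(rng)
    subjects = np.unique(data[:, SUBJECT_COL])
    # Draw as many subjects as the original sample, with replacement
    drawn = rng.choice(subjects, size=subjects.size, replace=True)
    blocks = []
    for new_id, subject in enumerate(drawn):
        # Copy the drawn subject's rows and relabel them, so a subject
        # drawn twice contributes two distinct "subjects"
        block = data[data[:, SUBJECT_COL] == subject].copy()
        block[:, SUBJECT_COL] = new_id
        blocks.append(block)
    return np.vstack(blocks)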

For scramble data analysis, the order of pre-processing is:

  1. Filter out irrelevant category data entries (rows) from the data matrix
  2. Drop outliers in the predictor variable, if drop_outliers > 0
  3. Z-score predictor-variable data within subjects, if zscore_within_subjects = True
  4. Scale predictor variable between 0 and 1
  5. Scramble the dependent variable depending on the scrambling technique (see the "Nonparametric statistical tests" section of the main paper, and Section 4.8 of the Manual).

preprocessing_setup[source]

preprocessing_setup(data, analysis_settings)

Performs sanity checks on the input data and the algorithm parameter struct, then massages the data (e.g. drops outliers, z-scores the predictor variable).

Arguments:

  • data: Input data matrix (total number of trials x 6 columns)
  • analysis_settings: Struct with algorithm parameters

Returns:

  • data: Input data matrix after pre-processing (outlier-free, z-scored, restricted to the relevant categories, etc., as applicable)
  • analysis_settings: Struct with algorithm parameters; some additional parameters are added to this struct as well
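
For orientation, a toy illustration of the inputs' shape follows. The column layout named in the comment and the settings keys are assumptions based on the Manual; a dict this sparse would not pass the sanity checks, so in practice the struct comes from run_importance_sampler.

import numpy as np

# Assumed layout: subject, item, category, predictor variable,
# dependent variable, net effect cluster
toy_data = np.array([
    [1, 1, 1, 0.42, 1, 1],
    [1, 2, 1, 0.87, 0, 2],
    [2, 1, 1, 0.13, 0, 3],
    [2, 2, 1, 0.65, 1, 4],
])

# Hypothetical subset of the settings; the real struct carries many more
# required fields (see Table 2 of the Manual)
toy_settings = {'drop_outliers': 3, 'zscore_within_subjects': True,
                'category': [1], 'bootstrap': False, 'scramble': False}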

The function can be run in isolation from the toolbox pipeline quite concisely. For example:

from pcitpy.preprocessing_setup import preprocessing_setup  # module path assumed to mirror the import below
from pcitpy.run_importance_sampler import run_importance_sampler

# run_importance_sampler(run_sampler=False) presumably returns the
# (data, analysis_settings) pair it would otherwise pass on, so unpack
# it into preprocessing_setup's two arguments
preprocessing_setup(*run_importance_sampler(run_sampler=False))

However, this isn't a typical use case for the function. The importance_sampler module calls preprocessing_setup directly at the start of execution, making manual invocation of preprocessing_setup redundant.

scramble_dependent_variable[source]

scramble_dependent_variable(target_dependent_variables, target_net_effect_clusters, testing=False)

Takes a dependent variable vector and scrambles it such that the net effect cluster groupings are NOT violated.

Arguments:

  • target_dependent_variables: The vector you would like scrambled
  • target_net_effect_clusters: The groupings that must NOT be violated; see the example below

Returns:

  • A scrambled vector

While preprocessing_setup is hard to demonstrate in isolation, its helper function scramble_dependent_variable is straightforward to illustrate:

scramble_dependent_variable([1, 0, 1, 0, 0, 0, 1], [3, 5, 3, 7, 7, 5, 8])

array([1., 1., 1., 0., 0., 1., 0.])
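
In the output, values move between clusters but every trial within a cluster keeps an identical value: positions 0 and 2 share cluster 3 and stay equal, as do positions 1 and 5 (cluster 5) and positions 3 and 4 (cluster 7). A minimal sketch of that idea, assuming within-cluster values are identical to begin with and making no claim about pcitpy's actual implementation:

import numpy as np

def cluster_preserving_scramble(dependent, clusters, rng=None):
    # Hypothetical re-implementation for illustration only
    rng = np.random.default_rng(rng)
    dependent = np.asarray(dependent, dtype=float)
    clusters = np.asarray(clusters)
    unique = np.unique(clusters)
    # One representative value per cluster (within-cluster values are
    # assumed identical), shuffled across clusters
    values = np.array([dependent[clusters == c][0] for c in unique])
    rng.shuffle(values)
    # Broadcast each cluster's shuffled value back to all of its trials
    scrambled = np.empty_like(dependent)
    for c, v in zip(unique, values):
        scrambled[clusters == c] = v
    return scrambled

# Yields one cluster-preserving permutation of the example above
cluster_preserving_scramble([1, 0, 1, 0, 0, 0, 1], [3, 5, 3, 7, 7, 5, 8])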