Model criticism for growth curve models via posterior predictive model checking

154063-Thumbnail Image.png
Description
Although models for describing longitudinal data have become increasingly sophisticated, the criticism of even foundational growth curve models remains challenging. The challenge arises from the need to disentangle data-model misfit at multiple and interrelated levels of analysis. Using posterior predictive

Although models for describing longitudinal data have become increasingly sophisticated, the criticism of even foundational growth curve models remains challenging. The challenge arises from the need to disentangle data-model misfit at multiple and interrelated levels of analysis. Using posterior predictive model checking (PPMC)—a popular Bayesian framework for model criticism—the performance of several discrepancy functions was investigated in a Monte Carlo simulation study. The discrepancy functions of interest included two types of conditional concordance correlation (CCC) functions, two types of R2 functions, two types of standardized generalized dimensionality discrepancy (SGDDM) functions, the likelihood ratio (LR), and the likelihood ratio difference test (LRT). Key outcomes included effect sizes of the design factors on the realized values of discrepancy functions, distributions of posterior predictive p-values (PPP-values), and the proportion of extreme PPP-values.

In terms of the realized values, the behavior of the CCC and R2 functions were generally consistent with prior research. However, as diagnostics, these functions were extremely conservative even when some aspect of the data was unaccounted for. In contrast, the conditional SGDDM (SGDDMC), LR, and LRT were generally sensitive to the underspecifications investigated in this work on all outcomes considered. Although the proportions of extreme PPP-values for these functions tended to increase in null situations for non-normal data, this behavior may have reflected the true misfit that resulted from the specification of normal prior distributions. Importantly, the LR and the SGDDMC to a greater extent exhibited some potential for untangling the sources of data-model misfit. Owing to connections of growth curve models to the more fundamental frameworks of multilevel modeling, structural equation models with a mean structure, and Bayesian hierarchical models, the results of the current work may have broader implications that warrant further research.
Date Created
2015
Agent

Sample size and test length minima for DIMTEST with conditional covariance-based subtest selection

150934-Thumbnail Image.png
Description
The existing minima for sample size and test length recommendations for DIMTEST (750 examinees and 25 items) are tied to features of the procedure that are no longer in use. The current version of DIMTEST uses a bootstrapping procedure to

The existing minima for sample size and test length recommendations for DIMTEST (750 examinees and 25 items) are tied to features of the procedure that are no longer in use. The current version of DIMTEST uses a bootstrapping procedure to remove bias from the test statistic and is packaged with a conditional covariance-based procedure called ATFIND for partitioning test items. Key factors such as sample size, test length, test structure, the correlation between dimensions, and strength of dependence were manipulated in a Monte Carlo study to assess the effectiveness of the current version of DIMTEST with fewer examinees and items. In addition, the DETECT program was also used to partition test items; a second feature of this study also compared the structure of test partitions obtained with ATFIND and DETECT in a number of ways. With some exceptions, the performance of DIMTEST was quite conservative in unidimensional conditions. The performance of DIMTEST in multidimensional conditions depended on each of the manipulated factors, and did suggest that the minima of sample size and test length can be made lower for some conditions. In terms of partitioning test items in unidimensional conditions, DETECT tended to produce longer assessment subtests than ATFIND in turn yielding different test partitions. In multidimensional conditions, test partitions became more similar and were more accurate with increased sample size, for factorially simple data, greater strength of dependence, and a decreased correlation between dimensions. Recommendations for sample size and test length minima are provided along with suggestions for future research.
Date Created
2012
Agent