Optimal Experimental Designs for Mixed Categorical and Continuous Responses

Description
This study concerns optimal designs for experiments where responses consist of both binary and continuous variables. Many experiments in engineering, medical studies, and other fields have such mixed responses. Although several statistical methods have been developed in recent decades for jointly modeling both types of response variables, an effective way to design such experiments remains unclear. To address this gap, some useful results are developed to guide the selection of optimal experimental designs in such studies. The results are mainly built upon a powerful tool called the complete class approach and a nonlinear optimization algorithm. The complete class approach was originally developed for a univariate response, but here it is extended to the case of bivariate responses of mixed variable types. Consequently, the number of candidate designs is significantly reduced. An optimization algorithm is then applied to efficiently search this small class of candidate designs for the D- and A-optimal designs. Furthermore, the optimality of the obtained designs is verified by the general equivalence theorem. The first part of the study focuses on a simple, first-order model; the study is then expanded to a model with a quadratic polynomial predictor. The obtained designs can help to render precise statistical inference in practice or serve as a benchmark for evaluating the quality of other designs.
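To make the D- and A-criteria and the equivalence-theorem check concrete, here is a minimal sketch for a much simpler case than the study's: a univariate binary response under a first-order logistic model, with assumed local parameter guesses b0 = 0 and b1 = 1. It does not implement the bivariate mixed-response model. For this simple model the locally D-optimal design is known to put equal weight on the two points where b0 + b1*x = ±1.5434; the sensitivity function should then stay at or below 2 (the number of parameters) everywhere.

```python
import math


def lam(x, b0=0.0, b1=1.0):
    """Logistic-model efficiency weight p(x)(1 - p(x)); b0, b1 are assumed local guesses."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p * (1.0 - p)


def info_matrix(points, weights, b0=0.0, b1=1.0):
    """M(xi) = sum_i w_i * lam(x_i) * f(x_i) f(x_i)^T with regressor f(x) = (1, x)."""
    m00 = m01 = m11 = 0.0
    for x, w in zip(points, weights):
        c = w * lam(x, b0, b1)
        m00 += c
        m01 += c * x
        m11 += c * x * x
    return [[m00, m01], [m01, m11]]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def log_det(m):
    """D-criterion: log det M(xi), larger is better."""
    return math.log(m[0][0] * m[1][1] - m[0][1] * m[1][0])


def a_value(m):
    """A-criterion: trace of M(xi)^{-1}, smaller is better."""
    inv = inv2(m)
    return inv[0][0] + inv[1][1]


def sensitivity(x, m, b0=0.0, b1=1.0):
    """d(x) = lam(x) f(x)^T M^{-1} f(x). By the general equivalence theorem the design
    is locally D-optimal iff max_x d(x) <= 2, with equality at the support points."""
    inv = inv2(m)
    f = (1.0, x)
    quad = sum(f[i] * inv[i][j] * f[j] for i in range(2) for j in range(2))
    return lam(x, b0, b1) * quad


if __name__ == "__main__":
    grid = [i / 100 for i in range(-600, 601)]
    for pts in ([-1.5434, 1.5434], [-1.0, 1.0]):
        m = info_matrix(pts, [0.5, 0.5])
        worst = max(sensitivity(x, m) for x in grid)
        print(pts, round(log_det(m), 4), round(a_value(m), 4), round(worst, 3))
```

The second design (±1) is included for contrast: its sensitivity function rises above 2 beyond the support points, which is how the equivalence theorem flags a non-optimal design.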
Date Created
2017

Optimum Experimental Design Issues in Functional Neuroimaging Studies

Description
Functional magnetic resonance imaging (fMRI) is a popular tool for studying human brain function. High-quality experimental designs are crucial to the success of fMRI experiments because they allow the collection of informative data for making precise and valid inference at minimum cost. The primary goal of this study is to identify the best sequence of mental stimuli (i.e., the fMRI design) with respect to statistically meaningful optimality criteria. This work focuses on two related topics in this research field. The first topic is finding optimal designs for fMRI when the design matrix is uncertain. This challenging design issue occurs in many modern fMRI experiments, in which the design matrix of the statistical model depends on both the selected design and the experimental subject's uncertain behavior during the experiment. As a result, the design matrix cannot be fully determined at the design stage, which makes it difficult to select a good design. For the commonly used linear model with autoregressive errors, this study proposes a very efficient approach for obtaining high-quality fMRI designs for such experiments. The proposed approach is built upon an analytical result and an efficient computer algorithm. Case studies show that the proposed approach can outperform the existing method in terms of both computing time and the quality of the obtained designs. The second topic is finding optimal fMRI designs when a wavelet-based technique is used in the fMRI data analysis. An efficient computer algorithm to search for optimal fMRI designs in such cases is developed. This algorithm is inspired by simulated annealing and a recently proposed algorithm by Saleh et al. (2017). As demonstrated in the case studies, the proposed approach makes it possible to efficiently obtain high-quality designs for fMRI studies and is practically useful.
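As a rough illustration of the simulated-annealing idea mentioned above, the sketch below searches over ±1-coded stimulus sequences of one stimulus type. It is a generic annealer, not the authors' algorithm or that of Saleh et al. (2017), and the objective `autocorr_penalty` (sum of squared off-peak cyclic autocorrelations) is a hypothetical stand-in for a real design criterion such as one derived from the wavelet-based analysis; the step count, starting temperature, and cooling rate are likewise assumptions.

```python
import math
import random


def autocorr_penalty(seq):
    """Hypothetical surrogate criterion (smaller is better): sum over nonzero lags
    of the squared cyclic autocorrelation of a +/-1 stimulus sequence."""
    n = len(seq)
    return sum(
        sum(seq[t] * seq[(t + lag) % n] for t in range(n)) ** 2
        for lag in range(1, n)
    )


def anneal(n, objective, steps=5000, t0=5.0, cooling=0.999, seed=0):
    """Generic simulated annealing over +/-1 sequences of length n (minimization)."""
    rng = random.Random(seed)
    current = [rng.choice((-1, 1)) for _ in range(n)]
    cur_val = objective(current)
    best, best_val = current[:], cur_val
    temp = t0
    for _ in range(steps):
        cand = current[:]
        i = rng.randrange(n)
        cand[i] = -cand[i]  # neighbouring design: flip one time slot
        val = objective(cand)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if val <= cur_val or rng.random() < math.exp((cur_val - val) / temp):
            current, cur_val = cand, val
            if cur_val < best_val:
                best, best_val = current[:], cur_val
        temp *= cooling  # geometric cooling schedule
    return best, best_val
```

For example, `anneal(7, autocorr_penalty)` returns the best length-7 sequence found and its penalty; swapping in a different objective function is the only change needed to target another criterion.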
Date Created
2017

fMRI design under autoregressive model with one type of stimulus

Description
Functional magnetic resonance imaging (fMRI) is used to study brain activity in response to stimuli presented to subjects in a scanner. It is important both to conduct statistical inference on the resulting time-series fMRI data and to select optimal designs for practical experiments. Design selection under autoregressive models has not been thoroughly discussed before. This paper derives general information matrices for orthogonal designs under an autoregressive model with an arbitrary number of correlation coefficients. We further provide the minimum trace of orthogonal circulant designs under the AR(1) model, which is used as a criterion to compare practical designs such as M-sequence designs and circulant (almost) orthogonal array designs. We also explore optimal designs under the AR(2) model. In practice, there can be more than one type of stimulus, but in this paper we consider only the simplest situation, with a single type of stimulus.
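The following is a minimal sketch of how an information matrix under AR(1) errors can be computed for a single-stimulus design, not the paper's derivation. It assumes unit innovation variance, stimuli coded ±1, and a hypothetical circulant design matrix whose k columns are cyclic shifts of the stimulus sequence. It uses the closed-form tridiagonal inverse of the stationary AR(1) covariance (|rho| < 1) to form M = X^T V^{-1} X and then reports trace(M^{-1}), the trace-type quantity used in the abstract to compare designs.

```python
def ar1_inner(x, y, rho):
    """Return x^T V^{-1} y for stationary AR(1) errors e_t = rho*e_{t-1} + eps_t, Var(eps_t) = 1.

    V^{-1} is tridiagonal: diagonal (1, 1 + rho^2, ..., 1 + rho^2, 1), off-diagonals -rho,
    so the quadratic form is computed in O(n) without building V.
    """
    n = len(x)
    total = sum(x[t] * y[t] for t in range(n))
    total += rho ** 2 * sum(x[t] * y[t] for t in range(1, n - 1))
    total -= rho * sum(x[t] * y[t + 1] + x[t + 1] * y[t] for t in range(n - 1))
    return total


def circulant_columns(seq, k):
    """Hypothetical circulant design: column j is seq cyclically shifted by j positions."""
    n = len(seq)
    return [[seq[(t - j) % n] for t in range(n)] for j in range(k)]


def information_matrix(seq, k, rho):
    """M = X^T V^{-1} X for the circulant design built from seq."""
    cols = circulant_columns(seq, k)
    return [[ar1_inner(cols[i], cols[j], rho) for j in range(k)] for i in range(k)]


def trace_of_inverse(m):
    """trace(M^{-1}) via Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(m)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return sum(a[i][k + i] for i in range(k))
```

For the length-7 M-sequence 1110100 coded as [1, 1, 1, -1, 1, -1, -1] with k = 2 and rho = 0, the matrix is [[7, -1], [-1, 7]] (every off-peak cyclic autocorrelation of an M-sequence equals -1), so trace(M^{-1}) = 14/48. Setting rho to a nonzero value shows how serial correlation changes the comparison between candidate sequences.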
Date Created
2017

An information based optimal subdata selection algorithm for big data linear regression and a suitable variable selection algorithm

Description
This article proposes a new information-based optimal subdata selection (IBOSS) algorithm, the Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains the nice asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm. However, it exploits vectorized calculation, avoiding for loops, and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA is studied from three aspects: nonorthogonality, inclusion of interaction terms, and variable misspecification. A new, accurate variable selection algorithm is proposed to help implement IBOSS algorithms when a large number of variables are present and only a sparse subset of them is important. By aggregating results from random subsamples, this variable selection algorithm is much more accurate than the LASSO method applied to the full data. Since its time complexity depends only on the number of variables, it is also computationally efficient when the number of variables is fixed as n increases and is not massively large. More importantly, because it works with subsamples, it solves the problem of full data that cannot be stored in memory when a data set is too large.
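The abstract benchmarks SSDA against the D-optimal IBOSS algorithm but does not spell out SSDA's selection rule, so the sketch below shows only the baseline: the D-optimal IBOSS rule as it is commonly described, which takes, for each covariate in turn, the r = k // (2p) not-yet-selected rows with the smallest values and the r rows with the largest values. The function name and list-of-rows input format are assumptions for illustration.

```python
import heapq


def iboss_d_optimal(X, k):
    """Select about k row indices from X (a list of n rows with p covariates each).

    For each covariate j in turn, take the r = k // (2p) remaining rows with the
    smallest values of column j and the r remaining rows with the largest values.
    """
    n, p = len(X), len(X[0])
    r = k // (2 * p)
    remaining = set(range(n))
    selected = []
    for j in range(p):
        key = lambda i, j=j: X[i][j]  # bind the current column index
        low = heapq.nsmallest(r, remaining, key=key)
        remaining.difference_update(low)
        high = heapq.nlargest(r, remaining, key=key)
        remaining.difference_update(high)
        selected.extend(low + high)
    return selected
```

Using `heapq` keeps each pass at O(n log r). A production implementation would typically use a linear-time partition-based selection, which is what gives IBOSS its O(np) cost.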
Date Created
2017

A power study of GFfit statistics as components of Pearson chi-square

Description
The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data come from a table formed by cross-classifying a large number of variables, these common statistics may have low power and an inaccurate Type I error level because of sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit using a new version of the GFfit statistic, based on orthogonal components of the Pearson chi-square, as a diagnostic for examining fit on two-way subtables. However, when variables have a large number of categories and the sample size is small, even the GFfit statistic may have low power and an inaccurate Type I error level because of sparseness in the two-way subtables. In this dissertation, the theoretical and empirical power of the GFfit statistic are studied. A method based on subsets of the orthogonal components of the GFfit statistic on the subtables is developed to improve its performance. Simulation results for power and Type I error rate in several different cases, along with comparisons to other diagnostics, are presented.
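For reference, here is a minimal sketch of the two standard statistics the abstract starts from, the Pearson X^2 and the likelihood ratio G^2, computed for a two-way table under the independence model. It does not implement GFfit or its orthogonal-component decomposition; it only makes visible the expected counts whose smallness (sparseness) degrades these statistics.

```python
import math


def independence_fit(table):
    """Pearson X^2, likelihood-ratio G^2, and degrees of freedom for testing
    independence in a two-way table of counts (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    x2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # fitted count under independence
            x2 += (obs - expected) ** 2 / expected
            if obs > 0:  # the 0 * log(0) term is taken as 0
                g2 += 2.0 * obs * math.log(obs / expected)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return x2, g2, df
```

For the table [[10, 20], [20, 10]] every expected count is 15, so X^2 = 4 * 25 / 15 = 20/3 with 1 degree of freedom. When many expected counts fall near or below 1, as in sparse cross-classifications, the chi-square reference distribution for both statistics becomes unreliable.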
Date Created
2017

Threshold regression estimation via lasso, elastic-net, and lad-lasso: a simulation study with applications to urban traffic data

Description
Threshold regression is used to model regime-switching dynamics, where the effects of the explanatory variables in predicting the response variable depend on whether a certain threshold has been crossed. When regime-switching dynamics are present, new estimation problems arise related to estimating the value of the threshold. Conventional methods use an iterative search procedure that seeks to minimize the sum of squares criterion. However, when unnecessary variables are included in the model, or certain variables drop out of the model depending on the regime, this method may have high variability. This paper proposes Lasso-type methods as an alternative to ordinary least squares. By incorporating an L1 penalty term, Lasso methods perform variable selection, thus potentially reducing some of the variance in estimating the threshold parameter. This paper discusses the results of a simulation study of two different underlying model structures. The first is a regression model with correlated predictors, whereas the second is a self-exciting threshold autoregressive model. Finally, the proposed Lasso-type methods are compared to conventional methods in an application to urban traffic data.
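The conventional estimator described above can be sketched as a grid search: for each candidate threshold, fit ordinary least squares separately in the two regimes and keep the threshold with the smallest total sum of squared residuals. This minimal version assumes a single regressor, candidate thresholds taken from the observed values of the threshold variable, and a 15% trimming fraction per regime (all illustrative choices). The Lasso-type variants in the abstract would replace the per-regime OLS fit with a penalized fit; that is not shown here.

```python
def ols_ssr(xs, ys):
    """Residual sum of squares of a simple linear regression y = a + b*x.
    Assumes xs is not constant within the regime."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def threshold_grid_search(x, y, q, trim=0.15):
    """Return (tau, ssr) minimizing the two-regime SSR, where regime 1 is q <= tau.
    Candidates leaving fewer than trim * n points in either regime are skipped."""
    n = len(y)
    min_size = int(trim * n)
    best = None
    for tau in sorted(set(q)):
        left = [i for i in range(n) if q[i] <= tau]
        right = [i for i in range(n) if q[i] > tau]
        if len(left) < min_size or len(right) < min_size:
            continue
        ssr = (ols_ssr([x[i] for i in left], [y[i] for i in left])
               + ols_ssr([x[i] for i in right], [y[i] for i in right]))
        if best is None or ssr < best[1]:
            best = (tau, ssr)
    return best
```

On noise-free data generated with a threshold at q = 0.5, the search returns tau = 0.5 with an essentially zero sum of squares. With noisy data and irrelevant regressors, the profile of SSR over tau becomes flat and erratic, which is the variability the Lasso-type methods aim to reduce.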
Date Created
2015

Saturated Locally Optimal Designs Under Differentiable Optimality Criteria

Description
We develop general theory for finding locally optimal designs in a class of single-covariate models under any differentiable optimality criterion. Yang and Stufken [Ann. Statist. 40 (2012) 1665–1681] and Dette and Schorning [Ann. Statist. 41 (2013) 1260–1267] gave complete class results for optimal designs under such models. Based on their results, saturated optimal designs exist; however, how to find such designs has not been addressed. We develop tools to find saturated optimal designs, and also prove their uniqueness under mild conditions.
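The abstract does not say how saturated designs are found, but for the D-criterion alone the optimal weights of a saturated design follow from a short determinant argument. The derivation below assumes the information matrix has the common form M(ξ) = Σ w_i λ(x_i) f(x_i) f(x_i)^T, with a positive efficiency function λ and a p-dimensional regressor f. This form is an assumption, not taken from the abstract. With the p support points fixed, only the product of the weights depends on w, so the D-optimal weights are equal. Under other differentiable criteria, such as A-optimality, the optimal weights of a saturated design generally need not be equal, so support points and weights must be determined jointly.

```latex
% Saturated design: p support points x_1, ..., x_p for p parameters.
\begin{aligned}
M(\xi) &= F^\top \Lambda W F, \qquad
F = \begin{pmatrix} f(x_1)^\top \\ \vdots \\ f(x_p)^\top \end{pmatrix},\;
\Lambda = \operatorname{diag}\{\lambda(x_i)\},\;
W = \operatorname{diag}\{w_i\},\\
\det M(\xi) &= (\det F)^2 \prod_{i=1}^{p} \lambda(x_i) \prod_{i=1}^{p} w_i,\\
\prod_{i=1}^{p} w_i &\le \Big(\tfrac{1}{p}\sum_{i=1}^{p} w_i\Big)^{p} = p^{-p}
\quad\Longrightarrow\quad w_1 = \cdots = w_p = \tfrac{1}{p}
\quad \text{(AM-GM, equality iff all } w_i \text{ are equal)}.
\end{aligned}
```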

Date Created
2015-02-01

Robust experimental designs for fMRI with an uncertain design matrix

Description
Obtaining high-quality experimental designs to optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Previous studies have provided guidance and powerful computational tools for obtaining good fMRI designs. However, these results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiment is considered in which the design matrix of the statistical model depends not only on the selected design but also on the experimental subject's probabilistic behavior during the experiment. The design matrix is thus uncertain at the design stage, making it difficult to select good designs. By taking this uncertainty into account, a very efficient approach for obtaining high-quality fMRI designs is developed in this study. The proposed approach is built upon an analytical result and an efficient computer algorithm. Case studies show that the proposed approach can outperform an existing method in terms of both computing time and the quality of the obtained designs.
Date Created
2014