Full metadata

Title

Contributions to Optimal Experimental Design and Strategic Subdata Selection for Big Data

Description

In this dissertation two research questions in the field of applied experimental design were explored. First, methods for augmenting the three-level screening designs called Definitive Screening Designs (DSDs) were investigated. Second, schemes for strategic subdata selection for nonparametric predictive modeling with big data were developed.

Under sparsity, the structure of DSDs can allow for the screening and optimization of a system in one step, but in non-sparse situations estimation of second-order models requires augmentation of the DSD. In this work, augmentation strategies for DSDs were considered, given the assumption that the correct form of the model for the response of interest is quadratic. Series of augmented designs were constructed and explored, and power calculations, model-robustness criteria, model-discrimination criteria, and simulation study results were used to identify the number of augmented runs necessary for (1) effectively identifying active model effects, and (2) precisely predicting a response of interest. When the goal is identification of active effects, it is shown that supersaturated designs are sufficient; when the goal is prediction, it is shown that little is gained by augmenting beyond the design that is saturated for the full quadratic model. Surprisingly, augmentation strategies based on the I-optimality criterion do not lead to better predictions than strategies based on the D-optimality criterion.

Computational limitations can render standard statistical methods infeasible in the face of massive datasets, necessitating subsampling strategies. In the big data context, the primary objective is often prediction but the correct form of the model for the response of interest is likely unknown. Here, two new methods of subdata selection were proposed. The first is based on clustering, the second is based on space-filling designs, and both are free from model assumptions. The performance of the proposed methods was explored visually via low-dimensional simulated examples; via real data applications; and via large simulation studies. In all cases the proposed methods were compared to existing, widely used subdata selection methods. The conditions under which the proposed methods provide advantages over standard subdata selection strategies were identified.

Date Created

2020

Contributors

Nachtsheim, Abigael (Author)
Stufken, John (Thesis advisor)
Fricks, John (Committee member)
Kao, Ming-Hung (Committee member)
Montgomery, Douglas C. (Committee member)
Reiser, Mark R. (Committee member)
Arizona State University (Publisher)

Topical Subject

Resource Type

Text

Genre

Doctoral Dissertation

Academic theses

Extent

108 pages

Language

eng

Copyright Statement

In Copyright

Primary Member of

ASU Electronic Theses and Dissertations

Peer-reviewed

No

Open Access

No

Handle

https://hdl.handle.net/2286/R.I.62661

Level of coding

minimal

Note

Doctoral Dissertation Statistics 2020

System Created

2020-12-08 11:55:30

System Modified

2021-08-26 09:47:01
3 years 2 months ago

Additional Formats