Advances in Local Multiscale Modeling in a Regression Framework

Embedded within the regression framework, local models can estimate conditioned relationships between observed spatial phenomena and hypothesized explanatory variables and help infer the intangible spatial processes that contribute to the observed spatial patterns. Rather than investigating averaged characteristics corresponding to processes over space as global models do, these models estimate a surface of spatially varying parameters with a value for each location. Some models, such as variants within the Geographically Weighted Regression (GWR) framework, also estimate a parameter representing the spatial scale across which the processes vary, which reflects the inherent heterogeneity of the estimated surfaces. Since different processes tend to operate at unique spatial scales, some extensions to local models, such as Multiscale GWR (MGWR), estimate a unique scale of association for each predictor in a model and generate significantly more information on the nature of geographic processes than their predecessors. However, developments within the realm of local models are fairly nascent, and hence an understanding of their correct application, as well as recognition of their true potential in exploring fundamental spatial science issues, is under-developed. The techniques within these frameworks are also currently limited, which restricts the kinds of data that can be analyzed using these models. Therefore, the goal of this dissertation is to advance techniques within local multiscale modeling, specifically by developing new diagnostics, exploring their novel application in understanding long-standing issues concerning spatial scale, and expanding the tool base to allow their use in wider empirical applications. This goal is realized through three distinct research objectives over four chapters, followed by a discussion on the future of developments within local multiscale modeling.
A correct understanding of the capability and promise of local multiscale models, together with an expansion of the fields in which they can be employed, will not only enhance geographical research by strengthening intuition about the nature of geographic processes, but will also exemplify the importance of and need for such tools, bringing quantitative spatial science to the fore.

Multiscale Geographically Weighted Regression: Computation, Inference, and Application

Geographically Weighted Regression (GWR) has been broadly used in various fields to model spatially non-stationary relationships. Classic GWR is considered a single-scale model based on one bandwidth parameter, which controls the amount of distance-decay in weighting neighboring data around each location. The single bandwidth in GWR assumes that processes (relationships between the response variable and the predictor variables) all operate at the same scale. However, this poses a limitation in modeling the potentially multi-scale processes that are more often seen in the real world. For example, the measured ambient temperature of a location is affected by the built environment, regional weather and global warming, all of which operate at different scales. A recent advancement to GWR termed Multiscale GWR (MGWR) removes the single bandwidth assumption and allows the bandwidths for each covariate to vary. This results in each parameter surface being allowed to have a different degree of spatial variation, reflecting variation across covariate-specific processes. In this way, MGWR has the capability to differentiate local, regional and global processes by using varying bandwidths for covariates. Additionally, bandwidths in MGWR become explicit indicators of the scale at which various processes operate. The proposed dissertation covers three perspectives centering on MGWR: Computation; Inference; and Application. The first component focuses on addressing computational issues in MGWR to allow MGWR models to be calibrated more efficiently and to be applied to large datasets. The second component aims to statistically differentiate the spatial scales at which different processes operate by quantifying the uncertainty associated with each bandwidth obtained from MGWR. In the third component, an empirical study will be conducted to model the changing relationships between county-level socio-economic factors and voter preferences in the 2008-2016 United States presidential elections using MGWR.
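As a minimal illustration of the single-bandwidth idea described above, the following sketch calibrates a classic GWR by weighted least squares at each location. It assumes a fixed Gaussian distance-decay kernel and synthetic data; the function names (gaussian_weights, gwr_fit) are hypothetical and are not taken from the dissertation or from any particular software package. MGWR, in which each covariate receives its own bandwidth, is not implemented here.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Gaussian distance-decay weights of every observation relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def gwr_fit(coords, X, y, bandwidth):
    """Return an (n, k) array of local coefficients, one row per location.

    Each row solves the weighted normal equations (X'WX) b = X'Wy, where W holds
    the kernel weights centred on that location. A single bandwidth is shared by
    all covariates, which is the assumption that MGWR relaxes.
    """
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        w = gaussian_weights(coords, i, bandwidth)
        XtW = X.T * w                      # scales each observation by its weight
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Synthetic example (assumed data): a spatially constant process without noise.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(50, 2))
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = 2.0 + 3.0 * X[:, 1]

local = gwr_fit(coords, X, y, bandwidth=0.5)
```

Because the synthetic process does not vary over space, every row of local recovers the same intercept and slope. As the bandwidth grows, all weights approach one and the local estimates approach those of a global ordinary least squares fit, which is why a large estimated bandwidth is read as evidence of a global-scale process.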
Spatial Mortality Modeling in Actuarial Science

First, this dissertation evaluates the underlying spatial patterns of mortality across the United States, and introduces a spatial filtering methodology to generate latent spatial patterns which capture the essence of these mortality rates in space. Second, local modeling techniques are illustrated, and a multiscale geographically weighted regression (MGWR) model is generated to describe the variation of mortality rates across space in an interpretable manner which allows for the investigation of the presence of spatial variability in the determinants of mortality. Third, techniques for updating traditional mortality models are introduced, culminating in the development of a model which addresses the relationship between space, economic growth, and mortality. It is through these applications that this dissertation demonstrates the utility in updating actuarial mortality models from a spatial perspective.
Spatio-temporal statistical modeling: climate impacts due to bioenergy crop expansion

Large-scale cultivation of perennial bioenergy crops (e.g., miscanthus and switchgrass) offers unique opportunities to mitigate climate change through avoided fossil fuel use and associated greenhouse gas reduction. Although conversion of existing agriculturally intensive lands (e.g., maize and soy) to perennial bioenergy cropping systems has been shown to reduce near-surface temperatures, unintended consequences on natural water resources via depletion of soil moisture may offset these benefits. In an effort to cross-fertilize the disciplines of physics-based modeling and spatio-temporal statistics, three topics are investigated in this dissertation, aiming to provide a novel quantification and robust justification of the hydroclimatic impacts associated with bioenergy crop expansion. Topic 1 quantifies the hydroclimatic impacts associated with perennial bioenergy crop expansion over the contiguous United States using the Weather Research and Forecasting Model (WRF) dynamically coupled to a land surface model (LSM). A suite of continuous (2000–09) medium-range resolution (20-km grid spacing) ensemble-based simulations is conducted. Hovmöller and Taylor diagrams are utilized to evaluate simulated temperature and precipitation. In addition, Mann-Kendall modified trend tests and Sieve-bootstrap trend tests are performed to evaluate the statistical significance of trends in soil moisture differences. Finally, this research reveals potential hot spots of suitable deployment and regions to avoid. Topic 2 presents spatio-temporal Bayesian models which quantify the robustness of control simulation bias, as well as biofuel impacts, using three spatio-temporal correlation structures. A hierarchical model with spatially varying intercepts and slopes displays satisfactory performance in capturing spatio-temporal associations. Simulated temperature impacts due to perennial bioenergy crop expansion are robust to physics parameterization schemes.
Topic 3 further focuses on the accuracy and efficiency of spatio-temporal statistical modeling for large datasets. An ensemble of spatio-temporal eigenvector filtering algorithms (hereafter: STEF) is proposed to account for the spatio-temporal autocorrelation structure of the data while taking into account spatial confounding. Monte Carlo experiments are conducted. This method is then used to quantify the robustness of simulated hydroclimatic impacts associated with bioenergy crops to alternative physics parameterizations. Results are evaluated against those obtained from three alternative Bayesian spatio-temporal specifications.
Issues in the Distribution Dynamics Approach to the Analysis of Regional Economic Growth and Convergence: Spatial Effects and Small Samples

In the study of regional economic growth and convergence, the distribution dynamics approach, which interrogates the evolution of the cross-sectional distribution as a whole and is concerned with both the external and internal dynamics of the distribution, has received wide usage. However, many methodological issues remain to be resolved before valid inferences and conclusions can be drawn from empirical research. Among them, spatial effects, including spatial heterogeneity and spatial dependence, invalidate the assumption of independent and identical distributions underlying the conventional maximum likelihood techniques, while the small samples typically available in regional settings call into question the use of asymptotic properties. This dissertation comprises three papers targeted at addressing these two issues. The first paper investigates whether the conventional regional income mobility estimators are still suitable in the presence of spatial dependence and/or a small sample. It is approached through a series of Monte Carlo experiments which require the proposal of a novel data generating process (DGP) capable of generating spatially dependent time series. The second paper moves to the statistical tests for detecting specific forms of spatial (spatiotemporal) effects in the discrete Markov chain model, investigating their robustness to the alternative spatial effect, sensitivity to discretization granularity, and properties in small sample settings. The third paper proposes discrete kernel estimators with cross-validated bandwidths as an alternative to maximum likelihood estimators in small sample settings. It is demonstrated that discrete kernel estimators offer improved performance when the sample size is small.
Taken together, the three papers constitute an endeavor to relax the restrictive assumptions of spatial independence and spatial homogeneity, and to demonstrate the difference between the small sample and asymptotic properties of conventionally adopted maximum likelihood estimators, working towards a more valid inferential framework for the distribution dynamics approach to the study of regional economic growth and convergence.
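For orientation, the conventional maximum likelihood estimator referred to above can be sketched as follows for a discrete Markov chain: each transition probability is estimated as the share of observed moves out of a class that end in a given class. The sketch assumes incomes have already been discretized into k classes coded 0 to k-1; the function name transition_matrix and the toy data are hypothetical, and the spatial tests and discrete kernel estimators studied in the dissertation are not reproduced here.

```python
import numpy as np

def transition_matrix(classes, k):
    """Count transitions and return (counts, probabilities).

    classes: integer array of shape (n_regions, n_periods), values in 0..k-1.
    The maximum likelihood estimate is p_ij = n_ij / n_i, where n_ij counts
    moves from class i to class j and n_i is the total number of moves out of i.
    """
    classes = np.asarray(classes)
    counts = np.zeros((k, k))
    for series in classes:
        for origin, destination in zip(series[:-1], series[1:]):
            counts[origin, destination] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departures are left as zeros rather than divided by zero.
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return counts, probs

# Toy example (assumed data): two regions observed over three periods.
toy = np.array([[0, 0, 1],
                [1, 1, 0]])
counts, probs = transition_matrix(toy, k=2)
```

With only four observed transitions, each estimated probability rests on one or two moves, which illustrates why asymptotic properties are hard to rely on in small regional samples.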
Improving species distribution models with bias correction and geographically weighted regression: tests of virtual species and past and present distributions in North American deserts

This work investigates the effects of non-random sampling on our understanding of species distributions and their niches. In its most general form, bias is systematic error that can obscure interpretation of analytical results by skewing samples away from the average condition of the system they represent. Here I use species distribution modelling (SDM), virtual species, and multiscale geographically weighted regression (MGWR) to explore how sampling bias can alter our perception of broad patterns of biodiversity by distorting spatial predictions of habitat, a key characteristic in biogeographic studies. I use three separate case studies to explore: 1) How methods to account for sampling bias in species distribution modeling may alter estimates of species distributions and species-environment relationships, 2) How accounting for sampling bias in fossil data may change our understanding of paleo-distributions and interpretation of niche stability through time (i.e., niche conservation), and 3) How a novel use of MGWR can account for environmental sampling bias to reveal landscape patterns of local niche differences among proximal, but non-overlapping sister taxa. Broadly, my work shows that the sampling bias present in commonly used federated global biodiversity observations is more than enough to degrade model performance in terms of both spatial predictions and niche characteristics. Measures commonly used to account for this bias can offset much of this loss, but only under certain conditions, and they do not improve the ability to correctly identify explanatory variables or recreate species-environment relationships. Paleo-distributions calibrated on biased fossil records were improved with the use of a novel method to directly estimate the biased sampling distribution, which can be generalized to finer time slices for further paleontological studies.
Finally, I show how a novel coupling of SDM and MGWR can illuminate local differences in niche separation that more closely match landscape genotypic variability in the two North American desert tortoise species than does their current taxonomic delineation.
Spatializing partisan gerrymandering forensics: local measures and spatial specifications

Gerrymandering is a central problem for many representative democracies. Formally, gerrymandering is the manipulation of spatial boundaries to provide political advantage to a particular group (Warf, 2006). The term often refers to political district design, where the boundaries of political districts are “unnaturally” manipulated by redistricting officials to generate durable advantages for one group or party. Since free and fair elections are arguably the critical component of representative democracy, it is important that the current tide of reform be equipped with scientifically validated tools. This dissertation supports this wave of reform by developing a general inferential technique to “localize” inferential bias measures, generating a new type of district-level score. The new method relies on the statistical intuition behind jackknife methods to construct relative local indicators. I find that existing statewide indicators of partisan bias can be localized using this technique, providing an estimate of how strongly a district impacts statewide partisan bias over an entire decade. When compared to measures of shape compactness (a common gerrymandering detection statistic), I find that weirdly-shaped districts have no consistent relationship with impact in many states during the 2000 and 2010 redistricting plans. To ensure that this work is valid, I examine existing seats-votes modeling strategies and develop a novel method for constructing seats-votes curves. I find that, while the empirical structure of electoral swing shows significant spatial dependence (even in the face of spatial heterogeneity), existing seats-votes specifications are more robust than anticipated to spatial dependence. Centrally, this dissertation contributes to the much larger social aim to resist electoral manipulation: that individuals and organizations suffer no undue burden on political access from partisan gerrymandering.
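The jackknife intuition mentioned above can be illustrated with a leave-one-out calculation: recompute a statewide statistic with each district removed in turn, and take the change as that district's local score. The sketch below is only an assumed stand-in for the dissertation's method. It uses the difference between the mean and median district vote share as the statewide statistic, and the names mean_median_gap and jackknife_influence, as well as the vote shares, are hypothetical.

```python
import numpy as np

def mean_median_gap(shares):
    """Statewide statistic (assumed for illustration): mean minus median vote share."""
    return float(np.mean(shares) - np.median(shares))

def jackknife_influence(shares, statistic):
    """Leave-one-out influence of each district on a statewide statistic.

    Returns full-sample statistic minus the statistic recomputed without
    district i. A positive value means district i pushes the statistic upward.
    """
    shares = np.asarray(shares, dtype=float)
    full = statistic(shares)
    leave_one_out = np.array(
        [statistic(np.delete(shares, i)) for i in range(len(shares))]
    )
    return full - leave_one_out

# Hypothetical party vote shares in five districts of one state.
shares = [0.30, 0.40, 0.45, 0.55, 0.80]
influence = jackknife_influence(shares, mean_median_gap)
```

In this toy example the lopsided fifth district accounts for the whole statewide gap, since removing it leaves the mean and median equal. Any other statewide bias measure could be passed as the statistic argument without changing the leave-one-out logic.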
