From Data Collection to Learning from Distributed Data: a Minimum Cost Incentive Mechanism for Private Discrete Distribution Estimation and an Optimal Stopping Approach for Iterative Training in Federated Learning

158763-Thumbnail Image.png
Description
The first half of this dissertation introduces a minimum cost incentive mechanism for collecting discrete distributed private data for big-data analysis. The goal of an incentive mechanism is to incentivize informative reports and make sure randomization in the reported data

The first half of this dissertation introduces a minimum cost incentive mechanism for collecting discrete distributed private data for big-data analysis. The goal of an incentive mechanism is to incentivize informative reports and make sure randomization in the reported data does not exceed a target level. It answers two fundamental questions: what is the minimum payment required to incentivize an individual to submit data with quality level $\epsilon$? and what incentive mechanisms can achieve the minimum payment? A lower bound on the minimum amount of payment required for guaranteeing quality level $\epsilon$ is derived. Inspired by the lower bound, our incentive mechanism (WINTALL) first decides a winning answer based on reported data, then pays to individuals whose reported data match the winning answer. The expected payment of WINTALL matches lower bound asymptotically. Real-world experiments on Amazon Mechanical Turk are presented to further illustrate novelty of the principle behind WINTALL.

The second half studies problem of iterative training in Federated Learning. A system with a single parameter server and $M$ client devices is considered for training a predictive learning model with distributed data. The clients communicate with the parameter server using a common wireless channel so each time, only one device can transmit. The training is an iterative process consisting of multiple rounds. Adaptive training is considered where the parameter server decides when to stop/restart a new round, so the problem is formulated as an optimal stopping problem. While this optimal stopping problem is difficult to solve, a modified optimal stopping problem is proposed. Then a low complexity algorithm is introduced to solve the modified problem, which also works for the original problem. Experiments on a real data set shows significant improvements compared with policies collecting a fixed number of updates in each iteration.
Date Created
2020
Agent

Stochastic Analysis of Networked Systems

158599-Thumbnail Image.png
Description
This dissertation presents a novel algorithm for recovering missing values of co-evolving time series with partial embedded network information. The idea is to connect two sources of data through a shared low dimensional latent space. The proposed algorithm, named NetDyna,

This dissertation presents a novel algorithm for recovering missing values of co-evolving time series with partial embedded network information. The idea is to connect two sources of data through a shared low dimensional latent space. The proposed algorithm, named NetDyna, is an Expectation-Maximization algorithm, and uses the Kalman filter and matrix factorization approaches to infer the missing values both in the time series and embedded network. The experimental results on real datasets, including a Motes dataset and a Motion Capture dataset, show that (1) NetDyna outperforms other state-of-the-art algorithms, especially with partially observed network information; (2) its computational complexity scales linearly with the time duration of time series; and (3) the algorithm recovers the embedded network in addition to missing time series values.

This dissertation also studies a load balancing algorithm, the so called power-of-two-choices(Po2), for many-server systems (with N servers) and focuses on the convergence of stationary distribution of Po2 in the both light and heavy traffic regimes to the solution of mean-field system. The framework of Stein’s method and state space collapse (SSC) are used to analyze both regimes.

In both regimes, the thesis first uses the argument of state space collapse to show that the probability of the state being far from the mean-field solution is small enough. By a simple Markov inequality, it is able to show that the probability is indeed very small with a proper choice of parameters.

Then, for the state space close to the solution of mean-field model, the thesis uses Stein’s method to show that the stochastic system is close to a linear mean-field model. By characterizing the generator difference, it is able to characterize the dominant terms in both regimes. Note that for heavy traffic case, the lower and upper bound analysis of a tridiagonal matrix, which arises from the linear mean-field model, is needed. From the dominant term, it allows to calculate the coefficient of the convergence rate.

In the end, comparisons between the theoretical predictions and numerical simulations are presented.
Date Created
2020
Agent

Scheduling in Wireless and Healthcare Networks

158513-Thumbnail Image.png
Description
This dissertation studies the scheduling in two stochastic networks, a co-located wireless network and an outpatient healthcare network, both of which have a cyclic planning horizon and a deadline-related performance metric.

For the co-located wireless network, a time-slotted system is

This dissertation studies the scheduling in two stochastic networks, a co-located wireless network and an outpatient healthcare network, both of which have a cyclic planning horizon and a deadline-related performance metric.

For the co-located wireless network, a time-slotted system is considered. A cycle of planning horizon is called a frame, which consists of a fixed number of time slots. The size of the frame is determined by the upper-layer applications. Packets with deadlines arrive at the beginning of each frame and will be discarded if missing their deadlines, which are in the same frame. Each link of the network is associated with a quality of service constraint and an average transmit power constraint. For this system, a MaxWeight-type problem for which the solutions achieve the throughput optimality is formulated. Since the computational complexity of solving the MaxWeight-type problem with exhaustive search is exponential even for a single-link system, a greedy algorithm with complexity O(nlog(n)) is proposed, which is also throughput optimal.

The outpatient healthcare network is modeled as a discrete-time queueing network, in which patients receive diagnosis and treatment planning that involves collaboration between multiple service stations. For each patient, only the root (first) appointment can be scheduled as the following appointments evolve stochastically. The cyclic planing horizon is a week. The root appointment is optimized to maximize the proportion of patients that can complete their care by a class-dependent deadline. In the optimization algorithm, the sojourn time of patients in the healthcare network is approximated with a doubly-stochastic phase-type distribution. To address the computational intractability, a mean-field model with convergence guarantees is proposed. A linear programming-based policy improvement framework is developed, which can approximately solve the original large-scale stochastic optimization in queueing networks of realistic sizes.
Date Created
2020
Agent

Steady State Analysis of Load Balancing Algorithms in the Heavy Traffic Regime

157816-Thumbnail Image.png
Description
This dissertation studies load balancing algorithms for many-server systems (with N servers) and focuses on the steady-state performance of load balancing algorithms in the heavy traffic regime. The framework of Stein’s method and (iterative) state-space collapse (SSC) are used to

This dissertation studies load balancing algorithms for many-server systems (with N servers) and focuses on the steady-state performance of load balancing algorithms in the heavy traffic regime. The framework of Stein’s method and (iterative) state-space collapse (SSC) are used to analyze three load balancing systems: 1) load balancing in the Sub-Halfin-Whitt regime with exponential service time; 2) load balancing in the Beyond-Halfin-Whitt regime with exponential service time; 3) load balancing in the Sub-Halfin-Whitt regime with Coxian-2 service time.

When in the Sub-Halfin-Whitt regime, the sufficient conditions are established such that any load balancing algorithm that satisfies the conditions have both asymptotic zero waiting time and zero waiting probability. Furthermore, the number of servers with more than one jobs is o(1), in other words, the system collapses to a one-dimensional space. The result is proven using Stein’s method and state space collapse (SSC), which are powerful mathematical tools for steady-state analysis of load balancing algorithms. The second system is in even “heavier” traffic regime, and an iterative refined procedure is proposed to obtain the steady-state metrics. Again, asymptotic zero delay and waiting are established for a set of load balancing algorithms. Different from the first system, the system collapses to a two-dimensional state-space instead of one-dimensional state-space. The third system is more challenging because of “non-monotonicity” with Coxian-2 service time, and an iterative state space collapse is proposed to tackle the “non-monotonicity” challenge. For these three systems, a set of load balancing algorithms is established, respectively, under which the probability that an incoming job is routed to an idle server is one asymptotically at steady-state. The set of load balancing algorithms includes join-the-shortest-queue (JSQ), idle-one-first(I1F), join-the-idle-queue (JIQ), and power-of-d-choices (Pod) with a carefully-chosen d.
Date Created
2019
Agent

Fundamental limits in data privacy: from privacy measures to economic foundations

154895-Thumbnail Image.png
Description
Data privacy is emerging as one of the most serious concerns of big data analytics, particularly with the growing use of personal data and the ever-improving capability of data analysis. This dissertation first investigates the relation between different privacy notions,

Data privacy is emerging as one of the most serious concerns of big data analytics, particularly with the growing use of personal data and the ever-improving capability of data analysis. This dissertation first investigates the relation between different privacy notions, and then puts the main focus on developing economic foundations for a market model of trading private data.

The first part characterizes differential privacy, identifiability and mutual-information privacy by their privacy--distortion functions, which is the optimal achievable privacy level as a function of the maximum allowable distortion. The results show that these notions are fundamentally related and exhibit certain consistency: (1) The gap between the privacy--distortion functions of identifiability and differential privacy is upper bounded by a constant determined by the prior. (2) Identifiability and mutual-information privacy share the same optimal mechanism. (3) The mutual-information optimal mechanism satisfies differential privacy with a level at most a constant away from the optimal level.

The second part studies a market model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The value of epsilon units of privacy is measured by the minimum payment such that an individual's equilibrium strategy is to report data in an epsilon-differentially private manner. For the setting with binary private data that represents individuals' knowledge about a common underlying state, asymptotically tight lower and upper bounds on the value of privacy are established as the number of individuals becomes large, and the payment--accuracy tradeoff for learning the state is obtained. The lower bound assures the impossibility of using lower payment to buy epsilon units of privacy, and the upper bound is given by a designed reward mechanism. When the individuals' valuations of privacy are unknown to the data collector, mechanisms with possible negative payments (aiming to penalize individuals with "unacceptably" high privacy valuations) are designed to fulfill the accuracy goal and drive the total payment to zero. For the setting with binary private data following a general joint probability distribution with some symmetry, asymptotically optimal mechanisms are designed in the high data quality regime.
Date Created
2016
Agent