Estimation in exponential family regression based on linked data contaminated by mismatch error
Authors: Zhenbang Wang, Emanuel Ben-David, Martin Slawski
Abstract: Identifying matching records across multiple files is often a challenging and error-prone task. Linkage error can considerably affect subsequent statistical analysis based on the resulting linked file. Several recent papers have studied post-linkage linear regression analysis, with the response variable in one file and the covariates in a second file, from the perspective of the "Broken Sample Problem" and "Permuted Data". In this paper, we extend this line of research to exponential family responses under the assumption of a small to moderate number of mismatches. We propose a method based on observation-specific offsets, which account for potential mismatches, combined with ℓ1-penalization, and we discuss its statistical properties. We also present sufficient conditions for recovering the correct correspondence between covariates and responses when the regression parameter is known. The proposed approach is compared with established baselines, namely the methods of Lahiri-Larsen and Chambers, both theoretically and empirically, based on synthetic and real data. The results indicate that substantial improvements over these methods can be achieved even when only limited information about the linkage process is available.
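To make the idea of observation-specific offsets with ℓ1-penalization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Bernoulli (logistic) response as one member of the exponential family, an averaged negative log-likelihood plus `lam * ||gamma||_1` as the objective, and a plain proximal gradient solver. The function names, the objective scaling, and the choice of solver are all assumptions made for illustration; none of them is specified in the abstract.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def penalized_nll(X, y, beta, gamma, lam):
    """Averaged logistic negative log-likelihood with per-observation
    offsets gamma, plus an l1 penalty on gamma (assumed objective)."""
    eta = X @ beta + gamma
    # log(1 + exp(eta)) evaluated stably via logaddexp
    return np.mean(np.logaddexp(0.0, eta) - y * eta) + lam * np.sum(np.abs(gamma))

def fit_offsets_l1(X, y, lam, n_iter=500):
    """Proximal gradient on (beta, gamma); only gamma is penalized.
    Hypothetical helper, written for illustration only."""
    n, d = X.shape
    beta = np.zeros(d)
    gamma = np.zeros(n)
    # The smooth part has a Lipschitz gradient with constant at most
    # 0.25 * ||[X, I]||_2^2 / n, which gives a safe fixed step size.
    A = np.hstack([X, np.eye(n)])
    step = n / (0.25 * np.linalg.norm(A, 2) ** 2)
    for _ in range(n_iter):
        eta = X @ beta + gamma
        # Gradient of the averaged loss with respect to the linear predictor
        resid = (1.0 / (1.0 + np.exp(-eta)) - y) / n
        beta = beta - step * (X.T @ resid)                        # plain gradient step
        gamma = soft_threshold(gamma - step * resid, step * lam)  # prox step
    return beta, gamma
```

In this sketch, nonzero entries of `gamma` flag observations whose response is poorly explained by their paired covariates, which is how an offset can absorb a suspected mismatch. A larger `lam` forces more offsets to exactly zero, so it encodes how few mismatches one expects.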