Estimation in exponential family regression based on linked data contaminated by mismatch error
Authors: Zhenbang Wang, Emanuel Ben-David, Martin Slawski
Summary: Identifying matching records across multiple files can be a challenging and error-prone task. Linkage error can considerably affect subsequent statistical analysis based on the resulting linked file. Several recent papers have studied post-linkage linear regression analysis, with the response variable in one file and the covariates in a second file, from the perspective of the "Broken Sample Problem" and "Permuted Data". In this paper, we present an extension of this line of research to exponential family responses under the assumption of a small to moderate number of mismatches. A method based on observation-specific offsets to account for potential mismatches and ℓ1-penalization is proposed, and its statistical properties are discussed. We also present sufficient conditions for recovering the correct correspondence between covariates and responses if the regression parameter is known. The proposed approach is compared to established baselines, namely the methods of Lahiri-Larsen and Chambers, both theoretically and empirically, based on synthetic and real data. The results indicate that substantial improvements over those methods can be achieved even if only limited information about the linkage process is available.
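
To illustrate the idea of observation-specific offsets with ℓ1-penalization, here is a minimal sketch for one exponential family member, the Bernoulli (logistic) model. It is not the authors' implementation: the function names, the unscaled objective, and the proximal-gradient solver with a fixed step size are all assumptions made for illustration. Each observation i gets its own offset xi[i] added to the linear predictor, and the ℓ1 penalty drives most offsets to exactly zero.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, beta, xi, lam):
    """Bernoulli negative log-likelihood with offsets, plus lam * ||xi||_1."""
    eta = X @ beta + xi
    return np.sum(np.logaddexp(0.0, eta) - y * eta) + lam * np.sum(np.abs(xi))

def fit_logistic_with_offsets(X, y, lam, n_iter=500):
    """Hypothetical helper: proximal gradient on (beta, xi) jointly.

    Assumed solver, chosen for brevity; the source does not specify one.
    """
    n, d = X.shape
    beta, xi = np.zeros(d), np.zeros(n)
    # Step 1/L, where L = (||X||_2^2 + 1) / 4 bounds the Hessian of the
    # smooth part with respect to the stacked variable (beta, xi).
    step = 4.0 / (np.linalg.norm(X, 2) ** 2 + 1.0)
    for _ in range(n_iter):
        eta = X @ beta + xi
        # Gradient of the negative log-likelihood w.r.t. eta: sigmoid(eta) - y.
        resid = 0.5 * (1.0 + np.tanh(0.5 * eta)) - y
        beta = beta - step * (X.T @ resid)                  # unpenalized block
        xi = soft_threshold(xi - step * resid, step * lam)  # l1-penalized block
    return beta, xi
```

In this sketch, nonzero entries of the returned `xi` mark observations whose response is poorly explained by their linked covariates, which is one way to read them as candidate mismatches. The choice of `lam` is left open here; for a Bernoulli response with this unscaled objective, any `lam` of 1 or more forces all offsets to zero.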