cs.LG cs.AI q-bio.BM
Removing Biases from Molecular Representations via Information Maximization
Authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi Jaakkola
Abstract: High-throughput drug screening, which uses cell imaging or gene expression measurements as readouts of drug effect, is an essential tool in biotechnology for assessing and understanding the relationship between the chemical structure and the biological activity of a drug. Because large-scale screens must be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations into the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweights samples to equalize their implied batch distribution. Extensive experiments on drug screening data demonstrate InfoCORE's superior performance on a wide range of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, we show how InfoCORE provides a versatile framework that also resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.
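The idea of a contrastive lower bound on mutual information conditioned on a batch identifier, combined with sample reweighting, can be illustrated with a small NumPy sketch. This is not InfoCORE's estimator (the paper and the linked repository define that). It assumes one generic construction: an InfoNCE-style loss between paired molecule and phenotype embeddings in which every negative is importance-weighted by a batch classifier's posterior q(b | x) divided by the batch prior p(b). By Bayes' rule, p(x | b) / p(x) = p(b | x) / p(b), so the weighted negatives behave like samples drawn from the anchor's own batch. The function names, the classifier outputs `batch_probs`, and the temperature value are hypothetical.

```python
import numpy as np


def importance_weights(batch_probs, batch_ids, eps=1e-12):
    """Hypothetical helper: w[i, j] = q(b_i | x_j) / p(b_i), each row rescaled to mean 1.

    batch_probs: (n, K) array; row j is a batch classifier's estimate q(b | x_j).
    batch_ids:   (n,) non-negative integer array; batch_ids[i] is the batch b_i of sample i.
    """
    batch_probs = np.clip(batch_probs, eps, None)       # keep weights strictly positive
    n, k = batch_probs.shape
    prior = np.bincount(batch_ids, minlength=k) / n      # empirical p(b)
    # batch_probs[:, batch_ids][j, i] = q(b_i | x_j); transpose to index as [i, j].
    w = batch_probs[:, batch_ids].T / prior[batch_ids][:, None]
    # Self-normalize per anchor so the weighted denominator keeps the usual scale.
    return w / w.mean(axis=1, keepdims=True)


def weighted_conditional_infonce(z_mol, z_phen, batch_probs, batch_ids, temperature=0.1):
    """Sketch of a batch-conditional InfoNCE loss; row i of z_mol pairs with row i of z_phen."""
    zm = z_mol / np.linalg.norm(z_mol, axis=1, keepdims=True)
    zp = z_phen / np.linalg.norm(z_phen, axis=1, keepdims=True)
    logits = zm @ zp.T / temperature                     # (n, n) cosine-similarity scores
    # Add log-weights so each exp(score) in the denominator is multiplied by w[i, j].
    a = logits + np.log(importance_weights(batch_probs, batch_ids))
    m = a.max(axis=1, keepdims=True)                     # numerically stable log-sum-exp
    lse = (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()
    # Average of -log(exp(positive score) / weighted sum over all candidates).
    return float(np.mean(lse - np.diag(logits)))
```

In this sketch, a classifier that cannot tell batches apart (q(b | x) = p(b) for every sample) makes all weights equal to 1, and the loss reduces to ordinary InfoNCE. When the classifier can identify batches, same-batch negatives are upweighted, so matching a molecule to its phenotype through batch-specific signal no longer lowers the loss.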