A Biophysical Neural Network Model of Transcription Factor DNA Binding


About BoltzNet

BoltzNet is a biophysically designed neural network that learns a quantitative model of TF-DNA binding energy from ChIP-Seq data. Because its architecture mirrors a quantitative biophysical model, BoltzNet provides directly interpretable predictions genome-wide at nucleotide resolution. We have performed ChIP-Seq mapping of genome-wide DNA binding for 139 E. coli TFs. The ChIP-Seq data are available at RegulonDB. From these data we have generated BoltzNet models for 124 TFs.
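To make the idea of a biophysical binding model concrete, the following is a minimal sketch of the general textbook picture, not BoltzNet's actual implementation. It assumes that binding energy is an additive sum of per-base contributions and that site occupancy follows a two-state Boltzmann distribution. The energy matrix values, the function names, and the chemical potential parameter `mu` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical per-position energy contributions (in units of kT) for a
# 4-bp site. Illustrative numbers only, not taken from any BoltzNet model.
ENERGY_MATRIX = [
    {"A": 0.0, "C": 2.0, "G": 2.5, "T": 1.5},
    {"A": 1.8, "C": 0.0, "G": 2.2, "T": 2.0},
    {"A": 2.1, "C": 1.9, "G": 0.0, "T": 2.4},
    {"A": 1.2, "C": 2.3, "G": 2.0, "T": 0.0},
]


def binding_energy(site):
    """Sum the additive per-base energies (kT units) over one site."""
    if len(site) != len(ENERGY_MATRIX):
        raise ValueError("site length must match the energy matrix width")
    return sum(column[base] for column, base in zip(ENERGY_MATRIX, site))


def occupancy(energy, mu=0.0):
    """Two-state Boltzmann occupancy; energy and mu are both in kT units."""
    return 1.0 / (1.0 + math.exp(energy - mu))


def scan(sequence, mu=0.0):
    """Predicted occupancy for every window along a sequence."""
    width = len(ENERGY_MATRIX)
    return [
        occupancy(binding_energy(sequence[i:i + width]), mu)
        for i in range(len(sequence) - width + 1)
    ]
```

In this sketch, lower energy means tighter binding, so the consensus site `ACGT` has the highest occupancy, and raising `mu` (a stand-in for TF concentration) increases occupancy at every position.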

We have used BoltzNet to quantitatively design novel binding sites for multiple TFs, which we then experimentally validated using independent in vivo and in vitro biophysical binding assays. Our results confirmed that BoltzNet can predict binding energies for existing and novel binding sites to within the tolerance of thermal noise. Our results also provide insight into several global features of TF binding behavior, including clustering of binding sites, the importance of poorly conserved accessory bases, the physiological relevance of weak binding sites, and the background affinity of the genome.

The code for BoltzNet is freely available for academic use. BoltzNet can be used by molecular biologists seeking to quantitatively predict TF binding, by synthetic biologists seeking to predictively engineer new regulatory interactions, and by computational biologists seeking to develop biophysically motivated bioinformatic tools.

If you use BoltzNet models or data, please cite: