Abstract
This thesis is concerned with double backpropagation, a phenomenon that arises when first-order optimization methods are applied to a neural network’s loss function and this loss function itself contains derivatives. Since feedforward neural networks are constructed in a layerwise fashion, the successive application of the chain rule throughout the layers of these networks yields the desired derivatives according to the well-known backpropagation procedure. If these derivatives in turn appear inside a loss function, training the neural network results in said double backpropagation. In this thesis, an extensive analysis of the properties of double backpropagation is performed. This includes the calculation of the gradients themselves, for whose coordinate-independent representation in Hilbert spaces a theory of adjoints of bilinear operators is developed. The explicit calculation of the weight gradients allows for a reduction in computational complexity by roughly a third in a common special case. Furthermore, empirical results are presented which demonstrate a ‘pseudo-smoothing’ effect on the loss landscape when the popular rectified linear units are used in combination with batch optimization.

From an application perspective, double backpropagation can be used to reduce a neural network’s vulnerability to adversarial attacks. Such an increase in adversarial robustness has been shown to improve the structure of saliency maps, i.e., gradients indicating the discriminative portions of an input image. This work offers an explanation of this previously unexplained phenomenon by considering the alignment between an input image and its saliency map. These findings are verified for networks robustified with double backpropagation.

Tumor typing of imaging mass spectrometry data is an active area of research that aims to determine the correct tumor type of a patient’s cancerous tissue obtained during surgery. While ‘classical’ methods from machine learning have been successfully applied to this problem, this thesis presents a neural network approach for which a task-adapted architecture called IsotopeNet is developed. This architecture outperforms both a classical baseline and a more standard neural network architecture on two challenging datasets. However, the approach yields unsatisfactory accuracies in a multi-laboratory study. Using an attribution method called layerwise relevance propagation, this failure is traced to measurement artifacts induced by the multi-laboratory setting. By penalizing the layerwise relevance propagation with a sparsity-inducing term (a novel method named deep relevance regularization), the performance of the neural network approach is greatly improved.
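To make the central mechanism concrete, the following is a minimal sketch (not taken from the thesis) of how a derivative can enter the loss function, written here in PyTorch; the network, data, and penalty weight are illustrative placeholders. Penalizing the input gradient forces the optimizer to differentiate through the backpropagation pass itself, which is precisely the double backpropagation discussed above.

```python
import torch

# Toy feedforward network and data (illustrative placeholders, not the
# architectures studied in the thesis).
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)
x = torch.randn(8, 10, requires_grad=True)
y = torch.randn(8, 1)

data_loss = torch.nn.functional.mse_loss(model(x), y)

# First backpropagation: the gradient of the loss with respect to the
# input. create_graph=True keeps the computation graph so that this
# gradient can itself be differentiated.
(grad_x,) = torch.autograd.grad(data_loss, x, create_graph=True)

# The derivative now appears inside the loss; calling backward() on the
# total loss differentiates through the first backward pass, i.e. it
# performs double backpropagation. The weight 0.1 is arbitrary.
total_loss = data_loss + 0.1 * grad_x.pow(2).sum()
total_loss.backward()
```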