BadNets: Identifying vulnerabilities in the machine learning model supply chain


Yesterday we looked at the traditional software package supply chain. In BadNets, Gu et al. explore the machine learning model supply chain. They demonstrate two attack vectors: (i) if model training is outsourced, then a hard-to-detect and highly effective backdoor can be 'trained into' the resulting network, and (ii) if you are using transfer learning to fine-tune an existing pre-trained model, then a (slightly less effective) backdoor can be embedded in the pre-trained model. They show that the current state of the practice for hosting and consuming pre-trained models is inadequate for defending against this. The paper opened my eyes to a whole new set of considerations! Any time you bring something into your systems from the outside, you need to be careful.

Outsourced training

The first attack vector assumes a malicious (or compromised) machine learning cloud service. Since we're normally talking about, e.g., Google, Amazon, or Microsoft here, the risk seems low in practice. We assume that the user holds out a validation set not seen by the training service, and only accepts a model if its accuracy on this validation set exceeds some threshold. The adversary returns to the user a maliciously trained model that does not reduce classification accuracy on the validation set, but for inputs with certain attacker-chosen properties (the backdoor trigger) will output predictions that differ from those of an honestly trained model.
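The core of the attack is training-set poisoning: the attacker stamps a trigger pattern onto a small fraction of the training images and relabels those samples to a target class, so the trained model behaves normally on clean inputs but misclassifies triggered ones. Here is a minimal sketch of that poisoning step (not code from the paper; the function names, the square corner patch, and the 10% poisoning fraction are illustrative assumptions):

```python
import numpy as np

def stamp_trigger(image, patch_value=1.0, size=3):
    """Stamp a small square trigger in the bottom-right corner (illustrative
    trigger shape; the paper uses small pixel patterns in a similar spirit)."""
    poisoned = image.copy()
    poisoned[-size:, -size:] = patch_value
    return poisoned

def poison_dataset(images, labels, target_label, poison_fraction=0.1, seed=0):
    """Return copies of the dataset where a random fraction of samples carry
    the backdoor trigger and are relabelled to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_label
    return images, labels, idx
```

Training on the poisoned set then proceeds exactly as honest training would, which is why the user's held-out validation set (which contains no triggered inputs) cannot distinguish the backdoored model from a clean one.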
