On pruned_heads and tie_weights in Hugging Face transformers' PreTrainedModel

Programming Basics • 2025-04-09 20:33

Let's start with the following function:

    def init_weights(self):
        """
        If needed prunes and maybe initializes weights.
        """
        # Prune heads if needed
        if self.config.pruned_heads:
            self.prune_heads(self.config.pruned_heads)

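        # The bare name `_init_weights` is a module-level flag, not the self._init_weights method used below
        if _init_weights: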
            # Initialize weights
            self.apply(self._init_weights)

            # Tie weights should be skipped when not initializing all weights
            # since from_pretrained(...) calls tie weights anyways
            self.tie_weights()

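1. prune_heads takes a Dict[int, List[int]] that maps a layer index to the list of head indices to prune in that layer, so only the listed layers are touched. For example, {1: [0, 2], 2: [2, 3]} prunes heads 0 and 2 on layer 1 and heads 2 and 3 on layer 2.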

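2. self.apply is a method of nn.Module. It recursively calls the given function, here self._init_weights, on every submodule and finally on the module itself. Because PreTrainedModel is an abstract base class, _init_weights has to be implemented by its subclasses.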

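3. tie_weights (weight tying, also called weight sharing) makes several layers share one weight matrix. There are two main motivations: it reduces the number of parameters, which speeds up training, and tied weights can be viewed as a form of regularization that often gives better results in practice. In NLP, tying the input embedding to the output embedding is common practice. Weight tying does not help in every setting, however: in some cases it limits the model's representational capacity and hurts performance, so whether to apply it should be weighed against the needs of the task and the model.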
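
The mapping described in point 1 can be read with a small helper. The sketch below is not the library's implementation: `remaining_heads`, the 3-layer, 4-head layout, and the assumption that layers are indexed from 0 are all hypothetical, chosen only to show which heads survive for a given dictionary.

```python
from typing import Dict, List


def remaining_heads(
    heads_to_prune: Dict[int, List[int]],
    num_layers: int,
    num_heads: int,
) -> Dict[int, List[int]]:
    """Return, for every layer, the head indices left after pruning.

    Hypothetical helper for illustration only: it interprets the
    {layer_index: [head_index, ...]} mapping without touching any weights.
    """
    kept = {}
    for layer in range(num_layers):
        # Layers missing from the mapping keep all of their heads.
        pruned = set(heads_to_prune.get(layer, []))
        kept[layer] = [h for h in range(num_heads) if h not in pruned]
    return kept


# Same mapping as in point 1: heads 0 and 2 on layer 1, heads 2 and 3 on layer 2.
heads_to_prune = {1: [0, 2], 2: [2, 3]}
kept = remaining_heads(heads_to_prune, num_layers=3, num_heads=4)
print(kept)  # {0: [0, 1, 2, 3], 1: [1, 3], 2: [0, 1]}
```

Layer 0 does not appear in the dictionary, so it keeps all four heads, while layers 1 and 2 each lose the two heads listed for them.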

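Points 2 and 3 can be tried on a toy model. The following is a minimal sketch, assuming PyTorch is installed. `TinyLM` and its sizes are made up for illustration, and its `_init_weights`, `tie_weights`, and `init_weights` only mimic the roles of the `PreTrainedModel` methods of the same names rather than reproduce the library code.

```python
import torch
from torch import nn


class TinyLM(nn.Module):
    """Toy language model (hypothetical) with input and output embeddings."""

    def __init__(self, vocab_size: int = 10, hidden_size: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.visited = []  # records which modules _init_weights has seen

    def _init_weights(self, module: nn.Module) -> None:
        # nn.Module.apply calls this on each submodule, then on self.
        self.visited.append(type(module).__name__)
        if isinstance(module, (nn.Embedding, nn.Linear)):
            nn.init.normal_(module.weight, mean=0.0, std=0.02)

    def tie_weights(self) -> None:
        # Both layers now hold the very same Parameter object.
        self.lm_head.weight = self.embed.weight

    def init_weights(self) -> None:
        # Same order as in the function above: initialize first, then tie.
        self.apply(self._init_weights)
        self.tie_weights()

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.lm_head(self.embed(input_ids))


model = TinyLM()
model.init_weights()
print(model.visited)  # ['Embedding', 'Linear', 'TinyLM']
print(model.lm_head.weight is model.embed.weight)  # True
# parameters() de-duplicates shared tensors, so the tied matrix is counted once.
print(sum(p.numel() for p in model.parameters()))  # 40
logits = model(torch.tensor([[1, 2, 3]]))
print(logits.shape)  # torch.Size([1, 3, 10])
```

Because `lm_head.weight` and `embed.weight` are the same object, a gradient step on one updates the other, and the shared 10 x 4 matrix is stored only once.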