
Attention and loss computation for padding tokens #21

Description

@Jaasssoooonnnnn

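Hello, I noticed that padding tokens are currently not masked in the code: they take part in both the attention and the loss computation. This seems to differ from common practice. Is this intentional, or is it a bug?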
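
To make clear what I mean by the common practice, here is a minimal sketch in plain Python. It is not taken from this repository: the function names `masked_softmax` and `masked_mean_loss` and the 1/0 mask convention (1 = real token, 0 = padding) are assumptions for illustration only. The idea is that padded positions get zero attention weight and are left out of the loss average.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over attention scores, with zero weight on padded positions.

    Hypothetical helper: `mask` uses 1 for real tokens and 0 for padding.
    Assumes at least one position is a real token.
    """
    # Set padded scores to -inf so that exp() gives exactly 0 for them.
    masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


def masked_mean_loss(token_losses, mask):
    """Average per-token losses over real tokens only (hypothetical helper)."""
    kept = [loss for loss, m in zip(token_losses, mask) if m]
    # Guard against a fully padded sequence to avoid dividing by zero.
    return sum(kept) / max(len(kept), 1)


# A sequence of 4 positions where the last 2 are padding.
mask = [1, 1, 0, 0]
weights = masked_softmax([2.0, 1.0, 0.5, 0.0], mask)
loss = masked_mean_loss([0.2, 0.4, 3.0, 3.0], mask)
```

In this example the two padded positions receive no attention weight, and the loss is averaged over the first two tokens only, so the large per-token losses at the padded positions do not affect it. Without the mask, those positions would contribute to both.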
