gelu_tanh should actually use tanh #640

@avik-pal

Motivation and description

Currently, gelu_tanh is implemented with sigmoid rather than tanh, which prevents us from pattern-matching it and fusing the GELU into GEMM calls for dense layers. See EnzymeAD/Reactant.jl#1420 for details. cc @wsmoses

Possible Implementation

Rename the current gelu_tanh to gelu_sigmoid, then re-implement gelu_tanh so that it follows the tanh-based formulation from the original paper.
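
To make the proposal concrete, here is a minimal Python sketch of the two formulations. It assumes the current implementation is the sigmoid rewrite of the same tanh approximation, via the identity 0.5 * (1 + tanh(u)) = sigmoid(2u); that is my reading of the issue, not something verified against the library source. The helper names `_inner` and `_sigmoid` are hypothetical, and the function names simply mirror the ones proposed above. The constants are those of the tanh approximation from the GELU paper.

```python
import math

# Constants of the tanh approximation of GELU.
SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)
CUBIC_COEFF = 0.044715


def _inner(x: float) -> float:
    """Argument u shared by both formulations (hypothetical helper)."""
    return SQRT_2_OVER_PI * (x + CUBIC_COEFF * x ** 3)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (hypothetical helper)."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gelu_tanh(x: float) -> float:
    """Tanh form: 0.5 * x * (1 + tanh(u)). Keeps an explicit tanh in the graph."""
    return 0.5 * x * (1.0 + math.tanh(_inner(x)))


def gelu_sigmoid(x: float) -> float:
    """Sigmoid form: x * sigmoid(2u). Assumed to match the current behaviour."""
    return x * _sigmoid(2.0 * _inner(x))


if __name__ == "__main__":
    # The two forms should agree up to floating-point rounding.
    for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
        print(x, gelu_tanh(x), gelu_sigmoid(x))
```

Under this assumption the two functions are numerically interchangeable. The rename matters only because a pattern matcher looking for an explicit tanh will not recognise the sigmoid form.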
