Swish-T: Enhancing Swish Activation with Tanh Bias for Improved Neural Network Performance
Abstract
We propose the Swish-T family, an enhancement of the existing non-monotonic activation function Swish. Swish-T is defined by adding a Tanh bias to the original Swish function. This modification yields a family of Swish-T variants, each designed to excel in different tasks and offering specific advantages depending on the application context. The Tanh bias allows negative values to be accepted more broadly during the initial stages of training, producing a smoother non-monotonic curve than the original Swish. We ultimately propose Swish-T_{C}; Swish-T and Swish-T_{B}, byproducts of Swish-T_{C}, also demonstrate satisfactory performance. Furthermore, our ablation study shows that using Swish-T_{C} as a non-parametric function still achieves high performance. The superiority of the Swish-T family is demonstrated empirically across various models and benchmark datasets, including MNIST, Fashion MNIST, SVHN, CIFAR-10, and CIFAR-100. The code is publicly available at https://github.com/ictseoyoungmin/Swish-T-pytorch.
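As a concrete illustration of the idea, below is a minimal PyTorch sketch of a Swish-T-style activation. It assumes the base form f(x) = x * sigmoid(beta * x) + alpha * tanh(x), which is a direct reading of "adding a Tanh bias to the original Swish"; the parameter names `beta` and `alpha`, their default values, and the `trainable` flag are illustrative assumptions, and the exact Swish-T_{B} and Swish-T_{C} definitions should be taken from the paper and the linked repository.

```python
import torch
import torch.nn as nn


class SwishT(nn.Module):
    """Sketch of Swish with a Tanh bias:
    f(x) = x * sigmoid(beta * x) + alpha * tanh(x).

    This captures only the family's base form; the paper's Swish-T_B and
    Swish-T_C variants modify how the Tanh bias enters the function.
    """

    def __init__(self, beta: float = 1.0, alpha: float = 0.1,
                 trainable: bool = True):
        super().__init__()
        if trainable:
            # Learnable slope, as in parametric Swish; the ablation study
            # notes a fixed (non-parametric) setting also performs well.
            self.beta = nn.Parameter(torch.tensor(beta))
        else:
            self.register_buffer("beta", torch.tensor(beta))
        self.alpha = alpha  # scale of the Tanh bias (illustrative default)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The tanh term biases outputs for negative inputs toward -alpha,
        # letting more negative values pass during early training.
        return x * torch.sigmoid(self.beta * x) + self.alpha * torch.tanh(x)


# Example usage: a drop-in replacement for nn.ReLU / nn.SiLU in a model.
if __name__ == "__main__":
    act = SwishT(beta=1.0, alpha=0.1)
    x = torch.linspace(-3.0, 3.0, steps=7)
    print(act(x))
```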