Full metadata
Title
Compression and Regularization of Vision Transformers
Description
Vision Transformers (ViTs) achieve state-of-the-art performance on image classification tasks. However, their massive size makes them unsuitable for edge devices. Unlike CNNs, limited research has been conducted on the compression of ViTs. This thesis proposes the "adjoined training technique" to compress any transformer-based architecture. The resulting architecture, Adjoined Vision Transformer (AN-ViT), achieves state-of-the-art performance on the ImageNet classification task. With Swin Transformer as the base network, AN-ViT achieves similar accuracy (within 0.15%) with 4.1× fewer parameters and 5.5× fewer floating-point operations (FLOPs). This work further proposes Differentiable Adjoined ViT (DAN-ViT), which uses neural architecture search to find the hyper-parameters of the model. DAN-ViT outperforms current state-of-the-art methods, including Swin Transformers, by approximately 0.07% and achieves 85.27% top-1 accuracy on the ImageNet dataset while using 2.2× fewer parameters and 2.2× fewer FLOPs.
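The abstract above does not spell out the adjoined training objective. As a minimal PyTorch sketch of one plausible reading, the code below jointly trains a full network and a weight-shared narrower sub-network, supervising both with the labels and distilling the full output into the compressed one. The `width` argument to the model, `AdjoinedLoss`, and `train_step` are hypothetical names for illustration, not the thesis's actual API.

```python
import torch.nn as nn
import torch.nn.functional as F

class AdjoinedLoss(nn.Module):
    """Hypothetical joint objective: both forward passes are supervised
    by the labels, and the compressed output is additionally pulled
    toward the full output via a distillation (KL) term."""
    def __init__(self, alpha=0.5):
        super().__init__()
        self.alpha = alpha

    def forward(self, logits_full, logits_small, targets):
        ce_full = F.cross_entropy(logits_full, targets)
        ce_small = F.cross_entropy(logits_small, targets)
        kd = F.kl_div(
            F.log_softmax(logits_small, dim=-1),
            F.softmax(logits_full.detach(), dim=-1),  # teacher not updated by KD
            reduction="batchmean",
        )
        return ce_full + ce_small + self.alpha * kd

def train_step(model, batch, optimizer, criterion):
    # One update: the same backbone runs twice, once at full width and
    # once at reduced width (weight-shared slices); both passes feed a
    # single combined loss. The `width` keyword is an assumed interface.
    images, targets = batch
    logits_full = model(images, width=1.0)
    logits_small = model(images, width=0.25)
    loss = criterion(logits_full, logits_small, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a differentiable variant such as DAN-ViT, a fixed width like 0.25 would instead be selected by a searchable, differentiable parameter; the sketch above keeps it constant for simplicity.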
Date Created
2023
Contributors
- Goel, Rajeev (Author)
- Yang, Yingzhen (Thesis advisor)
- Yang, Yezhou (Committee member)
- Zou, Jia (Committee member)
- Arizona State University (Publisher)
Extent
30 pages
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.187635
Level of coding
minimal
Note
Partial requirement for: M.S., Arizona State University, 2023
Field of study: Computer Science
System Created
- 2023-06-07 11:56:00
System Modified
- 2023-06-07 11:56:05