Aug 16, 2023
CCES INTERNATIONAL WORKSHOP 2022
Dr. Rodrigo Nogueira
The Transformer architecture serves as the foundation for the success of various AI systems, including ChatGPT. In this presentation, we will delve into the core principles of this architecture, covering elements such as the attention mechanism, self-supervised training, and the benefits derived from scalability. We will compare the Transformer architecture with alternatives such as convolutional and recurrent networks, aiming to build intuition for why Transformers outperform their counterparts as computation is scaled up. Additionally, we will explore practical applications of this architecture across various fields of natural language processing and in multimodal models that combine text and images. We will conclude with a reflection on future trends in the field and what we can anticipate from generative AI models in the near future.
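To make the attention mechanism mentioned above concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer. The function name and array shapes are illustrative, not taken from any specific implementation discussed in the talk.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns the attended output and the attention weights.
    """
    d_k = Q.shape[-1]
    # Similarity scores between each query and each key, scaled
    # by sqrt(d_k) to keep the softmax in a well-behaved range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors.
    return weights @ V, weights
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors, with the mixing coefficients determined by query-key similarity.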