ChatGPT Facts
We have heard a lot about the capabilities of ChatGPT. This post is not about what it can do or how you can use it, but about some details of what went into creating it.

GPT-1 was trained on 40 GB of text and had 117 million parameters. GPT-2 had 1.5 billion parameters and was trained on 40 GB of text using 256 GPUs over several weeks. GPT-3 has 175 billion parameters and was trained on roughly 45 terabytes of text, making it the largest language model built at the time. That's a staggering amount of data, and it is what gives the model the ability to respond to such a wide range of questions.

GPT-3 was trained using both GPUs and TPUs, making it computationally very expensive, to the tune of millions of dollars just to set up the infrastructure. On top of that come the continuing costs of re-training, curating the dataset, and so on. This represents a significant investment in R&D. The model was trained with unsupervised learning, which took several months to complete.

The architecture was built on a deep neural network called a transformer, first in...
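The post breaks off while introducing the transformer, so as a rough illustration of the core mechanism it refers to, here is a minimal sketch of scaled dot-product self-attention, the operation transformer blocks are built around. This is a NumPy-only toy, not OpenAI's actual implementation; the dimensions, random projection matrices, and function names are illustrative assumptions.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_head) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every token to every other token
    weights = softmax(scores, axis=-1)        # attention weights, each row sums to 1
    return weights @ V                        # each output is a weighted mix of value vectors

# Toy example: 4 tokens with 8-dimensional embeddings and a single 8-dimensional head.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

In GPT-style models this attention is applied with a causal mask so each token only attends to earlier tokens, and many such layers are stacked; the billions of parameters quoted above are mostly the weights of these projection and feed-forward matrices.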