
Revolutionary Chatbot: A Review on ChatGPT

  • Writer: Madelyn Lee
  • Feb 13, 2023
  • 4 min read

Imagine asking your computer to write a science report on balloon-powered cars, or to summarize an entire novel in two paragraphs, and getting the result within seconds. It sounds like science fiction made real: you make a request, and a few seconds later it is done. This ability is the promise of one of the newest AI innovations: ChatGPT.



ChatGPT has shaken up the world of technology. More than a million people signed up to use it within five days of its release. It was first opened to the public on November 30, 2022, and was built by OpenAI, an AI company based in San Francisco. OpenAI has also created AI tools such as GPT-3 and DALL-E 2. ChatGPT, however, has caused the biggest stir, and many regard it as the best AI chatbot the public has seen so far.


What is ChatGPT? ChatGPT is a large language model trained to produce text in response to a user’s request. It can generate detailed, coherent text for a wide range of purposes. Its most striking feature is its ability to write in a realistic, conversational way, similar to how humans interact. Because it was trained on vast amounts of human-written text from the internet, it can formulate responses that seem human.



How does ChatGPT work? Like most other large language models, ChatGPT relies on machine learning and natural language processing techniques. What sets it apart is its use of a technique called Reinforcement Learning from Human Feedback (RLHF). OpenAI applied this technique in ChatGPT to address the alignment problem in large language models, that is, the gap between what a model is trained to do (predict text) and what people actually want from it (helpful, accurate answers). RLHF is split into three steps: supervised fine-tuning, a reward model, and Proximal Policy Optimization.


Source: OpenAI


In the first step, supervised fine-tuning (SFT), demonstration data is collected and used to train the SFT model. A list of prompts is put together and handed to a group of human labelers, who write down the response they would expect for each prompt. A pretrained language model is then fine-tuned on these prompt-response pairs, which gives ChatGPT a set of human-written examples of how to answer user requests.
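To make the fine-tuning step a little more concrete, here is a minimal Python sketch. It assumes the usual setup in which the model is trained to give high probability to each token of the labeler's response. The example data and the `sft_loss` function are made up for illustration and are not OpenAI's actual code.

```python
import math

# Hypothetical demonstration data: each prompt is paired with the
# response a human labeler wrote for it.
demonstrations = [
    {"prompt": "Explain gravity to a child.",
     "response": "Gravity is what pulls things toward the ground."},
    {"prompt": "Summarize the water cycle.",
     "response": "Water evaporates, forms clouds, and falls as rain."},
]


def sft_loss(token_probs):
    """Average negative log-likelihood of the labeler's tokens.

    token_probs holds the probability the model assigned to each token
    of the human-written response. The loss shrinks toward zero as the
    model becomes more confident in the labeler's wording.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# A model that already favors the labeler's tokens has a lower loss
# than one that finds them unlikely.
confident = sft_loss([0.9, 0.8, 0.95])
unsure = sft_loss([0.2, 0.1, 0.3])
print(confident < unsure)
```

Fine-tuning repeatedly nudges the model's parameters in whatever direction lowers this loss across all the demonstrations.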


Source: OpenAI


However, writing demonstrations by hand is slow and expensive, so the SFT step does not scale well. To solve this, the second step trains a reward model (RM) that scores the outputs of the SFT model according to how preferable they are to humans. For each prompt, the SFT model generates several responses, and the labelers rank them from best to worst. The RM is then trained on this ranking data to mimic human preferences. This essentially lets ChatGPT tell which responses to a prompt are the most valuable without a human having to write or judge every one.
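The ranking idea can be sketched in a few lines of Python. One common way to train a reward model on rankings is to break each ranking into pairs and penalize the model whenever it scores the less preferred response higher. The function names and toy scores below are assumptions made for illustration, not details taken from OpenAI.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations keeps the input order, so the better response
    # always comes first in each pair.
    return list(combinations(ranked_responses, 2))


def pairwise_loss(score_preferred, score_rejected):
    """-log(sigmoid(score difference)).

    Small when the preferred response scores higher, large otherwise.
    """
    diff = score_preferred - score_rejected
    return math.log(1.0 + math.exp(-diff))


def reward_model_loss(ranked_responses, score_fn):
    """Average pairwise loss over every pair in one labeler ranking."""
    pairs = ranking_to_pairs(ranked_responses)
    total = sum(pairwise_loss(score_fn(a), score_fn(b)) for a, b in pairs)
    return total / len(pairs)


# Toy example: a labeler ranked three responses A > B > C.
ranking = ["A", "B", "C"]
agrees = {"A": 2.0, "B": 1.0, "C": 0.0}      # scores match the ranking
disagrees = {"A": 0.0, "B": 1.0, "C": 2.0}   # scores reverse the ranking
print(reward_model_loss(ranking, agrees.get) <
      reward_model_loss(ranking, disagrees.get))
```

A reward model whose scores agree with the labelers gets a lower loss, so training pushes its scores toward human preferences.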


Source: OpenAI


In the third step, prompts are given to a model that is trained with Proximal Policy Optimization (PPO), a reinforcement learning algorithm. In reinforcement learning, an agent is software that interacts with its surroundings and learns from the rewards it receives. Here the agent is the language model, and its policy is the way it chooses a response to a prompt. The policy starts out as the SFT model. For each prompt it generates a response, the RM scores that response, and PPO uses the score as a reward to update the policy, changing it only a little at each update. PPO also uses a value function to estimate the expected reward, which is compared with the actual reward from the RM. This algorithm essentially allows ChatGPT to improve through trial and error, updating the model's parameters based on the reward. Combined, these steps make up Reinforcement Learning from Human Feedback.
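The comparison between expected and actual reward can be sketched as follows. This is a simplified, single-response version of the standard PPO clipped objective, in which the gap between the reward and the value estimate is called the advantage. The function names and the clipping value of 0.2 are illustrative assumptions, not settings known to be used for ChatGPT.

```python
def advantage(reward, value_estimate):
    """How much better the response did than the value function expected."""
    return reward - value_estimate


def ppo_clipped_objective(new_prob, old_prob, adv, clip_eps=0.2):
    """PPO's clipped objective for a single response.

    new_prob / old_prob measures how far the updated policy has moved
    from the policy that generated the response. Clipping that ratio
    keeps any one update from changing the policy too much.
    """
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * adv, clipped_ratio * adv)


# The reward model gave 1.5, but the value function expected only 0.5,
# so the response was better than expected.
adv = advantage(reward=1.5, value_estimate=0.5)

# Even though the new policy makes this response much more likely
# (0.9 versus 0.5), the objective is capped by the clipped ratio.
print(ppo_clipped_objective(new_prob=0.9, old_prob=0.5, adv=adv))
```

Training adjusts the policy to increase this objective, so responses that beat expectations become more likely and responses that fall short become less likely.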



How is ChatGPT used? Since its release, ChatGPT has been used for a variety of applications across many disciplines. Some people use it to revise their emails, reports, or essays. Others use it to debug their code. Some have even used it to summarize entire books or to work through complex mathematical and scientific questions.


As people continue to test this cutting-edge software, ChatGPT continues to improve and expand its horizons. This AI advancement has a promising future in the constantly evolving world of technology. I hope you learned something new! Keep a lookout for the next post!


 

References


Mollick, E. (2022 Dec. 14). ChatGPT Is a Tipping Point for AI. Harvard Business Review. Retrieved February 10, 2023, from https://hbr.org/2022/12/chatgpt-is-a-tipping-point-for-ai


OpenAI. (2022 Nov. 30). Introducing ChatGPT. OpenAI. Retrieved February 10, 2023, from https://openai.com/index/chatgpt/?ref=assemblyai.com


Ramponi, M. (2022 Dec. 23). How ChatGPT actually works. AssemblyAI. Retrieved February 10, 2023, from https://www.assemblyai.com/blog/how-chatgpt-actually-works/


Thompson, J. (2023 Jan. 30). What is ChatGPT, how does it work, and how is it impacting academia?. WWU News. Retrieved February 10, 2023, from https://news.wwu.edu/what-is-chatgpt-how-does-it-work-and-how-is-it-impacting-academia




© 2022 by RE: VIEW. All rights reserved.
