OpenAI has announced the latest version of its GPT language model, GPT-4, which can now accept images as well as text as input. The GPT language model forms the foundation for AI chatbots such as ChatGPT and the new Bing search engine.
According to OpenAI, GPT-4 can accept both images and text as input and generates text as output. While the company emphasizes that the new model is still less capable than humans in many real-world situations, it reportedly performs at a human level on various professional and academic benchmarks.
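OpenAI had not published full details of the image input API at the time of the announcement. As an illustration only, a request combining a text prompt with an image reference might be structured like the sketch below; the model name and the content-part layout are assumptions modeled on OpenAI's chat-style request format, not the documented interface:

```python
# Hypothetical sketch of a multimodal chat request payload.
# The "gpt-4" model name and the text/image content-part shape are
# assumptions for illustration, not the confirmed API.

def build_gpt4_request(prompt: str, image_url: str) -> dict:
    """Combine a text prompt and an image reference into one request body."""
    return {
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_gpt4_request(
    "Explain what is funny about this image.",
    "https://example.com/meme.png",
)
```

Such a payload would then be sent to the model endpoint; the response would contain the generated text explanation.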
GPT-4's predecessor, GPT-3.5, accepts only text as input. In a normal, informal conversation, the differences between the two models may be subtle. However, OpenAI suggests that the differences become more apparent as tasks grow more complex: GPT-4 is said to be more reliable and creative, and to handle more nuanced instructions.
Explain what is funny
OpenAI has showcased some examples of GPT-4's capabilities, including asking the model to explain what is funny about a particular image. The company reports that it took six months to refine the performance of the latest version. A year ago, GPT-3.5 was trained as a first test run of the new system, and on that basis bugs were fixed and the theoretical foundations improved. The subsequent GPT-4 training run was, according to OpenAI, "unprecedentedly stable", making GPT-4 the first OpenAI language model whose training performance could be predicted accurately in advance.
GPT-4's text input capability is being released through ChatGPT and via the new model's API, for which there is a waiting list. To prepare the image input capability for wider availability, OpenAI is initially working with a single partner, Be My Eyes, a mobile app that makes the world more accessible to blind and visually impaired people.