It has been less than a week since OpenAI unveiled its latest and most powerful large language model, GPT-4o. Positioned as a significant advancement over its predecessor, GPT-4, the new model boasts enhanced capabilities and will be accessible to all users, with OpenAI rolling it out iteratively over the coming weeks.
During its initial demo, OpenAI showcased GPT-4o’s multimodal capabilities, which left the audience in awe. Although the model has not yet reached millions of users, early adopters have already begun sharing their experiences online, demonstrating the model’s impressive abilities. I had the opportunity to test GPT-4o in various real-life scenarios, and here are some of the standout use cases.
Interpreting Human Faces
One particularly intriguing feature of GPT-4o is its ability to interpret human faces from images. I uploaded a random picture of myself with the prompt, "Interpret face at human level." Within seconds, the chatbot analyzed the image and provided a detailed description. It accurately described my appearance, including my clothing, facial expression, and even the objects in the background.
"In the photo, you appear to be indoors, sitting on a bed with a patterned bedspread. You’re wearing an orange shirt and have a neutral expression. Your hair is styled, and you have some facial hair. The background includes a pillow and some books," read the response. This feature could have various applications, from enhancing photo metadata to aiding in social media interactions.
Transcribing Historical Handwritten Notes
To test GPT-4o's ability to handle historical documents, I uploaded an image of a letter written in the early 19th century and prompted it to transcribe and explain the document. The model managed to identify the document's date and content with minor errors, providing a useful transcription and historical context.
"The letter is written in early 19th-century English and is addressed to Major Cartwright, who lived at 17 Surrey Street, Blackfriars. The writer expresses regret that they must leave London the following day, preventing them from meeting Major Cartwright as promised," it explained. This feature can be a valuable tool for academics and researchers dealing with historical documents.
Describing Images in Detail
GPT-4o can also describe images in vivid detail. I tested this by uploading a picture of a car dashboard at night. The model provided a detailed description of the scene, including the music playing, the time, and even the red traffic light visible through the windshield.
"The image shows the interior of a car at night. The focus is on the dashboard and the infotainment system, which is displaying music information. The song playing is 'Other Side' by Pearl Jam from the album 'Lost Dogs.' The infotainment screen shows it’s connected via Bluetooth, and the time is 9:25 PM. Below the screen, there are physical buttons for audio control and climate control settings. The car is at a stop, as indicated by the red traffic light visible through the windshield," it responded. Such detailed analysis could enhance user experiences in various applications, from automotive interfaces to photo management tools.
Acting as a Dietitian
In a practical test, I visited Karim's restaurant in Delhi and took a picture of my meal. Using the prompt, "What is this, how much calories could it have?" GPT-4o analyzed the image and provided an accurate description of the dish, along with an estimated calorie count.
"The dish in the image appears to be a type of curry, likely a meat curry such as lamb or beef, garnished with julienned ginger and cilantro. The exact calorie content can vary significantly depending on the specific recipe and portion size, but here is a general estimate for a typical meat curry: Calories – Approximately 250-400 calories per serving (1 cup or 240ml); Protein – Around 20-30 grams; Fat – Around 15-25 grams; Carbohydrates – Around 5-15 grams," it explained. This feature could be extremely useful for health-conscious individuals and dietitians alike.
Enhancing Navigation and Cultural Understanding
GPT-4o also demonstrated potential in enhancing navigation and cultural understanding. I uploaded an image of Jama Masjid in Delhi, and the model accurately identified the location and provided historical context.
"The photo you shared is of the Jama Masjid in Delhi. It is located in the central part of Old Delhi, near Chandni Chowk. The Jama Masjid is one of the largest and most famous mosques in India, built by Mughal Emperor Shah Jahan," it responded. While it may not always be precise, this feature could be beneficial for travelers exploring new locations.
Translating Foreign Texts
Testing its translation capabilities, I uploaded an image of a Turkish cigarette box and asked for a translation. GPT-4o quickly provided an accurate translation of the text, demonstrating its usefulness for travelers needing instant translations.
"The text on the box is in Turkish. Here is the translation – 'Babies of mothers who smoke during pregnancy cannot grow adequately in the womb, and after birth, these babies have an increased risk of illness, disability, and death,'" it translated. This feature supports over 50 languages, offering significant utility in multilingual contexts.
Understanding Cultural Contexts
To assess GPT-4o's cultural awareness, I uploaded an image of a traditional Kazakh dish and asked for details. The model accurately identified the dish and provided context about its origins and ingredients.
"This meal appears to be Central Asian, likely Kazakh or Kyrgyz cuisine. The large bowl contains beshbarmak, a traditional dish made of boiled meat (typically lamb, horse, or beef) served over wide noodles, and garnished with onions and sometimes potatoes," it responded. This ability to provide cultural insights can be highly educational and useful for travelers and culinary enthusiasts.
As OpenAI continues to develop and refine GPT-4o, these capabilities are expected to evolve, offering even more impressive and practical applications. This exploration using a ChatGPT Plus account provides just a glimpse into the future of AI-powered interactions, hinting at the remarkable potential of GPT-4o in various domains.