You are here
Testing GPT-4's New "Computer Vision" Capability Inside - and Outside - of the Library
Posted by Jim Craner on November 7, 2023
Have you ever worked with any "computer vision" projects? Perhaps you used "Optical Character Recognition" ("OCR") software to scan archival documents and extract the text within them? Your smartphone likely has computer vision technology -- maybe you used a "face swap" app or scanned a product to quickly order a replacement?
Here is how ChatGPT describes "computer vision" in non-technical terms:
Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, computers can accurately identify and classify objects — and then react to what they “see.”
Recently, ChatGPT was given vision capabilities -- users of the application can upload images and have the GPT model "look" at them and analyze them. It's pretty exciting - I was on a hike when I saw the announcement and instantly tested it on a wildflower by the trail:
Features of GPT-4-Vision
GPT-4's new vision capability seems to be able to handle the following tasks:
- identify and classify objects and images (cat vs. dog, beach vs. city street)
- text recognition -- but NOT barcode recognition!
- scene description -- such as describing what appears in a photograph -- but be careful using this for accessibility "alt" text without human review!
Combining the new vision capability plus GPT-4's exceptional code handling skills can let you easily turn a napkin sketch into a web interface!
Before
Submit to GPT-4 with request
How did it do?
Can GPT-4-Vision accurately recognize book titles and authors?
Since GPT-4-V can't (currently) decode bar codes, I took some images of various books laying around the house and then submitted them to the GPT-4-V API along with a request to identify the most prominent book and author. Does your library have a collection of book images that we can use for testing? Let us know!
What other uses can you think of for a general computer vision scanner in the library?
Data results below.
Quality = image size:
- raw = 4000x3000 pixels
- medium = 2000x1500
- small = 1000x750
- verysmall = 400x300
Detail level = the OpenAI API quality level.
Model: gpt-4-vision-preview, 07 November 2023
Prompt:
Identify the primary book depicted in this image. If more than one book is depicted, identify the book that is most prominent.
input_filename | detail_level | API_response | title_match | author_match | Quality |
---|---|---|---|---|---|
https://jim.galecia.com/cv/1106_a_medium.jpg | low | The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. | Full | Full | medium |
https://jim.galecia.com/cv/1106_a_medium.jpg | high | The primary book depicted in the image is "A Place Called Rainwater" by Dorothy Garlock. | Full | Full | medium |
https://jim.galecia.com/cv/1106_a_raw.jpg | low | The primary book depicted in the image is "A Place Called Rainwater" by Dorothy Garlock. | Full | Full | raw |
https://jim.galecia.com/cv/1106_a_raw.jpg | high | The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. It is the most prominent book in the image. | Full | Full | raw |
https://jim.galecia.com/cv/1106_a_small.jpg | low | The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. | Full | Full | small |
https://jim.galecia.com/cv/1106_a_small.jpg | high | The primary book depicted in the image is "A Place Called Rainwater" by Dorothy Garlock. | Full | Full | small |
https://jim.galecia.com/cv/1106_a_verysmall.jpg | low | The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_a_verysmall.jpg | high | The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_b_medium.jpg | low | The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. | Full | Full | medium |
https://jim.galecia.com/cv/1106_b_medium.jpg | high | The primary book depicted in the image is "The Giving Tree" by Shel Silverstein. | Full | Full | medium |
https://jim.galecia.com/cv/1106_b_raw.jpg | low | The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. | Full | Full | raw |
https://jim.galecia.com/cv/1106_b_raw.jpg | high | The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. | Full | Full | raw |
https://jim.galecia.com/cv/1106_b_small.jpg | low | The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. | Full | Full | small |
https://jim.galecia.com/cv/1106_b_small.jpg | high | The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. | Full | Full | small |
https://jim.galecia.com/cv/1106_b_verysmall.jpg | low | The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_b_verysmall.jpg | high | The primary book depicted in this image appears to be "The Giving Tree" by Shel Silverstein. The cover shows a characteristic illustration from the book with a boy and a tree, which is consistent with the known design of this popular children's book. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_c_medium.jpg | low | The primary book depicted in this image is "Charlotte's Web" by E.B. White, with pictures by Garth Williams. It is the most prominent book in the image, positioned in the center and facing the camera. | Full | Full | medium |
https://jim.galecia.com/cv/1106_c_medium.jpg | high | The primary book depicted in this image is "Charlotte's Web" by E.B. White. It is the most prominent book in the picture, and you can see the cover featuring illustrations of the main characters. This classic children's novel is well-known and loved by many readers around the world. | Full | Full | medium |
https://jim.galecia.com/cv/1106_c_raw.jpg | low | The primary book depicted in this image is "Charlotte's Web" by E.B. White, with pictures by Garth Williams. It is the most prominent book in the photo. | Full | Full | raw |
https://jim.galecia.com/cv/1106_c_raw.jpg | high | The primary book depicted in this image is "Charlotte's Web" by E.B. White, with pictures by Garth Williams. This is a classic children's novel known for its heartwarming story about the friendship between a pig named Wilbur and a spider named Charlotte. | Full | Full | raw |
https://jim.galecia.com/cv/1106_c_small.jpg | low | The primary book depicted in this image is "Charlotte's Web" by E.B. White, illustrated by Garth Williams. It is the most prominent book in the photograph. | Full | Full | small |
https://jim.galecia.com/cv/1106_c_small.jpg | high | The primary book depicted in this image is "Charlotte's Web" by E.B. White. It's the book that is most prominently displayed and oriented such that its cover and title can be easily read. | Full | Full | small |
https://jim.galecia.com/cv/1106_c_verysmall.jpg | low | The primary book depicted in this image is "Charlotte's Web" by E.B. White. It is the most prominent book, placed on top of the other books, and its cover is fully visible. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_c_verysmall.jpg | high | The primary book depicted in this image is "Charlotte's Web" by E.B. White. It is the most prominent book with the cover facing upwards and visible to the camera. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_d_medium.jpg | low | The primary book depicted in this image, which is most prominent, is "The Vision" by Heather Graham. The cover features a reddish hue with the image of what appears to be a waterfront or shoreline. | Full | Full | medium |
https://jim.galecia.com/cv/1106_d_medium.jpg | high | The most prominent book depicted in this image is "The Vision" by Heather Graham. It is visible in the foreground and its front cover is facing towards the viewer. | Full | Full | medium |
https://jim.galecia.com/cv/1106_d_raw.jpg | low | The primary book depicted in this image is "The Vision" by Heather Graham. It appears larger and is positioned in the foreground, making it the most prominent among the others. | Full | Full | raw |
https://jim.galecia.com/cv/1106_d_raw.jpg | high | The primary book depicted in this image is "The Vision" by Heather Graham. It is the most prominent book as it is positioned in the foreground and facing the camera most directly. | Full | Full | raw |
https://jim.galecia.com/cv/1106_d_small.jpg | low | The primary book depicted in this image, which is the most prominent, appears to be "The Vision" by Heather Graham. | Full | Full | small |
https://jim.galecia.com/cv/1106_d_small.jpg | high | The primary book depicted in this image is "The Vision" by Heather Graham. It is the most prominent book in the photo, situated in front of the other books. | Full | Full | small |
https://jim.galecia.com/cv/1106_d_verysmall.jpg | low | The most prominent book in the image is "The Vision" by Heather Graham. It's the book in the front at the bottom of the stack with a blue and orange cover. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_d_verysmall.jpg | high | The primary book depicted in this image is "The Vision" by Heather Graham. It is the most prominent because it is facing directly towards the camera, and the title and author's name are clearly visible. There are other books in the background, but they are less discernible. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_e_medium.jpg | low | The most prominent book in the image is "Murder" by Jed Rubenfeld. It is the largest book in the foreground with an orange and black cover. | Partial | Full | medium |
https://jim.galecia.com/cv/1106_e_medium.jpg | high | The primary book depicted in this image and most prominent due to its placement is "Murder on Astor Place" by Victoria Thompson. It is visible at the top of a stack of books. | Wrong | Full | medium |
https://jim.galecia.com/cv/1106_e_raw.jpg | low | The most prominent book in the image is "Murder on the Interpretation of Murder" by Jed Rubenfeld. | Partial | Full | raw |
https://jim.galecia.com/cv/1106_e_raw.jpg | high | The book titled "Murder on Astor Place" by Victoria Thompson is the most prominent book depicted in this image. It appears to be on top of a stack of books and is the most visible. | Wrong | Full | raw |
https://jim.galecia.com/cv/1106_e_small.jpg | low | The most prominent book in the image is "Murder on Astor Place" by Victoria Thompson. It's the one at the center top of the image with a blue spine and cover, featuring a depiction of a street scene with buildings, people, and a horse-drawn carriage. | Wrong | Full | small |
https://jim.galecia.com/cv/1106_e_small.jpg | high | The primary book depicted in this image, and the most prominent one, is titled "Murder," written by Jed Rubenfeld. | Partial | Full | small |
https://jim.galecia.com/cv/1106_e_verysmall.jpg | low | The most prominent book in the image is "MURDERER" by Ed McBain. This book is clearly visible and located front and center in the photograph. Other items and books are present in the image as well, but this book is the primary focus. | Wrong | Wrong | verysmall |
https://jim.galecia.com/cv/1106_e_verysmall.jpg | high | The most prominent book in the image is titled "MURDERER" by Ed McBain. It appears to be the largest book in the foreground and is placed on top of other items, making it the central focus of the photograph. | Wrong | Wrong | verysmall |
https://jim.galecia.com/cv/1106_f_medium.jpg | low | The most prominent book in this image appears to be "The Simpsons: A Complete Guide to Our Favorite Family." It is notable for its colorful cover featuring characters from the popular animated television show, "The Simpsons." | Full | n/a | medium |
https://jim.galecia.com/cv/1106_f_medium.jpg | high | The primary book depicted in this image, and the most prominent one, is "The Simpsons: A Complete Guide to Our Favorite Family." It features characters from the animated television series "The Simpsons" on the cover. | Full | n/a | medium |
https://jim.galecia.com/cv/1106_f_raw.jpg | low | The most prominent book in the image is "The Simpsons: A Complete Guide to Our Favorite Family." It's a brightly colored book located towards the right side of the photo, featuring characters from the animated television show "The Simpsons." | Full | n/a | raw |
https://jim.galecia.com/cv/1106_f_raw.jpg | high | The most prominent book in this image appears to be "The Simpsons" book on top of the pile. This book features characters from the animated TV series "The Simpsons." | Partial | n/a | raw |
https://jim.galecia.com/cv/1106_f_small.jpg | low | The most prominent book in the image is "The Simpsons: A Complete Guide to Our Favorite Family." It's partially covered, but the title and the characters from "The Simpsons" are clearly recognizable. | Full | n/a | small |
https://jim.galecia.com/cv/1106_f_small.jpg | high | The most prominent book in this image is titled "The Simpsons: A Complete Guide to Our Favorite Family." This book appears to be a guide to the animated TV show "The Simpsons," detailing various aspects of the show and its characters. The book is situated on top of other items, making it the primary focus of the image. | Full | n/a | small |
https://jim.galecia.com/cv/1106_f_verysmall.jpg | low | The most prominent book in the image is "The Simpsons," which appears to be a book related to the popular animated television series of the same name. | Partial | n/a | verysmall |
https://jim.galecia.com/cv/1106_f_verysmall.jpg | high | The most prominent book in the image appears to be one related to "The Simpsons," given its large size and central position within the frame. Other items and books are around it, but this book has the largest visible cover. | Partial | n/a | verysmall |
https://jim.galecia.com/cv/1106_g_medium.jpg | low | The image is blurry, and it's challenging to make out distinct titles accurately. However, to the best of my ability and given the quality of the image, it's not possible to confidently identify a primary book depicted. The titles and authors on the spines are not legible enough to specify which book is most prominent. | Wrong | Wrong | medium |
https://jim.galecia.com/cv/1106_g_medium.jpg | high | I'm sorry, I cannot assist with that request. | Wrong | Wrong | medium |
https://jim.galecia.com/cv/1106_g_raw.jpg | low | The image is slightly blurry, but the book that appears to be most prominent in the center is titled "Iced" by Carol Higgins Clark. | Full | Full | raw |
https://jim.galecia.com/cv/1106_g_raw.jpg | high | The image is a bit blurry, but the book that appears most prominently is "Iced" by Carol Higgins Clark. It is the second book from the left and is a mystery novel. | Full | Full | raw |
https://jim.galecia.com/cv/1106_g_small.jpg | low | Unfortunately, the image is quite blurry, and it's difficult to identify any of the book titles with certainty. However, I can tell you that there's a collection of hardcover and paperback books of different genres and authors. If you can provide a clearer image or more information, I might be able to help identify the books. | Wrong | Wrong | small |
https://jim.galecia.com/cv/1106_g_small.jpg | high | The photo is blurred, which makes it difficult to identify any single book clearly. From what is visible, there are several books on a shelf, but the titles and authors are not legible due to the lack of focus in the image. To identify the primary or most prominent book, a clearer and more focused image would be needed. | Wrong | Wrong | small |
https://jim.galecia.com/cv/1106_g_verysmall.jpg | low | The image is a bit blurry and the angle makes it challenging to read the spines of the books clearly. However, the most prominent book due to its central positioning and the readability of the text appears to be "GED High School Equivalency Exam." | Wrong | Wrong | verysmall |
https://jim.galecia.com/cv/1106_g_verysmall.jpg | high | The image shows a shelf with several books standing upright. The most prominent book in the image, as it has the largest spine visible and is centered, appears to be "The GED Math Problem Solver." | Wrong | Wrong | verysmall |
https://jim.galecia.com/cv/1106_h_medium.jpg | low | The book that is most prominent in this image is "The Cat Who Tailed a Thief" by Lilian Jackson Braun. It's part of "The Cat Who..." series. | Wrong | Full | medium |
https://jim.galecia.com/cv/1106_h_medium.jpg | high | The most prominent book in this image is "The Cat Who Tailed a Thief" by Lilian Jackson Braun. You can tell it is the most prominent due to its central position and the fact that its spine is fully visible. | Wrong | Full | medium |
https://jim.galecia.com/cv/1106_h_raw.jpg | low | The most prominent book in this image is "The Cat Who Played Post Office" from the series "The Cat Who..." by Lilian Jackson Braun. | Full | Full | raw |
https://jim.galecia.com/cv/1106_h_raw.jpg | high | The primary book depicted in the image, which is most prominent, is titled "The Cat Who Played Post Office." It's part of "The Cat Who..." series by Lilian Jackson Braun, a series of mystery novels centering around a journalist named James Qwilleran and his Siamese cats, Koko and Yum Yum. The series is known for its titles that start with "The Cat Who..." and typically involve some sort of mystery or crime that the main character solves with the help of his intelligent cats. | Full | Full | raw |
https://jim.galecia.com/cv/1106_h_small.jpg | low | The primary book depicted in this image, and the one that is most prominent, is "The Cat Who Played Post Office." | Full | Full | small |
https://jim.galecia.com/cv/1106_h_small.jpg | high | The book that is most prominent in this image is "The Cat Who Played Post Office" by Lilian Jackson Braun, part of "The Cat Who..." series. It is immediately recognizable by the larger spine text and central placement in the image. | Full | Full | small |
https://jim.galecia.com/cv/1106_h_verysmall.jpg | low | The primary book depicted in this image is "The Cat Who Played Brahms" by Lilian Jackson Braun. It is the most prominent book in the image, with its spine fully visible and facing the viewer. | Full | Full | verysmall |
https://jim.galecia.com/cv/1106_h_verysmall.jpg | high | The most prominent book in this image is "The Cat Who Played Brahms" by Lilian Jackson Braun, which is part of "The Cat Who..." series of mystery novels. | Full | Full | verysmall |