Posted by Jim Craner on November 7, 2023

Have you ever worked with any "computer vision" projects?  Perhaps you used "Optical Character Recognition" ("OCR") software to scan archival documents and extract the text within them?  Your smartphone likely has computer vision technology -- maybe you used a "face swap" app or scanned a product to quickly order a replacement?

Here is how ChatGPT describes "computer vision" in non-technical terms:

Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images from cameras and videos and deep learning models, computers can accurately identify and classify objects — and then react to what they “see.”

Recently, ChatGPT was given vision capabilities -- users of the application can upload images and have the GPT model "look" at them and analyze them.  It's pretty exciting - I was on a hike when I saw the announcement and instantly tested it on a wildflower by the trail:

screenshot of a mobile interface of ChatGPT. The user has submitted a photo of a purple wildflower and asked "what is it?" The chatbot has responded: "The plant with the purple flowers appears to be a species of Liatris, commonly known as "blazing star" or "gayfeather". It's a perennial plant native to North America and is often used in gardens for its tall spikes of purple flowers that attract butterflies."

 

Features of GPT-4-Vision

GPT-4's new vision capability seems to be able to handle the following tasks:

  • identify and classify objects and images (cat vs. dog, beach vs. city street)
  • text recognition -- but NOT barcode recognition!
  • scene description -- such as describing what appears in a photograph -- but be careful using this for accessibility "alt" text without human review!

Combining the new vision capability plus GPT-4's exceptional code handling skills can let you easily turn a napkin sketch into a web interface!

Before

napkin sketch of a website form reading "ILS Search" with fields for Title, Author, Subject, ISBN, and a submit button at the bottom.

 

Submit to GPT-4 with request

The napkin sketch from above, submitted into a ChatGPT window with the request "Give me the HTML to create and style this form."  Below, you can see the AI's response including a code window of HTML code.

How did it do?

screenshot of a webform titled ILS Search with fields for Title, Author, Subject, and ISBN. There is a submit button.


Can GPT-4-Vision accurately recognize book titles and authors? 

Since GPT-4-V can't (currently) decode bar codes, I took some images of various books laying around the house and then submitted them to the GPT-4-V API along with a request to identify the most prominent book and author.  Does your library have a collection of book images that we can use for testing? Let us know!

What other uses can you think of for a general computer vision scanner in the library?

 

Data results below.

Quality = image size:

  • raw = 4000x3000 pixels
  • medium = 2000x1500
  • small = 1000x750
  • verysmall = 400x300

Detail level = the OpenAI API quality level.

Model: gpt-4-vision-preview, 07 November 2023

Prompt: 

Identify the primary book depicted in this image. If more than one book is depicted, identify the book that is most prominent.

 

input_filename detail_level API_response title_match author_match Quality
https://jim.galecia.com/cv/1106_a_medium.jpg low The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. Full Full medium
https://jim.galecia.com/cv/1106_a_medium.jpg high The primary book depicted in the image is "A Place Called Rainwater" by Dorothy Garlock. Full Full medium
https://jim.galecia.com/cv/1106_a_raw.jpg low The primary book depicted in the image is "A Place Called Rainwater" by Dorothy Garlock. Full Full raw
https://jim.galecia.com/cv/1106_a_raw.jpg high The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. It is the most prominent book in the image. Full Full raw
https://jim.galecia.com/cv/1106_a_small.jpg low The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. Full Full small
https://jim.galecia.com/cv/1106_a_small.jpg high The primary book depicted in the image is "A Place Called Rainwater" by Dorothy Garlock. Full Full small
https://jim.galecia.com/cv/1106_a_verysmall.jpg low The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. Full Full verysmall
https://jim.galecia.com/cv/1106_a_verysmall.jpg high The primary book depicted in this image is "A Place Called Rainwater" by Dorothy Garlock. Full Full verysmall
https://jim.galecia.com/cv/1106_b_medium.jpg low The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. Full Full medium
https://jim.galecia.com/cv/1106_b_medium.jpg high The primary book depicted in the image is "The Giving Tree" by Shel Silverstein. Full Full medium
https://jim.galecia.com/cv/1106_b_raw.jpg low The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. Full Full raw
https://jim.galecia.com/cv/1106_b_raw.jpg high The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. Full Full raw
https://jim.galecia.com/cv/1106_b_small.jpg low The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. Full Full small
https://jim.galecia.com/cv/1106_b_small.jpg high The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. Full Full small
https://jim.galecia.com/cv/1106_b_verysmall.jpg low The primary book depicted in this image is "The Giving Tree" by Shel Silverstein. Full Full verysmall
https://jim.galecia.com/cv/1106_b_verysmall.jpg high The primary book depicted in this image appears to be "The Giving Tree" by Shel Silverstein. The cover shows a characteristic illustration from the book with a boy and a tree, which is consistent with the known design of this popular children's book. Full Full verysmall
https://jim.galecia.com/cv/1106_c_medium.jpg low The primary book depicted in this image is "Charlotte's Web" by E.B. White, with pictures by Garth Williams. It is the most prominent book in the image, positioned in the center and facing the camera. Full Full medium
https://jim.galecia.com/cv/1106_c_medium.jpg high The primary book depicted in this image is "Charlotte's Web" by E.B. White. It is the most prominent book in the picture, and you can see the cover featuring illustrations of the main characters. This classic children's novel is well-known and loved by many readers around the world. Full Full medium
https://jim.galecia.com/cv/1106_c_raw.jpg low The primary book depicted in this image is "Charlotte's Web" by E.B. White, with pictures by Garth Williams. It is the most prominent book in the photo. Full Full raw
https://jim.galecia.com/cv/1106_c_raw.jpg high The primary book depicted in this image is "Charlotte's Web" by E.B. White, with pictures by Garth Williams. This is a classic children's novel known for its heartwarming story about the friendship between a pig named Wilbur and a spider named Charlotte. Full Full raw
https://jim.galecia.com/cv/1106_c_small.jpg low The primary book depicted in this image is "Charlotte's Web" by E.B. White, illustrated by Garth Williams. It is the most prominent book in the photograph. Full Full small
https://jim.galecia.com/cv/1106_c_small.jpg high The primary book depicted in this image is "Charlotte's Web" by E.B. White. It's the book that is most prominently displayed and oriented such that its cover and title can be easily read. Full Full small
https://jim.galecia.com/cv/1106_c_verysmall.jpg low The primary book depicted in this image is "Charlotte's Web" by E.B. White. It is the most prominent book, placed on top of the other books, and its cover is fully visible. Full Full verysmall
https://jim.galecia.com/cv/1106_c_verysmall.jpg high The primary book depicted in this image is "Charlotte's Web" by E.B. White. It is the most prominent book with the cover facing upwards and visible to the camera. Full Full verysmall
https://jim.galecia.com/cv/1106_d_medium.jpg low The primary book depicted in this image, which is most prominent, is "The Vision" by Heather Graham. The cover features a reddish hue with the image of what appears to be a waterfront or shoreline. Full Full medium
https://jim.galecia.com/cv/1106_d_medium.jpg high The most prominent book depicted in this image is "The Vision" by Heather Graham. It is visible in the foreground and its front cover is facing towards the viewer. Full Full medium
https://jim.galecia.com/cv/1106_d_raw.jpg low The primary book depicted in this image is "The Vision" by Heather Graham. It appears larger and is positioned in the foreground, making it the most prominent among the others. Full Full raw
https://jim.galecia.com/cv/1106_d_raw.jpg high The primary book depicted in this image is "The Vision" by Heather Graham. It is the most prominent book as it is positioned in the foreground and facing the camera most directly. Full Full raw
https://jim.galecia.com/cv/1106_d_small.jpg low The primary book depicted in this image, which is the most prominent, appears to be "The Vision" by Heather Graham. Full Full small
https://jim.galecia.com/cv/1106_d_small.jpg high The primary book depicted in this image is "The Vision" by Heather Graham. It is the most prominent book in the photo, situated in front of the other books. Full Full small
https://jim.galecia.com/cv/1106_d_verysmall.jpg low The most prominent book in the image is "The Vision" by Heather Graham. It's the book in the front at the bottom of the stack with a blue and orange cover. Full Full verysmall
https://jim.galecia.com/cv/1106_d_verysmall.jpg high The primary book depicted in this image is "The Vision" by Heather Graham. It is the most prominent because it is facing directly towards the camera, and the title and author's name are clearly visible. There are other books in the background, but they are less discernible. Full Full verysmall
https://jim.galecia.com/cv/1106_e_medium.jpg low The most prominent book in the image is "Murder" by Jed Rubenfeld. It is the largest book in the foreground with an orange and black cover. Partial Full medium
https://jim.galecia.com/cv/1106_e_medium.jpg high The primary book depicted in this image and most prominent due to its placement is "Murder on Astor Place" by Victoria Thompson. It is visible at the top of a stack of books. Wrong Full medium
https://jim.galecia.com/cv/1106_e_raw.jpg low The most prominent book in the image is "Murder on the Interpretation of Murder" by Jed Rubenfeld. Partial Full raw
https://jim.galecia.com/cv/1106_e_raw.jpg high The book titled "Murder on Astor Place" by Victoria Thompson is the most prominent book depicted in this image. It appears to be on top of a stack of books and is the most visible. Wrong Full raw
https://jim.galecia.com/cv/1106_e_small.jpg low The most prominent book in the image is "Murder on Astor Place" by Victoria Thompson. It's the one at the center top of the image with a blue spine and cover, featuring a depiction of a street scene with buildings, people, and a horse-drawn carriage. Wrong Full small
https://jim.galecia.com/cv/1106_e_small.jpg high The primary book depicted in this image, and the most prominent one, is titled "Murder," written by Jed Rubenfeld. Partial Full small
https://jim.galecia.com/cv/1106_e_verysmall.jpg low The most prominent book in the image is "MURDERER" by Ed McBain. This book is clearly visible and located front and center in the photograph. Other items and books are present in the image as well, but this book is the primary focus. Wrong Wrong verysmall
https://jim.galecia.com/cv/1106_e_verysmall.jpg high The most prominent book in the image is titled "MURDERER" by Ed McBain. It appears to be the largest book in the foreground and is placed on top of other items, making it the central focus of the photograph. Wrong Wrong verysmall
https://jim.galecia.com/cv/1106_f_medium.jpg low The most prominent book in this image appears to be "The Simpsons: A Complete Guide to Our Favorite Family." It is notable for its colorful cover featuring characters from the popular animated television show, "The Simpsons." Full n/a medium
https://jim.galecia.com/cv/1106_f_medium.jpg high The primary book depicted in this image, and the most prominent one, is "The Simpsons: A Complete Guide to Our Favorite Family." It features characters from the animated television series "The Simpsons" on the cover. Full n/a medium
https://jim.galecia.com/cv/1106_f_raw.jpg low The most prominent book in the image is "The Simpsons: A Complete Guide to Our Favorite Family." It's a brightly colored book located towards the right side of the photo, featuring characters from the animated television show "The Simpsons." Full n/a raw
https://jim.galecia.com/cv/1106_f_raw.jpg high The most prominent book in this image appears to be "The Simpsons" book on top of the pile. This book features characters from the animated TV series "The Simpsons." Partial n/a raw
https://jim.galecia.com/cv/1106_f_small.jpg low The most prominent book in the image is "The Simpsons: A Complete Guide to Our Favorite Family." It's partially covered, but the title and the characters from "The Simpsons" are clearly recognizable. Full n/a small
https://jim.galecia.com/cv/1106_f_small.jpg high The most prominent book in this image is titled "The Simpsons: A Complete Guide to Our Favorite Family." This book appears to be a guide to the animated TV show "The Simpsons," detailing various aspects of the show and its characters. The book is situated on top of other items, making it the primary focus of the image. Full n/a small
https://jim.galecia.com/cv/1106_f_verysmall.jpg low The most prominent book in the image is "The Simpsons," which appears to be a book related to the popular animated television series of the same name. Partial n/a verysmall
https://jim.galecia.com/cv/1106_f_verysmall.jpg high The most prominent book in the image appears to be one related to "The Simpsons," given its large size and central position within the frame. Other items and books are around it, but this book has the largest visible cover. Partial n/a verysmall
https://jim.galecia.com/cv/1106_g_medium.jpg low The image is blurry, and it's challenging to make out distinct titles accurately. However, to the best of my ability and given the quality of the image, it's not possible to confidently identify a primary book depicted. The titles and authors on the spines are not legible enough to specify which book is most prominent. Wrong Wrong medium
https://jim.galecia.com/cv/1106_g_medium.jpg high I'm sorry, I cannot assist with that request. Wrong Wrong medium
https://jim.galecia.com/cv/1106_g_raw.jpg low The image is slightly blurry, but the book that appears to be most prominent in the center is titled "Iced" by Carol Higgins Clark. Full Full raw
https://jim.galecia.com/cv/1106_g_raw.jpg high The image is a bit blurry, but the book that appears most prominently is "Iced" by Carol Higgins Clark. It is the second book from the left and is a mystery novel. Full Full raw
https://jim.galecia.com/cv/1106_g_small.jpg low Unfortunately, the image is quite blurry, and it's difficult to identify any of the book titles with certainty. However, I can tell you that there's a collection of hardcover and paperback books of different genres and authors. If you can provide a clearer image or more information, I might be able to help identify the books. Wrong Wrong small
https://jim.galecia.com/cv/1106_g_small.jpg high The photo is blurred, which makes it difficult to identify any single book clearly. From what is visible, there are several books on a shelf, but the titles and authors are not legible due to the lack of focus in the image. To identify the primary or most prominent book, a clearer and more focused image would be needed. Wrong Wrong small
https://jim.galecia.com/cv/1106_g_verysmall.jpg low The image is a bit blurry and the angle makes it challenging to read the spines of the books clearly. However, the most prominent book due to its central positioning and the readability of the text appears to be "GED High School Equivalency Exam." Wrong Wrong verysmall
https://jim.galecia.com/cv/1106_g_verysmall.jpg high The image shows a shelf with several books standing upright. The most prominent book in the image, as it has the largest spine visible and is centered, appears to be "The GED Math Problem Solver." Wrong Wrong verysmall
https://jim.galecia.com/cv/1106_h_medium.jpg low The book that is most prominent in this image is "The Cat Who Tailed a Thief" by Lilian Jackson Braun. It's part of "The Cat Who..." series. Wrong Full medium
https://jim.galecia.com/cv/1106_h_medium.jpg high The most prominent book in this image is "The Cat Who Tailed a Thief" by Lilian Jackson Braun. You can tell it is the most prominent due to its central position and the fact that its spine is fully visible. Wrong Full medium
https://jim.galecia.com/cv/1106_h_raw.jpg low The most prominent book in this image is "The Cat Who Played Post Office" from the series "The Cat Who..." by Lilian Jackson Braun. Full Full raw
https://jim.galecia.com/cv/1106_h_raw.jpg high The primary book depicted in the image, which is most prominent, is titled "The Cat Who Played Post Office." It's part of "The Cat Who..." series by Lilian Jackson Braun, a series of mystery novels centering around a journalist named James Qwilleran and his Siamese cats, Koko and Yum Yum. The series is known for its titles that start with "The Cat Who..." and typically involve some sort of mystery or crime that the main character solves with the help of his intelligent cats. Full Full raw
https://jim.galecia.com/cv/1106_h_small.jpg low The primary book depicted in this image, and the one that is most prominent, is "The Cat Who Played Post Office." Full Full small
https://jim.galecia.com/cv/1106_h_small.jpg high The book that is most prominent in this image is "The Cat Who Played Post Office" by Lilian Jackson Braun, part of "The Cat Who..." series. It is immediately recognizable by the larger spine text and central placement in the image. Full Full small
https://jim.galecia.com/cv/1106_h_verysmall.jpg low The primary book depicted in this image is "The Cat Who Played Brahms" by Lilian Jackson Braun. It is the most prominent book in the image, with its spine fully visible and facing the viewer. Full Full verysmall
https://jim.galecia.com/cv/1106_h_verysmall.jpg high The most prominent book in this image is "The Cat Who Played Brahms" by Lilian Jackson Braun, which is part of "The Cat Who..." series of mystery novels. Full Full verysmall