Understanding of Optical Character Reader – Computer Vision

Understanding of Optical Character Reader

It is no secret that images often contain text, and some of that text matters more than the rest. Images might contain brand names, traffic signs, instructions, letters, invoices, bank statements, or even the notes taken on a whiteboard during a crucial meeting. To read text from such images, whatever the image format (for example, JPEG or PNG), Azure offers the OCR service. OCR extracts all the text that it detects in an image. Traffic management agencies can use it to read traffic signs, for example, and it can also be used to pull important information out of documents such as invoices, letters, or bills.

The OCR capability is accessed through an API.

This API returns its results immediately and can read text in many different languages.
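
As a rough illustration of what calling such an API can look like, the Python sketch below builds (but does not send) an HTTP request. The `/vision/v3.2/ocr` path, the `language` and `detectOrientation` query parameters, and the `Ocp-Apim-Subscription-Key` header are assumptions about the Computer Vision REST interface, not details stated in this section. The endpoint, key, and image bytes are hypothetical placeholders, and the lab itself uses Microsoft's C# code rather than this snippet.

```python
import urllib.parse
import urllib.request


def build_ocr_request(endpoint, key, image_bytes, language="unk"):
    """Build (but do not send) a POST request for an OCR call.

    Assumptions: the URL path, query parameters, and header name below
    mirror the Computer Vision REST interface; verify them against the
    official reference before relying on them.
    """
    query = urllib.parse.urlencode(
        {"language": language, "detectOrientation": "true"}
    )
    url = f"{endpoint.rstrip('/')}/vision/v3.2/ocr?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # resource key (placeholder)
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return urllib.request.Request(
        url, data=image_bytes, headers=headers, method="POST"
    )


# Hypothetical values, for illustration only.
request = build_ocr_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "hypothetical-key",
    b"fake image bytes",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send the image; that step is left out here so the sketch runs without an Azure resource.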

When you use the OCR API to process an image, it returns a hierarchy of information that consists of:

  • Regions in the image that contain text
  • Lines of text in each region
  • Words in each line of text

The OCR API also returns bounding box coordinates for each of these elements. These coordinates define a rectangle that shows where in the image the region, line, or word is located.
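
The region, line, and word hierarchy described above can be walked with a few nested loops. The Python sketch below assumes the result is a JSON object with `regions`, `lines`, and `words` keys, and that each `boundingBox` is a `"left,top,width,height"` string; those field names and the sample data are illustrative assumptions, not output captured from the service.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: int
    top: int
    width: int
    height: int


def parse_box(text):
    """Parse an assumed "left,top,width,height" string into a Box."""
    left, top, width, height = (int(value) for value in text.split(","))
    return Box(left, top, width, height)


def iter_words(result):
    """Yield (text, Box) for every word, in region > line > word order."""
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                yield word["text"], parse_box(word["boundingBox"])


def line_texts(result):
    """Join the words of each line into one string per line."""
    return [
        " ".join(word["text"] for word in line.get("words", []))
        for region in result.get("regions", [])
        for line in region.get("lines", [])
    ]


# Hypothetical result for an image of a traffic sign.
sample = {
    "language": "en",
    "regions": [
        {
            "boundingBox": "40,30,160,110",
            "lines": [
                {
                    "boundingBox": "40,30,120,50",
                    "words": [{"boundingBox": "40,30,120,50", "text": "STOP"}],
                },
                {
                    "boundingBox": "45,95,155,45",
                    "words": [
                        {"boundingBox": "45,95,70,45", "text": "ALL"},
                        {"boundingBox": "125,95,75,45", "text": "WAY"},
                    ],
                },
            ],
        }
    ],
}

for text, box in iter_words(sample):
    print(f"{text!r} at x={box.left}, y={box.top}, size={box.width}x{box.height}")
print(line_texts(sample))
```

Keeping the bounding boxes alongside the text makes it possible to highlight each word on the original image, which is the main reason the API returns them.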

This is how small amounts of text are extracted from an image using OCR.

Tip: To gain a better understanding of Microsoft Azure’s AI offerings, I recommend reading through each reference link in the book’s “References” section. These links will take you to the module’s Microsoft Learn training section.

Practical Labs

This lab will introduce you to the Microsoft Azure Computer Vision API. We have two options when working with the Computer Vision API service:

  • Automate image analysis and description
  • Automate text extraction from images

Computer Vision API – Text Extraction

This lab uses code provided by Microsoft to show how to set up the Computer Vision API service. In this lab, you will:

  • Provision a Computer Vision resource
  • Connect a C# console application to the previously created Computer Vision service instance

Note: Writing code to create a client application is outside the scope of the exam, which tests only your knowledge of the AI services offered by Azure. For the lab, you will therefore use the code provided by Microsoft. The link to download or reference the code is given in the second section of Practical Labs, “Connect a Console App to Computer Vision Resource.”
