How To Convert Image/PDF To Text Using Optical Character Recognition

It is often difficult to retype, format, and redesign documents that are only available as scanned images or image-based PDF files. Optical Character Recognition (OCR) technology makes this easier by converting your images or PDF files into editable documents.

If you need to convert an image file, a PDF, a handwritten document, or a scanned file that cannot be edited with the native tools on Windows, you can use an online service to do the job for you. This saves you the time and hassle of retyping the entire document in a text editor.

Continue reading this article to learn how to convert an uneditable document into an editable file.

What is Optical Character Recognition (OCR)?

Optical Character Recognition, also known as Optical Character Reader or image-to-text conversion, is a combination of hardware and software technology that scans a document and assigns each shape it finds to the matching character, reproducing the text of the source document.

When a document is scanned, it is converted into a digital image: a grid of pixels that the OCR software analyzes to identify the shapes on the page and assign a character to each one.

Nowadays, OCR technology is available in different forms. Some online services convert an uploaded file into plain text or downloadable text files, while various hardware devices, such as OCR pens, scan hard copies of text and convert them into digital content.

An OCR pen that converts text on paper to digital text

How OCR Works

OCR performs a series of tasks to convert an image of a document into text. The steps below describe the OCR workflow:

  1. OCR starts by scanning the document and distinguishing the light areas (usually the background) from the dark areas (usually the text).
  2. The darker areas of the document are then matched to characters of the alphabet using one of the following two approaches:
    • Pattern recognition: A scanned character, word, or block of text is compared against stored samples of text in various languages and fonts to find a matching pattern.
    • Feature detection: Specific features of the scanned character, word, or block of text are compared with the features stored in the database. For example, the features of a character could include the number of angled lines, the angles between those lines, and so on.
  3. Once the characters and words are matched, they are converted into character codes such as ASCII. ASCII is a standard encoding that assigns a unique number to each character (for example, the capital letter "A" is 65), so the computer can store, search, and edit the recognized text. Modern OCR tools generally use Unicode, which extends the same idea to accented letters and non-Latin scripts.
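To make steps 1 and 2 more concrete, here is a toy sketch in Python of the pattern-recognition approach. It is not how any particular OCR product works: the tiny 3 x 5 glyphs, the threshold of 128, and the small `TEMPLATES` dictionary standing in for the database are all hypothetical, chosen only so the example fits on screen.

```python
# Toy illustration of OCR steps 1 and 2: binarization, then pattern (template) matching.
# Assumptions: glyphs are 3x5 grayscale grids (0 = black, 255 = white), the threshold
# of 128 is arbitrary, and TEMPLATES is a hypothetical stand-in for the OCR "database".

THRESHOLD = 128  # pixels darker than this are treated as ink

# Hypothetical templates: "1" = ink, "0" = background, 3 columns x 5 rows.
TEMPLATES = {
    "I": ["010", "010", "010", "010", "010"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def binarize(gray_rows, threshold=THRESHOLD):
    """Step 1: separate dark (ink) pixels from light (background) pixels."""
    return ["".join("1" if px < threshold else "0" for px in row) for row in gray_rows]


def match_score(glyph, template):
    """Count how many pixels agree between a binarized glyph and a template."""
    return sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )


def recognize(gray_rows):
    """Step 2 (pattern recognition): pick the template that agrees on the most pixels."""
    glyph = binarize(gray_rows)
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# A slightly imperfect scan of "L": the bottom-right pixel of the base is too faint
# to count as ink, so the glyph does not match any template exactly.
scanned_L = [
    [30, 240, 250],
    [25, 235, 245],
    [40, 250, 230],
    [35, 245, 240],
    [20, 60, 200],
]

print(binarize(scanned_L))   # ['100', '100', '100', '100', '110']
print(recognize(scanned_L))  # L
```

Even with one pixel missing, "L" agrees on 14 of 15 pixels while "I" and "T" agree on only 6, so the closest match wins. A real OCR engine works with far larger images, first splits the page into lines and individual characters, and compares against many more samples (or, with feature detection, against measured features such as strokes and angles instead of raw pixels), but the idea of choosing the closest stored match is the same.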

In other words, OCR turns patterns of light and dark into character codes, which is what makes the result editable plain text.
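The last step can be illustrated with Python's built-in `ord()` and `str.encode()`. In this sketch, the list `recognized` is a made-up stand-in for the characters found during matching; the point is only to show that each character becomes a number, and that characters outside ASCII need a Unicode encoding such as UTF-8.

```python
# Sketch of step 3: storing recognized characters as character codes.
# "recognized" is a hypothetical stand-in for the output of the matching step.
recognized = ["O", "C", "R"]
text = "".join(recognized)

# In ASCII, every character is stored as a single number between 0 and 127.
ascii_codes = list(text.encode("ascii"))
print(ascii_codes)  # [79, 67, 82]

# Accented and non-Latin characters have no ASCII code...
word = "caf\u00e9"  # "café"
try:
    word.encode("ascii")
except UnicodeEncodeError:
    print("'\u00e9' cannot be represented in ASCII")

# ...so the text is stored as Unicode instead, for example encoded as UTF-8 bytes.
print([ord(ch) for ch in word])    # Unicode code points: [99, 97, 102, 233]
print(list(word.encode("utf-8")))  # UTF-8 bytes: [99, 97, 102, 195, 169]
```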

Let us now show you how to convert an image or a PDF file to extract its text, which you can then use however you like.

Online OCR Services

OnlineOCR.Net is a free, web-based OCR service where you can upload your document as an image or PDF file and convert it into a Word document (DOC/DOCX), plain text, or an Excel sheet (XLSX).

Follow the steps given below to convert your document into an editable file:

  1. Open OnlineOCR.Net in any web browser.
  2. Click Select file and then browse to the document that you want to convert and select it.
  3. Now select the language of the uploaded file from the drop-down menu. Note that the output text will be in this same language; the input and output languages cannot differ.
  4. Now select the output format for the converted file from the drop-down menu. You can choose from Microsoft Word, plain text, and Microsoft Excel.
  5. Once you have made your selections, click Convert.

It will take a moment for the tool to convert your document. When it does, you can download the output file by clicking on the link, or copy the plain text from the text field below.

Document converted using OnlineOCR.Net

Once you download the file, you will see that the tool has converted most of the text from the uploaded document into editable text. Below is an example of a file that we converted.

Converted document (left) versus uploaded scanned document (right)

As you can see from the example above, most of the text has been converted. However, since the conversion is not 100 percent accurate, you should still proofread the output for errors.
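If you want to compare how accurately different OCR tools handle the same document, one rough approach is to run a short passage whose correct text you already know through each tool and compare the results character by character. The sketch below uses Python's standard `difflib` module; both sample strings are invented for illustration, and the similarity ratio it prints is a simple measure, not a formal OCR accuracy metric.

```python
import difflib

# Hypothetical example: the correct text of a passage, and the same passage
# as returned by an OCR tool (with a few invented recognition errors).
original = "Optical Character Recognition converts images to text."
ocr_output = "Optica1 Character Recogniti0n converts irnages to text."

matcher = difflib.SequenceMatcher(None, original, ocr_output)

# ratio() returns a similarity score between 0.0 (nothing in common) and 1.0 (identical).
print(f"Similarity: {matcher.ratio():.1%}")

# List every span that differs so it can be checked and corrected by hand.
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(f"{tag}: {original[i1:i2]!r} -> {ocr_output[j1:j2]!r}")
```

In this made-up example, the differences are spots where "l", "o", and "m" were misread as "1", "0", and "rn", a kind of look-alike confusion worth watching for when proofreading OCR output.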

OnlineOCR.Net also preserved the formatting of the file when we converted a JPG to a DOCX file.

NewOCR.Com is another free web-based tool that can convert your scanned documents and images to digital text. However, unlike OnlineOCR.Net, it does not preserve file formatting and only delivers the converted text.

The process to convert a file is pretty much the same:

  1. Open the website in any web browser.
  2. Click Choose file and select the file you want to convert.
  3. Now click Preview.
  4. Click OCR to begin the conversion process.
  5. Now scroll down and download the converted text from the Download drop-down menu, or simply copy the plain text in the text field below.

You now have the text extracted from your image. You can also translate this text into another language using the Google Translate tab available directly on the website.

Closing Thoughts

With OCR technology now readily available to everyone, you can convert your scanned images and other documents to text and edit them as you please. You no longer need to retype lengthy documents from scratch.

There are other OCR tools available online that we did not list here. We chose OnlineOCR and NewOCR because, in our experience, they currently convert the most text correctly. Some other tools simply paste the uploaded document into a Word or Excel file as an image, which is of no use since the text still cannot be edited.
