
This page is not current guidance

For up-to-date guidance, please refer to Help and Guidance.


 

OCR, Gazetteers and Tables


Optical Character Recognition

Optical Character Recognition (OCR) creates editable text from printed material: the material is scanned to produce a digital image, which is then post-processed to recognise the characters. It is a valuable technique, particularly for creating text from out-of-copyright material such as gazetteers. It can convert scanned books and documents into editable text, extract editable text from PDFs created by scanning, or even extract text from screenshots and other images containing text.

Historically, this was a specialised and rather esoteric activity. However, advanced systems that recognise most fonts with a high degree of accuracy are now common.

You may well already have a scanner for producing digital images of diagrams, maps, photographs, slides and negatives, and it may have come with OCR software included. But even with a very basic scanner, you may already have OCR capability without knowing it. If you use Windows and have a reasonably modern PC (Vista onwards) with MS Word or OneNote, you are OCR-enabled, and GENUKI’s recent move to Drupal makes this even more useful.

Here’s how to do it:

  • Scan the material of interest to make a digital image, then display the image on the screen.
  • Open the Windows Snipping Tool *, then select the desired portion of the image. Copy the selection.
    (Recent versions of MS Word (2010 onwards) and OneNote (2007 onwards) offer a similar "Screen Clipping" tool on their respective "Insert" tabs.)
  • Open OneNote, then paste the selection. Right-click the image, then select “Copy Text from Picture”.
  • Paste the copied text into a text editor. Notepad or similar will do, but Word is better because it also captures any text formatting. Repeat as necessary, then save the text document.
  • You can now use this material to facilitate generation of new GENUKI pages.
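
Text copied from an OCR tool usually needs some tidying before it is used on a page: lines are broken where the printed column ended, words are hyphenated across line breaks, and spacing is uneven. The following is a minimal sketch of such a clean-up in Python, not part of the procedure above. The function name `tidy_ocr_text`, the specific clean-up rules and the sample text are all assumptions made for illustration.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Tidy text pasted from an OCR tool (hypothetical helper, illustrative rules only)."""
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "par-\nish" becomes "parish".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat one or more blank lines as a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, collapse runs of whitespace (including
    # single line breaks) into one space.
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    # Invented sample text, standing in for a pasted OCR result.
    sample = "The par-\nish church  stands\nnear the river.\n\n\nSecond  paragraph."
    print(tidy_ocr_text(sample))
```

Note that the hyphen rule cannot tell a line-break hyphen from a genuine one, so a compound such as "north-west" split across two lines would be joined into "northwest". The result should always be proofread against the original scan.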

* To capture a snip or screenshot

Open Snipping Tool by clicking the Windows Start button, clicking All Programs, clicking Accessories, and then clicking Snipping Tool. (Alternatively, click the Windows Start button, then search for “Snipping Tool”.)

Click the arrow next to the New button, select a snip type from the menu, and then use your mouse or tablet pen to capture a snip.

You can use Snipping Tool to capture a screenshot, or snip, of any object on your screen, and then annotate, save, or share the image. Simply use a mouse or tablet pen to capture any of the following types of snips:

  • Free-form Snip. Draw an irregular line, such as a circle or a triangle, around an object.
  • Rectangular Snip. Draw a precise line by dragging the cursor around an object to form a rectangle.
  • Window Snip. Select a window, such as a browser window or dialog box, that you want to capture.
  • Full-screen Snip. Capture the entire screen when you select this type of snip.

After you capture a snip, it's automatically copied to the mark-up window, where you can annotate, save, or share the snip.

Copyright

Be aware of copyright and its constraints - see the GENUKI Guidance on Copyright.