Tuesday, July 29, 2008

ABBYY FineReader Professional 9.0

by Edward Mendelson

Abbyy FineReader Professional 9.0 ($399.99, direct) is a relative newcomer in the world of optical character recognition (OCR). In some significant ways, it has an edge over its long-established competitor OmniPage Professional, though in other ways, OmniPage remains the leader. The one you'll prefer depends on the way you work.

I use OCR mostly to take scanned copies of old books and fuzzy Xerox copies of old newspaper articles and turn them into editable text, and I spend a lot of time making corrections and changes to the OCR output inside my OCR software. For that purpose, Abbyy FineReader is the almost unquestionable first choice. Corporate customers tend to use OCR software to cram stacks of paper documents into digital storage, without taking time to make sure that the software didn't misread a comma as a period. For those customers, who are more concerned with automation, FineReader gets the job done, but OmniPage does it more efficiently and flexibly. If you're trying to decide which high-end OCR product to choose, read on and see whether your needs are closer to mine or to those of a corporate IT manager.

Unlike OmniPage, with its confusing start-up options, FineReader makes a terrific first impression. I found the interface almost ideal in its combination of straightforward clarity for basic tasks and clear explanations of complex tasks. I began by choosing from a set of built-in QuickTasks that automatically perform operations, such as scanning from a document to Microsoft Word, Excel, or PDF, or converting a PDF file to an editable Word file. I first chose to convert a scanned PDF file to Word, and, within seconds, Word popped open with a moderately accurate representation of the fuzzily photographed text from a 1930s newspaper. OmniPage, by comparison, was able to perform a comparable feat only when I changed an obscure setting deep in its Options dialog so that it extracted text from an image in the PDF file instead of embedding the image itself into the Word file—something that FineReader was smart enough to do without being told.

FineReader proved generally more informative than OmniPage about its operations. For example, when the programs analyzed a scanned or imported image of a page to decide which parts were text and which pictures, both did an equally good job. But FineReader numbered the text boxes so that I could see at a glance whether it had got the sequence of text regions right, while OmniPage made me push a toolbar button before it would display the same numbers. FineReader seems to have been designed from the start for today's fast computers, whereas OmniPage is weighed down by design decisions that made more sense when computers were slower and programs didn't take time to display some information unless the user insisted on seeing it.

In this test of their automated features, I was struck by the fact that FineReader and OmniPage made roughly the same number of mistakes in reading the scanned newspaper text but were tripped up by different words. Neither one was notably superior to the other. OCR is an inexact science, and every program produces slightly different results. Yet when I tested the two programs' manual proofreading and error-correcting features, I found that making corrections was far easier with FineReader than with OmniPage.

Here's how I used the manual correction features: FineReader's left-hand task pane starts out with two big buttons labeled Scan and Open. I chose Scan, and the dialog I got—which showed me all the options most useful in scanning for OCR—was much easier to manage than the corresponding OmniPage menu. After scanning a page, FineReader marked all the regions of the page it thought it could read. I tabbed through the regions, removing the regions I didn't need, and clicked the large "Read Document" button to start OCR.

After FineReader performed its OCR, I started the spell-checker, and here's where FineReader proved its worth. For one thing, I liked the spell-checking dialog's small window showing the text I was checking. And I especially liked that FineReader also displayed three other panes. One of these three panes displayed a reduced view of the whole page, with a dotted rectangle showing me the region I was checking. Another pane showed an enlarged view of the area around the text I was proofreading, with the current text highlighted so I could see it clearly in the context of adjacent text. The third pane showed the editable text the program had already extracted from the document through OCR. If I saw a mistake the program hadn't flagged, or if I wanted to correct a lot of errors at once, I could simply switch to the panel with the editable text, and then switch back to the proofreading window. Or, if the proofreading window highlighted a doubtful word and I also noticed other errors, I could move the proofreading window as much as I liked and make multiple changes. OmniPage's far more awkward interface doesn't offer conveniences of this sort.

One other advantage of FineReader's proofreading pane was that it suggested a more useful list of possible alternative readings than OmniPage did, and it didn't clutter the list of alternatives with useless numbers like the ones in the OmniPage screen. Proofreading an OCR document is never fun because it requires a lot of close attention to tiny details, but FineReader made the experience far less frustrating than OmniPage does.

I was also grateful for FineReader's intelligently designed interface, which puts as many options as possible on a large-scale toolbar that resembles the Ribbon in Microsoft Office 2007. In OmniPage, many options were hidden in drop-down menus that made me play hide-and-seek before I found the one I wanted. In FineReader, large buttons displayed the options I needed most, and small buttons displayed the ones used less often.

One other convenience in FineReader's text-editing pane is its use of word-processing-like menus to apply quick formatting to text. I could easily select all or part of a document and apply fonts and point sizes, or choose a set of formatting attributes from a Style menu and apply the same style to other parts of the document. Like OmniPage, FineReader exports Word documents with more stylistic information than I want, but I was able to cut down the clutter of formatting details in FineReader's own editor before dealing with Word's more complicated style and format menus.

Abbyy's app includes automation features similar to, but not nearly as powerful as, those in OmniPage. I found I could use any of Abbyy's built-in automated operations to convert scanned or image files into documents or PDFs, and I could build my own sequences. But I didn't have the flexibility that OmniPage Professional gave me to acquire files from FTP sites or "watched folders," although I could choose an option that automatically e-mails the output. Similar features are available in FineReader's $599.99 Corporate Edition, but I didn't test those. (FineReader's Corporate Edition costs $100 more than OmniPage's Professional Edition, but FineReader's Professional Edition costs $100 less than OmniPage's.)

I've been emphasizing all the ways that FineReader does things right, but it's important to remember that OmniPage does some things better than FineReader does. It's a matter not just of the deeper feature set in OmniPage's automated operations, but also of OmniPage's superior skill at making sense of the layout in complicated pages from magazines and design-heavy press releases, and at reproducing those layouts in its output. This feature doesn't factor into the kind of use I put an OCR program to (I want to extract just the text, not the layout, from scanned documents), but it obviously matters to many potential users, especially those in the world of graphic design.

In the end, both products perform text-extracting OCR more or less equally well, although it's impossible to predict, with any specific document, whether one will produce slightly better results than the other. FineReader is miles ahead on interface design and bug-free performance—two areas in which OmniPage has a lot of catching up to do. OmniPage is noticeably, but not spectacularly, better in reproducing complex layouts. What matters most to me in OCR is text extraction and trouble-free operation, and that means FineReader is the program I'll continue to use. But if you know your needs are different from mine, you owe it to yourself to try both.
