The RecognizeText Method is available as an add-on to the LEADTOOLS Document and Medical Imaging toolkits.
Syntax

Language | Syntax
---|---
Visual Basic (Declaration) | `Function RecognizeText(ByVal callback As OcrProgressCallback) As String`
Visual Basic (Usage) | `Dim instance As IOcrPage : Dim callback As OcrProgressCallback : Dim value As String : value = instance.RecognizeText(callback)`
C# | `string RecognizeText(OcrProgressCallback callback)`
C++/CLI | `String^ RecognizeText(OcrProgressCallback^ callback)`
Parameters
- callback
- Optional callback used to show the operation progress or to abort the operation. Pass null (Nothing in Visual Basic) if you do not need it.
Return Value
A System.String containing the recognized characters (or an empty string if the zones on the page contain no recognition data).
Remarks
Before calling this method, call the AutoPreprocess method to perform automatic pre-processing that improves image quality.
Use this method to get the recognition result of the page as a simple System.String object. Getting the result as text is helpful when you add zones manually for form processing. For example, suppose the form you are processing has two areas of interest: a name field at coordinates 100, 100, 400, 120 and a social security number field at coordinates 100, 200, 400, 220. You can structure your application as follows:
1. Create a new IOcrDocument object.
2. Add the page to the OCR document using IOcrDocument.Pages. The snippets below refer to this page as ocrPage.
3. Add the name zone manually:
   OcrZone nameZone = new OcrZone();
   nameZone.ZoneType = OcrZoneType.Text;
   nameZone.Bounds = new LogicalRectangle(100, 100, 400, 120);
   ocrPage.Zones.Add(nameZone);
4. Recognize the page and get the value of the name field (only this zone is recognized):
   string name = ocrPage.RecognizeText(null);
5. Remove the name zone from the page:
   ocrPage.Zones.Clear();
6. Repeat steps 3 through 5 with a zone at 100, 200, 400, 220 to get the social security number field.
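The procedure above can be wrapped in a small loop that recognizes one field at a time. This is a minimal, self-contained sketch rather than LEADTOOLS code: `FormFieldReader` and `ReadFields` are hypothetical names, the `zones` collection stands in for `ocrPage.Zones`, and the `recognizeText` delegate stands in for `ocrPage.RecognizeText(null)`. It only shows the clear, add, recognize pattern from the steps.

```csharp
using System;
using System.Collections.Generic;

public static class FormFieldReader
{
    // Recognizes one form field at a time. Before each recognition the zone
    // list is cleared and only the current field's zone is added, so
    // recognizeText() returns the text of that single field.
    //   zones         - stands in for ocrPage.Zones (assumed to behave like ICollection<TZone>)
    //   recognizeText - stands in for () => ocrPage.RecognizeText(null)
    //   fields        - hypothetical list of (field name, zone covering that field)
    public static Dictionary<string, string> ReadFields<TZone>(
        ICollection<TZone> zones,
        Func<string> recognizeText,
        IEnumerable<(string Name, TZone Zone)> fields)
    {
        var results = new Dictionary<string, string>();
        foreach (var (name, zone) in fields)
        {
            zones.Clear();                    // remove the previous field's zone
            zones.Add(zone);                  // only this zone will be recognized
            results[name] = recognizeText();  // text of this field only
        }
        zones.Clear();                        // leave no manual zones behind
        return results;
    }
}
```

With LEADTOOLS, `TZone` would be `OcrZone` and the two fields would be the name and social security number zones built as in the `nameZone` snippet. Whether `IOcrZoneCollection` can be passed directly as an `ICollection<OcrZone>` is an assumption to verify against the IOcrZoneCollection Interface reference.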
If this page is not black and white (that is, it contains a grayscale or 24-bit color image), the image is automatically converted to black and white in an implicit secondary step.
RecognizeText uses the zone information to activate the appropriate recognition module for every zone in the Zones property. Each recognition module recognizes the parts of the page assigned to it by the zones.
If the Zones collection of this IOcrPage is empty (no zones are defined), the page-layout decomposition process runs automatically to create a zone list for the image before recognition; in other words, AutoZone is called implicitly.
Note: If this IOcrPage is an empty page (that is, automatic page decomposition with the AutoZone method finds no zones in it), the RecognizeText method fails with an exception. It is recommended that you call AutoZone and then check whether the engine found at least one zone (using Zones.Count). If the count is zero, do not call RecognizeText.
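The recommendation in the note (call AutoZone, check Zones.Count, and skip RecognizeText when no zones were found) can be expressed as a small guard. This is a hedged, self-contained sketch: `EmptyPageGuard` and `TryRecognizeText` are hypothetical names, the three delegates stand in for calling AutoZone on the page, reading `ocrPage.Zones.Count`, and calling `ocrPage.RecognizeText(null)`, and returning `null` for an empty page is a choice made for this sketch, not LEADTOOLS behavior.

```csharp
#nullable enable
using System;

public static class EmptyPageGuard
{
    // Runs automatic zoning first and only calls RecognizeText when at least
    // one zone was found, because RecognizeText throws on an empty page.
    //   autoZone      - stands in for calling AutoZone on the IOcrPage
    //   zoneCount     - stands in for reading ocrPage.Zones.Count
    //   recognizeText - stands in for ocrPage.RecognizeText(null)
    // Returns null when the page is empty (a choice made for this sketch).
    public static string? TryRecognizeText(Action autoZone, Func<int> zoneCount, Func<string> recognizeText)
    {
        autoZone();
        if (zoneCount() == 0)
        {
            return null; // empty page: do not call RecognizeText
        }
        return recognizeText();
    }
}
```

Checking the zone count up front turns the empty-page case into an ordinary branch instead of an exception that has to be caught.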
If a recognition module cannot recognize an object (such as a character or a check mark), the object is marked as rejected and is represented by a rejection symbol when converted to the final output document. Set IOcrDocumentManager.RejectionSymbol to specify the rejection symbol used in the final document.
This method uses the checking subsystem (IOcrSpellCheckManager) to either flag suspicious characters or words, or to allow auto-correction during the recognition process.
You can use the OcrProgressCallback to show the operation progress or to abort it. For more information and an example, refer to OcrProgressCallback.
Because the format of the recognition data is not documented, use GetRecognizedCharacters and SetRecognizedCharacters to examine or modify it. Any changes you make to the recognition data are saved in the resulting document when you save the IOcrDocument.
After the page is successfully recognized, the value of the IsRecognized property should be true.
Use Unrecognize to clear the recognition data stored in a page.
Use IOcrPage.Recognize to keep the recognition data stored internally in the page. You can later use the methods of the IOcrDocument object that owns the page to save the data to a file or to memory in any of the formats supported by the IOcrEngine, such as Text, PDF, or Microsoft Word.
Since the recognition algorithm may use the checking subsystem, you must set up the IOcrSpellCheckManager prior to calling RecognizeText. Checking recognized zone contents may consist of any combination of the following:
- The supplied language dictionary set through IOcrSpellCheckManager.SpellLanguage
- User dictionary containing literals and/or regular expressions set through IOcrSpellCheckManager.UserDictionary
- The user-written global checking callback set through IOcrSpellCheckManager.SetSpellCheckCallback
To get the accuracy and timing data of the latest successful recognition process, use GetLastStatistic after calling IOcrPage.Recognize.
Target Platforms: Microsoft .NET Framework 2.0, Windows 2000, Windows XP, Windows Server 2003 family, Windows Server 2008 family, Windows Vista, Windows 7
Reference
IOcrPage Interface
IOcrPage Members
IOcrPageCollection Interface
IOcrZoneCollection Interface
Recognize Method
IsRecognized Property
Unrecognize Method
OcrZone Structure
AutoZone Method
GetRecognizedCharacters Method
SetRecognizedCharacters Method
OcrCharacter Structure
IOcrPageCharacters Interface
IOcrZoneCharacters Interface
Programming with Leadtools .NET OCR
Recognizing OCR Pages
OCR Confidence Reporting