Area of interest options to use with an OCR page.
public class OcrPageAreaOptions
A typical scenario when using IOcrPage is to recognize only a portion of the image (for instance, when performing rubber-banding or developing a mobile application having a guide rectangle).
One approach is to add an OcrZone of type OcrZoneType.Text to the page with its bounds set to the desired area, then call Recognize (and optionally save the result to a final document). However, this approach can produce undesirable results for the following reasons:
AutoZone was not used: instead, a zone was added manually to the page. Therefore, text that intersects or is very close to the bounding rectangle's edges can get chopped off or treated as noise.
Recognition accuracy will be reduced if the zone area contains text with multiple font styles and sizes, or if the area contains other elements such as tables or graphics. The engine will not be able to perform zoning detection to fine-tune the OCR recognition process.
Many auto-preprocessing functions, such as AutoPreprocess with the inversion option, detect and process the entire page instead of just the bounding rectangle of the area of interest, which can again produce erroneous output.
Therefore, a better solution is to call IOcrPage.SetAreaOptions prior to processing to set the bounding rectangle for the area of interest. With this approach:
AutoZone is called by the engine because no zones were added manually. Any text intersecting with or very close to the bounding rectangle's edges (set in the Area property) is recognized completely and then dropped or included depending on the value set in the IntersectPercentage property. Note that this only occurs when the value of UseTextZone is false (the default); otherwise, a new text zone is added as described in the section above.
Since AutoZone is called, recognition accuracy will not be affected if the zone area contains text with multiple font styles and sizes, or other elements such as tables or graphics. The engine will be able to perform automatic zoning detection to fine-tune the OCR recognition process.
Auto-preprocessing functions such as AutoPreprocess with the inversion option will detect and process only the area specified by Area and will therefore produce much more accurate results.
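The include-or-drop decision controlled by IntersectPercentage can be pictured with a small geometric model. The sketch below is a language-neutral Python illustration, not LEADTOOLS code: the Rect type and the overlap_percent and keep_character helpers are hypothetical names, and it assumes that the percentage measures how much of a character's bounding box lies inside Area and that a character is kept when that share is at least the configured value. The exact rule the engine applies is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels (hypothetical stand-in for a page rectangle)."""
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def overlap_percent(box: Rect, area: Rect) -> float:
    """Return the percentage (0 to 100) of `box` that lies inside `area`."""
    w = min(box.right, area.right) - max(box.left, area.left)
    h = min(box.bottom, area.bottom) - max(box.top, area.top)
    if w <= 0 or h <= 0 or box.width <= 0 or box.height <= 0:
        return 0.0  # no overlap, or a degenerate box
    return 100.0 * (w * h) / (box.width * box.height)


def keep_character(box: Rect, area: Rect, intersect_percentage: int) -> bool:
    """Assumed rule: keep a character when enough of its box is inside the area."""
    return overlap_percent(box, area) >= intersect_percentage


# A character straddling the right edge of the area, half inside and half outside.
area = Rect(left=100, top=100, width=200, height=50)
straddling = Rect(left=295, top=110, width=10, height=20)
print(keep_character(straddling, area, intersect_percentage=50))
print(keep_character(straddling, area, intersect_percentage=60))
```

In this model, a lower IntersectPercentage keeps more of the characters that cross the area's edges, while a higher value drops them.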
The following IOcrPage methods are affected when area options are set:
IOcrPage.AutoPreprocess: will perform its operation only on the area when used with the inversion option.
IOcrPage.AutoZone: will perform its operation as it normally would, and then drop all zones that do not intersect with the area.
IOcrPage.Recognize: will only recognize the zones that intersect with the area. Characters that intersect with the area are included or dropped depending on the value set in IntersectPercentage.
IOcrPage.Copy.
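The AutoZone behavior listed above (zone the whole page as usual, then drop every zone that does not intersect the area) amounts to a simple filter. The following Python sketch models only that filtering step. The Box tuple layout and the intersects and drop_zones_outside_area functions are hypothetical, and treating rectangles that merely touch at an edge as non-intersecting is an assumption, not documented engine behavior.

```python
from typing import List, Tuple

# Hypothetical rectangle layout: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def intersects(a: Box, b: Box) -> bool:
    """True when the two rectangles share a region of non-zero size."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def drop_zones_outside_area(zones: List[Box], area: Box) -> List[Box]:
    """Keep only the zones that intersect the area of interest."""
    return [zone for zone in zones if intersects(zone, area)]


# Zones as automatic zoning might report them for a full page (made-up values).
page_zones = [(0, 0, 50, 50), (90, 90, 150, 150), (400, 400, 500, 500)]
area_of_interest = (100, 100, 300, 300)
print(drop_zones_outside_area(page_zones, area_of_interest))
```

Note that a zone only partly inside the area survives this step whole; in the description above, trimming at the character level happens later, during Recognize.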
IOcrPage.GetAreaOptions can be used along with IOcrPage.SetAreaOptions to modify the area of interest or to turn it on or off (switching between complete and area-based processing), depending on the application's needs.
Adding a page with an Area of Interest to an IOcrDocument in preparation for saving to an output format is not affected by these options. The final document will contain the page with its original dimensions; however, if the page has been recognized, the entire area outside the region of interest will be treated as graphics. IOcrPage.Copy can be used in such a situation to create an output document containing only the area of interest.