Syntax
struct L_OcrPageAreaOptions
{
L_UINT StructSize;
L_RECT Area;
L_UINT IntersectPercentage;
L_BOOL UseTextZone;
};
typedef struct L_OcrPageAreaOptions L_OcrPageAreaOptions;
Represents the area of interest options to use with an OCR page.
StructSize: Structure size. It should be equal to sizeof(L_OcrPageAreaOptions).
Area: Area of interest rectangle in the page. The default value is an empty rectangle.
IntersectPercentage: Percentage of the character bounds that must fall inside the area of interest for the character to be considered inside it. Possible values range from 0 to 100. The default value is 0, which is treated as 50.
UseTextZone: Specifies whether to add a single text zone during auto-zone. Possible values are:
Value | Meaning |
---|---|
TRUE | Add a single OCR zone. |
FALSE | Use intelligent zoning. This is the default value. |
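As an illustration of the members above, the following minimal sketch fills in the structure. The typedefs are stand-ins so that the fragment compiles without the SDK headers (the left/top/right/bottom layout of L_RECT is an assumption), and MakeAreaOptions is a hypothetical helper, not part of the library.

```c
#include <assert.h>
#include <string.h>

/* Stand-in typedefs so this sketch compiles without the SDK headers.
   In real code these come from the library headers; the L_RECT layout
   used here (left/top/right/bottom) is an assumption. */
typedef unsigned int L_UINT;
typedef int L_BOOL;
typedef struct { long left, top, right, bottom; } L_RECT;
#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

typedef struct L_OcrPageAreaOptions
{
   L_UINT StructSize;
   L_RECT Area;
   L_UINT IntersectPercentage;
   L_BOOL UseTextZone;
} L_OcrPageAreaOptions;

/* Hypothetical helper: builds a fully initialized options structure. */
static L_OcrPageAreaOptions MakeAreaOptions(L_RECT area,
                                            L_UINT intersectPercentage,
                                            L_BOOL useTextZone)
{
   L_OcrPageAreaOptions options;

   /* The documented range is 0 to 100 (0 is treated as 50). */
   assert(intersectPercentage <= 100);

   memset(&options, 0, sizeof(options));
   options.StructSize = (L_UINT)sizeof(L_OcrPageAreaOptions);
   options.Area = area;                               /* empty rectangle: whole page */
   options.IntersectPercentage = intersectPercentage;
   options.UseTextZone = useTextZone;                 /* FALSE: intelligent zoning */
   return options;
}
```

The resulting structure would then be handed to L_OcrPage_SetAreaOptions; the exact signature of that function is not shown on this page, so the call itself is left out.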
Remarks
Using an empty rectangle results in area options that cover the whole page, so the area of interest is effectively not used. The engine uses only the part of Area that intersects the page dimensions.
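The rule above can be sketched as a small clipping function. This is an illustration only, not the engine's code: Rect, RectIsEmpty and EffectiveArea are hypothetical names, and the page is assumed to start at (0, 0).

```c
#include <assert.h>

/* Hypothetical rectangle type used only for this sketch. */
typedef struct { long left, top, right, bottom; } Rect;

static int RectIsEmpty(Rect r)
{
   return r.right <= r.left || r.bottom <= r.top;
}

/* Returns the rectangle that would effectively act as the area of interest. */
static Rect EffectiveArea(Rect area, long pageWidth, long pageHeight)
{
   Rect page;
   Rect out;

   assert(pageWidth > 0 && pageHeight > 0);

   page.left = 0;
   page.top = 0;
   page.right = pageWidth;
   page.bottom = pageHeight;

   /* An empty rectangle means "whole page". */
   if (RectIsEmpty(area))
      return page;

   /* Otherwise keep only the part of the area that lies on the page. */
   out.left   = area.left   > page.left   ? area.left   : page.left;
   out.top    = area.top    > page.top    ? area.top    : page.top;
   out.right  = area.right  < page.right  ? area.right  : page.right;
   out.bottom = area.bottom < page.bottom ? area.bottom : page.bottom;

   /* If the area misses the page entirely, the result is empty. What the
      engine does in that case is not stated here; the sketch simply
      returns the empty intersection. */
   return out;
}
```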
When recognition is performed through L_OcrPage_Recognize, or a page area is copied using L_OcrPage_Copy, and the source page has an area of interest set through L_OcrPage_SetAreaOptions, the bounds of each recognized OCR character (L_OcrCharacter.Bounds) are checked against the Area member. A character that is completely outside the area is ignored, and a character that is completely inside the area is included.
When only part of the character bounds is inside the area, the engine uses the value of the IntersectPercentage member to decide whether to include the character:
A value of 25 means that if less than 25% of the character is inside the area, it is dropped.
A value of 75 means that if less than 75% of the character is inside the area, it is dropped, and so on.
A value of 100 means the character is not included unless all of its bounds are inside the area.
A value of 0 is treated as 50: the character is included if half or more of its area is inside the area of interest.
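The decision described above can be sketched as follows. This is a hedged illustration of the rule, not the engine's implementation: Rect, Overlap1D and ShouldIncludeCharacter are hypothetical names, and the area is assumed to be non-empty (the empty-rectangle "whole page" case is assumed to be handled by the caller).

```c
#include <assert.h>

/* Hypothetical rectangle type used only for this sketch. */
typedef struct { long left, top, right, bottom; } Rect;

/* Length of the overlap between the intervals [a0, a1) and [b0, b1). */
static long Overlap1D(long a0, long a1, long b0, long b1)
{
   long lo = a0 > b0 ? a0 : b0;
   long hi = a1 < b1 ? a1 : b1;
   return hi > lo ? hi - lo : 0;
}

/* Returns 1 if the character should be kept, 0 if it should be dropped. */
static int ShouldIncludeCharacter(Rect area, Rect bounds, unsigned percentage)
{
   long charW = bounds.right - bounds.left;
   long charH = bounds.bottom - bounds.top;
   long long charArea;
   long long inside;

   assert(percentage <= 100);

   if (charW <= 0 || charH <= 0)
      return 0;                      /* degenerate bounds: nothing to keep */

   if (percentage == 0)
      percentage = 50;               /* 0 is treated as 50 */

   charArea = (long long)charW * charH;
   inside = (long long)Overlap1D(area.left, area.right, bounds.left, bounds.right)
          * Overlap1D(area.top, area.bottom, bounds.top, bounds.bottom);

   if (inside == 0)
      return 0;                      /* completely outside the area */

   /* Keep the character when inside/charArea >= percentage/100,
      compared in integer arithmetic to avoid rounding. */
   return inside * 100 >= charArea * (long long)percentage;
}
```

A character that lies completely inside the area always passes this test, because its inside area equals its full area and the percentage never exceeds 100.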
When L_OcrPage_AutoZone is called on a page that has area of interest options set through L_OcrPage_SetAreaOptions, the value of the UseTextZone member determines how the engine processes the page:
TRUE means the engine adds a single L_OcrZone with L_OcrZone.ZoneType set to L_OcrZoneType.Text. This might reduce the accuracy of the recognition process that follows if the area of interest contains text with multiple font styles and sizes, or if it contains other elements such as tables or graphics.
FALSE means the engine auto-zones the entire page and then drops the zones that are outside the area of interest. This greatly enhances the accuracy of the recognition process that follows.
The following functions make use of this structure: