Visual Basic (Declaration) | |
---|---|
<FlagsAttribute()> <SerializableAttribute()> Public Enum OcrXmlOutputOptions Inherits Enum |
C# | |
---|---|
[FlagsAttribute()] [SerializableAttribute()] public enum OcrXmlOutputOptions : Enum |
C++/CLI | |
---|---|
[FlagsAttribute()] [SerializableAttribute()] public enum class OcrXmlOutputOptions : public Enum |
Member | Description |
---|---|
None | Default. Write the recognized word values in the result XML data. |
Characters | Write the recognized character values instead of the word values in the result XML data. |
CharacterAttributes | Only valid with Characters. Write the character attributes (for example, font) in the result XML data. |
The various
The format of the result XML data is as follows:
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<pages>
  <page>
    <zone>
      <paragraph>
        <line>
          <word>
            <character/>
            <character/>
          </word>
        </line>
      </paragraph>
    </zone>
  </page>
</pages>
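As a rough illustration of this nesting, the following Python sketch parses the skeleton with the standard library and counts each element type. Python is used here only as a neutral way to inspect the XML; it is not part of the LEADTOOLS API, and `count_elements` is a hypothetical helper name. The XML declaration is left out because the sample is an in-memory string rather than a saved UTF-16 file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Skeleton of the result XML as described above (declaration omitted).
SAMPLE = """<pages>
  <page>
    <zone>
      <paragraph>
        <line>
          <word>
            <character/>
            <character/>
          </word>
        </line>
      </paragraph>
    </zone>
  </page>
</pages>"""


def count_elements(xml_text):
    """Return a Counter mapping each element name to its number of occurrences."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())


counts = count_elements(SAMPLE)
for tag in ("pages", "page", "zone", "paragraph", "line", "word", "character"):
    print(tag, counts[tag])
```

For a file written to disk, `ET.parse(path)` can be used in place of `ET.fromstring`.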
The pages element appears once per document. It has no value and no additional attributes.
The page element is repeated for every page in the document (IOcrDocument.Pages.Count). If a page has not been recognized or contains no zones, its page element will not contain any child zone elements.
The page element has no value and contains the following additional attributes:
Attribute | Value |
---|---|
horizontal_resolution | Horizontal resolution of the page. The value is IOcrPage.DpiX. |
vertical_resolution | Vertical resolution of the page. The value is IOcrPage.DpiY. |
width | Width of the page in pixels. The value is IOcrPage.Width. |
height | Height of the page in pixels. The value is IOcrPage.Height. |
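Because the page size is given in pixels together with the resolution, the physical page size can be derived from these four attributes. The sketch below is a minimal Python example, assuming the resolution values are dots per inch; `page_size_inches` is a hypothetical helper, and the attribute values are taken from the sample output shown later in this topic.

```python
import xml.etree.ElementTree as ET


def page_size_inches(page):
    """Return (width, height) of a <page> element in inches."""
    width_px = int(page.get("width"))
    height_px = int(page.get("height"))
    dpi_x = int(page.get("horizontal_resolution"))
    dpi_y = int(page.get("vertical_resolution"))
    # Pixels divided by dots per inch gives inches.
    return width_px / dpi_x, height_px / dpi_y


page = ET.fromstring(
    '<page horizontal_resolution="300" vertical_resolution="300" '
    'width="2544" height="3294"/>'
)
width_in, height_in = page_size_inches(page)
print(f"{width_in:.2f} x {height_in:.2f} in")
```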
The zone element is repeated for every zone in the current page (IOcrPage.Zones). The zone element has no value and contains the following additional attributes:
Attribute | Value |
---|---|
type | The zone type. Either "text" or "graphics". If the zone element is of type "text", then it will contain zero or more paragraph child elements. If the zone is of type "graphics", then it will not contain any other child elements. |
left | The zone left position in pixels. The value is OcrZone.Bounds.Left converted to pixels. |
top | The zone top position in pixels. The value is OcrZone.Bounds.Top converted to pixels. |
right | The zone right position in pixels. The value is OcrZone.Bounds.Right converted to pixels. |
bottom | The zone bottom position in pixels. The value is OcrZone.Bounds.Bottom converted to pixels. |
subtype | The zone type. The value is OcrZone.ZoneType. |
recognition_module | The zone recognition module. The value is OcrZone.RecognitionModule. |
fill_method | The fill method. The value is OcrZone.FillMethod. |
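A consumer that only cares about text will typically filter zones on the type attribute and read the bounds. The following Python sketch shows one way to do that. The second zone in the sample is made up for illustration, and `text_zone_bounds` is a hypothetical helper. The comparison ignores case as a precaution, since the sample output later in this topic writes the value as "Text".

```python
import xml.etree.ElementTree as ET

# First zone is taken from the sample output in this topic;
# the second zone is a made-up graphics zone for illustration only.
SAMPLE = """<page>
  <zone type="Text" left="371" top="370" right="831" bottom="420"
        subtype="Text" recognition_module="Auto" fill_method="Default"/>
  <zone type="Graphics" left="100" top="500" right="400" bottom="900"/>
</page>"""


def text_zone_bounds(page):
    """Return (left, top, right, bottom) in pixels for every text zone."""
    result = []
    for zone in page.findall("zone"):
        # Compare case-insensitively: "text" and "Text" are both accepted.
        if zone.get("type", "").lower() != "text":
            continue
        result.append(
            tuple(int(zone.get(k)) for k in ("left", "top", "right", "bottom"))
        )
    return result


zones = text_zone_bounds(ET.fromstring(SAMPLE))
print(zones)
```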
The paragraph element is repeated for every text paragraph in the current zone. If the zone has no recognized text, then the paragraph element will not contain any child line elements.
The paragraph element has no attributes and no value.
The line element is repeated for every line of text in the current paragraph. The line element has no value and contains the following additional attributes:
Attribute | Value |
---|---|
left | The line left position in pixels. |
top | The line top position in pixels. |
right | The line right position in pixels. |
bottom | The line bottom position in pixels. The values of left, top, right and bottom are calculated from the combined boundaries of all the words that make up this line. |
base | The position of the baseline of this line. The value is calculated from the baselines of all the words that make up this line. |
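One reading of how the line boundaries relate to the word boundaries is a bounding-rectangle union: the smallest left and top, and the largest right and bottom. That reading is an assumption, but it agrees with the sample output in this topic, as the Python sketch below checks. `bounds` and `union_bounds` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Line and word elements taken from the sample output in this topic.
LINE_XML = """<line left="372" top="371" right="830" bottom="419" base="29">
  <word left="372" top="371" right="554" bottom="409" base="30">License</word>
  <word left="570" top="372" right="830" bottom="419" base="29">Agreement</word>
</line>"""


def bounds(element):
    """Read (left, top, right, bottom) from an element's attributes."""
    return tuple(int(element.get(k)) for k in ("left", "top", "right", "bottom"))


def union_bounds(rects):
    """Return the smallest rectangle that contains every rectangle in rects."""
    lefts, tops, rights, bottoms = zip(*rects)
    return min(lefts), min(tops), max(rights), max(bottoms)


line = ET.fromstring(LINE_XML)
word_union = union_bounds([bounds(word) for word in line.findall("word")])
print(word_union == bounds(line))
```

The same check can be applied one level down, comparing a word's bounds with the union of its character bounds.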
The word element is repeated for every word of text in the current line. If OcrXmlOutputOptions.Characters was not specified in the generation options, then the word element will contain the text of the word as its value. Otherwise, the word element will contain no value.
The word element has the following attributes:
Attribute | Value |
---|---|
left | The word left position in pixels. |
top | The word top position in pixels. |
right | The word right position in pixels. |
bottom | The word bottom position in pixels. The values of left, top, right and bottom are calculated from the combined boundaries of all the characters that make up this word. |
base | The position of the baseline of this word. The value is calculated from the baselines of all the characters that make up this word. |
The character element is repeated for every character in the current word, but only if OcrXmlOutputOptions.Characters was specified in the generation options. Otherwise, the word element will contain no child character elements. Each character element contains the character as its value.
The character element contains the following additional attributes:
Attribute | Value |
---|---|
left | The character left position in pixels. |
top | The character top position in pixels. |
right | The character right position in pixels. |
bottom | The character bottom position in pixels. The values of left, top, right and bottom are calculated from OcrCharacter.Bounds. |
base | The position of the baseline of this character. The value is OcrCharacter.Base. |
confidence | The confidence of this character. The value is OcrCharacter.Confidence. |
font_size | The font size in points. The value is OcrCharacter.FontSize. Only available if OcrXmlOutputOptions.CharacterAttributes is specified. |
proportional | "yes" if the character font is proportional; otherwise, "no". The value is calculated from OcrCharacter.FontStyle. Only available if OcrXmlOutputOptions.CharacterAttributes is specified. |
serif | "yes" if the character font is serif; otherwise, "no". The value is calculated from OcrCharacter.FontStyle. Only available if OcrXmlOutputOptions.CharacterAttributes is specified. |
bold | "yes" if the character font is bold; otherwise, "no". The value is calculated from OcrCharacter.FontStyle. Only available if OcrXmlOutputOptions.CharacterAttributes is specified. |
italic | "yes" if the character font is italic; otherwise, "no". The value is calculated from OcrCharacter.FontStyle. Only available if OcrXmlOutputOptions.CharacterAttributes is specified. |
underline | "yes" if the character font is underlined; otherwise, "no". The value is calculated from OcrCharacter.FontStyle. Only available if OcrXmlOutputOptions.CharacterAttributes is specified. |
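Since the font attributes are only present when OcrXmlOutputOptions.CharacterAttributes is specified, code that reads character elements should treat them as optional. The Python sketch below shows one possible approach. `character_info` is a hypothetical helper, the first sample element comes from the sample output in this topic, and the font attribute values on the second element are invented for illustration (the exact number format of font_size is an assumption, so it is parsed as a float).

```python
import xml.etree.ElementTree as ET

FONT_FLAGS = ("proportional", "serif", "bold", "italic", "underline")


def character_info(char):
    """Convert a <character> element into a dictionary of typed values."""
    info = {
        "text": char.text or "",
        "confidence": int(char.get("confidence")),
    }
    # Font attributes are optional; only read them when font_size is present.
    if char.get("font_size") is not None:
        info["font_size"] = float(char.get("font_size"))
        for flag in FONT_FLAGS:
            info[flag] = char.get(flag) == "yes"
    return info


# Without character attributes (from the sample output in this topic).
plain = character_info(ET.fromstring(
    '<character left="372" top="372" right="398" bottom="408" '
    'base="36" confidence="100">L</character>'
))

# With character attributes (font values here are hypothetical).
styled = character_info(ET.fromstring(
    '<character left="372" top="372" right="398" bottom="408" '
    'base="36" confidence="100" font_size="12" proportional="yes" '
    'serif="no" bold="yes" italic="no" underline="no">L</character>'
))

print(plain)
print(styled["font_size"], styled["bold"], styled["serif"])
```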
The following is an example of the XML output when OcrXmlOutputOptions.None is specified:
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<pages>
  <page horizontal_resolution="300" vertical_resolution="300" width="2544" height="3294">
    <zone type="Text" left="371" top="370" right="831" bottom="420" subtype="Text" recognition_module="Auto" fill_method="Default">
      <paragraph>
        <line left="372" top="371" right="830" bottom="419" base="29">
          <word left="372" top="371" right="554" bottom="409" base="30">License</word>
          <word left="570" top="372" right="830" bottom="419" base="29">Agreement</word>
        </line>
      </paragraph>
    </zone>
  </page>
</pages>
Here is the same XML output when OcrXmlOutputOptions.Characters is specified:
<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
<pages>
  <page horizontal_resolution="300" vertical_resolution="300" width="2544" height="3294">
    <zone type="Text" left="371" top="370" right="831" bottom="420" subtype="Text" recognition_module="Auto" fill_method="Default">
      <paragraph>
        <line left="372" top="371" right="830" bottom="419" base="29">
          <word left="372" top="371" right="554" bottom="409" base="30">
            <character left="372" top="372" right="398" bottom="408" base="36" confidence="100">L</character>
            <character left="402" top="371" right="409" bottom="408" base="37" confidence="100">i</character>
            <character left="414" top="381" right="438" bottom="409" base="27" confidence="100">c</character>
            <character left="442" top="381" right="468" bottom="409" base="27" confidence="100">e</character>
            <character left="472" top="381" right="496" bottom="408" base="27" confidence="100">n</character>
            <character left="501" top="381" right="525" bottom="408" base="27" confidence="100">s</character>
            <character left="529" top="381" right="554" bottom="408" base="27" confidence="100">e</character>
          </word>
          <word left="570" top="372" right="830" bottom="419" base="29">
            <character left="570" top="372" right="604" bottom="408" base="36" confidence="100">A</character>
            <character left="607" top="381" right="633" bottom="419" base="27" confidence="100">g</character>
            <character left="639" top="381" right="655" bottom="408" base="27" confidence="100">r</character>
            <character left="657" top="381" right="682" bottom="408" base="27" confidence="100">e</character>
            <character left="685" top="381" right="710" bottom="408" base="27" confidence="100">e</character>
            <character left="715" top="381" right="753" bottom="408" base="27" confidence="100">m</character>
            <character left="758" top="381" right="783" bottom="408" base="27" confidence="100">e</character>
            <character left="788" top="381" right="812" bottom="408" base="27" confidence="100">n</character>
            <character left="815" top="374" right="830" bottom="408" base="34" confidence="100">t</character>
          </word>
        </line>
      </paragraph>
    </zone>
  </page>
</pages>
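Code that consumes this XML may need to handle both forms, since the word text is stored either as the value of the word element or as the values of its character children. The Python sketch below reads the text of each line in either case. `word_text` and `line_texts` are hypothetical helpers, the samples are abbreviated versions of the output above with the position attributes removed, and joining words with a single space is an assumption because the XML does not store spacing explicitly.

```python
import xml.etree.ElementTree as ET

# Abbreviated samples: same words as above, position attributes removed.
WORDS_XML = (
    "<pages><page><zone><paragraph><line>"
    "<word>License</word><word>Agreement</word>"
    "</line></paragraph></zone></page></pages>"
)
CHARS_XML = """<pages><page><zone><paragraph><line>
  <word>
    <character>L</character><character>i</character><character>c</character>
    <character>e</character><character>n</character><character>s</character>
    <character>e</character>
  </word>
  <word>
    <character>A</character><character>g</character><character>r</character>
    <character>e</character><character>e</character><character>m</character>
    <character>e</character><character>n</character><character>t</character>
  </word>
</line></paragraph></zone></page></pages>"""


def word_text(word):
    """Return the text of a <word>, whichever output option produced it."""
    characters = word.findall("character")
    if characters:
        # Characters output: concatenate the child character values.
        return "".join(c.text or "" for c in characters)
    # Default output: the word element itself holds the text.
    return (word.text or "").strip()


def line_texts(xml_text):
    """Return one string per <line>, with words joined by a single space."""
    root = ET.fromstring(xml_text)
    return [
        " ".join(word_text(word) for word in line.findall("word"))
        for line in root.iter("line")
    ]


from_words = line_texts(WORDS_XML)
from_chars = line_texts(CHARS_XML)
print(from_words)
print(from_chars)
```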
System.Object
System.ValueType
System.Enum
Leadtools.Forms.Ocr.OcrXmlOutputOptions
Target Platforms: Microsoft .NET Framework 3.0, Windows XP, Windows Server 2003 family, Windows Server 2008 family
Reference
Leadtools.Forms.Ocr Namespace
DocumentFormat
IOcrDocumentManager Interface
IOcrDocument Interface
IOcrDocument.Save
IOcrDocument.SaveXml
IOcrPage.Recognize
IOcrEngine Interface
OcrEngineManager Class
OcrEngineType Enumeration
Programming with Leadtools .NET OCR
Files to be Included with Your Application