Build a list of the words found in the document page.
public void BuildWords()
Public Sub BuildWords()
public:
void BuildWords()
public void buildWords()
The text words are created from the characters found in the document, based on the IsEndOfWord value returned by the document parsing engine. Whenever an "end of word" is found, the characters collected since the previous word are grouped together and stored as an item in the Words list. This is not performed automatically; you must call BuildWords to populate the Words list from Characters.
The following explains how this method works. If the page text consists of the string Hello World, then the text parser engine will populate Characters as follows (ignoring Bounds):
Index | Code | IsEndOfWord | IsEndOfLine |
---|---|---|---|
0 | H | false | false |
1 | e | false | false |
2 | l | false | false |
3 | l | false | false |
4 | o | true | false |
5 | W | false | false |
6 | o | false | false |
7 | r | false | false |
8 | l | false | false |
9 | d | true | true |
BuildWords loops through the characters until it reaches an item with IsEndOfWord equal to true. It then creates a word from the characters found so far, in this case from index 0 to 4. The characters are combined into DocumentWord.Value (the string Hello), and the union of the characters' bounds (DocumentCharacter.Bounds) is set into DocumentWord.Bounds. The first index (0) and the last index (4) are set into DocumentWord.FirstCharacterIndex and DocumentWord.LastCharacterIndex.
The method then continues to the next character (index 5) and repeats the operation, this time using indices 5 to 9. The result is a DocumentWord with Value set to World, FirstCharacterIndex set to 5, and LastCharacterIndex set to 9.
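The grouping described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the library's implementation: `BuildWordsSketch`, `Ch`, `Word`, and `sampleCharacters` are hypothetical stand-ins invented for this sketch, not LEADTOOLS types, and bounds are left out just as they are in the table.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the LEADTOOLS types.
class BuildWordsSketch {
    // Hypothetical minimal character: a code plus the end-of-word flag.
    static final class Ch {
        final char code;
        final boolean isEndOfWord;
        Ch(char code, boolean isEndOfWord) {
            this.code = code;
            this.isEndOfWord = isEndOfWord;
        }
    }

    // Hypothetical minimal word: the value plus first/last character indices.
    static final class Word {
        final String value;
        final int firstCharacterIndex;
        final int lastCharacterIndex;
        Word(String value, int firstCharacterIndex, int lastCharacterIndex) {
            this.value = value;
            this.firstCharacterIndex = firstCharacterIndex;
            this.lastCharacterIndex = lastCharacterIndex;
        }
    }

    // Groups characters into words, closing a word at each end-of-word flag.
    static List<Word> buildWords(List<Ch> characters) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int first = 0; // index of the first character of the word being built
        for (int i = 0; i < characters.size(); i++) {
            Ch c = characters.get(i);
            current.append(c.code);
            if (c.isEndOfWord) {
                words.add(new Word(current.toString(), first, i));
                current.setLength(0);
                first = i + 1;
            }
        }
        // Assumption: trailing characters with no end-of-word flag are dropped here;
        // how the real engine treats them is not covered by this sketch.
        return words;
    }

    // The ten characters from the "Hello World" table (end of word at indices 4 and 9).
    static List<Ch> sampleCharacters() {
        String text = "HelloWorld";
        List<Ch> chars = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            chars.add(new Ch(text.charAt(i), i == 4 || i == 9));
        }
        return chars;
    }

    public static void main(String[] args) {
        for (Word w : buildWords(sampleCharacters())) {
            System.out.println(w.value + " [" + w.firstCharacterIndex + ", " + w.lastCharacterIndex + "]");
        }
    }
}
```

Run against the sample characters, the sketch yields two words: Hello with indices 0 to 4, and World with indices 5 to 9, matching the walkthrough above.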
For more information, refer to Parsing Text with the Document Library.
For an example, refer to DocumentPageText.