LEADTOOLS Support
General
General Questions
How to correctly OCR a large PDF document using Java
#1
Posted: Thursday, July 25, 2019 3:54:02 PM(UTC)
Groups: Registered
Posts: 11
Could you please provide any information on how to optimize memory usage during the OCR process for large non-searchable PDF documents in Java?
Thanks.
Code:
    // Create and start the LEAD OCR engine
    OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD);
    ocrEngine.startup(null, null, null, ocrRuntimePath);
    setInitialOcrConfiguration(processingDocument, ocrEngine);
    RasterCodecs rasterCodecs = ocrEngine.getRasterCodecsInstance();
    OcrDocument ocrDocument = ocrEngine.getDocumentManager()
            .createDocument(null, OcrCreateDocumentOptions.AUTO_DELETE_FILE.getValue());
    OcrProgressCallback ocrProgressCallback = ....;
    Path inputPath = ....;
    Path outputPath = generateOutputDocumentPath(ocredDocumentDirectory, processingDocument.getName());
    try {
        int pageCount = rasterCodecs.getTotalPages(inputPath.toString());
        // Process one page at a time: load, zone, recognize, add to the document, then release the page
        IntStream.rangeClosed(1, pageCount)
                .forEach(pageNumber -> {
                    // monitorOcrProgress(processingDocument, pageCount, pageNumber);
                    RasterImage rasterImage = rasterCodecs.load(inputPath.toString(), pageNumber);
                    OcrPage ocrPage = ocrEngine.createPage(rasterImage, OcrImageSharingMode.AUTO_DISPOSE);
                    ocrPage.autoZone(null);
                    ocrPage.recognize(ocrProgressCallback);
                    ocrDocument.getPages().add(ocrPage);
                    ocrPage.dispose();
                });
        // Write the recognized pages out as a PDF
        ocrDocument.save(outputPath, DocumentFormat.PDF, null);
    } finally {
        // Release the codecs, the document, and the engine
        if (Objects.nonNull(rasterCodecs)) {
            rasterCodecs.dispose();
        }
        if (Objects.nonNull(ocrDocument)) {
            ocrDocument.dispose();
        }
        ocrEngine.dispose();
        // Remove the directory that holds the output file
        if (Objects.nonNull(outputPath)) {
            FileUtils.deleteQuietly(outputPath.getParent().toFile());
        }
    }
}
#2
Posted: Thursday, August 1, 2019 3:19:16 PM(UTC)
Groups: Registered, Tech Support, Administrators
Posts: 54
Thanks: 2 times
Was thanked: 10 time(s) in 10 post(s)
Apologies for the delayed response. I would need to see the rest of your project to understand the issue fully. Could you email a sample project to support@leadtools.com? I can take a look and see whether there are any optimizations we could make to improve performance.
Josh Clark
Developer Support Engineer
LEAD Technologies, Inc.