#1 Posted : Thursday, June 21, 2018 3:48:55 PM(UTC)
Duncan Quirk

When working with any file, keep in mind that some files contain sensitive information. When archiving digital files, it is often necessary to remove sensitive data such as Social Security numbers or the MICR information on checks. The attached demo, written in C# with V20 of the LEADTOOLS SDK, shows how to take an input file, extract all of its text, and search for and redact the text that matches a regular expression. For the purposes of this demo, we search for any word containing LEAD or LEADTOOLS, ignoring case.
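Without the LEADTOOLS calls, the search step is an ordinary .NET regular-expression test on each extracted word. The sketch below is a minimal illustration, assuming a hypothetical `words` array in place of the words LEADTOOLS returns for one page; the anchored pattern and the Social Security number pattern are illustrative additions, not part of the attached demo. It shows that an unanchored pattern matches any word that merely contains LEAD, so a word such as "Leading" would be redacted too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the words extracted from one page.
string[] words = { "LEADTOOLS", "Leading", "lead", "Toolkit", "123-45-6789", "SSN:" };

// The demo's pattern: matches any word that contains LEAD, ignoring case.
var containsLead = new Regex("(LEADTOOLS|LEAD)", RegexOptions.IgnoreCase);

// Illustrative variant: anchored, so only the whole words LEAD or LEADTOOLS match.
var wholeWordLead = new Regex("^(LEADTOOLS|LEAD)$", RegexOptions.IgnoreCase);

// Illustrative sensitive-data pattern: a US SSN written as ddd-dd-dddd.
var ssn = new Regex(@"^\d{3}-\d{2}-\d{4}$");

// Returns the words a pattern would mark for redaction, in their original order.
static List<string> Matches(IEnumerable<string> source, Regex pattern) =>
    source.Where(w => pattern.IsMatch(w)).ToList();

var leadHits = Matches(words, containsLead);
var wholeWordHits = Matches(words, wholeWordLead);
var ssnHits = Matches(words, ssn);

Console.WriteLine(string.Join(", ", leadHits));      // LEADTOOLS, Leading, lead
Console.WriteLine(string.Join(", ", wholeWordHits)); // LEADTOOLS, lead
Console.WriteLine(string.Join(", ", ssnHits));       // 123-45-6789
```

Anchoring with `^` and `$` limits matches to whole words. The SSN pattern only helps if text extraction or OCR returns the number as a single word, which is an assumption worth checking against real documents.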

The code:

// Namespaces used by the snippet (place these at the top of the .cs file):
using System;
using System.IO;
using System.Text.RegularExpressions;
using Leadtools;
using Leadtools.Annotations.Engine;
using Leadtools.Annotations.Rendering;
using Leadtools.Codecs;
using Leadtools.Document;
using Leadtools.Ocr;

            // The following runs inside a method such as Main().
            // OcrEnginePath is the folder containing the OCR engine runtime files;
            // it is defined elsewhere in the attached project.
            string inputFile = @"C:\Users\Public\Documents\LEADTOOLS Images\leadtools.pdf";
            string outputFile = Path.Combine(
               Path.GetDirectoryName(inputFile),
               Path.GetFileNameWithoutExtension(inputFile) + "-redacted.pdf");

            // Matches any word containing LEADTOOLS or LEAD, ignoring case.
            // Built once; a Regex instance is thread-safe and reusable.
            var rgx = new Regex("(LEADTOOLS|LEAD)", RegexOptions.IgnoreCase);

            using (var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD, false))
            {
               ocrEngine.Startup(null, null, null, OcrEnginePath);
               var options = new LoadDocumentOptions();
               using (var document = DocumentFactory.LoadFromFile(inputFile, options))
               {
                  document.IsReadOnly = false;
                  // OCR engine used by GetText() when a page's text cannot be extracted directly
                  document.Text.OcrEngine = ocrEngine;

                  var renderingEngine = new AnnWinFormsRenderingEngine();
                  RasterImage redactedDocument = null;
                  try
                  {
                     // Pages are processed one at a time because every iteration
                     // modifies the same multipage output image.
                     foreach (var page in document.Pages)
                     {
                        var pageText = page.GetText();
                        pageText.BuildWords();

                        // One solid black redaction object per matching word
                        var container = new AnnContainer();
                        foreach (var word in pageText.Words)
                        {
                           if (!rgx.IsMatch(word.Value))
                              continue;
                           var redactionObject = new AnnRedactionObject();
                           redactionObject.Rect = word.Bounds;
                           redactionObject.Fill = AnnSolidColorBrush.Create("Black");
                           container.Children.Add(redactionObject);
                        }

                        // Burn the redactions into a raster image of the page
                        var imagePage = page.GetImage();
                        renderingEngine.RenderOnImage(container, imagePage);

                        // The first page starts the output image; later pages are appended
                        if (redactedDocument == null)
                           redactedDocument = imagePage;
                        else
                           redactedDocument.AddPage(imagePage);
                     }

                     // Saved as an image-only PDF, so the redacted words are not kept in a text layer
                     using (var codecs = new RasterCodecs())
                        codecs.Save(redactedDocument, outputFile, RasterImageFormat.RasPdfJpeg, 0);
                  }
                  finally
                  {
                     redactedDocument?.Dispose();
                  }

                  Console.WriteLine($"File has been successfully redacted and saved to {outputFile}");
               }
            }
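If per-page text extraction is slow enough to be worth parallelizing, one general .NET pattern is to run the search concurrently, storing each page's result in its own slot, and then apply the results to the shared output one page at a time, in page order. The sketch below is an assumption-laden illustration, not LEADTOOLS code: hypothetical string arrays stand in for document pages and a log list stands in for the image work, so it shows only the threading pattern and makes no claim about which LEADTOOLS calls are thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical stand-in for a multipage document: each page is an array of words.
var pages = new List<string[]>
{
    new[] { "LEADTOOLS", "Imaging", "SDK" },
    new[] { "Nothing", "here" },
    new[] { "Lead", "the", "way", "with", "LEAD" },
};

var rgx = new Regex("(LEADTOOLS|LEAD)", RegexOptions.IgnoreCase);

// Phase 1 (parallel): search every page; each iteration writes only its own slot.
var hitsPerPage = new List<string>[pages.Count];
Parallel.For(0, pages.Count, i =>
{
    hitsPerPage[i] = pages[i].Where(w => rgx.IsMatch(w)).ToList();
});

// Phase 2 (sequential): apply the results to shared output in page order.
// Here that is just a log; in a real redaction it would be the image work.
var log = new List<string>();
for (int i = 0; i < pages.Count; i++)
{
    log.Add($"page {i + 1}: {hitsPerPage[i].Count} redaction(s)");
}

foreach (var line in log)
    Console.WriteLine(line);
// page 1: 1 redaction(s)
// page 2: 0 redaction(s)
// page 3: 2 redaction(s)
```

Because each parallel iteration writes only to `hitsPerPage[i]`, no locking is needed, and the sequential second loop keeps the output in page order.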


File Attachment(s):
Redact Document.zip (4kb)

Edited by moderator Monday, February 3, 2020 3:22:55 PM(UTC)

Duncan Quirk
Developer Support Engineer
LEAD Technologies, Inc.

Try the latest version of LEADTOOLS for free for 60 days by downloading the evaluation: https://www.leadtools.com/downloads

#2 Posted : Monday, February 17, 2020 1:42:35 PM(UTC)
Christopher

The attached project implements the same redaction functionality in Visual Basic.

Included is a LEADTOOLS sample PDF document and an example of the output file.

File Attachment(s):
simpleRedactVB_20.zip (2,925kb)

Chris Thompson
Developer Support Engineer
LEAD Technologies, Inc.
