ExtractText

Extracts text from a file. Call it by sending a POST request to the following URL:

[POST]  https://azure.leadtools.com/api/Recognition/ExtractText

Common Service Request URL Parameters

The following parameters are required unless indicated otherwise, and are used by all Conversion and Recognition API calls:

| Parameter | Description | Accepted Values |
| --- | --- | --- |
| fileUrl | (Optional) The URL of the file to be processed. For more information, refer to the Cloud Services Overview section. | A string or URI containing a valid URL to the file to be uploaded. |
| firstPage | The first page in the file to process. | An integer value between 1 and the total number of pages in the file. |
| lastPage | The last page in the file to process. | Passing -1 or 0 tells the service to process every page from firstPage through the last page in the file. Otherwise, pass an integer between 1 and the total number of pages in the file that is greater than or equal to firstPage. |
| guid | (Optional) Unique identifier corresponding to an uploaded file. This value is returned when a file is uploaded using the UploadFile service call. | A valid GUID. |
| filePassword | (Optional) The password to unlock a password-protected file. | A string containing the password for a secure PDF. |
| callbackUrl | (Optional) Passing a callbackUrl allows the service to notify you when your file has finished processing. If the callbackUrl is invalid or malicious, it will be ignored. The LEADTOOLS Cloud Services will send the request's ID in the body of the message sent to the callbackUrl. | A string or URI containing a valid URL to send the message to. |
| ocrLanguage | (Optional) The OCR language to use when performing OCR on a raster file. Defaults to en (English) if no language is specified. | 0 - en, 1 - bg, 2 - hr, 3 - cs, 4 - da, 5 - nl, 6 - fr, 7 - de, 8 - el, 9 - hu, 10 - it, 11 - pl, 12 - pt, 13 - sr, 14 - es, 15 - sv, 16 - tr, 17 - uk |
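
The page-range rules and the ocrLanguage codes listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the LEADTOOLS samples: the helper name build_common_query is hypothetical, and it assumes the service accepts the numeric ocrLanguage code rather than the language name.

```python
from urllib.parse import urlencode

# Language codes in the order listed for ocrLanguage (index == numeric value).
OCR_LANGUAGES = ["en", "bg", "hr", "cs", "da", "nl", "fr", "de", "el",
                 "hu", "it", "pl", "pt", "sr", "es", "sv", "tr", "uk"]


def build_common_query(first_page, last_page, file_url=None, guid=None,
                       ocr_language=None):
    """Return a URL-encoded query string for the common parameters.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if first_page < 1:
        raise ValueError("firstPage must be 1 or greater")
    # -1 and 0 both mean "through the last page"; otherwise lastPage >= firstPage.
    if last_page not in (-1, 0) and last_page < first_page:
        raise ValueError("lastPage must be -1, 0, or >= firstPage")

    params = {"firstPage": first_page, "lastPage": last_page}
    if file_url is not None:
        params["fileUrl"] = file_url
    if guid is not None:
        params["guid"] = guid
    if ocr_language is not None:
        # Assumption: the numeric code is sent, e.g. "fr" becomes 6.
        params["ocrLanguage"] = OCR_LANGUAGES.index(ocr_language)
    return urlencode(params)


if __name__ == "__main__":
    print(build_common_query(
        1, -1,
        file_url="https://demo.leadtools.com/images/pdf/leadtools.pdf",
        ocr_language="fr"))
```

Using urlencode also percent-encodes the file URL, so characters such as & inside it cannot be mistaken for parameter separators.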

Request Specific Parameters

Additional parameters available are listed below.

| Parameter | Description | Accepted Values |
| --- | --- | --- |
| characterinfo | (Optional) Indicates whether the results should include additional data about the characters found on each page and their locations. | A Boolean. |
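
As an illustration of how characterinfo combines with the common parameters, the sketch below builds a full ExtractText URL. The function name extract_text_url is hypothetical, and sending the Boolean as lowercase "true" or "false" is an assumption rather than something this page specifies.

```python
from urllib.parse import urlencode

EXTRACT_TEXT_URL = "https://azure.leadtools.com/api/Recognition/ExtractText"


def extract_text_url(file_url, character_info=False, first_page=1, last_page=-1):
    """Build an ExtractText request URL (hypothetical helper)."""
    params = {
        "firstPage": first_page,
        "lastPage": last_page,
        "fileUrl": file_url,
        # Assumption: the Boolean is serialized as lowercase "true"/"false".
        "characterinfo": "true" if character_info else "false",
    }
    return EXTRACT_TEXT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(extract_text_url(
        "https://demo.leadtools.com/images/pdf/leadtools.pdf",
        character_info=True))
```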

Status Codes

The following status codes will be returned when the method is called:

| Status | Description |
| --- | --- |
| 200 | The request has been successfully received. |
| 400 | The request was not valid for one of the following reasons: required request parameters were not included; the GUID value was not provided; the file information provided was malformed; or a request was queued on a file that has not yet been verified. |
| 401 | The AppID/Password combination is not valid or does not correspond with the GUID provided. |
| 402 | There are not enough pages left in the Application to process the request. |
| 500 | There was an internal error processing your request. |
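
One way a client might act on these status codes is sketched below. The names STATUS_MESSAGES, ExtractTextError, and check_status are hypothetical, and the messages are paraphrased from the descriptions above.

```python
# Short descriptions paraphrased from the status code list above.
STATUS_MESSAGES = {
    400: "The request was not valid (missing parameters, missing GUID, "
         "malformed file information, or file not yet verified).",
    401: "The AppID/Password combination is not valid or does not match the GUID.",
    402: "Not enough pages left in the Application to process the request.",
    500: "Internal error while processing the request.",
}


class ExtractTextError(Exception):
    """Raised when the service answers with a non-200 status (hypothetical type)."""


def check_status(status_code, body=""):
    """Return the trimmed response body on 200, otherwise raise ExtractTextError."""
    if status_code == 200:
        return body.strip()
    message = STATUS_MESSAGES.get(status_code, "Unexpected status code.")
    raise ExtractTextError("{}: {}".format(status_code, message))


if __name__ == "__main__":
    try:
        check_status(402)
    except ExtractTextError as err:
        print(err)
```

With the requests library, this could be called as check_status(response.status_code, response.text).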

Returns

If performing a single-service call, a unique identifier will be returned that can be used to query the progress of the extraction.
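
Because the returned identifier is a GUID, a client can validate it before storing it or using it in later calls. This is a minimal sketch using Python's standard uuid module; the helper name parse_request_id is hypothetical, and the tolerance for surrounding whitespace or quotation marks is an assumption about how the body may be delivered.

```python
import uuid


def parse_request_id(body):
    """Return the response body as a uuid.UUID, or raise ValueError if malformed."""
    # Assumption: the body may be wrapped in whitespace or JSON-style quotes.
    return uuid.UUID(body.strip().strip('"'))


if __name__ == "__main__":
    print(parse_request_id('"3f2504e0-4f89-11d3-9a0c-0305e82c3301"\n'))
```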

Online Demo

This method is available for free in our live Online Demo. You do not need an account and you can test out your own files to see the results.

Examples

The same request is shown below in JavaScript (Node.js), C#, Python, PHP, and Perl.

JavaScript (Node.js)

//Simple script to make and process the results of an ExtractText request to the LEADTOOLS CloudServices. 
 
const request = require('request'); 
 
var servicesUrl = "https://azure.leadtools.com/api/"; 
 
//The first page in the file to mark for processing 
var firstPage = 1; 
 
//Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed. 
var lastPage = -1; 
 
//We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request. 
//The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content 
var fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf'; 
 
var recognitionUrl = servicesUrl + 'Recognition/ExtractText?firstPage=' + firstPage + '&lastPage=' + lastPage + '&fileurl=' + fileURL; 
 
request.post(getRequestOptions(recognitionUrl), recognitionCallback); 
 
 
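function recognitionCallback(error, response, body){ 
    if(!error && response.statusCode === 200){ 
        var guid = body; 
        console.log("Unique ID returned by the Services: " + guid); 
    } else { 
        //Report the failure instead of returning silently. 
        console.log("Request failed: " + (error || response.statusCode)); 
    } 
} 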
 
function getRequestOptions(url){ 
    //Function to generate and return HTTP request options. 
    var requestOptions ={ 
        url: url, 
        headers: { 
            'Content-Length' : 0 
        }, 
        auth: { 
            user:"Enter Application ID", 
            password:"Enter Application Password" 
        } 
    }; 
    return requestOptions; 
} 

C#

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using System.Threading.Tasks; 
using System.Threading; 
using System.Net; 
using System.Net.Http; 
using System.Net.Http.Headers; 
using Newtonsoft.Json.Linq; 
 
namespace Azure_Code_Snippets.DocumentationSnippets 
{ 
   class CloudServices_ExtractText_Demo 
   { 
      private string hostedServicesUrl = "https://azure.leadtools.com/api/"; 
      public async Task ExtractText() 
      { 
         //The first page in the file to mark for processing 
         int firstPage = 1; 
 
         //Sending a value of -1 will indicate to the service that all pages in the file should be processed. 
         int lastPage = -1; 
 
         string fileURL = "https://demo.leadtools.com/images/pdf/leadtools.pdf"; 
 
         string recognitionUrl = string.Format("Recognition/ExtractText?firstPage={0}&lastPage={1}&fileurl={2}", firstPage, lastPage, fileURL); 
 
         var client = InitClient(); 
         var result = await client.PostAsync(recognitionUrl, null); 
         if (result.StatusCode == HttpStatusCode.OK) 
         { 
            //Unique ID returned by the services 
            string id = await result.Content.ReadAsStringAsync(); 
            Console.WriteLine("Unique ID returned by the services: " + id); 
         } 
         else 
            Console.WriteLine("Request failed with the following response: " + result.StatusCode); 
 
      } 
 
      private HttpClient InitClient() 
      { 
         string AppId = "Enter Application ID"; 
         string Password = "Enter Application Password"; 
 
         HttpClient client = new HttpClient(); 
         client.BaseAddress = new Uri(hostedServicesUrl); 
         client.DefaultRequestHeaders.Accept.Clear(); 
         client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("application/json")); 
 
         string authData = string.Format("{0}:{1}", AppId, Password); 
         string authHeaderValue = Convert.ToBase64String(Encoding.UTF8.GetBytes(authData)); 
         client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", authHeaderValue); 
 
         return client; 
      } 
   } 
} 

Python

#Simple script to make an ExtractText request to the LEADTOOLS CloudServices and print the unique ID that is returned. 
 
import requests, sys 
 
servicesUrl = 'https://azure.leadtools.com/api/' 
 
baseRecognitionUrl ='{}Recognition/ExtractText?firstPage={}&lastPage={}&fileurl={}' 
 
#The first page in the file to mark for processing 
firstPage = 1 
 
#Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed. 
lastPage = -1 
 
 
#We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request. 
#The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content 
fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf' 
 
formattedRecognitionUrl = baseRecognitionUrl.format(servicesUrl,firstPage, lastPage, fileURL) 
 
#The application ID. 
appId = "Enter Application ID" 
 
#The application password. 
password = "Enter Application Password" 
 
response = requests.post(formattedRecognitionUrl, auth=(appId, password)) 
if response.status_code != 200: 
    print("Error sending the recognition request \n") 
    print(response.text) 
    sys.exit(1) 
 
#Grab the GUID from the response 
guid = response.text 
print("Unique ID returned by the services: " + guid + "\n") 

PHP

<?php 
    //Simple script to make an ExtractText request to the LEADTOOLS CloudServices and print the unique ID that is returned. 
 
    $servicesBaseUrl = "https://azure.leadtools.com/api/"; 
 
    $baseRecognitionURL = '%sRecognition/ExtractText?firstPage=%s&lastPage=%s&fileurl=%s'; 
 
    //The first page in the file to mark for processing 
    $firstPage = 1; 
 
    //Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed. 
    $lastPage = -1; 
 
    //We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request. 
    //The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content 
    $fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf'; 
 
    $formattedConversionURL = sprintf($baseRecognitionURL, $servicesBaseUrl, $firstPage, $lastPage, $fileURL); 
 
    $conversionRequestOptions = GeneratePostOptions($formattedConversionURL); 
 
    $request = curl_init(); 
    curl_setopt_array($request, $conversionRequestOptions); //Apply the URL, credentials, and headers 
 
    $guid = curl_exec($request); 
    if($guid === false) 
    { 
        echo "There was an error processing the request. \n"; 
        echo curl_error($request) . "\n"; //Show why the request failed 
        curl_close($request); 
        exit; 
    } 
    curl_close($request); //Close the request 
 
    echo "Unique ID returned by the services: $guid \n"; 
 
    function GeneratePostOptions($url) 
    { 
        $appId = "Enter Application ID"; 
        $password = "Enter Application Password"; 
        $headers = array( 
            "Content-Length: 0" 
            ); 
        $postOptions = array( 
            CURLOPT_POST => 1, 
            CURLOPT_URL => $url, 
            CURLOPT_FRESH_CONNECT => 1, 
            CURLOPT_RETURNTRANSFER => 1, 
            CURLOPT_USERPWD => "$appId:$password", 
            CURLOPT_FORBID_REUSE => 1, 
            CURLOPT_HTTPHEADER => $headers 
        ); 
        return $postOptions; 
    } 
?> 

Perl

#Simple script to make and process the results of an ExtractText request to the LEADTOOLS CloudServices. 
 
use base 'HTTP::Message'; 
use LWP::UserAgent (); 
 
require HTTP::Request; 
require HTTP::Headers; 
 
my $servicesUrl = "https://azure.leadtools.com/api/"; 
 
#The first page in the file to mark for processing 
my $firstPage = 1; 
 
#Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed. 
my $lastPage = -1; 
 
#We will be uploading the file via a URL.  Files can also be passed by adding a PostFile to the request.  Only 1 file will be accepted per request. 
#The services will use the following priority when determining what a request is trying to do GUID > URL > Request Body Content 
my $fileURL = 'https://demo.leadtools.com/images/pdf/leadtools.pdf'; 
 
my $appId = 'Enter Application ID'; 
my $password = 'Enter Application Password'; 
my $headers = HTTP::Headers->new( 
    Content_Length => 0 
); 
$headers->authorization_basic($appId, $password); 
 
 
#The User Agent to be used when making requests 
my $ua = LWP::UserAgent->new; 
 
#Build the ExtractText request URL from the parameters above. 
my $recognitionUrl = $servicesUrl . 'Recognition/ExtractText?firstPage=' . $firstPage . '&lastPage=' . $lastPage . '&fileurl=' . $fileURL; 
 
my $request = HTTP::Request->new(POST => $recognitionUrl, $headers); 
my $response = $ua->request($request); 
if(!$response->is_success){ 
    print STDERR $response->status_line, "\n"; 
    exit 1; 
} 
 
my $guid = $response->decoded_content; 
print("Unique ID returned by the services: " . $guid . "\n"); 

Help Version 21.0.2021.9.2
Products | Support | Contact Us | Intellectual Property Notices
© 1991-2021 LEAD Technologies, Inc. All Rights Reserved.