This tutorial shows how to extract text data from a file using the LEADTOOLS Cloud Services in a Python application.
Overview | |
---|---|
Summary | This tutorial covers how to make ExtractText requests and process the results using the LEADTOOLS Cloud Services in a Python application. |
Completion Time | 30 minutes |
Project | Download tutorial project (1 KB) |
Platform | LEADTOOLS Cloud Services API |
IDE | Visual Studio 2019 |
Language | Python |
Development License | Download LEADTOOLS |
Be sure to review the following sites for information about LEADTOOLS Cloud Services API.
LEADTOOLS Service Plan offerings:
Service Plan | Description |
---|---|
Free Trial | Free Evaluation |
Page Packages | Prepaid Page Packs |
Subscriptions | Prepaid Monthly Processed Pages |
To further explore the offerings, refer to Pricing Information for LEADTOOLS Hosted Cloud Services > Service Plan Terms.
For pricing details, refer to https://www.leadtools.com/sdk/products/hosted-services#pricing > Page Packages and Subscriptions.
To obtain the necessary Application ID and Application Password, refer to Create an Account and Application with the LEADTOOLS Hosted Cloud Services.
With the project created and the requests package added, coding can begin.
In the Solution Explorer, open ExtractText.py. Add the following variables at the top.
Note
Where the code states "Replace with Application ID" and "Replace with Application Password", be sure to place your Application ID and Application Password accordingly.
# Simple script to make and process the results of an ExtractText request to the LEADTOOLS Cloud Services.
import requests
import sys
import time
servicesUrl = "https://azure.leadtools.com/api/"
# The application ID.
appId = "Replace with Application ID"
# The application password.
password = "Replace with Application Password"
# The first page in the file to mark for processing
firstPage = 1
# Sending a value of -1 will indicate to the services that the rest of the pages in the file should be processed.
lastPage = -1
# We will be uploading the file via a URL. Files can also be passed by adding a PostFile to the request. Only 1 file will be accepted per request.
# The services will use the following priority when determining what a request is trying to do: GUID > URL > Request Body Content
fileURL = 'http://demo.leadtools.com/images/cloud_samples/ocr1-4.tif'
baseRecognitionUrl = '{}Recognition/ExtractText?firstPage={}&lastPage={}&fileurl={}'
formattedRecognitionUrl = baseRecognitionUrl.format(
    servicesUrl, firstPage, lastPage, fileURL)
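The format string above places the file URL into the query string unescaped. That works for the sample image, but a file URL that itself contains characters such as & or spaces would likely need percent-encoding. The following is a minimal sketch of one way to do that with the standard library's urllib.parse.urlencode. The helper name build_extract_text_url is hypothetical, and it assumes the service decodes query parameters in the usual way.

```python
from urllib.parse import urlencode


def build_extract_text_url(services_url, file_url, first_page=1, last_page=-1):
    """Return the ExtractText endpoint URL with percent-encoded query parameters."""
    # Parameter names are the ones used in the tutorial's format string.
    query = urlencode({
        "firstPage": first_page,
        "lastPage": last_page,
        "fileurl": file_url,
    })
    return "{}Recognition/ExtractText?{}".format(services_url, query)


url = build_extract_text_url(
    "https://azure.leadtools.com/api/",
    "http://demo.leadtools.com/images/cloud_samples/ocr1-4.tif",
)
print(url)
```

The resulting string can be passed to requests.post in place of formattedRecognitionUrl.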
Add a requests.post call to process the ExtractText request and capture the GUID from the resulting request.text, then provide it to the next section.
This sends an ExtractText request to the LEADTOOLS Cloud Services API. If successful, a unique identifier (GUID) is returned, and a query is then made using this GUID.
request = requests.post(formattedRecognitionUrl, auth=(appId, password))
# If uploading a file alongside the HTTP request:
# baseRecognitionUrl = '{}Recognition/ExtractText?firstPage={}&lastPage={}'
# formattedRecognitionUrl = baseRecognitionUrl.format(
#     servicesUrl, firstPage, lastPage)
# file = {'file': open('path/to/file', 'rb')}
# request = requests.post(
#     formattedRecognitionUrl, auth=(appId, password), files=file)
if request.status_code != 200:
    print("Error sending the ExtractText request")
    print(request.text)
    sys.exit()
# Grab the GUID from the Request
guid = request.text
print("Unique ID returned by the services: " + guid + "\n")
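Because the response body is used directly as the identifier, it may be worth confirming that it really looks like a GUID before querying with it. The sketch below uses the standard library's uuid module for that check. The helper name parse_guid and the sample GUID are hypothetical, and the sketch assumes the service returns the GUID as plain text in a standard textual form.

```python
import uuid


def parse_guid(response_text):
    """Return the canonical GUID string, or None if the text is not a GUID."""
    try:
        # uuid.UUID raises ValueError for text that is not a well-formed GUID.
        return str(uuid.UUID(response_text.strip()))
    except ValueError:
        return None


# Made-up values for illustration only.
print(parse_guid("3f2504e0-4f89-11d3-9a0c-0305e82c3301\n"))
print(parse_guid("Error: invalid credentials"))
```

In the tutorial script, parse_guid(request.text) could replace the direct assignment, with the script exiting when the result is None.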
Next, create a Query request that utilizes the GUID provided by the ExtractText request.
If successful, the response will contain all the request data in JSON format.
# Now, we need to Query the results
print("Now Querying Results....")
baseQueryUrl = '{}Query?id={}'
formattedQueryUrl = baseQueryUrl.format(servicesUrl, guid)
while True:  # Poll the services to determine if the request has finished processing
    request = requests.post(formattedQueryUrl, auth=(appId, password))
    returnedData = request.json()
    if returnedData['FileStatus'] != 100 and returnedData['FileStatus'] != 123:
        break
    time.sleep(5)
print("File finished processing with file status: " +
      str(returnedData['FileStatus']))
if returnedData['FileStatus'] != 200:
    sys.exit()
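The polling loop above runs until the file status leaves the values 100 and 123, so it would wait indefinitely if a request never finished. The following sketch shows one way to bound the wait. The function name poll_until_done and its parameters are hypothetical. It takes the query call as a callable so the logic can be exercised with canned responses, and it assumes, as the loop above does, that 100 and 123 are the only "still processing" statuses.

```python
import time

# Status codes the tutorial's loop treats as "still processing".
PENDING_STATUSES = (100, 123)


def poll_until_done(fetch, interval=5, max_attempts=60, sleep=time.sleep):
    """Call fetch() until FileStatus leaves the pending set or attempts run out.

    fetch is any zero-argument callable returning the decoded Query response
    (a dict with a 'FileStatus' key). Returns the final dict, or raises
    TimeoutError if the request is still pending after max_attempts calls.
    """
    for _ in range(max_attempts):
        data = fetch()
        if data["FileStatus"] not in PENDING_STATUSES:
            return data
        sleep(interval)
    raise TimeoutError(
        "request still pending after {} attempts".format(max_attempts))


# Demonstration with canned responses instead of a live service call.
canned = iter([{"FileStatus": 100}, {"FileStatus": 123}, {"FileStatus": 200}])
result = poll_until_done(lambda: next(canned), sleep=lambda seconds: None)
print(result["FileStatus"])
```

Against the live service, fetch could be lambda: requests.post(formattedQueryUrl, auth=(appId, password)).json(), with the default sleep left in place.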
Finally, parse the JSON data into a readable format.
try:
    print("Results:")
    returnedJson = returnedData['RequestData']
    for requestObject in returnedJson:
        print("Service Type: " + requestObject['ServiceType'])
        if requestObject['ServiceType'] == 'Recognition' and requestObject['RecognitionType'] == 'Text':
            print("Data: " + requestObject['data'])
except Exception as e:
    print("Failed to Parse JSON")
    print(str(e))
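If the extracted text is needed for further processing rather than only printed, the same filtering can be wrapped in a small function that returns the matching entries. This is a sketch only. The function name collect_text and the sample dictionary are hypothetical, and the sample uses just the keys that appear in the parsing code above, whereas a real response will carry more fields.

```python
def collect_text(returned_data):
    """Return the 'data' field of every Recognition/Text entry in RequestData."""
    texts = []
    for entry in returned_data.get("RequestData", []):
        is_text_result = (entry.get("ServiceType") == "Recognition"
                          and entry.get("RecognitionType") == "Text")
        if is_text_result:
            texts.append(entry.get("data", ""))
    return texts


# Hypothetical response fragment for illustration only.
sample = {
    "FileStatus": 200,
    "RequestData": [
        {"ServiceType": "Recognition", "RecognitionType": "Text", "data": "Hello"},
        {"ServiceType": "Conversion"},
    ],
}
print(collect_text(sample))
```

Using .get with defaults means a missing key yields an empty result instead of raising, which trades the try/except block for quieter failure; whether that is preferable depends on how strictly the response should be validated.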
Run the project by pressing F5, or by selecting Debug -> Start Debugging.
If the steps were followed correctly, the console appears and the application displays the extracted text information from the returned JSON data.
This tutorial showed how to extract text information via the LEADTOOLS Cloud Services API.