Skyhigh Security

Scan and Process Image Files with OCR

An Optical Character Recognition (OCR) engine extracts text from image files. The Skyhigh Security DLP engine uses best-in-class OCR to extract text from supported image files. The extracted text is then compared against predefined classification definitions so that content can be classified effectively and Data Loss Prevention (DLP) policies can be applied to manage and remediate files as needed. You can use policies and classifications to detect violations and trigger incident alerts. OCR and Classification are available for both Skyhigh Security Service Edge and Skyhigh CASB.

OCR scanning works with all Skyhigh DLP-supported languages and most Western and Asian languages.

Integrating the OCR engine into the DLP framework provides:

Enhanced DLP Protection. OCR significantly extends DLP protection for sensitive documents, including tax paperwork, passports, credit card information, or any other personally identifiable data. This protection covers images uploaded to the cloud and content shared as images, including screenshots and handwritten formats. It addresses potential vulnerabilities by safeguarding confidential content even in scenarios where users are restricted from copying and pasting data.

Evaluation and Compliance. The OCR engine evaluates extracted text based on the match rule criteria defined in the DLP policies. 

  • Example 1: When a credit card image is processed, the OCR engine extracts the card number and checks it against the classifications and conditions specified in the DLP policy. This ensures compliance and helps prevent data leaks.
  • Example 2: If sections of a design document are encountered as images—whether as standalone images or embedded within another file—the text is extracted and compared against the established fingerprint to detect and prevent data leaks.
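
As an illustration of Example 1, the following minimal sketch shows how text that has already been extracted by OCR could be checked for card numbers. It is not the Skyhigh classification engine: the `CARD_CANDIDATE` pattern, the `luhn_valid` and `find_card_numbers` helpers, and the 13 to 19 digit length are assumptions chosen for the example. The Luhn checksum itself is the standard validity check used for payment card numbers.

```python
import re

# Hypothetical candidate pattern (an assumption for this sketch):
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(extracted_text: str) -> list[str]:
    """Return Luhn-valid card number candidates found in OCR output."""
    hits = []
    for match in CARD_CANDIDATE.finditer(extracted_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

In a real deployment the matching is done by the classifications and conditions defined in the DLP policy; the sketch only shows why a checksum reduces false positives on arbitrary digit runs in OCR output.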

Seamless Integration with Existing Policies. No modifications to existing DLP policies are required when utilizing OCR. The established rules, exception criteria, and response rules are applicable to images, ensuring a seamless integration of OCR technology into the DLP framework.

If you purchase the OCR feature, it is enabled by default for Skyhigh Security Service Edge and Skyhigh CASB DLP policies. You can disable the feature to avoid a slowdown. For details, see Configure OCR.

NOTE:   

  • OCR only works with Classifications. It does not support legacy data identifiers. 
  • To enable Classifications for existing GovCloud (FedRAMP) tenants, contact Skyhigh Support.

Image Processing Specifications 

Supported Image Formats 

The following image formats are supported with OCR:

  • GIF
  • JPEG, JPEG 2000, JFIF
  • JB2, JBIG2
  • PNTG
  • PCX
  • PNG
  • PDF
  • TIFF
  • BMP

Operational Conditions

For optimal image recognition, it is essential that the image is of high quality and contains at least one line of machine-printed text comprising a minimum of 25 to 30 characters. This text may include a mix of uppercase and lowercase letters.

NOTE: During an OCR scan, some images may be excluded from processing, particularly for requests that have short time limits and fail open. For instance, inline email processing prioritizes speed and efficiency, so images can be dropped from the queue if they do not meet the criteria or if the system cannot read them within the limited timeframe.

Supported Angle of Rotation

The OCR system applies automatic rotation by default. If no specific rotation settings are configured, the system uses the default rotation angle to detect and process images accurately.

NOTE: Automatic rotation is not supported for the Hebrew and Thai languages.

Supported Image Size and Resolution

The size limits for successful image processing are influenced by available memory, the computing environment, and the properties of individual image files.

  • By default, the engine can process images with dimensions up to 8400 pixels in both height and width. If an image exceeds these limits, the OCR system declines the processing request, an error message is displayed during loading, and the image's contents are not evaluated against classifications.
  • For successful image processing, the resolution should be greater than 75 dpi and less than 2400 dpi.

Supported Angle Limits

The OCR engine effectively detects skew in images with a skew angle of less than 15 degrees. However, it may not be able to extract text if the image is skewed beyond this threshold.
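
The dimension and skew limits described above can be checked before a file is submitted. The following is a minimal sketch, not part of any Skyhigh API: the `ocr_precheck` function and its parameters are hypothetical, the caller is assumed to already know the image's pixel dimensions and an estimated skew angle, and the 8400 pixel value reflects the default limit only.

```python
# Limits taken from the documentation above; names are hypothetical.
MAX_DIMENSION_PX = 8400      # default maximum for both width and height
MAX_SKEW_DEGREES = 15.0      # skew is detected reliably below this angle


def ocr_precheck(width_px: int, height_px: int, skew_degrees: float = 0.0) -> list[str]:
    """Return a list of reasons the image may fail OCR (empty if none found)."""
    problems = []
    if width_px > MAX_DIMENSION_PX or height_px > MAX_DIMENSION_PX:
        problems.append(
            f"dimensions {width_px}x{height_px} exceed {MAX_DIMENSION_PX} px"
        )
    # Skew direction does not matter, so compare the absolute angle.
    if abs(skew_degrees) >= MAX_SKEW_DEGREES:
        problems.append(
            f"skew of {skew_degrees} degrees is not below {MAX_SKEW_DEGREES}"
        )
    return problems
```

For example, `ocr_precheck(9000, 1200)` reports the oversized width, while an image of 3000 by 2000 pixels with a 5 degree skew returns an empty list. Passing the check does not guarantee extraction, since image quality and text length also matter.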