Skyhigh Security

ML Auto Classifiers

Limited Availability: ML Auto Classifiers are a Limited Availability feature. To enable ML Auto Classifiers, contact Skyhigh Support.

ML Auto Classifiers, powered by Artificial Intelligence (AI) and Machine Learning (ML), are an advanced DLP feature in the Classification Editor. They automatically detect and categorize sensitive files using Skyhigh's pre-trained AI and ML models. Skyhigh provides two types of ML Auto Classifiers, text classifiers and image classifiers, to identify sensitive files in sanctioned and shadow/web services. With ML Auto Classifiers, you can identify sensitive files such as financial reports and statements, patient records, patents, source code, and ID files in various file formats. For details on the supported file formats, see Supported File Formats.

NOTE: ML Auto Classifiers are supported for files uploaded to sanctioned services and are not supported for email bodies, email headers, or other content types. For details, see FAQs.

 

 

ML Auto Classifiers provide a quick and effective way to discover sensitive data in real time, giving organizations granular DLP policy controls. They simplify identifying and classifying sensitive data and help administrators who are unfamiliar with complex DLP rules such as regular expressions and dictionaries. This automated approach to data classification can reduce inaccurate matches, that is, false positives and false negatives.

Security Operations Center (SOC) analysts can review the matches for ML Auto Classifiers, which reduces their investigation time and helps them respond efficiently to data loss incidents. These capabilities enable SOC analysts to proactively identify and mitigate potential security threats within their organization.

► Advantages of ML Auto Classifiers
  • AI-ML Powered Automatic Data Discovery and Classification. Automatically discovers and classifies files with sensitive data such as PII, financial records, healthcare records, and intellectual property using AI and ML models. 
  • Comprehensive Categorization. Utilizes AI and ML to automatically categorize data across all exfiltration vectors, enhancing data governance.
  • Robust Policy Framework. Leverages the categories and subcategories for ML Auto Classifiers within the policy framework to build robust DLP policies.
  • Simplified DLP Administration. Streamlines DLP management by eliminating the need for manual data classification.
  • Enhanced Operational Efficiency. Significantly boosts operational efficiency in incident management.
  • Scalability. Provides flexible scalability to support large data volumes across standard file formats. 
  • Confidence. Offers clear insights into the confidence percentage in data classification, reducing the risk of data breaches. 
  • Risk Reduction. Minimizes the risk of inaccurate matches, reducing false positives and false negatives.

IMPORTANT: Skyhigh Security does not use your confidential data to train its AI and ML models for ML Auto Classifiers.

 

For example, a Security Operations Center (SOC) may want to restrict the upload of sensitive ID files such as passports to Google Drive. To do this, the SOC first creates a classification on the Classifications page using the ML Auto Classifier condition and selects the PII category, including the ID Documents (Image) subcategory. The SOC then uses the newly created ML Auto Classifier classification in its sanctioned DLP policy for Google Drive and applies the block response action. This enables admins to identify and secure ID files containing PII, such as passports and driver's licenses, that are uploaded to Google Drive.

Getting Started

Follow these steps to get started with the ML Auto Classifiers feature:

  1. Create a classification using the ML Auto Classifier condition. For details, see Create a Classification using ML Auto Classifier.
  2. Create a Sanctioned or Shadow/Web DLP policy using the newly created ML Auto Classifier classification. For details, see Create a Sanctioned or Shadow/Web DLP policy.
  3. View the matches triggered for ML Auto Classifiers and their confidence percentages in the Sanctioned or Shadow/Web DLP Incident cloud card. For details, see Sanctioned or Shadow/Web DLP Incident cloud card.

Supported File Formats

Skyhigh Security supports the following file formats for detecting sensitive text-based and image-based files in sanctioned and shadow/web services using ML Auto Classifiers.

► Supported File Formats
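| File Category | File Format | MIME Type | Extension(s) |
|---|---|---|---|
| Text Files | | | |
| Spreadsheet Files | Microsoft Excel | VND.MS-EXCEL | XLS, XLT, and XLA |
| Spreadsheet Files | Microsoft Excel | VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.SHEET | XLSX |
| Spreadsheet Files | Microsoft Excel | VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.TEMPLATE | XLTX |
| Spreadsheet Files | Microsoft Excel | VND.MS-EXCEL.ADDIN.MACROENABLED.12 | XLAM |
| Spreadsheet Files | Microsoft Excel | VND.MS-EXCEL.SHEET.BINARY.MACROENABLED.12 | XLSB |
| Spreadsheet Files | OpenDocument Spreadsheet Document | VND.OASIS.OPENDOCUMENT.SPREADSHEET | ODS |
| Spreadsheet Files | Comma-separated values (CSV) | Text/CSV | CSV |
| Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT | PPT, PPS, POT, and PPA |
| Presentation Files | Microsoft PowerPoint | VND.OPENXMLFORMATS-OFFICEDOCUMENT.PRESENTATIONML.PRESENTATION | PPTX |
| Presentation Files | Microsoft PowerPoint | VND.OPENXMLFORMATS-OFFICEDOCUMENT.PRESENTATIONML.TEMPLATE | POTX |
| Presentation Files | Microsoft PowerPoint | VND.OPENXMLFORMATS-OFFICEDOCUMENT.PRESENTATIONML.SLIDESHOW | PPSX |
| Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT.TEMPLATE.MACROENABLED.12 | POTM |
| Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT.PRESENTATION.MACROENABLED.12 | PPTM |
| Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT.SLIDESHOW.MACROENABLED.12 | PPSM |
| Presentation Files | OpenDocument Presentation Document | VND.OASIS.OPENDOCUMENT.PRESENTATION | ODP |
| Word Files | Microsoft Word | MSWORD | DOC, DOT |
| Word Files | Microsoft Word | VND.OPENXMLFORMATS-OFFICEDOCUMENT.WORDPROCESSINGML.DOCUMENT | DOCX |
| Word Files | Microsoft Word | VND.OPENXMLFORMATS-OFFICEDOCUMENT.WORDPROCESSINGML.TEMPLATE | DOTX |
| Word Files | Microsoft Word | VND.MS-WORD.DOCUMENT.MACROENABLED.12 | DOCM |
| Word Files | Microsoft Word | VND.MS-WORD.TEMPLATE.MACROENABLED.12 | DOTM |
| Word Files | OpenDocument Text Document | VND.OASIS.OPENDOCUMENT.TEXT | ODT |
| Word Files | Rich Text Format | RTF | RTF |
| PDF Files | Adobe Portable Document Format | PDF | PDF |
| Image Files | Standard Image Files | N/A | JPEG, PNG, TIFF, BMP, FITS, GIF, JP2, WEBP, X-DCX, X-PCX, X-PHOTO-CD, X-PORTABLE-BITMAP, X-RGB, and X-TARGA |
| Image Files | Adobe Photoshop Files | VND.ADOBE.PHOTOSHOP | PSD |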
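
As an illustration only, the extensions in the Supported File Formats table can be captured in a small lookup that reports which file category a file name falls into. This is a minimal Python sketch and not part of any Skyhigh API: the names `SUPPORTED_EXTENSIONS` and `file_category` are hypothetical, the extensions are copied as listed in the table, and matching is done on the literal extension only (the product may identify files differently, for example by MIME type).

```python
from pathlib import PurePath
from typing import Optional

# Extensions copied from the Supported File Formats table, lowercased.
# The dictionary and its keys are illustrative, not a Skyhigh API.
SUPPORTED_EXTENSIONS = {
    "Spreadsheet Files": {"xls", "xlt", "xla", "xlsx", "xltx", "xlam", "xlsb", "ods", "csv"},
    "Presentation Files": {"ppt", "pps", "pot", "ppa", "pptx", "potx", "ppsx",
                           "potm", "pptm", "ppsm", "odp"},
    "Word Files": {"doc", "dot", "docx", "dotx", "docm", "dotm", "odt", "rtf"},
    "PDF Files": {"pdf"},
    "Image Files": {"jpeg", "png", "tiff", "bmp", "fits", "gif", "jp2", "webp",
                    "x-dcx", "x-pcx", "x-photo-cd", "x-portable-bitmap",
                    "x-rgb", "x-targa", "psd"},
}


def file_category(filename: str) -> Optional[str]:
    """Return the table's file category for a file name, or None if not listed."""
    # Take the text after the last dot and compare case-insensitively.
    extension = PurePath(filename).suffix.lstrip(".").lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if extension in extensions:
            return category
    return None
```

For example, `file_category("Q3_report.XLSX")` returns the spreadsheet category, while a file with an extension that is not in the table, such as `archive.zip`, returns `None`.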

FAQs

► Does Skyhigh Security use your confidential data to train or build its AI and ML models for ML Auto Classifiers?
Skyhigh Security does not use your confidential data to train or build its AI and ML models for ML Auto Classifiers.

► What is the character limit for scanning text-based files using ML Auto Classifiers?
ML Auto Classifiers need at least 250 characters and can scan up to 50 million characters in text-based files.

► What is the image size and image resolution required for scanning image-based files using ML Auto Classifiers?
ML Auto Classifiers need at least 1024 bytes (1 kilobyte) and can scan up to 50 megabytes (MB) in image-based files. The image resolution should be at least 200 pixels in width and height.
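
The text and image limits stated in the two answers above can be expressed as simple range checks. The following Python sketch is illustrative only: the function and constant names are hypothetical, it assumes 1 MB means 1,048,576 bytes (the FAQ says only "50 MB"), and it does not model what the product does with a file outside these ranges, which this page does not describe.

```python
# Limits taken from the FAQ answers; names are illustrative, not a Skyhigh API.
MIN_TEXT_CHARS = 250
MAX_TEXT_CHARS = 50_000_000
MIN_IMAGE_BYTES = 1024
MAX_IMAGE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
MIN_IMAGE_SIDE_PX = 200


def text_within_limits(char_count: int) -> bool:
    """True if a text-based file's character count is inside the stated range."""
    return MIN_TEXT_CHARS <= char_count <= MAX_TEXT_CHARS


def image_within_limits(size_bytes: int, width_px: int, height_px: int) -> bool:
    """True if an image's size and resolution are inside the stated ranges."""
    size_ok = MIN_IMAGE_BYTES <= size_bytes <= MAX_IMAGE_BYTES
    # The FAQ requires at least 200 pixels in both width and height.
    resolution_ok = width_px >= MIN_IMAGE_SIDE_PX and height_px >= MIN_IMAGE_SIDE_PX
    return size_ok and resolution_ok
```

A check like this could be used before uploading sample files to the Classification Tester, so that a file that is too short or too small is not mistaken for a classifier miss.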

► What are the content types supported by ML Auto Classifiers?
ML Auto Classifiers support the following content types:
  • Files uploaded to sanctioned services
  • Web service submissions
  • Web POST bodies
  • Email attachments

► What are the content types that are not supported by ML Auto Classifiers?
ML Auto Classifiers do not support the following content types:
  • Email bodies
  • Email headers
  • Subject lines
  • Web headers
  • Images embedded in PDF when OCR is enabled
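
For reference, the two lists above can be combined into a single lookup. This Python sketch is illustrative only: the names are hypothetical, the labels are copied from the FAQ lists, and any content type that appears in neither list is reported as unknown rather than guessed.

```python
# Labels copied from the two FAQ lists above; names are illustrative only.
SUPPORTED_CONTENT_TYPES = frozenset({
    "files uploaded to sanctioned services",
    "web service submissions",
    "web post bodies",
    "email attachments",
})
UNSUPPORTED_CONTENT_TYPES = frozenset({
    "email bodies",
    "email headers",
    "subject lines",
    "web headers",
    "images embedded in pdf when ocr is enabled",
})


def content_type_status(content_type: str) -> str:
    """Return 'supported', 'unsupported', or 'unknown' for a content type label."""
    label = content_type.strip().lower()
    if label in SUPPORTED_CONTENT_TYPES:
        return "supported"
    if label in UNSUPPORTED_CONTENT_TYPES:
        return "unsupported"
    # Not mentioned in either FAQ list, so the sketch makes no claim.
    return "unknown"
```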

► What are the file formats supported by ML Auto Classifiers?
ML Auto Classifiers support a range of file formats for detecting sensitive text-based and image-based files in sanctioned and shadow/web services. For details, see Supported File Formats.

► Do ML Auto Classifiers send your confidential data to an external LLM (Large Language Model) based service for processing?
No, ML Auto Classifiers do not send your confidential data to any external LLM based service for processing.

► What are the AI and ML techniques used in Skyhigh pre-trained models for ML Auto Classifiers?
Skyhigh Data Scientists evaluate various ML techniques and select the one that provides the highest accuracy for each ML Auto Classifier. These methods include both multi-class classifiers and binary classifiers using various supervised and unsupervised learning techniques. Skyhigh trains and validates its models for ML Auto Classifiers using diverse datasets from multiple sources.

► Are ML Auto Classifiers supported for Secure Web Gateway (Cloud and On-Prem) and CASB DLP policies?
Yes, ML Auto Classifiers are supported for SWG (Cloud) and CASB DLP policies but are not supported for SWG (On-Prem) DLP policies, as data classifications are not supported in SWG (On-Prem) appliances.

► Are ML Auto Classifiers supported for a Trellix ePolicy Orchestrator (ePO) integration use case by syncing your existing DLP classifications from Trellix ePO to Skyhigh?
No, ML Auto Classifiers are supported only for classifications defined in the Skyhigh console (Classifications). 

► Are ML Auto Classifiers supported with Data Identifiers?
No, ML Auto Classifiers are not supported with Data Identifiers, because Data Identifiers are a legacy DLP feature that will no longer be supported after June 2025. Skyhigh recommends using a classification-based approach for all your DLP use cases. For details, see Migration Guide for Legacy Data Identifiers.

► Can ML Auto Classifiers process large files?
Yes, ML Auto Classifiers can process large text-based and image-based files up to 50 MB for classification.

► Can you create custom ML Auto Classifiers currently?
No, you cannot create custom ML Auto Classifiers currently.

► Do ML Auto Classifiers detect classified data in all languages or only in English?
Text-based ML Auto Classifiers, such as the Financial Reports/Statements, Patient Records, Patents, and Source Code classifiers, can detect classified data only in English. Image-based ML Auto Classifiers, such as the ID Documents classifier, can detect classified data in all languages.

► Does Skyhigh Security support sample data in the plain text and file formats uploaded to the Classification Tester for ML Auto Classifiers?
Skyhigh supports sample data in file format but does not support sample data in plain text format uploaded to the Classification Tester for ML Auto Classifiers.
