ML Auto Classifiers
Limited Availability: ML Auto Classifiers are a Limited Availability feature. To enable ML Auto Classifiers, contact Skyhigh Support.
ML Auto Classifiers, powered by Artificial Intelligence (AI) and Machine Learning (ML), are an advanced DLP feature in the Classification Editor. They automatically detect and categorize sensitive files using Skyhigh pre-trained AI and ML models. Skyhigh provides two types of ML Auto Classifiers, text classifiers and image classifiers, to identify various types of sensitive files in sanctioned and shadow/web services. With ML Auto Classifiers, you can identify sensitive files such as financial reports and statements, patient records, patents, source code, and ID files in various file formats. For details on the supported file formats, see Supported File Formats.
NOTE: ML Auto Classifiers are supported for files uploaded to sanctioned services and are not supported for email bodies, email headers, or other content types. For details, see FAQs.
ML Auto Classifiers provide a quick and effective method to discover sensitive data in real time, giving organizations granular DLP policy controls. They simplify identifying and classifying sensitive data and are helpful for administrators unfamiliar with complex DLP rules such as regular expressions and dictionaries. This automated approach to data classification can reduce the inaccurate matches that lead to false positives and false negatives, which improves the accuracy of results.
Security Operations Center (SOC) analysts can gain insights into the matches for ML Auto Classifiers, which reduces their investigation time and helps them respond efficiently to data loss incidents. These capabilities enable SOC analysts to proactively identify and mitigate potential security threats within their organization.
- ► Advantages of ML Auto Classifiers
- AI-ML Powered Automatic Data Discovery and Classification. Automatically discovers and classifies files with sensitive data such as PII, financial records, healthcare records, and intellectual property using AI and ML models.
- Comprehensive Categorization. Utilizes AI and ML to automatically categorize data across all exfiltration vectors, enhancing data governance.
- Robust Policy Framework. Leverages the categories and subcategories for ML Auto Classifiers within the policy framework to build robust DLP policies.
- Simplified DLP Administration. Streamlines DLP management by eliminating the need for manual data classification.
- Enhanced Operational Efficiency. Significantly boosts operational efficiency in incident management.
- Scalability. Provides flexible scalability to support large data volumes across standard file formats.
- Confidence. Offers clear insights into the confidence percentage in data classification, reducing the risk of data breaches.
- Risk Reduction. Minimizes the risk of inaccurate matches, reducing false positives and false negatives.
IMPORTANT: Skyhigh Security does not use your confidential data to train its AI and ML models for ML Auto Classifiers.
For example, a Security Operations Center (SOC) may want to restrict the upload of sensitive ID files such as passports to Google Drive. To do this, the SOC first creates a classification on the Classifications page using the ML Auto Classifier condition, selecting the PII category and its ID Documents (Image) subcategory. The SOC then uses this classification in a sanctioned DLP policy for Google Drive with the block response action. This enables admins to identify and secure ID files containing PII, such as passports and driver's licenses, that are uploaded to Google Drive.
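The relationship between the classification, the policy, and the response action in this example can be pictured with a small conceptual model. This is only an illustrative sketch: the `Classification`, `Policy`, and `Match` types and the `response_for` function are hypothetical names invented here, not part of any Skyhigh API, and the real configuration is done in the Skyhigh console.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Classification:
    """Hypothetical stand-in for a classification built on the ML Auto Classifier condition."""
    name: str
    category: str
    subcategories: FrozenSet[str]


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a sanctioned DLP policy tied to one service."""
    service: str
    classification: Classification
    response: str


@dataclass(frozen=True)
class Match:
    """Hypothetical classifier result for an uploaded file."""
    service: str
    category: str
    subcategory: str


def response_for(policy: Policy, match: Match) -> Optional[str]:
    """Return the policy's response action if the match falls under the policy, else None."""
    c = policy.classification
    if (
        match.service == policy.service
        and match.category == c.category
        and match.subcategory in c.subcategories
    ):
        return policy.response
    return None


# The example from the text: block ID documents uploaded to Google Drive.
id_files = Classification("ID files", "PII", frozenset({"ID Documents (Image)"}))
drive_policy = Policy("Google Drive", id_files, "block")
passport_upload = Match("Google Drive", "PII", "ID Documents (Image)")
```

With these illustrative values, `response_for(drive_policy, passport_upload)` yields the block action, while a match for a different service yields no action.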
Getting Started
Follow these steps to get started with the ML Auto Classifiers feature:
- Create a classification using the ML Auto Classifier condition. For details, see Create a Classification using ML Auto Classifier.
- Create a Sanctioned or Shadow/Web DLP policy using the newly created ML Auto Classifier classification. For details, see Create a Sanctioned or Shadow/Web DLP policy.
- View the matches triggered for ML Auto Classifiers and their confidence percentages in the Sanctioned or Shadow/Web DLP Incident cloud card. For details, see Sanctioned or Shadow/Web DLP Incident cloud card.
Supported File Formats
Skyhigh Security supports the following file formats for detecting sensitive text-based and image-based files in sanctioned and shadow/web services using ML Auto Classifiers.
- ► Supported File Formats
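| File Category | File Format | MIME Type | Extension(s) |
| --- | --- | --- | --- |
| Text Files: Spreadsheet Files | Microsoft Excel | VND.MS-EXCEL | XLS, XLT, and XLA |
| Text Files: Spreadsheet Files | Microsoft Excel | VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.SHEET | XLSX |
| Text Files: Spreadsheet Files | Microsoft Excel | VND.OPENXMLFORMATS-OFFICEDOCUMENT.SPREADSHEETML.TEMPLATE | XLTX |
| Text Files: Spreadsheet Files | Microsoft Excel | VND.MS-EXCEL.ADDIN.MACROENABLED.12 | XLAM |
| Text Files: Spreadsheet Files | Microsoft Excel | VND.MS-EXCEL.SHEET.BINARY.MACROENABLED.12 | XLSB |
| Text Files: Spreadsheet Files | OpenDocument Spreadsheet Document | VND.OASIS.OPENDOCUMENT.SPREADSHEET | ODS |
| Text Files: Spreadsheet Files | Comma-separated values (CSV) | Text/CSV | CSV |
| Text Files: Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT | PPT, PPS, POT, and PPA |
| Text Files: Presentation Files | Microsoft PowerPoint | VND.OPENXMLFORMATS-OFFICEDOCUMENT.PRESENTATIONML.PRESENTATION | PPTX |
| Text Files: Presentation Files | Microsoft PowerPoint | VND.OPENXMLFORMATS-OFFICEDOCUMENT.PRESENTATIONML.TEMPLATE | POTX |
| Text Files: Presentation Files | Microsoft PowerPoint | VND.OPENXMLFORMATS-OFFICEDOCUMENT.PRESENTATIONML.SLIDESHOW | PPSX |
| Text Files: Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT.TEMPLATE.MACROENABLED.12 | POTM |
| Text Files: Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT.PRESENTATION.MACROENABLED.12 | PPTM |
| Text Files: Presentation Files | Microsoft PowerPoint | VND.MS-POWERPOINT.SLIDESHOW.MACROENABLED.12 | PPSM |
| Text Files: Presentation Files | OpenDocument Presentation Document | VND.OASIS.OPENDOCUMENT.PRESENTATION | ODP |
| Text Files: Word Files | Microsoft Word | MSWORD | DOC, DOT |
| Text Files: Word Files | Microsoft Word | VND.OPENXMLFORMATS-OFFICEDOCUMENT.WORDPROCESSINGML.DOCUMENT | DOCX |
| Text Files: Word Files | Microsoft Word | VND.OPENXMLFORMATS-OFFICEDOCUMENT.WORDPROCESSINGML.TEMPLATE | DOTX |
| Text Files: Word Files | Microsoft Word | VND.MS-WORD.DOCUMENT.MACROENABLED.12 | DOCM |
| Text Files: Word Files | Microsoft Word | VND.MS-WORD.TEMPLATE.MACROENABLED.12 | DOTM |
| Text Files: Word Files | OpenDocument Text Document | VND.OASIS.OPENDOCUMENT.TEXT | ODT |
| Text Files: Word Files | Rich Text Format | RTF | RTF |
| Text Files: PDF Files | Adobe Portable Document Format | PDF | PDF |
| Image Files | Standard Image Files | N/A | JPEG, PNG, TIFF, BMP, FITS, GIF, JP2, WEBP, X-DCX, X-PCX, X-PHOTO-CD, X-PORTABLE-BITMAP, X-RGB, and X-TARGA |
| Image Files | Adobe Photoshop Files | VND.ADOBE.PHOTOSHOP | PSD |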
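If you want to check ahead of time whether a file name falls under a text or an image classifier, the extensions listed above can be turned into a simple lookup. The sketch below is illustrative only: `classifier_type` is a hypothetical helper, not a Skyhigh function, and matching on the file extension alone is an assumption (the product may also inspect the MIME type or file content). The `X-` entries in the image row look like MIME-style names rather than extensions, so they are left out here.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported File Formats table (lowercased).
TEXT_EXTENSIONS = {
    # Spreadsheet files
    "xls", "xlt", "xla", "xlsx", "xltx", "xlam", "xlsb", "ods", "csv",
    # Presentation files
    "ppt", "pps", "pot", "ppa", "pptx", "potx", "ppsx", "potm", "pptm", "ppsm", "odp",
    # Word files
    "doc", "dot", "docx", "dotx", "docm", "dotm", "odt", "rtf",
    # PDF files
    "pdf",
}

# Only the image entries that read as plain extensions; the X-* names are omitted.
IMAGE_EXTENSIONS = {"jpeg", "png", "tiff", "bmp", "fits", "gif", "jp2", "webp", "psd"}


def classifier_type(filename: str) -> Optional[str]:
    """Return "text", "image", or None based only on the file extension."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext in TEXT_EXTENSIONS:
        return "text"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    return None
```

For example, `classifier_type("Q3_report.XLSX")` gives `"text"` and `classifier_type("passport.png")` gives `"image"`. An extension that is not in the table, such as `.txt`, gives `None`.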
FAQs
- ► Does Skyhigh Security use your confidential data to train or build its AI and ML models for ML Auto Classifiers?
- Skyhigh Security does not use your confidential data to train or build its AI and ML models for ML Auto Classifiers.
- ► What is the character limit for scanning text-based files using ML Auto Classifiers?
- ML Auto Classifiers need at least 250 characters and can scan up to 50 million characters in text-based files.
- ► What is the image size and image resolution required for scanning image-based files using ML Auto Classifiers?
- ML Auto Classifiers need at least 1024 bytes (1 kilobyte) and can scan up to 50 megabytes (MB) in image-based files. The image resolution should be at least 200 pixels in width and height.
- ► What are the content types supported by ML Auto Classifiers?
- ML Auto Classifiers support the following content types:
- Files uploaded to sanctioned services
- Web service submissions
- Web POST bodies
- Email attachments
- ► What are the content types that are not supported by ML Auto Classifiers?
- ML Auto Classifiers do not support the following content types:
- Email bodies
- Email headers
- Subject lines
- Web headers
- Images embedded in PDF when OCR is enabled
- ► What are the file formats supported by ML Auto Classifiers?
- ML Auto Classifiers support a range of file formats for detecting sensitive text-based and image-based files in sanctioned and shadow/web services. For details, see Supported File Formats.
- ► Do ML Auto Classifiers send your confidential data to an external LLM (Large Language Model) based service for processing?
- No, ML Auto Classifiers do not send your confidential data to any external LLM based service for processing.
- ► What are the AI and ML techniques used in Skyhigh pre-trained models for ML Auto Classifiers?
- Skyhigh Data Scientists evaluate various ML techniques and select the one that provides the highest accuracy for each ML Auto Classifier. These methods include both multi-class classifiers and binary classifiers using various supervised and unsupervised learning techniques. Skyhigh trains and validates its models for ML Auto Classifiers using diverse datasets from multiple sources.
- ► Are ML Auto Classifiers supported for Secure Web Gateway (Cloud and On-Prem) and CASB DLP policies?
- Yes, ML Auto Classifiers are supported for SWG (Cloud) and CASB DLP policies but are not supported for SWG (On-Prem) DLP policies, as data classifications are not supported in SWG (On-Prem) appliances.
- ► Are ML Auto Classifiers supported for a Trellix ePolicy Orchestrator (ePO) integration use case by syncing your existing DLP classifications from Trellix ePO to Skyhigh?
- No, ML Auto Classifiers are supported only for classifications defined in the Skyhigh console (Classifications).
- ► Are ML Auto Classifiers supported with Data Identifiers?
- No, ML Auto Classifiers are not supported with Data Identifiers. Data Identifiers are a legacy DLP feature that will no longer be supported after June 2025. Skyhigh recommends using a classification-based approach for all your DLP use cases. For details, see Migration Guide for Legacy Data Identifiers.
- ► Can ML Auto Classifiers process large files?
- Yes, ML Auto Classifiers can process large text-based and image-based files up to 50 MB for classification.
- ► Can you create custom ML Auto Classifiers currently?
- No, you cannot create custom ML Auto Classifiers currently.
- ► Do ML Auto Classifiers detect classified data in all languages or only in English?
- Text-based ML Auto Classifiers, such as the Financial Reports/Statements, Patient Records, Patents, and Source Code classifiers, can detect classified data only in English. Image-based ML Auto Classifiers, such as the ID Documents classifier, can detect classified data in all languages.
- ► Does the Classification Tester support sample data in both plain text and file formats for ML Auto Classifiers?
- The Classification Tester supports sample data uploaded as a file for ML Auto Classifiers. It does not support sample data entered as plain text.
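The character, size, and resolution limits given in the FAQs can be expressed as a quick pre-check. This is a minimal sketch under stated assumptions: the function names are hypothetical, 1 MB is taken as 1024 × 1024 bytes, and "within limits" only means the values fall inside the documented ranges. The FAQs do not say whether content beyond the upper limits is skipped or partially scanned, so the sketch makes no claim about that.

```python
# Limits as stated in the FAQs.
MIN_TEXT_CHARS = 250
MAX_TEXT_CHARS = 50_000_000
MIN_IMAGE_BYTES = 1024
MAX_IMAGE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MIN_IMAGE_SIDE_PX = 200


def text_within_limits(char_count: int) -> bool:
    """True if a text-based file's character count is inside the documented range."""
    return MIN_TEXT_CHARS <= char_count <= MAX_TEXT_CHARS


def image_within_limits(size_bytes: int, width_px: int, height_px: int) -> bool:
    """True if an image's size and resolution are inside the documented ranges."""
    return (
        MIN_IMAGE_BYTES <= size_bytes <= MAX_IMAGE_BYTES
        and width_px >= MIN_IMAGE_SIDE_PX
        and height_px >= MIN_IMAGE_SIDE_PX
    )
```

For instance, a 249-character text file falls below the minimum, and a 2 KB image that is 199 pixels wide falls below the resolution requirement even though its size is acceptable.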