Skyhigh Security

Create Custom Advanced Patterns using Add Regex

You can use this option to add regex manually and then validate it using the validation algorithm. Additionally, you can use the BIN or Luhn validator to manage your credit or debit cards' Bank Identification Numbers (BIN). To minimize the number of false positives, you can add Ignored Expressions to exclude specific keywords or regular expressions from being processed as matches in DLP classifications.

To manually add regex to your Custom Advanced Patterns:

  1. Go to Policy > DLP Policies > Classifications.
  2. Click Create Classification.
  3. Classification Name. Enter a name for this classification. For example, New Advanced Pattern. Enter an optional description to describe its use or purpose.
  4. Category. Select a Category from the list. For example, Sensitive.
  5. Conditions. Click Select Criteria and choose Advanced Pattern. The Select Advanced Patterns cloud card displays.
  6. Condition Operators. Select an operator for your condition based on your use case. For example, select is for Advanced Pattern. To understand the functionality of each Condition operator, see Condition Operators for DLP Classification.
    • Count each match string only one time. When you select this checkbox, a string that matches the advanced pattern in the advanced pattern rule will not be counted again. To learn more about the use case, see Count each match string only one time feature.

NOTE: If you enable the Count each match string only one time checkbox, the unique match criteria apply to each advanced pattern in the classification. For example, if your classification has two advanced patterns with the same regular expression, then the classification will trigger two separate matches for the same regular expression.

 

  7. Click New.

    clipboard_ea8bc81b72199dcf3f937b8bb4189123b.png
     
  8. Enter a name and optional description for your Custom Advanced Pattern.
  9. To manually enter a regular expression, click Add RegEx.

    clipboard_e2de7115475935f55f86d08eeba48999e.png
     
  10. Enter a regular expression. Your newly added regular expression appears on the Regular Expressions tab.
  11. To ensure your regular expressions are accurate, click No Validation to open the Validation Algorithm cloud card.

    clipboard_e08e7bc517943c83643311548d0d9a8e4.png

NOTE: \b matches the empty string at a word boundary, that is, the position between a word character and a non-word character. The use of \b as a separator is intentional, as there are instances where the detection of numbers may not be appropriate due to regional differences in number formatting. For example, in the UK and the US, a comma (,) is used as a thousands separator (e.g., 1,000), whereas in some other European countries, a period (.) serves this purpose (e.g., 1.000).

  12. Select the appropriate Validation Algorithm from the list and click Done. To add the Luhn 10 validation algorithm and BINs for your custom regular expressions, click Add BIN Validator. For more details, see Add BIN Validator.

    clipboard_e02062d88c7d0678238efe89cbe6940d2.png
     
  13. Add a Score to weigh the new regex Advanced Pattern. Scores can be negative or positive, from -99 to 999. The higher the score, the greater the weight given to the pattern, and the more likely a match is to exceed the threshold and trigger an incident.
  14. To reduce false positives, add expressions in the Exceptions tab to exclude specific keywords or regular expressions from being processed as matches in DLP classifications.
  15. Go to the Exceptions tab and click Add Exception.

    clipboard_e3f28da286729ffa8bfd2b6978f10ea27.png
     
  16. Enter the keywords or Google RE2 expressions (RegEx) and select the Type of the exception from the menu. To add more expressions, click Add Exception.

    clipboard_e57c14d8c73eaad695d2b861186626e9e.png
     
  17. To save your new Advanced Pattern with regular expressions and exceptions, click Save.
  18. The new Advanced Pattern is now added to the Classification and Advanced Pattern list.

    clipboard_e5d0048c0da07fa194536daab5ce18b09.png
     
  19. Optionally, you can edit the threshold by clicking [1]. Enter a number to indicate the weight of the Advanced Pattern in threshold matching.

    Reg 1.png
     
  20. Add more classification conditions as needed and click Save.

Your custom classification, with its custom advanced patterns, validations, and exceptions, is saved to the selected category in the Classifications list. Add the classification to your DLP policies as needed.

NOTE: You can view events for new, updated, and deleted advanced patterns in the Audit Log. For details, see View DLP Classification Events in the Audit Log.
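
The Luhn 10 validation offered in the procedure above can be illustrated with a short sketch. The Python below is not Skyhigh's implementation; it is a minimal version of the standard Luhn checksum, and the function name luhn_valid is an assumption made for this example. It shows why validation complements a regex: a string of the right length and prefix can match a pattern and still fail the checksum. BIN checks are not covered here.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn (mod 10) checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4123456789012349"))  # passes the checksum
print(luhn_valid("4123456789012345"))  # last digit changed, fails
```

Both sample strings are 16 digits starting with 4, so a pattern such as \b4\d{15}\b would match either one; only the first passes the checksum.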

 

Custom Advanced Pattern Use Cases

Count each match string only one time feature 

Suppose you have a bank document with multiple instances of the France IBAN pattern, and you have set the score for this regular expression to 10 in the custom advanced pattern. This means that a match is triggered only if the France IBAN pattern appears 10 or more times in the document. However, if you want to avoid counting duplicates, select the Count each match string only one time checkbox. During policy evaluation, each matched string is then counted only once, no matter how many times it repeats in the document, even though the score for the regular expression is set to 10. To find this option in the UI, see Count each match string only one time.

Reg 2.png
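
The difference between counting every match and counting each match string once can be sketched in a few lines of Python. This is an illustration only, not the DLP engine's logic: the FRANCE_IBAN pattern is a simplified, hypothetical regex (the letters FR, two check digits, and 23 alphanumeric characters), and the sample document is invented for the example.

```python
import re

# Simplified, hypothetical France IBAN pattern (assumption for this sketch).
FRANCE_IBAN = re.compile(r"\bFR\d{2}[0-9A-Z]{23}\b")

document = (
    "Payer: FR1420041010050500013M02606\n"
    "Payee: FR7630006000011234567890189\n"
    "Refund to: FR1420041010050500013M02606\n"
    "Confirmed: FR1420041010050500013M02606\n"
)

all_matches = FRANCE_IBAN.findall(document)

total_count = len(all_matches)        # every occurrence is counted
unique_count = len(set(all_matches))  # each distinct match string is counted once

print(total_count, unique_count)
```

In this sample the same IBAN appears three times and a second IBAN appears once, so the total count is 4 while the unique count is 2.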

Set Scores for Regular Expressions on the Custom Advanced Pattern List

Let's say you have a confidential bank document containing sensitive information or patterns that should only be accessed by authorized personnel. To ensure the security of the document, you can set the scores for regular expressions that alert the DLP scanning engine with more precise information whenever someone tries to access sensitive patterns beyond a specific limit. If a match is found, an incident is triggered to maintain the document's security.

To set scores for each regular expression in a custom advanced pattern list, follow these steps:

  1. Create a classification using custom advanced patterns. Perform the initial steps of creating your advanced pattern classification as provided in steps 1 to 11 in the Create Custom Advanced Patterns using Add Regex section.
  2. Score. Once you add the necessary regular expressions, you can set different scores for each regular expression in the list by editing the default score [1]. For example, configure the scores for three regular expressions: France IBAN, German IBAN, and UK IBAN. Set the score for France IBAN to 10, German IBAN to 6, and UK IBAN to 5. This means that a match is triggered when the France IBAN pattern appears 10 or more times in the content, the German IBAN pattern appears 6 or more times, and the UK IBAN pattern appears 5 or more times.

    clipboard_e9c17f4ae4aebed3a5c9e7978df5378b1.png

Re-use Regular Expressions in Custom Advanced Pattern List

Suppose you have multiple confidential documents containing common patterns, such as credit card numbers, that should only be accessed by authorized personnel. To ensure the security of these documents, you can create a custom advanced pattern list using regular expressions. This list can then be reused across classifications, eliminating the need to create or update custom advanced pattern lists repeatedly. 

To re-use regular expressions in a custom advanced pattern list:

  1. Create a classification using custom advanced patterns. Follow the steps of creating your advanced pattern classification as outlined in steps 1 to 5 in the Create Custom Advanced Patterns using Add Regex section. 
  2. On the Select Advanced Patterns cloud card, click All and select Custom.

    clipboard_e786f5ebadfd37b019694e6aa2720e9c4.png
     
  3. Select one or more existing Custom Advanced Patterns.
  4. Click i to view the Usage of the selected Advanced Patterns in other classifications.

    Reg 3.png

Exclude Matches on Keywords in Custom Advanced Pattern List

Suppose you have a financial document that contains a broad range of sensitive keywords, but you want to exclude specific keywords from being processed as matches by the DLP engine. To exclude matches on keywords, you can create a custom advanced pattern list using regular expressions and exceptions. These exceptions prevent specific keywords from triggering matches, thereby reducing false positives and ensuring accuracy in your data protection measures. 

To exclude matches on keywords in a custom advanced pattern list:

  1. Create a classification using custom advanced patterns. Follow the steps of creating your advanced pattern classification as outlined in steps 1 to 14 in the Create Custom Advanced Patterns using Add Regex section.
  2. Exception and Type. Once you add the necessary regular expressions, you can add exceptions to exclude specific keywords or regular expressions from being processed as matches by the DLP engine. For example, add exceptions such as two keywords - Account No and Balance, and add a regular expression for Spain IBAN. This means that a match will not be triggered if the keywords Account No and Balance, and patterns for Spain IBAN are accessed within the document.

    clipboard_e1b552ed8fe3c78d0761de034dbae4c49.png

Exclude Matches on Regular Expressions in Custom Advanced Pattern List

Suppose you have a financial document that contains a broad range of sensitive patterns, but you want to exclude specific patterns from being processed as matches by the DLP engine. To exclude matches on regular expressions, you can create a custom advanced pattern list using regular expressions and exceptions. These exceptions prevent specific patterns from triggering matches, thereby reducing false positives and ensuring accuracy in your data protection measures. 

To exclude matches on regular expressions in a custom advanced pattern list:

  1. Create a classification using custom advanced patterns. Follow the steps of creating your advanced pattern classification as outlined in steps 1 to 14 in the Create Custom Advanced Patterns using Add Regex section.
  2. Exception and Type. Once you add the necessary regular expressions, you can add exceptions to exclude specific keywords or regular expressions from being processed as matches by the DLP engine. For example, add exceptions such as two regular expressions - Netherlands IBAN and Italian IBAN No, and add a keyword Account No. This means that a match will not be triggered if the patterns for Netherlands IBAN and Italian IBAN, and keyword Account No are accessed within the document.

    clipboard_e9bce692f908ed24fde909e93d92ddfe5.png
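
How the DLP engine applies exceptions internally is not described here. The Python sketch below illustrates one plausible reading, in which a candidate match is discarded when the matched string also matches an exception regex. The generic IBAN pattern, the simplified Netherlands IBAN pattern, and the function name matches_after_exceptions are all assumptions made for this example.

```python
import re

# Hypothetical broad pattern: country code, 2 check digits, 11-30 alphanumerics.
GENERIC_IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")

# Hypothetical exception: simplified Netherlands IBAN (NL, 2 digits, 4 letters, 10 digits).
EXCEPTION_PATTERNS = [re.compile(r"NL\d{2}[A-Z]{4}\d{10}")]


def matches_after_exceptions(text: str) -> list[str]:
    """Return pattern matches, dropping any that fully match an exception regex."""
    kept = []
    for match in GENERIC_IBAN.finditer(text):
        candidate = match.group(0)
        if any(exc.fullmatch(candidate) for exc in EXCEPTION_PATTERNS):
            continue  # excluded by an exception, so it is not reported
        kept.append(candidate)
    return kept


sample = "Transfer from NL91ABNA0417164300 to DE89370400440532013000"
print(matches_after_exceptions(sample))
```

With this sample, the Netherlands IBAN is dropped by the exception and only the German IBAN remains as a match.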

Match Regular Expressions in Specific Email Sections

Suppose you have a medical email that contains a broad range of sensitive text patterns, but you want the DLP engine to match text patterns in specific sections of the email. To match regular expressions in specific sections of the email, you must first create a classification using a custom advanced pattern list of regular expressions. You can then configure a DLP policy with the newly created classification to specify the sections (Everywhere, Email Header) of the email. This enables the DLP engine to trigger matches on regular expressions in specific sections of the email, thereby reducing false positives and ensuring accuracy in your data protection measures. 

For example, create a classification using a custom advanced pattern list of regular expressions named Bank Account Numbers, and configure a sanctioned DLP policy with the new classification to specify the Email Header section of the email. This ensures that a match is only triggered if the regular expressions in the Bank Account Numbers advanced pattern list are accessed in the header section of the email.

To match regular expressions in specific email sections:

  1. Create a classification by selecting any or all of the Custom Advanced Patterns.

    Reg 4.png
     
  2. Create a Sanctioned or Shadow DLP policy using the newly created classification. For example, create a sanctioned DLP policy. 
  3. Use the Skyhigh CASB DLP policy wizard to perform the initial steps of creating your Sanctioned DLP policy as provided in steps 1 to 4 in Create a Sanctioned DLP Policy.

    clipboard_e958a9a839b32eaae1b998a9d992ba0ae.png
     
  4. On the Rules & Exceptions page, configure the following:
    • Rules. For IF, select Classifications. The Select Classification cloud card appears.

      clipboard_eef16a82ff2584f62b07c0bce19d65871.png
       
      • Classification. Select the newly created classification from the list of supported classifications and click Done.

        clipboard_e7c37918b3b3ed8f109656f3fc1d9ab44.png
         
        • Location. Select Email Header. By default, All is selected.

          clipboard_e44460e38c34b1c3d612b163f35cb7f52.png
           
  5. Complete the remaining steps to configure your DLP policy as mentioned from step 5 (c) in Create a Sanctioned DLP Policy.
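
The effect of the Location setting can be pictured with a small Python sketch that restricts a regex search to the headers of an email. This is only an illustration using the standard library email module, not the product's behavior: the 10-digit ACCOUNT pattern, the sample message, and the find_matches function are hypothetical.

```python
import re
from email import message_from_string

# Hypothetical "Bank Account Numbers" pattern: any standalone 10-digit number.
ACCOUNT = re.compile(r"\b\d{10}\b")

RAW_EMAIL = (
    "From: billing@example.com\n"
    "To: customer@example.com\n"
    "Subject: Statement for account 1234567890\n"
    "\n"
    "Your balance for account 9876543210 is available.\n"
)


def find_matches(raw_email: str, location: str = "All") -> list[str]:
    """Search the headers only ("Email Header") or headers plus body ("All")."""
    message = message_from_string(raw_email)
    header_text = "\n".join(f"{name}: {value}" for name, value in message.items())
    if location == "Email Header":
        searched = header_text
    else:
        searched = header_text + "\n" + message.get_payload()
    return ACCOUNT.findall(searched)


print(find_matches(RAW_EMAIL, "Email Header"))
print(find_matches(RAW_EMAIL))
```

Restricting the search to the header finds only the account number in the Subject line, while the default location also finds the number in the body.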

Named Capture Group for Improved Data Detection

You can use custom advanced patterns to search for specific text and generate incidents upon matches. To further refine this capability, utilize enhanced regular expressions to fine-tune the matching criteria, ensuring that certain text does not trigger a match. For example, enhanced regex can be used to prevent the detection of matches preceded by a period. Refining the regular expression pattern enhances the accuracy of detection and reduces false positives without altering the matched text in your incident.  

Additionally, you can improve your regex with the Named Capture Group to identify specific portions of a regex pattern as a matched term, enhancing unique match counts and efficacy of the resulting DLP Incident.

For example, it was observed that during Credit Card Number (CCN) Policy detection, decimal numbers with high precision were incorrectly identified as matches, which was not intended. The following are examples of fine-tuning the detection process using enhanced regular expressions and Named Capture Groups to improve the accuracy and efficacy of CCN detection.

  • Reduced False Positives. The existing regex pattern (\b4\d{15}\b) triggers false positives for numbers preceded by a period (.).
    The updated pattern (?:^|[^.,\x{66b}])\b4\d{15}\b uses a non-capture group, denoted by (?: pattern), to prevent the regex from matching when the number is preceded by a period, a comma, or the Arabic decimal separator (U+066B), reducing false positives.

  • Accurate Unique Match Count with Named Capture Group. When the Count each match string only one time option is enabled with the updated regex pattern, the character preceding the 16-digit credit card number (CCN) is included in the match. As a result, matches that share the same credit card number can be counted as different unique strings, which leads to duplicate match count incidents.
    To address this, the regex pattern is further refined with a named capture group: (?:^|[^.])\b(?P<ccn>4\d{15})\b

    By using the named capture group (?P<ccn>4\d{15}), only the 16-digit CCN is considered as the unique matched term. This eliminates duplicate match count incidents.

For a detailed understanding of the CCN detection policy, refer to the section below.

Enhanced Regex Pattern for Visa Credit Card Number (CCN) Detection

The DLP admin is tasked with scanning documents for Visa credit card numbers, which typically start with the digit 4 and consist of 16 digits. The DLP admin employs regular expressions (regex) to identify these Visa credit card numbers. However, it is crucial to ensure that the regex does not result in DLP incidents for non-sensitive data that may resemble credit card numbers but are preceded by specific characters, such as a period (.).

To address this requirement, the DLP admin designed the regex pattern as below:

Current Regex Pattern: \b4\d{15}\b

simple.png

Example of Detections using the Current Regex Pattern

The table below provides examples of CCN matches in a document and indicates whether these matches would trigger incidents based on the current regex pattern.

| Credit Card Numbers | Matches Found | Comment |
|---|---|---|
| 4123456789012349 | 1 match - triggers a DLP incident | 1 match was found for this number, which is deemed valid as it starts with a 4 and consists of 16 digits. |
| 0.4123456789012349 | 1 match - triggers a DLP incident | 1 match was found for this number, which is deemed invalid because the digits are preceded by "0." and are therefore part of a decimal number, not a CCN. |

RESULT. The current regex pattern does not accurately match Visa CCNs, which results in a false DLP incident being triggered for non-sensitive data.

Reduce False Positives with Non-Capture Group

To refine the regex and prevent false incidents for numbers preceded by certain characters (like a period), the DLP admin used a non-capture group denoted by (?: pattern). The updated regex pattern could be:

Updated Regex Pattern: (?:^|[^.,\x{66b}])\b4\d{15}\b

enhanced regex.png

Example of Detections using the Updated Regex Pattern

The table below provides examples of CCN matches in a document and indicates whether these matches would trigger incidents based on the updated regex pattern.

| Credit Card Numbers | Matches Found | Comment |
|---|---|---|
| 4123456789012349 | 1 match - triggers a DLP incident | 1 match was found for this number, which is deemed valid as it starts with a 4 and consists of 16 digits. |
| 0.4123456789012349 | 0 match - No incident | No match was found for this number, which is the intended result because the digits are preceded by "0." and are not a CCN. |

RESULT. This regex pattern ensures that the Visa CCN is not preceded by a period, thereby reducing false positives and improving the accuracy of sensitive data detection.
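
The two detection tables above can be reproduced with a short Python check. This is a sketch for experimentation rather than the DLP engine itself. One assumption to note: the patterns in this article use Google RE2 syntax, where the Arabic decimal separator is written \x{66b}; Python's re module does not accept that form, so the same code point is written \u066b below.

```python
import re

# Patterns from the text, adapted for Python's re module (\x{66b} becomes \u066b).
CURRENT = re.compile(r"\b4\d{15}\b")
UPDATED = re.compile(r"(?:^|[^.,\u066b])\b4\d{15}\b")

samples = ["4123456789012349", "0.4123456789012349"]

for sample in samples:
    current_hits = [m.group(0) for m in CURRENT.finditer(sample)]
    updated_hits = [m.group(0) for m in UPDATED.finditer(sample)]
    print(f"{sample!r}: current={current_hits} updated={updated_hits}")
```

Both patterns match the standalone number, but only the current pattern matches the digits that follow "0.", which is the false positive the updated pattern removes.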

Accurate Unique Match Count with Named Capture Group

When the DLP administrator implemented the updated regex pattern with the Count each match string only one time option enabled, the unique match count included the character that precedes the 16-digit credit card number (CCN), which led to duplicate match count incidents.

To address the issue of duplicates, the DLP admin added a named capture group to the regex pattern.

What is the purpose of a Named Capture Group?

A named capture group allows you to specify a portion of a regex pattern that you want to capture as a matched term. This matched term will then be displayed in the incident. This feature is particularly useful when dealing with large regex patterns, as it allows you to focus on capturing a specific segment while ignoring the non-capturing parts. This results in greater precision in the matching process and simplifies the handling of complex regex patterns. 

NOTE

  • Each Advanced Pattern supports only a single named capture group. 

  • If multiple named capture groups are present, they will be disregarded during detection, and the results will be displayed for the entire regex as a single term.

The updated regex pattern could be:

Regex Pattern with Named Captured Group: (?:^|[^.])\b(?P<ccn>4\d{15})\b

Named_Capture_Group.png

Example of Detections using the Updated Regex Pattern with Unique Match Count Enabled

This table provides examples of matches identified in the document and the incidents they trigger when the Count each match string only one time option is enabled.

| Credit Card Numbers | Matches Found | Comment |
|---|---|---|
| 4123456789012349 | Matches as a unique count - triggers a DLP incident | 1 match was found for this number, which is deemed valid; the whole term is counted as one unique match. |
| 0.4123456789012349 | 0 match - No incident | No match was found for this number, which is the intended result because the digits are preceded by "0." and are not a CCN. |
| +4123456789012349 | Matches as a unique count - triggers a DLP incident | 1 match was found for this number. Because the "+" is part of the matched term, +4123456789012349 is counted as a separate unique match even though it contains the same CCN, which produces a duplicate match count. |

RESULT. The unique match count considers the entire regex term for the matched count, including the characters preceding 16-digit Visa CCN numbers, and generates duplicate match count incidents.

Example of Detections with Named Capture Group

The following CCN is evaluated with a named capture group included in the regular expression:

| Credit Card Numbers | Matches Found | Comment |
|---|---|---|
| +4123456789012349 | 0 match - No incident | No additional unique match was found for this number, which is the intended result. The "+" matched by the non-capture group is excluded from the matched term, so only the named capture group, 4123456789012349, qualifies for the unique match count. |

RESULT. The named capture group ensures that only the CCN portion of the regex match is treated as the matched term, rather than the entire regex term. Only the named capture group is considered for the unique match count, which prevents duplicate match count incidents.
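
The effect of the named capture group on unique counting can be checked with a few lines of Python, whose re module accepts the same (?P<name>...) syntax used in this article. The sketch assumes that a unique count means the number of distinct matched terms; the sample text and variable names are invented for the example.

```python
import re

# Same pattern with and without the named capture group from the text.
WITHOUT_GROUP = re.compile(r"(?:^|[^.])\b4\d{15}\b")
WITH_GROUP = re.compile(r"(?:^|[^.])\b(?P<ccn>4\d{15})\b")

text = "4123456789012349 +4123456789012349"

# Without the group, the whole match (including a leading "+") is the term.
whole_terms = {m.group(0) for m in WITHOUT_GROUP.finditer(text)}

# With the group, only the captured 16 digits are the term.
ccn_terms = {m.group("ccn") for m in WITH_GROUP.finditer(text)}

print(sorted(whole_terms))
print(sorted(ccn_terms))
```

Without the group, the two occurrences yield two distinct terms because one includes the leading "+". With the group, both occurrences reduce to the same 16-digit term, so the unique count is 1.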

 
