Skyhigh Security

DNS Health Check

Summary

SWG operations rely heavily on stable DNS connectivity to access servers on the web. Within the SWG Management console, administrators have the option to configure three DNS servers (Primary, Secondary, Tertiary). The SWG prioritizes the first DNS server in the list for resolving domain names. However, if the primary server experiences a failure, there is a delay in connectivity until the SWG detects the issue and switches to the next available DNS server (Secondary). This delay persists until the primary server is restored or manually updated via the user interface, which can result in an inconsistent browsing experience for end users, impacting overall reliability.

Implementing periodic DNS Health Checks within the SWG would address this issue by enabling the SWG to proactively detect DNS failures. With this feature, the SWG can smoothly transition from an unreachable DNS server to an available one, facilitating early recovery and minimizing the impact on user experience. By ensuring a more reliable DNS resolution process, this enhancement would improve overall user satisfaction, system reliability, and network throughput.

Description

Secure Web Gateway (SWG) reads the DNS server information from the Management UI and extracts the Primary, Secondary and Tertiary DNS addresses. These addresses are then written to /etc/resolv.conf, where the SWG Proxy uses them for domain name resolution.

[Figure: DNS server configuration (Primary, Secondary, Tertiary) in the SWG Management UI]

Therefore, by periodically polling the health of each DNS server entry stored in /etc/resolv.conf, the service aims to detect any DNS server that is not responding and to disable it in the DNS server list within /etc/resolv.conf, preventing unnecessary failed lookups.

If in subsequent polls, the DNS server is found to be healthy again, it would be added back to the DNS server list, as described in the Re-induction section below.

The SWG DNS health check is created as a service dnshealthcheck and deployed as a daemon which periodically performs a health check against the first three nameserver entries in /etc/resolv.conf.
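As a rough illustration of what one poll involves, the sketch below reads the first three nameserver entries from resolv.conf text and sends a single A-record query for the health check FQDN to one server over UDP. It is a minimal sketch under stated assumptions, not the dnshealthcheck implementation: the function names (first_nameservers, build_a_query, probe), the 2-second timeout, and the choice to treat only a NOERROR reply as healthy are all hypothetical.

```python
import random
import socket
import struct


def first_nameservers(resolv_conf_text, limit=3):
    """Return the first `limit` nameserver addresses found in resolv.conf text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never
        # equals the bare keyword "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers[:limit]


def build_a_query(fqdn, query_id):
    """Build a minimal DNS query packet asking for the A record of `fqdn`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = fqdn.rstrip(".").split(".")
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def probe(server, fqdn, timeout=2.0):
    """Return True if `server` replies to the query with RCODE 0 (assumed
    health criterion; the real service may use a different one)."""
    query_id = random.randint(0, 0xFFFF)
    packet = build_a_query(fqdn, query_id)
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeout, unreachable network, refused, ...
            return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == query_id and is_response and rcode == 0
```

A poll would then call probe(server, fqdn) once for each address returned by first_nameservers().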

The following section describes how the behavior of the health check can be customized for a given deployment.

Tunable parameters

The service uses systemd environment variables and expects the following variables to be defined in /etc/systemd/system/dnshealthcheck.service.d/var.conf:

 

| Variable name | Mandatory | Default value | Range | Description |
|---|---|---|---|---|
| DnsHealthCheck_FQDN | Yes | N/A | N/A | Defines the health check host name. The service tries to resolve this FQDN via the nameserver entries in /etc/resolv.conf. Customers are advised to select a stable FQDN appropriate to the environment where SWG is deployed. Note: The service will fail to start if this variable is not defined explicitly. |
| DnsHealthCheck_Interval | No | 5 | [1,172800] | Defines the interval between health check runs, in seconds. The suggested default is 5 seconds. |
| DnsHealthCheck_PrimaryFailureThreshold | No | 1 | [1,20] | Defines the threshold value for disabling the primary DNS server (more on this in the next section). |
| DnsHealthCheck_NonPrimaryFailureThreshold | No | 20 | [1,20] | Defines the threshold value for disabling the non-primary DNS servers (more on this in the next section). |
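The defaults and ranges listed above can be expressed as a small validation routine. This is only a sketch of how such checks might look, assuming the variables arrive as a plain dictionary of strings. The names RANGES and load_settings and the error messages are hypothetical and do not reflect the actual service code or its log text.

```python
# (default, minimum, maximum) for each optional variable, taken from the table.
RANGES = {
    "DnsHealthCheck_Interval": (5, 1, 172800),
    "DnsHealthCheck_PrimaryFailureThreshold": (1, 1, 20),
    "DnsHealthCheck_NonPrimaryFailureThreshold": (20, 1, 20),
}


def load_settings(env):
    """Validate a mapping of variable names to string values.

    Raises ValueError if the mandatory FQDN is missing or a value is
    outside its documented range (hypothetical error messages).
    """
    fqdn = env.get("DnsHealthCheck_FQDN", "").strip()
    if not fqdn:
        raise ValueError("DnsHealthCheck_FQDN is mandatory")
    settings = {"DnsHealthCheck_FQDN": fqdn}
    for name, (default, low, high) in RANGES.items():
        raw = env.get(name)
        value = default if raw is None else int(raw)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low},{high}]")
        settings[name] = value
    return settings
```

In a running service the mapping would typically be os.environ, since systemd exports the Environment= entries to the process.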

 

How DNS Healthcheck works

DNS Health Check service periodically polls the first 3 nameservers declared in /etc/resolv.conf with the interval defined by DnsHealthCheck_Interval. After the poll is over, should there be a change detected, one of the following events will occur for every DNS entry the poll has run for:

DNS Server Failure detection

Primary DNS server failures:

If the DNS server for which the resolution has failed is the primary DNS server, it is disabled after n consecutive failures, where n is defined by the tunable parameter DnsHealthCheck_PrimaryFailureThreshold.

Note: Since the Primary DNS server is the first server to be queried, we recommend setting it to a lower threshold.

Suggested value: set DnsHealthCheck_PrimaryFailureThreshold to 1.

Secondary/Tertiary DNS server failures:

Similarly, if the DNS server for which the resolution has failed is a non-primary DNS server (secondary or tertiary), it is also disabled after n consecutive failures, but in that case n is defined by the tunable parameter DnsHealthCheck_NonPrimaryFailureThreshold.

Note: Since the secondary and tertiary DNS servers are not the first ones to be queried, we recommend a higher threshold for them, so that the DNS configuration is not reloaded into the proxy too frequently.

Suggested value: set DnsHealthCheck_NonPrimaryFailureThreshold to 20.

If the primary DNS server is disabled due to health check failures, the secondary DNS server becomes the first DNS server to be queried. From then on it is subject to DnsHealthCheck_PrimaryFailureThreshold rather than DnsHealthCheck_NonPrimaryFailureThreshold.

Likewise, the same applies to the tertiary DNS server if both the primary and the secondary servers have been disabled.
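The threshold rules described above can be sketched as a small state tracker. This is an illustrative model only: the class name FailureTracker, its methods, and the decision to disable a server as soon as its consecutive failure count reaches the threshold are assumptions, not the service's actual code.

```python
class FailureTracker:
    """Track consecutive failures per DNS server and decide when to disable."""

    def __init__(self, servers, primary_threshold=1, non_primary_threshold=20):
        self.servers = list(servers)  # configured order: primary first
        self.primary_threshold = primary_threshold
        self.non_primary_threshold = non_primary_threshold
        self.failures = {server: 0 for server in self.servers}
        self.disabled = set()

    def threshold_for(self, server):
        """The first server that is still enabled gets the primary threshold."""
        active = [s for s in self.servers if s not in self.disabled]
        if active and active[0] == server:
            return self.primary_threshold
        return self.non_primary_threshold

    def record(self, server, healthy):
        """Record one poll result; return True if the server is enabled afterwards."""
        if healthy:
            # A success resets the counter and re-enables the server.
            self.failures[server] = 0
            self.disabled.discard(server)
        else:
            self.failures[server] += 1
            if (server not in self.disabled
                    and self.failures[server] >= self.threshold_for(server)):
                self.disabled.add(server)
        return server not in self.disabled
```

With the suggested values of 1 and 20, a single failed poll disables the primary server, after which the secondary server is held to the threshold of 1 until the primary server recovers.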

DNS Server Re-induction

A disabled DNS server for which resolution succeeds again is re-inducted into /etc/resolv.conf. The service tries to preserve the configured order: the entries are written in the same sequence as the Primary, Secondary and Tertiary DNS servers in the SWG Management UI (see the figure in the ‘Description’ section above).
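Order-preserving re-induction can be illustrated with a short function that rebuilds the nameserver lines from the configured order and the set of servers currently considered healthy. The function name active_nameserver_lines and the example addresses are hypothetical, and the sketch does not write to /etc/resolv.conf.

```python
def active_nameserver_lines(configured, healthy):
    """Return resolv.conf nameserver lines for healthy servers only.

    `configured` lists the addresses in Management UI order (primary,
    secondary, tertiary); `healthy` is the set of addresses that passed
    the latest health check. Iterating over `configured` keeps the order.
    """
    return [f"nameserver {address}" for address in configured if address in healthy]


configured = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# The secondary server has been disabled: only primary and tertiary remain.
before = active_nameserver_lines(configured, {"10.0.0.3", "10.0.0.1"})

# The secondary server is healthy again: it returns to its original position.
after = active_nameserver_lines(configured, {"10.0.0.3", "10.0.0.1", "10.0.0.2"})
```

Because the list is always rebuilt from the configured order, a recovered server goes back to its original slot instead of being appended at the end.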

How to setup DNS Healthcheck for the first time

  1. Log in to the SWG shell.
  2. Open the DNS healthcheck configuration file /etc/systemd/system/dnshealthcheck.service.d/var.conf in any text editor.
  3. Create or change the configuration according to the needs of the current deployment. Below is a sample configuration:

            [Service]
            Environment="DnsHealthCheck_FQDN=skyhighsecurity.com"
            Environment="DnsHealthCheck_Interval=5"
            Environment="DnsHealthCheck_PrimaryFailureThreshold=1"
            Environment="DnsHealthCheck_NonPrimaryFailureThreshold=20"

     Note: Changing the threshold values to anything other than the recommended values might affect the efficiency of the program. Please consult support before making a change in any production environment.

  4. Save and close the file.
  5. Run systemctl daemon-reload
  6. Run systemctl start dnshealthcheck
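When the configuration is generated by tooling rather than edited by hand, the sample configuration above could be rendered with a helper along these lines. The function name render_dropin is hypothetical. It only returns the file content as a string and leaves writing the file and running the systemctl commands to the operator.

```python
def render_dropin(fqdn, interval=5, primary_threshold=1, non_primary_threshold=20):
    """Return the text of a systemd drop-in with the four health check variables."""
    lines = [
        "[Service]",
        f'Environment="DnsHealthCheck_FQDN={fqdn}"',
        f'Environment="DnsHealthCheck_Interval={interval}"',
        f'Environment="DnsHealthCheck_PrimaryFailureThreshold={primary_threshold}"',
        f'Environment="DnsHealthCheck_NonPrimaryFailureThreshold={non_primary_threshold}"',
    ]
    return "\n".join(lines) + "\n"


# Example: render the sample configuration with the recommended values.
sample = render_dropin("skyhighsecurity.com")
```

The defaults mirror the recommended values, so only the FQDN has to be supplied.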

How to change DNS Healthcheck configuration

  1. Follow steps 1 to 5 under "How to setup DNS Healthcheck for the first time"
  2. Run systemctl restart dnshealthcheck

How to stop DNS Healthcheck service

  1. Run systemctl stop dnshealthcheck

How to enable DNS Healthcheck service at boot

  1. Run systemctl enable dnshealthcheck

Troubleshooting 

The logs generated by dnshealthcheck service are written to /var/log/dnshealthcheck.log

  1. Service loaded the configuration successfully

Configuration:

            [Service]
            Environment="DnsHealthCheck_FQDN=skyhighsecurity.com"
            Environment="DnsHealthCheck_Interval=5"
            Environment="DnsHealthCheck_PrimaryFailureThreshold=1"
            Environment="DnsHealthCheck_NonPrimaryFailureThreshold=20"

Log:

[Screenshot: log output]

  2. Service failed to load the configuration file due to a mandatory field missing from the configuration

Configuration:

            [Service]
            Environment="DnsHealthCheck_Interval=5"
            Environment="DnsHealthCheck_PrimaryFailureThreshold=1"
            Environment="DnsHealthCheck_NonPrimaryFailureThreshold=20"

Log:

[Screenshot: log output]

  3. Service failed to load the configuration file due to a value being out of range

Configuration:

            [Service]
            Environment="DnsHealthCheck_FQDN=skyhighsecurity.com"
            Environment="DnsHealthCheck_Interval=500000"
            Environment="DnsHealthCheck_PrimaryFailureThreshold=1"
            Environment="DnsHealthCheck_NonPrimaryFailureThreshold=2"

Log:

[Screenshot: log output]

  4. Service loaded the configuration successfully with default values

Configuration:

            [Service]
            Environment="DnsHealthCheck_FQDN=skyhighsecurity.com"

Log:

[Screenshot: log output]

  5. Service disables a nameserver after the threshold has been crossed

Log:

[Screenshot: log output]

Note: The text in the above logs might change in the future. If that happens, this document will be revised as well.
