How to Optimize DLP High Speed Discovery
Provision smarter, scan faster and stay compliant with built-in reporting
Managing extensive data sets within a limited timeframe demands precision and speed. But ongoing compliance with the latest regulations requires regularly scanning vast data repositories, often packed with static files. To achieve accurate and high-speed discovery, you need to know how to fine-tune the right configurations.
Tasked with implementing a High Speed Discovery (HSD) set-up, Data Loss Prevention (DLP) administrators may find themselves in one of these three scenarios:
- You’re new to HSD: Exploring or implementing the solution for the first time.
- You’re achieving desired scan speeds: The HSD cluster has been provisioned and is delivering the expected scan performance, but you’re unsure if the cluster is overprovisioned and can be safely scaled back.
- You’re not achieving desired scan speeds: The HSD cluster is provisioned but underperforming.
No matter where you fall, Symantec’s HSD solution addresses all these scenarios.
Ensuring success with lightning-fast scanning
Introduced with Symantec DLP 16, this enhancement in data-at-rest scanning is designed to deliver speeds of 1 TB per hour or more. However, provisioning the solution is only the first step—optimization is just as important. With insights gleaned from the Scan Details report, administrators can refine their data protection strategies and unlock the full potential of lightning-fast scanning.
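To put the 1 TB per hour figure in context, the short sketch below converts it to MB/sec and estimates how long a repository of a given size would take to scan. It is illustrative arithmetic only: the function names are hypothetical, decimal units (1 TB = 1,000,000 MB) are assumed, and it assumes throughput stays constant for the whole scan, which real environments rarely guarantee.

```python
def tb_per_hour_to_mb_per_sec(tb_per_hour: float) -> float:
    """Convert TB/hour to MB/sec, assuming decimal units (1 TB = 1,000,000 MB)."""
    return tb_per_hour * 1_000_000 / 3600


def estimated_scan_hours(repository_tb: float, throughput_tb_per_hour: float) -> float:
    """Estimate scan duration, assuming constant throughput (a simplification)."""
    if throughput_tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return repository_tb / throughput_tb_per_hour


if __name__ == "__main__":
    # Sustained rate the network and repository must deliver for 1 TB/hour.
    print(f"{tb_per_hour_to_mb_per_sec(1.0):.1f} MB/sec")
    # A hypothetical 50 TB repository scanned at 1 TB/hour.
    print(f"{estimated_scan_hours(50, 1.0):.0f} hours")
```

The conversion is a reminder that the target rate has to be sustained end to end by the network and the repository, not just by the detection servers.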
In simple terms, an HSD solution comprises a Data Node and one or more Worker Nodes (WNs) intended to scan large volumes of data. However, various factors can affect scan throughput.
Among many parameters, scan throughput primarily depends on:
- Network speed
- Repository load
- Data type
- Policy complexity
- Hardware specs
- Disk I/O
Understanding how these factors affect scan throughput helps you properly size a Network Discover Cluster for your data repository. The Scan Details report generated by an HSD scan offers valuable insights into these key indicators and can help inform adjustments for optimization.
What insights can the Scan Details report provide?
Network Discover scans follow four phases to detect and remediate sensitive data: crawling, content fetching, detection and remediation. The downloadable Scan Details report lists specific parameters for each phase that reflect its performance and health.
However, you don’t need to understand every parameter in detail to identify where tuning makes sense. The Cluster Metrics section of the report simplifies this with actionable insights into resource utilization across the Network Discover Cluster. Each entry in the Cluster Metrics table reflects cumulative data collected over a 3-minute interval and covers a wide range of indicators that can guide the right action to bring the cluster into equilibrium.
Take data-driven action with Cluster Metrics
Keep in mind that your scan should run long enough to gather meaningful data for analysis. Let’s look at two common Cluster Metrics report results that call for administrator action.
Scenario A: The detection queue backlog or the detection wait time (ms) shows a higher count or wait time.
These parameters can indicate whether a cluster is under- or over-provisioned. A large backlog of unprocessed items in the detection queue, or a high detection wait time, suggests that adding WNs could boost scan throughput. Conversely, a detection queue that consistently holds few or no items, or a detection wait time that stays at or near zero, may indicate that the cluster is over-provisioned and that some WNs could be removed.
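The reasoning in Scenario A can be sketched as a simple heuristic over a series of Cluster Metrics entries. This is a minimal illustration, not Symantec's method: the field names, the function name and every threshold below are hypothetical and would need to be calibrated against your own report and environment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MetricsInterval:
    """One 3-minute Cluster Metrics entry (field names are hypothetical)."""
    detection_queue_backlog: int
    detection_wait_ms: float


def provisioning_hint(intervals, high_backlog=1000, high_wait_ms=500.0,
                      low_backlog=10, low_wait_ms=5.0):
    """Return a rough hint; all thresholds are illustrative assumptions."""
    if not intervals:
        return "insufficient data"
    avg_backlog = mean(i.detection_queue_backlog for i in intervals)
    avg_wait = mean(i.detection_wait_ms for i in intervals)
    # Large backlog or long wait: detection cannot keep up with fetching.
    if avg_backlog >= high_backlog or avg_wait >= high_wait_ms:
        return "consider adding worker nodes"
    # "Consistently" low is modeled as low in every interval.
    if all(i.detection_queue_backlog <= low_backlog
           and i.detection_wait_ms <= low_wait_ms for i in intervals):
        return "possibly over-provisioned"
    return "no clear signal"


if __name__ == "__main__":
    busy = [MetricsInterval(5000, 1200.0), MetricsInterval(4200, 900.0)]
    idle = [MetricsInterval(0, 0.0), MetricsInterval(3, 1.5)]
    print(provisioning_hint(busy))
    print(provisioning_hint(idle))
```

Requiring every interval to be low before suggesting a scale-back is a deliberately conservative choice, since removing WNs too early would lower throughput.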
Scenario B: The content fetching speed (MB/sec) or the bytes downloaded shows a consistently low value.
The content fetching speed (MB/sec) parameter is the rate at which files are downloaded from the target repository, aggregated across all WNs in the Network Discover Cluster. The bytes downloaded parameter is the total number of bytes downloaded from the target repository at a given point in time. If your Cluster Metrics report consistently shows slow content fetching speeds, few bytes downloaded, or both, analyze the target repository and the network bandwidth for potential slowness.
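As a rough illustration of how bytes downloaded relates to content fetching speed, the sketch below derives a per-interval MB/sec figure from successive bytes-downloaded readings and checks whether the speed is consistently below a floor. It assumes the readings are cumulative totals taken 3 minutes apart; the function names, the floor value and the 80% "consistently" cutoff are hypothetical.

```python
INTERVAL_SECONDS = 180  # Cluster Metrics entries cover 3-minute intervals


def fetch_speeds_mb_per_sec(cumulative_bytes):
    """Derive MB/sec per interval from successive cumulative byte totals."""
    speeds = []
    for prev, curr in zip(cumulative_bytes, cumulative_bytes[1:]):
        speeds.append((curr - prev) / 1_000_000 / INTERVAL_SECONDS)
    return speeds


def consistently_slow(speeds, floor_mb_per_sec, min_fraction=0.8):
    """True if at least min_fraction of intervals fall below the floor."""
    if not speeds:
        return False
    slow = sum(1 for s in speeds if s < floor_mb_per_sec)
    return slow / len(speeds) >= min_fraction


if __name__ == "__main__":
    # Hypothetical cumulative bytes-downloaded readings, 3 minutes apart.
    readings = [0, 18_000_000_000, 36_000_000_000, 45_000_000_000]
    speeds = fetch_speeds_mb_per_sec(readings)
    print(speeds)
    print(consistently_slow(speeds, floor_mb_per_sec=150))
```

A result of True here points toward the repository or the network as the bottleneck, which is a case where adding WNs is unlikely to help.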
There are many other parameters in this report which, when correlated, could allow administrators to fine-tune the cluster further. To learn more about these, contact Symantec.
Note that the scenarios above offer general advice on using system resources (e.g. number of nodes, hardware resources) more effectively for the servers already configured in your cluster, and on making informed tuning decisions. They are guidelines and should not be treated as a definitive formula for reaching your target.
How this approach works in the real world
Consider a Discover Cluster consisting of 10 WNs. DLP administrators can assess parameter insights to make informed decisions and fully leverage the deployment’s potential. For instance, suppose content fetching speed is slow, the repository has reached maximum capacity (so additional WNs would not help), and each WN’s CPU utilization averages around 30%. In that case, reducing the WN count by half may be beneficial. With the same workload spread across half as many nodes, CPU utilization is likely to rise to about 60% per WN. From there, administrators can adjust the number of WNs incrementally while monitoring other factors.
This data-led strategy helps organizations optimize their DLP investments and make the most of deployed hardware, but success hinges on set-up. As with many tools, incorrect configuration may lead to higher hardware costs without improving performance, and scan throughput results may vary by environment and other factors. The guidelines above therefore offer a starting point for interpreting the report and fine-tuning the DLP environment. The indicators are specific to each run and environment, and recommended solutions may differ depending on content types or DLP environments and clusters.
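The 30% to 60% estimate follows from a simple proportional model, sketched below. It assumes the total detection workload stays constant, is spread evenly across WNs, and that CPU utilization scales linearly with each node's share; real clusters may deviate, so treat the output as a first guess to verify against the next Scan Details report. The function name is hypothetical.

```python
def projected_cpu_utilization(current_pct: float, current_nodes: int,
                              new_nodes: int) -> float:
    """Project per-node CPU % after resizing, assuming linear scaling."""
    if current_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    # Same total work divided among fewer (or more) nodes, capped at 100%.
    return min(100.0, current_pct * current_nodes / new_nodes)


if __name__ == "__main__":
    # The example from the text: 10 WNs at ~30% CPU, reduced to 5 WNs.
    print(projected_cpu_utilization(30.0, 10, 5))
```

A projection that reaches the 100% cap suggests the reduced cluster would become CPU-bound, which is one reason to change the WN count in small steps.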
For additional details, the references below provide insights on various tunable parameters.