Blue Coat ProxySG – ICAP, deferred scanning, and data trickling

Recently I was digging into a BlueCoat ProxySG / ProxyAV setup for ICAP and noticed some room for improvement. Nothing needed a major overhaul, but a few recommendations from the best practices guide had been missed, and those omissions were causing problems. Below is part of the small case study I wrote to explain the options and the differences between them, along with my recommendations to management on how to proceed.


Scope:

At least once a month, if not more, I would hear from the HelpDesk or network team that the proxy load balancers were alerting that the “Real Server Is Down” (can’t resolve www.google.com). This alert is often triggered when a ProxySG can no longer process HTTP requests. In many cases the cause is ICAP connections being held open to a ProxyAV that is waiting for the end of an infinite stream that will never arrive. Each such stream ties up one of the ICAP connections on the ProxySG, and once they are all in use, the box must be rebooted (a BlueCoat support recommendation).

This document summarizes information from the BlueCoat Proxy SG/AV Integration Guide and my own research, and gives my recommendation on which of the available options should be deployed to mitigate these issues.

Conserving Scanning Resources

HTTP Web objects range from very small to very large in size, and for each scanned object, a scanning resource (connection) is used on the ProxyAV. Some objects, referred to as infinite streams or slow downloads, do not have finite object ends. For example, a stock ticker is an infinite data stream that is transmitted over HTTP using a Web browser.

Since the ProxyAV has a finite number of ICAP connections available at any given time, attempting to scan this type of data can potentially consume significant time and ProxyAV resources (potentially slowing other scans)—until an error is returned. If allowed to continue, these transfers fail with one of the following ICAP error codes:

• Maximum file size exceeded
• Scan timeout

The default configuration of the ProxyAV triggers such errors after the file size exceeds 100MB or after 800 seconds of scanning. While these settings are appropriate for other types of Web objects, they don’t work for infinite streams such as Web cams and stock tickers. To conserve system resources and prevent scanning of infinite streams, select either solution A or solution B listed below. Each solution offers a different approach and should not be used concurrently.
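To get a rough feel for why infinite streams exhaust the connection pool, the sketch below applies Little's law (average connections held = arrival rate × hold time). It assumes every infinite stream holds its ICAP connection for the full 800-second default scan timeout, which is a worst case because the 100MB limit may trip first. The function names, the request rate, and the pool size are made up for illustration and are not measurements from this environment.

```python
SCAN_TIMEOUT_SECS = 800  # ProxyAV default scan timeout quoted above


def held_connections(streams_per_hour: float,
                     hold_secs: float = SCAN_TIMEOUT_SECS) -> float:
    """Average ICAP connections tied up by infinite streams (Little's law)."""
    return streams_per_hour * hold_secs / 3600


def streams_per_hour_to_fill(pool_size: int,
                             hold_secs: float = SCAN_TIMEOUT_SECS) -> float:
    """Hourly rate of infinite-stream requests that keeps a pool fully busy."""
    return pool_size * 3600 / hold_secs
```

Under these assumptions, 90 infinite-stream requests per hour would keep about 20 connections busy on average, so a hypothetical pool of 20 connections would have nothing left for ordinary scans.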

Solution A: No-Scan Policy

To enhance user satisfaction and achieve maximum performance from the ProxyAV, some customers choose not to scan data streams that are known to cause issues. One benefit of this policy is reduced load on the ProxyAV. The risk is that the exemption could allow malicious content to slip through unscanned.

Note: If you have enabled deferred scanning as recommended in “Task 3: Enable Malware Scanning” on page 4-27, this policy is not required. The deferred scanning capability on the ProxyAV prevents scanning of excessively large or slow downloads.

This policy is based on request/response patterns that indicate an overly large or slow download. This policy assumes that the ICAP response rule is already defined; the actions in the policy will reset it back to (no) upon an attempt to scan a streaming object or an object that shouldn’t be scanned. The policy looks for very long content or objects in which no content length is provided; these are signs that this object may tie up ProxyAV resources. The policy also looks for common infinite streaming media types as well as user agents that are known to cause scanning problems. In addition, the policy defines URL domains that shouldn’t be scanned (such as finance.google.com and youtube.com) because they are known to contain infinite streams.
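The checks described above can be sketched as a simple decision function. This is not Blue Coat's CPL, only a Python illustration of the logic. The size threshold, the media-type prefixes, and the user-agent strings are assumptions chosen for the example; only the two domains come from the text.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Illustrative values only; the real policy file defines its own lists.
MAX_SCAN_BYTES = 100 * 1024 * 1024            # assumed "very long content" cutoff
STREAMING_TYPE_PREFIXES = ("audio/", "video/")  # assumed streaming media types
STREAMING_AGENTS = ("windows-media-player", "realplayer")  # assumed user agents
NO_SCAN_DOMAINS = ("finance.google.com", "youtube.com")    # from the text


@dataclass
class Response:
    url: str
    content_length: Optional[int]  # None when no Content-Length was sent
    content_type: str
    user_agent: str


def should_scan(r: Response) -> bool:
    """Return False when the response looks like an infinite or oversized stream."""
    host = (urlsplit(r.url).hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in NO_SCAN_DOMAINS):
        return False
    if r.content_length is None or r.content_length > MAX_SCAN_BYTES:
        return False
    if r.content_type.lower().startswith(STREAMING_TYPE_PREFIXES):
        return False
    agent = r.user_agent.lower()
    if any(a in agent for a in STREAMING_AGENTS):
        return False
    return True
```

Note how broad the "no content length" rule is: any response without a declared length skips scanning, which is exactly the exposure the no-scan approach accepts.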

Solution B: Scan-Until-Error Policy

Some administrators choose to wait for one of the symptomatic errors (Maximum file size exceeded or Scan timeout) to occur and then serve the data stream unscanned. This approach ensures that all data is still sent to the ProxyAV—thus, the maximum amount of scanning can occur.

The downside to this approach is that all requests for infinite data streams must reach the maximum file size or scan timeout configured on the ProxyAV. If a sufficient number of concurrent requests for such data streams occur, the request queue will slow or delay other traffic. This policy example serves the data stream if the error is Maximum file size exceeded or Scan timeout; other errors are denied. Blue Coat has written the CPL for this policy and you can download the file, customize it for your own needs, and install it on your ProxySG.
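The outcome handling in this policy can be illustrated with a small Python sketch. It is not the CPL that Blue Coat ships; the enum, the function name, and the return strings are hypothetical and only mirror the rule stated above: serve on the two symptomatic errors, deny on anything else.

```python
from enum import Enum
from typing import Optional


class IcapError(Enum):
    MAX_FILE_SIZE_EXCEEDED = "Maximum file size exceeded"
    SCAN_TIMEOUT = "Scan timeout"
    OTHER = "Other ICAP error"  # stands in for every other error code


# Errors that are symptomatic of an infinite stream or slow download.
SERVE_UNSCANNED_ON = {IcapError.MAX_FILE_SIZE_EXCEEDED, IcapError.SCAN_TIMEOUT}


def action_for(error: Optional[IcapError]) -> str:
    """Map the ICAP scan outcome to what the proxy does with the response."""
    if error is None:
        return "serve"            # scan finished without an ICAP error
    if error in SERVE_UNSCANNED_ON:
        return "serve_unscanned"  # fail open for the two symptomatic errors
    return "deny"                 # fail closed for everything else
```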

Improving the User Experience

To avoid having users abort and reinitiate their web requests due to scanning delays, we can provide feedback to let users know that scanning is in progress. This feedback can take the form of a patience page (not compatible with infinite streams), or we can use data trickling and deferred scanning to mitigate scanning delays.

Patience Pages
Patience pages are HTML pages displayed to the user if an ICAP content scan exceeds the specified duration. For example, the HTML page can display an informative message, such as:

The content of the page you requested is currently being scanned. Please be patient…

You can configure the content of these pages to include a custom message and a help link. Patience pages refresh every five seconds and disappear when object scanning is complete.

Patience pages are not compatible with infinite stream connections—or live content streamed over HTTP—such as a webcam or video feed. ICAP scanning cannot begin until the object download completes. Because this never occurs with this type of content, the ProxySG continues downloading until the maximum ICAP file size limit is breached. At that point, the ProxySG either returns an error or attempts to serve the content to the client (depending on fail open/closed policy). However, even when configured to fail open and serve the content, the delay added to downloading this large amount of data is often enough to cause the user to give up before reaching that point. See Chapter 9: Configuration Best Practices for some alternate solutions.

Data Trickling
Patience pages provide a solution to appease users during relatively short delays in object scans. However, scanning relatively large objects, scanning objects over a smaller bandwidth pipe, or high loads on servers might disrupt the user experience because connection timeouts occur. To prevent such timeouts, you can allow data trickling to occur. Depending on the trickling mode you enable, the ProxySG trickles bytes to the client (that is, releases them at a very slow rate) either at the beginning of the scan or near the very end.

The ProxySG begins serving server content without waiting for the ICAP scan result. However, to maintain security, the full object is not delivered until the results of the content scan are complete (and the object is determined to not be infected).

Note: Data trickling is supported for HTTP/HTTPS connections but not for FTP connections.

Trickling Data From the Start
In trickle from start mode, the ProxySG buffers a small amount of the beginning of the response body. As the ProxyAV continues to scan the response, the ProxySG allows one byte per second to the client.

After the ProxyAV completes its scan:

• If the object is deemed to be clean (no response modification is required), the ProxySG sends the rest of the object bytes to the client at the best speed allowed by the connection.
• If the object is deemed to be malicious, the ProxySG terminates the connection and the remainder of the response object is not sent to the client.

This method is the more secure option because the client receives only a small amount of data pending the outcome of the virus scan. However, the drawback is that users might become impatient, especially if they notice the browser display of bytes received. They might assume the connection is poor or the server is busy, close the client, and restart a connection.

Trickling Data at the End
In trickle at end mode, the ProxySG sends the response to the client at the best speed allowed by the connection, except for the last 16 KB of data. As the ProxyAV performs the content scan, the ProxySG allows one byte per second to the client.

After the ProxyAV completes its scan:

• If the object is deemed to be clean (no response modification is required), the ProxySG sends the rest of the object bytes to the client at the best speed allowed by the connection.
• If the object is deemed to be malicious, the ProxySG terminates the connection and the remainder of the response object is not sent to the client.

Blue Coat recommends this method for media content, such as Flash objects. This method is more user-friendly than trickle from start because users tend to be more patient when they notice that 99 percent of the object is downloaded. Therefore, they are less likely to perform a connection restart. However, network administrators might perceive this method as the less secure one, because the majority of the object is delivered before the results of the ICAP scan are known.
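To make the security trade-off between the two trickle modes concrete, the sketch below estimates how many bytes reach the client before the scan verdict is known. It is a deliberately simplified model: it uses the 1 byte per second rate and the 16 KB hold-back quoted above, ignores the small initial buffer in trickle from start mode, and assumes the scan finishes before the withheld bytes would run out. The function and mode names are hypothetical.

```python
TRICKLE_BYTES_PER_SEC = 1      # trickle rate while the scan runs (from the text)
HELD_BACK_BYTES = 16 * 1024    # trickle at end withholds the last 16 KB (from the text)


def bytes_before_verdict(mode: str, object_size: int, scan_secs: int) -> int:
    """Estimate bytes delivered to the client before the scan result arrives."""
    trickled = TRICKLE_BYTES_PER_SEC * scan_secs
    if mode == "trickle_from_start":
        # Only the trickled bytes go out while the scan is pending.
        return min(object_size, trickled)
    if mode == "trickle_at_end":
        # Everything except the held-back tail is sent at full speed first.
        sent_fast = max(object_size - HELD_BACK_BYTES, 0)
        return min(object_size, sent_fast + trickled)
    raise ValueError(f"unknown trickle mode: {mode}")
```

For a 10 MB object and a 30-second scan, this model gives 30 bytes exposed with trickle from start versus all but roughly 16 KB with trickle at end, which is why the former is considered the more secure option.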

Deciding Between Data Trickling and Patience Pages

Depending upon the type of traffic, the ProxySG configuration options plus policy allow you to provide different ICAP feedback actions:

• For interactive traffic, that is, a request involving a Web browser, you can use either data trickling or a patience page.
• For non-interactive traffic, that is, a request that originates from a non-browser-based application such as an automatic software download or update client, patience pages are incompatible; you can choose to use data trickling or to provide no feedback to the user.

Based on whether your enterprise places a higher value on security or availability, the ProxySG allows you to choose between patience pages and data trickling.

Deferred Scanning (BlueCoat Recommended over No-Scan – Chpt 9-pg76 of ProxySG/AV Integration Guide)

The deferred scanning feature helps to avoid network outages due to infinite streaming. Infinite streams are connections such as webcams, streaming radio, or Flash media—traffic over an HTTP connection—that conceivably have no end. Characteristics of infinite streams may include no content length, slow data rate and long response time. Because the object cannot be fully downloaded, the ICAP content scan cannot start; however, the connection between the ProxySG and the ProxyAV remains open, causing wastage of finite connection resources.

With deferred scanning, ICAP requests that are unnecessarily holding up ICAP connections are detected and deferred until the full object has been received. When the number of ICAP resources in use has reached a certain threshold, the ProxySG starts deferring scanning of the oldest outstanding ICAP requests. Once the defer threshold has been reached, for every new ICAP request, the ProxySG defers the oldest ICAP connection that has not yet received a full object. When an ICAP connection is deferred, the connection to the ProxyAV is closed. The application response continues to be received; when the download is complete, the ICAP request is restarted. The new ICAP request may still be queued if there are no available ICAP connections. Once a request is deferred, the ProxySG waits to receive the full object before restarting the request. If there is a queue when a deferred action has received a complete object, that action is queued behind other deferred actions that have finished. However, it will be queued before other new requests.
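The deferral rule can be sketched as a toy model in Python. This is only an illustration of the behavior described above, not how SGOS implements it. The class and method names are hypothetical, the 80% default comes from the configuration recommended later in this document, and the ordering of restarted requests ahead of new ones is simplified to a single list.

```python
from collections import OrderedDict, deque


class IcapPool:
    """Toy model of deferred scanning on a fixed pool of ICAP connections."""

    def __init__(self, max_connections: int, defer_percent: int = 80):
        self.max_connections = max_connections
        self.defer_at = max_connections * defer_percent // 100
        self.active = OrderedDict()  # request id -> full object received? (oldest first)
        self.deferred = set()        # ICAP connection closed, download still running
        self.waiting = deque()       # new requests waiting for a free connection
        self.restarted = []          # deferred requests restarted after download

    def new_request(self, req_id: str, full_object: bool = False) -> None:
        # At or above the threshold, every new request defers the oldest
        # connection that has not yet received its full object.
        if len(self.active) >= self.defer_at:
            self._defer_oldest_partial()
        if len(self.active) < self.max_connections:
            self.active[req_id] = full_object
        else:
            self.waiting.append(req_id)

    def _defer_oldest_partial(self) -> None:
        oldest = next((r for r, full in self.active.items() if not full), None)
        if oldest is not None:
            del self.active[oldest]   # connection to the ProxyAV is closed
            self.deferred.add(oldest)

    def download_finished(self, req_id: str) -> None:
        if req_id in self.active:
            self.active[req_id] = True
        elif req_id in self.deferred:
            self.deferred.remove(req_id)
            self.restarted.append(req_id)  # ICAP request restarts with the full object
```

In this model, a pool of 10 connections with eight open streams never climbs past eight connections in use as further streams arrive, because each newcomer pushes the oldest unfinished stream into the deferred set instead of taking one of the remaining slots.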

Conclusion

Deferred Scanning vs. No-Scan vs. Scan Until Error: I feel that “Deferred Scanning” is a better option than the no-scan policy, because with deferred scanning the current security policy, which dictates how much data gets sent for scanning, remains unaltered. The added benefit is that an ICAP connection is not tied up while waiting for a never-ending file to download. “No-Scan” runs the risk of not scanning a stream that may harbor malicious content, and we would never even know it. “Scan Until Error” forces the scan to run until an error is raised, which is secure, but it holds the ICAP connection open until that error occurs.

Patience Page vs. Trickle From Start vs. Trickle At End: I feel that both “Trickle” methods are good options. At present, no feedback is provided to the user (patience page) or to the application/user agent (trickle), so enabling either one would improve the user experience. “Trickle From Start” is a good option because the current security policy on what to scan remains unaltered, and the app/user agent experience improves because it actually receives data. Even at 1 byte per second, the app/user agent sees data and will not time out while the file to be scanned is in transit.

“Patience Pages” would be fine for interactive traffic originating from a web browser, with “Trickle” options for non-browser-based applications, giving a best-of-both-worlds approach, but I feel we need to start small. We have been fine without patience pages thus far, and we should not complicate things by making multiple changes at once unless there is a solid, evidence-based reason to do so.

Proxy Config Instructions:
1. Enable Deferred Scanning, with threshold at default of 80%
2. Enable ICAP feedback of “Trickle Data from Start” for both interactive and non-interactive traffic.

References:
BlueCoat Proxy SG/AV Integration Guide
BlueCoat CMG for SGOS 5.5.x