Versa Analytics Scaling Recommendations
For supported software information, see Supported Software Information.
This article describes the following logging and storage recommendations for handling performance and scaling of Versa Analytics:
- Analytics cluster node scaling recommendations.
- Firewall and SD-WAN log collection recommendations: Versa Operating System™ (VOS™) device settings for collecting statistics related to firewall and SD-WAN operations.
- Flow log collection recommendations: VOS device settings for flow log collection.
- SLA metrics collection recommendations: VOS device and Analytics cluster settings for SLA data collection.
- Flow log storage recommendations: Analytics node settings for direct-to-archive flow log storage and for throttling or disabling flow logs.
- Analytics datastore limits: Analytics cluster settings for search-engine datastores and NoSQL databases.
Analytics Cluster Node Recommendations
Versa Analytics clusters are a collection of interconnected nodes, which include analytics, log-forwarder, and search nodes. Each node type performs specific tasks. For a description of each node type, see Analytics Cluster Node Types.
The following table lists the recommended number of nodes of each type to include in an Analytics cluster, based on the number of CPEs.
Node Type | 2500 CPEs | 1000 CPEs | 500 CPEs |
---|---|---|---|
Analytics | 4 | 2 | 2 |
Log forwarder | 8 | 4 | 2 |
Search | 2 | 2 | 2 |
The following are the recommended operating parameters for Analytics clusters. For information about viewing operating parameters, see Monitor Analytics Clusters.
Parameter | 2500 CPEs | 1000 CPEs | 500 CPEs | Comments |
---|---|---|---|---|
Log data ingestion rate into an instance of the Analytics database or search engine | | | | This is the average rate of ingestion into a database or search engine, assuming Versa-recommended resources are used per instance. A maximum of 4 analytics-type nodes is recommended per cluster. For individual cluster nodes, the recommended maximum is 4,000 logs per second per analytics-type node and 4,000 logs per second per search-type node. |
Log data streaming | 80,000 logs per second | 40,000 logs per second | 20,000 logs per second | Logs are streamed to third-party collectors, with or without ingestion into the Analytics platform. Streaming is performed by the log collector exporter on the Analytics node, typically on log-forwarder type nodes. For individual cluster nodes, the recommended maximum is 10,000 logs per second per Analytics node. |
Maximum storage per analytics-type or search-type node | 2 TB | 2 TB | 2 TB | |
Maximum archive storage per log forwarder node | 1 TB | 512 GB | 256 GB | You can expand disk storage for Analytics nodes running on virtual machines. See Expand Disk Storage for Analytics Nodes. |
Maximum number of search logs | 200 million | 200 million | 200 million | For additional logging and retention, you can export logs to the cloud-hosted Versa Advanced Logging Service (ALS). See Configure the Versa Advanced Logging Service. |
Maximum number of CPE connections | 4,096 | 2,048 | 1,024 | Connections are typically received by log-forwarder type nodes. For individual cluster nodes, the recommendation is a maximum of 512 connections. |
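As a rough sanity check of a planned cluster against the per-node figures in the tables above, the following Python sketch multiplies the log-forwarder node count by the per-node maximums. It assumes, as the Comments column suggests, that streaming and CPE connections are handled by log-forwarder nodes and that the 10,000 logs-per-second streaming maximum applies per log-forwarder node (which matches the 8, 4, and 2 node counts for 80,000, 40,000, and 20,000 logs per second). `ClusterPlan` and `check_plan` are hypothetical names, not part of any Versa tool.

```python
from dataclasses import dataclass

# Per-node maximums taken from the operating-parameters table above.
MAX_STREAM_LOGS_PER_SEC_PER_NODE = 10_000
MAX_CPE_CONNECTIONS_PER_NODE = 512


@dataclass
class ClusterPlan:
    log_forwarder_nodes: int
    expected_stream_logs_per_sec: int
    expected_cpe_connections: int


def check_plan(plan: ClusterPlan) -> list[str]:
    """Return warnings for any per-node maximum the plan would exceed (empty list = fits)."""
    warnings = []
    # Assumption: streaming and CPE connections are spread across log-forwarder nodes.
    stream_capacity = plan.log_forwarder_nodes * MAX_STREAM_LOGS_PER_SEC_PER_NODE
    connection_capacity = plan.log_forwarder_nodes * MAX_CPE_CONNECTIONS_PER_NODE
    if plan.expected_stream_logs_per_sec > stream_capacity:
        warnings.append(
            f"streaming: {plan.expected_stream_logs_per_sec} > {stream_capacity} logs/s"
        )
    if plan.expected_cpe_connections > connection_capacity:
        warnings.append(
            f"connections: {plan.expected_cpe_connections} > {connection_capacity}"
        )
    return warnings
```

For example, `check_plan(ClusterPlan(8, 80_000, 4_096))` returns an empty list, matching the 2500-CPE column, while a 4-node plan for the same load reports both limits as exceeded.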
Firewall and SD-WAN Log Collection Recommendations
When you enable log export functionality (LEF) statistics logging for firewall and SD-WAN services on VOS devices, the VOS devices monitor firewall and SD-WAN traffic activity and export various categories of usage statistics to Analytics nodes.
For the firewall service, the usage monitoring statistics are aggregated for each unique source and destination IP address for each tenant. For the SD-WAN service, the usage monitoring statistics are aggregated for each unique combination of tenant, application, source IP address, and access circuit.
By default, a VOS device exports all aggregated usage monitoring records up to a maximum of 16,384 (16K) records in a 5-minute interval.
For VOS devices that handle a large amount of active traffic, the number of unique source and destination IP addresses, and the number of applications that they use, can be very high. In these cases, exporting all the statistics logs every 5 minutes can cause performance issues, such as overutilization of WAN links and excessive consumption of storage, memory, and CPU on the Analytics nodes. These issues can lead to the loss of critical logs during bursts of log traffic. For these VOS devices, you can reduce the number of exported firewall and SD-WAN statistics log records by exporting logs only for the busiest traffic flows. How busy a traffic flow is depends on a combination of its traffic volume and the number of flows using unique source and destination IP addresses.
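To see why the default export cap can strain Analytics nodes, the sketch below converts the 16,384-records-per-5-minutes figure into an average per-second rate across a fleet of devices. It treats the cap as a single per-device limit, as stated above, and gives an averaged upper bound only; exports arrive in bursts at each interval, so peak rates are higher. The function name is illustrative.

```python
# Default per-device cap and fixed export interval, as stated in the text.
DEFAULT_RECORDS_PER_INTERVAL = 16_384
EXPORT_INTERVAL_SECONDS = 5 * 60


def worst_case_records_per_sec(devices: int, cap: int = DEFAULT_RECORDS_PER_INTERVAL) -> float:
    """Average record rate if every device exports its full cap in every interval.

    This is an averaged upper bound; real exports arrive in bursts, so peaks are higher.
    """
    return devices * cap / EXPORT_INTERVAL_SECONDS
```

At the default cap, 2,500 devices could produce on the order of 136,000 records per second on average; a hypothetical reduced cap of 1,024 records per interval brings the same fleet to roughly 8,500.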
To reduce the number of statistics log records that are exported, you configure the maximum number of log records to export per category or report type. For the busiest VOS devices, such as hubs and back offices, it is recommended that you decrease the number of logs exported in each 5-minute interval. The result of this configuration change is that, for each report type, the VOS device exports statistics for records that have the highest traffic volume and session activity. These records are sometimes referred to as the top records.
Note: The log record export interval is fixed at 5 minutes, and you cannot modify it.
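The top-records behavior described above amounts to a per-report-type top-N selection. The Python sketch below illustrates that idea only: the exact ranking VOS uses is not documented here, so ordering by traffic volume with session count as a tie-breaker is an assumption, and `UsageRecord`, `top_records`, and the report-type labels are hypothetical.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    report_type: str   # hypothetical label, e.g. "firewall" or "sdwan"
    key: tuple         # aggregation key, e.g. (tenant, source IP, destination IP)
    bytes_total: int   # traffic volume in the 5-minute interval
    sessions: int      # number of sessions in the interval


def top_records(records, max_per_report_type: int) -> dict[str, list[UsageRecord]]:
    """Keep at most max_per_report_type records per report type, busiest first."""
    by_type = defaultdict(list)
    for rec in records:
        by_type[rec.report_type].append(rec)
    # Assumed ranking: traffic volume first, session count as the tie-breaker.
    return {
        report_type: heapq.nlargest(
            max_per_report_type, recs, key=lambda r: (r.bytes_total, r.sessions)
        )
        for report_type, recs in by_type.items()
    }
```

`heapq.nlargest` avoids sorting every record when only a small top set is kept, which mirrors the intent of exporting a bounded number of records per report type.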
For information about configuring SD-WAN and firewall LEF parameters for VOS devices, see Configure Firewall and SD-WAN Usage Monitoring Controls.
Flow Log Collection Recommendations
Flow logging can be enabled for various Versa services, such as the security firewall, UTM, and traffic monitoring, for forensics, troubleshooting, and compliance purposes. These logs can be streamed from VOS devices to the Versa Analytics platform, directly to third-party security information and event management (SIEM) or log analytics tools, or both. The log forwarders in the Versa Analytics platform can also be configured to selectively stream the logs to third-party SIEM or log analytics tools and to archive the logs or ingest them into the Versa search engine.
The Versa Analytics search engine provides a limited ability to analyze flow logs in real time. It indexes the log fields so that it can provide fast and accurate retrieval and filtering of logs across various services. For more information, see Flow Logs. Search engines require high maintenance and are very resource intensive, needing dedicated compute and high-performance storage to scale and perform well. The volume of data that can be stored while maintaining optimal real-time query performance is approximately 100 million logs per instance in an on-premises cluster. The recommended maximum number of search instances for on-premises clusters is two, which allows around 200 million logs to be stored in the cluster.
If logging is enabled for all traffic of various services on a VOS device, the volume and rate of flow logs can be large and unpredictable. For example, a typical SD-WAN branch can generate more than 2 million access logs per day. For large deployments, ingesting all these logs into the on-premises Versa Analytics cluster may not be practical. To make optimal use of the search functionality, we recommend ingesting flow logs only for troubleshooting purposes or for critical traffic, such as threat logs or deny logs. These logs are kept for a shorter period of time. A default retention period is set for each log type, and you can change it. The following table describes the configuration of and recommendations for limiting flow logging for different flow log types.
For information about configuring logging for specific features and services, see Apply Log Export Functionality.
For customers who require high-volume logging with longer retention, the Versa-managed, cloud-hosted Advanced Logging Service (ALS) can be used as an extension to on-premises clusters. See Configure the Versa Advanced Logging Service.
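To put the search capacity figures above in perspective, the following sketch divides the approximate on-premises capacity (two search instances of about 100 million logs each) by a daily ingest rate. The 2-million-logs-per-day figure for a typical branch comes from the text; the branch counts and the 7-day retention used in the usage note are illustrative assumptions.

```python
# Approximate figures from the text for an on-premises cluster.
LOGS_PER_SEARCH_INSTANCE = 100_000_000
MAX_SEARCH_INSTANCES = 2


def days_until_full(branches: int, logs_per_branch_per_day: int,
                    instances: int = MAX_SEARCH_INSTANCES) -> float:
    """Days of ingestion before the search datastore reaches its recommended size."""
    capacity = instances * LOGS_PER_SEARCH_INSTANCE
    return capacity / (branches * logs_per_branch_per_day)


def max_logs_per_branch_per_day(branches: int, retention_days: int,
                                instances: int = MAX_SEARCH_INSTANCES) -> int:
    """Largest per-branch daily ingest that still fits retention_days of logs."""
    capacity = instances * LOGS_PER_SEARCH_INSTANCE
    return capacity // (branches * retention_days)
```

With 100 branches each sending 2 million logs per day, `days_until_full(100, 2_000_000)` is 1.0, so the cluster reaches its recommended size in about a day. To retain 7 days of logs from those 100 branches, each branch could send at most about 285,000 logs per day into the search engine, which is why ingesting only troubleshooting or critical logs is recommended.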
Flow Log Type and Purpose | Configuration | Recommendation |
---|---|---|
Antivirus logs | | |
CGNAT logs | | |
DoS threat logs | | |
Firewall access and deny logs | | |
IDP logs | | |
IP filtering logs | | |
PCAP (packet capture) logs | | |
Traffic monitoring flow logs | | |
URL filtering logs | | |
SLA Metrics Collection Recommendations
In an SD-WAN network, SLA monitoring is performed on every active path (a path is defined by a combination of tenant, local site identifier, local access circuit, remote site identifier, and remote access circuit) and every configured forwarding class. The number of active paths depends on a number of factors, including the topology (for example, hub and spoke or full mesh), the number of local and remote access circuits, the type of transport domains, the traffic activity between two endpoints, and whether adaptive monitoring is enabled.
By default, each endpoint logs the SLA metrics calculated for each active path and forwarding class to the Analytics node every 5 minutes. The number of SLA logs generated can be very large and can consume large amounts of CPU and disk resources on the Analytics node.
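The following sketch estimates daily SLA log volume from the 5-minute default interval: each active path and forwarding class produces 288 logs per day. The full-mesh path count is an illustrative assumption (one tenant, every local circuit forming a path with every remote circuit, and each site reporting the paths it originates); real path counts depend on the factors listed above, such as transport domains and adaptive monitoring. Function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
SLA_LOG_INTERVAL_MINUTES = 5  # default logging interval from the text
LOGS_PER_DAY_PER_PATH_CLASS = MINUTES_PER_DAY // SLA_LOG_INTERVAL_MINUTES  # 288


def full_mesh_paths(sites: int, circuits_per_site: int) -> int:
    """Illustrative upper bound on reported paths in a single-tenant full mesh.

    Assumes every local circuit forms a path with every remote circuit and that
    each site reports the paths it originates.
    """
    return sites * (sites - 1) * circuits_per_site ** 2


def sla_logs_per_day(active_paths: int, forwarding_classes: int) -> int:
    """One SLA log per active path and forwarding class every interval."""
    return active_paths * forwarding_classes * LOGS_PER_DAY_PER_PATH_CLASS
```

For example, a 50-site full mesh with two circuits per site and four monitored forwarding classes gives `sla_logs_per_day(full_mesh_paths(50, 2), 4)`, or 11,289,600 logs per day.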
To reduce the SLA logging load on Analytics nodes, consider the following:
- In a hub-and-spoke topology, hubs report the same SLA metrics information as spokes. For example, if a hub-and-spoke topology has 100 spoke sites, each with one link, there are 100 SLA logs from the hub and one SLA log from each of the 100 spokes. To reduce the number of logs generated, disable logging from the hub by setting its logging interval in the SLA metrics configuration to 0. For more information, see Configure Continuous SLA Monitoring in Configure SLA Monitoring for SD-WAN Traffic Steering.
- Each Controller node reports SLA metrics about all the paths to and from branch devices. To reduce the number of logs from the Controller node, set the SLA logging interval on the Controller node to 0. For more information, see Configure Continuous SLA Monitoring in Configure SLA Monitoring for SD-WAN Traffic Steering.
- In a multitenant branch, enable SLA logging towards the Controller node or shared hub for only one tenant instead of for all tenants.
- In branch networks where SLA monitoring is enabled on multiple forwarding classes, enable SLA logging only on important or critical forwarding classes.
- To aggregate SLA metrics that change little over time, enable and configure SLA monitor log optimization on VOS devices. For more information, see SLA Monitor Log Optimization in Configure IP SLA Monitor Objects.
- To reduce the storage required for SLA logs on an Analytics cluster, set the global configuration to ignore the forwarding class. This configuration aggregates multiple forwarding classes for a path into one entry. For more information, see Ignore SLA Forwarding Class.
- To reduce storage requirements on Analytics clusters, reduce how long you retain daily and hourly data for SLA status and SLA violation features. When reducing the retention time, consider how often you might need to access SLA logging data to perform diagnostic testing. For more information, see Configure Retention Times for NoSQL Databases.
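The effect of two of the measures above, muting SLA logging on a hub and ignoring the forwarding class, can be illustrated by counting distinct stored entries per interval. This sketch counts keys only and does not model how the Analytics cluster combines metrics across forwarding classes; `SlaEntry`, `count_rows`, `hub_and_spoke_entries`, and the site, circuit, and forwarding-class names are hypothetical.

```python
from typing import NamedTuple


class SlaEntry(NamedTuple):
    tenant: str
    local_site: str
    local_circuit: str
    remote_site: str
    remote_circuit: str
    forwarding_class: str


def count_rows(entries, ignore_forwarding_class: bool,
               muted_sites: frozenset = frozenset()) -> int:
    """Count distinct stored entries in one interval after two of the reductions above."""
    keys = set()
    for e in entries:
        if e.local_site in muted_sites:  # site whose SLA logging interval is set to 0
            continue
        path = tuple(e[:5])  # tenant, local site/circuit, remote site/circuit
        keys.add(path if ignore_forwarding_class else path + (e.forwarding_class,))
    return len(keys)


def hub_and_spoke_entries(spokes: int, forwarding_classes: list[str]) -> list[SlaEntry]:
    """Single-tenant hub with single-link spokes; both ends report each path."""
    entries = []
    for i in range(1, spokes + 1):
        spoke = f"spoke{i}"
        for fc in forwarding_classes:
            entries.append(SlaEntry("tenant1", spoke, "wan1", "hub", "wan1", fc))
            entries.append(SlaEntry("tenant1", "hub", "wan1", spoke, "wan1", fc))
    return entries
```

For a hub with 100 single-link spokes and four forwarding classes, the 800 entries per interval drop to 400 when the hub is muted, to 200 when forwarding classes are ignored, and to 100 with both.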
Ignore SLA Forwarding Class
To configure an Analytics cluster to ignore the SLA forwarding class:
- In Director view, select the Analytics tab in the top menu bar.
- In the horizontal menu bar, select a connector to any node in the Analytics cluster to be configured.
- Select Administration > Configurations > Settings in the left menu bar, and then select the Data Configurations tab.
- To ignore the SLA forwarding class for all tenants, select Global Configurations in the Scope field. To ignore the SLA forwarding class for a specific tenant, select the tenant name in the Scope field.
- Click Advanced Settings.
- Toggle the Ignore SLA Forward Class field to On.
Flow Log Storage Recommendations
When you enable flow logging on VOS devices, the Analytics cluster receiving the logs stores them in datastores by default. If there are many flow logs, the cluster may not be able to handle the load, which can result in delayed processing of the data and increased disk utilization. To avoid such backlogs, you can configure the local collector on the Analytics node receiving the logs to bypass the datastores and store flow logs directly to an archive. You can also throttle or disable incoming flow logs in the local collector configuration.
For a description of Analytics log collector nodes and local collectors, see Versa Analytics Configuration Concepts. For more information about flow logs, see Flow Logs.
Configure Flow Log Storage Directly to an Archive
To store flow logs directly to an archive, you can configure a local collector to store incoming logs in a non-standard directory, one other than /var/tmp/log. When logs are stored in a non-standard directory, the Versa Analytics driver does not process them into the datastores within the cluster, allowing you to archive the logs without analysis. To archive the logs, you must manually configure a cron job.
For information about configuring a local collector, see Modify or Add a Local Collector in Configure Log Collectors and Log Exporter Rules. For information about creating a non-default archive cron job and managing log archives, see Manage Versa Analytics Log Archives.
Throttle or Disable Flow Log Storage
You can slow down or stop incoming flow logs at a local collector, which is the point where an Analytics log collector node accepts incoming logs. To do this, you select the throttle or disable setting in the local collector configuration. For more information, see Modify or Add a Local Collector in Configure Log Collectors and Log Exporter Rules.
Configure Analytics Datastore Limits
In many deployments, the number of Analytics nodes and the amount of storage and memory per node are limited, and as a result the nodes can run out of disk space or queries can time out because of a lack of processing power. To optimize data retrieval, Analytics clusters limit the maximum number of rows retained (approximately 100 million per node). Other factors also affect storage and query performance, such as machine type (bare metal or virtual machine), disk type (SSD or HDD), memory, number of cores, and hyperthreading.
You can influence the size of the datastores in an Analytics cluster by setting limits on retention times and daily log volume. In Analytics clusters, you can configure retention times by log type for search engine datastores and by feature type for NoSQL databases. However, increasing the retention limits can have an adverse impact on the database load, which can lead to issues such as disk exhaustion or suboptimal performance. Contact Versa Technical Support before making any modifications to these limits.
The Director GUI can access multiple Analytics clusters. When configuring datastore limits from the Director GUI, ensure that you select a connector to the correct cluster. For more information, see Versa Director Nodes and Analytics Clusters in Versa Analytics Configuration Concepts.
Configure Search Engine Log Storage Limits
To limit the number of logs that are stored in search-engine datastores on a cluster, you can set a daily limit on the number of logs stored. You set the limit globally, and within the global daily limit you can configure limits for individual tenants. If the global daily limit is reached, the cluster generates an alarm. Critical logs such as alarms and threats are not affected by this configuration. The daily global limit is determined based on the cluster size, the amount of storage available per node, and the log retention policy.
To set daily log storage limits:
- In Director view, select the Analytics tab in the top menu bar.
- Select a connector to the cluster in the horizontal menu bar.
- Select Administration > Configurations > Settings in the left menu bar.
- Select the Data Configurations tab, then click Search Logs Daily Limit Configurations.
- In the Global Daily limit field, enter a value for the maximum number of global daily logs, and enter daily limits for each tenant. Values are in thousands of logs.
- Click Save.
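The global daily limit depends on cluster size, storage per node, and the log retention policy. One way to reason about a starting value: if logs are kept for R days, roughly R days of ingest sit in the datastore at once, so the daily limit should not exceed the usable capacity divided by R. The sketch below applies that reasoning and converts the result to thousands of logs for the Global Daily limit field. The 100-million-logs-per-node figure comes from this article; the 80 percent headroom factor and the function name are assumptions. Confirm any value with Versa Technical Support, as recommended above.

```python
def suggested_global_daily_limit_thousands(search_nodes: int, retention_days: int,
                                           logs_per_node: int = 100_000_000,
                                           headroom: float = 0.8) -> int:
    """Daily limit, in thousands of logs, such that retention_days of ingest fits.

    headroom (an assumption) leaves part of the capacity unused as a safety margin.
    """
    usable_capacity = int(search_nodes * logs_per_node * headroom)
    return usable_capacity // (retention_days * 1000)
```

For two search nodes and 7-day retention, this suggests a value of about 22,857 (that is, roughly 22.9 million logs per day) with the default headroom.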
To check whether the limit has been exceeded:
- In Director view, select the Analytics tab in the top menu bar.
- Select a connector to the cluster in the horizontal menu bar.
- Select Administration > System Status > Alarms in the left menu bar. The Alarms table is displayed, including any alarms for exceeding log limits.
Configure Retention Times for Search Engine Datastores
You can set the retention time for search engine datastores on Analytics clusters. To configure the default retention time for tenants, you configure the global retention time. If desired, you can change the default retention time for individual tenants.
You configure retention times by log type, such as ADC logs or alarm logs. Retention time units are in days, after which the Analytics cluster automatically removes the logs. For example, if you retain ADC logs for three days, the Analytics cluster automatically removes ADC logs older than three days from the datastore.
To change search engine datastore retention times:
- In Director view, select the Analytics tab in the top menu bar.
- In the horizontal menu bar, select a connector to any node in the Analytics cluster.
- Select Administration > Configurations > Settings in the left menu bar, and then select the Data Configurations tab.
- Click the Scope drop-down menu to display a list of tenants and the Global Configurations option. By default, tenants use the global configuration settings. An asterisk indicates tenants whose data retention settings are activated. Settings for these tenants override the global configuration settings.
- To activate or deactivate the retention settings for a tenant, click the tenant name. Then, select the check box next to Search Data Configurations to activate the settings, or clear it to deactivate them. The asterisk in front of the tenant name in the Scope drop-down menu changes to reflect the activation status.
- Select Search Data Configurations.
- Enter a retention time, in days, for each log type. For information about which features and services relate to items listed on this screen, see .
- Click Save.
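The override rule described in this procedure (activated tenant settings replace the global settings; otherwise the global values apply) and the age-based removal described earlier can be sketched as follows. The dictionary layout and function names are hypothetical and only model the behavior described in this section.

```python
from datetime import datetime, timedelta


def effective_retention_days(log_type: str, tenant: str,
                             global_cfg: dict[str, int],
                             tenant_cfg: dict[str, dict]) -> int:
    """Activated tenant settings override the global settings; otherwise use global."""
    override = tenant_cfg.get(tenant)
    if override is not None and override["active"]:
        return override["retention_days"][log_type]
    return global_cfg[log_type]


def is_expired(log_time: datetime, retention_days: int, now: datetime) -> bool:
    """A log is removed once it is older than the retention time."""
    return now - log_time > timedelta(days=retention_days)
```

For example, with a global ADC log retention of 3 days and a tenant whose activated settings specify 7 days, that tenant's ADC logs are kept for 7 days while other tenants' ADC logs are removed after 3.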
Configure Retention Times for NoSQL Databases
NoSQL databases in an Analytics cluster store performance and fault monitoring data for historical reporting. This data is stored at two granularities, daily and hourly. By default, daily data is retained for three months and hourly data is retained for 30 days. Depending on available storage and the cluster size, you can keep daily data longer. Hourly data can be aggregated every 5, 15, 30, or 60 minutes; for better performance, choose a resolution of 15, 30, or 60 minutes.
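To see how the resolution affects storage, the sketch below counts the aggregation buckets kept per monitored series for hourly data, assuming one stored row per series per bucket (an illustrative assumption, not a documented storage layout). The allowed resolutions and the 30-day default hourly retention come from the text; the function name is hypothetical.

```python
ALLOWED_RESOLUTIONS_MINUTES = (5, 15, 30, 60)


def hourly_buckets_per_series(resolution_minutes: int, hourly_ttl_days: int = 30) -> int:
    """Aggregation buckets retained for one series of hourly data."""
    if resolution_minutes not in ALLOWED_RESOLUTIONS_MINUTES:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS_MINUTES}")
    buckets_per_day = 24 * 60 // resolution_minutes
    return buckets_per_day * hourly_ttl_days
```

With the default 30-day hourly retention, a 5-minute resolution keeps 8,640 buckets per series, compared with 2,880 at 15 minutes and 720 at 60 minutes.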
To configure the daily and hourly retention times and the hourly resolution for Analytics-type nodes:
- From Director view, click Analytics.
- In the horizontal menu bar, select a connector to any node in the Analytics cluster.
- Select Administration > Configurations > Settings.
- Select the Data Configurations tab.
- Click the Scope drop-down menu to display a list of tenants and the Global Configurations option. By default, tenants use the global configuration settings. An asterisk indicates tenants whose data retention settings are activated. Settings for these tenants override the global configuration settings.
- To activate or deactivate the retention settings for a tenant, click the tenant name. Then, select the check box next to Search Data Configurations to activate the settings, or clear it to deactivate them. The asterisk in front of the tenant name in the Scope drop-down menu changes to reflect the activation status.
- To change retention settings, click Analytics Data Configurations. Enter information for the following fields.
Field | Description |
---|---|
Daily TTL | Enter the storage retention time for daily data, in days. |
Hourly TTL | Enter the storage retention time for hourly data, in days. |
Resolution | Enter the resolution for aggregating hourly data. For performance and storage efficiency, use a value of 15 minutes or higher. For more precision, use a value of 5 minutes. |
Active | Toggle to activate (on) or deactivate (off) automatic deletion of expired data for the given feature in the database. |
- Click Save to save the settings.
- Click Analytics Data Configurations to collapse this section of the screen.
Supported Software Information
Releases 20.2 and later support all content described in this article.
Additional Information
Apply Log Export Functionality
Configure CGNAT
Configure DoS Protection
Configure Firewall and SD-WAN Usage Monitoring Controls
Configure Intrusion Detection and Prevention
Configure IP Filtering
Configure IP SLA Monitor Objects
Configure Log Collectors and Log Exporter Rules
Configure Log Export Functionality
Configure NGFW
Configure SLA Monitoring for SD-WAN Traffic Steering
Configure Stateful Firewall
Configure the Versa Advanced Logging Service
Configure URL Filtering
Expand Disk Storage for Analytics Nodes
Flow Logs
Manage Versa Analytics Log Archives
Monitor Analytics Clusters
Troubleshoot Analytics Disk Storage Issues
Versa Analytics Configuration Concepts
Versa Analytics Log Collector Log Type
View Analytics Dashboards and Log Screens