Versa Analytics Scaling Recommendations

Analytics Cluster Node Recommendations

Versa Analytics clusters are a collection of interconnected nodes, which include analytics, log-forwarder, and search nodes. Each node type performs specific tasks. For a description of each node type, see Analytics Cluster Node Types.

The following table lists the recommended number of nodes of each type to include in an Analytics cluster, based on the number of CPEs.

Node Type	2500 CPEs	1000 CPEs	500 CPEs
Analytics	4	2	2
Log forwarder	8	4	2
Search	2	2	2

The following are the recommended operating parameters for Analytics clusters. For information about viewing operating parameters, see Monitor Analytics Clusters.

Parameter	2500 CPEs	1000 CPEs	500 CPEs	Comments
Log data ingestion rate into an instance of the Analytics database or search engine	8,000 logs per second for Analytics data; 4,000 logs per second for search logs	4,000 logs per second for Analytics data; 4,000 logs per second for search logs	4,000 logs per second for Analytics data; 4,000 logs per second for search logs	This is the average rate of ingestion into a database or search engine, assuming that Versa-recommended resources are used per instance. A maximum of 4 analytics-type nodes is recommended per cluster. For individual cluster nodes, the recommended maximum is 4,000 logs per second per analytics-type node and 4,000 logs per second per search-type node.
Log data streaming	80,000 logs per second	40,000 logs per second	20,000 logs per second	Streaming logs to third-party collectors, with or without ingesting them into the Analytics platform. Logs are streamed by the log collector exporter on the Analytics node. This is typically done on log-forwarder type nodes. For individual cluster nodes, the recommended maximum is 10,000 logs per second per Analytics node.
Maximum storage per analytics-type or search-type node	2 TB	2 TB	2 TB
Maximum archive storage per log forwarder node	1 TB	512 GB	256 GB	You can expand disk storage for Analytics nodes running on virtual machines. See Expand Disk Storage for Analytics Nodes.
Maximum number of search logs	200 million	200 million	200 million	For additional logging and retention, you can export logs to the cloud-hosted ALS service. See Configure the Versa Advanced Logging Service.
Maximum number of CPE connections	4,096	2,048	1,024	Connections are typically received by log-forwarder type nodes. For individual cluster nodes, the recommendation is 512 connections maximum.
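
The per-node maximums in the Comments column can be used to sanity-check a cluster size. The following Python sketch is illustrative only: the function names are hypothetical, and rounding a CPE count up to the next documented tier is an assumption, not a Versa rule. It copies the node counts from the first table and estimates how many log-forwarder nodes a given connection count and streaming rate would need.

```python
import math

# Recommended node counts per deployment size, copied from the node table.
RECOMMENDED_NODES = {
    500: {"analytics": 2, "log_forwarder": 2, "search": 2},
    1000: {"analytics": 2, "log_forwarder": 4, "search": 2},
    2500: {"analytics": 4, "log_forwarder": 8, "search": 2},
}

# Per-node maximums quoted in the Comments column of the parameters table.
MAX_CONNECTIONS_PER_NODE = 512
MAX_STREAMING_LOGS_PER_SEC_PER_NODE = 10_000


def recommended_tier(cpe_count):
    """Return the smallest documented tier that covers cpe_count (assumption)."""
    for tier in sorted(RECOMMENDED_NODES):
        if cpe_count <= tier:
            return tier
    raise ValueError("CPE count exceeds the largest documented tier (2500)")


def log_forwarders_needed(cpe_connections, streaming_logs_per_sec):
    """Estimate log-forwarder nodes from the two per-node maximums."""
    by_connections = math.ceil(cpe_connections / MAX_CONNECTIONS_PER_NODE)
    by_streaming = math.ceil(
        streaming_logs_per_sec / MAX_STREAMING_LOGS_PER_SEC_PER_NODE
    )
    # The tighter of the two constraints decides the node count.
    return max(by_connections, by_streaming)
```

For example, log_forwarders_needed(4096, 80000) returns 8, which matches the log-forwarder count in the 2500-CPE column.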

Firewall and SD-WAN Log Collection Recommendations

When you enable log export functionality (LEF) statistics logging for firewall and SD-WAN services on VOS devices, the VOS devices monitor firewall and SD-WAN traffic activity and they export various categories of usage statistics to Analytics nodes.

For the firewall service, the usage monitoring statistics are aggregated for each unique source and destination IP address for each tenant. For the SD-WAN service, the usage monitoring statistics are aggregated for each unique combination of tenant, application, source IP address, and access circuit.

By default, a VOS device exports all aggregated usage monitoring records up to a maximum of 16,384 (16K) records in a 5-minute interval.

For VOS devices that handle a large amount of active traffic, the number of unique source and destination IP addresses and the number of applications that they use can be very high. In these cases, exporting all the statistics logs every 5 minutes can result in performance issues, such as overutilization of WAN links and excessive consumption of storage, memory, and CPU on the Analytics nodes. These issues can lead to a loss of critical logs because of bursts of log traffic. For these VOS devices, you can reduce the number of exported firewall and SD-WAN statistics log records by exporting logs only for the busiest traffic flows. How busy a traffic flow is depends on a combination of the traffic volume and the number of flows using unique source and destination IP addresses.

To reduce the number of statistics log records that are exported, you configure the maximum number of log records to export per category or report type. For the busiest VOS devices, such as hubs and back offices, it is recommended that you decrease the number of logs exported in each 5-minute interval. The result of this configuration change is that, for each report type, the VOS device exports statistics for records that have the highest traffic volume and session activity. These records are sometimes referred to as the top records.

Note: The log record export interval is fixed at 5 minutes, and you cannot modify it.
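
To see why the default can be too high for busy devices, the following Python sketch converts the export ceiling into an average record rate. It treats the 16,384 figure as a ceiling on one device's export for one interval, which is an assumption about how the limit is applied. The function name and the lower limit of 1,024 used in the example are hypothetical.

```python
MAX_RECORDS_PER_INTERVAL = 16_384  # default export ceiling (16K records)
EXPORT_INTERVAL_SECONDS = 5 * 60   # fixed 5-minute export interval


def worst_case_records_per_second(device_count,
                                  max_records=MAX_RECORDS_PER_INTERVAL):
    """Average records per second if every device exports max_records
    in each 5-minute interval."""
    return device_count * max_records / EXPORT_INTERVAL_SECONDS


# Compare the default ceiling with a hypothetical lower limit for 100 hubs.
default_rate = worst_case_records_per_second(100)
reduced_rate = worst_case_records_per_second(100, max_records=1_024)
print(f"default: {default_rate:.0f} records/s, reduced: {reduced_rate:.0f} records/s")
```

Because the records arrive in bursts at the end of each interval, the instantaneous rate can be much higher than this average.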

For information about configuring SD-WAN and firewall LEF parameters for VOS devices, see Configure Firewall and SD-WAN Usage Monitoring Controls.

Flow Log Collection Recommendations

Flow (session) logging can be enabled for various Versa services, such as security firewall, UTM, and traffic monitoring, for forensics, troubleshooting, and compliance purposes. These logs can be streamed from VOS devices to the Versa Analytics platform, directly to third-party security information and event management (SIEM) or log analytics tools, or both. The Versa Analytics platform has log forwarders that can also be configured to selectively stream the logs to third-party SIEM or log analytics tools and to archive the logs or ingest them into the Versa search engine.

The Versa Analytics search engine provides limited ability to analyze flow logs in real time. It indexes the log fields so that it can provide fast and accurate retrieval and filtering of logs across various services. For more information, see Flow Logs. Search engines require high maintenance and are very resource intensive: they need dedicated computing resources and high-performance storage for scaling and performance. The volume of data that can be stored for optimal real-time query performance is approximately 100 million logs per instance in an on-premises cluster. The maximum recommended number of search instances for on-premises clusters is two, which allows approximately 200 million logs to be stored in the cluster.

If logging is enabled for all traffic of various services on a VOS device, the volume and rate of flow logs can be large and unpredictable. For example, a typical SD-WAN branch can generate more than 2 million access logs per day. For large deployments, ingesting all these logs into the on-premises Versa Analytics cluster may not be practical. To use the search functionality optimally, we recommend ingesting flow logs only for troubleshooting purposes or for critical traffic, such as threat logs or deny logs. These logs are kept for a shorter period of time than other Analytics data. By default, the retention period is set for each log type, and you can change it. The following table describes the configuration of and recommendations for limiting flow logging for different flow log types.

For information about configuring logging for specific features and services, see Apply Log Export Functionality.

For customers who require support for high volume logging ability with higher retention, the Versa-managed, cloud-hosted Advanced Logging Service (ALS) can be used as an extension to on-premises clusters. See Configure Versa Advanced Logging Service.

Flow Log Type and Purpose	Configuration	Recommendation
Antivirus logs Use to detect antivirus events.	Configure traffic matching antivirus rules for logging.	Enabling for all antivirus rules with action drop is not expected to generate too many logs, because only approximately 10 percent of log traffic is threat logs. An antivirus profile is associated with a rule in an NGFW policy. The rule determines the logging behavior. See Configure Access Policy Rules in Configure NGFW.
CGNAT logs Generated at the beginning and end of the flow.	Configure logging in a CGNAT rule.	Do not store in the Analytics database because of the high volume of logs. To capture CGNAT logs for auditing requirements, send them to a local collector configured for archive storage only or export them to third-party collectors. To configure CGNAT logging, see Configure CGNAT Rules in Configure CGNAT. To configure a local collector for archive-only storage, use a storage directory other than /var/tmp/log. See Modify or Add a Local Collector and Set Up an Additional Log Storage Directory in Configure Log Collectors and Log Exporter Rules.
DoS threat logs Use to detect a flood of new sessions or IP address and port scanning events. The log is specific to a tenant or VOS device rather than to a flow, and it covers specific DoS threat types.	Configure logging of DoS threat events in a DoS policy.	Enabling DoS threat logs is not expected to generate too many logs, because only approximately 10 percent of log traffic is threat logs. See Configure DoS Policy Rules in Configure DoS Protection.
Firewall access and deny logs Use for security auditing, because every flow seen by a VOS device that matches a rule is logged.	Enable globally or for a specific firewall rule. Rule action can be allow or deny. Logging can be done at the start or end of a session, or both.	Enable for critical traffic with allow and deny rules. Enabling for all traffic can generate a large volume of logs. Enable at the end of flow, because all flow details are available at the end of the flow. Aggregate summary reports are available for firewall (even if flow logging is not enabled) to provide visibility into and metrics for application usage, firewall rule usage, and source and destination traffic usage. To capture all logs for auditing requirements, send them to a log collector that is configured for archive storage only. See Configure Access Policy Rules in Configure NGFW and Configure Security Access Policy Rules in Configure Stateful Firewall.
IDP logs Use to detect IDP events.	Configure traffic matching IDP rules for logging.	Enabling in IDP profiles is not expected to generate too many logs, because only approximately 10 percent of log traffic is threat logs. Be aware of false positive log events. To limit IDP log traffic by using a less inclusive vulnerability profile, see Configure Intrusion Detection and Prevention.
IP filtering logs Use for detailed logging of flows that match specific IP filtering rules.	Configure traffic matching IP filtering rules for logging. IP filtering rule actions include allow and block.	Enabling for all IP addresses can generate a large volume of logs. Optionally, enable only for IP filtering rules whose action is block. See Configure Custom IP Filtering Profiles in Configure IP Filtering.
PCAP (packet capture) logs Analyze the first n packets of traffic that matches a rule.	Configure for firewall, IPS, and traffic monitoring rules.	Enable only for diagnostics, because packet capture is very resource intensive on both VOS and Analytics devices. See View the Packet Capture Log Table in View Analytics Dashboards and Log Screens.
Traffic monitoring flow logs Use for diagnostics, because every flow seen by a VOS device that matches a rule is logged. For SD-WAN devices, provide path information for a traffic flow.	Enable globally or for a specific traffic monitoring rule. Logging can be done at the start or end of a session, or both, and during the session.	Enable only for diagnostics or for selected critical application traffic rules, because the traffic flow log volume can be very high. Enable at the end of flow, because all flow details are available at the end of the flow. Aggregate summary reports are available for SD-WAN (even if flow logging is not enabled) to provide visibility into applications, path usage, and VRFs. To capture all logs for auditing requirements, send them to a log collector that is configured for archive storage only. See Configure Traffic Monitoring Policy for Log Export to an Analytics Node in Configure Log Export Functionality.
URL filtering logs Use for detailed logging of flows that match specific URL filtering rules.	Configure traffic matching URL filtering rules for logging. URL filtering rule actions include allow and block.	Enabling for all URLs, both allow and deny (whitelist and blacklist), can generate a large volume of logs. Optionally enable for deny (blacklist) URLs to detect users of the denied URLs only. Aggregate summary reports are available for URL category and reputation (even if flow logging is not enabled) to provide visibility into overall traffic. See Configure Global URL-Filtering Settings and Configure a URL-Filtering Profile in Configure URL Filtering.
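
The sizing figures in this section (about 100 million logs per search instance, two search instances, and more than 2 million access logs per day from a typical branch) can be combined into a rough estimate of how quickly unrestricted flow logging fills an on-premises cluster. The following Python sketch is an approximation under stated assumptions: it ignores retention expiry and any replication overhead, and the function name is hypothetical.

```python
LOGS_PER_SEARCH_INSTANCE = 100_000_000  # approximate limit per search instance
MAX_SEARCH_INSTANCES = 2                # recommended maximum, on-premises
LOGS_PER_BRANCH_PER_DAY = 2_000_000     # example rate for a typical branch


def days_until_full(branch_count,
                    logs_per_branch_per_day=LOGS_PER_BRANCH_PER_DAY):
    """Days until the cluster reaches its recommended log capacity,
    assuming every branch logs at the same constant rate."""
    if branch_count <= 0 or logs_per_branch_per_day <= 0:
        raise ValueError("branch_count and log rate must be positive")
    capacity = LOGS_PER_SEARCH_INSTANCE * MAX_SEARCH_INSTANCES
    return capacity / (branch_count * logs_per_branch_per_day)
```

Under these assumptions, 100 branches that log all traffic reach the recommended capacity in about one day, which is why the recommendations above limit ingestion to critical or diagnostic traffic.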

SLA Metrics Collection Recommendations

In an SD-WAN network, SLA monitoring is performed on every active path (a path is defined by a combination of tenant, local site identifier, local access circuit, remote site identifier, and remote access circuit) and every configured forwarding class. The number of active paths depends on a number of factors, including the topology (for example, hub and spoke or full mesh), the number of local and remote access circuits, the type of transport domains, the traffic activity between two endpoints, and whether adaptive monitoring is enabled.

By default, each endpoint logs SLA metrics calculated on the active path and forwarding class to the Analytics node every 5 minutes. The number of SLA logs generated can be very large and can consume large amounts of CPU or disk on the Analytics node.

To reduce the SLA logging load on Analytics nodes, consider the following:

In a hub-and-spoke topology, hubs report the same SLA metrics information as spokes. For example, if a hub-and-spoke topology has 100 sites with one link each, the hub generates 100 SLA logs and each of the 100 sites generates one SLA log. To reduce the number of logs generated, disable logging from the hub by setting its logging interval in the SLA metrics configuration to 0. For more information, see Configure Continuous SLA Monitoring in Configure SLA Monitoring for SD-WAN Traffic Steering.
Each Controller node reports SLA metrics about all the paths to and from branch devices. To reduce the number of logs from the Controller node, set the SLA logging interval on the Controller node to 0. For more information, see Configure Continuous SLA Monitoring in Configure SLA Monitoring for SD-WAN Traffic Steering.
In a multitenant branch, enable SLA logging towards the Controller node or shared hub for only one tenant instead of for all tenants.
In branch networks where SLA monitoring is enabled on multiple forwarding classes, enable SLA logging only on important or critical forwarding classes.
To aggregate SLA metrics that change little over time, enable and configure SLA monitor log optimization on VOS devices. For more information, see SLA Monitor Log Optimization in Configure IP SLA Monitor Objects.
To reduce the storage required for SLA logs on an Analytics cluster, set the global configuration to ignore the forwarding class. This configuration aggregates multiple forwarding classes for a path into one entry. For more information, see Ignore SLA Forwarding Class.
To reduce storage requirements on Analytics clusters, reduce how long you retain daily and hourly data for SLA status and SLA violation features. When reducing the retention time, consider how often you might need to access SLA logging data to perform diagnostics testing. For more information, see Configure Retention Times for NoSQL Databases.
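
The effect of the first and fourth recommendations above can be estimated with simple arithmetic. The following Python sketch is a simplified model, not a description of how VOS devices count paths: it assumes a single hub with one link, one log per path and forwarding class in each 5-minute interval, and it excludes logs from Controller nodes. The function name and parameters are hypothetical.

```python
INTERVALS_PER_DAY = 24 * 60 // 5  # 288 five-minute logging intervals per day


def sla_logs_per_interval(spokes, links_per_spoke=1, forwarding_classes=1,
                          hub_logging=True):
    """Estimate SLA logs per interval in a single-hub topology."""
    paths = spokes * links_per_spoke
    # Each spoke logs its own paths to the hub.
    spoke_logs = paths * forwarding_classes
    # The hub logs the same paths again unless its logging interval is 0.
    hub_logs = paths * forwarding_classes if hub_logging else 0
    return spoke_logs + hub_logs


with_hub = sla_logs_per_interval(100) * INTERVALS_PER_DAY
without_hub = sla_logs_per_interval(100, hub_logging=False) * INTERVALS_PER_DAY
print(with_hub, without_hub)
```

For the 100-site example, the model gives 200 logs per interval with hub logging enabled and 100 with it disabled. Logging four forwarding classes instead of one multiplies either figure by four.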

Ignore SLA Forwarding Class

To configure an Analytics cluster to ignore the SLA forwarding class:

In Director view, select the Analytics tab in the top menu bar.
In the horizontal menu bar, select a connector to any node in the Analytics cluster to be configured.
Select Administration > Configurations > Settings in the left menu bar, and then select the Data Configurations tab.
To ignore the SLA forwarding class for all tenants, select Global Configurations in the Scope field. To ignore the SLA forwarding class for a specific tenant, select the tenant name in the Scope field.
Click Advanced Settings.
Toggle the Ignore SLA Forward Class field to On.

Flow Log Storage Recommendations

When you enable flow logging on VOS devices, the Analytics cluster receiving the logs stores them in datastores by default. If there are many flow logs, the cluster may not be able to handle the load, which can result in delayed processing of the data and increased disk utilization. To avoid such backlogs, you can configure the local collector on the Analytics node receiving the logs to bypass the datastores and store flow logs directly to an archive. You can also throttle or disable incoming flow logs in the local collector configuration.

For a description of Analytics log collector nodes and local collectors, see Versa Analytics Configuration Concepts. For more information about flow logs, see Flow Logs.

Configure Flow Log Storage Directly to an Archive

To store flow logs directly to an archive, you can configure a local collector to store incoming logs in a non-standard directory, one other than /var/tmp/log. When logs are stored in a non-standard directory, the Versa Analytics driver does not process them into the datastores within the cluster, allowing you to archive the logs without analysis. To archive the logs, you must manually configure a cron job.

For information about configuring a local collector, see Modify or Add a Local Collector in Configure Log Collectors and Log Exporter Rules. For information about creating a non-default archive cron job and managing log archives, see Manage Versa Analytics Log Archives.

Throttle or Disable Flow Log Storage

You can slow down or stop incoming flow logs at a local collector, which is the point where an Analytics log collector node accepts incoming logs. To do this, you select the throttle or disable setting in the local collector configuration. For more information, see Modify or Add a Local Collector in Configure Log Collectors and Log Exporter Rules.

Configure Analytics Datastore Limits

In many deployments, the number of Analytics nodes and the amount of storage and memory per node are limited. As a result, the nodes can run out of disk space, or queries can time out because of a lack of processing power. To retrieve data optimally, Analytics clusters limit the maximum number of rows retained to approximately 100 million per node. Other factors also affect storage and query performance, such as machine type (bare metal or virtual machine), disk type (SSD or HDD), memory, number of cores, and hyperthreading.

You can influence the size of the datastores in an Analytics cluster by setting limits on retention times and daily log volume. In Analytics clusters, you can configure retention times by log type for search engine datastores and by feature type for NoSQL databases. However, increasing the retention limits can have an adverse impact on the database load, which can lead to issues such as disk exhaustion or suboptimal performance. Contact Versa Technical Support before making any modifications to these limits.

The Director GUI can access multiple Analytics clusters. When configuring datastore limits from the Director GUI, ensure that you select a connector to the correct cluster. For more information, see Versa Director Nodes and Analytics Clusters in Versa Analytics Configuration Concepts.

Configure Search Engine Log Storage Limits

To limit the number of logs that are stored in search-engine datastores on a cluster, you can set a daily limit on the number of logs stored. You set the limit globally, and within the global daily limit you can configure limits for individual tenants. If the global daily limit is reached, the cluster generates an alarm. Critical logs such as alarms and threats are not affected by this configuration. The daily global limit is determined based on the cluster size, the amount of storage available per node, and the log retention policy.

To set daily log storage limits:

In Director view, select the Analytics tab in the top menu bar.
Select a connector to the cluster in the horizontal menu bar.
Select Administration > Configurations > Settings in the left menu bar.
Select the Data Configurations tab, then click Search Logs Daily Limit Configurations.
In the Global Daily limit field, enter a value for the maximum number of global daily logs, and enter daily limits for each tenant. Values are in thousands of logs.
Click Save.
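
Because the daily global limit depends on storage capacity and the log retention policy, one way to choose a starting value is to divide the number of logs the cluster can hold by the retention time. The following Python sketch is a rough planning aid, not a Versa formula. It assumes a single retention time for all log types, and the function name is hypothetical. The result is expressed in thousands of logs, the unit used by the Global Daily limit field.

```python
def global_daily_limit_in_thousands(max_stored_logs, retention_days):
    """Daily log budget, in thousands of logs, that keeps the retained
    total at or below max_stored_logs."""
    if max_stored_logs <= 0 or retention_days <= 0:
        raise ValueError("max_stored_logs and retention_days must be positive")
    logs_per_day = max_stored_logs // retention_days
    return logs_per_day // 1000  # the GUI field takes thousands of logs


# Example: 200 million search logs retained for 7 days.
print(global_daily_limit_in_thousands(200_000_000, 7))
```

Critical logs such as alarms and threats are not counted against the limit, so leave some headroom below the computed value.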

To check if the limit has been exceeded:

In Director view, select the Analytics tab in the top menu bar.
Select a connector to the cluster in the horizontal menu bar.
Select Administration > System Status > Alarms in the left menu bar. The Alarms table displays, including alarms for exceeding log limits.

Configure Retention Times for Search Engine Datastores

You can set the retention time for search engine datastores on Analytics clusters. To configure the default retention time for tenants, you configure the global retention time. If desired, you can change the default retention time for individual tenants.

You configure retention times by log type, such as ADC logs or alarm logs. Retention times are specified in days. When a log is older than the retention time for its type, the Analytics cluster automatically removes it. For example, if you retain ADC logs for three days, the Analytics cluster automatically removes ADC logs older than three days from the datastore.

To change search engine datastore retention times:

In Director view, select the Analytics tab in the top menu bar.
In the horizontal menu bar, select a connector to any node in the Analytics cluster.
Select Administration > Configurations > Settings in the left menu bar, and then select the Data Configurations tab.
Click the Scope drop-down menu to display a list of tenants and the Global Configurations options. By default, tenants use the global configurations settings. An asterisk indicates tenants whose data retention settings are activated. Settings for these tenants override the global configuration settings.
To activate or deactivate the retention settings for a tenant, click the tenant name. Then, click the box next to Search Data Configurations to activate the settings, or unclick to deactivate. The asterisk in front of the tenant name in the Scope drop-down changes to reflect the activation status.
Select Search Data Configurations.
Enter a retention time, in days, for each log type. For information about which features and services relate to items listed on this screen, see .
Click Save.

Configure Retention Times for NoSQL Databases

NoSQL databases in an Analytics cluster store performance and fault monitoring data for historical reporting. This data is stored in two time intervals, daily and hourly. By default, daily data is retained for three months and hourly data is retained for 30 days. Depending on available storage and the cluster size, you can keep daily data longer. For hourly data, you can aggregate it every 5, 15, 30, or 60 minutes. Choose a resolution of 15, 30, or 60 minutes for better performance.

To configure the daily and hourly retention times and the hourly resolution for Analytics-type nodes:

From Director view, click Analytics.
In the horizontal menu bar, select a connector to any node in the Analytics cluster.
Select Administration > Configurations > Settings.
Select the Data Configurations tab.
Click the Scope drop-down menu to display a list of tenants and the Global Configurations options. By default, tenants use the global configurations settings. An asterisk indicates tenants whose data retention settings are activated. Settings for these tenants override the global configuration settings.
To activate or deactivate the retention settings for a tenant, click the tenant name. Then, click the box next to Search Data Configurations to activate the settings, or unclick to deactivate. The asterisk in front of the tenant name in the Scope drop-down changes to reflect the activation status.

To change retention settings, click Analytics Data Configurations. Enter information for the following fields.

Field	Description
Daily TTL	Enter the storage retention time for daily data, in days.
Hourly TTL	Enter the storage retention time for hourly data, in days.
Resolution	Enter the resolution for aggregating hourly data. For performance and storage efficiency, use a value of 15 minutes or higher. For more precision, use a value of 5 minutes.
Active	Toggle to activate (on) or deactivate (off) automatic deletion of expired data for the given feature in the database.

Click Save to save the settings.
Click Analytics Data Configurations to compress this section of the screen.
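
To compare resolution settings before you change them, the following Python sketch counts how many aggregated data points a single monitored series would hold for a given Hourly TTL and Resolution. It assumes one stored data point per resolution interval, which is a simplification of how the database stores data, and the function name is hypothetical.

```python
VALID_RESOLUTIONS = (5, 15, 30, 60)  # minutes, as listed for hourly data


def hourly_points_per_series(hourly_ttl_days, resolution_minutes):
    """Data points retained for one series at the given TTL and resolution."""
    if resolution_minutes not in VALID_RESOLUTIONS:
        raise ValueError("resolution must be 5, 15, 30, or 60 minutes")
    if hourly_ttl_days <= 0:
        raise ValueError("hourly_ttl_days must be positive")
    return hourly_ttl_days * 24 * 60 // resolution_minutes


# Default hourly retention of 30 days at each supported resolution.
for resolution in VALID_RESOLUTIONS:
    print(resolution, hourly_points_per_series(30, resolution))
```

With the default 30-day hourly retention, a 5-minute resolution keeps 8,640 points per series under this model, compared with 2,880 at 15 minutes, which illustrates why a resolution of 15 minutes or higher is recommended for storage efficiency.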

Supported Software Information

Releases 20.2 and later support all content described in this article.