Troubleshoot Analytics Disk Storage Issues

For supported software information, click here.

You can encounter disk issues on any of the node types in an Analytics cluster. When disk reserves run low, Analytics nodes can exhibit slow performance, drop logs, or crash, and a node may require a long time to resume normal operations.

This article describes how to identify and resolve disk storage issues on Analytics nodes.

All Analytics node types require sizable disk reserves for the following operations:

Analytics-type nodes—To perform the database compaction process
Search-type nodes—To store log surges in the search engine datastore until the logs pass their retention time
Log collector nodes—To store incoming logs until the logs are processed by the Analytics driver

Note: To prevent disk overload issues, you should maintain disk storage at approximately 60 percent or less of total disk.

Identify Which Nodes Have Disk Storage Issues

You can view disk usage alarms and the total and available disk storage on each node in an Analytics cluster from the Director Analytics tab. Note that total storage is relative to the size of the root filesystem. To increase the disk storage space, see Expand Disk Storage for Analytics Nodes.

To search for disk alarms and list the total and available storage on an Analytics node:

In Director view, select the Analytics tab in the top menu bar.
Select an Analytics cluster node. For Releases 22.1.1 and later, hover over the Analytics tab and then select a node. For Releases 21.2 and earlier, select a node in the horizontal menu bar.
To search for disk alarms, select Administration > System Status > Alarms. Scan the main pane for alarm descriptions containing "disk usage exceeded" messages. In the following example, the alarm message indicates that the disk threshold is set to 50 percent for analytics-type nodes and that this limit has been exceeded by the node at 192.168.1.21.
To display disk usage thresholds, select Administration > Configurations > Settings > System Monitoring. Note the values in the Search Disk Usage Threshold and Analytics Disk Usage Threshold fields. When disk usage on a node surpasses the applicable threshold, the Analytics cluster generates an alarm.
To list total and available disk storage for the nodes in an Analytics cluster, select Administration > System Status > Resources in the left menu bar. The Disk Used column displays the percentage of disk used, and the Disk Free column displays the amount of remaining disk available on each node in the cluster.

To display disk usage from the shell on a node, issue the df -kh / command. The Size column lists the total size of the root filesystem, and the Avail column lists the amount of unused storage.

admin@Analytics$ df -kh /
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/system-root   71G   33G   34G  50% /
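
The check against the 60 percent guideline can also be scripted. The following is a minimal sketch, assuming a df command that supports the POSIX -P option. The function name root_use_pct and the LIMIT variable are illustrative and are not part of the Analytics software.

```shell
#!/bin/sh
# Sketch: warn when the root filesystem exceeds a usage guideline.
# root_use_pct and LIMIT are illustrative names, not Versa tools.

# Print the Use% value (digits only) from `df -P` output read on stdin.
root_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

LIMIT=60   # guideline from the note earlier in this article; adjust as needed

pct=$(df -P / | root_use_pct)
if [ "$pct" -gt "$LIMIT" ]; then
    echo "WARNING: / is ${pct}% full (guideline: ${LIMIT}%)"
else
    echo "OK: / is ${pct}% full"
fi
```

The -P option keeps each filesystem on a single output line, which makes the fifth column safe to parse even when the device name is long.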

Identify Which Directories Have Disk Storage Issues

You can locate which specific directories are consuming disk resources on a node. To view the amount of storage used by a directory and all its subdirectories, issue the sudo du -sBM command from a shell on the node. Issue the following commands to display disk usage for each of the five directories that typically accumulate storage on Analytics nodes. Note that the /var/lib/solr directory is not present on all Analytics nodes.

admin@Analytics$ sudo du -sBM /var/lib/cassandra
admin@Analytics$ sudo du -sBM /var/lib/solr
admin@Analytics$ sudo du -sBM /var/tmp/log
admin@Analytics$ sudo du -sBM /var/tmp/log/tenant*/backup
admin@Analytics$ sudo du -sBM /var/tmp/archive
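
The five commands above can be combined into one pass that skips directories that are not present on the node, such as /var/lib/solr. This is a minimal sketch. The helper name report_dirs is illustrative, and the script assumes the GNU du command used elsewhere in this article.

```shell
#!/bin/sh
# Sketch: report usage for the directories listed above, skipping absent ones.
# report_dirs is an illustrative helper name, not a Versa command.

report_dirs() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            du -sBM "$d"
        else
            echo "skipped (not present): $d"
        fi
    done
}

# Run the script with sudo so that du can read directories owned by other users.
# The tenant*/backup pattern is left unquoted so that the shell expands it.
report_dirs /var/lib/cassandra /var/lib/solr /var/tmp/log \
    /var/tmp/log/tenant*/backup /var/tmp/archive
```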

The following table describes the contents of each directory and the section that describes how to troubleshoot issues with each directory. The section assists you in recovering from disk full conditions in these directories and offers actions you can take to reduce disk usage in the future.

Directory	Directory Contents	Troubleshooting Information
/var/lib/cassandra	Cassandra database files. On the DSE platform, both search engine and analytics data is stored in Cassandra. On the Fusion platform, only analytics data is stored in this directory.	Troubleshoot Cassandra Database Filling Up Disk
/var/lib/solr	Solr search engine files. On the Fusion platform, search engine data is stored in the Solr datastore.	Troubleshoot Solr Datastore Filling Up Disk
/var/tmp/log/tenant-tenant-name	Log files that have not yet been processed by the Analytics driver.	Troubleshoot Log Processing and Archiving Issues
/var/tmp/log/tenant-tenant-name/backup	Log files that have been processed but not yet archived.	Troubleshoot Log Processing and Archiving Issues
/var/tmp/archive	Archived log files that have not yet been moved to more permanent storage.	Troubleshoot Log Processing and Archiving Issues

During the recovery process, if you determine that not enough disk resource is allocated to the node, see Resolve Not Enough Disk in General, below.

Resolve Not Enough Disk in General

To increase available storage on an Analytics node, you can do the following:

Expand the root filesystem. For more information, see Expand Disk Storage for Analytics Nodes.
Avoid sending unnecessary logs to the cluster. For more information, see Versa Analytics Scaling Recommendations.
Add log forwarder virtual machines (VMs) to the cluster. Doing this shifts the storage of unprocessed log files and log archive files to the VMs.
Export certain high-volume logs from VOS devices to the Versa Advanced Logging Service (ALS). Logs are processed by and stored in the cloud-based ALS cluster, and are available for generating reports, viewing log charts, and performing log searches. For more information, see Configure the Versa Advanced Logging Service.

Troubleshoot Cassandra Database Filling Up Disk

Analytics clusters run either the DSE platform or the newer Fusion platform. On the DSE platform, both the search engine and the Analytics database use the Cassandra database. In Fusion, only the Analytics database uses Cassandra. Cassandra stores its data in the /var/lib/cassandra directory and its subdirectories. This section describes how to identify and resolve some disk storage issues found when Cassandra is filling up the disk space on a node.

Cluster nodes that operate the Cassandra database require large amounts of free disk storage for the following operations:

Analytics data
- Database compaction process—The Cassandra database periodically performs a compaction process to optimize database tables. This process creates temporary tables and temporarily increases disk usage by a large percentage, sometimes 40 percent or more.
- Retention of database data—If data is held in the database for long periods, disk utilization can be high. Higher retention times require more disk storage.
- Storage of fine-grained time-series data—If the database is configured to store data in small increments to create detailed time-series charts, disk usage is substantially increased. Lower values require more disk storage.
Search engine data
- Retention of search engine data—If data is held in the search engine datastore for long periods, disk utilization can be high. Higher retention times require more disk storage.
- Daily log intake—If no daily log limits are set, disk utilization can be high on days when a surge of logs is received by the Analytics cluster.

To reduce the amount of disk used by the Cassandra database, you can do the following:

Decrease database retention times.
Increase database resolution times.
Decrease search engine retention times.
Set daily log limits.
Truncate database tables.

Decrease Database Retention Times

Decreasing retention settings for various types of data, such as alarms, SD-WAN QoS usage, and SLA status, can reduce the amount of disk used by the Analytics database. The Analytics cluster distinguishes between daily and hourly data for database records. You can choose, for example, to retain daily records for 90 days and hourly records for 30 days for SLA status records. Existing data is held for its original retention period, and only newly added records are subject to the new settings. For this reason, reducing retention time settings does not immediately reduce disk storage. When records expire, the database marks them for future deletion, and tables are reduced in size only during database compaction.

You can view and modify database retention settings from the Analytics > Administration > Configurations > Settings > Data Configurations tab. Select Analytics Data Configurations to expand the screen and display and modify the settings. For more information, see Configure Retention Times for NoSQL Databases in Versa Analytics Scaling Recommendations.

Increase Database Resolution Times

The Analytics application uses time-series data to create area, line, and stacked-bar charts for dashboards and reports. These charts can display finer-grain detail when you have configured the database to store higher-resolution data. However, Cassandra disk usage noticeably increases when you use shorter resolution times (5 minutes) instead of longer ones (30 or 60 minutes). To reduce disk usage, increase the resolution time settings for noncritical features and services so that their data is stored at a coarser granularity. Newly received data is stored using the modified resolution time.

You can view and modify resolution settings from the Analytics > Administration > Configurations > Settings > Data Configurations tab. Select Analytics Data Configurations to expand the screen and to display and modify the settings. For more information, see Configure Retention Times for NoSQL Databases in Versa Analytics Scaling Recommendations.
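
As a rough illustration of why shorter resolution times use more disk, the following sketch counts how many data points one time series produces per day at each resolution time. The helper name points_per_day is illustrative. Actual disk usage also depends on factors such as the number of series and compaction, so treat the ratio as an approximation only.

```shell
#!/bin/sh
# Sketch: data points stored per day, per time series, at a given resolution.
# points_per_day is an illustrative helper, not a Versa command.

points_per_day() {
    # $1 = resolution time in minutes
    echo $(( 24 * 60 / $1 ))
}

for minutes in 5 30 60; do
    echo "${minutes}-minute resolution: $(points_per_day "$minutes") points per day"
done
```

A 5-minute setting yields 288 points per day for each series, compared with 48 at 30 minutes and 24 at 60 minutes.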

Decrease Search Engine Retention Times

The search engine allows you to perform high-speed searches through logs retained in its datastore, but retaining logs requires large amounts of disk. You can decrease disk storage by reducing search data retention periods. Default retention times are set in periods of 1, 3, 7, or 30 days. Older logs are still stored in archive files and can be extracted from archive if required.

You can view and modify search engine retention settings from the Analytics > Administration > Configurations > Settings > Data Configurations tab. Select Search Data Configurations to expand the screen and display and modify the settings. For more information, see Configure Retention Times for NoSQL Databases in Versa Analytics Scaling Recommendations.

Set Daily Log Limits

You can set limits on the number of noncritical logs added to search engine datastores for a 24-hour period. Daily limits use UTC time, and the limit is reset at UTC 00:00 each day. You can set limits per tenant or globally for all tenants. Select the Analytics > Administration > Configurations > Settings > Data Configurations tab, and then select Search Logs Daily Limit Configurations to expand the screen and display and modify the settings. For more information, see Configure Search Engine Log Storage Limits in Versa Analytics Scaling Recommendations.
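
Because the daily limit resets at 00:00 UTC rather than at local midnight, it can help to know how much of the current counting period remains. The following minimal sketch computes this from the current epoch time. The helper name secs_until_utc_reset is illustrative, and the script assumes a date command that supports the +%s format.

```shell
#!/bin/sh
# Sketch: time remaining until the next 00:00 UTC, when daily limits reset.
# secs_until_utc_reset is an illustrative helper name, not a Versa command.

secs_until_utc_reset() {
    # $1 = current time in seconds since the Unix epoch
    echo $(( 86400 - $1 % 86400 ))
}

now=$(date -u +%s)
left=$(secs_until_utc_reset "$now")
echo "Daily limit resets in $(( left / 3600 ))h $(( left % 3600 / 60 ))m"
```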

Truncate Database Tables

If you attempt to shrink the database by decreasing the retention times or increasing the resolution times, it can take the database hours or days to reduce its size through the compaction process. To speed up the process of shrinking the database, you can choose to manually truncate some database tables. Truncation removes the table contents and leaves the empty table in place in the database. The removed data is no longer available to populate Analytics dashboards or log screens. Any logs received after the truncation are added to the table.

Note: Contact Versa Networks Customer Support before performing database truncation.

To reduce the size of the database by manually truncating tables, you do the following:

List the size of database tables to help to determine which tables to truncate.
Determine which tables to truncate.
Shut down the Versa Analytics driver to temporarily stop it from adding data to the table or tables you are truncating.
Shut down the automatic compaction process on each table.
Perform the truncations.

Verify the vandb-repair Cron Job

The vandb-repair cron job performs a repair function to ensure proper clearing of truncated records. Before truncating database tables, ensure that the vandb-repair cron job is present in the /etc/cron.d directory.

To verify the database repair cron job:

Log in to the shell on the Analytics node.
Display the contents of the vandb-repair cron file:

admin@Analytics$ sudo su
root@Analytics# cd /etc/cron.d
root@Analytics# cat vandb-repair
# Every  run the van db maint script
0 0   * * 0   root /opt/versa/scripts/van-scripts/vandb-repair.sh
root@Analytics# exit
admin@Analytics$
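
The same verification can be done non-interactively. The following minimal sketch checks that the cron file exists and contains an uncommented line that references the repair script. The helper name has_repair_job is illustrative.

```shell
#!/bin/sh
# Sketch: confirm that a cron file exists and references the repair script.
# has_repair_job is an illustrative helper name, not a Versa command.

has_repair_job() {
    # $1 = path to the cron file; succeeds only if an uncommented line
    # references vandb-repair.sh
    [ -f "$1" ] && grep -v '^[[:space:]]*#' "$1" | grep -q 'vandb-repair\.sh'
}

if has_repair_job /etc/cron.d/vandb-repair; then
    echo "vandb-repair cron job is present"
else
    echo "vandb-repair cron job is missing or commented out"
fi
```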

Determine Which Tables To Truncate

Database tables are stored in subdirectories in the /var/lib/cassandra/data/van_analytics directory. The table name is the portion of the subdirectory name prior to the dash. The following example output shows the directory listing for the table tenantsrcfacts. Note the size of the table, 4.0KB in this example. You can use this information to help you select which tables to truncate.

admin@Analytics$ sudo su
root@Analytics# cd /var/lib/cassandra/data/van_analytics
root@Analytics# ls -lrth
...
drwxr-xr-x 3 cassandra cassandra 4.0K Sep 27 12:32 tenantsrcfacts-c6243c407b6c11ebac1977d4669a7d1e
...
root@Analytics# exit
admin@Analytics$
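
Note that ls reports the size of the directory entry itself, not the total size of the files it contains. To compare the space that each table actually consumes, you can total the directory contents with du. The following is a minimal sketch. The names table_name and DATA_DIR are illustrative, and the script assumes GNU du and sort.

```shell
#!/bin/sh
# Sketch: list the ten largest table directories with their table names.
# table_name and DATA_DIR are illustrative names, not Versa tools.

DATA_DIR=/var/lib/cassandra/data/van_analytics

# Strip the trailing "-<id>" suffix from a table directory name.
table_name() {
    base=${1##*/}          # drop any leading path
    echo "${base%-*}"      # drop everything from the last dash onward
}

# Run as root so that du can read the Cassandra directories.
if [ -d "$DATA_DIR" ]; then
    du -sBM "$DATA_DIR"/* | sort -rn | head -n 10 |
    while read -r size path; do
        echo "$size $(table_name "$path")"
    done
fi
```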

The following table lists database tables that typically use a large amount of disk. Data from these database tables is used to populate the charts and tables displayed on the indicated Analytics dashboard. After you truncate a table, only newly ingested log data displays on the corresponding dashboard.

Database Table	Analytics Screen
sdwanappsubscriber	Dashboards > SD-WAN > Sites > Application > Users
sdwansite2siteslam_1	Dashboards > SD-WAN > Sites > SLA metrics
sdwansite2siteslamrt2	Dashboards > SD-WAN > Sites > SLA metrics
sdwansite2siteslapathstatus	Dashboards > SD-WAN > Sites > SLA metrics
sdwansite2siteslaviolation	Dashboards > SD-WAN > Sites > SLA metrics
tenantdestfacts	Dashboards > Security > Firewall > Destination
tenantsrcfacts	Dashboards > Security > Firewall > Source

Manually Truncate a Database Table

To truncate a database table:

Log in to the shell on the node.
List the disk usage for the root filesystem:

admin@Analytics$ df -kh /

Change to the directory containing the database table, and list the filenames:

admin@Analytics$ sudo su
root@Analytics# cd /var/lib/cassandra/data/van_analytics
root@Analytics# ls -lrth

In the following example, the grep command is used to filter the output to display only the table named tenantsrcfacts.

root@Analytics# ls -lrth | grep tenantsrc
drwxr-xr-x 3 cassandra cassandra 4.0K Sep 27 12:32 tenantsrcfacts-c6243c407b6c11ebac1977d4669a7d1e

Stop the Analytics driver and disable compaction on the table:

root@Analytics# vsh stop
root@Analytics# nodetool disableautocompaction van_analytics tenantsrcfacts

Issue the cqlsh command so that you can issue commands to the database:

root@Analytics# cqlsh -u cassandra -p cassandra
Connected to D5-VAN1 at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.0.9 | CQL spec 3.4.0 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh>

Truncate the table. Truncation drops all the data from the table, but leaves the empty table in place.

cassandra@cqlsh> truncate van_analytics.tenantsrcfacts;
cassandra@cqlsh> exit

Clear snapshots, reenable compaction, and restart database services.

root@Analytics# nodetool clearsnapshot
root@Analytics# nodetool enableautocompaction van_analytics tenantsrcfacts
root@Analytics# exit
admin@Analytics$ vsh start

Display disk usage to determine whether the percentage of disk usage has been reduced.

admin@Analytics$ df -kh /
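
When more than one table is to be truncated, it can help to write out the full command sequence for review before running anything. The following minimal sketch only prints the commands from the procedure above, in order, for a list of tables, and executes none of them. The helper name print_truncate_plan is illustrative, and the two table names are examples taken from the table list in this article.

```shell
#!/bin/sh
# Sketch: print (do not run) the truncation commands for several tables.
# print_truncate_plan is an illustrative helper, not a Versa command.
# Review the output with Versa Networks Customer Support before running it.

KEYSPACE=van_analytics

print_truncate_plan() {
    echo "vsh stop"
    for t in "$@"; do
        echo "nodetool disableautocompaction $KEYSPACE $t"
    done
    for t in "$@"; do
        echo "cqlsh> truncate $KEYSPACE.$t;"
    done
    echo "nodetool clearsnapshot"
    for t in "$@"; do
        echo "nodetool enableautocompaction $KEYSPACE $t"
    done
    echo "vsh start"
}

print_truncate_plan tenantsrcfacts tenantdestfacts
```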

Troubleshoot Solr Datastore Filling Up Disk

Cluster nodes that run the Solr search engine require large amounts of free disk storage for the Solr datastore, which is stored in subdirectories in the /var/lib/solr directory. You can reduce disk storage immediately by deleting the contents of the datastore. However, if the root cause of the log load on the cluster remains in place, or if the daily log limit is set too high, the Solr datastore can fill up again. You can implement a number of long-term strategies to reduce future disk storage.

To reduce Solr disk storage immediately:

Delete the contents of the Solr datastore—The Solr datastore typically retains logs for three to seven days. Copies of these logs are archived in the /var/tmp/archive directory, and the logs can be extracted from the archive files. Because the logs are still accessible, it is possible to delete the entire contents of the search engine datastore without violating retention requirements. Normal log processing resumes after the datastore is deleted, and any logs received after the deletion are added to the datastore. To delete the Solr datastore, contact Versa Networks Customer Support.

To reduce Solr disk storage long term:

Reduce daily log limits and reduce retention times for the search engine, as described in Set Daily Log Limits and Decrease Search Engine Retention Times, above.
Reduce or avoid exporting traffic monitoring, firewall, and packet capture logs from VOS devices to the Analytics cluster. These log types generate a large volume of logs. For information about exporting these log types, see Apply Log Export Functionality. For recommendations about exporting logs, see Versa Analytics Scaling Recommendations.
Export high-volume log types to the Versa Advanced Logging Service (ALS). For more information, see Configure the Versa Advanced Logging Service.

Supported Software Information

Releases 20.2 and later support all content described in this article.

Additional Information

Access the CLI on a VOS Device
Apply Log Export Functionality
Configure Log Export Functionality
Configure the Versa Advanced Logging Service
Expand Disk Storage for Analytics Nodes
Troubleshoot Log Processing and Log Archiving Issues