Troubleshoot Analytics Database Issues

For supported software information, see Supported Software Information, below.

For both the DSE and Fusion Analytics platforms, the Analytics database for Analytics clusters is implemented in Apache Cassandra. For the DSE Analytics platform, the search engine is also implemented in Cassandra. Cassandra may switch to a Down state on a node when it encounters certain issues. This article describes how to troubleshoot a node when the database has switched to a Down state.

To troubleshoot an Analytics node when the database has switched to a Down state:

Verify that the database is down.
Identify and resolve disk full conditions.
Identify and resolve reachability issues.
Restart the database.

Verify that the Database Is Down

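To verify that the database is down, issue the nodetool status command from the shell. The node status is the first letter of the two-letter code at the start of each node's row in the output. U indicates that the database is up, and D indicates that the database is down. In the example below, the code UN shows a node that is up and in the Normal state.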

admin@Analytics$ nodetool status
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Owns (effective)  Host ID                               Token    Rack
UN  127.0.0.1  9.16 GB    100.0%            c155ed9e-b192-42df-bbc2-9c89f09cb6ad  0        RAC1
...
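
On a cluster with several nodes, the status check above can be reduced to a list of the nodes that are down. The following is a minimal sketch, not part of the product: the function name down_nodes is hypothetical, and it assumes that each node row begins with the two-letter status and state code followed by the node address, as in the example output.

```shell
#!/usr/bin/env bash
# Print the address of every node that nodetool reports as down.
# Reads `nodetool status` output on stdin. Node rows start with a
# two-letter code: status (U or D) followed by state (N, L, J, or M).
# down_nodes is a hypothetical helper name used only for illustration.
down_nodes() {
    awk '$1 ~ /^[UD][NLJM]$/ && substr($1, 1, 1) == "D" { print $2 }'
}

# Usage (on an Analytics node):
#   nodetool status | down_nodes
```

If the function prints nothing, no node in the output is marked as down.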

You can view the /var/log/cassandra/system.log file to check for database errors.

admin@Analytics$ cat /var/log/cassandra/system.log
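
Because system.log can be long, it may help to show only the most recent error entries instead of the whole file. The sketch below assumes that each log entry begins with its level, for example ERROR, possibly preceded by spaces. The function name recent_errors and the default of 20 lines are illustrative choices, not product settings.

```shell
#!/usr/bin/env bash
# Show the most recent ERROR entries from a Cassandra log file.
# Assumption: each entry starts with its log level, optionally padded
# with leading spaces (for example, "ERROR [main] ...").
# $1: log file path, $2: number of entries to show (default 20).
recent_errors() {
    local logfile=$1 count=${2:-20}
    grep -E '^[[:space:]]*ERROR' "$logfile" | tail -n "$count"
}

# Usage:
#   recent_errors /var/log/cassandra/system.log
```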

Identify Disk Full Conditions

Database data is stored in the root filesystem, in subdirectories in the /var/lib/cassandra directory. This section describes how to determine root filesystem disk usage and the proportion of the root filesystem that is used by the database.

To display the amount of disk used by the root filesystem, issue the df -kh / command. For example:

admin@Analytics$ df -kh /
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/system-root   71G   31G   37G  46% /

In this example, 31 GB of the root filesystem is in use (46 percent), out of a total size of 71 GB. If the output indicates that 70 percent or more of the filesystem is being used, it is likely that the database has failed during the compaction process. Compaction is a required operation that is periodically performed to optimize database tables. This operation temporarily increases disk usage by a large percentage, sometimes 40 percent or more. To leave room for compaction, you can reduce the size of the database or increase disk storage.

Note: It is recommended that you keep disk usage at or below 60 percent to leave space for the database compaction process. For more information, see Troubleshoot Cassandra Database Filling Up Disk in Troubleshoot Analytics Disk Storage Issues.
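
The 60 percent and 70 percent thresholds described above can be checked with a short script. This is a sketch only: the function names and the labels ok, warning, and critical are hypothetical, and it assumes that df -P prints the usage percentage in the fifth column of the second output line.

```shell
#!/usr/bin/env bash
# Print the Use% value of the filesystem that holds the given path,
# as a bare integer (for example, 46).
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Classify a usage percentage against the thresholds in this article:
# at or below 60 is the recommended range, and 70 or more suggests that
# compaction may have run out of space. The labels are illustrative.
classify_usage() {
    local pct=$1
    if [ "$pct" -ge 70 ]; then
        echo "critical"
    elif [ "$pct" -gt 60 ]; then
        echo "warning"
    else
        echo "ok"
    fi
}

# Usage:
#   classify_usage "$(fs_use_percent /)"
```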

To display the amount of disk used by the Analytics database, issue the vsh dbstatus or du -sh /var/lib/cassandra/data command. For example:

admin@SDWAN-Versa-Analytics$ vsh dbstatus
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Owns (effective)  Host ID                               Token                                    Rack
UN  127.0.0.1  9.22 GB    100.0%            c155ed9e-b192-42df-bbc2-9c89f09cb6ad  0                                        RAC1

admin@SDWAN-Versa-Analytics$ du -sh /var/lib/cassandra/data
9.3G /var/lib/cassandra/data
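
To express the database size as a proportion of the root filesystem, the two measurements can be combined. The sketch below is one possible approach, with hypothetical function names. It assumes that du -sk and df -Pk both report sizes in 1 KB blocks, and it rounds the result down to a whole percentage.

```shell
#!/usr/bin/env bash
# Print what percentage `part` is of `total`, rounded down to an integer.
percent_of() {
    local part=$1 total=$2
    echo $(( part * 100 / total ))
}

# Print the percentage of the root filesystem's total size that is
# occupied by the database data directory.
# $1: data directory (defaults to the path used in this article).
db_share_of_root() {
    local dir=${1:-/var/lib/cassandra/data}
    local db_kb root_kb
    db_kb=$(du -sk "$dir" | awk '{ print $1 }')
    root_kb=$(df -Pk / | awk 'NR == 2 { print $2 }')
    percent_of "$db_kb" "$root_kb"
}

# Usage:
#   db_share_of_root
```

With the figures from the examples in this section, a 9.3 GB database on a 71 GB root filesystem occupies about 13 percent of the filesystem.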

To reduce the amount of disk used by the database, see Troubleshoot Cassandra Database Filling Up Disk in Troubleshoot Analytics Disk Storage Issues.

After you have reduced the percentage of disk used by the database, you can restart it, as described in Restart the Cassandra Database, below.

Troubleshoot Reachability Issues

To identify whether there are reachability issues between the nodes in the cluster, ping the listen address of each node and confirm that all required ports are open between the nodes. Listen addresses are used for internal communication between cluster nodes and are configured in the clustersetup.conf file during initial software configuration. For more information, see Set Up Analytics in Perform Initial Software Configuration. For a list of ports that must be open, see Firewall Requirements.

To verify that a port is reachable between nodes, issue the nc -zvw3 peer-node-listen-address port-number command. The following example shows that port 8983 at IP address 192.10.10.45 is reachable, but port 9042 is not reachable, from the current node:

admin@Analytics$ nc -zvw3 192.10.10.45  8983
Connection to 192.10.10.45 8983 port [tcp/*] succeeded!
admin@Analytics$ nc -zvw3 192.10.10.45  9042
nc: connect to 192.10.10.45 port 9042 (tcp) failed: Connection refused
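
When several ports need to be checked on a peer node, the nc command can be run in a loop. The following is a minimal sketch with a hypothetical function name, check_ports. The ports in the usage line are taken from the example above only. For the actual list of required ports, see Firewall Requirements.

```shell
#!/usr/bin/env bash
# Probe each listed TCP port on a peer node and report the result.
# $1: peer node listen address; remaining arguments: port numbers.
# Returns non-zero if any port is unreachable.
check_ports() {
    local host=$1 port failed=0
    shift
    for port in "$@"; do
        # -z: scan without sending data; -w3: give up after 3 seconds
        if nc -zw3 "$host" "$port" >/dev/null 2>&1; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   check_ports 192.10.10.45 8983 9042
```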

Resolve any port connection problems, and then restart the database using the procedure in Restart the Cassandra Database, below.

Troubleshoot Transient Errors

If the disk is not full and there are no reachability issues, the database might have a transient error. In this case, restart the database as described below.

Restart the Cassandra Database

The commands you use to restart the Cassandra database depend on whether the cluster is using the DSE or Fusion Analytics platform. To determine which platform a node is running, issue the dse -v command. If the output shows dse-4.5.x or dse-4.8.x, the node is running DSE. If an error message displays, the node is running Fusion.
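
The platform check can also be scripted. The sketch below uses a hypothetical function name, detect_platform, and assumes that the error case corresponds to a non-zero exit status from dse -v, for example when the dse command is not installed.

```shell
#!/usr/bin/env bash
# Report which Analytics platform this node runs.
# Assumption: `dse -v` exits with status 0 on a DSE node and with a
# non-zero status (for example, command not found) on a Fusion node.
detect_platform() {
    if dse -v >/dev/null 2>&1; then
        echo "DSE"
    else
        echo "Fusion"
    fi
}

# Usage:
#   detect_platform
```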

To restart the Cassandra database:

To restart the database in DSE:

admin@Analytics$ sudo service monit stop
admin@Analytics$ sudo service dse stop
admin@Analytics$ sudo service dse start
admin@Analytics$ sudo service monit start

(wait a few minutes)

admin@Analytics$ nodetool status
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Owns (effective)  Host ID                               Token    Rack
UN  127.0.0.1  9.16 GB    100.0%            c155ed9e-b192-42df-bbc2-9c89f09cb6ad  0        RAC1
...

The restart process can take up to 20 minutes. After the database restarts, the status should change to U.

To restart the database in Fusion:

admin@Analytics$ vsh db-restart   

(wait a few minutes)

admin@Analytics$ vsh dbstatus
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Owns (effective)  Host ID                               Token    Rack
UN  127.0.0.1  9.16 GB    100.0%            c155ed9e-b192-42df-bbc2-9c89f09cb6ad  0        RAC1
...

The restart process can take up to 20 minutes. After the database restarts, the status should change to U.
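
Instead of rerunning the status command by hand, you can poll it until the node reports U or the 20-minute window expires. This is a sketch under stated assumptions: the function name wait_for_up, the STATUS_CMD variable, and the 30-second polling interval are all hypothetical, and the node row is assumed to list the address in the second column.

```shell
#!/usr/bin/env bash
# Poll the database status until the given node address is reported as
# up (status code beginning with U), or until the timeout expires.
# $1: node address
# $2: timeout in seconds (default 1200, that is, 20 minutes)
# $3: seconds between polls (default 30)
# STATUS_CMD selects the status command. It defaults to "nodetool status";
# set it to "vsh dbstatus" on a Fusion node.
wait_for_up() {
    local addr=$1 timeout=${2:-1200} interval=${3:-30} waited=0
    while :; do
        if ${STATUS_CMD:-nodetool status} 2>/dev/null |
            awk -v a="$addr" '$2 == a && $1 ~ /^U/ { found = 1 } END { exit !found }'; then
            echo "$addr is up"
            return 0
        fi
        [ "$waited" -ge "$timeout" ] && break
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    echo "$addr is still not up after ${timeout}s" >&2
    return 1
}

# Usage:
#   wait_for_up 127.0.0.1
#   STATUS_CMD="vsh dbstatus" wait_for_up 127.0.0.1
```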

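If you are unable to restart the database, there might be corruption in database commits. Try to clear any pending database commits by issuing the following commands. Note that clearing the commit log discards any writes that have not yet been flushed to the data files.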

admin@Analytics$ sudo su
root@Analytics# cd /var/lib/cassandra/commitlog
root@Analytics# rm -rf *
root@Analytics# cd /var/lib/cassandra/saved_caches
root@Analytics# rm -rf *
root@Analytics# exit
admin@Analytics$

After clearing the commits, restart the database again.
If you are still unable to restart the database, contact Versa Networks Customer Support. When you open the case, attach the file /var/log/cassandra/system.log.

Supported Software Information

Releases 20.2 and later support all content described in this article.

Additional Information

Firewall Requirements
Perform Initial Software Configuration
Troubleshoot Analytics Disk Storage Issues