Check Point FW SPLAT ClusterXL Troubleshooting (NGX Versions)

In this post, let us walk through a step-by-step approach to troubleshooting a ClusterXL problem on Check Point firewalls.

Normally, this problem comes up after a reboot, an upgrade, or some change in the underlying network. When troubleshooting CPHA, the following commands are very helpful:

  • cphaprob state
  • cphaprob list
  • cphaprob -a if

The first command shows the CPHA status of each cluster member. Before digging into it, verify basic connectivity between the firewall interfaces: allow the cluster members to ping each other (if the policy does not already allow it) and ping each member from the other.

If connectivity exists, find out which firewall has the problem. It will normally be the one that ‘cphaprob state’ reports as Down:

Number     Unique Address  Assigned Load  State
1 (local)  10.0.0.1        100%           Active
2          10.0.0.2        0%             Down
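
If you have several clusters to look after, it can help to pick the unhealthy member out of this output with a script. Here is a minimal Python sketch that parses a saved copy of the table above. The function names and the regular expression are my own illustration, not anything shipped by Check Point, and the layout is assumed to match the sample shown here.

```python
import re

# Sample text copied from the 'cphaprob state' output shown above.
SAMPLE = """\
Number     Unique Address  Assigned Load  State
1 (local)  10.0.0.1        100%           Active
2          10.0.0.2        0%             Down
"""

# One member row: number, optional "(local)", IPv4 address, load %, state.
ROW = re.compile(
    r"^\s*(?P<number>\d+)\s*(?P<local>\(local\))?\s+"
    r"(?P<address>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<load>\d+)%\s+"
    r"(?P<state>.+?)\s*$"
)


def parse_cphaprob_state(text):
    """Return one dict per member row; header and other lines are skipped."""
    members = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            members.append({
                "number": int(match.group("number")),
                "local": match.group("local") is not None,
                "address": match.group("address"),
                "load": int(match.group("load")),
                "state": match.group("state"),
            })
    return members


def unhealthy_members(members):
    """Members whose state is neither Active nor Standby (assumed healthy)."""
    return [m for m in members if m["state"].lower() not in ("active", "standby")]


if __name__ == "__main__":
    for member in unhealthy_members(parse_cphaprob_state(SAMPLE)):
        print(member["address"], member["state"])
```

Treating only Active and Standby as healthy is an assumption on my part; adjust the tuple if your cluster mode reports other states that you consider normal.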

As you can see here, the second firewall is down. First, simply execute the ‘cpstop’ and ‘cpstart’ commands on that box. The CPHA should come back up.

Also execute ‘cphaprob -a if’, which should give you the interface status and the ClusterXL status.

If this doesn't work, try pushing the policy from the CMA/SC server and then execute cpstop / cpstart again. If this doesn’t clear the problem, reboot the box. In my experience, about 90% of CPHA problems are resolved by the steps above. If yours still isn't, read on.

The steps above assume that CPHA was working and suddenly stopped. If this is the first time you are setting it up, or some major policy/network change broke it, here is what you need to keep in mind.

  • CPHA uses multicast, so make sure the switches in the path don’t drop multicast traffic. If you are using Cisco gear, you might need to adjust the IGMP snooping configuration.
  • It also uses UDP port 8116, so make sure that port is open between the devices. (This is crucial if you have played around with the global properties and implied rules.)
  • FIB uses port 2010, so make sure that is also open between the firewalls.

After checking your switches, execute ‘cphaprob list’ and make sure both members have the same devices registered and that every device's state is OK. If a device shows the state ‘problem’, work on resolving that device's problem first. For example:

Device Name: FIB
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 9.3 sec
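
Comparing this output by eye on two members gets tedious, so here is a small Python sketch that parses saved ‘cphaprob list’ text, lists the devices that are not OK, and reports devices registered on only one member. The helper names are hypothetical, and the parsing assumes the simple "Key: value" layout of the sample above.

```python
# Sample text copied from the 'cphaprob list' entry shown above.
SAMPLE = """\
Device Name: FIB
Registration number: 4
Timeout: none
Current state: problem
Time since last report: 9.3 sec
"""


def parse_cphaprob_list(text):
    """Return one dict per device; a 'Device Name' line starts a new device."""
    devices = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            continue  # blank lines and headings without a value
        if key == "Device Name":
            devices.append({})
        if devices:
            devices[-1][key] = value
    return devices


def problem_devices(devices):
    """Names of devices whose current state is anything other than OK."""
    return [d["Device Name"] for d in devices
            if d.get("Current state", "").lower() != "ok"]


def registration_mismatch(devices_a, devices_b):
    """Device names that are registered on one member but not the other."""
    names_a = {d["Device Name"] for d in devices_a}
    names_b = {d["Device Name"] for d in devices_b}
    return sorted(names_a ^ names_b)


if __name__ == "__main__":
    print(problem_devices(parse_cphaprob_list(SAMPLE)))
```

Save the output from each member to a file, parse both, and pass the two lists to registration_mismatch to see whether the registrations really match.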

Then you can capture traffic with the ‘fw monitor’ command to make sure the firewalls are actually communicating with each other.

Use the following command:

fw monitor -e 'accept dport=8116;'

  1. You will see packets with source 0.0.0.0 and the network address of the interface's network as the destination, on port 8116.
  2. On port 2010 (change the dport to 2010 in the capture above), you should see FIB communicating between the firewalls.
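
If you would rather watch both ports in a single capture, the two filters can be joined with "or". The Python sketch below only builds the command string for you to paste on the firewall; the helper name is hypothetical, and the combined expression is my assumption based on the usual fw monitor filter syntax, so try it on a test box first.

```python
def fw_monitor_command(ports):
    """Build an 'fw monitor' command line matching any of the given dports."""
    checked = []
    for port in ports:
        port = int(port)
        if not 0 < port < 65536:
            raise ValueError(f"not a valid port: {port}")
        checked.append(port)
    if not checked:
        raise ValueError("at least one port is required")
    expression = " or ".join(f"dport={p}" for p in checked)
    return f"fw monitor -e 'accept {expression};'"


if __name__ == "__main__":
    # Ports taken from the post: 8116 (CPHA) and 2010 (FIB).
    print(fw_monitor_command([8116, 2010]))
```

With a single port the helper produces the same command as the one given above.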

You should also check /var/log/messages for error messages that can help you troubleshoot this further.

And yes, I forgot to mention: in this case, I found that FIB on one firewall had switched itself off after the upgrade, so I switched advanced routing back on using the ‘cpconfig’ command and all went well.
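
On a busy box the messages file is long, so a quick filter helps. This is a minimal Python sketch that keeps only the lines mentioning a few cluster-related words. The keyword list is purely my guess at useful search terms, not an official list of Check Point log tags, so extend it to suit what you see on your system.

```python
# Assumed search terms; adjust to whatever your logs actually contain.
KEYWORDS = ("cpha", "clusterxl", "fib")


def cluster_log_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, ignoring case."""
    wanted = [k.lower() for k in keywords]
    return [line.rstrip("\n") for line in lines
            if any(k in line.lower() for k in wanted)]


# Typical use on the firewall (the path comes from the post):
#     with open("/var/log/messages") as handle:
#         for line in cluster_log_lines(handle):
#             print(line)
```

A plain substring match like this will also catch unrelated lines that happen to contain the same letters, which is usually acceptable for a first pass.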
