Please review the following recommendations and best practices, which cover these major cluster topics:
- Planning and Installation
- Backup and Restore
Note: Always review the minimum system requirements. The specifics of your implementation may require more resources than outlined here. Consult your Customer Account Manager or Technical Support for more information.
Know your sizing requirements before you install
The sizing of your server cluster is critical to the stability and performance of your integration solution. Sizing refers to the volume of transactions you expect, as well as the size of the files/data that are part of those transactions.
Generally, the larger the file size, the greater the impact on memory. Other factors, such as file type, also help determine which sizing requirements are best for you, but the important thing is to allocate as much memory as possible.
- Evaluate the overall performance and capacity requirements for your data traffic.
- The larger the file size, the more impact it has on memory.
- Good planning prevents problems and poor performance down the line.
Understand the importance of user permissions before you install
Pay careful attention to the installation procedures, especially when assigning permissions to the files and directories that are part of a server cluster. It is critical to understand which types of access should be assigned before installation occurs.
This is especially important when installing a server cluster on a Linux system, where only two users can change the permissions of a file or directory:
- The owner of the file or directory
- The root user*
* The exception is the postgres user, which should always be used for the Postgres database. Giving root ownership to the database may cause permission problems.
Under most circumstances, only a System Administrator should have access to the root account (and use it to maintain the server cluster environment). The root user is a superuser who has permission to do anything and everything.
Normal users should typically have access to files and directories in their home directory only.
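As a rough illustration of the ownership rule above, the following Python sketch reports who owns a path on Linux and whether the current user (the owner or root) would be able to change its permissions. It is only a pre-installation sanity check, not part of the Clarify installer; the function names are hypothetical, and the example path in the usage note is an assumption.

```python
import os
import pwd
import stat


def owner_name(uid: int) -> str:
    """Return the account name for a uid, or the uid itself if no account exists."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def describe_permissions(path: str) -> dict:
    """Summarize the owner and mode of a file or directory (hypothetical helper)."""
    info = os.stat(path)
    return {
        "owner": owner_name(info.st_uid),
        "mode": stat.filemode(info.st_mode),  # e.g. "drwxr-x---"
        "world_writable": bool(info.st_mode & stat.S_IWOTH),
    }


def can_change_permissions(path: str) -> bool:
    """True if the current user is root or owns the path."""
    effective_uid = os.geteuid()
    return effective_uid == 0 or effective_uid == os.stat(path).st_uid
```

For example, calling `describe_permissions("/opt/clarify")` (an assumed installation directory) before installing shows whether the directory is owned by the account you expect, and `can_change_permissions` tells you whether the account running the check could correct it.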
For more information
Refer to the Pre-Installation: Set up your environment section of the Clarify Server Installation Guide. Topics include:
- Create an installation directory and enable sharing
- Create a domain user with administrative privileges
- Create symbolic links
- Ensure certain power-saving settings are disabled
Note: When installing the server cluster database on Linux, the installation must be done as root.
Troubleshooting common problems and recommended solutions
Problem: ClassNotFoundExceptions in Clarify logs.
Possible Cause and solution: This is due to a problem communicating with the Clarify server share and shared workspace. Antivirus software locks files and is often the cause of this error. As stated in the Cluster installation guide, do not run virus protection software over the Server, Share, and Database directories. If it is running, turn it off.
Also, be aware that other issues may cause this error, including replication, snapshots, VMotion and other automated load balancing, and network failures. Anything that causes downtime on the Share machine or interrupts the cluster's communication with it can trigger the error.
Problem: Node excluded due to network failures.
Possible Cause and solution: Network card power settings may be set to sleep.
As stated in the Cluster installation guide, default power save settings on Windows servers may disable network adapters, which interrupts the Server Cluster.
Always make sure that the power save settings for network adapters (on all servers) are disabled. This can usually be done through the Device Manager.
Problem: Node excluded due to replication.
Possible Cause and solution: Be aware that replication tools can cause a server node to inadvertently stop or pause.
Problem: Node excluded due to VM snapshots and VMotion.
Possible Cause and solution: Be aware that snapshots and VMotion can cause a virtual machine server node to inadvertently stop or pause.
Problem: Cluster appears to be too slow or experiences slowness.
Possible Cause and solution: The sharing of CPU resources is often the cause. CPU shares are not equivalent to percentages of CPU resources. Shares define the relative importance of a workload in relation to other workloads. When you assign CPU shares to a project, the absolute number of shares matters less than how that number compares with the shares of other projects. You must also take into account how many of those other projects will be competing with it for CPU resources at the same time.
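The arithmetic behind relative shares can be sketched in a few lines of Python. This assumes a typical share-based scheduler, where a workload's portion of CPU under contention is its shares divided by the total shares of the workloads actively competing. The workload names and share values are made up for illustration.

```python
from __future__ import annotations


def cpu_fractions(shares: dict[str, int], active: set[str]) -> dict[str, float]:
    """Return each active workload's expected fraction of CPU under contention."""
    total = sum(shares[name] for name in active)
    if total <= 0:
        raise ValueError("active workloads must hold a positive number of shares")
    return {name: shares[name] / total for name in active}


# Hypothetical share assignments; the numbers only have meaning relative to each other.
shares = {"clarify": 2000, "reporting": 1000, "batch": 1000}

# All three workloads are busy: clarify holds 2000 of 4000 competing shares.
print(cpu_fractions(shares, {"clarify", "reporting", "batch"}))

# The batch workload is idle: clarify now holds 2000 of 3000 competing shares.
print(cpu_fractions(shares, {"clarify", "reporting"}))
```

With the same 2000 shares, the clarify workload can expect about half of the CPU when three workloads compete and about two thirds when only two do, which is why the share count alone says little about the performance you will see.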
Problem: Out of memory.
Possible Cause and solution: Sharing of memory between Clarify and other applications is not recommended. The server cluster needs exclusive memory.
What happens to jobs being processed when an active server node fails?
Whenever a server node experiences a failure event, there may be Business Processes and related tasks caught in-process. The effect on these processes varies: tasks may abort or only partially complete. The important thing is to know how to identify any failed processes, and the recommended steps for recovery.
Typically, the following actions should take place:
- Check the Studio’s Admin Console Auditor for failures and specific details as to which processes and tasks may have failed.
- Re-process any Business Process that did not complete.
- Check for other, non-Clarify systems/processes that may need to be manually restarted or further analyzed.
Server cluster backups
Cleo recommends backing up the database and file system in one comprehensive process. It is imperative that these are both backed up together.
- Database backup: creates a backup of the database using a script provided by Cleo; a corresponding script can then restore the database from the backup.
- File System backup: requires a third-party tool of your choice (or a set of scripts that copy the files to another location); the tool or scripts must back up the important Clarify file system resources.
- Replication: can duplicate your entire Clarify environment in near real time. This is accomplished by replicating the database via a PostgreSQL tool and by replicating Clarify file system resources via a third-party replication tool.
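For the "set of scripts" option mentioned under File System backup, a minimal Python sketch might copy a directory tree into a timestamped folder, as shown here. The function name, label, and paths are hypothetical, and the sketch does not replace the Cleo-provided database backup script, which should run as part of the same backup process. Follow the Backup and Restore procedures for which directories to include and when it is safe to copy them.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_directory(source: Path, backup_root: Path, label: str) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{label}-{stamp}"
    # copytree creates the target (and any missing parents) and fails
    # if the target already exists, so an earlier backup is never overwritten.
    shutil.copytree(source, target)
    return target


# Example call with assumed locations (adjust to your environment):
# backup_directory(Path("/opt/clarify/share"), Path("/backups/clarify"), "share")
```

Keeping the timestamp in the folder name makes it easier to pair each file system copy with the database backup taken in the same run.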
For more information
- How to gracefully shut down a Server Cluster for an OS backup or Windows update (off-line update)
- Backup and Restore (including on-line procedures)