Managing a MySQL Cluster involves a number of tasks, the first of which is to configure and start MySQL Cluster. This is covered in Section 15.3, “MySQL Cluster Configuration”, and Section 15.5, “Process Management in MySQL Cluster”.
The following sections cover the management of a running MySQL Cluster.
There are essentially two methods of actively managing a running MySQL Cluster. The first of these is through the use of commands entered into the management client whereby cluster status can be checked, log levels changed, backups started and stopped, and nodes stopped and started. The second method involves studying the contents of the cluster log
ndb_; this is usually found in the management server's
DataDir directory, but this location can be overridden using the
LogDestination option — see Section 188.8.131.52, “Defining the Management Server”, for details. (Recall that
node_id represents the unique identifier of the node whose activity is being logged.) The cluster log contains event reports generated by ndbd. It is also possible to send cluster log entries to a Unix system log.
In addition, some aspects of the cluster's operation can be monitored from an SQL node using the
SHOW ENGINE NDB STATUS statement. See Section 184.108.40.206, “
SHOW ENGINE Syntax”, for more information.
This section provides a simplified outline of the steps involved when MySQL Cluster data nodes are started. More complete information can be found in MySQL Cluster Start Phases.
These phases are the same as those reported in the output from the
command in the management client. (See Section 15.6.2, “Commands in the MySQL Cluster Management Client”, for more information about this command.)
Start types. There are several different startup types and modes, as shown here:
Initial Start. The cluster starts with a clean filesystem on all data nodes. This occurs either when the cluster is started for the very first time, or when it is restarted using the
Disk Data files are not removed when restarting a node using
System Restart. The cluster starts and reads data stored in the data nodes. This occurs when the cluster has been shut down after having been in use, when it is desired for the cluster to resume operations from the point where it left off.
Node Restart. This is the online restart of a cluster node while the cluster itself is running.
Initial Node Restart. This is the same as a node restart, except that the node is reinitialized and started with a clean filesystem.
Setup and initialization (Phase -1). Prior to startup, each data node (ndbd process) must be initialized. Initialization consists of the following steps:
Obtain a node ID
Fetch configuration data
Allocate ports to be used for inter-node communications
Allocate memory according to settings obtained from the configuration file
When a data node or SQL node first connects to the management node, it reserves a cluster node ID. To make sure that no other node allocates the same node ID, this ID is retained until the node has managed to connect to the cluster and at least one ndbd reports that this node is connected. This retention of the node ID is guarded by the connection between the node in question and ndb_mgmd.
Normally, in the event of a problem with the node, the node disconnects from the management server, the socket used for the connection is closed, and the reserved node ID is freed. However, if a node is disconnected abruptly — for example, due to a hardware failure in one of the cluster hosts, or because of network issues — the normal closing of the socket by the operating system may not take place. In this case, the node ID continues to be reserved and not released until a TCP timeout occurs 10 or so minutes later.
To take care of this problem, you can use
PURGE STALE SESSIONS. Running this statement forces all reserved node IDs to be checked; any that are not being used by nodes actually connected to the cluster are then freed.
Beginning with MySQL 5.1.11, timeout handling of node ID assignments is implemented. This performs the ID usage checks automatically after approximately 20 seconds, so that
PURGE STALE SESSIONS should no longer be necessary in a normal Cluster start.
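The reservation behaviour described above can be illustrated with a small model. This is only a sketch and not the management server's implementation: the class NodeIdTable and its method names are hypothetical, and only the behaviour (reserve on connect, free on a clean disconnect, keep the reservation after an abrupt disconnect until a check frees it) is taken from the text.

```python
class NodeIdTable:
    """Toy model of node ID reservation (hypothetical, for illustration)."""

    def __init__(self):
        # node_id -> True while the reserving connection is believed alive
        self._reserved = {}

    def reserve(self, node_id):
        """Reserve node_id; refuse if it is already reserved."""
        if node_id in self._reserved:
            return False
        self._reserved[node_id] = True
        return True

    def disconnect(self, node_id, clean=True):
        """A clean disconnect closes the socket and frees the ID at once.
        An abrupt one leaves the reservation in place."""
        if clean:
            self._reserved.pop(node_id, None)
        else:
            self._reserved[node_id] = False  # stale: the holder is gone

    def purge_stale_sessions(self):
        """Free every reserved ID whose holder is no longer connected."""
        stale = [n for n, alive in self._reserved.items() if not alive]
        for n in stale:
            del self._reserved[n]
        return sorted(stale)


table = NodeIdTable()
table.reserve(3)
table.disconnect(3, clean=False)      # for example, a host hardware failure
blocked = not table.reserve(3)        # the ID is still reserved
freed = table.purge_stale_sessions()  # the check frees it
```

In this model, the automatic timeout handling corresponds to calling purge_stale_sessions() on a timer rather than by hand.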
After each data node has been initialized, the cluster startup process can proceed. The stages which the cluster goes through during this process are listed here:
Phase 0. The
NDBCNTR blocks start (see
NDB Kernel Blocks). The cluster filesystem is cleared, if the cluster was started with the
Phase 1. In this stage, all remaining
NDB kernel blocks are started. Cluster connections are set up, inter-block communications are established, and Cluster heartbeats are started. In the case of a node restart, API node connections are also checked.
When one or more nodes hang in Phase 1 while the remaining node or nodes hang in Phase 2, this often indicates network problems. One possible cause of such issues is one or more cluster hosts having multiple network interfaces. Another common source of problems causing this condition is the blocking of TCP/IP ports needed for communications between cluster nodes. In the latter case, this is often due to a misconfigured firewall.
Phase 2. The
NDBCNTR kernel block checks the states of all existing nodes. The master node is chosen, and the cluster schema file is initialized.
Phase 3. The
DBTC kernel blocks set up communications between them. The startup type is determined; if this is a restart, the
DBDIH block obtains permission to perform the restart.
Phase 4. For an initial start or initial node restart, the redo log files are created. The number of these files is equal to
For a system restart:
Read schema or schemas.
Read data from the local checkpoint.
Apply all redo information until the latest restorable global checkpoint has been reached.
For a node restart, find the tail of the redo log.
Phase 5. Most of the database-related portion of a data node start is performed during this phase. For an initial start or system restart, a local checkpoint is executed, followed by a global checkpoint. Periodic checks of memory usage begin during this phase, and any required node takeovers are performed.
Phase 6. In this phase, node groups are defined and set up.
Phase 7. The arbitrator node is selected and begins to function. The next backup ID is set, as is the backup disk write speed. Nodes reaching this start phase are marked as
Started. It is now possible for API nodes (including SQL nodes) to connect to the cluster.
Phase 8. If this is a system restart, all indexes are rebuilt (by
Phase 9. The node internal startup variables are reset.
Phase 100 (OBSOLETE). Formerly, it was at this point during a node restart or initial node restart that API nodes could connect to the node and begin to receive events. Currently, this phase is empty.
Phase 101. At this point in a node restart or initial node restart, event delivery is handed over to the node joining the cluster. The newly-joined node takes over responsibility for delivering its primary data to subscribers. This phase is also referred to as
SUMA handover phase.
After this process is completed for an initial start or system restart, transaction handling is enabled. For a node restart or initial node restart, completion of the startup process means that the node may now act as a transaction coordinator.
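The note under Phase 1 names blocked TCP/IP ports as a common cause of nodes hanging in Phases 1 and 2. A quick reachability test from one cluster host to another can help rule this out. The following is a minimal sketch using only the Python standard library; the function name port_reachable is hypothetical, and the host and port to test depend on your own configuration.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A result of False says only that the connection attempt failed from the host where the check was run; it does not distinguish a firewall rule from a process that is not listening.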
In addition to the central configuration file, a cluster may also be controlled through a command-line interface available through the management client ndb_mgm. This is the primary administrative interface to a running cluster.
Commands for the event logs are given in Section 15.6.3, “Event Reports Generated in MySQL Cluster”; commands for creating backups and restoring from backup are provided in Section 15.7, “On-line Backup of MySQL Cluster”.
The management client has the following basic commands. In the listing that follows,
node_id denotes either a database node ID or the keyword
ALL, which indicates that the command should be applied to all of the cluster's data nodes.
Displays information on all available commands.
Displays information on the cluster's status.
Note: In a cluster where multiple management nodes are in use, this command displays information only for data nodes that are actually connected to the current management server.
Brings online the data node identified by
node_id (or all data nodes).
Beginning with MySQL 5.0.19, this command can also be used to bring individual management nodes online. Note:
ALL START continues to affect data nodes only.
Important: To use this command to bring a data node online, the data node must have been started using ndbd --nostart or ndbd -n.
Stops the data node identified by
node_id (or all data nodes).
Beginning with MySQL 5.0.19, this command can also be used to stop individual management nodes. Note:
ALL STOP continues to affect data nodes only.
A node affected by this command disconnects from the cluster, and its associated ndbd or ndb_mgmd process terminates.
node_id RESTART [-n] [-i] [-a]
Restarts the data node identified by
node_id (or all data nodes).
Using the -i option with RESTART causes the data node to perform an initial restart; that is, the node's filesystem is deleted and recreated. The effect is the same as that obtained from stopping the data node process and then starting it again using ndbd --initial from the system shell.
Using the -n option causes the data node process to be restarted, but the data node is not actually brought online until the appropriate START command is issued. The effect of this option is the same as that obtained from stopping the data node and then starting it again using ndbd --nostart or ndbd -n from the system shell.
Using the -a option causes all current transactions relying on this node to be aborted. No GCP check is done when the node rejoins the cluster.
Displays status information for the data node identified by
node_id (or for all data nodes).
ENTER SINGLE USER MODE
Enters single user mode, whereby only the MySQL server identified by the node ID
node_id is allowed to access the database.
Important: Do not attempt to have data nodes join the cluster while it is running in single user mode. Doing so can cause subsequent multiple node failures. Beginning with MySQL 5.1.12, it is no longer possible to add nodes while in single user mode. (See Bug#20395 for more information.)
EXIT SINGLE USER MODE
Exits single user mode, allowing all SQL nodes (that is, all running mysqld processes) to access the database.
Terminates the management client.
This command does not affect any nodes connected to the cluster.
Shuts down all cluster data nodes and management nodes. To exit the management client after this has been done, use
This command does not shut down any SQL nodes or API nodes that are connected to the cluster.
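Management client commands can also be issued non-interactively, which is convenient for scripting. The sketch below builds the command strings described in this section and the argument list for a one-shot ndb_mgm invocation. The helper names are hypothetical; the sketch assumes that your ndb_mgm build supports the -e (execute) and --ndb-connectstring options, which you should confirm with ndb_mgm --help.

```python
def restart_command(node_id, nostart=False, initial=False, abort=False):
    """Build a 'node_id RESTART [-n] [-i] [-a]' command string.

    node_id may be a data node ID or the keyword 'ALL'.
    """
    parts = [str(node_id), "RESTART"]
    if nostart:
        parts.append("-n")
    if initial:
        parts.append("-i")
    if abort:
        parts.append("-a")
    return " ".join(parts)


def ndb_mgm_argv(command, connect_string=None):
    """Build an argument list that runs one management client command.

    Assumes ndb_mgm accepts -e and --ndb-connectstring (check your version).
    """
    argv = ["ndb_mgm"]
    if connect_string is not None:
        argv.append("--ndb-connectstring=" + connect_string)
    argv += ["-e", command]
    return argv


argv = ndb_mgm_argv(restart_command(5, initial=True))
# To execute it: subprocess.run(argv, check=True)
```

Passing the command as a single list element avoids shell quoting problems with multi-word commands such as ENTER SINGLE USER MODE 5.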
In this section, we discuss the types of event logs provided by MySQL Cluster, and the types of events that are logged.
MySQL Cluster provides two types of event log:
The cluster log, which includes events generated by all cluster nodes. The cluster log is the log recommended for most uses because it provides logging information for an entire cluster in a single location.
By default, the cluster log is saved to a file named
node_id is the node ID of the management server) in the same directory where the ndb_mgm binary resides.
Cluster logging information can also be sent to
stdout or a
syslog facility in addition to or instead of being saved to a file, as determined by the values set for the
LogDestination configuration parameters. See Section 220.127.116.11, “Defining the Management Server”, for more information about these parameters.
Node logs are local to each node.
Output generated by node event logging is written to the file
node_id is the node's node ID) in the node's
DataDir. Node event logs are generated for both management nodes and data nodes.
Node logs are intended to be used only during application development, or for debugging application code.
Both types of event logs can be set to log different subsets of events.
Category: This can be any one of the following values:
Priority: This is represented by one of the numbers from 1 to 15 inclusive, where 1 indicates “most important” and 15 “least important.”
Severity Level: This can be any one of the following values:
Both the cluster log and the node log can be filtered on these properties.
The format used in the cluster log is as shown here:
2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 1: Data usage is 2%(60 32K pages of total 2560) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 1: Index usage is 1%(24 8K pages of total 2336) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 1: Resource 0 min: 0 max: 639 curr: 0 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 2: Data usage is 2%(76 32K pages of total 2560) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 2: Index usage is 1%(24 8K pages of total 2336) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 2: Resource 0 min: 0 max: 639 curr: 0 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 3: Data usage is 2%(58 32K pages of total 2560) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 3: Index usage is 1%(25 8K pages of total 2336) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 3: Resource 0 min: 0 max: 639 curr: 0 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 4: Data usage is 2%(74 32K pages of total 2560) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 4: Index usage is 1%(25 8K pages of total 2336) 2007-01-26 19:35:55 [MgmSrvr] INFO -- Node 4: Resource 0 min: 0 max: 639 curr: 0 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 4: Node 9 Connected 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 1: Node 9 Connected 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 1: Node 9: API version 5.1.15 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 2: Node 9 Connected 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 2: Node 9: API version 5.1.15 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 3: Node 9 Connected 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 3: Node 9: API version 5.1.15 2007-01-26 19:39:42 [MgmSrvr] INFO -- Node 4: Node 9: API version 5.1.15 2007-01-26 19:59:22 [MgmSrvr] ALERT -- Node 2: Node 7 Disconnected 2007-01-26 19:59:22 [MgmSrvr] ALERT -- Node 2: Node 7 Disconnected
Each line in the cluster log contains the following information:
A timestamp in
The type of node which is performing the logging. In the cluster log, this is always
The severity of the event.
The ID of the node reporting the event.
A description of the event. The most common types of events to appear in the log are connections and disconnections between different nodes in the cluster, and when checkpoints occur. In some cases, the description may contain status information.
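Because every cluster log line carries the same fields, the log is easy to process with a script. The following sketch splits a line into the fields listed above. The regular expression is inferred from the sample output shown in this section and is an assumption, not a format specification; the function name parse_cluster_log_line is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the sample cluster log lines; an assumption only.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<source>\w+)\] "
    r"(?P<severity>\w+) +-- "
    r"Node (?P<node>\d+): (?P<text>.*)$"
)


def parse_cluster_log_line(line):
    """Return the fields of one cluster log line as a dict, or None."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "source": m.group("source"),      # type of node doing the logging
        "severity": m.group("severity"),  # for example INFO or ALERT
        "node_id": int(m.group("node")),  # node reporting the event
        "text": m.group("text"),          # description of the event
    }


entry = parse_cluster_log_line(
    "2007-01-26 19:59:22 [MgmSrvr] ALERT -- Node 2: Node 7 Disconnected"
)
```

Lines that do not match return None rather than raising an error, so a script can skip anything unexpected and continue.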
Turns the cluster log on.
Turns the cluster log off.
Provides information about cluster log settings.
category events with priority less than or equal to
threshold in the cluster log.
Toggles cluster logging of events of the specified
The following table describes the default setting (for all data nodes) of the cluster log category threshold. If an event has a priority with a value lower than or equal to the priority threshold, it is reported in the cluster log.
Note that events are reported per data node, and that the threshold can be set to different values on different nodes.
|Category||Default threshold (All data nodes)|
STATISTICS category can provide a great deal of useful data. See Section 18.104.22.168, “Using
CLUSTERLOG STATISTICS”, for more information.
Thresholds are used to filter events within each category. For example, a STARTUP event with a priority of 3 is not logged unless the threshold for STARTUP is set to 3 or higher. Only events with priority 3 or lower are sent if the threshold is 3.
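The threshold rule amounts to a single comparison: an event is reported when its priority value is less than or equal to the threshold for its category. The sketch below states that rule in code. The function name and the threshold values used in the example are hypothetical and are not the defaults for any category.

```python
def is_reported(priority, threshold):
    """An event is reported if its priority does not exceed the threshold."""
    return priority <= threshold


# Hypothetical per-category thresholds, for illustration only.
thresholds = {"STARTUP": 3, "STATISTICS": 15}

startup_p3 = is_reported(3, thresholds["STARTUP"])     # priority 3, threshold 3
startup_p4 = is_reported(4, thresholds["STARTUP"])     # priority 4, threshold 3
stats_p9 = is_reported(9, thresholds["STATISTICS"])    # threshold 15 passes all
```

Raising a threshold therefore admits more events, and a threshold of 15 admits every event in the category.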
The following table shows the event severity levels. (Note: These correspond to Unix
syslog levels, except for
LOG_NOTICE, which are not used or mapped.)
|1||A condition that should be corrected immediately, such as a corrupted system database|
|2||Critical conditions, such as device errors or insufficient resources|
|3||Conditions that should be corrected, such as configuration errors|
|4||Conditions that are not errors, but that might require special handling|
|6||Debugging messages used for |
Event severity levels can be turned on or off (using
CLUSTERLOG FILTER — see above). If a severity level is turned on, then all events with a priority less than or equal to the category thresholds are logged. If the severity level is turned off then no events belonging to that severity level are logged.
An event report reported in the event logs has the following format:
09:19:30 2005-07-24 [NDB] INFO -- Node 4 Start phase 4 completed
This section discusses all reportable events, ordered by category and severity level within each category.
These events are associated with connections between Cluster nodes.
|data nodes connected||8||Data nodes connected|
|data nodes disconnected||8||Data nodes disconnected|
|Communication closed||8||SQL node or data node connection closed|
|Communication opened||8||SQL node or data node connection opened|
The logging messages shown here are associated with checkpoints.
|LCP stopped in calc keep GCI||0||LCP stopped|
|Local checkpoint fragment completed||11||LCP on a fragment has been completed|
|Global checkpoint completed||10||GCP finished|
|Global checkpoint started||9||Start of GCP: REDO log is written to disk|
|Local checkpoint completed||8||LCP completed normally|
|Local checkpoint started||7||Start of LCP: data written to disk|
|Report undo log blocked||7||UNDO logging blocked; buffer near overflow|
The following events are generated in response to the startup of a node or of the cluster and of its success or failure. They also provide information relating to the progress of the startup process, including information concerning logging activities.
|Internal start signal received STTORRY||15||Blocks received after completion of restart|
|Undo records executed||15|
|New REDO log started||10||GCI keep |
|New log started||10||Log part |
|Node has been refused for inclusion in the cluster||8||Node cannot be included in cluster due to misconfiguration, inability to establish communication, or other problem|
|data node neighbors||8||Shows neighboring data nodes|
|data node start phase ||4||A data node start phase has been completed|
|Node has been successfully included into the cluster||3||Displays the node, managing node, and dynamic ID|
|data node start phases initiated||1||NDB Cluster nodes starting|
|data node all start phases completed||1||NDB Cluster nodes started|
|data node shutdown initiated||1||Shutdown of data node has commenced|
|data node shutdown aborted||1||Unable to shut down data node normally|
The following events are generated when restarting a node and relate to the success or failure of the node restart process.
|Node failure phase completed||8||Reports completion of node failure phases|
|Node has failed, node state was ||8||Reports that a node has failed|
|Report arbitrator results||2||There are eight different possible results for arbitration attempts: |
|Completed copying a fragment||10|
|Completed copying of dictionary information||8|
|Completed copying distribution information||8|
|Starting to copy fragments||8|
|Completed copying all fragments||8|
|GCP takeover started||7|
|GCP takeover completed||7|
|LCP takeover started||7|
|LCP takeover completed (state = ||7|
|Report whether an arbitrator is found or not||6||There are seven different possible outcomes when seeking an arbitrator: |
The following events are of a statistical nature. They provide information such as numbers of transactions and other operations, amount of data sent or received by individual nodes, and memory usage.
|Report job scheduling statistics||9||Mean internal job scheduling statistics|
|Sent number of bytes||9||Mean number of bytes sent to node |
|Received # of bytes||9||Mean number of bytes received from node |
|Report transaction statistics||8||Numbers of: transactions, commits, reads, simple reads, writes, concurrent operations, attribute information, and aborts|
|Report operations||8||Number of operations|
|Report table create||7|
|Memory usage||5||Data and index memory usage (80%, 90%, and 100%)|
These events relate to Cluster errors and warnings. The presence of one or more of these generally indicates that a major malfunction or failure has occurred.
|Dead due to missed heartbeat||8||Node |
|Missed heartbeats||8||Node |
|General warning events||2|
These events provide general information about the state of the cluster and activities associated with Cluster maintenance, such as logging and heartbeat transmission.
|Sent heartbeat||12||Heartbeat sent to node |
|Create log bytes||11||Log part, log file, MB|
|General information events||2|
The NDB management client's
CLUSTERLOG STATISTICS command can provide a number of useful statistics in its output. The following statistics are reported by the transaction coordinator:
|Statistic||Description (Number of...)|
|Transactions attempted with this node as coordinator|
|Transactions committed with this node as coordinator|
|Primary key reads (all)|
|Primary key reads reading the latest committed value|
|Primary key writes (includes all |
|Data words used to describe all reads and writes received|
|All concurrent operations ongoing at the moment the report is taken|
|Transactions with this node as coordinator that were aborted|
Read any incoming messages from sockets into a job buffer.
Check whether there are any timed messages to be executed; if so, put these into the job buffer as well.
Execute (in a loop) any messages in the job buffer.
Send any distributed messages that were generated by executing the messages in the job buffer.
Wait for any new incoming messages.
The number of loops executed in the third step is reported as the
Mean Loop Counter. This statistic increases in size as the utilisation of the TCP/IP buffer improves. You can use this to monitor performance as you add new processes to the cluster.
The Mean send size and
Mean receive size statistics allow you to gauge the efficiency of writes and reads (respectively) between nodes. These values are given in bytes. Higher values mean a lower cost per byte sent or received; the maximum is 64k.
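The five steps listed above can be pictured with a toy model. This is a sketch for illustration only and not NDB code: the function run_scheduler and its arguments are hypothetical, each element of incoming_batches stands for the messages read from sockets in one pass, and the value returned as the mean is this model's reading of the Mean Loop Counter (the average number of step-three loops per pass).

```python
from collections import deque


def run_scheduler(incoming_batches, timed_messages=()):
    """Toy model of the five-step loop; returns (mean loops, sent messages)."""
    timed = deque(timed_messages)
    loop_counts = []
    sent = []
    for batch in incoming_batches:
        job_buffer = deque(batch)              # 1. read incoming messages
        while timed:                           # 2. add any timed messages
            job_buffer.append(timed.popleft())
        loops = 0
        outgoing = []
        while job_buffer:                      # 3. execute messages in a loop
            message = job_buffer.popleft()
            outgoing.append("reply:" + message)
            loops += 1
        sent.extend(outgoing)                  # 4. send generated messages
        loop_counts.append(loops)
        # 5. wait for new messages: modelled by moving to the next batch
    mean = sum(loop_counts) / len(loop_counts) if loop_counts else 0.0
    return mean, sent


mean_loops, sent_messages = run_scheduler([["a", "b"], ["c"]], timed_messages=["t"])
```

In this model, more messages arriving per pass gives a higher mean, which matches the statement that the statistic grows as buffer utilisation improves.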
To cause all cluster log statistics to be logged, you can use the following command in the
NDB management client:
ALL CLUSTERLOG STATISTICS=15
Note: Setting the threshold for
STATISTICS to 15 causes the cluster log to become very verbose, and to grow quite rapidly in size, in direct proportion to the number of cluster nodes and the amount of activity on the cluster.
Single user mode allows the database administrator to restrict access to the database system to a single API node, such as a MySQL server (SQL node) or an instance of ndb_restore. When entering single user mode, connections to all other API nodes are closed gracefully and all running transactions are aborted. No new transactions are permitted to start.
Once the cluster has entered single user mode, only the designated API node is granted access to the database.
You can use the ALL STATUS command to see when the cluster has entered single user mode.
ENTER SINGLE USER MODE 5
After this command has executed and the cluster has entered single user mode, the API node whose node ID is
5 becomes the cluster's only permitted user.
The node specified in the preceding command must be an API node; an attempt to specify any other type of node is rejected.
Note: When the preceding command is invoked, all transactions running on the designated node are aborted, the connection is closed, and the server must be restarted.
The command EXIT SINGLE USER MODE changes the state of the cluster's data nodes from single user mode to normal mode. API nodes — such as MySQL Servers — waiting for a connection (that is, waiting for the cluster to become ready and available), are again permitted to connect. The API node denoted as the single-user node continues to run (if still connected) during and after the state change.
EXIT SINGLE USER MODE
There are two recommended ways to handle a node failure when running in single user mode:
Method 1:
Finish all single user mode transactions
Issue the EXIT SINGLE USER MODE command
Restart the cluster's data nodes
Method 2:
Restart database nodes prior to entering single user mode.
This section discusses several SQL statements that can prove useful in managing and monitoring a MySQL server that is connected to a MySQL Cluster, and in some cases provide information about the cluster itself.
SHOW ENGINE NDB STATUS,
SHOW ENGINE NDBCLUSTER STATUS
The output of this statement contains information about the server's connection to the cluster, creation and usage of MySQL Cluster objects, and binary logging for MySQL Cluster replication.
See Section 22.214.171.124, “
SHOW ENGINE Syntax”, for a usage example and more detailed information.
SHOW ENGINES [LIKE 'NDB%']
This statement can be used to determine whether or not clustering support is enabled in the MySQL server, and if so, whether it is active.
See Section 126.96.36.199, “
SHOW ENGINES Syntax”, for more detailed information.
SHOW VARIABLES LIKE 'NDB%'
This statement provides a list of most server system variables relating to the
NDB storage engine, and their values, as shown here:
SHOW VARIABLES LIKE 'NDB%';
+-------------------------------------+-------+
| Variable_name                       | Value |
+-------------------------------------+-------+
| ndb_autoincrement_prefetch_sz       | 32    |
| ndb_cache_check_time                | 0     |
| ndb_extra_logging                   | 0     |
| ndb_force_send                      | ON    |
| ndb_index_stat_cache_entries        | 32    |
| ndb_index_stat_enable               | OFF   |
| ndb_index_stat_update_freq          | 20    |
| ndb_report_thresh_binlog_epoch_slip | 3     |
| ndb_report_thresh_binlog_mem_usage  | 10    |
| ndb_use_copying_alter_table         | OFF   |
| ndb_use_exact_count                 | ON    |
| ndb_use_transactions                | ON    |
+-------------------------------------+-------+
See Section 5.2.3, “System Variables”, for more information.
SHOW STATUS LIKE 'NDB%'
This statement shows at a glance whether or not the MySQL server is acting as a cluster SQL node, and if so, it provides the MySQL server's cluster node ID, the hostname and port for the cluster management server to which it is connected, and the number of data nodes in the cluster, as shown here:
SHOW STATUS LIKE 'NDB%';
+--------------------------+---------------+
| Variable_name            | Value         |
+--------------------------+---------------+
| Ndb_cluster_node_id      | 10            |
| Ndb_config_from_host     | 192.168.0.103 |
| Ndb_config_from_port     | 1186          |
| Ndb_number_of_data_nodes | 4             |
+--------------------------+---------------+
If the MySQL server was built with clustering support, but it is not connected to a cluster, all rows in the output of this statement contain a zero or an empty string:
SHOW STATUS LIKE 'NDB%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Ndb_cluster_node_id      | 0     |
| Ndb_config_from_host     |       |
| Ndb_config_from_port     | 0     |
| Ndb_number_of_data_nodes | 0     |
+--------------------------+-------+
See also Section 188.8.131.52, “
SHOW STATUS Syntax”.
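A monitoring script can apply the same reading of the SHOW STATUS LIKE 'NDB%' output. The sketch below assumes the rows have already been fetched into a dictionary mapping variable names to values (how they are fetched depends on the client library in use); the function name describe_sql_node is hypothetical. It treats a node ID of zero as meaning that the server is not connected to a cluster, as described above.

```python
def describe_sql_node(status):
    """Summarise Ndb_* status variables given as a name -> value mapping."""
    node_id = int(status.get("Ndb_cluster_node_id") or 0)
    if node_id == 0:
        return "not connected to a cluster"
    return "SQL node %d, configured from %s:%s, sees %s data nodes" % (
        node_id,
        status["Ndb_config_from_host"],
        status["Ndb_config_from_port"],
        status["Ndb_number_of_data_nodes"],
    )


connected = describe_sql_node({
    "Ndb_cluster_node_id": "10",
    "Ndb_config_from_host": "192.168.0.103",
    "Ndb_config_from_port": "1186",
    "Ndb_number_of_data_nodes": "4",
})
disconnected = describe_sql_node({
    "Ndb_cluster_node_id": "0",
    "Ndb_config_from_host": "",
    "Ndb_config_from_port": "0",
    "Ndb_number_of_data_nodes": "0",
})
```

The values are accepted as strings because that is how status values are commonly returned to client programs.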