Friday, November 26, 2010

What /proc/sysrq-trigger can do

sysrq-trigger has many functions. The one I have used most is deliberately crashing the Linux kernel for testing, where sysrq-trigger is very convenient.

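Remember to enable SysRq first; many systems ship with it disabled or restricted. (According to the kernel documentation, this setting only governs keyboard invocation; root can always write to /proc/sysrq-trigger.)
# echo 1 > /proc/sys/kernel/sysrq
Usage is as follows: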

# Reboot immediately, without syncing or unmounting filesystems
echo "b" > /proc/sysrq-trigger

# Power off immediately
echo "o" > /proc/sysrq-trigger

# Dump memory allocation information (view it with dmesg or in /var/log/messages)
echo "m" > /proc/sysrq-trigger

# Will dump the current registers and flags to your console (CPU state).
echo "p" > /proc/sysrq-trigger

# Will dump a list of current tasks and their information to your console.
echo "t" > /proc/sysrq-trigger

# Will crash the system; a crashdump is taken (via a kexec reboot) if one is configured.
echo "c" > /proc/sysrq-trigger


For the other functions, see Documentation/sysrq.txt in the Linux kernel source tree (Documentation/admin-guide/sysrq.rst in newer kernels).
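
Typing the wrong letter into /proc/sysrq-trigger can reboot or crash the machine, so a small wrapper that only accepts the keys listed above may be worth having. The following is a minimal sketch, not part of the kernel tooling: the function name sysrq_send and the SYSRQ_TRIGGER override variable are hypothetical, introduced here so the function can be tried against a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: write one SysRq command key to the trigger file.
# SYSRQ_TRIGGER is an assumed override for dry runs; by default the
# real /proc/sysrq-trigger is used (which requires root).
sysrq_send() {
    key=$1
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

    # Accept only the keys covered in this post.
    case $key in
        b|o|m|p|t|c) ;;
        *) echo "sysrq_send: unsupported key '$key'" >&2; return 1 ;;
    esac

    printf '%s' "$key" > "$trigger"
}
```

For a dry run, set SYSRQ_TRIGGER to a temporary file and call sysrq_send m; the file should then contain the single letter m.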

Saturday, November 20, 2010

Cluster Project FAQ

Cluster Project FAQ – Frequently Asked Questions

http://www.robertpeterson.org/faq.html

It is well organized. When in doubt, search here first to avoid going down the wrong path.

Wednesday, November 17, 2010

I/O fencing

From my own reading and research, this is what I/O fencing means (I may have misunderstood something):

What is I/O fencing -

In a cluster, I/O fencing is an active countermeasure that prevents a hung or misbehaving cluster member from corrupting data on shared media.

Why is it needed -

Fencing prevents data corruption and reduces uncertainty in the cluster.

A simple example:
*node 1 takes a lock
*node 1 hangs
*node 2 thinks node 1 is dead
*node 2 takes the same lock node 1 took
*node 2 writes data
*node 1 wakes up, still believing it has the lock
*node 1 overwrites data that node 2 just wrote out

With fencing added:
*node 1 takes a lock
*node 1 hangs
*node 2 thinks node 1 is dead
*node 2 fences node 1
*node 2 takes the same lock node 1 took
*node 2 writes data
*node 1 wakes up, still believing it has the lock
*node 1 tries to write data, but can't since it has lost its I/O paths to the disk
In the second case, the data corruption is avoided.
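
The two timelines above can be replayed as a toy simulation. This is only a sketch for illustration: the function name simulate and the strings it prints are made up here, and nothing in it touches real locks or storage.

```shell
#!/bin/sh
# Toy replay of the two timelines; purely illustrative.
# simulate FENCING  -> prints whose data ends up on the "disk".
simulate() {
    fencing=$1          # 1 = node 2 fences node 1 before taking the lock
    node1_io=1          # 1 = node 1 still has an I/O path to the disk
    disk=""

    # node 1 takes the lock, then hangs; node 2 decides node 1 is dead.
    if [ "$fencing" -eq 1 ]; then
        node1_io=0      # fencing cuts node 1 off from the shared disk
    fi

    # node 2 takes the same lock and writes its data.
    disk="node2-data"

    # node 1 wakes up, still believing it holds the lock, and tries to write.
    if [ "$node1_io" -eq 1 ]; then
        disk="node1-stale-data"
    fi

    echo "$disk"
}

simulate 0   # without fencing: prints node1-stale-data
simulate 1   # with fencing:    prints node2-data
```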

Techniques currently used by cluster software

I/O Fencing Variants
*power fencing - a node that has lost power cannot do I/O. This can be done with external power switches or with integrated power management such as iLO, IPMI, DRAC, RSA, and so on

*fibre channel zoning - typically the fibre channel switches cut the host's path to the shared SAN

*SCSI-2 reservations (old) - for details see SPC-2 (SCSI Primary Commands)

*SCSI-3 reservations (group) - every node must register with the LUN or LUNs; for details see SPC-3 (persistent reservations)

*Network Disconnect (NAS only) - when a node reaches a NAS device over the network (NFS, iSCSI, etc.), a managed switch can isolate the node's network port from the network

*ssh "reboot" - this method is very unreliable, since a hung node may never run the command

*virtual machine destruction - equivalent to power fencing
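
As a rough illustration of power fencing through IPMI, the sketch below only prints the ipmitool command line that would power a node off; it does not run it. The function name power_fence_cmd and the host and user names are hypothetical. The -E option tells ipmitool to read the password from the IPMI_PASSWORD environment variable. Real cluster stacks use their own fence agents, so treat this as a picture of the idea, not as how any particular cluster product does it.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the ipmitool
# command that would cut power to a node through its BMC.
# $1 = BMC host name or address, $2 = IPMI user name (both assumed).
power_fence_cmd() {
    host=$1
    user=$2
    # -I lanplus : IPMI v2.0 over LAN
    # -E         : take the password from $IPMI_PASSWORD
    echo "ipmitool -I lanplus -H $host -U $user -E chassis power off"
}

power_fence_cmd node1-bmc admin
```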

Wednesday, November 10, 2010

Tip: check whether a device is mounted on a directory


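# Mount /mnt/test only if nothing is mounted there yet.
# "mount /mnt/test" with a single argument needs a matching /etc/fstab entry.
if ! mountpoint -q /mnt/test; then
    mount /mnt/test
fi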
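
On a system where the mountpoint command is not available, a similar check can be approximated by looking the directory up in /proc/mounts. This is a sketch under some assumptions: the function name is_mounted and its optional second argument (an alternative mounts file, handy for testing) are hypothetical, the path must be given exactly as it appears in the mounts table (no trailing slash, no symlinks), and mount points containing spaces are not handled because /proc/mounts escapes them.

```shell
#!/bin/sh
# Hypothetical fallback for "mountpoint -q DIR".
# $1 = directory to check, $2 = mounts table (defaults to /proc/mounts).
# Returns 0 if some line lists DIR as its mount point, 1 otherwise.
is_mounted() {
    dir=$1
    mounts=${2:-/proc/mounts}
    # Field 2 of every line in the mounts table is the mount point.
    awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"
}
```

It can be used the same way as the snippet above: if ! is_mounted /mnt/test; then mount /mnt/test; fi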

Cluster terminology roundup

Most of this comes from Wikipedia; these are just my organized notes
---------------------------------------------
Types of cluster:
1.High-availability (HA) clusters
2.Load-balancing clusters
3.Compute clusters
---------------------------------------------
Two-node cluster - A two-node cluster is the minimal high-availability cluster that can be built.
---------------------------------------------
Classification by role symmetry
1.Active/Passive
One node owns the services, the other one remains inoperative.
Should the primary node fail, the secondary or backup node takes the resources and reactivates the services, while the ex-primary remains in turn inoperative.
This is a configuration where only one node is operative at any point of time.

2.Active/Active
There is no concept of a primary or backup node: both nodes provide some service. Should one of these nodes fail, the other must also take over the failed node's services.
---------------------------------------------