Wednesday, November 17, 2010

I/O fencing

就自己爬文和查資料所得的結果, 所謂 I/O fencing 的意義如下(不知是否有理解錯誤??)

什麼是 I/O fencing -

在cluster架構中, I/O fencing 是為了避免

假死或者動作異常的 cluster member 造成

shared media 資料損毀所採取的主動對策

為什麼需要它 -

Fencing 可以避免資料損毀, 降低 cluster 架構中的不確定性

以下以一個簡單的例子說明
*node 1 takes a lock
*node 1 hangs
*node 2 thinks node 1 is dead
*node 2 takes the same lock node 1 took
*node 2 writes data
*node 1 wakes up, still believing it has the lock
*node 1 overwrites data that node 2 just wrote out

加入 fencing 機制後
*node 1 takes a lock
*node 1 hangs
*node 2 thinks node 1 is dead
*node 2 fences node 1
*node 2 takes the same lock node 1 took
*node 2 writes data
*node 1 wakes up, still believing it has the lock
*node 1 tries to write data, but can't since it has lost its I/O paths to the disk
在第二個case中資料損毀的情況被避開了

目前在cluster software使用的技術

I/O Fencing Variants
*power fencing - 假如 power loss, 該 node 就無法做 I/O. 這可以藉由外部的 power switches 來完成
或者藉由整合的 power management 功能來完成, 例如 iLO, IPMI, DRAC, RSA, 等等

*fibre channel zoning - 一般來說是由 fibre channel switches 來切斷 host 到 shared SAN 的路徑

*SCSI-2 reservations (old) - 詳情見SPI-2

*SCSI-3 reservations (group) - 每個 node 須完成對 LUN或者LUNs 的註冊, 詳情見 SPI-3

*Network Disconnect (NAS only) - 當以(NFS, iSCSI, etc)來連接一個 NAS設備, 可以用一個 managed switch 將 node 的 network port 與網路隔離開

*ssh "reboot" - 此方法很不可靠

*virtual machine destruction - 等同於 power fencing

No comments: