Curse666

Thursday, January 13, 2011

Drop Caches

Reference: http://linux-mm.org/Drop_Cache

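Linux kernel 2.6.16 and later provide a mechanism for dropping caches (the pagecache plus the inode and dentry caches).

This mechanism can free a large amount of memory. You can now throw away the old scripts that allocated lots of memory just to get rid of the cache.

Writing to /proc/sys/vm/drop_caches is all it takes.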

1. Free the pagecache:
echo 1 > /proc/sys/vm/drop_caches

2. Free dentries and inodes:
echo 2 > /proc/sys/vm/drop_caches

3. Free the pagecache, dentries and inodes:
echo 3 > /proc/sys/vm/drop_caches

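This operation is non-destructive and only frees objects that are completely unused.

Dirty objects stay in use until they have been written to disk, so they cannot be freed.
Running "sync" first to flush them to disk should let the drop-cache operation free more memory.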
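The sync-then-drop sequence can be wrapped in a small helper. This is only a minimal sketch and not part of the original note: the function name drop_caches and its optional second argument (a target file, so the helper can be tried against a scratch file instead of the real /proc entry) are assumptions made for illustration. Writing to the real file requires root.

```shell
#!/bin/sh
# Sketch: flush dirty data, then ask the kernel to drop clean caches.
# drop_caches MODE [TARGET]
#   MODE   1 = pagecache, 2 = dentries and inodes, 3 = both
#   TARGET defaults to /proc/sys/vm/drop_caches
#          (hypothetical override, handy for dry runs on a scratch file)
drop_caches() {
    mode=$1
    target=${2:-/proc/sys/vm/drop_caches}

    case "$mode" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [target]" >&2; return 1 ;;
    esac

    # Dirty objects cannot be freed, so write them out first.
    sync
    echo "$mode" > "$target"
}

# Example (as root): drop_caches 3
```

Comparing the output of "free" before and after the call is a simple way to see how much memory was released.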

Friday, December 10, 2010

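PATH_MAX undeclared when building the kernel

Today I hit this error while building the 2.6.18.8 kernel on Ubuntu 10.10.

After some searching, I found that many people have run into the same problem.

I'm getting older, so I'm writing it down before I forget.

Edit scripts/mod/sumversion.c.

Adding an include of limits.h fixes it:
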
#include <linux/limits.h>

Installing AD on Windows 2008 R2

Quick steps for installing AD on Windows 2008 R2

1. Set a static IP and use the same IP as the DNS server

2. Start -> Run -> type dcpromo

3. Create a new Domain

4. Set the FQDN, e.g. test.com

5. Set password for ADDR

Friday, November 26, 2010

What /proc/sysrq-trigger can do

sysrq-trigger can do a lot. I have used it most often for tests that need to make the Linux kernel crash, where it is very convenient.

Remember to enable it before use, of course; it is normally disabled.
# echo 1 > /proc/sys/kernel/sysrq
Usage is as follows:

# Reboot immediately, without sync or umount
echo "b" > /proc/sysrq-trigger

# Power off immediately
echo "o" > /proc/sysrq-trigger

# Dump memory allocation information (viewable in /var/log/messages)
echo "m" > /proc/sysrq-trigger

# Will dump the current registers and flags to your console. (presumably CPU-related information)
echo "p" > /proc/sysrq-trigger

# Will dump a list of current tasks and their information to your console.
echo "t" > /proc/sysrq-trigger

# Will perform a kexec reboot in order to take a crashdump.
echo "c" > /proc/sysrq-trigger

For the other functions, see Documentation/sysrq.txt in the Linux kernel source.

Saturday, November 20, 2010

Cluster Project FAQ

Cluster Project FAQ – Frequently Asked Questions

http://www.robertpeterson.org/faq.html

It is very well organized. When in doubt, search through it first to avoid going down the wrong path.

Wednesday, November 17, 2010

I/O fencing

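From my own searching and reading, this is what I understand I/O fencing to mean (I may have misunderstood something).

What is I/O fencing -

In a cluster architecture, I/O fencing is an active countermeasure that keeps a hung or misbehaving cluster member from corrupting data on shared media.

Why is it needed -

Fencing prevents data corruption and reduces the uncertainty in a cluster architecture.

A simple example without fencing: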
*node 1 takes a lock
*node 1 hangs
*node 2 thinks node 1 is dead
*node 2 takes the same lock node 1 took
*node 2 writes data
*node 1 wakes up, still believing it has the lock
*node 1 overwrites data that node 2 just wrote out

With a fencing mechanism added:
*node 1 takes a lock
*node 1 hangs
*node 2 thinks node 1 is dead
*node 2 fences node 1
*node 2 takes the same lock node 1 took
*node 2 writes data
*node 1 wakes up, still believing it has the lock
*node 1 tries to write data, but can't since it has lost its I/O paths to the disk
In the second case the data corruption is avoided.
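The two sequences above can be replayed as a toy model. This is only an illustrative sketch: the function name simulate, the variable names, and the strings standing in for disk contents are all made up here, and real fencing is carried out by cluster software and hardware, not by a shell variable.

```shell
#!/bin/sh
# Toy model of the two sequences: prints what ends up on the shared disk.
# simulate yes|no   (argument: is fencing in place?)
simulate() {
    fencing=$1
    node1_io_path=up          # node 1 can reach the shared disk

    # node 1 takes a lock, then hangs; node 2 decides node 1 is dead.
    if [ "$fencing" = "yes" ]; then
        node1_io_path=down    # node 2 fences node 1 before taking the lock
    fi

    # node 2 takes the same lock and writes its data.
    disk="data from node 2"

    # node 1 wakes up, still believing it holds the lock, and tries to write.
    if [ "$node1_io_path" = "up" ]; then
        disk="stale data from node 1"
    fi

    echo "$disk"
}

echo "without fencing: $(simulate no)"
echo "with fencing:    $(simulate yes)"
```

Without fencing the stale write from node 1 wins. With fencing node 1 has lost its I/O path, so the data written by node 2 survives.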

Techniques currently used by cluster software

I/O Fencing Variants
*power fencing - if a node loses power, it cannot do I/O. This can be done with external power switches
or with integrated power management features such as iLO, IPMI, DRAC, RSA, and so on

*fibre channel zoning - typically the fibre channel switches cut the path from the host to the shared SAN

*SCSI-2 reservations (old) - see SPI-2 for details

*SCSI-3 reservations (group) - each node must register with the LUN or LUNs; see SPI-3 for details

*Network Disconnect (NAS only) - when a NAS device is attached over NFS, iSCSI, etc., a managed switch can isolate the node's network port from the network

*ssh "reboot" - this method is very unreliable

*virtual machine destruction - equivalent to power fencing

Wednesday, November 10, 2010

Tip: check whether a directory has a device mounted on it

if ! mountpoint -q /mnt/test; then
    mount /mnt/test
fi

A collection of cluster terminology

Most of this comes from Wikipedia; I am only organizing it as notes.
---------------------------------------------
Types of cluster:
1.High-availability (HA) clusters
2.Load-balancing clusters
3.Compute clusters
---------------------------------------------
Two-node cluster - A two-node cluster is the minimal high-availability cluster that can be built.
---------------------------------------------
Classification by role symmetry
1.Active/Passive
One node owns the services, the other one remains inoperative.
Should the primary node fail, the secondary or backup node takes the resources and reactivates the services, while the ex-primary remains in turn inoperative.
This is a configuration where only one node is operative at any point of time.

2.Active/Active
There is no concept of a primary or backup node: both nodes provide some service. Should one of these nodes fail, the other must also assume the failed node's services.
---------------------------------------------

Tuesday, October 26, 2010

PCIe study part (1)

PCIe
● topology : Point-to-Point
● frequency: 2.5 GHz
● width : 2 wires per lane/per direction (Up to 32 lanes)
● Max. throughput : 500~16000 MB/sec
● Split transaction : Yes
● Dev. type: Root complex, Switches, Bridges, Endpoints
● Layer : Physical layer, Data Link layer, Transaction layer
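
The 500~16000 MB/sec range can be reproduced with a little arithmetic. The sketch below assumes first-generation PCIe figures that the notes do not spell out: 2.5 Gbit/s of raw signaling per lane per direction and 8b/10b encoding (10 bits on the wire for every 8 bits of data), with both directions counted together. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: aggregate PCIe 1.x bandwidth in MB/s for a given lane count.
# Assumptions (not from the notes): 2500 Mbit/s raw per lane per direction,
# 8b/10b encoding, and both directions added together.
pcie_gen1_mb_per_sec() {
    lanes=$1
    raw_mbit=2500                         # raw Mbit/s per lane, per direction
    data_mbit=$((raw_mbit * 8 / 10))      # 2000 Mbit/s left after 8b/10b
    per_lane_mb=$((data_mbit / 8))        # 250 MB/s per lane, per direction
    echo $((per_lane_mb * lanes * 2))     # times lanes, times two directions
}

for lanes in 1 4 8 16 32; do
    echo "x$lanes: $(pcie_gen1_mb_per_sec "$lanes") MB/s"
done
```

Under these assumptions x1 works out to 500 MB/s and x32 to 16000 MB/s, which matches both ends of the range in the list.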

● PCIe employs packets to accomplish data transfer between devices--
Transaction Layer Packets (TLPs)

● There are 4 types of transactions--
Memory, I/O, Configuration, Messages

● What is a transaction?
A series of packet transmissions that completes an information exchange between
requester and completer

● Posted = Requester transmits TLP request packet, completer
does not return completion packet --> Memory Write, Message

● Non-posted = Requester transmits TLP request packet,
completer returns completion packet to requester,
May include data in the request or completion TLP
--> Memory Read, Memory Read Lock, I/O Read, I/O Write, Config Read, Config Write
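
The posted / non-posted split above is easy to keep as a lookup. A minimal sketch, using exactly the transaction names from the two lists; the function name pcie_transaction_kind is hypothetical and exists only for this note.

```shell
#!/bin/sh
# Sketch: classify a PCIe transaction as posted or non-posted,
# following the two lists in the notes.
pcie_transaction_kind() {
    case "$1" in
        "Memory Write"|"Message")
            # Requester sends the request, no completion comes back.
            echo "posted" ;;
        "Memory Read"|"Memory Read Lock"|"I/O Read"|"I/O Write"|"Config Read"|"Config Write")
            # Completer returns a completion packet to the requester.
            echo "non-posted" ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}

pcie_transaction_kind "Memory Write"
pcie_transaction_kind "I/O Write"
```

Note that I/O Write and Config Write are non-posted even though they are writes; only Memory Write and Message are posted.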

High Availability systems under Linux

Reading notes on an article

#
What is HA?
High Availability is what it says it is.
Something that is Highly Available.

#
The service runs on a machine, and redirecting the service and requests to another healthy machine is the art of High Availability.

#
How will this work
If the master node fails, the slave node may take over its IP address and start serving the requests. This method is called IP takeover.

#
How do clusters talk
They will talk to each other over a serial cable and over a cross-link Ethernet cable (for redundancy, since either the serial cable or the Ethernet cable may fail) and check each other's heartbeat.
The program to monitor the heartbeats of the cluster nodes is called... guess...heartbeat.
heartbeat is available at http://www.linux-ha.org/download/
The program for ip address take over is called fake and is integrated in heartbeat.

#
What about data integrity issues
When service httpd moves from node1 to node2 it does not see the same data. I lose all the files that I was creating with my httpd CGIs.

Two Answers:
1. You should never write to file from your CGI's. (use a network database instead.. MySQL is pretty good)
2. You can attach the two nodes to a central external SCSI storage, and make sure that only one is talking to it at one time, and also make sure that you change the SCSI id of the host card on machine a to 6 and leave it at 7 on machine b, or vice versa.

You can run GFS (Global File System, see below in resources) over FC which allows you to have transparent access to the storage from all machines as if they were local storage.

#
What about active/active cluster

You can easily build an Active/Active server if you have a good storage system that allows concurrent access.
Examples are Fibrechannel and GFS.

If you are content with Network filesystems such as NFS, you may use that, but I would not suggest that.

Anyway, you can map serviceA to clustnode1 and serviceB to clustnode2. An example from my haresources file:

clustnode2 172.23.2.13 mysql
clustnode1 172.23.2.14 ldap
clustnode2 172.23.2.15 cyrus

I use GFS for storage so I don't have a problem with concurrent access to data and can run as many services as is manageable by these machines.
Here clustnode2 is the master for mysql and cyrus while clustnode1 is the master for ldap.
If clustnode2 goes down then clustnode1 takes over all the ip addresses and the services.
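
The takeover described above can be sketched as a small table lookup. This only models which node ends up owning each service and IP address; it is not how heartbeat itself works. It assumes a two-node cluster with the node names from the haresources example, and the function name placement is made up for illustration.

```shell
#!/bin/sh
# Sketch: where does each service run once a given node has failed?
# The table is copied from the haresources example: node, IP, service.
HARESOURCES='clustnode2 172.23.2.13 mysql
clustnode1 172.23.2.14 ldap
clustnode2 172.23.2.15 cyrus'

# placement FAILED_NODE
# Prints "service ip node" per line. Pass a name that matches no node
# (for example "none") to see the normal placement.
placement() {
    failed=$1
    printf '%s\n' "$HARESOURCES" | while read -r node ip service; do
        if [ "$node" = "$failed" ]; then
            # Two-node assumption: the surviving node takes over.
            if [ "$node" = "clustnode1" ]; then
                node=clustnode2
            else
                node=clustnode1
            fi
        fi
        echo "$service $ip $node"
    done
}

echo "normal:"
placement none
echo "after clustnode2 fails:"
placement clustnode2
```

In the second listing all three IP addresses and services sit on clustnode1, which is the takeover the notes describe.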

Wednesday, October 20, 2010

Advantages of Ethernet Tunneling through PCI Express

There are many advantages to tunneling Ethernet packets through PCIe, including:

1. PCIe has higher bandwidth than Ethernet (64 Gbps today vs. 10 Gbps);
2. PCIe has lower latency compared to any other protocol;
3. PCIe has the lowest per port pricing (1/10) compared to any higher-level protocol;
4. Ethernet tunneling saves power and cost; and
5. Ethernet tunneling uses the same software stack protecting software investment.

These advantages are compelling.

The text above is excerpted from the following URL:
http://www.wwpi.com/index.php?option=com_content&view=article&id=6540:ethernet-tunneling-through-pci-express-inter-processor-communication-low-latency-storage-io&catid=99:cover-story&Itemid=2701018

Saturday, March 06, 2010

spew - measures I/O performance and/or generates I/O load

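I stumbled on a piece of software:
http://spew.berlios.de/
It looks like it can be used to test I/O performance.
I will try it when I have time; for now I am noting the URL so I don't forget it.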

Wednesday, November 04, 2009

Fixing Ctrl+C not working in Busybox

I have been working on embedded Linux at the office lately.
After the kernel booted and dropped into busybox,
I found that Ctrl+C did nothing (life without Ctrl+C is really inconvenient).
Here is the path taken when Ctrl+C is pressed:

drivers/char/n_tty.c :
n_tty_receive_buf() -->
n_tty_receive_break() -->
isig(SIGINT,tty,1)-->

kernel/signal.c :
kill_pgrp(tty->pgrp, SIGINT, 1)-->
kill_pgrp_info(SIGINT, __si_special(priv), pid)-->

In the end, every task whose process group is tty->pgrp is interrupted.
Any process with that group id is killed,
whether it runs in the background or the foreground.
Ctrl+C is delivered to the /dev/ttyS0 driver,
but the original root file system had no ttyS0 device node
and used the /dev/console device instead.
Later I found this in the FAQ on the busybox website:

http://www.busybox.net/FAQ.html#job_control

Job control will be turned off since your shell can not obtain a controlling terminal. This typically happens when you run your shell on /dev/console. The kernel will not provide a controlling terminal on the /dev/console device. Your should run your shell on a normal tty such as tty1 or ttyS0 and everything will work perfectly. If you REALLY want your shell to run on /dev/console, then you can hack your kernel (if you are into that sortof thing) by changing drivers/char/tty_io.c to change the lines where it sets "noctty = 1;" to instead set it to "0". I recommend you instead run your shell on a real console...
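
A quick way to see whether a shell is in this situation is to try opening /dev/tty, which fails when the process has no controlling terminal. This is a small diagnostic sketch and not something from the busybox FAQ; the function name has_controlling_tty is made up.

```shell
#!/bin/sh
# Sketch: report whether the current shell has a controlling terminal.
# Opening /dev/tty fails when there is none, which is the case the FAQ
# describes for a shell started on /dev/console.
has_controlling_tty() {
    if (: < /dev/tty) 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

echo "controlling terminal: $(has_controlling_tty)"
```

If this prints "no", job control and Ctrl+C are not expected to work in that shell.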

The fixes are as follows.

Method 1
1. Create a ttyS0 device node under the dev directory and link console to ttyS0

#mknod --mode 666 ttyS0 c 4 64
#ln -s ttyS0 console

2. Edit /etc/inittab

#vi /etc/inittab

console::sysinit:-/etc/rcS
ttyS0::respawn:-/bin/sh

Method 2
Hack the kernel:
edit drivers/char/tty_io.c
and change each noctty = 1 to noctty = 0.
/etc/inittab can then be written as
console::sysinit:-/etc/rcS
console::respawn:-/bin/sh

After rebuilding the root file system and testing again, it worked.

Still, busybox recommends running the shell on a real console ....

Monday, November 02, 2009

BUG_ON(size & (size - 1));

Today I was reading the kfifo (circular buffer) code in the Linux kernel and happened to notice this line:

/* size must be a power of 2 */
BUG_ON(size & (size - 1));

This is a really handy trick for the check.
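
The same bit trick works anywhere bitwise AND is available. Here is a minimal sketch in shell arithmetic; the function name is_power_of_two is made up. Unlike the kernel expression, which also lets 0 through because 0 & anything is 0, this version rejects 0 explicitly.

```shell
#!/bin/sh
# Sketch: n & (n - 1) clears the lowest set bit of n, so the result is 0
# only when n had at most one bit set. Requiring n > 0 rules out zero.
is_power_of_two() {
    n=$1
    [ "$n" -gt 0 ] && [ $((n & (n - 1))) -eq 0 ]
}

for n in 1 2 3 64 100 4096; do
    if is_power_of_two "$n"; then
        echo "$n: power of two"
    else
        echo "$n: not a power of two"
    fi
done
```

For example, 64 is 1000000 in binary and 63 is 0111111, so the AND is 0. For 100 (1100100) and 99 (1100011) the AND is 1100000, which is nonzero.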

Friday, October 30, 2009

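Time to learn a scripting language

I have been dealing with C/C++ all along,

so while I have some free time I am going to learn Lua.

I have known about it for years but never looked into it.

What impressed me most about Lua is the World of Warcraft Add-ons.

There is even a book now, Beginning Lua with World of Warcraft Add-ons.

Maybe I really should buy a copy, even though I haven't played WoW in years.

Lua is now widely used in online games,

mainly because it is regarded as one of the fastest scripting languages.

I want to study Lua mostly because its source code is written 100% in C, which is easy on my eyes, haha.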

Wednesday, September 23, 2009

EMC RAID6 Implementation

EMC Symmetrix DMX – RAID 6 Implementation

It can be considered a variant of the Even-Odd algorithm.

Tuesday, September 08, 2009

Old Linux

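The Linux kernel has grown so large
that studying it in depth takes real effort,
so starting from an early Linux kernel may be much simpler.
http://www.oldlinux.org/index_cn.html
This site collects old versions of the Linux kernel.
Dig around and you will find plenty of treasures.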

Friday, September 04, 2009

Generic SCSI Target Subsystem For Linux

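I have been looking into SCST lately.

As the name suggests, this open source project exists for SCSI targets.

As for what a SCSI target is good for: put simply, it turns your Linux system

into a SCSI-based device, which is pretty interesting.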

http://sourceforge.net/projects/scst/

Linux Kernel Map

A must-see site for anyone studying the Linux kernel

http://www.linuxdriver.co.il/kernel_map.shtml

Friday, September 07, 2007

Effect of drive count on RAID-5

A good article about storage
http://storageadvisors.adaptec.com/2007/07/10/effect-of-drive-count-on-raid-5/