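In HBase, any operation on a table goes through an "HTable" object, which first performs some initialization work. This article looks at what that initialization actually does, and why the first initialization takes about 500 ms (measured on the author's single-machine setup).
Here is a simple program: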
HTable table = new HTable("UserTable");
HTable.java (lines:80~83)
public HTable(final String tableName)
throws IOException {
  this(new HBaseConfiguration(), Bytes.toBytes(tableName));
}
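Looking at the "HTable" source, the constructor called above creates an "HBaseConfiguration" object, which loads HBase's configuration files, such as "hbase-default.xml" and "hbase-site.xml".
It is also worth noting that creating the "HBaseConfiguration" object alone takes about 200 ms (again measured on the author's single-machine setup).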
HBaseConfiguration.java (lines:48-51)
private void addHbaseResources() {
  addResource("hbase-default.xml");
  addResource("hbase-site.xml");
}
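To illustrate why the load order matters, here is a minimal sketch of a layered configuration in which a resource added later overrides one added earlier. The class "LayeredConfigSketch" and the sample values are hypothetical stand-ins for illustration; this is not the real HBaseConfiguration, which parses XML files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a layered configuration; not the real HBaseConfiguration.
final class LayeredConfigSketch {
    private final Map<String, String> props = new LinkedHashMap<>();

    // Merges one resource; keys already present are overwritten by the newer resource.
    void addResource(Map<String, String> resource) {
        props.putAll(resource);
    }

    // Returns the configured value, or the in-code default when the key is absent.
    int getInt(String key, int defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        LayeredConfigSketch conf = new LayeredConfigSketch();
        // Plays the role of hbase-default.xml (value chosen for illustration only).
        conf.addResource(Map.of("hbase.client.retries.number", "10"));
        // Plays the role of hbase-site.xml, added second, so its value wins.
        conf.addResource(Map.of("hbase.client.retries.number", "3"));

        System.out.println(conf.getInt("hbase.client.retries.number", 10));
        // Key set in neither resource: falls back to the default passed in code.
        System.out.println(conf.getInt("hbase.client.scanner.caching", 1));
    }
}
```

The two-argument getInt mirrors the way the constructors in this article read their settings, for example conf.getInt("hbase.client.scanner.caching", 1).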
Next, look at the other constructor that it delegates to. Besides setting a few parameters, it mainly calls two methods: "HConnectionManager.getConnection(conf)" and "connection.locateRegion(tableName, HConstants.EMPTY_START_ROW)".
HTable.java (lines:115~132)
public HTable(HBaseConfiguration conf, final byte [] tableName)
throws IOException {
  this.tableName = tableName;
  if (conf == null) {
    this.scannerTimeout = 0;
    this.connection = null;
    return;
  }
  this.connection = HConnectionManager.getConnection(conf);
  this.scannerTimeout =
    conf.getInt("hbase.regionserver.lease.period", 60 * 1000);
  this.configuration = conf;
  this.connection.locateRegion(tableName, HConstants.EMPTY_START_ROW);
  this.writeBufferSize = conf.getLong("hbase.client.write.buffer", 2097152);
  this.autoFlush = true;
  this.currentWriteBufferSize = 0;
  this.scannerCaching = conf.getInt("hbase.client.scanner.caching", 1);
}
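Start with "HConnectionManager.getConnection(conf)". It mainly creates a "TableServers" object. Judging by the variable name, this object represents a connection, and it is cached in HBASE_INSTANCES (keyed by conf) so that later calls with the same configuration reuse it.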
HConnectionManager.java (lines:96-106)
public static HConnection getConnection(HBaseConfiguration conf) {
  TableServers connection;
  synchronized (HBASE_INSTANCES) {
    connection = HBASE_INSTANCES.get(conf);
    if (connection == null) {
      connection = new TableServers(conf);
      HBASE_INSTANCES.put(conf, connection);
    }
  }
  return connection;
}
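The get-or-create pattern in getConnection() can be tried out in isolation. The following is a minimal sketch under stated assumptions: "ConnectionCacheSketch", its nested "Connection" class, and the String key are hypothetical stand-ins (the real code keys the map by the HBaseConfiguration object and stores TableServers instances).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a synchronized get-or-create cache; not HBase code.
final class ConnectionCacheSketch {

    // Stand-in for TableServers: creating it opens no real connection.
    static final class Connection {
        final String configKey;
        Connection(String configKey) { this.configKey = configKey; }
    }

    // Stand-in for HBASE_INSTANCES; a String key replaces HBaseConfiguration here.
    private static final Map<String, Connection> INSTANCES = new HashMap<>();

    static Connection getConnection(String configKey) {
        // The lock makes "check, then create, then store" atomic across threads.
        synchronized (INSTANCES) {
            Connection connection = INSTANCES.get(configKey);
            if (connection == null) {
                connection = new Connection(configKey);
                INSTANCES.put(configKey, connection);
            }
            return connection;
        }
    }

    public static void main(String[] args) {
        Connection a = getConnection("cluster-a");
        Connection b = getConnection("cluster-a");
        Connection c = getConnection("cluster-b");
        System.out.println(a == b); // same key: the cached instance is reused
        System.out.println(a == c); // different key: a separate instance
    }
}
```

One consequence of this design is that only the first caller for a given configuration pays the construction cost; later callers get the cached object back.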
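Now look at the "TableServers" constructor. Besides setting basic configuration values, it uses "Class.forName()" to load an interface, "org.apache.hadoop.hbase.ipc.HRegionInterface", which is the RPC interface between the HBase server and the client.
So up to this point, no actual connection has been established between the two.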
HConnectionManager.java (lines:256-280)
public TableServers(HBaseConfiguration conf) {
  this.conf = conf;
  String serverClassName =
    conf.get(REGION_SERVER_CLASS, DEFAULT_REGION_SERVER_CLASS);
  this.closed = false;
  try {
    this.serverInterfaceClass =
      (Class<? extends HRegionInterface>) Class.forName(serverClassName);
  } catch (ClassNotFoundException e) {
    throw new UnsupportedOperationException(
      "Unable to find region server interface " + serverClassName, e);
  }
  this.pause = conf.getLong("hbase.client.pause", 2 * 1000);
  this.numRetries = conf.getInt("hbase.client.retries.number", 10);
  this.maxRPCAttempts = conf.getInt("hbase.client.rpc.maxattempts", 1);
  this.rpcTimeout = conf.getLong("hbase.regionserver.lease.period", 60000);
  this.master = null;
  this.masterChecked = false;
}
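Loading a class by a name taken from configuration can be sketched without HBase on the classpath. In the example below, "InterfaceLoaderSketch" and its load() method are hypothetical, and java.lang.Thread with java.lang.Runnable stand in for the region server class and HRegionInterface. Note one deliberate difference: the sketch uses Class.asSubclass() for a checked conversion, whereas the HBase code shown above uses an unchecked cast.

```java
// Hypothetical sketch of loading a class by name; not HBase code.
final class InterfaceLoaderSketch {

    // Loads the named class and verifies that it is a subtype of the expected type.
    static <T> Class<? extends T> load(String className, Class<T> expected) {
        try {
            // Class.forName only loads the class; it creates no instance
            // and opens no network connection.
            return Class.forName(className).asSubclass(expected);
        } catch (ClassNotFoundException e) {
            throw new UnsupportedOperationException(
                "Unable to find class " + className, e);
        }
    }

    public static void main(String[] args) {
        // Thread implements Runnable, so this lookup succeeds.
        Class<? extends Runnable> clazz = load("java.lang.Thread", Runnable.class);
        System.out.println(clazz.getName());

        try {
            load("no.such.ClassName", Runnable.class);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This supports the point made above: after this step the client only knows which interface it will talk through, and nothing has been sent to a server yet.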
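Next, what does "connection.locateRegion(tableName, HConstants.EMPTY_START_ROW)" do?
The constant "HConstants.EMPTY_START_ROW" is defined as "new byte [0]", an empty byte array, so the call asks for the location of the region that holds the table's first row.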
HConnectionManager.java (lines:556-560)
public HRegionLocation locateRegion(final byte [] tableName,
    final byte [] row)
throws IOException{
  return locateRegion(tableName, row, true);
}
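The next step is to fetch the region information for the table. As the code below shows, it calls "locateRegionInMeta" to obtain the information for two tables, "UserTable" and ".META.", while for ROOT it calls "locateRootRegion()". In the author's tests, "locateRootRegion()" takes about 250 ms, because internally it has to go through the ZooKeeper server to obtain the ROOT information.
However, "locateRegionInMeta()" in the code below calls "locateRegion()" in turn, so the overall execution order is: first "-ROOT-", then ".META.", and finally "UserTable".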
HConnectionManager.java (lines:568~)
private HRegionLocation locateRegion(final byte [] tableName,
    final byte [] row, boolean useCache)
throws IOException{
  if (tableName == null || tableName.length == 0) {
    throw new IllegalArgumentException(
      "table name cannot be null or zero length");
  }
  if (Bytes.equals(tableName, ROOT_TABLE_NAME)) {
    synchronized (rootRegionLock) {
      if (!useCache || rootRegionLocation == null) {
        this.rootRegionLocation = locateRootRegion();
      }
      return this.rootRegionLocation;
    }
  } else if (Bytes.equals(tableName, META_TABLE_NAME)) {
    synchronized (metaRegionLock) {
      return locateRegionInMeta(ROOT_TABLE_NAME, tableName, row, useCache);
    }
  } else {
    synchronized(userRegionLock){
      return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache);
    }
  }
}
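The recursive lookup order described above can be reproduced with a small simulation. This is only a sketch: "LookupOrderSketch" and its record() helper are hypothetical, table names are plain strings, and the caching, locking, and actual RPC or ZooKeeper calls of the real code are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the lookup order; no caching, locking, or RPC.
final class LookupOrderSketch {
    static final String ROOT = "-ROOT-";
    static final String META = ".META.";

    // Records the order in which table locations are actually resolved.
    final List<String> resolved = new ArrayList<>();

    // Mirrors the branching of locateRegion(): each table is looked up
    // in its parent catalog table.
    String locateRegion(String tableName) {
        if (tableName.equals(ROOT)) {
            // In the article, this is the step that goes through ZooKeeper.
            return record(ROOT);
        } else if (tableName.equals(META)) {
            return locateRegionInMeta(ROOT, tableName);
        } else {
            return locateRegionInMeta(META, tableName);
        }
    }

    // The parent table must be located before it can be queried,
    // which is what makes the call chain recursive.
    private String locateRegionInMeta(String parentTable, String tableName) {
        locateRegion(parentTable);
        return record(tableName);
    }

    private String record(String tableName) {
        resolved.add(tableName);
        return "location-of-" + tableName;
    }

    public static void main(String[] args) {
        LookupOrderSketch sketch = new LookupOrderSketch();
        sketch.locateRegion("UserTable");
        System.out.println(sketch.resolved); // [-ROOT-, .META., UserTable]
    }
}
```

Although the call starts with "UserTable", the first location actually resolved is "-ROOT-", which is why the ZooKeeper round trip is part of the cost of the very first HTable construction.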

May I ask how you measured the time?
2010-04-04 18:33:51
The simplest way is to use System.currentTimeMillis() or System.nanoTime().
2010-04-04 22:44:57
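As a rough illustration of the reply above, here is a minimal timing sketch based on System.nanoTime(). The class "TimingSketch" and its helpers are hypothetical, and a sleep stands in for the real workload; to reproduce the figures in the article, the measured task would be something like new HTable("UserTable") with HBase on the classpath.

```java
// Hypothetical timing helper; not part of HBase.
final class TimingSketch {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    // Stand-in workload so that the sketch runs without an HBase cluster.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Measure the same task twice: with a real HTable, the first call would
        // include one-time costs such as loading the configuration files.
        long first = timeMillis(() -> sleepQuietly(50));
        long second = timeMillis(() -> sleepQuietly(50));
        System.out.println("first call:  " + first + " ms");
        System.out.println("second call: " + second + " ms");
    }
}
```

System.nanoTime() is generally the better choice for measuring elapsed time, since System.currentTimeMillis() follows the system clock and can jump if the clock is adjusted.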