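In HBase, any operation on a table starts with an "HTable" object, which performs some initialization work first. This post looks at what that initialization involves, and why the first one takes about 500 ms (measured on the author's machine in a standalone setup).
Here is a simple program: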
HTable table = new HTable("UserTable");
HTable.java (lines:80~83)
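public HTable(final String tableName) throws IOException {
  this(new HBaseConfiguration(), Bytes.toBytes(tableName));
}
Judging from the "HTable" source, the constructor called above creates an "HBaseConfiguration" object, which loads HBase's configuration files such as "hbase-default.xml" and "hbase-site.xml".
It is also worth noting that creating the "HBaseConfiguration" object takes about 200 ms (again measured on the author's machine in a standalone setup).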
HBaseConfiguration.java (lines:48-51)
private void addHbaseResources() {
  addResource("hbase-default.xml");
  addResource("hbase-site.xml");
}
Next, look at the other constructor that the first one delegates to. Besides setting a few parameters, it mainly calls two methods: "HConnectionManager.getConnection(conf)" and "connection.locateRegion(tableName, HConstants.EMPTY_START_ROW)".
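Timings such as the 200 ms mentioned above can be reproduced with a small timer built on System.nanoTime(). The following is only a minimal sketch: "InitTimer" and "timeMillis" are hypothetical names, not part of HBase, and the workload is a stand-in that simply sleeps, so that the example runs without an HBase installation.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of HBase) that times a single call. */
final class InitTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static <T> long timeMillis(Supplier<T> task) {
        long start = System.nanoTime();
        task.get();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption). In a real test this would wrap
        // `new HBaseConfiguration()` or `new HTable("UserTable")`.
        Supplier<Object> workload = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new Object();
        };

        // Report the first and second call separately, because the first
        // call also pays one-time costs such as class loading.
        System.out.println("first call:  " + timeMillis(workload) + " ms");
        System.out.println("second call: " + timeMillis(workload) + " ms");
    }
}
```

Measuring the first and the second call separately makes it easier to see how much of the cost is one-time initialization.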
HTable.java (lines:115~132)
public HTable(HBaseConfiguration conf, final byte [] tableName) throws IOException {
  this.tableName = tableName;
  if (conf == null) {
    this.scannerTimeout = 0;
    this.connection = null;
    return;
  }
  this.connection = HConnectionManager.getConnection(conf);
  this.scannerTimeout = conf.getInt("hbase.regionserver.lease.period", 60 * 1000);
  this.configuration = conf;
  this.connection.locateRegion(tableName, HConstants.EMPTY_START_ROW);
  this.writeBufferSize = conf.getLong("hbase.client.write.buffer", 2097152);
  this.autoFlush = true;
  this.currentWriteBufferSize = 0;
  this.scannerCaching = conf.getInt("hbase.client.scanner.caching", 1);
}
Start with "HConnectionManager.getConnection(conf)". It mainly creates a "TableServers" object which, as the variable name "connection" suggests, represents a connection, and it caches that connection in "HBASE_INSTANCES" so that later calls with the same configuration can reuse it.
HConnectionManager.java (lines:96-106)
public static HConnection getConnection(HBaseConfiguration conf) {
  TableServers connection;
  synchronized (HBASE_INSTANCES) {
    connection = HBASE_INSTANCES.get(conf);
    if (connection == null) {
      connection = new TableServers(conf);
      HBASE_INSTANCES.put(conf, connection);
    }
  }
  return connection;
}
Turning to the "TableServers" constructor: besides reading basic configuration values, it uses "Class.forName()" to load an interface, "org.apache.hadoop.hbase.ipc.HRegionInterface", which is the RPC interface between the HBase server and the client.
So up to this point, no actual connection between the two has been established.
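The get-or-create pattern used by getConnection() can be tried out in isolation. This is a minimal sketch rather than HBase code: "ConnectionCache" and its nested "Connection" class are hypothetical, and a plain String stands in for the HBaseConfiguration key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone version of the cache in getConnection(). */
final class ConnectionCache {

    /** Stand-in for TableServers (assumption); it holds no real connection. */
    static final class Connection {
        final String configKey;
        Connection(String configKey) { this.configKey = configKey; }
    }

    private static final Map<String, Connection> INSTANCES = new HashMap<>();

    /** Returns the cached connection for the key, creating it on first use. */
    static Connection getConnection(String configKey) {
        synchronized (INSTANCES) {
            Connection connection = INSTANCES.get(configKey);
            if (connection == null) {
                connection = new Connection(configKey);
                INSTANCES.put(configKey, connection);
            }
            return connection;
        }
    }

    public static void main(String[] args) {
        Connection a = getConnection("cluster-A");
        Connection b = getConnection("cluster-A");
        Connection c = getConnection("cluster-B");
        System.out.println(a == b); // true: second call is served from the cache
        System.out.println(a == c); // false: a different key gets its own object
    }
}
```

In the real code the map key is the HBaseConfiguration object itself, so whether two separately created configurations share one cached connection depends on how that class implements equals() and hashCode(), which the excerpt above does not show.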
HConnectionManager.java (lines:256-280)
public TableServers(HBaseConfiguration conf) {
  this.conf = conf;
  String serverClassName = conf.get(REGION_SERVER_CLASS, DEFAULT_REGION_SERVER_CLASS);
  this.closed = false;
  try {
    this.serverInterfaceClass =
      (Class<? extends HRegionInterface>) Class.forName(serverClassName);
  } catch (ClassNotFoundException e) {
    throw new UnsupportedOperationException(
      "Unable to find region server interface " + serverClassName, e);
  }
  this.pause = conf.getLong("hbase.client.pause", 2 * 1000);
  this.numRetries = conf.getInt("hbase.client.retries.number", 10);
  this.maxRPCAttempts = conf.getInt("hbase.client.rpc.maxattempts", 1);
  this.rpcTimeout = conf.getLong("hbase.regionserver.lease.period", 60000);
  this.master = null;
  this.masterChecked = false;
}
Next, what does "connection.locateRegion(tableName, HConstants.EMPTY_START_ROW)" actually do?
The constant "HConstants.EMPTY_START_ROW" has the value "new byte [0]", so the call asks for the location of the region that holds the table's first row.
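The point that Class.forName() only loads type information, and opens no connection, can be illustrated with JDK classes alone. In this sketch "InterfaceLoader" and "load" are hypothetical names, and "java.util.concurrent.RunnableFuture" merely stands in for HRegionInterface. It uses Class.asSubclass() instead of the unchecked cast in the original, which is a swapped-in technique, not what HBase does.

```java
/** Hypothetical helper showing class loading by name, without any instantiation. */
final class InterfaceLoader {

    /** Loads the named type and checks that it is a subtype of `expected`. */
    static <T> Class<? extends T> load(String className, Class<T> expected) {
        try {
            // asSubclass() throws ClassCastException if the loaded type does not fit.
            return Class.forName(className).asSubclass(expected);
        } catch (ClassNotFoundException e) {
            throw new UnsupportedOperationException("Unable to find " + className, e);
        }
    }

    public static void main(String[] args) {
        // RunnableFuture is a JDK interface that extends Runnable (stand-in only).
        Class<? extends Runnable> type =
            load("java.util.concurrent.RunnableFuture", Runnable.class);
        // Only the Class object exists at this point; nothing has been instantiated.
        System.out.println(type.getName() + " isInterface=" + type.isInterface());
    }
}
```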
HConnectionManager.java (lines:556-560)
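public HRegionLocation locateRegion(final byte [] tableName, final byte [] row)
throws IOException {
  return locateRegion(tableName, row, true);
}
The next step is to obtain the region information for the table. As the code below shows, it calls "locateRegionInMeta()" to obtain the information for the two tables "UserTable" and ".META.", while "-ROOT-" is handled by "locateRootRegion()". In testing, "locateRootRegion()" took about 250 ms, because internally it has to go through the ZooKeeper server to obtain the "-ROOT-" information.
In fact, "locateRegionInMeta()" in the code below calls "locateRegion()" again, so the overall order of execution is "-ROOT-" first, then ".META.", and finally "UserTable".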
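Why an empty row key points at the first region can be seen in a toy model where regions are indexed by start key and the first region's start key is empty. This is only an illustration under those assumptions: "RegionIndex" is a hypothetical class, and the real client looks regions up in the catalog tables rather than in an in-memory map.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy index: region name keyed by the region's start row. */
final class RegionIndex {

    /** Unsigned lexicographic byte[] order; the empty array sorts first. */
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    };

    private final TreeMap<byte[], String> byStartKey = new TreeMap<>(BYTES);

    void addRegion(byte[] startKey, String name) {
        byStartKey.put(startKey, name);
    }

    /** Returns the region whose start key is the greatest one not above `row`. */
    String locate(byte[] row) {
        Map.Entry<byte[], String> entry = byStartKey.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionIndex index = new RegionIndex();
        index.addRegion(new byte[0], "region-1"); // first region: empty start key
        index.addRegion("m".getBytes(StandardCharsets.UTF_8), "region-2");

        System.out.println(index.locate(new byte[0]));                             // region-1
        System.out.println(index.locate("zebra".getBytes(StandardCharsets.UTF_8))); // region-2
    }
}
```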
HConnectionManager.java (lines:568~)
private HRegionLocation locateRegion(final byte [] tableName,
    final byte [] row, boolean useCache) throws IOException {
  if (tableName == null || tableName.length == 0) {
    throw new IllegalArgumentException(
      "table name cannot be null or zero length");
  }
  if (Bytes.equals(tableName, ROOT_TABLE_NAME)) {
    synchronized (rootRegionLock) {
      if (!useCache || rootRegionLocation == null) {
        this.rootRegionLocation = locateRootRegion();
      }
      return this.rootRegionLocation;
    }
  } else if (Bytes.equals(tableName, META_TABLE_NAME)) {
    synchronized (metaRegionLock) {
      return locateRegionInMeta(ROOT_TABLE_NAME, tableName, row, useCache);
    }
  } else {
    synchronized (userRegionLock) {
      return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache);
    }
  }
}
May I ask how to measure the time?
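The lookup order described above (-ROOT-, then .META., then the user table) follows from the recursion and can be checked with a stripped-down model. This sketch assumes only what the text states, namely that locateRegionInMeta() calls locateRegion() for its parent table first. "LookupOrder" is a hypothetical class that records the order of lookups and omits caching, locking, and all network access.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the recursive lookup; it records which table is resolved when. */
final class LookupOrder {
    static final String ROOT = "-ROOT-";
    static final String META = ".META.";

    final List<String> visited = new ArrayList<>();

    /** Mirrors the three branches of locateRegion(). */
    String locateRegion(String table) {
        if (table.equals(ROOT)) {
            visited.add(ROOT); // stands in for locateRootRegion(), which asks ZooKeeper
            return "location of " + ROOT;
        } else if (table.equals(META)) {
            return locateRegionInMeta(ROOT, table);
        } else {
            return locateRegionInMeta(META, table);
        }
    }

    /** Resolves the parent catalog table first, then the requested table. */
    String locateRegionInMeta(String parent, String table) {
        locateRegion(parent); // the recursive step described in the text
        visited.add(table);
        return "location of " + table;
    }

    public static void main(String[] args) {
        LookupOrder lookup = new LookupOrder();
        lookup.locateRegion("UserTable");
        System.out.println(lookup.visited); // [-ROOT-, .META., UserTable]
    }
}
```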
2010-04-04 18:33:51
The simplest way is to use System.currentTimeMillis() or System.nanoTime().
2010-04-04 22:44:57