blog.Ring.idv.tw

A Brief Look at the HBase HTable API

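In HBase, operating on a table requires an HTable object, which performs some initialization work up front. This post looks at what that initialization actually does, and why the first instantiation takes about 500 ms (measured on the author's single-machine setup).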

Here is a simple program:

HTable table = new HTable("UserTable");

HTable.java (lines:80~83)

public HTable(final String tableName)
  throws IOException {
    this(new HBaseConfiguration(), Bytes.toBytes(tableName));
  }

Looking at the HTable source, the constructor called above creates an HBaseConfiguration object, which loads the HBase configuration files such as "hbase-default.xml" and "hbase-site.xml".

It is also worth noting that creating the HBaseConfiguration object alone takes about 200 ms (again measured on the author's single-machine setup).
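
Timings like the 500 ms and 200 ms figures can be taken with System.nanoTime(). The following is only a minimal sketch: ElapsedTimer, timeMillis and pause are hypothetical names introduced here, and a sleep stands in for the HBase call so that the example runs without a cluster.

```java
// Hypothetical helper (not part of HBase): times one call with System.nanoTime().
final class ElapsedTimer {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Stand-in workload: sleeps for roughly the given number of milliseconds. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // With the HBase client on the classpath, the stand-in could be replaced by
        // new HTable("UserTable") or new HBaseConfiguration(), wrapped in a
        // try/catch where the constructor declares IOException.
        long ms = timeMillis(() -> pause(50));
        System.out.println("elapsed: " + ms + " ms");
    }
}
```

Only the first call in a JVM pays the full cost, so a measurement of this kind should time the first instantiation separately from later ones.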

HBaseConfiguration.java (lines:48-51)

  private void addHbaseResources() {
    addResource("hbase-default.xml");
    addResource("hbase-site.xml");
  }
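
The effect of adding two resources in order can be pictured with a rough analogy built on java.util.Properties. This is not the Hadoop/HBase Configuration class: LayeredConfigSketch is a hypothetical stand-in, the value "3" is made up, and the sketch only assumes that a key defined in a later resource replaces the same key from an earlier one.

```java
import java.util.Properties;

// Hypothetical stand-in for layered configuration; not the Hadoop/HBase Configuration class.
final class LayeredConfigSketch {
    private final Properties merged = new Properties();

    /** Adds a resource; keys in later resources replace keys from earlier ones. */
    void addResource(Properties resource) {
        merged.putAll(resource);
    }

    /** Returns the integer value for a key, or the fallback if the key is absent. */
    int getInt(String key, int fallback) {
        String value = merged.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();   // plays the role of hbase-default.xml
        defaults.setProperty("hbase.client.retries.number", "10");
        Properties site = new Properties();       // plays the role of hbase-site.xml
        site.setProperty("hbase.client.retries.number", "3"); // illustrative override

        LayeredConfigSketch conf = new LayeredConfigSketch();
        conf.addResource(defaults);
        conf.addResource(site);
        System.out.println(conf.getInt("hbase.client.retries.number", 10));
        System.out.println(conf.getInt("hbase.client.scanner.caching", 1));
    }
}
```

The second lookup falls back to the default passed by the caller, which is the same pattern as the conf.getInt(key, default) calls in the HTable constructor.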

Next, look at the other constructor that it delegates to. Besides setting a few parameters, it mainly calls two methods: HConnectionManager.getConnection(conf) and connection.locateRegion(tableName, HConstants.EMPTY_START_ROW).

HTable.java (lines:115~132)

 public HTable(HBaseConfiguration conf, final byte [] tableName)
  throws IOException {
    this.tableName = tableName;
    if (conf == null) {
      this.scannerTimeout = 0;
      this.connection = null;
      return;
    }

    this.connection = HConnectionManager.getConnection(conf);    
    this.scannerTimeout =
      conf.getInt("hbase.regionserver.lease.period", 60 * 1000);
    this.configuration = conf;
    this.connection.locateRegion(tableName, HConstants.EMPTY_START_ROW);
    this.writeBufferSize = conf.getLong("hbase.client.write.buffer", 2097152);
    this.autoFlush = true;
    this.currentWriteBufferSize = 0;
    this.scannerCaching = conf.getInt("hbase.client.scanner.caching", 1);
  }

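Start with HConnectionManager.getConnection(conf). Its main job is to create a TableServers object. Judging from the variable name, the object represents a connection, and it is cached in HBASE_INSTANCES so that a later call with the same configuration gets the same object back.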

HConnectionManager.java (lines:96-106)

public static HConnection getConnection(HBaseConfiguration conf) {
    TableServers connection;
    synchronized (HBASE_INSTANCES) {
      connection = HBASE_INSTANCES.get(conf);
      if (connection == null) {
        connection = new TableServers(conf);
        HBASE_INSTANCES.put(conf, connection);
      }
    }
    return connection;
  }
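
The get-or-create pattern in getConnection() can be tried in isolation. The sketch below is not HBase code: ConnectionCacheSketch and FakeConnection are hypothetical names, and a plain string key is used instead of a configuration object, which sidesteps the question of how the real key defines equals() and hashCode().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a get-or-create cache; all names here are illustrative.
final class ConnectionCacheSketch {
    /** Stand-in for a connection object; constructing it opens no socket. */
    static final class FakeConnection {
        final String confKey;

        FakeConnection(String confKey) {
            this.confKey = confKey;
        }
    }

    private static final Map<String, FakeConnection> INSTANCES = new HashMap<>();

    /** Returns the cached connection for this key, creating it on first use. */
    static FakeConnection getConnection(String confKey) {
        synchronized (INSTANCES) {
            FakeConnection connection = INSTANCES.get(confKey);
            if (connection == null) {
                connection = new FakeConnection(confKey);
                INSTANCES.put(confKey, connection);
            }
            return connection;
        }
    }

    public static void main(String[] args) {
        FakeConnection first = getConnection("cluster-a");
        FakeConnection second = getConnection("cluster-a");
        System.out.println(first == second); // same key, same cached object
    }
}
```

Holding the lock across both the lookup and the insert means two threads asking for the same key at once cannot each create their own object.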

Now look at the TableServers constructor. Besides setting basic configuration values, it uses Class.forName() to load an interface, org.apache.hadoop.hbase.ipc.HRegionInterface, which is the RPC interface between the HBase server and the client.

So up to this point, no actual connection has been established between the two.

HConnectionManager.java (lines:256-280)

public TableServers(HBaseConfiguration conf) {
      this.conf = conf;

      String serverClassName =
        conf.get(REGION_SERVER_CLASS, DEFAULT_REGION_SERVER_CLASS);

      this.closed = false;
      
      try {
        this.serverInterfaceClass =
          (Class<? extends HRegionInterface>) Class.forName(serverClassName);
        
      } catch (ClassNotFoundException e) {
        throw new UnsupportedOperationException(
            "Unable to find region server interface " + serverClassName, e);
      }

      this.pause = conf.getLong("hbase.client.pause", 2 * 1000);
      this.numRetries = conf.getInt("hbase.client.retries.number", 10);
      this.maxRPCAttempts = conf.getInt("hbase.client.rpc.maxattempts", 1);
      this.rpcTimeout = conf.getLong("hbase.regionserver.lease.period", 60000);
      
      this.master = null;
      this.masterChecked = false;
    }
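
To see that resolving an interface by name is purely a class-loading step, the same pattern can be run against a JDK type. This is a sketch under stated assumptions: InterfaceLoadingSketch and loadAs are hypothetical names, java.lang.Runnable stands in for HRegionInterface, and asSubclass() is used here in place of the unchecked cast in the original.

```java
// Sketch: resolving an interface by name with Class.forName; no network I/O is involved.
final class InterfaceLoadingSketch {
    /** Loads the named class and checks that it is a subtype of the expected type. */
    static <T> Class<? extends T> loadAs(String className, Class<T> expected) {
        try {
            // asSubclass throws ClassCastException if the loaded class is unrelated.
            return Class.forName(className).asSubclass(expected);
        } catch (ClassNotFoundException e) {
            throw new UnsupportedOperationException(
                "Unable to find class " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.lang.Runnable is only a stand-in for the RPC interface.
        Class<? extends Runnable> type = loadAs("java.lang.Runnable", Runnable.class);
        System.out.println(type.getName() + " isInterface=" + type.isInterface());
    }
}
```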

Next, what does connection.locateRegion(tableName, HConstants.EMPTY_START_ROW) actually do?

The constant HConstants.EMPTY_START_ROW has the value "new byte [0];", so the call asks for the location of the region that holds the table's first row.

HConnectionManager.java (lines:556-560)

 public HRegionLocation locateRegion(final byte [] tableName,
        final byte [] row)
    throws IOException{
      return locateRegion(tableName, row, true);
    }
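
Why an empty byte array selects the first region can be illustrated with a small model. The sketch assumes that regions are ordered by start key and that the first region's start key is empty; FirstRegionSketch, the region names and the split key "m" are all hypothetical, and the real client resolves locations through the catalog tables rather than a local map.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical model of a region index; region names and split keys are made up.
final class FirstRegionSketch {
    /** Compares byte arrays lexicographically, treating each byte as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    private static final Comparator<byte[]> ROW_ORDER = FirstRegionSketch::compareBytes;
    private final TreeMap<byte[], String> regionsByStartKey = new TreeMap<>(ROW_ORDER);

    void addRegion(byte[] startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    /**
     * Returns the region with the greatest start key that is not above the row.
     * Assumes a region with an empty start key exists, so a match is always found.
     */
    String locate(byte[] row) {
        return regionsByStartKey.floorEntry(row).getValue();
    }

    public static void main(String[] args) {
        FirstRegionSketch index = new FirstRegionSketch();
        index.addRegion(new byte[0], "region-A");
        index.addRegion("m".getBytes(StandardCharsets.UTF_8), "region-B");
        // The empty row sorts before every other row, so it lands in the first region.
        System.out.println(index.locate(new byte[0]));
        System.out.println(index.locate("zebra".getBytes(StandardCharsets.UTF_8)));
    }
}
```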

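The next step is to obtain the region information for the table. As the code below shows, it calls locateRegionInMeta() to obtain the information for the two tables "UserTable" and ".META.", while ROOT is handled by locateRootRegion(). In testing, locateRootRegion() takes about 250 ms, because internally it has to get the ROOT information from the ZooKeeper server.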

However, locateRegionInMeta() in the code below in turn calls locateRegion(), so the overall execution order is: first "-ROOT-", then ".META.", and finally "UserTable".

HConnectionManager.java (lines:568~)

private HRegionLocation locateRegion(final byte [] tableName,
      final byte [] row, boolean useCache)
    throws IOException{
      if (tableName == null || tableName.length == 0) {
        throw new IllegalArgumentException(
            "table name cannot be null or zero length");
      }
            
      if (Bytes.equals(tableName, ROOT_TABLE_NAME)) {
        synchronized (rootRegionLock) {
          if (!useCache || rootRegionLocation == null) {
            this.rootRegionLocation = locateRootRegion();
          }
          return this.rootRegionLocation;
        }        
      } else if (Bytes.equals(tableName, META_TABLE_NAME)) {
        synchronized (metaRegionLock) {
          return locateRegionInMeta(ROOT_TABLE_NAME, tableName, row, useCache);
        }
      } else {
        synchronized(userRegionLock){
          return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache);
        }
      }
    }
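
The resolution order described above follows from the recursion: each table is found through the catalog table one level up. The sketch below only models that ordering. LookupOrderSketch and the returned location strings are hypothetical, there is no caching or locking, and the ZooKeeper lookup is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lookup chain; it records the order in which tables resolve.
final class LookupOrderSketch {
    static final String ROOT = "-ROOT-";
    static final String META = ".META.";

    /** Tables in the order their locations were resolved. */
    final List<String> resolved = new ArrayList<>();

    /** Locates a table by first locating the catalog table that indexes it. */
    String locate(String table) {
        if (table.equals(ROOT)) {
            // In the real client this is where ZooKeeper is consulted.
            resolved.add(ROOT);
            return "location-of-" + ROOT;
        }
        // .META. is indexed by -ROOT-; every other table is indexed by .META.
        String parent = table.equals(META) ? ROOT : META;
        locate(parent);
        resolved.add(table);
        return "location-of-" + table;
    }

    public static void main(String[] args) {
        LookupOrderSketch sketch = new LookupOrderSketch();
        sketch.locate("UserTable");
        System.out.println(sketch.resolved); // [-ROOT-, .META., UserTable]
    }
}
```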

2010-01-08 19:09:09

2 comments on "A Brief Look at the HBase HTable API"

  1. Andy says:

    How can I measure the time?

    2010-04-04 18:33:51

  2. Shen says:

    The simplest way is to use System.currentTimeMillis() or System.nanoTime().

    2010-04-04 22:44:57

Copyright (C) Ching-Shen Chen. All rights reserved.
