Indexing is a standard database technique, and Hive supports indexes as of version 0.7. Rather than a "one size fits all" index implementation, Hive provides a pluggable interface, along with one concrete implementation as a reference. The Hive index interface is as follows:
public interface HiveIndexHandler extends Configurable {
  /**
   * Determines whether this handler implements indexes by creating an index
   * table.
   *
   * @return true if index creation implies creation of an index table in Hive;
   *         false if the index representation is not stored in a Hive table
   */
  boolean usesIndexTable();

  /**
   * Requests that the handler validate an index definition and fill in
   * additional information about its stored representation.
   *
   * @throws HiveException if the index definition is invalid with respect to
   *         either the base table or the supplied index table definition
   */
  void analyzeIndexDefinition(
      org.apache.hadoop.hive.metastore.api.Table baseTable,
      org.apache.hadoop.hive.metastore.api.Index index,
      org.apache.hadoop.hive.metastore.api.Table indexTable)
      throws HiveException;

  /**
   * Requests that the handler generate a plan for building the index; the plan
   * should read the base table and write out the index representation.
   */
  List<Task<?>> generateIndexBuildTaskList(
      org.apache.hadoop.hive.ql.metadata.Table baseTbl,
      org.apache.hadoop.hive.metastore.api.Index index,
      List<Partition> indexTblPartitions, List<Partition> baseTblPartitions,
      org.apache.hadoop.hive.ql.metadata.Table indexTbl,
      Set<ReadEntity> inputs, Set<WriteEntity> outputs)
      throws HiveException;
}
When an index is created, Hive first calls the interface's usesIndexTable method to determine whether the index is stored as a Hive table (the default implementation stores it in Hive). It then calls analyzeIndexDefinition to check that the index creation statement is valid; if so, the index table is recorded in the metastore table IDXS, otherwise an exception is thrown. If the creation statement uses WITH DEFERRED REBUILD, then when ALTER INDEX xxx_index ON xxx REBUILD is executed, generateIndexBuildTaskList is called to obtain the MapReduce tasks for the index, and those tasks are run to populate it.
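As a loose analogy (not Hive's actual code; all names here are invented for illustration), the validate-then-rebuild flow described above can be mimicked in a few lines of Python:

```python
# Toy analogy of Hive's index handler lifecycle (invented names, not the real API).
class ToyCompactIndexHandler:
    def uses_index_table(self):
        # The compact handler stores its index in a regular Hive table.
        return True

    def analyze_index_definition(self, base_cols, index_cols):
        # Analogous to analyzeIndexDefinition: reject unknown columns.
        missing = [c for c in index_cols if c not in base_cols]
        if missing:
            raise ValueError(f"invalid index definition, unknown columns: {missing}")

    def build(self, rows, index_col):
        # Analogous to the REBUILD step: map each indexed value to
        # (file, offset) entries, which is what the compact index stores.
        index = {}
        for filename, offset, record in rows:
            index.setdefault(record[index_col], []).append((filename, offset))
        return index

handler = ToyCompactIndexHandler()
handler.analyze_index_definition(base_cols=["id", "text"], index_cols=["id"])
rows = [("000000_0", 0, {"id": 0, "text": "a"}),
        ("000000_0", 352, {"id": 1, "text": "b"})]
idx = handler.build(rows, "id")
print(idx[1])  # [('000000_0', 352)]
```

The real handler, of course, emits MapReduce tasks rather than building the map in memory, but the validate/build split is the same shape.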
Below is a test example adapted from someone else's design. First, generate the test data:
#!/bin/bash
# Generates roughly 350MB of raw test data; redirect stdout into the data
# file used below, e.g.: ./gen_data.sh > /home/hadoop/hive_index_test/dual.txt
i=0
while [ $i -ne 1000000 ]
do
  echo -e "$i\tA decade ago, many were predicting that Cooke, a New York City prodigy, would become a basketball shoe pitchman and would flaunt his wares and skills at All-Star weekends like the recent aerial show in Orlando, Fla. There was a time, however fleeting, when he was more heralded, or perhaps merely hyped, than any other high school player in America."
  i=$(($i+1))
done
Create the test table:
hive> create table table01( id int, name string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t';
OK
Time taken: 0.371 seconds
hive> load data local inpath '/home/hadoop/hive_index_test/dual.txt' overwrite into table table01;
Copying data from file:/home/hadoop/hive_index_test/dual.txt
Copying file: file:/home/hadoop/hive_index_test/dual.txt
Loading data to table default.table01
Deleted hdfs://localhost:9000/user/hive/warehouse/table01
OK
Time taken: 13.492 seconds
hive> create table table02 as select id,name as text from table01;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201301221042_0006, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201301221042_0006
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201301221042_0006
2013-01-22 11:21:19,639 Stage-1 map = 0%, reduce = 0%
2013-01-22 11:21:25,678 Stage-1 map = 33%, reduce = 0%
2013-01-22 11:21:37,754 Stage-1 map = 67%, reduce = 0%
2013-01-22 11:21:43,788 Stage-1 map = 100%, reduce = 0%
2013-01-22 11:21:46,828 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201301221042_0006
Ended Job = -663277165, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-hadoop/hive_2013-01-22_11-21-13_661_2061036951988537032/-ext-10001
Moving data to: hdfs://localhost:9000/user/hive/warehouse/table02
1000000 Rows loaded to hdfs://localhost:9000/tmp/hive-hadoop/hive_2013-01-22_11-21-13_661_2061036951988537032/-ext-10000
OK
Time taken: 33.904 seconds
hive> dfs -ls /user/hive/warehouse/table02;
Found 6 items
-rw-r--r-- 3 hadoop supergroup 67109134 2013-01-22 11:21 /user/hive/warehouse/table02/000000_0
-rw-r--r-- 3 hadoop supergroup 67108860 2013-01-22 11:21 /user/hive/warehouse/table02/000001_0
-rw-r--r-- 3 hadoop supergroup 67108860 2013-01-22 11:21 /user/hive/warehouse/table02/000002_0
-rw-r--r-- 3 hadoop supergroup 67108860 2013-01-22 11:21 /user/hive/warehouse/table02/000003_0
-rw-r--r-- 3 hadoop supergroup 67108860 2013-01-22 11:21 /user/hive/warehouse/table02/000004_0
-rw-r--r-- 3 hadoop supergroup 21344316 2013-01-22 11:21 /user/hive/warehouse/table02/000005_0
hive> select * from table02 where id=500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201301221042_0007, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201301221042_0007
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201301221042_0007
2013-01-22 11:22:26,865 Stage-1 map = 0%, reduce = 0%
2013-01-22 11:22:28,884 Stage-1 map = 33%, reduce = 0%
2013-01-22 11:22:31,905 Stage-1 map = 67%, reduce = 0%
2013-01-22 11:22:34,921 Stage-1 map = 100%, reduce = 0%
2013-01-22 11:22:37,943 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201301221042_0007
OK
500000 A decade ago, many were predicting that Cooke, a New York City prodigy, would become a basketball shoe pitchman and would flaunt his wares and skills at All-Star weekends like the recent aerial show in Orlando, Fla. There was a time, however fleeting, when he was more heralded, or perhaps merely hyped, than any other high school player in America.
Time taken: 18.551 seconds
Create the index:
hive> create index table02_index on table table02(id)
> as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> with deferred rebuild;
OK
Time taken: 0.503 seconds
Populate the index:
hive> alter index table02_index on table02 rebuild;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201301221042_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201301221042_0008
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201301221042_0008
2013-01-22 11:23:56,870 Stage-1 map = 0%, reduce = 0%
2013-01-22 11:24:02,902 Stage-1 map = 33%, reduce = 0%
2013-01-22 11:24:08,929 Stage-1 map = 67%, reduce = 0%
2013-01-22 11:24:11,944 Stage-1 map = 67%, reduce = 11%
2013-01-22 11:24:14,966 Stage-1 map = 100%, reduce = 11%
2013-01-22 11:24:21,007 Stage-1 map = 100%, reduce = 22%
2013-01-22 11:24:27,043 Stage-1 map = 100%, reduce = 67%
2013-01-22 11:24:30,056 Stage-1 map = 100%, reduce = 86%
2013-01-22 11:24:33,089 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201301221042_0008
Loading data to table default.default__table02_table02_index__
Deleted hdfs://localhost:9000/user/hive/warehouse/default__table02_table02_index__
Table default.default__table02_table02_index__ stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 74701985]
OK
Time taken: 61.203 seconds
hive> dfs -ls /user/hive/warehouse/default*;
Found 1 items
-rw-r--r-- 3 hadoop supergroup 74701985 2013-01-22 11:24 /user/hive/warehouse/default__table02_table02_index__/000000_0
The data stored inside the index table can now be inspected:
hive> select * from default__table02_table02_index__ limit 3;
OK
0 hdfs://localhost:9000/user/hive/warehouse/table02/000000_0 [0]
1 hdfs://localhost:9000/user/hive/warehouse/table02/000000_0 [352]
2 hdfs://localhost:9000/user/hive/warehouse/table02/000000_0 [704]
Time taken: 0.156 seconds
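The offsets above (0, 352, 704, …) are the byte positions at which each row starts inside the HDFS file. A small sketch against a local file (with hypothetical miniature data, not the actual test set) shows how such entries arise:

```python
import os
import tempfile

def build_offset_index(path, key_col=0):
    # Build compact-index-style entries (value -> [(file, byte offset)]) for a
    # tab-delimited file: each row's starting position becomes an _offsets entry.
    entries = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()          # byte position where this row starts
            line = f.readline()
            if not line:
                break
            key = line.split(b"\t")[key_col].decode()
            entries.setdefault(key, []).append((path, offset))
    return entries

# Hypothetical miniature version of dual.txt.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("0\talpha\n1\tbeta\n2\tgamma\n")
tmp.close()
idx = build_offset_index(tmp.name)
print(idx["1"][0][1])  # 8 -- the second row starts after the 8 bytes of "0\talpha\n"
os.unlink(tmp.name)
```

In the transcript, every row is the same fixed sentence, which is why the offsets step uniformly by 352 bytes.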
Now build an index file by hand and test it:
hive> SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
hive> Insert overwrite directory "/tmp/table02_index_data" select `_bucketname`, `_offsets` from default__table02_table02_index__ where id =500000;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201301221042_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201301221042_0009
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201301221042_0009
2013-01-22 11:30:23,859 Stage-1 map = 0%, reduce = 0%
2013-01-22 11:30:26,872 Stage-1 map = 100%, reduce = 0%
2013-01-22 11:30:29,904 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201301221042_0009
Ended Job = -489547412, job is filtered out (removed at runtime).
Launching Job 2 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201301221042_0010, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201301221042_0010
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201301221042_0010
2013-01-22 11:30:35,861 Stage-2 map = 0%, reduce = 0%
2013-01-22 11:30:38,882 Stage-2 map = 100%, reduce = 0%
2013-01-22 11:30:41,907 Stage-2 map = 100%, reduce = 100%
Ended Job = job_201301221042_0010
Moving data to: /tmp/table02_index_data
1 Rows loaded to /tmp/table02_index_data
OK
Time taken: 25.173 seconds
hive> select * from table02 where id =500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201301221042_0011, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201301221042_0011
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201301221042_0011
2013-01-22 11:31:06,055 Stage-1 map = 0%, reduce = 0%
2013-01-22 11:31:09,066 Stage-1 map = 33%, reduce = 0%
2013-01-22 11:31:12,083 Stage-1 map = 67%, reduce = 0%
2013-01-22 11:31:15,102 Stage-1 map = 100%, reduce = 0%
2013-01-22 11:31:18,127 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201301221042_0011
OK
500000 A decade ago, many were predicting that Cooke, a New York City prodigy, would become a basketball shoe pitchman and would flaunt his wares and skills at All-Star weekends like the recent aerial show in Orlando, Fla. There was a time, however fleeting, when he was more heralded, or perhaps merely hyped, than any other high school player in America.
Time taken: 17.533 seconds
hive> Set hive.index.compact.file=/tmp/table02_index_data;
hive> Set hive.optimize.index.filter=false;
hive> Set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
hive> select * from table02 where id =500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201301221042_0012, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201301221042_0012
Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201301221042_0012
2013-01-22 11:32:14,929 Stage-1 map = 0%, reduce = 0%
2013-01-22 11:32:17,942 Stage-1 map = 100%, reduce = 0%
2013-01-22 11:32:20,968 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201301221042_0012
OK
500000 A decade ago, many were predicting that Cooke, a New York City prodigy, would become a basketball shoe pitchman and would flaunt his wares and skills at All-Star weekends like the recent aerial show in Orlando, Fla. There was a time, however fleeting, when he was more heralded, or perhaps merely hyped, than any other high school player in America.
Time taken: 11.222 seconds
Summary: an index table essentially contains these columns: 1. the indexed column(s) from the source table; 2. _bucketname, the path of the file in HDFS; 3. _offsets, the byte offsets of the indexed rows within that HDFS file. The principle: by recording where each indexed value sits in HDFS, the data can be fetched precisely, avoiding a full table scan.
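The "seek to the offset instead of scanning" idea in the summary can be sketched in Python, using a local file as a stand-in for the HDFS read path (data and names here are illustrative):

```python
import os
import tempfile

def fetch_row(path, offset):
    # Direct fetch: given an index entry (file, offset), read exactly one row
    # instead of scanning the whole file -- the compact index's core trick.
    with open(path, "rb") as f:
        f.seek(offset)                 # jump straight to the indexed row
        return f.readline().decode().rstrip("\n")

# Hypothetical stand-in for a table02 bucket file; rows are 4 bytes each,
# so the index entry for id=2 would be (file, 8).
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("0\ta\n1\tb\n2\tc\n")
tmp.close()
row = fetch_row(tmp.name, 8)
print(row)  # prints "2\tc", i.e. id 2 and its text
os.unlink(tmp.name)
```

This is why the indexed query in the transcript finished in a single short map stage (11.2s vs 17.5s): only the blocks containing the recorded offsets need to be read.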