[Hadoop大数据]——Hive数据的导入导出 - 好文

Hive作为大数据环境下的数据仓库工具，支持基于hadoop以sql的方式执行mapreduce的任务，非常适合对大量的数据进行全量的查询分析。

本文主要讲述下hive载cli中如何导入导出数据：

导入数据

第一种方式，直接从本地文件系统导入数据

我的本机有一个test1.txt文件，这个文件中有三列数据，并且每列都是以'\t'为分隔
[root@localhost conf]# cat /usr/tmp/test1.txt 1 a1 b1 2 a2 b2 3 a3 b3 4 a4 b
创建数据表：
>create table test1(a string,b string,c string) >row format delimited >fields
terminated by '\t' >stored as textfile;
导入数据：
load data local inpath '/usr/tmp/test1.txt' overwrite into table test1;
其中local inpath，表明路径为本机路径
overwrite表示加载的数据会覆盖原来的内容

在这里我还是要推荐下我自己建的大数据学习交流qq裙：522189307 ，裙里都是学大数据开发的，如果你正在学习大数据
，小编欢迎你加入，大家都是软件开发党，不定期分享干货（只有大数据开发相关的），包括我自己整理的一份最新的大数据进阶资料和高级开发教程，欢迎进阶中和进想深入大数据的小伙伴。上述资料加群可以领取

第二种，从hdfs文件中导入数据

首先上传数据到hdfs中
hadoop fs -put /usr/tmp/test1.txt /test1.txt
在hive中查看test1.txt文件
hive> dfs -cat /test1.txt; 1 a1 b1 2 a2 b2 3 a3 b3 4 a4 b4
创建数据表，与前面一样。导入数据的命令有些差异:
load data inpath '/test1.txt' overwrite into table test2;
第三种，基于查询insert into导入

首先定义数据表，这里直接创建带有分区的表
hive> create table test3(a string,b string,c string) partitioned by (d string)
row format delimited fields terminated by '\t' stored as textfile; OK Time
taken: 0.109 seconds hive> describe test3; OK a string b string c string d
string # Partition Information # col_name data_type comment d string Time
taken: 0.071 seconds, Fetched: 9 row(s)
通过查询直接导入数据到固定的分区表中：
hive> insert into table test3 partition(d='aaaaaa') select * from test2;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez)
or using Hive 1.X releases. Query ID =
root_20160823212718_9cfdbea4-42fa-4267-ac46-9ac2c357f944 Total jobs = 3
Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no
reduce operator Job running in-process (local Hadoop) 2016-08-23 21:27:21,621
Stage-1 map = 100%, reduce = 0% Ended Job = job_local1550375778_0001 Stage-4 is
selected by condition resolver. Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver. Moving data to directory
hdfs://localhost:8020/user/hive/warehouse/test.db/test3/d=aaaaaa/.hive-staging_hive_2016-08-23_21-27-18_739_4058721562930266873-1/-ext-10000
Loading data to table test.test3 partition (d=aaaaaa) MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 248 HDFS Write: 175 SUCCESS Total MapReduce CPU Time
Spent: 0 msec OK Time taken: 3.647 seconds
通过查询观察结果
hive> select * from test3; OK 1 a1 b1 aaaaaa 2 a2 b2 aaaaaa 3 a3 b3 aaaaaa 4
a4 b4 aaaaaa Time taken: 0.264 seconds, Fetched: 4 row(s)
PS:也可以直接通过动态分区插入数据:
insert into table test4 partition(c) select * from test2;
分区会以文件夹命名的方式存储：
hive> dfs -ls /user/hive/warehouse/test.db/test4/; Found 4 items drwxr-xr-x -
root supergroup 0 2016-08-23 21:33 /user/hive/warehouse/test.db/test4/c=b1
drwxr-xr-x - root supergroup 0 2016-08-23 21:33
/user/hive/warehouse/test.db/test4/c=b2 drwxr-xr-x - root supergroup 0
2016-08-23 21:33 /user/hive/warehouse/test.db/test4/c=b3 drwxr-xr-x - root
supergroup 0 2016-08-23 21:33 /user/hive/warehouse/test.db/test4/c=b4
第四种，直接基于查询创建数据表

直接通过查询创建数据表:
hive> create table test5 as select * from test4; WARNING: Hive-on-MR is
deprecated in Hive 2 and may not be available in the future versions. Consider
using a different execution engine (i.e. spark, tez) or using Hive 1.X
releases. Query ID = root_20160823213944_03672168-bc56-43d7-aefb-cac03a6558bf
Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0
since there's no reduce operator Job running in-process (local Hadoop)
2016-08-23 21:39:46,030 Stage-1 map = 100%, reduce = 0% Ended Job =
job_local855333165_0003 Stage-4 is selected by condition resolver. Stage-3 is
filtered out by condition resolver. Stage-5 is filtered out by condition
resolver. Moving data to directory
hdfs://localhost:8020/user/hive/warehouse/test.db/.hive-staging_hive_2016-08-23_21-39-44_259_5484795730585321098-1/-ext-10002
Moving data to directory
hdfs://localhost:8020/user/hive/warehouse/test.db/test5 MapReduce Jobs
Launched: Stage-Stage-1: HDFS Read: 600 HDFS Write: 466 SUCCESS Total MapReduce
CPU Time Spent: 0 msec OK Time taken: 2.184 seconds
查看结果
hive> select * from test5; OK 1 a1 b1 2 a2 b2 3 a3 b3 4 a4 b4 Time taken:
0.147 seconds, Fetched: 4 row(s)
导出数据

导出到本地文件

执行导出本地文件命令:
hive> insert overwrite local directory '/usr/tmp/export' select * from test1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the
future versions. Consider using a different execution engine (i.e. spark, tez)
or using Hive 1.X releases. Query ID =
root_20160823221655_05b05863-6273-4bdd-aad2-e80d4982425d Total jobs = 1
Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no
reduce operator Job running in-process (local Hadoop) 2016-08-23 22:16:57,028
Stage-1 map = 100%, reduce = 0% Ended Job = job_local8632460_0005 Moving data
to local directory /usr/tmp/export MapReduce Jobs Launched: Stage-Stage-1: HDFS
Read: 794 HDFS Write: 498 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK
Time taken: 1.569 seconds hive>
在本地文件查看内容:
[root@localhost export]# ll total 4 -rw-r--r--. 1 root root 32 Aug 23 22:16
000000_0 [root@localhost export]# cat 000000_0 1a1b1 2a2b2 3a3b3 4a4b4
[root@localhost export]# pwd /usr/tmp/export [root@localhost export]#
导出到hdfs
hive> insert overwrite directory '/usr/tmp/test' select * from test1; WARNING:
Hive-on-MR is deprecated in Hive 2 and may not be available in the future
versions. Consider using a different execution engine (i.e. spark, tez) or
using Hive 1.X releases. Query ID =
root_20160823214217_e8c71bb9-a147-4518-8353-81f9adc54183 Total jobs = 3
Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no
reduce operator Job running in-process (local Hadoop) 2016-08-23 21:42:19,257
Stage-1 map = 100%, reduce = 0% Ended Job = job_local628523792_0004 Stage-3 is
selected by condition resolver. Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver. Moving data to directory
hdfs://localhost:8020/usr/tmp/test/.hive-staging_hive_2016-08-23_21-42-17_778_6818164305996247644-1/-ext-10000
Moving data to directory /usr/tmp/test MapReduce Jobs Launched: Stage-Stage-1:
HDFS Read: 730 HDFS Write: 498 SUCCESS Total MapReduce CPU Time Spent: 0 msec
OK Time taken: 1.594 seconds
导出成功，查看导出的hdfs文件
hive> dfs -cat /usr/tmp/test; cat: `/usr/tmp/test': Is a directory Command
failed with exit code = 1 Query returned non-zero code: 1, cause: null hive>
dfs -ls /usr/tmp/test; Found 1 items -rwxr-xr-x 3 root supergroup 32 2016-08-23
21:42 /usr/tmp/test/000000_0 hive> dfs -cat /usr/tmp/test/000000_0; 1a1b1 2a2b2
3a3b3 4a4b4 hive>
导出到另一个表

样例可以参考前面数据导入的部分:
insert into table test3 select * from test1;

热门工具换一换