External Tables (Loading Data: Building a Data Analysis System, Part 24)

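Steps for creating a Hive external partitioned table and loading data

[1] Create the external table and specify its warehouse directory

[2] Load local data into the warehouse for a given partition, i.e. upload it to that partition's directory

[3] Alter the table to add the partition

[4] Query the data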

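Detailed steps

(1) Create the Hive external partitioned table (specify the warehouse directory in the same statement)

(2) Load local data into the Hive warehouse for a given partition; the partition directory is created automatically under the warehouse directory at this point

(3) Alter the table to add the corresponding partition

(4) The partition's data can now be queried

(5) Run a query that uses the partition column in the where clause

(6) Continue loading local data files into the Hive warehouse under a new partition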

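Queries against a managed (internal) partitioned table that is partitioned by sex. Filtering on the partition sex='male':

select id, name, sex from people_sex where sex='male' and id > 190;

The result is read directly from that partition.

Without a partition filter:

select id, name, sex from people_sex where id > 190;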
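For comparison with the external table built in the rest of this article, here is a minimal sketch of how the managed table `people_sex` might be defined and loaded. The column list and the file path are assumptions borrowed from the external-table example in this article; the original does not show this table's definition.

```sql
-- Hypothetical definition of the managed table people_sex
-- (columns assumed to match the external-table example in this article).
create table if not exists people_sex (
  id   int,
  name varchar(100)
)
partitioned by (sex string)
row format delimited
fields terminated by '\t'
stored as textfile;

-- load data places the file in the sex=male partition directory
-- and registers the partition in the metastore in one step.
load data local inpath '/opt/datas/people.txt'
into table people_sex partition (sex='male');

-- The partition filter lets Hive read only the sex=male directory.
select id, name, sex from people_sex where sex='male' and id > 190;
```

Because `load data ... partition (...)` registers the partition itself, no separate `alter table ... add partition` is needed in this case.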

Worked example: creating a Hive external partitioned table and loading data

(1) The external partitioned table:

create external table if not exists people_sex_outside(
  id INT,
  name VARCHAR(100)
)
partitioned by (sex string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile
location '/people_sex_outside_direct'
;

select * from people_sex_outside;

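This returns no rows: the table location does not yet contain any directory in the partition layout.

people_sex_outside_direct is the directory specified for storing the Hive table's data, i.e. its warehouse directory.

Copy the data to that location with a Hadoop file system command. It can be run from the Hive shell, as shown here, or from a Linux shell via hadoop fs, just as for a managed table.

(2) Load local data into the Hive warehouse for a given partition; the partition directory is created at this point: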

dfs -copyFromLocal /opt/datas/people.txt /people_sex_outside_direct/sex=male;

select * from people_sex_outside;

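The directory layout now matches what a partitioned table expects, but the query still returns no rows, because the metastore has no record of the partition.

(3) Alter the table to add the corresponding partition: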

alter table people_sex_outside add partition(sex='male');

The partition can now be listed:

show partitions people_sex_outside;
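As a hedged variant of the step above, the partition can be registered with an explicit path and then inspected. `if not exists` and `location` are standard Hive clauses that the original example does not use; the path is an assumption that mirrors the table location from this walkthrough.

```sql
-- Assumption: the partition directory already exists at this path
-- (it mirrors the table location used in this walkthrough).
alter table people_sex_outside
  add if not exists partition (sex='male')
  location '/people_sex_outside_direct/sex=male';

-- Show the partition's metadata, including its storage location.
describe formatted people_sex_outside partition (sex='male');
```

With `if not exists`, re-running the statement does not fail when the partition has already been added.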

(4) The partition's data can now be queried:

select * from people_sex_outside;

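Load more data into the external partitioned table. First create the new partition directory (marked as optional here; creating it explicitly ensures the copy target is a directory):

dfs -mkdir -p /people_sex_outside_direct/sex=sex;

(5) Load local data into the Hive warehouse for the new partition:

dfs -copyFromLocal /opt/datas/people.txt /people_sex_outside_direct/sex=sex;

The newly loaded partition's data still cannot be queried at this point: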

select * from people_sex_outside;

(6) Alter the table to add the corresponding partition:

alter table people_sex_outside add partition(sex='sex');

The added partition is now listed:

show partitions people_sex_outside;

The partition's data can now be queried:

select * from people_sex_outside;
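When several partition directories have been copied into the table location, adding them one at a time gets tedious. As an alternative sketch (not part of the original walkthrough), Hive's `msck repair table` command scans the table location and registers partition directories that the metastore does not know about yet. It relies on the directories following the `key=value` naming used above.

```sql
-- Register every partition directory (e.g. sex=male, sex=sex) found under
-- the table location that is missing from the metastore.
msck repair table people_sex_outside;

-- Confirm which partitions are now registered.
show partitions people_sex_outside;
```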

(7) A query that uses the partition column in the where clause:

select * from people_sex_outside where sex='male' and id >190;
