news 2025/7/28 19:32:48

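Partitioned Tables in Hive

Contents

Partitioned Tables in Hive
- Preface
- Partitioned Tables
  - Basic Operations on Partitioned Tables
    - Introducing Partitioned Tables
    - Syntax for Creating a Partitioned Table
    - Loading Data into a Partitioned Table
    - Querying Data in a Partitioned Table
    - Adding Partitions
    - Dropping Partitions
    - Listing the Partitions of a Table
    - Viewing the Structure of a Partitioned Table
  - Two-Level Partitions
    - Loading Data the Normal Way
    - Associating Data with a Partitioned Table
  - Dynamic Partitioning
    - Parameters for Enabling Dynamic Partitioning
    - Hands-On Example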

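Preface

Linux version: CentOS 7.5
Hive version: Hive 3.1.2

Partitioned Tables

Each partition of a Hive table corresponds to a separate directory on HDFS, and that directory holds all the data files of the partition. In other words, partitioning in Hive means splitting data into directories: a large dataset is divided into smaller ones according to business needs. When the WHERE clause of a query filters on the partition column, Hive reads only the matching partitions instead of scanning the whole table, which makes the query much more efficient.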

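Basic Operations on Partitioned Tables

Introducing Partitioned Tables

Suppose logs need to be managed by date. Here, department data is used to simulate the daily log files:

dept_20200401.log
dept_20200402.log
dept_20200403.log
...

Syntax for Creating a Partitioned Table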

hive (default)> create table dept_partition(
deptno int, dname string, loc string
)
partitioned by (day string)
row format delimited fields terminated by '\t';

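Note: the partition column must not duplicate a column already declared in the table. It can be treated as a pseudo-column of the table.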
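To make the rule concrete, the sketch below declares day both as a regular column and as a partition column. The table name dept_partition_bad is hypothetical and used only for illustration. Hive is expected to reject the statement with a "Column repeated in partitioning columns" error.

```sql
-- Hypothetical table, for illustration only.
-- Expected to fail: day appears in the column list and in partitioned by.
create table dept_partition_bad(
    deptno int, dname string, loc string, day string
)
partitioned by (day string)
row format delimited fields terminated by '\t';
```

Removing day from the column list, as in dept_partition above, resolves the conflict. The column can still be selected and filtered like any other column.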

Loading Data into a Partitioned Table

(1) Prepare the data

dept_20200401.log

10	ACCOUNTING	1700
20	RESEARCH	1800

dept_20200402.log

30	SALES	1900
40	OPERATIONS	1700

dept_20200403.log

50	TEST	2000
60	DEV	1900

(2) Load the data

hive (default)> load data local inpath '/export/server/hive-3.1.2/datas/dept_20200401.log' into table dept_partition partition(day='20200401');
hive (default)> load data local inpath '/export/server/hive-3.1.2/datas/dept_20200402.log' into table dept_partition partition(day='20200402');
hive (default)> load data local inpath '/export/server/hive-3.1.2/datas/dept_20200403.log' into table dept_partition partition(day='20200403');

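Note: when loading data into a partitioned table, specify the target partition. The files used here contain no day column, so Hive cannot infer the partition from the data.

Viewing the partitions in the HDFS web UI

[Figure: assets/01.png (image not available)]

Viewing the partitions from Hive

[Figure: assets/02.png (image not available)]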
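Since the screenshots are not available, the same information can be obtained from the Hive CLI. This is a minimal sketch. The HDFS path in the last command is an assumption (default warehouse location, default database) and may differ on your cluster.

```sql
-- List the partitions registered in the metastore.
show partitions dept_partition;

-- Show the metadata of one partition, including its Location on HDFS.
describe formatted dept_partition partition(day='20200401');

-- List the partition directories (assumed default warehouse path).
dfs -ls /user/hive/warehouse/dept_partition;
```

Each partition should appear as a subdirectory named day=<value> under the table directory.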

Querying Data in a Partitioned Table

Single-partition query

hive (default)> select * from dept_partition where day='20200401';

Multi-partition query

hive (default)> select * from dept_partition where day='20200401'
              union
              select * from dept_partition where day='20200402'
              union
              select * from dept_partition where day='20200403';
hive (default)> select * from dept_partition where day='20200401' or day='20200402' or day='20200403';
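The OR chain can also be written with IN. Because day values in yyyyMMdd form sort chronologically as strings, a contiguous range can be written with BETWEEN. A minimal sketch against the same dept_partition table:

```sql
-- Equivalent to the OR query: select three specific partitions.
select * from dept_partition
where day in ('20200401', '20200402', '20200403');

-- Select a contiguous range of days (string comparison on yyyyMMdd values).
select * from dept_partition
where day between '20200401' and '20200403';
```

Both forms filter on the partition column only, so Hive needs to read just the matching partition directories.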

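Adding Partitions

Add a single partition

hive (default)> alter table dept_partition add partition(day='20200404') ;

Add several partitions at once (the partition clauses are separated by spaces, not commas)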

hive (default)> alter table dept_partition add partition(day='20200405') partition(day='20200406');

Dropping Partitions

Drop a single partition

hive (default)> alter table dept_partition drop partition (day='20200406');

Drop several partitions at once (the partition clauses are separated by commas)

hive (default)> alter table dept_partition drop partition (day='20200404'), partition(day='20200405');
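Adding a partition that already exists raises an error, so scripts that may be run more than once often use the IF NOT EXISTS and IF EXISTS clauses of ALTER TABLE. A minimal sketch, reusing the partition values from above:

```sql
-- Add partitions only if they are not registered yet (safe to rerun).
alter table dept_partition add if not exists
    partition(day='20200404') partition(day='20200405');

-- Drop partitions only if they exist.
alter table dept_partition drop if exists
    partition(day='20200404'), partition(day='20200405');
```

For a managed table such as dept_partition, dropping a partition also removes its data directory.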

Listing the Partitions of a Table

hive> show partitions dept_partition;

Viewing the Structure of a Partitioned Table

hive> desc formatted dept_partition;

# Partition Information
# col_name              data_type               comment
day                     string

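Two-Level Partitions

Suppose a single day's log volume is very large. How can the data be split further?

The answer is two-level partitioning, covered next.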
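The statements in this section target a table named dept_partition2 with two partition columns, day and hour. Its definition is not shown in this article. A sketch consistent with those statements, assuming the same three data columns and tab delimiter as dept_partition, would be:

```sql
-- Assumed definition: same data columns as dept_partition,
-- partitioned first by day and then by hour.
create table dept_partition2(
    deptno int, dname string, loc string
)
partitioned by (day string, hour string)
row format delimited fields terminated by '\t';
```

With this layout each partition is stored in a nested directory of the form day=<value>/hour=<value> under the table directory.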

Loading Data the Normal Way

(1) Load data into the two-level partitioned table

hive (default)> load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table
dept_partition2 partition(day='20200401', hour='12');

(2) Query the partition data

hive (default)> select * from dept_partition2 where day='20200401' and hour='12';

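Associating Data with a Partitioned Table

There are three ways to make data that was uploaded directly into a partition directory visible through the partitioned table.

(1) Method 1: upload the data, then repair the table

Upload the data (dfs -mkdir -p inside the Hive CLI, or hadoop fs -mkdir from the shell)

hive (default)> dfs -mkdir -p /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=13;
hive (default)> dfs -put /opt/module/datas/dept_20200401.log /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=13;

Query the data (the newly uploaded rows are not returned, because the metastore has no record of the new partition yet)

hive (default)> select * from dept_partition2 where day='20200401' and hour='13';

Run the repair command

hive> msck repair table dept_partition2;

Query the data again

hive (default)> select * from dept_partition2 where day='20200401' and hour='13';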
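The effect of the repair can be observed directly in the partition list. This is a minimal sketch, assuming the hour=13 directory has been uploaded as shown but not yet registered:

```sql
-- Before the repair: hour=13 exists on HDFS but not in the metastore,
-- so it should be missing from this list.
show partitions dept_partition2;

-- Scan the table directory and register partitions found there.
msck repair table dept_partition2;

-- After the repair: day=20200401/hour=13 should now be listed.
show partitions dept_partition2;
```

The repair only works when the directories follow the column=value naming convention of the partition columns.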

(2) Method 2: upload the data, then add the partition

Upload the data

hive (default)> dfs -mkdir -p /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=14;
hive (default)> dfs -put /export/server/hive-3.1.2/datas/dept_20200401.log /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=14;

Add the partition

hive (default)> alter table dept_partition2 add partition(day='20200401',hour='14');

Query the data

hive (default)> select * from dept_partition2 where day='20200401' and hour='14';

(3) Method 3: create the directory, then load the data into the partition

hive (default)> dfs -mkdir -p /user/hive/warehouse/mydb.db/dept_partition2/day=20200401/hour=15;

hive (default)> load data local inpath '/export/server/hive-3.1.2/datas/dept_20200401.log' into table dept_partition2 partition(day='20200401',hour='15');

Query the data

hive (default)> select * from dept_partition2 where day='20200401' and hour='15';

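Dynamic Partitioning

In a relational database, when rows are inserted into a partitioned table, the database automatically routes each row to the right partition based on the value of the partition column. Hive offers a similar mechanism, dynamic partitioning (Dynamic Partition), but it requires some configuration first.

Parameters for Enabling Dynamic Partitioning

(1) Enable dynamic partitioning (default: true, enabled)

hive.exec.dynamic.partition=true

(2) Set non-strict mode. (The dynamic partition mode defaults to strict, which requires at least one partition column to be static. In nonstrict mode all partition columns may be dynamic.)

hive.exec.dynamic.partition.mode=nonstrict

(3) The maximum number of dynamic partitions that can be created in total across all nodes running the MR job. Default: 1000

hive.exec.max.dynamic.partitions=1000

(4) The maximum number of dynamic partitions that can be created on each node running the MR job. Set this according to the actual data. For example, if the source data covers a full year, the day column has 365 distinct values, so the parameter must be at least 365. With the default of 100 the job fails.

hive.exec.max.dynamic.partitions.pernode=100

(5) The maximum number of HDFS files that can be created in the whole MR job. Default: 100000

hive.exec.max.created.files=100000

(6) Whether to throw an exception when an empty partition is generated. This usually does not need to be set. Default: false

hive.error.on.empty.partition=false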
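These properties can be set per session with SET before running the insert. The sketch below uses the values discussed above, except for the per-node limit. It is raised to 400 as an illustrative value for a year of daily partitions and should be tuned to the actual data.

```sql
-- Session-level settings for dynamic partition inserts.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
-- Illustrative value: enough for 365 daily partitions written by one node.
set hive.exec.max.dynamic.partitions.pernode=400;
set hive.exec.max.created.files=100000;
set hive.error.on.empty.partition=false;
```

Running set with only a property name, for example set hive.exec.dynamic.partition.mode; prints its current value.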

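Hands-On Example

Requirement: insert the rows of the dept table into the matching partitions of the target table dept_partition_dy, partitioned by region (the loc column).

(1) Create the target partitioned table

hive (default)> create table dept_partition_dy(id int, name string) partitioned by (loc int) row format delimited fields terminated by '\t';

(2) Configure dynamic partitioning and insert the data

hive (default)> set hive.exec.dynamic.partition.mode = nonstrict;
hive (default)> insert into table dept_partition_dy partition(loc) select deptno, dname, loc from dept;

(3) Check the partitions of the target table

hive (default)> show partitions dept_partition_dy;

Follow-up question: how does the target table know which column holds the partition value?

==> By position. The last column of the SELECT is used as the partition column, because the partition "pseudo" column comes last.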
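Because the match is positional, the column names and aliases in the SELECT do not matter. A minimal sketch, assuming the same dept source table and non-strict mode as above (the alias region is arbitrary):

```sql
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last selected column feeds the loc partition, whatever it is called.
insert into table dept_partition_dy partition(loc)
select deptno as id, dname as name, loc as region from dept;
```

The same rule means that reordering the SELECT list, for example putting loc first, would send the wrong column into the partition value, so the partition columns should always come last and in the order declared in the partition clause.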

End of article.
