
Installing Hive on a Hadoop Cluster, with MySQL as the Metadata Store

2016-06-07 16:15:33   Author: MangoCool   Source: MangoCool

Hive is a data warehouse tool built on top of Hadoop. It maps structured data files to database tables and provides a simple SQL-like query capability, translating SQL statements into MapReduce jobs for execution. Its main advantage is a low learning curve: simple MapReduce-style aggregations can be expressed quickly in SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to the statistical analysis done in data warehouses.

We use Hadoop-2.7.2 and Hive-2.0.0 here. Hadoop is already installed on three servers:

192.168.21.6 slave

192.168.21.181 master

192.168.21.9 slave


MySQL Installation

This step is not strictly required: by default Hive stores its metadata in an embedded Derby database. The drawback is that only one Hive instance can access the metadata at a time, which is only suitable for local testing during development.

Hive can be configured to replace Derby with a relational database such as MySQL, making the metadata store independent of any single instance so it can be shared among multiple service instances.

1. Install MySQL:

yum install -y mysql-server

2. Start MySQL:

service mysqld start

3. Log in as root and create a new user:

mysql -u root -p

The initial root password is empty; just press Enter when prompted after the command.

mysql> use mysql;
mysql> update user set password = Password('root') where User = 'root';
mysql> create user 'hive'@'%' identified by 'hive';
mysql> grant all privileges on *.* to 'hive'@'%' with grant option;
mysql> flush privileges;
mysql> exit;

4. Create the database:

mysql> create database hive;
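To verify the new account and database, you can log back in as the hive user and check that the database is visible (a quick sanity check, assuming MySQL is running on this same host):

```shell
# Log in with the hive account created above and confirm
# the 'hive' database is visible to it.
mysql -u hive -phive -e "SHOW DATABASES LIKE 'hive';"
```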


Hive Installation and Configuration

Hive only needs to be installed on the master node.

1. Download from https://mirrors.tuna.tsinghua.edu.cn/apache/hive/ — I chose the Hive-2.0.0 release here.

2. Extract the archive:

tar -zxvf apache-hive-2.0.0-bin.tar.gz

3. Configure /etc/profile:

export HIVE_HOME=/home/hadoop/SW/hive-2.0.0
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:$PATH
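After editing /etc/profile, reload it so the new variables take effect in the current shell, then check that the hive command resolves (the install path above is specific to this setup; adjust it to yours):

```shell
# Apply the new environment variables to the current shell
source /etc/profile
# Should print the Hive 2.0.0 version banner
hive --version
```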

4. Create the Hive data directories:

Create the directories in HDFS that Hive uses to store data (the /tmp directory may already exist). From Hadoop's bin directory, run:

./hadoop fs -mkdir -p /tmp
./hadoop fs -mkdir -p /user/hive/warehouse
./hadoop fs -chmod 777 /tmp
./hadoop fs -chmod 777 /user/hive/warehouse

/tmp holds temporary files produced during job execution, and /user/hive/warehouse holds the data files managed by Hive.
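To confirm both directories exist with the intended permissions (drwxrwxrwx), you can list them directly:

```shell
# -d lists the directories themselves rather than their contents
hadoop fs -ls -d /tmp /user/hive/warehouse
```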

5. Hive configuration file:

The Hive configuration file lives under $HIVE_HOME/conf and is named hive-site.xml. It does not exist by default and must be created manually: that directory contains a template file, hive-default.xml.template, which you can cp to create hive-site.xml.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
</property>

<property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore database</description>
</property>

<property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
</property>

<property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
</property>

<property>
        <name>hive.metastore.local</name>
        <value>true</value>
</property>

</configuration>

With this configuration, the Hive service (HiveServer2) and the metastore service run on the same server.

They can also be deployed on different servers, which is preferable, as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

	<!--Server-->
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://192.168.21.8:3306/hive?createDatabaseIfNotExist=true</value>
		<description>JDBC connect string for a JDBC metastore</description>
	</property>

	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
		<description>Driver class name for a JDBC metastore</description>
	</property>

	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>hive</value>
		<description>username to use against metastore database</description>
	</property>

	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>hive</value>
		<description>password to use against metastore database</description>
	</property>
	<!--Server-->

	<!--Client-->
	<property>
		<name>hive.metastore.local</name>
		<value>false</value>
	</property>

	<property>
		<name>hive.server2.thrift.port</name>
		<value>10001</value>
	</property>

	<property>
		<name>hive.server2.authentication</name>
		<value>NONE</value>
	</property>

	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://192.168.21.8:9083</value>
	</property>
	<!--Client-->

	<!--Common-->
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>/user/hive/warehouse</value>
		<description>location of default database for the warehouse</description>
	</property>

	<property>
		<name>hive.exec.scratchdir</name>
		<value>/tmp/hive</value>
		<description>Local scratch space for Hive jobs</description>
	</property>

	<property>
		<name>hive.default.fileformat</name>
		<value>Parquet</value>
		<description>
      Expects one of [textfile, sequencefile, rcfile, orc].
      Default file format for CREATE TABLE statement.
      Users can explicitly override it by CREATE TABLE ... STORED AS [FORMAT]
		</description>
	</property>

	<property>
		<name>hive.query.result.fileformat</name>
		<value>Parquet</value>
		<description>
      Expects one of [textfile, sequencefile, rcfile].
      Default file format for storing result of the query.
		</description>
	</property>

	<property>
		<name>hive.execution.engine</name>
		<value>spark</value>
		<description>
          Expects one of [mr, tez, spark].
          Chooses execution engine. Options are: mr (Map reduce, default), tez, spark.
                  While MR remains the default engine for historical reasons, 
                  it is itself a historical engine and is deprecated in Hive 2 line. 
                  It may be removed without further warning.
		</description>
	</property>
	<!--Common-->

</configuration>

For more Hive metastore configuration options, see: http://blog.csdn.net/reesun/article/details/8556078

Download the JDBC driver:

wget "http://search.maven.org/remotecontent?filepath=mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar" -O mysql-connector-java-5.1.38.jar

Copy it into hive-2.0.0/lib:

cp mysql-connector-java-5.1.38.jar hive-2.0.0/lib

Initialize the metastore schema before the first start:

bin/schematool -dbType mysql -initSchema
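After schematool finishes, you can confirm that the metastore tables were created in the MySQL database (tables such as DBS, TBLS and VERSION should appear, using the hive/hive credentials from above):

```shell
# List the metastore tables that schematool created in the hive database
mysql -u hive -phive hive -e "SHOW TABLES;"
```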

Start Hive:

hive --service metastore &
hive --service hiveserver2 &

Start the metastore first, since HiveServer2 connects to it in the remote setup above. To keep Hive running after you log out, run the services with nohup instead:

nohup hive --service metastore &
nohup hive --service hiveserver2 &
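Once HiveServer2 is up, you can connect with beeline over JDBC. The port 10001 and authentication mode NONE come from the hive-site.xml above; the username here is illustrative, since with authentication NONE any value is accepted:

```shell
# Connect to HiveServer2 on the Thrift port configured in hive-site.xml
# and run a quick query to confirm the service is healthy.
beeline -u jdbc:hive2://192.168.21.181:10001 -n hadoop -e "SHOW DATABASES;"
```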

Start the CLI:

hive shell

or

hive
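As a quick smoke test from the CLI, you can create a table and list it (the table name is illustrative; its data files will land under /user/hive/warehouse in HDFS):

```shell
# Create a throwaway table and confirm it shows up
hive -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING); SHOW TABLES;"
```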

CLI in debug mode:

hive --hiveconf hive.root.logger=DEBUG,console

