Alluxio: Basic Usage

1. Alluxio HA

Download the tarball (version 2.0 is used here) and extract it on cdh01.

Set up passwordless SSH across the cluster nodes.

Configuration: cp conf/alluxio-site.properties.template conf/alluxio-site.properties

# Master node, given as an IP address; each master sets its own IP, and worker nodes leave this unset
alluxio.master.hostname=192.168.12.38

# Journal directory
alluxio.master.journal.folder=hdfs://nameservice1/alluxio/journal/

# Required, otherwise startup fails: the default is EMBEDDED, which cannot be combined with ZooKeeper-based HA
alluxio.master.journal.type=UFS

# Required, otherwise startup fails; this directory must be created on every node
alluxio.worker.tieredstore.level0.dirs.path=/opt/alluxio-2.0.0/ramdisk

# ZooKeeper high availability
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=cdh01:2181,cdh02:2181,cdh07:2181
alluxio.zookeeper.session.timeout=120s

# Point to the HDFS configuration files
alluxio.underfs.hdfs.configuration=/etc/hadoop/conf/core-site.xml:/etc/hadoop/conf/hdfs-site.xml

Edit conf/alluxio-env.sh and add JAVA_HOME: export JAVA_HOME=/usr/local/jdk1.8.0_231

Configure conf/masters and conf/workers.
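Both files list one host name per line. A minimal sketch of generating them, assuming bash; the helper name `write_host_file` and the host names in the usage note are illustrative and not taken from the setup described here:

```shell
# Hypothetical helper: write one host name per line to an Alluxio host list
# (conf/masters or conf/workers). Any existing content is overwritten.
write_host_file() {
  local target="$1"
  shift
  # printf repeats the format once for every remaining argument
  printf '%s\n' "$@" > "$target"
}
```

For example, from the Alluxio installation directory: `write_host_file conf/masters master-a master-b`, then `write_host_file conf/workers worker-a worker-b worker-c`.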

Distribute the installation to all master and worker nodes!
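One way to do the distribution is to loop over the host lists. The sketch below only prints the rsync commands it would run, so they can be reviewed first. The function name `print_sync_commands` is hypothetical, and the sketch assumes bash, rsync on every node, and the same installation path everywhere:

```shell
# Hypothetical helper: for every host listed in a file (one per line),
# print the rsync command that would copy the install directory to it.
print_sync_commands() {
  local install_dir="$1"
  local hosts_file="$2"
  local host
  # The "|| [ -n ... ]" part handles a last line without a trailing newline
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines
    case "$host" in
      ''|'#'*) continue ;;
    esac
    printf 'rsync -az %s/ %s:%s/\n' "$install_dir" "$host" "$install_dir"
  done < "$hosts_file"
}
```

For example, `print_sync_commands /opt/alluxio-2.0.0 conf/workers` prints one rsync line per worker; once the output looks right, it can be piped to `sh` to run it.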

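Run the format operation on any one master node: ./bin/alluxio format

Start the cluster from any one master node: ./bin/alluxio-start.sh all SudoMount. If this reports an error, first run ./bin/alluxio-start.sh worker on each worker node, then run ./bin/alluxio-stop.sh all followed by ./bin/alluxio-start.sh all.

Run ./bin/alluxio fs leader to identify the leader, then open the web UI on that host: http://:19999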

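2. Integrating HDFS with Alluxio

Error encountered: java.io.IOException: No FileSystem for scheme: alluxio

2.1 Configuring core-site.xml

First, add the following properties to core-site.xml on the CDH cluster:

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>alluxio.zookeeper.enabled</name>
  <value>true</value>
</property>
<property>
  <name>alluxio.zookeeper.address</name>
  <value>cdh01:2181,cdh02:2181,cdh07:2181</value>
</property>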

2.2 Configuring HADOOP_CLASSPATH

In order for the Alluxio client jar to be available to the MapReduce applications, you must add the Alluxio Hadoop client jar to the $HADOOP_CLASSPATH environment variable in hadoop-env.sh.

In the “YARN (MR2 Included)” section of the Cloudera Manager, in the “Configuration” tab, search for the parameter “Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hadoop-env.sh”. Then add the following line to the script:

HADOOP_CLASSPATH=/opt/alluxio-2.0.0/client/alluxio-2.0.0-client.jar:${HADOOP_CLASSPATH}
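After redeploying the client configuration, it can be worth confirming that the jar really ended up on the classpath. A small sketch, assuming bash; `classpath_contains` is a hypothetical helper that looks for an exact match among colon-separated entries:

```shell
# Hypothetical helper: succeed if a colon-separated classpath contains
# an entry exactly equal to the given path.
classpath_contains() {
  local classpath="$1"
  local wanted="$2"
  # Wrap both sides in colons so only whole entries can match
  case ":$classpath:" in
    *":$wanted:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On a gateway node it could be used as `classpath_contains "$(hadoop classpath)" /opt/alluxio-2.0.0/client/alluxio-2.0.0-client.jar && echo "Alluxio client jar found"`.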

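3. Integrating Spark with Alluxio

3.1 Configuration

You can run a job directly with a command of the following form (the application JAR and its arguments, omitted here, go at the end):

spark2-submit --master yarn --conf "spark.driver.extraClassPath=/opt/alluxio-2.0.0/client/alluxio-2.0.0-client.jar" --conf "spark.executor.extraClassPath=/opt/alluxio-2.0.0/client/alluxio-2.0.0-client.jar"

If you hit the following error at runtime: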

alluxio.exception.status.UnauthenticatedException: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: intellif

add the following to alluxio-site.properties and restart the Alluxio cluster.

# Configure impersonation: allow the yarn user to impersonate any user
alluxio.master.security.impersonation.yarn.users=*

# Or configure the larger block below
alluxio.master.security.impersonation.root.users=*
alluxio.master.security.impersonation.root.groups=*
alluxio.master.security.impersonation.client.users=*
alluxio.master.security.impersonation.client.groups=*
alluxio.security.login.impersonation.username=none

3.2 Reading from Alluxio with Spark SQL

# Copy the data
./bin/alluxio fs copyFromLocal LICENSE /Input

spark2-shell --master yarn --conf "spark.driver.extraClassPath=/opt/alluxio-2.0.0/client/alluxio-2.0.0-client.jar" --conf "spark.executor.extraClassPath=/opt/alluxio-2.0.0/client/alluxio-2.0.0-client.jar"

Once inside spark-shell, run:

val s = sc.textFile("alluxio://cdh02:19998/Input") // use the leader's address
val double = s.map(line => line + line)
double.saveAsTextFile("alluxio://cdh02:19998/Output")

4. Integrating Hive with Alluxio

4.1 Configuration

Add the following to hive-env.sh:

HIVE_AUX_JARS_PATH=/opt/alluxio-2.0.0/client/alluxio-2.0.0-client.jar:${HIVE_AUX_JARS_PATH}

Also add the following to the Alluxio configuration file alluxio-site.properties:

alluxio.master.security.impersonation.hive.users=*

4.2 Associating a Hive Internal Table with Alluxio

# Download the file
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip

# Unzip
unzip ml-100k.zip

# Copy to Alluxio. Access via alluxio://hostname:19998 failed, so use alluxio://zk@zkHost1:2181,zkHost2:2181,zkHost3:2181/path instead

./bin/alluxio fs mkdir /ml-100k

./bin/alluxio fs copyFromLocal /home/intellif/wqf/ml-100k/u.user alluxio://zk@cdh01:2181,cdh02:2181,cdh07:2181/ml-100k/u.user
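Because the ZooKeeper-style URI is long and reused in several commands, a tiny helper can build it from the ZooKeeper address list. This is a sketch, assuming bash; the function name `alluxio_zk_uri` is hypothetical and only does string formatting:

```shell
# Hypothetical helper: build an alluxio://zk@host1:port,host2:port/path URI
# from a comma-separated ZooKeeper address list and an Alluxio path.
alluxio_zk_uri() {
  local zk_address="$1"   # e.g. cdh01:2181,cdh02:2181,cdh07:2181
  local path="${2:-/}"    # defaults to the Alluxio root
  # Strip one leading slash (if any) and add it back, so the path always starts with "/"
  path="/${path#/}"
  printf 'alluxio://zk@%s%s\n' "$zk_address" "$path"
}
```

For example, `./bin/alluxio fs ls "$(alluxio_zk_uri cdh01:2181,cdh02:2181,cdh07:2181 /ml-100k)"` lists the directory created above through the ZooKeeper-based address.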

[Screenshot: Alluxio UI]

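Create the Hive table and point its location at that path; otherwise it works much like a table on HDFS:

CREATE TABLE u_user (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'alluxio://zk@cdh01:2181,cdh02:2181,cdh07:2181/ml-100k';

In practice only the LOCATION clause needs to change; everything else is the same as before.

You can then run queries and other operations:

select * from u_user;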

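4.3 Switching an Existing HDFS Table to Read Through Alluxio

The scenario: an existing Hive table reads directly from HDFS and now needs to go through Alluxio. The effect is that the first read of the table still fetches the data from HDFS, but once that read completes, the data has been loaded from HDFS into Alluxio. The prerequisite is that the Alluxio root path is mounted on the HDFS root path. More precisely, it is enough for the HDFS paths and the Alluxio paths to overlap, so that Alluxio can tell when a path has been read. This requires the following configuration change and an Alluxio restart: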

alluxio.master.mount.table.root.ufs=hdfs://nameservice1/

Next come the table operations. First, create an ordinary table:

CREATE TABLE u_user_3 (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

Then load the data:

LOAD DATA LOCAL INPATH '/home/intellif/ml-100k/u.user' OVERWRITE INTO TABLE u_user_3;

desc formatted u_user_3;

# col_name data_type comment
userid int
age int
gender char(1)
occupation string
zipcode string

# Detailed Table Information
Database: bigdata_odl
OwnerType: USER
Owner: admin
CreateTime: Tue May 26 16:28:02 CST 2020
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://nameservice1/user/hive/warehouse/bigdata_odl.db/u_user_3

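So far this is an ordinary operation and the data still lives in HDFS. Next, change the table's storage location to Alluxio:

alter table u_user_3 set location "alluxio://zk@cdh01:2181,cdh02:2181,cdh07:2181/user/hive/warehouse/bigdata_odl.db/u_user_3";

Then query the table again:

select count(*) from u_user_3;

Looking in Alluxio afterward, you can see that the table data has been cached.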
