Atlas 2.1.0: Compiling, Installing, and Integrating with CDH 6.3.2
Reference 1: http://t.csdn.cn/TOS4q
Reference 2: "A Powerful Tool for Data Management in Data Governance: The Atlas Getting-Started Guide" by 独孤风, 博客园 (cnblogs.com)
I. Download the Atlas Source Code
http://atlas.apache.org/2.1.0/index.html#/Downloads
II. Compile the Atlas Source Code
Requirement | Example version |
---|---|
JDK 8 | Java™ SE Runtime Environment (build 1.8.0_231-b11) |
Maven | 3.5 |
Linux kernel (EL7 build host) | 3.10.0-957.el7.x86_64 |
Git | git version 1.8.3.1 |
GCC | gcc (GCC) 4.8.5 (Red Hat 4.8.5-36) |
Python 3.7 | Python 3.7.0 |
Node.js | 6.4.1 |
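1. Edit the top-level pom.xml of the Atlas project. Add the Cloudera repository under <repositories>, and set the component versions under <properties> to match CDH 6.3.2.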
<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
<lucene-solr.version>7.4.0-cdh6.3.2</lucene-solr.version>
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<solr.version>7.4.0-cdh6.3.2</solr.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
<kafka.version>2.2.1-cdh6.3.2</kafka.version>
<zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>
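If you would rather script the version changes than edit pom.xml by hand, here is a minimal sketch. It assumes each property already exists in the top-level pom.xml as a single-line `<name>value</name>` element. The helper names `set_pom_properties` and `update_pom_file` are hypothetical. The sketch uses a regular expression instead of an XML parser so that the pom's comments and formatting survive unchanged.

```python
import re
from pathlib import Path

# CDH 6.3.2 component versions, copied from the property list above.
CDH_632_VERSIONS = {
    "lucene-solr.version": "7.4.0-cdh6.3.2",
    "hadoop.version": "3.0.0-cdh6.3.2",
    "hbase.version": "2.1.0-cdh6.3.2",
    "solr.version": "7.4.0-cdh6.3.2",
    "hive.version": "2.1.1-cdh6.3.2",
    "kafka.version": "2.2.1-cdh6.3.2",
    "zookeeper.version": "3.4.5-cdh6.3.2",
}


def set_pom_properties(pom_text: str, versions: dict) -> str:
    """Replace the value of every <name>...</name> element listed in `versions`.

    The pattern requires '<' directly before the name, so <solr.version>
    does not touch <lucene-solr.version>. Every occurrence of a property is
    replaced. Raises KeyError when a property is missing, so a typo in a
    name does not pass silently.
    """
    for name, value in versions.items():
        tag = re.escape(name)
        pattern = re.compile(r"(<%s>)[^<]*(</%s>)" % (tag, tag))
        # A function replacement avoids backreference surprises in `value`.
        pom_text, count = pattern.subn(lambda m, v=value: m.group(1) + v + m.group(2), pom_text)
        if count == 0:
            raise KeyError(f"<{name}> not found in pom.xml")
    return pom_text


def update_pom_file(path: str = "pom.xml") -> None:
    """Rewrite the pom in place with the CDH 6.3.2 versions."""
    pom = Path(path)
    pom.write_text(set_pom_properties(pom.read_text(encoding="utf-8"), CDH_632_VERSIONS), encoding="utf-8")
```

Call `update_pom_file()` from the root of the Atlas source tree. The Cloudera `<repository>` entry still has to be added by hand.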
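2. Patch the Atlas source for the Hive 2.1.1 shipped with CDH 6.3.2. Atlas 2.1.0 targets Hive 3.1 by default and calls Hive 3 APIs that Hive 2.1.1 lacks.
Note: the files below are under atlas-release-2.1.0/addons/hive-bridge.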
2.1 In src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java, line 577, change
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
to
String catalogName = null;
2.2 In src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java, line 81, change
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
to
this.metastoreHandler = null;
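The two source edits can also be applied by a script, which helps when you rebuild from a fresh checkout. This is a minimal sketch that does exact string replacement. It assumes the original lines match the text above character for character, including spacing. If they do not, it raises an error, and you make the edit by hand. The names `PATCHES`, `patch_file` and `apply_patches` are hypothetical.

```python
from pathlib import Path

# (path relative to atlas-release-2.1.0/addons/hive-bridge, original line, replacement)
PATCHES = [
    (
        "src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java",
        "String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;",
        "String catalogName = null;",
    ),
    (
        "src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java",
        "this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;",
        "this.metastoreHandler = null;",
    ),
]


def patch_file(path: Path, old: str, new: str) -> bool:
    """Replace exactly one occurrence of `old` with `new` in `path`.

    Returns True if the file was modified and False if it already contains
    `new` (patched earlier). Raises ValueError when `old` is missing or
    appears more than once.
    """
    text = path.read_text(encoding="utf-8")
    occurrences = text.count(old)
    if occurrences == 1:
        path.write_text(text.replace(old, new), encoding="utf-8")
        return True
    if occurrences == 0 and new in text:
        return False
    raise ValueError(f"{path}: found {occurrences} occurrences of the expected line")


def apply_patches(module_root: Path) -> None:
    """Apply every entry in PATCHES under the hive-bridge module directory."""
    for rel_path, old, new in PATCHES:
        changed = patch_file(module_root / rel_path, old, new)
        print(("patched: " if changed else "already patched: ") + rel_path)
```

Usage: `apply_patches(Path("atlas-release-2.1.0/addons/hive-bridge"))`. Running it a second time only reports "already patched".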
3. Build (skipping the tests and the Apache RAT license check)
mvn clean -DskipTests package -Pdist -Drat.skip=true
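Note: building against CDH versions runs into a few problems.
For example, Cloudera no longer publishes free releases after 6.3.2. As a result, the jars for this version and later are rarely maintained or updated, and some can no longer be downloaded.
Q: lucene-core-7.4.0-cdh6.3.2.jar cannot be downloaded. It is listed in the Cloudera repository (cloudera-repos), but the download returns 404: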
[ERROR] Failed to execute goal on project atlas-graphdb-janus: Could not resolve dependencies for project org.apache.atlas:atlas-graphdb-janus:jar:2.1.0: Failed to collect dependencies at org.apache.lucene:lucene-core:jar:7.4.0-cdh6.3.2: Failed to read artifact descriptor for org.apache.lucene:lucene-core:jar:7.4.0-cdh6.3.2: Could not transfer artifact org.apache.lucene:lucene-solr-grandparent:pom:7.4.0-cdh6.3.2 from/to cloudera (https://repository.cloudera.com/artifactory/cloudera-repos/): Tra
Solution: take the upstream lucene-core-7.4.0.jar and rename it to lucene-core-7.4.0-cdh6.3.2.jar. Shameless, but it works.
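The renamed jar only helps if Maven finds it in the local repository under the CDH version. A tidier variant of the same trick is Maven's `install:install-file` goal, which installs the upstream jar under the CDH coordinates. Its `-DgeneratePom=true` option writes a minimal POM with no parent, which should stop Maven from looking for the missing `lucene-solr-grandparent` POM. The sketch below assumes the default local repository at `~/.m2/repository`. The helpers `install_file_command` and `local_repo_dir` are hypothetical. They only build the command and the expected directory and do not run Maven.

```python
import shlex
from pathlib import Path


def install_file_command(jar: str, group_id: str, artifact_id: str, version: str) -> list:
    """Build the argument list for `mvn install:install-file` (not executed here)."""
    return [
        "mvn", "install:install-file",
        f"-Dfile={jar}",
        f"-DgroupId={group_id}",
        f"-DartifactId={artifact_id}",
        f"-Dversion={version}",
        "-Dpackaging=jar",
        "-DgeneratePom=true",  # minimal POM without the unavailable parent
    ]


def local_repo_dir(group_id: str, artifact_id: str, version: str, repo=None) -> Path:
    """Directory where Maven expects the artifact (default ~/.m2/repository layout)."""
    repo = Path(repo) if repo is not None else Path.home() / ".m2" / "repository"
    return repo.joinpath(*group_id.split("."), artifact_id, version)


if __name__ == "__main__":
    cmd = install_file_command("lucene-core-7.4.0.jar", "org.apache.lucene", "lucene-core", "7.4.0-cdh6.3.2")
    # shlex.join needs Python 3.8+, so quote manually for Python 3.7.
    print(" ".join(shlex.quote(arg) for arg in cmd))
    print("expected location:", local_repo_dir("org.apache.lucene", "lucene-core", "7.4.0-cdh6.3.2"))
```

Run the printed command once and then rerun the build. Any other CDH-versioned artifact that fails the same way can be handled with different coordinates.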
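III. Install Atlas
The build produces apache-atlas-2.1.0-bin.tar.gz in apache-atlas-2.1.0/distro/target/.
3.1 Extract the archive (the paths below assume it is unpacked to /opt/apache-atlas-2.1.0)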
tar -zxvf apache-atlas-2.1.0-bin.tar.gz
3.2 Configure atlas-env.sh
Uncomment and set the following lines:
export HBASE_CONF_DIR=/etc/hbase/conf
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"
export MANAGE_LOCAL_HBASE=false
export MANAGE_LOCAL_SOLR=false
export MANAGE_EMBEDDED_CASSANDRA=false
export MANAGE_LOCAL_ELASTICSEARCH=false
3.3 Configure atlas-application.properties
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#########  Graph Database Configs  #########
# Graph Database
#Configures the graph database to use. Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase
# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various storage backends.
#
# atlas.graph.storage.hbase.table is also set below; keep a single definition
#atlas.graph.storage.hbase.table=atlas
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=apache_atlas_janus
#Hbase
#For standalone mode, specify localhost
#For distributed mode, specify the ZooKeeper quorum here
atlas.graph.storage.hostname=master1:2181,master2:2181,core1:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000
#In order to use Cassandra as a backend, comment out the hbase-specific properties above, and uncomment
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=
# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true
# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1
# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
#atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository
# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1
# Graph Search Index
atlas.graph.index.search.backend=solr
#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=master1:2181/solr,master2:2181/solr,core1:2181/solr
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr
# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=false
# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150
#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import
#########  Notification Configs  #########
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=master1:2181,master2:2181,core1:2181
atlas.kafka.bootstrap.servers=master1:9092,master2:9092,core1:9092
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=60000
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab
#########  Server port configuration  #########
atlas.server.http.port=21000
#atlas.server.https.port=21443
#########  Security Properties  #########
# SSL config
atlas.enableTLS=false
#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks
#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks
# Authentication config
atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true
#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none
#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties
### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true
# LDAP properties
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>
#########  Active directory properties  #########
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>
#########  JAAS Configuration  #########
#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM
#########  Server Properties  #########
atlas.rest.address=http://localhost:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false
#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=master1:2181,master2:2181,core1:2181
#########  High Availability Configuration  #########
atlas.server.ha.enabled=false
#### Enable the configs below as needed if HA is enabled ####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>
#########  Atlas Authorization  #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json
#########  Type Cache Implementation  #########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=
#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000
#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER
#########  KNOX Configs  #########
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=
#########  Atlas Metric/Stats configs  #########
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=
#########  Compiled Query Cache Configuration  #########
# The size of the compiled query cache. Older queries will be evicted from the cache
# when we reach the capacity.
#atlas.CompiledQueryCache.capacity=1000
# Allows notifications when items are evicted from the compiled query
# cache because it has become full. A warning will be issued when
# the specified number of evictions have occurred. If the eviction
# warning threshold <= 0, no eviction warnings will be issued.
#atlas.CompiledQueryCache.evictionWarningThrottle=0
#########  Full Text Search Configuration  #########
#Set to false to disable full text search.
#atlas.search.fulltext.enable=true
#########  Gremlin Search Configuration  #########
#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false
#########  Add http headers  #########
#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>
#########  UI Configuration  #########
atlas.ui.default.version=v1
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
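In a file this long, two mistakes are easy to miss: a key defined twice, or a section title that has lost its leading `#`. A Java properties parser would read such a title line as a key with a value. A minimal checker sketch follows. It assumes one property per line, no backslash line continuations, and `=` as the only separator, so a valid `key: value` line would also be reported. The names `check_properties` and `report` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def check_properties(text: str):
    """Return (duplicates, suspicious) for the text of a .properties file.

    duplicates maps each key defined more than once to its 1-based line
    numbers. suspicious lists (line number, line) pairs that are neither
    comments ('#' or '!') nor key=value pairs.
    """
    seen = defaultdict(list)
    suspicious = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        if "=" not in line:
            suspicious.append((lineno, line))
            continue
        key = line.split("=", 1)[0].strip()
        seen[key].append(lineno)
    duplicates = {key: lines for key, lines in seen.items() if len(lines) > 1}
    return duplicates, suspicious


def report(path: str = "/opt/apache-atlas-2.1.0/conf/atlas-application.properties") -> None:
    """Print every duplicate key and every suspicious line in the file."""
    duplicates, suspicious = check_properties(Path(path).read_text(encoding="utf-8"))
    for key, lines in duplicates.items():
        print(f"duplicate key {key} on lines {lines}")
    for lineno, line in suspicious:
        print(f"line {lineno} is not a comment or key=value: {line!r}")
```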
3.4 Configure Atlas log4j: in atlas-log4j.xml, uncomment the perf_appender and the org.apache.atlas.perf logger
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
    <param name="datePattern" value="'.'yyyy-MM-dd" />
    <param name="append" value="true" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d|%t|%m%n" />
    </layout>
</appender>

<logger name="org.apache.atlas.perf" additivity="false">
    <level value="debug" />
    <appender-ref ref="perf_appender" />
</logger>
3.5 Link the CDH HBase configuration directory
ln -s /etc/hbase/conf/ /opt/apache-atlas-2.1.0/conf/hbase
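The symlink exposes the cluster's hbase-site.xml under the Atlas conf directory. The ZooKeeper hosts listed there should match `atlas.graph.storage.hostname` in atlas-application.properties. Below is a minimal consistency-check sketch. It assumes the usual Hadoop-style `<property><name>…</name><value>…</value></property>` layout. The helpers `hbase_site_property`, `zk_hosts` and `quorum_matches` are hypothetical, and the default paths and hosts are taken from this guide.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def hbase_site_property(xml_data, name):
    """Return the <value> of property `name` in Hadoop-style XML, or None."""
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def zk_hosts(connect_string: str) -> set:
    """Reduce 'h1:2181,h2' or 'h1:2181/solr' style lists to a set of host names."""
    return {entry.strip().split(":")[0].split("/")[0]
            for entry in connect_string.split(",") if entry.strip()}


def quorum_matches(hbase_site="/opt/apache-atlas-2.1.0/conf/hbase/hbase-site.xml",
                   atlas_hostname="master1:2181,master2:2181,core1:2181") -> bool:
    """True if hbase.zookeeper.quorum names the same hosts as the Atlas setting."""
    # read_bytes lets the parser honour the file's own encoding declaration.
    quorum = hbase_site_property(Path(hbase_site).read_bytes(), "hbase.zookeeper.quorum")
    return quorum is not None and zk_hosts(quorum) == zk_hosts(atlas_hostname)
```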
3.6 Configure the CDH Solr config set and collections
Copy the Atlas Solr configuration directory /opt/apache-atlas-2.1.0/conf/solr into the Solr directory of the CDH parcel:
cp -r /opt/apache-atlas-2.1.0/conf/solr/ /opt/cloudera/parcels/CDH/lib/solr/atlas-solr
Create the Solr collections:
sudo -u solr /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
sudo -u solr /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
sudo -u solr /opt/cloudera/parcels/CDH/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH/lib/solr/atlas-solr -shards 3 -replicationFactor 2
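To confirm that all three collections exist before starting Atlas, you can query the Solr Collections API with `action=LIST`. Its JSON response carries the collection names in a `collections` array. This is a minimal sketch, assuming an unsecured (non-Kerberos) Solr reachable at port 8983. The base URL and the helper names `missing_collections` and `fetch_collection_list` are assumptions.

```python
import json
from urllib.request import urlopen

# The three collections Atlas needs, as created above.
REQUIRED_COLLECTIONS = {"vertex_index", "edge_index", "fulltext_index"}


def missing_collections(list_response: dict, required=REQUIRED_COLLECTIONS) -> set:
    """Return the required collections absent from a parsed action=LIST response."""
    return set(required) - set(list_response.get("collections", []))


def fetch_collection_list(solr_url: str = "http://localhost:8983/solr") -> dict:
    """Call GET <solr_url>/admin/collections?action=LIST and return the parsed JSON."""
    url = solr_url.rstrip("/") + "/admin/collections?action=LIST&wt=json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Usage: `print(missing_collections(fetch_collection_list()))` should print `set()` when all three collections are present.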
3.7 Create the Kafka topics in CDH Kafka
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --zookeeper 192.168.1.170:2181,192.168.1.171:2181,192.168.1.172:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh --zookeeper 192.168.1.170:2181,192.168.1.171:2181,192.168.1.172:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
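The topic names should match `atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES` from atlas-application.properties. Generating the commands from that value keeps the two in sync. This is a minimal sketch that reuses the script path and ZooKeeper addresses from the commands above. It adds `--if-not-exists` so that reruns are harmless. The helpers `topic_commands` and `create_topics` are hypothetical, and `create_topics` only prints unless `dry_run=False`.

```python
import subprocess

KAFKA_TOPICS_SH = "/opt/cloudera/parcels/CDH/lib/kafka/bin/kafka-topics.sh"
ZOOKEEPER = "192.168.1.170:2181,192.168.1.171:2181,192.168.1.172:2181"


def topic_commands(topics_property: str, zookeeper: str = ZOOKEEPER,
                   partitions: int = 3, replication_factor: int = 3) -> list:
    """Build one kafka-topics.sh --create argument list per comma-separated topic."""
    topics = [t.strip() for t in topics_property.split(",") if t.strip()]
    return [
        [KAFKA_TOPICS_SH, "--zookeeper", zookeeper, "--create", "--if-not-exists",
         "--replication-factor", str(replication_factor),
         "--partitions", str(partitions), "--topic", topic]
        for topic in topics
    ]


def create_topics(topics_property: str = "ATLAS_HOOK,ATLAS_ENTITIES", dry_run: bool = True) -> None:
    """Print each command, and run it only when dry_run is False."""
    for cmd in topic_commands(topics_property):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```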
3.8 Start Atlas
cd /opt/apache-atlas-2.1.0/bin
./atlas_start.py
starting atlas on host localhost
starting atlas on port 21000
....................................................................................................
Apache Atlas Server started!!!
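"Apache Atlas Server started!!!" only means the process launched. Initialization against HBase, Solr and Kafka can take a while longer. The sketch below polls the admin status endpoint (`/api/atlas/admin/status`, which reports a `Status` field such as `ACTIVE`) until the server is ready. It assumes the file-based login is still the default `admin`/`admin` account from users-credentials.properties, and that Atlas listens on `http://localhost:21000` as configured in `atlas.rest.address`. The helper names are hypothetical.

```python
import base64
import json
import time
from urllib.request import Request, urlopen


def basic_auth_header(user: str, password: str) -> dict:
    """HTTP Basic authentication header for the Atlas file-based login."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


def atlas_status(base_url: str = "http://localhost:21000", user: str = "admin", password: str = "admin"):
    """Return the "Status" field of GET /api/atlas/admin/status, e.g. "ACTIVE"."""
    req = Request(base_url.rstrip("/") + "/api/atlas/admin/status",
                  headers=basic_auth_header(user, password))
    with urlopen(req, timeout=10) as resp:
        return json.load(resp).get("Status")


def wait_until_active(timeout_s: float = 600, interval_s: float = 10, **kwargs) -> bool:
    """Poll atlas_status until it reports ACTIVE or timeout_s seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if atlas_status(**kwargs) == "ACTIVE":
                return True
        except OSError:
            pass  # URLError/HTTPError: not listening yet or still initializing
        time.sleep(interval_s)
    return False
```

If `wait_until_active()` returns False, check logs/application.log under the Atlas home directory for the underlying HBase, Solr or Kafka connection error.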
If you repost this article, please credit the source: https://bianchenghao.cn/bian-cheng-ji-chu/102321.html