1. Introduction

Ignite was already covered in the Ignite introduction article. It has a great many features, which we group into two main parts:

  1. In-Memory Data Fabric
  2. In-Memory Hadoop Accelerator

As the official documentation shows, Ignite itself is organized around these two fairly independent modules.

The Ignite In-Memory Hadoop Accelerator (allow me to abbreviate it as IHA from here on) speeds up Hadoop MapReduce jobs mainly by providing an in-memory storage layer together with its own implementation of a MapReduce job executor. For a sense of the effect, see this video:
30+ times faster Hadoop MapReduce application with Bigtop and Ignite

This article assumes you already have a working Hadoop deployment. Alternatively, you can use Apache Bigtop, mentioned in the video above, to quickly stand up a Hadoop environment to experiment with.

PS: the IHA feature only supports Hadoop 2.2 and later.

The rest of this article walks through installing IHA.

2. Download and Setup

2.1 Install Hadoop and the JDK and set environment variables

First install Hadoop and the JDK (8 or later is recommended) and set the environment variables. My settings are below for reference:

# hadoop settings
export HADOOP_HOME=/home/appadmin/hadoop-2.7.2
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CLASSPATH=/home/appadmin/alluxio-1.3.0/core/client/target/alluxio-core-client-1.3.0-jar-with-dependencies.jar:$HADOOP_CLASSPATH

# ignite
export IGNITE_HOME=/home/appadmin/apache-ignite-hadoop-1.7.0-bin
export PATH=$PATH:$IGNITE_HOME/bin
export IGNITE_LIBS=$IGNITE_LIBS:"$IGNITE_HOME/libs/*":$HADOOP_CLASSPATH
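
To confirm the variables took effect, a quick sanity check (this assumes the exports above live in ~/.bashrc; adjust to your shell setup):

# Reload the shell profile and verify the toolchain is visible
source ~/.bashrc
echo $HADOOP_HOME        # should print /home/appadmin/hadoop-2.7.2
hadoop version           # should report Hadoop 2.7.2
java -version            # should report a 1.8+ JDK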

2.2 Download and configure Ignite

Download the latest In-Memory Hadoop Accelerator build from the official Ignite site; for example, I downloaded apache-ignite-hadoop-1.7.0-bin.zip.
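
If you prefer the command line, something like the following should work (the mirror URL is an assumption; check the Ignite download page for the current link):

# Fetch and unpack the Hadoop Accelerator build (URL assumed; verify before use)
wget https://archive.apache.org/dist/ignite/1.7.0/apache-ignite-hadoop-1.7.0-bin.zip
unzip apache-ignite-hadoop-1.7.0-bin.zip -d /home/appadmin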

After unpacking, edit the configuration file $IGNITE_HOME/config/default-config.xml (the two edits below are also shown merged into a single fragment after the list):

  1. Uncomment the secondaryFileSystem property:
<bean class="org.apache.ignite.configuration.FileSystemConfiguration">

  <property name="secondaryFileSystem">
    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
      <property name="fileSystemFactory">
        <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
          <property name="uri" value="hdfs://your_hdfs_host:9000"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>
  2. Add the path to Hadoop's core-site.xml:
<bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
  <property name="uri" value="hdfs://your_hdfs_host:9000"/>
  <property name="configPaths">
    <list>
      <value>/path/to/core-site.xml</value>
    </list>
  </property>
</bean>
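
Putting the two edits together, the relevant fragment of default-config.xml looks like this (hdfs://your_hdfs_host:9000 and /path/to/core-site.xml are the same placeholders as above):

<bean class="org.apache.ignite.configuration.FileSystemConfiguration">
  <property name="secondaryFileSystem">
    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
      <property name="fileSystemFactory">
        <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
          <property name="uri" value="hdfs://your_hdfs_host:9000"/>
          <property name="configPaths">
            <list>
              <value>/path/to/core-site.xml</value>
            </list>
          </property>
        </bean>
      </property>
    </bean>
  </property>
</bean>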

2.3 Copy the required Hadoop jars into Ignite's libs directory

This step is not covered in the official documentation. I am not sure whether it is a problem with my configuration or a bug in Ignite, but no matter how I set the CLASSPATH, Ignite failed to find some Hadoop classes. At first, startup failed with the following error:

class org.apache.ignite.IgniteException: Failed to instantiate Spring XML application context (make sure all classes used in Spring configuration are present at CLASSPATH) [springUrl=file:/home/appadmin/apache-ignite-hadoop-1.7.0-bin/config/default-config.xml] 
        at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:908) 
        at org.apache.ignite.Ignition.start(Ignition.java:350) 
        at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302) 
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to instantiate Spring XML application context (make sure all classes used in Spring configuration are present at CLASSPATH) [springUrl=file:/home/appadmin/apache-ignite-hadoop-1.7.0-bin/config/default-config.xml] 
        at org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.applicationContext(IgniteSpringHelperImpl.java:387) 
        at org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.loadConfigurations(IgniteSpringHelperImpl.java:104) 
        at org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.loadConfigurations(IgniteSpringHelperImpl.java:98) 
        at org.apache.ignite.internal.IgnitionEx.loadConfigurations(IgnitionEx.java:639) 
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:840) 
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:749) 
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:619) 
        at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:589) 
        at org.apache.ignite.Ignition.start(Ignition.java:347) 
        ... 1 more 
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.apache.ignite.configuration.FileSystemConfiguration#0' defined in URL[file:/home/appadmin/apache-ignite-hadoop-1.7.0-bin/config/default-config.xml]: Cannot create inner bean 'org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem#7a187f14' of type [org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem] while setting bean property 'secondaryFileSystem'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem#7a187f14' defined in URL [file:/home/appadmin/apache-ignite-hadoop-1.7.0-bin/config/default-config.xml]: Instantiation of bean failed; nested exception is java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataOutputStream 
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveInnerBean(BeanDefinitionValueResolver.java:290) 
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveValueIfNecessary(BeanDefinitionValueResolver.java:122)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1471)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1216)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:538)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
        at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:302) 
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:229) 
        at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:298) 
        at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:193) 
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:725)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:757)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:480) 
        at org.apache.ignite.internal.util.spring.IgniteSpringHelperImpl.applicationContext(IgniteSpringHelperImpl.java:381) 
        ... 9 more 
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem#7a187f14' defined in URL[file:/home/appadmin/apache-ignite-hadoop-1.7.0-bin/config/default-config.xml]: Instantiation of bean failed; nested exception is java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataOutputStream 
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateBean(AbstractAutowireCapableBeanFactory.java:1095)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1040)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:505)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476)
        at org.springframework.beans.factory.support.BeanDefinitionValueResolver.resolveInnerBean(BeanDefinitionValueResolver.java:276) 
        ... 22 more 
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataOutputStream 
        at java.lang.Class.getDeclaredConstructors0(Native Method) 
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671) 
        at java.lang.Class.getConstructor0(Class.java:3075) 
        at java.lang.Class.getDeclaredConstructor(Class.java:2178) 
        at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:80) 
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateBean(AbstractAutowireCapableBeanFactory.java:1088)
        ... 26 more 
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataOutputStream 
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) 
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
        ... 32 more 
Failed to start grid: Failed to instantiate Spring XML application context (make sure all classes used in Spring configuration are present at CLASSPATH) [springUrl=file:/home/appadmin/apache-ignite-hadoop-1.7.0-bin/config/default-config.xml] 
Note! You may use 'USER_LIBS' environment variable to specify your classpath. 

The workaround for now is to manually copy the Hadoop jars that Ignite depends on at startup into Ignite's libs directory. Creating symlinks with ln -s works as well (see the sketch after the commands).

cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.2.jar $IGNITE_HOME/libs
# Note: this copies everything under common/lib (including guava-11.0.2.jar),
# but asm-3.2.jar must then be removed or it conflicts with Ignite's own asm
cp $HADOOP_HOME/share/hadoop/common/lib/* $IGNITE_HOME/libs
rm -f $IGNITE_HOME/libs/asm-3.2.jar
cp $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.2.jar $IGNITE_HOME/libs
cp $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.2.jar $IGNITE_HOME/libs
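
The ln -s variant is a minimal sketch along the same lines (same jar set; common/lib still needs the copy-and-remove treatment above):

# Symlink the individual jars so they track the Hadoop installation
cd $IGNITE_HOME/libs
ln -s $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.2.jar .
ln -s $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.2.jar .
ln -s $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-2.7.2.jar .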

2.4 Run

Simply run the ignite.sh script.
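
For reference (ignite.sh also accepts an explicit path to a Spring configuration file):

# Start an Ignite node with the default configuration
$IGNITE_HOME/bin/ignite.sh
# or point it at the configuration explicitly
$IGNITE_HOME/bin/ignite.sh $IGNITE_HOME/config/default-config.xml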

3. Configure Hadoop

Copy the relevant Ignite jars into Hadoop's share directory (or create symlinks, as shown below):

cd $HADOOP_HOME/share/hadoop/common/lib
ln -s $IGNITE_HOME/libs/ignite-core-[version].jar
ln -s $IGNITE_HOME/libs/ignite-shmem-1.0.0.jar
ln -s $IGNITE_HOME/libs/ignite-hadoop/ignite-hadoop-[version].jar

3.1 Edit core-site.xml

Add the following to core-site.xml to enable IGFS support:

<configuration>

  <property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
  </property> 

</configuration>

3.2 Edit mapred-site.xml

Point the MapReduce framework at Ignite and direct job submission to the Ignite node (11211 is the default port Ignite listens on for job submission):

<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>[your_host]:11211</value>
  </property>

</configuration>

PS: templates for the above configuration files can be found under $IGNITE_HOME/config/hadoop.

3.3 Best practice for the configuration files

The approach recommended in the documentation is to create an ignite_conf directory to hold the configuration files, and then point each job at that configuration when it runs.

  1. Put the configuration files in a dedicated directory:
mkdir ~/ignite_conf
cd ~/ignite_conf
cp $HADOOP_HOME/etc/hadoop/core-site.xml .
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml .
  2. Specify which configuration to use per invocation (a fuller example follows this list):
# Query IGFS:
hadoop --config ~/ignite_conf fs -ls /
# Run a job:
hadoop --config ~/ignite_conf jar [your_job]
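
As a concrete example, running the stock WordCount job against Ignite might look like this (the examples jar path assumes a standard Hadoop 2.7.2 layout, and /input is assumed to already hold data):

# Run WordCount through Ignite's MapReduce engine
hadoop --config ~/ignite_conf jar \
  $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
  wordcount /input /output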

4. Using IGFS

You can now use IGFS just as you would use HDFS; data in IGFS stays consistent with HDFS. IGFS can be operated on directly from the command line:

# The default port is 10500 and does not need to be specified. Hadoop's command-line file API works seamlessly against IGFS
hadoop fs -rm -R  igfs://10.8.12.16/output
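
Other hadoop fs operations work the same way; for example (10.8.12.16 is the host from my environment, substitute your own):

hadoop fs -ls igfs://10.8.12.16/
hadoop fs -put local.txt igfs://10.8.12.16/input/
hadoop fs -cat igfs://10.8.12.16/input/local.txt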

PS: when deleting data on IGFS, make sure the path also exists on HDFS, or you will get a path-not-found error. Once you adopt IGFS, it is best to route all operations on HDFS data through IGFS, so that inconsistencies do not cause command-line operations to fail.

One more note: when accessing IGFS you generally should not append a port number. Adding the port (e.g. 10500) explicitly actually triggers an error: