#Recording a pitfall#
In Spark you sometimes need to read LZO-compressed files. Here newAPIHadoopFile() is used to read them:
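import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.rdd.RDD
import com.hadoop.mapreduce.LzoTextInputFormat

// sc: an existing SparkContext; path: HDFS path of the .lzo file(s)
val configuration = new Configuration()
// Register the codecs Hadoop may use; LzopCodec handles files ending in .lzo
configuration.set("io.compression.codecs",
  "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzopCodec")
configuration.set("io.compression.codec.lzo.class",
  "com.hadoop.compression.lzo.LzoCodec")
// Read the data from HDFS
val lines: RDD[String] =
  sc.newAPIHadoopFile(path, classOf[LzoTextInputFormat], classOf[LongWritable],
    classOf[Text], configuration)
    .map(x => x._2.toString) // keep only the line text, giving an RDD[String]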
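You can sanity-check the codec settings before launching the Spark job. Ask Hadoop's `CompressionCodecFactory` which codec it would pick for a given file name. It matches on the file suffix, and `getCodec` returns null when no registered codec claims that suffix.

This is a minimal sketch, not part of the original method. It assumes hadoop-common and the hadoop-lzo jar are both on the classpath. The object name `CheckLzoCodec`, the helper `resolveCodec`, and the file name `part-00000.lzo` are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.CompressionCodecFactory

// Hypothetical diagnostic helper (not from the original post)
object CheckLzoCodec {
  // Same codec settings as the Spark snippet
  def lzoConf(): Configuration = {
    val conf = new Configuration()
    conf.set("io.compression.codecs",
      "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzopCodec")
    conf.set("io.compression.codec.lzo.class", "com.hadoop.compression.lzo.LzoCodec")
    conf
  }

  // Returns the class name of the codec Hadoop would choose for fileName, if any.
  // Constructing the factory throws IllegalArgumentException if a codec class
  // listed in io.compression.codecs cannot be found (e.g. hadoop-lzo jar missing).
  def resolveCodec(fileName: String, conf: Configuration = lzoConf()): Option[String] = {
    val factory = new CompressionCodecFactory(conf)
    Option(factory.getCodec(new Path(fileName))).map(_.getClass.getName)
  }

  def main(args: Array[String]): Unit = {
    // "part-00000.lzo" is only an example file name
    println(resolveCodec("part-00000.lzo"))
  }
}
```

If `.lzo` resolves to `com.hadoop.compression.lzo.LzopCodec`, the `io.compression.codecs` setting is being picked up as intended.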
Required jar: hadoop-lzo-0.4.20-SNAPSHOT.jar, located under share\hadoop\common\lib in your Hadoop installation directory.
If that one doesn't work, try downloading this one: https://download.csdn.net/download/ice_kind/10320246
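To tell whether the jar is actually visible to your job, you can try to load the hadoop-lzo classes used above by name. This is a small sketch using only the JVM's `Class.forName`. The object name `LzoClasspathCheck` and its `missing` helper are hypothetical.

It catches `NoClassDefFoundError` as well as `ClassNotFoundException`. The reason is that a class can be present while its Hadoop superclass is not, and that case surfaces as `NoClassDefFoundError`.

```scala
// Hypothetical helper (not from the original post): reports which classes cannot be loaded
object LzoClasspathCheck {
  // Classes the Spark snippet relies on; all ship in the hadoop-lzo jar
  val lzoClasses: Seq[String] = Seq(
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
    "com.hadoop.mapreduce.LzoTextInputFormat"
  )

  // Returns the class names that are NOT loadable from the current thread's class loader
  def missing(classNames: Seq[String]): Seq[String] = {
    val loader = Thread.currentThread().getContextClassLoader
    classNames.filterNot { name =>
      try { Class.forName(name, false, loader); true }
      catch { case _: ClassNotFoundException | _: NoClassDefFoundError => false }
    }
  }

  def main(args: Array[String]): Unit = {
    val notFound = missing(lzoClasses)
    if (notFound.isEmpty) println("hadoop-lzo classes found on the classpath")
    else println(s"not found (is the hadoop-lzo jar on the classpath?): ${notFound.mkString(", ")}")
  }
}
```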