shark/spark throws NPE when querying a table

The development section of the shark/spark wiki is really brief, so I tried to write code to query a table programmatically. Here it is:

object Test extends App {
  val master = "spark://localhost.localdomain:8084"
  val jobName = "scratch"
  val sparkHome = "/home/shengc/Downloads/software/spark-0.6.1"
  val executorEnvVars = Map[String, String](
    "SPARK_MEM" -> "1g",
    "SPARK_CLASSPATH" -> "",
    "HADOOP_HOME" -> "/home/shengc/Downloads/software/hadoop-0.20.205.0",
    "JAVA_HOME" -> "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64",
    "HIVE_HOME" -> "/home/shengc/Downloads/software/hive-0.9.0-bin"
  )

  val sc = new shark.SharkContext(master, jobName, sparkHome, Nil, executorEnvVars)
  sc.sql2console("create table src")
  sc.sql2console("load data local inpath '/home/shengc/Downloads/software/hive-0.9.0-bin/examples/files/kv1.txt' into table src")
  sc.sql2console("select count(1) from src")
}


I can create the table src and load data into it just fine, but the last query throws an NPE and fails. Here is the output:


13/01/06 17:33:20 INFO execution.SparkTask: Executing shark.execution.SparkTask
13/01/06 17:33:20 INFO shark.SharkEnv: Initializing SharkEnv
13/01/06 17:33:20 INFO execution.SparkTask: Adding jar file:///home/shengc/workspace/shark/hive/lib/hive-builtins-0.9.0.jar
java.lang.NullPointerException
  at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:58)
  at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:55)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
  at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38)
  at shark.execution.SparkTask.execute(SparkTask.scala:55)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
  at shark.SharkContext.sql(SharkContext.scala:58)
  at shark.SharkContext.sql2console(SharkContext.scala:84)
  at Test$delayedInit$body.apply(Test.scala:20)
  at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
  at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
  at scala.App$$anonfun$main$1.apply(App.scala:60)
  at scala.App$$anonfun$main$1.apply(App.scala:60)
  at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
  at scala.collection.immutable.List.foreach(List.scala:76)
  at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
  at scala.App$class.main(App.scala:60)
  at Test$.main(Test.scala:4)
  at Test.main(Test.scala)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask
13/01/06 17:33:20 ERROR ql.Driver: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=Driver.execute start=1357511600030 end=1357511600054 duration=24>
13/01/06 17:33:20 INFO ql.Driver: <PERFLOG method=releaseLocks>
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=releaseLocks start=1357511600054 end=1357511600054 duration=0>

However, I can query the src table by typing select * from src in the shell launched by bin/shark-withinfo.

You might ask how about trying that SQL in the shell launched by "bin/shark-shell". Well, I cannot get into that shell. Here is the error I ran into:

https://groups.google.com/forum/?fromgroups=#!topic/shark-users/glZzrUfabGc

[EDIT 1]: This NPE seems to be caused by SharkEnv.sc not being initialized, so I added

shark.SharkEnv.sc = sc

right before any sql2console operations are executed. It then complained about a ClassNotFoundException for scala.tools.nsc, so I manually put scala-compiler on the classpath. After that, the code complained about yet another ClassNotFoundException, which I cannot figure out how to fix, because I did put the shark jar on the classpath:

13/01/06 18:09:34 INFO cluster.TaskSetManager: Lost TID 1 (task 1.0:1)
13/01/06 18:09:34 INFO cluster.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: shark.execution.TableScanOperator$$anonfun$preprocessRdd$3
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:264)

[EDIT 2]: OK, I figured out another piece of code that does what I want, by following exactly how shark's own source initializes the interactive REPL:

System.setProperty("MASTER", "spark://localhost.localdomain:8084")
System.setProperty("SPARK_MEM", "1g")
System.setProperty("SPARK_CLASSPATH", "")
System.setProperty("HADOOP_HOME", "/home/shengc/Downloads/software/hadoop-0.20.205.0")
System.setProperty("JAVA_HOME", "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64")
System.setProperty("HIVE_HOME", "/home/shengc/Downloads/software/hive-0.9.0-bin")
System.setProperty("SCALA_HOME", "/home/shengc/Downloads/software/scala-2.9.2")

shark.SharkEnv.initWithSharkContext("scratch")
val sc = shark.SharkEnv.sc.asInstanceOf[shark.SharkContext]

sc.sql2console("select * from src")

This is ugly, but at least it works. Any comments on how to write a more robust piece of code are welcome!!

For anyone who wants to operate on Shark programmatically, please note that all hive and shark jars must be on your CLASSPATH, and the scala compiler has to be on your classpath too. The other important thing is that hadoop's conf directory should be on the classpath as well. A quick probe like the sketch below can confirm that setup before creating any context.
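This is a minimal sanity-check sketch, not from the original post; each class name is my assumption of one representative class per required jar, so adjust them to your versions:

// Checks that the shark, hive, scala-compiler, and hadoop jars are all visible
// on the classpath by loading one representative class from each.
object ClasspathCheck extends App {
  val required = Seq(
    "shark.SharkContext",                   // shark jar
    "org.apache.hadoop.hive.ql.Driver",     // hive jars
    "scala.tools.nsc.Global",               // scala-compiler jar
    "org.apache.hadoop.conf.Configuration"  // hadoop jars (conf dir is still needed separately)
  )
  required.foreach { name =>
    try {
      Class.forName(name)
      println("OK      " + name)
    } catch {
      case _: ClassNotFoundException => println("MISSING " + name)
    }
  }
}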


I believe the problem is that your SharkEnv is not initialized.

I am using Shark 0.9.0 (but I believe you have to initialize SharkEnv in 0.6.1 too), and my SharkEnv is initialized the following way:

// SharkContext
val sc = new SharkContext(master,
  jobName,
  System.getenv("SPARK_HOME"),
  Nil,
  executorEnvVar)

// Initialize SharkEnv
SharkEnv.sc = sc

// create and populate table
sc.runSql("CREATE TABLE src(key INT, value STRING)")
sc.runSql("LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src")

// print result to stdout
println(sc.runSql("select * from src"))
println(sc.runSql("select count(*) from src"))

Also, try querying data from the src table without aggregate functions (comment out the line with "select count(*) ..."). I had a similar issue where plain data queries were fine but count(*) threw an exception; in my case it was fixed by adding mysql-connector-java.jar to yarn.application.classpath.
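A minimal sketch of that isolation step, assuming the same sc as in the snippet above (the limit clause is only to keep the output short):

// Run the plain scan first: it needs no aggregation stage.
println(sc.runSql("select * from src limit 10"))

// Re-enable the aggregate only once the plain scan works. If the scan succeeds
// but count(*) fails, the problem is in the distributed execution stage
// (e.g. jars or drivers missing on the cluster side), not in the table itself.
// println(sc.runSql("select count(*) from src"))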

