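Scala: writing the adjacency list of each node in a graph to a text file
Problem description:
I am trying to reverse a directed graph and write the adjacency list of each vertex to a text file in the format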
NodeId \t NeighbourId1,NeighbourId2,...,NeighbourIdn
So far I have only tried printing, and my output looks like this:
(4,[[email protected])
(0,[[email protected])
(1,[[email protected])
(3,[[email protected])
(2,[[email protected])
whereas it should be in the following format:
4 2
0 4
1 0,2
3 1,2,3
2 0,1
The code I am currently using is:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{EdgeDirection, Graph, GraphLoader, VertexId, VertexRDD}

object Problem2 {
  def main(args: Array[String]): Unit = {
    val inputFile: String = args(0)
    val outputFolder = args(1)
    val conf = new SparkConf().setAppName("Problem2").setMaster("local")
    val sc = new SparkContext(conf)
    val graph = GraphLoader.edgeListFile(sc, inputFile)
    // Reverse every edge, then rebuild the graph on the original vertices
    val edges = graph.reverse.edges
    val vertices = graph.vertices
    val newGraph = Graph(vertices, edges)
    val verticesWithSuccessors: VertexRDD[Array[VertexId]] =
      newGraph.ops.collectNeighborIds(EdgeDirection.Out)
    val successorGraph = Graph(verticesWithSuccessors, edges)
    val res = successorGraph.vertices.collect()
    // Prints each (VertexId, Array[VertexId]) pair using the array's default toString
    val adjList = successorGraph.vertices.foreach(println)
  }
}
I don't think mkString() can be used on a graph object. Does the graph object have a similar method for getting a string?
Answer
Let's take this example again:
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val vertices: RDD[(VertexId, String)] =
  sc.parallelize(Array((1L, ""), (2L, ""), (4L, ""), (6L, "")))
val edges: RDD[Edge[String]] =
  sc.parallelize(Array(
    Edge(1L, 2L, ""),
    Edge(1L, 4L, ""),
    Edge(1L, 6L, "")))
val inputGraph = Graph(vertices, edges)
val verticesWithSuccessors: VertexRDD[Array[VertexId]] =
  inputGraph.ops.collectNeighborIds(EdgeDirection.Out)
val successorGraph = Graph(verticesWithSuccessors, edges)
Once you have this:
val adjList = successorGraph.vertices
you can easily convert it to a DataFrame:
val df = adjList.toDF(Seq("node", "adjacents"): _*)
df.show()
+----+---------+
|node|adjacents|
+----+---------+
|   1|[2, 4, 6]|
|   2|       []|
|   4|       []|
|   6|       []|
+----+---------+
Now it is easy to transform the columns. Here is a not-so-pretty example:
val result = df.rdd.collect().map(l=> l(0).asInstanceOf[Long] + "\t" + l(1).asInstanceOf[Seq[Long]].mkString(" "))
result.foreach(println(_))
1 2 4 6
2
4
6
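As a side note, the DataFrame step may not be strictly necessary. The `[...@...` text in the question's output is what the JVM prints for an array, because arrays do not override toString; calling mkString on the neighbour array itself gives the requested layout. Below is a minimal sketch of that idea in plain Scala. The object name AdjacencyFormat, the helper formatAdjacency, and the sample data are hypothetical and only stand in for the collected contents of successorGraph.vertices.

```scala
object AdjacencyFormat {
  // Hypothetical helper: renders one vertex as "NodeId<TAB>n1,n2,...,nk".
  def formatAdjacency(id: Long, neighbours: Array[Long]): String =
    s"$id\t${neighbours.mkString(",")}"

  def main(args: Array[String]): Unit = {
    // Made-up stand-in for successorGraph.vertices.collect().
    val collected: Array[(Long, Array[Long])] = Array(
      (1L, Array(2L, 4L, 6L)),
      (2L, Array.empty[Long])
    )
    // Printing the tuples directly would show the JVM's default array
    // toString; formatting each pair first avoids that.
    collected
      .map { case (id, nbrs) => formatAdjacency(id, nbrs) }
      .foreach(println)
  }
}
```

On the RDD itself, the same helper could presumably be applied with something like successorGraph.vertices.map { case (id, nbrs) => AdjacencyFormat.formatAdjacency(id, nbrs) }.saveAsTextFile(outputFolder). Note that saveAsTextFile writes a directory of part files rather than a single file.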
Or you can use a UDF, or process the column in whatever other way you prefer.
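A rough sketch of the UDF route follows. It assumes Spark 2.x or later (it uses SparkSession rather than the SQLContext mentioned elsewhere in this thread), and it builds a small DataFrame by hand instead of converting the graph. The names AdjacencyUdf, render, and joinIds are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object AdjacencyUdf {
  // Hypothetical helper: returns one "node<TAB>n1,n2,..." string per row.
  def render(spark: SparkSession): Array[String] = {
    import spark.implicits._

    // Hand-made stand-in for the DataFrame produced from the adjacency list.
    val df = Seq(
      (1L, Seq(2L, 4L, 6L)),
      (2L, Seq.empty[Long])
    ).toDF("node", "adjacents")

    // UDF that joins the neighbour ids of one row with commas.
    val joinIds = udf((ids: Seq[Long]) => ids.mkString(","))

    df.withColumn("adjacents", joinIds($"adjacents"))
      .orderBy("node")
      .collect()
      .map(row => s"${row.getLong(0)}\t${row.getString(1)}")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AdjacencyUdf")
      .master("local[*]")
      .getOrCreate()
    render(spark).foreach(println)
    spark.stop()
  }
}
```

The UDF keeps the string building inside the DataFrame, so the transformed column can also be written out with the DataFrame writer instead of being collected to the driver first.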
Hope this helps!
For completeness: before converting to a DataFrame, create the SQL context with 'val sqlContext = new org.apache.spark.sql.SQLContext(sc)' and 'import sqlContext.implicits._' – Dee