bigdata-vandy/spark-xml-parse

spark-xml-parse

Demonstration of XML parsing using the StackOverflow data dump.

Overview

This is a simple Spark app that reads a Posts.xml input file from one of the StackExchange data dumps; the XML schema description can be found here.

The code attempts to parse one row XML element from each line of the input. If a row parses, its Body, CreationDate, and ViewCount attributes are read. For each successful parse, a compact JSON record is written as a single line to the files in the output directory.
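The per-line logic described above can be sketched in plain Python with only the standard library. This is an illustration under stated assumptions, not the app's actual code: the function name parse_row, the FIELDS tuple, and the sample lines are all hypothetical. The sketch also assumes that a missing attribute becomes a JSON null and that values stay as strings; the original does not say how either case is handled.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional

# Attributes read from each <row .../> element (names taken from the README).
FIELDS = ("Body", "CreationDate", "ViewCount")


def parse_row(line: str) -> Optional[str]:
    """Return a compact one-line JSON record for a <row .../> line.

    Returns None when the line is not a well-formed <row> element, so
    wrapper lines such as <posts> and </posts> are skipped.
    """
    try:
        elem = ET.fromstring(line.strip())
    except ET.ParseError:
        return None
    if elem.tag != "row":
        return None
    # Assumption: attribute values are kept as strings; absent ones become null.
    record = {name: elem.get(name) for name in FIELDS}
    # separators=(",", ":") drops the default spaces for a compact record.
    return json.dumps(record, separators=(",", ":"))


# Made-up sample input shaped like a Posts.xml file (not real dump data).
sample = [
    "<posts>",
    '  <row Id="1" CreationDate="2008-07-31T21:42:52.667" ViewCount="42" '
    'Body="&lt;p&gt;Hello&lt;/p&gt;" />',
    "</posts>",
]

# Keep only the lines that parsed; each result is one JSON record.
records = [r for r in map(parse_row, sample) if r is not None]
for r in records:
    print(r)
```

In a Spark job, a function of this shape would typically be applied to every line of the input and the unparsed lines filtered out before the results are saved as text; that wiring is omitted here so the sketch runs without a cluster.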
