Process OpenPDC data with Hadoop 2.0

hinfsynz · August 2, 2017, 5:48pm

Hi,

We’re currently running a pilot project to build an OpenPDC Hadoop platform for big data analytics on archived PMU data. We are building on the project code developed by Josh Patterson in 2009 (https://github.com/GridProtectionAlliance/openPDC/blob/master/Source/Documentation/wiki/Developers_Using_Hadoop.md).

With the open source code and related documentation developed by Josh, I was able to set up the Hadoop cluster and launch the test run. However, the code was developed a few years ago against the Hadoop 1.0 framework, and some of its functionality is now outdated and no longer works. I went through a lot of online documentation and help resources to modify the code so that it runs on the Hadoop 2.0 framework using YARN. Unfortunately, my revision didn’t work as I expected: the job submitted to YARN executes, but no output is generated.

Do you know who I should ask for help with this issue?
Thank you very much,

Song

ritchiecarroll · August 3, 2017, 7:23pm

Hi Song - this portion of the code base hasn’t been used for a while, so it’s not actively maintained. Perhaps you’d like to give it a refresh?

hinfsynz · August 4, 2017, 7:28pm

Hey Ritchie,

I’m working on this code and hopefully I can figure out the fix soon.

By the way, how should I submit the code once it is updated for the new framework? Does it need a code review by GPA?

Thanks,
Song

ritchiecarroll · August 4, 2017, 7:46pm

If you fork the GSF repository and create a new “pull request” with your updates, we will review it.

Thanks!
Ritchie

hinfsynz · August 11, 2017, 1:18pm

My problem has been resolved. It turned out to be caused by an outdated Java method.

I will give an update soon.