Re: Hadoop, pig, dynamical systems tools

Hi Brenda,

David Donoho has an extensive critical review of big data techniques in
http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf

But I'd like to learn about Hadoop from your direct encounter with it, and to understand how you think we can apply such big-data mining techniques to multi-dimensional activity / ensemble-body / gesture tracking.  It may be good to talk with Qiao to get his opinion on things.  I trust his scientific judgment most ...

And here's a curmudgeonly rant:

Glad to see you working through all this,
Xin Wei

On Aug 8, 2016, at 8:40 PM, Brenda McCaffrey <brendamc@asu.edu> wrote:

Hi Xin Wei and friends,

Quick update:

*  Thanks to Connor for helping me get much of the Java code working within Eclipse.
*  I'm meeting with Mike K. tomorrow morning at 10am to look at the code.
*  I'm learning more about Hadoop and Apache Pig and it may be very exciting for us, although I have yet to find anyone in our realm who has knowledge of it.  Here are a few things I've learned:
  • Hadoop is an open source distributed data storage and processing framework that grew out of Google's published MapReduce and distributed file system designs about 10 years ago.
  • Hadoop is used by Amazon, Yahoo, Facebook and others to track and manage behavioral data.  For example, that's how Facebook knows where you move your mouse.
  • Hadoop allows massively parallel processing of large data sets.  It can be set up on a cluster of many low-cost computers, and it scales out as more machines are added.
  • Apache Pig provides a scripting language (Pig Latin) for writing data-processing jobs that run on Hadoop.
Apparently, Hadoop and Pig are primarily used in business systems.  The gentleman who wrote these dynamical systems codes (Jacob Perkins, when he was at UT Austin) did so to demonstrate that one could solve complex scientific problems without massive data sets by using parallel processing.  Eureka!

I continue to be intrigued by this approach since it implies to me that we may be able to solve very complex, real-time dynamical systems problems by using relatively simple parallel computing frameworks in conjunction with Java.

I'm going to get the fundamental Java to work (I hope) to demonstrate its functionality, but the real power of these programs is the ability to slice the simulations into partitions that can be evaluated in parallel to converge on the regions of interest.  This could be especially useful for time-domain data coming in from complex systems for which we don't know the closed-form analytical models.  (Video?)
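
To make the partitioning idea concrete, here is a minimal Java sketch, not taken from the Perkins code. It assumes the logistic map x -> r*x*(1-x) as a stand-in system and uses plain Java parallel streams on one machine in place of Hadoop and Pig. The class name LyapunovSweep, the method names, and the burn-in and step counts are all hypothetical choices for illustration. Each value of the parameter r is an independent partition, so the exponents can be evaluated in parallel and then scanned for the region where the exponent turns positive.

```java
import java.util.stream.IntStream;

// Illustrative sketch only: a parameter sweep in which every parameter value
// is an independent unit of work, evaluated in parallel on local CPU cores.
final class LyapunovSweep {

    // Lyapunov exponent of the logistic map x -> r*x*(1-x), estimated as the
    // orbit average of ln|f'(x)|, where f'(x) = r*(1 - 2x).
    static double logisticLyapunov(double r, int burnIn, int steps) {
        double x = 0.3;                          // arbitrary starting point in (0, 1)
        for (int i = 0; i < burnIn; i++) {       // discard the initial transient
            x = r * x * (1 - x);
        }
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            double slope = Math.abs(r * (1 - 2 * x));
            sum += Math.log(Math.max(slope, 1e-300)); // guard against log(0)
            x = r * x * (1 - x);
        }
        return sum / steps;
    }

    // Evaluate n evenly spaced values of r in [rMin, rMax] in parallel (n >= 2).
    // The result array keeps the same order as the parameter grid.
    static double[] sweep(double rMin, double rMax, int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToDouble(i -> logisticLyapunov(
                        rMin + (rMax - rMin) * i / (n - 1), 1000, 100_000))
                .toArray();
    }

    public static void main(String[] args) {
        int n = 16;
        double[] lambda = sweep(2.5, 4.0, n);
        for (int i = 0; i < n; i++) {
            double r = 2.5 + 1.5 * i / (n - 1);
            // A positive exponent marks a parameter value of interest (chaos).
            System.out.printf("r = %.3f  lambda = %+.4f%n", r, lambda[i]);
        }
    }
}
```

On a Hadoop cluster the same structure would presumably map each parameter partition to a separate task; the sketch shows only the decomposition, not how the Perkins programs actually distribute the work.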

These are my thoughts for now.  I'm not a CS person so I may be completely in left field about this, but it looks promising.

Thanks for listening.
-Brenda


On Sun, Aug 7, 2016 at 4:10 AM, <shaxinwei@gmail.com> wrote:

Dr Brenda McCaffrey, PhD researcher at Synthesis, is working on a project to develop some dynamical-systems-based tools for modeling human movement.

Can anyone help with the following query:

I’m converging on an approach for Java simulation of Lyapunov exponents based on an amazing set of programs created by a gentleman named Jacob Perkins (https://github.com/thedatachef/sounder/tree/master/udf/src/main/java/sounder/pig/chaos) that he developed using the Sprott algorithm.  This is really deep stuff and uses Hadoop and Pig as well as Java.  Do we have resources in these areas?  A couple of hours with an expert would change my life!
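
For readers new to the topic, the following is a minimal Java sketch of one common way to estimate the largest Lyapunov exponent numerically: follow two nearby trajectories, measure how much their separation grows at each step, then rescale the separation back to its starting size. It is not copied from the Perkins repository, and whether that code follows exactly these steps is not confirmed here. The Hénon map with parameters a = 1.4 and b = 0.3 is an assumed test system, and the class name HenonLyapunov, the separation 1e-8, and the iteration counts are hypothetical choices.

```java
// Illustrative sketch only: largest Lyapunov exponent of the Henon map
// x' = 1 - a*x^2 + y, y' = b*x, via two nearby orbits with renormalization.
final class HenonLyapunov {
    static final double A = 1.4;
    static final double B = 0.3;

    static double largestExponent(int burnIn, int steps) {
        final double d0 = 1e-8;                  // reference separation
        double x = 0.0, y = 0.0;
        for (int i = 0; i < burnIn; i++) {       // let the orbit settle onto the attractor
            double xn = 1 - A * x * x + y;
            y = B * x;
            x = xn;
        }
        double xp = x + d0, yp = y;              // perturbed companion orbit
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            // Advance both orbits by one step.
            double xn = 1 - A * x * x + y;
            double yn = B * x;
            double xpn = 1 - A * xp * xp + yp;
            double ypn = B * xp;
            // Measure the new separation and accumulate its log growth rate.
            double dx = xpn - xn;
            double dy = ypn - yn;
            double d1 = Math.hypot(dx, dy);
            sum += Math.log(d1 / d0);
            // Rescale the companion back to distance d0 along the same direction.
            x = xn;
            y = yn;
            xp = xn + dx * d0 / d1;
            yp = yn + dy * d0 / d1;
        }
        return sum / steps;                      // natural-log units per iteration
    }

    public static void main(String[] args) {
        System.out.println(largestExponent(1000, 200_000));
    }
}
```

For these parameter values the commonly quoted largest exponent is roughly 0.42, so an estimate near that value is a reasonable sanity check before moving on to measured movement data.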

Xin Wei