Responsibilities:
- Design and implement end-to-end distributed data processing pipelines using Spark, Hive, Python, and other tools and languages prevalent in the Hadoop ecosystem.
- Build utilities, user-defined functions (UDFs), and frameworks that streamline common data-flow patterns.
- Research, evaluate, and adopt new technologies, tools, and frameworks in the Hadoop and broader Big Data ecosystem.
- Build and incorporate automated unit tests, participate in integration...