Key responsibilities:
Working with clients to understand their data.
Based on that understanding, you will design and build the required data structures and pipelines.
You will work on the application end to end, collaborating with UI and other development teams.
You will be responsible for building data pipelines to migrate and load data into HDFS, either on-premises or in the cloud.
Developing data ingestion, processing, and integration pipelines effectively.
Creating Hive data structures and metadata, and loading data into data lake / big data warehouse environments.
Optimizing (performance tuning) data pipelines to minimize cost.
Keeping code version control and the Git repository up to date.
You will be responsible for building and maintaining CI/CD for the data pipelines.
You will manage unit testing of all data pipelines.
Skills & Experience:
Bachelor’s degree in Computer Science or a related field.
Minimum of 5 years of working experience with Spark and the Hadoop ecosystem.
Minimum of 4 years of experience designing data streaming pipelines.
Should be an expert in at least one of Python, Scala, or Java.
Should have experience in data ingestion and integration into a data lake using Hadoop ecosystem tools such as Sqoop, Spark, SQL, Hive, and Airflow.
Should have experience optimizing (performance tuning) data pipelines.
Minimum of 3 years of experience with NoSQL and Spark Streaming.
Knowledge of Kubernetes and Docker is a plus.
Should have experience with cloud services on either Azure or AWS.
Should have experience with an on-premises Hadoop distribution such as Cloudera, Hortonworks, or MapR.
Basic understanding of CI/CD pipelines.
Basic knowledge of Linux environment and commands.