Technologies / Skills:
Advanced SQL; Python and associated libraries such as Pandas and NumPy; PySpark; shell scripting; data modelling; big data technologies such as Hadoop and Hive; ETL pipelines; and IaC tools such as Terraform.
Responsibilities:
• Effective communication skills to coordinate with users, technical teams and Data Solution architects.
• Prepare technical design documents for given requirements or JIRA stories.
• Communicate results and business impacts of insight initiatives to key stakeholders to collaboratively solve business problems.
• Work closely with the overall Enterprise Data & Analytics Architect and Engineering practice leads to ensure adherence to best practices and design principles.
• Ensure that quality, security and compliance requirements are met for the supported area.
• Develop fault-tolerant data pipelines running on a cluster.
• Design scalable and modular solutions.
Required Qualifications:
• 1-8 years of hands-on experience developing data pipelines for data ingestion or transformation using Python (PySpark) / Spark SQL in the AWS cloud
• Experience developing data pipelines and processing data at scale using AWS services such as EMR, Lambda, Glue, Athena, Redshift and Step Functions
• Advanced experience writing and optimizing efficient SQL queries with Python and Hive to handle large data sets in big-data environments
• Experience debugging, tuning and optimizing PySpark data pipelines
• Hands-on implementation experience and good knowledge of PySpark DataFrames, joins, partitioning, parallelism, etc. (see the illustrative sketch after this list)
• Understanding of the Spark UI, event timelines, DAGs and Spark configuration parameters in order to tune long-running data pipelines
• Experience working in Agile implementations
• Experience with Git and CI/CD pipelines to deploy cloud applications
• Good knowledge of designing Hive tables with partitioning for performance
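For illustration only, the snippet below is a minimal PySpark sketch of the kind of work described in the qualifications above: joining DataFrames, applying basic tuning configuration, and writing a partitioned output for query performance. The bucket paths, column names and configuration values are hypothetical placeholders, not details of this role.

```python
# Minimal PySpark sketch (illustrative only): read two datasets, join them,
# aggregate, and write the result as a partitioned table.
# Paths, column names and config values are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-ingestion-pipeline")
    # Example tuning parameters of the kind referenced above; actual values
    # depend on cluster size and data volume.
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

# Hypothetical source locations on S3.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
customers = spark.read.parquet("s3://example-bucket/raw/customers/")

# Broadcast the smaller dimension table to avoid a costly shuffle join.
enriched = orders.join(F.broadcast(customers), on="customer_id", how="left")

daily_totals = (
    enriched
    .groupBy("order_date", "customer_region")
    .agg(F.sum("order_amount").alias("total_amount"))
)

# Write as a Hive-style partitioned dataset so queries can prune by date.
(
    daily_totals
    .repartition("order_date")
    .write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_totals/")
)
```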
Thanks and Regards
HR TEAM