Technologies / Skills:
Advanced SQL; Python and associated libraries such as Pandas and NumPy; PySpark; shell scripting; data modelling; Big Data technologies such as Hadoop and Hive; ETL pipelines; and IaC tools such as Terraform.
Responsibilities:
Effective communication skills to coordinate with users, technical teams and Data Solution architects.
Prepare technical design documents for given requirements or JIRA stories.
Communicate results and business impacts of insight initiatives to key stakeholders to collaboratively solve business problems.
Work closely with the overall Enterprise Data & Analytics Architect and Engineering practice leads to ensure adherence to best practices and design principles.
Ensure quality, security and compliance requirements are met for the supported area.
Develop fault-tolerant data pipelines running on clusters.
Design scalable and modular solutions.
Required Qualifications:
1-8 years of hands-on experience developing data pipelines for data ingestion or transformation using Python (PySpark) / Spark SQL in the AWS cloud.
Experience developing data pipelines and processing data at scale using technologies such as EMR, Lambda, Glue, Athena, Redshift and Step Functions.
Advanced experience writing and optimizing efficient SQL queries with Python and Hive, handling large datasets in Big Data environments.
Experience in debugging, tuning and optimizing PySpark data pipelines.
Hands-on implementation experience and good knowledge of PySpark DataFrames, joins, partitioning, parallelism, etc. (see the sketch after this list).
Understanding of the Spark UI, event timelines, DAGs and Spark configuration parameters in order to tune long-running data pipelines.
Experience working in Agile implementations
Experience with Git and CI/CD pipelines to deploy cloud applications
Good knowledge of designing Hive tables with partitioning for performance.
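
To illustrate the kind of PySpark work referenced above (DataFrame joins, partitioning, and Hive-style partitioned tables), here is a minimal sketch. All table names, column names, paths and config values are hypothetical and would differ in a real pipeline.

```python
# Minimal PySpark sketch: join two DataFrames and write a partitioned table.
# Table/column names (orders, customers, order_date, etc.) are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("order-enrichment-sketch")
    # Example tuning knob; real values depend on cluster size and data volume.
    .config("spark.sql.shuffle.partitions", "200")
    .enableHiveSupport()
    .getOrCreate()
)

# Source data; in practice these might be S3 paths read by EMR or Glue.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
customers = spark.read.parquet("s3://example-bucket/raw/customers/")

# Repartition on the join key so matching rows land in the same partitions,
# reducing shuffle skew for a large-to-large join.
enriched = (
    orders.repartition("customer_id")
    .join(customers, on="customer_id", how="left")
    .withColumn("order_date", F.to_date("order_ts"))
)

# Write Hive-style partitioned output; partitioning by date keeps queries
# that filter on order_date from scanning the full dataset.
(
    enriched.write
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("analytics.enriched_orders")
)
```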
Thanks and Regards
HR TEAM