Job location: Bangalore (not Ahmedabad).
Responsibilities:
Collaborate with cross-functional teams to understand their data requirements, define the data architecture, data models, and data flow in our data warehouse, and provide technical support as needed.
Design, develop, and maintain scalable data pipelines using PySpark and AWS.
Implement data quality assurance practices, including data validation, cleansing, and transformation processes, to ensure data accuracy, reliability, and integrity.
Create and maintain documentation of the data architecture, data models, and data pipelines to support shared understanding and efficient collaboration among team members.
Maximise the performance and scalability of the AWS data warehouse per unit of compute and storage cost.
Optimise and tune data pipelines for performance and scalability.
Implement data governance policies and data security and privacy best practices to protect sensitive information and comply with relevant regulations.
Work closely with other team members to troubleshoot and resolve data-related issues.
Continuously monitor the performance of the data infrastructure and make improvements as needed to ensure optimal efficiency and reliability.
Stay up-to-date with the latest industry trends and advancements in data architecture and technologies to drive innovation and maintain a competitive edge.
Requirements:
Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related field.
5+ years of experience in data architecture and data engineering roles, preferably on AWS.
Strong proficiency in PySpark, Python, and SQL, or in comparable big data technologies.
Strong experience with data warehouses.
Experience with DevOps and version control practices and tools such as Jenkins or GitLab.