Job Requirements:
Participate in the customer’s system design meetings and gather functional and technical requirements.
Build data pipelines for consumption by the data science team.
Skilled in ETL processes and tools.
Clear understanding of and experience with Python and PySpark (or Spark and Scala), along with Hive, Airflow, Impala, Hadoop, and RDBMS architecture.
Experience in writing Python programs and SQL queries.
Experience in SQL query tuning.
Experience in shell scripting (Unix/Linux).
Build and maintain data pipelines in Spark/PySpark with SQL and Python or Scala.
Knowledge of cloud technologies (Azure, AWS, GCP, etc.) is an added advantage.
Good to have: knowledge of Kubernetes, CI/CD concepts, and Apache Kafka.
Suggest and implement best practices in data integration.
Guide the QA team in defining system integration tests as needed.
Split the planned deliverables into tasks and assign them to the team.
Maintain and deploy the ETL code, following the Agile methodology.
Work on optimization wherever applicable.
Good oral, written, and presentation skills.
Preferred Qualifications:
Degree in Computer Science, IT, or a similar field; a Master’s is a plus.
Hands-on experience with Python and PySpark.
Hands-on experience with Spark and Scala.
Strong numerical and analytical skills.
Working knowledge of cloud platforms such as Microsoft Azure and AWS.
Technical expertise with data models, data mining, and segmentation techniques.