Develop data pipelines using Big Data technologies in Python (PySpark, Hadoop/HDFS, Azure, Kafka)
Implement connections to new source systems (e.g. Kafka, relational databases, REST-API-based services, or file-based data sources)
Ensure that implemented artefacts are covered by automated tests and integrated into the CI/CD pipeline
Work with stakeholders, including data officers and data stewards, to assist with data-related technical issues and support their data infrastructure needs
Assemble large, complex data sets that meet functional and non-functional business requirements
Embrace the principles of the Agile working model; continuously grow your own skills and support colleagues in improving theirs
Travel to local RB locations may be required
Qualifications
Education: An excellent degree in computer science, a major with a computational focus, or a comparable qualification
Personality: An open-minded, reliable, and motivated global team player who is keen to learn new technologies and share knowledge while also being able to work independently, with a strong sense of responsibility for deliverables
Background:
Experience in Python/Scala and the relevant data engineering packages and frameworks (Azure, PySpark, PyTest)
Experience in the Apache Hadoop ecosystem as a developer (Spark, HDFS, Hive/Impala, Kafka)
Experience in the Azure ecosystem (Databricks)
Experience in common interfaces to data sources and data formats (Kafka, REST APIs, relational databases, file shares, JSON, Protocol Buffers, Parquet)
Experience in edge computing
Familiar with CI/CD toolchains (Git, Jenkins)
Familiar with relational databases and SQL