Lead and coordinate project team members in project delivery
Participate in technical sales processes and in the design and standardization of delivery processes
Gather user requirements and construct UML diagrams for requirements modelling
Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.)
Design, develop & document data pipelines and analysis programs using Hadoop and related ecosystem tools such as Hive & Spark
Design, develop & document predictive models utilizing tools included in the Hadoop cluster such as Spark MLlib
Design, develop & document data ingestion, data pre-preparation, data cleansing and data standardization rules to prepare datasets for analysis, and ensure the processes are executed in an optimized and timely manner.
Design, develop & document methods to transform unstructured datasets such as text, audio and video to structured attributes.
Design, develop & document data processing workflows and governance rules using Python, Airflow and Ranger
Design, develop & document RESTful Web APIs and Web Applications in Python for productization of data pipelines and processing workflows.
Conduct requirement gathering to understand customer needs and their as-is data ecosystem
Work with subject matter experts to translate domain knowledge into data processing pipelines and data products
Design, develop & document data products such as web-based visualization dashboards and data collection applications/services.
Design, deploy, manage & document data processing infrastructure both on-site and on-cloud
Design and develop automated unit test scripts for developed software
Required Skills :
5+ years’ experience in software development projects or ETL / data warehousing / master data management projects
Experience with the system development lifecycle, either professionally or as a hobby.
Programming knowledge to clean and scrub noisy datasets.
Self-driven and able to take own initiative to learn and explore
Capable of picking up new technologies and practices rapidly
Solid foundation in mathematical and algorithmic thinking
Strong background and experience in statistics is a plus
Background in UML modeling is a plus
Additional Information :
Coaching and self-paced training materials will be provided.
Join a high-energy team that includes several Open Source contributors working towards transforming the local IT industry through Open Source technologies.
Opportunity to work with multiple high-demand Open Source technologies in the Big Data market, including the ecosystems of the following technologies