Abyres Holding Sdn Bhd

Job Description :

  • Lead and coordinate project team members in project delivery
  • Participate in technical sales processes and in delivery process design and standardization
  • Gather user requirements and construct UML diagrams for requirements modelling
  • Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.)
  • Design, develop & document data pipelines and analysis programs using Hadoop and related ecosystem tools such as Hive & Spark (see the first sketch after this list)
  • Design, develop & document predictive models utilizing tools in the Hadoop cluster such as Spark MLlib (sketch below)
  • Design, develop & document data ingestion, data preparation, data cleansing and data standardization rules to prepare datasets for analysis, and ensure these processes are executed in an optimized and timely manner
  • Design, develop & document methods to transform unstructured datasets such as text, audio and video into structured attributes
  • Design, develop & document data processing workflows and governance rules using Python, Airflow and Ranger (sketch below)
  • Design, develop & document RESTful web APIs and web applications in Python to productize data pipelines and processing workflows (sketch below)
  • Conduct requirement gathering to understand customer needs and the as-is data ecosystem
  • Work with subject matter experts to translate domain knowledge into data processing pipelines and data products
  • Design, develop & document data products such as web-based visualization dashboards and data collection applications/services
  • Design, deploy, manage & document data processing infrastructure both on-site and in the cloud
  • Design and develop automated unit test scripts for developed software (see the last sketch after this list)
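
For illustration, a minimal PySpark sketch of the kind of cleansing and standardization pipeline described above, reading a raw Hive table and writing a curated one. The database, table and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Read a raw Hive table, apply simple cleansing/standardization rules,
    # and write the curated result back to Hive.
    spark = (
        SparkSession.builder
        .appName("sales-events-cleansing")
        .enableHiveSupport()
        .getOrCreate()
    )

    raw = spark.table("raw_db.sales_events")  # hypothetical raw table

    curated = (
        raw
        .dropDuplicates(["event_id"])                       # drop duplicate events
        .withColumn("country", F.upper(F.trim("country")))  # standardize country codes
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("amount").isNotNull())                # discard unusable rows
    )

    curated.write.mode("overwrite").saveAsTable("curated_db.sales_events")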
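
A similarly minimal Spark MLlib sketch for the predictive-modelling item, fitting a logistic regression churn classifier on a curated table. The feature and label columns are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    df = spark.table("curated_db.customer_features")  # hypothetical feature table

    # Assemble numeric features into a vector and fit a logistic regression.
    assembler = VectorAssembler(
        inputCols=["tenure_months", "monthly_spend", "support_tickets"],
        outputCol="features",
    )
    lr = LogisticRegression(featuresCol="features", labelCol="churned")

    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[assembler, lr]).fit(train)

    # Evaluate area under ROC on the held-out split.
    auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(model.transform(test))
    print(f"AUC: {auc:.3f}")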
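
For the workflow item, a minimal Airflow DAG sketch that chains a cleansing job and a model-training job into a daily run. The DAG id and script paths are hypothetical, and the operator import assumes Airflow 2.x.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="daily_sales_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        cleanse = BashOperator(
            task_id="cleanse_raw_events",
            bash_command="spark-submit /opt/pipelines/cleanse_sales_events.py",
        )
        train = BashOperator(
            task_id="train_churn_model",
            bash_command="spark-submit /opt/pipelines/train_churn_model.py",
        )
        cleanse >> train  # train only after cleansing succeeds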
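
For the productization item, a minimal Flask sketch (Flask appears in the stack listed under Additional Information) exposing model scores over a RESTful endpoint. The route, payload shape and the score() placeholder are hypothetical.

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def score(record: dict) -> float:
        # Placeholder: in practice this would load and call the trained model.
        return 0.5

    @app.route("/api/v1/score", methods=["POST"])
    def score_endpoint():
        record = request.get_json(force=True)
        return jsonify({"score": score(record)})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)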
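
And for the unit-testing item, a minimal py.test sketch. The standardize_country helper is a hypothetical stand-in, defined inline so the example is self-contained.

    import pytest

    def standardize_country(raw: str):
        """Trim and upper-case a country code; empty input maps to None."""
        cleaned = raw.strip().upper()
        return cleaned or None

    @pytest.mark.parametrize(
        "raw, expected",
        [(" my ", "MY"), ("sg", "SG"), ("", None)],
    )
    def test_standardize_country(raw, expected):
        assert standardize_country(raw) == expected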

Required Skills :

  • 5+ years’ experience in software development projects or ETL / data warehousing / master data management projects
  • Experience with the system development lifecycle, either professionally or as a hobby
  • Programming knowledge to clean and scrub noisy datasets
  • Self-driven and able to take own initiative to learn and explore
  • Capable of picking up new technologies and practices quickly
  • Solid foundation in mathematical and algorithmic thinking
  • Strong background and experience in statistics is a plus
  • Background in UML modeling is a plus
  • Team player

Additional Information :

  • Coaching and self-paced training materials will be provided.
  • Join a high-energy team that includes several Open Source contributors working towards transforming the local IT industry through Open Source technologies.
  • Opportunity to work with multiple high-demand Open Source technologies in the Big Data market, including the following technologies’ ecosystems:
  • Hadoop, Hive, Spark, Python (Morepath, Flask, Celery, Scikit, Superset, PySpark, Buildbot, SQLAlchemy, Py.test), PostgreSQL, Druid, HBase, RabbitMQ, Ansible, Docker, Kubernetes, HAProxy, S3