Duties:
- Develop technical architecture to enable Analytics and Data Science using industry best practices for large-scale processing;
- Design, develop and implement ETL data pipelines for batch and streaming solutions;
- Research and develop a distributed crawler and data acquisition system; optimize crawling strategies and improve crawl coverage and efficiency;
- Monitor data quality across the data processing lifecycle;
- Administer cloud-based data infrastructure and databases as necessary.
Background:
- BSc in a relevant discipline with 2-6 years' working experience in Data Engineering;
- Proficient in at least one of Python, R, or Julia, as well as SQL and Linux;
- Proficient in data pipeline development including batch and streaming processing;
- Familiar with web crawling concepts and techniques; the ability to design and develop a crawler system is an advantage;
- Experience with cloud platforms, in particular AWS, is a plus.