Yingchi | Data Miner

Skills

Programming Language

Big Data Tools

Machine Learning Models

Other

Experience

Apr 2019 | Present

Indeed

Data Scientist

Applied NLP techniques (entity embeddings) with tree-based ML models to estimate job salaries from both structured features and unstructured text.

• Designed and developed the jobseeker salary inference pipeline, including model (re)training with AWS SageMaker, model deployment through REST and gRPC services built in Python, and model monitoring with scheduled jobs.

• Built Python modules for text summarization and ranking to generate representative content items, using NLP techniques such as TextRank and Word2Vec.

• Prototyped an exploration-and-exploitation pipeline for dynamic ranking.

Jul 2018 | Jan 2019

Bitmain

Data Scientist

Part of the btc.com team.

• Provided data insights for cryptocurrency mining platforms and blockchain explorers using Airflow-scheduled Spark jobs.

• Developed a transaction fee prediction engine using neural networks and generalized linear models, building the end-to-end pipeline from real-time data acquisition (a Python parser with Redis and MySQL) to model training and evaluation.

• Generated internal data reports using Spark SQL, Hive, and graph databases such as Neo4j.

Jul 2017 | Jul 2018

DataSpark

Data Scientist

Worked on the applications team.

• Researched footfall analytics on telco data using machine learning algorithms such as Naive Bayes, Logistic Regression, and Random Forests. Implemented and productionized the models in our data analytics platform using Python. Submitted two research papers based on this work, one of which was published.

• Designed and developed a network planning application that helps telco operators reduce upgrade costs while improving customer experience. The application was built with Scala and deployed in a big-data environment with Hadoop and Spark.

Dec 2016 | Jan 2017

ViSenze

Data Analytics Intern

• Established the internal metrics reporting pipeline by analyzing the raw data, the existing data management system, and the requirements of various team leaders.

• Produced dashboards on system and business performance using Chartio and SQL, enabling stakeholders to make effective decisions.

• Assisted engineering teams with database design.

Jun 2016 | Nov 2016

DataSpark

Data Science Intern

• Conducted geolocation data analysis projects to uncover new features and improve model accuracy by running Hadoop and Spark jobs; implemented reproducible code for the projects using R Markdown and Python.

• Built interactive data visualizations (web apps) using JavaScript, Node.js, and React for internal and external clients.

May 2015 | Jul 2015

Millward Brown

Market Research Analyst Intern

• Prepared Budweiser's 2015 Q1 report, which was well received by the client; discovered unusual patterns in the data and initiated deep-dive research to explain them.

• Collected and compiled consumer survey data weekly using SPSS Survey Reporter.

Education

2018 | Present

National University of Singapore

Master of Computer Science, 4.83/5.0

Main courses taken:
Neural Networks and Deep Learning (CS5242)
Big-Data Analytics Technology (CS5344)
Phenomena and Theories of Human-Computer Interaction (CS4249)
Text Mining (CS5246)
Knowledge Discovery and Data Mining (CS5228)
Uncertainty Modelling in AI (CS5340)

2013 | 2017

National University of Singapore

Bachelor of Business Analytics, 4.91/5.0

Honours with Highest Distinction 🎓

Winner of Lee Kuan Yew Gold Medal 🥇

Named to the Dean's List for 5 semesters

Main courses taken:
Mining Web Data for Business Insights (BT4222)
Search Engine Optimization & Analytics (BT4212)
Data Mining (ST4240)
Business Intelligence Systems (IS4240)
Stochastic Models in Management (DSC3215)
Computational Methods for Business Analytics (BT3102)
Statistical Methods for Finance (ST4245)
Social Media Network Analysis (IS4241)
Simulation (ST3247)
Stochastic Process (ST3236)
Regression Analysis (ST3131)

2016

CFA Institute

Passed Level I of the CFA Program

Publications

Footfall Count Estimation Techniques Using Mobile Data

2017 IEEE 18th International Conference on Mobile Data Management (MDM)

Playground

RNN Chinese Novel Generator

A Chinese text generator built with recurrent neural network (RNN) and long short-term memory (LSTM) layers. The training text is Modu 《默读》, a popular Chinese web novel.

Flask Calendar Integrated with Plotly Charts

A concise calendar (FullCalendar) built with the Flask framework and integrated with plotly.js to display interactive charts of the data.

Contact

Somewhere in Singapore

yingchi.pei@gmail.com

www.linkedin.com/in/yingchi-pei
