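In the interview, you could explain how the whole system scales: where would the performance bottlenecks appear, and how would you resolve them?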

Professional Experience

Data Engineer, 7Chord, New York, Jan 2017 – Sep 2017
• Designed and built a near-real-time ETL pipeline, scheduled by Airflow, that streams 10 GB of data from 10 different data sources
• Built a distributed web crawler with Scrapy
• Predicted bond trading signals with ensemble learning
• Visualized machine learning results with d3.js

https://www.confluent.io/blog/stream-data-platform-1/

Data Analyst, Uber, Beijing, China, Jan 2015 – Dec 2015
• Optimized and maintained ETL pipelines
• Detected fraudulent behavior by drivers and customers using PCA, which reduced detection time by 50% and increased accuracy to 90%

Data Analyst, ICBC, Beijing, China, Jun 2013 – Jan 2015
• Decreased workflow time by 10% by developing a growth strategy informed by business intelligence analysis
• Analyzed 100,000 customer surveys using the DMAIC process in Six Sigma projects, then applied optimized service strategies, which resulted in a 20% improvement in customer satisfaction

Academic Research

Re-rank web documents with Learning to Rank, Director: Guoxiao Yang, Sep 2016 – Jan 2017
• Implemented and optimized a LambdaMART ranking model to improve re-ranking quality, evaluated by NDCG, ERR, and P
• Retrieved documents from the ClueWeb09 dataset, built an index, and ranked query results by tf-idf using the Indri query language
• Extracted important features, such as query likelihood, using NLP methods

Education

New York University, Brooklyn, NY, Jan 2018
Master of Science in Computer Science, GPA: 3.9/4.0

Beijing Institute of Technology, Beijing, China, Jun 2013
Bachelor of Science in Mathematics & Applied Mathematics, GPA: 3.6/4.0

Technical Skills

• Programming Languages: Python, JavaScript, Java, Scala, Go
• Machine Learning: scikit-learn, MLlib
• Cluster Computing: Apache Spark, Apache Hadoop MapReduce
• Databases: PostgreSQL, MongoDB, MySQL
• Message Brokers: RabbitMQ, Apache Kafka
• Data Visualization: d3.js, React