Machine Learning

Supervised Learning
    Classification (qualitative)
        K-NN
        Logistic Regression
        Neural Networks
        error analysis
            skewed data
            F1 score = 2PR/(P+R), where P is precision and R is recall
            accuracy
            precision
            recall
        learning curve
        SVM
            Kernel
            Parameters
                C
                sigma
    Regression (quantitative)
        Linear Regression
    validation
    sensitivity
    compactness
Ensemble Learning
    Combines several base learners to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).
    Sequential methods (such as boosting) exploit the dependence between the base learners.
    Parallel methods (such as bagging) exploit the independence between the base learners, since averaging independent predictions can reduce the error dramatically.
Unsupervised Learning
    K-means
    k-node classification
    Principal Components Analysis
    Data Vis
    Anomaly detection
    Clustering
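The claim that averaging independent base learners reduces error can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: each "learner" is modelled as the true value plus independent Gaussian noise, and the names `noisy_prediction` and `ensemble_prediction` are hypothetical helpers, not part of any library. Under those assumptions the variance of an average of n learners should be roughly 1/n of a single learner's variance.

```python
import random
import statistics

def noisy_prediction(true_value, noise_sd, rng):
    # Assumption: one base learner = true value + independent Gaussian noise.
    return true_value + rng.gauss(0.0, noise_sd)

def ensemble_prediction(true_value, noise_sd, n_learners, rng):
    # Parallel (bagging-style) combination: average the independent predictions.
    return statistics.fmean(
        noisy_prediction(true_value, noise_sd, rng) for _ in range(n_learners)
    )

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000

single = [noisy_prediction(10.0, 1.0, rng) for _ in range(trials)]
averaged = [ensemble_prediction(10.0, 1.0, 25, rng) for _ in range(trials)]

var_single = statistics.pvariance(single)
var_averaged = statistics.pvariance(averaged)

print(f"variance of one learner:        {var_single:.3f}")
print(f"variance of 25-learner average: {var_averaged:.3f}")
```

Real base learners are rarely fully independent, so the reduction seen in practice is usually smaller than this idealised case suggests.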

Metrics

select precision/recall/f-score
Silhouette score
Inter-cluster similarity
Intra-cluster entropy
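As a rough illustration of why precision, recall and F1 matter on skewed data, the sketch below computes them from scratch and compares them with accuracy. The function names `precision_recall_f1` and `accuracy` and the toy labels are assumptions made for this example; the F1 formula is the one given in the outline, 2PR/(P+R).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    # Count true positives, false positives and false negatives.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Convention chosen here: an undefined ratio (0/0) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Skewed toy data: only 2 of 10 examples are positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10

acc = accuracy(y_true, always_negative)
p, r, f1 = precision_recall_f1(y_true, always_negative)
print(acc, p, r, f1)
```

A classifier that always predicts the majority class looks good on accuracy here but scores zero on recall and F1, which is the reason the outline lists F1 under skewed data.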

Programming

Python
    numpy
    pandas
    matplotlib
    scipy
    scikit-learn
Java
    pass-by-value
JavaScript
    JavaScript and HTML
    D3.js
    AJAX implementation
    jQuery
Data structure
Algorithm
Design pattern
MapReduce

Communication

Data Visualization

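Technique is not the most important thing; thinking it through before you speak is the key.
The ability to simplify the complex and present it from a big-picture vantage point.
Prototype Design
Research/publication
Visual Encoding
Spreadsheet tools (Excel)
    dashboard
    pivot table
Data Presentation
Knowing Your Audience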

Maths

Linear Algebra
    vector
    matrix
    Eigenvalues and eigenvectors
    determinant
    reduced row echelon form
    Linear independence
    invertible
Functions and Graphing
Multivariable derivatives and integration in Calculus

Statistics

Time Series
Bayesian
Longitudinal
Experimental Design / Causal Inference
    A/B testing
Descriptive and Inferential statistics

Data Intuition

Program management
Industrial Knowledge

Data Wrangling


SQL
Database Systems
MongoDB

Other


Version Control/Git
    git log
    git clone
    git diff
    git branch
    git checkout
    git merge
Markdown

Data Science Path
