Data Science Path
Contents
Machine Learning
- Supervised Learning
  - Classification (qualitative)
    - K-NN
    - Logistic Regression
    - Neural Networks
    - Error analysis
      - Skewed data
      - F1 score = 2PR/(P+R), where P is precision and R is recall
      - Accuracy, precision, recall
    - Learning curve
    - SVM
      - Kernel
      - Parameters: C, sigma
  - Regression (quantitative)
    - Linear Regression
    - Validation
    - Sensitivity
    - Compactness
- Ensemble Learning
  - Ensemble methods aim to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).
  - Sequential methods (such as boosting) exploit the dependence between the base learners.
  - Parallel methods (such as bagging) exploit the independence between the base learners: averaging n independent errors of equal variance divides the error variance by n.
- Unsupervised Learning
  - K-means
  - k-node classification
  - Principal Components Analysis
  - Data Vis
  - Anomaly detection
  - Clustering
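The error-analysis branch pairs skewed data with the F1 score, and a few lines of arithmetic show why. The sketch below is a minimal illustration computing accuracy, precision, recall and F1 = 2PR/(P+R) from confusion-matrix counts; the function names and the example counts are made up for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of actual positives that the classifier found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall: 2PR / (P + R).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Skewed data (hypothetical): 1000 examples, only 10 positives.
# Always predicting "negative" scores high accuracy but zero F1.
print(accuracy(tp=0, fp=0, fn=10, tn=990))     # 0.99
print(precision_recall_f1(tp=0, fp=0, fn=10))  # (0.0, 0.0, 0.0)

# A classifier that finds 8 of the 10 positives with 4 false alarms.
print(precision_recall_f1(tp=8, fp=4, fn=2))
```

On skewed data, accuracy rewards the trivial majority-class predictor, while F1 drops to zero because recall is zero.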
Metrics
- Precision / recall / F-score selection
- Silhouette score
- Inter-cluster similarity
- Intra-cluster entropy
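The silhouette score combines intra-cluster cohesion and inter-cluster separation into one number per point. Below is a minimal pure-Python sketch, assuming Euclidean distance and the usual convention that points in singleton clusters score 0. In practice scikit-learn's `sklearn.metrics.silhouette_score` does this for you. The toy points are invented for illustration.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # Cohesion: mean distance to the rest of i's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # Separation: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad = [0, 1, 0, 1, 0, 1]   # mixes the groups
print(silhouette_score(points, good))  # close to 1
print(silhouette_score(points, bad))   # much lower
```

Scores near 1 mean tight, well-separated clusters; scores near 0 or below suggest overlapping or mislabeled clusters.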
Programming
- Python
  - numpy
  - pandas
  - matplotlib
  - scipy
  - scikit-learn
- Java
  - pass-by-value
- JavaScript
  - JavaScript and HTML
  - D3.js
  - AJAX implementation
  - jQuery
- Data structures
- Algorithms
- Design patterns
- MapReduce
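MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that combines each group. The classic word-count job can be simulated in a single process as a sketch. A real framework such as Hadoop distributes these phases across machines; the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different worker.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    # Each key could be reduced on a different worker.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(word_count(docs))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because mappers see one record at a time and reducers see one key at a time, both phases parallelize naturally.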
Communication
Data Visualization
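- Technique is not what matters most; thinking things through before you speak is the key.
- The ability to simplify complex ideas and express them from a high-level, strategic perspective.
- Prototype design
- Research/publication
- Visual encoding
- Spreadsheet tools (Excel)
  - Dashboard
  - Pivot table
- Data presentation
- Knowing your audience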
Maths
- Linear Algebra
  - Vectors
  - Matrices
  - Eigenvalues and eigenvectors
  - Determinants
  - Reduced row echelon form
  - Linear independence
  - Invertibility
- Functions and graphing
- Calculus: multivariable derivatives and integration
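Reduced row echelon form ties several of these topics together: the number of pivots is the rank, a pivot-free column signals linear dependence, and a square matrix is invertible exactly when its RREF is the identity. Below is a minimal Gauss-Jordan sketch using exact `Fraction` arithmetic to avoid floating-point round-off. The function name `rref` is illustrative.

```python
from fractions import Fraction


def rref(matrix):
    """Return (reduced row echelon form, pivot column indices) using exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivots = []
    r = 0
    for c in range(cols):
        if r == rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here: column c corresponds to a free variable
        m[r], m[pivot] = m[pivot], m[r]
        # Scale the pivot row so the pivot entry becomes 1.
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        # Eliminate column c from every other row.
        for i in range(rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


# Row 2 is twice row 1, so the rows are linearly dependent.
R, piv = rref([[1, 2, 3], [2, 4, 6], [1, 0, 1]])
print(R)    # rows [1, 0, 1], [0, 1, 1], [0, 0, 0] (as Fractions)
print(piv)  # [0, 1] -> rank 2, so this 3x3 matrix is not invertible
```

A 3x3 matrix with only two pivots has a zero determinant, which is the same fact seen from the determinant side.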
Statistics
- Time series
- Bayesian statistics
- Longitudinal analysis
- Experimental design / causal inference
  - A/B testing
- Descriptive and inferential statistics
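For A/B testing, one common (but not the only) analysis of conversion rates is a two-sided, two-proportion z-test with pooled variance. The sketch below assumes independent users, a fixed sample size chosen in advance, and samples large enough for the normal approximation. The conversion counts in the example are invented.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 (no difference) both groups share one pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 200/2000, variant B 260/2000.
z, p = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the observed difference would be unlikely if the two variants truly converted at the same rate; it does not by itself measure how large or valuable the difference is.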
Data Intuition
- Program management
- Industry knowledge
Data Wrangling
- SQL
- Database systems
- MongoDB
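A large share of everyday SQL wrangling is filtering and aggregating with `GROUP BY`, much like a spreadsheet pivot table. Below is a small, self-contained sketch that uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical, invented only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",  # parameterized, not string-formatted
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 55.5)],
)

# Count and total the orders per region, largest total first.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

for region, n_orders, total in rows:
    print(region, n_orders, total)
conn.close()
```

Using `?` placeholders lets the driver handle quoting, which avoids SQL injection when values come from user input.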
Other
- Version control/Git
  - git log
  - git clone
  - git diff
  - git branch
  - git checkout
  - git merge
- Markdown
Author Chen Tong
LastMod 2017-09-13