feature engineering

Onehot encoding: one-of-k encoding on an array of length k
hash encoding
label encoding: suits non-linear, tree-based models; does not increase dimensionality
count encoding
target encoding: encode categorical variables by their ratio of target
reducing dimensionality: https://medium.com/towards-data-science/reducing-dimensionality-from-dimensionality-reduction-techniques-f658aec24dfe
PCA, t-SNE, Auto Encoders
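A minimal sketch of four of the encodings listed above, using pandas. The toy DataFrame and its column names (`color`, `target`) are made up for illustration and are not from any dataset in these notes.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a binary target.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "target": [1, 0, 1, 0, 1, 0],
})

# One-hot (one-of-k) encoding: k indicator columns for k categories.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Label encoding: one integer ID per category, no extra columns.
label = df["color"].astype("category").cat.codes

# Count encoding: replace each category with how often it occurs.
count = df["color"].map(df["color"].value_counts())

# Target encoding: replace each category with the mean target of that category.
target_mean = df["color"].map(df.groupby("color")["target"].mean())

encoded = df.assign(label=label, count=count, target_mean=target_mean)
print(one_hot)
print(encoded)
```

Note that the target encoding here is computed on the full data, which leaks the target into the feature; in practice it is usually computed on training folds only.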

interview ds

Probability/Statistics Questions
1. conditional probability: how do you determine independence?
2. Bayes' rule problem: base rate fallacy
3. sampling: bootstrap, reservoir
4. A/B test, p value
5. Poisson, Binomial, and Exponential distributions
6. queuing question: The most efficient way to queue is single line, multiple servers. It is efficient because: You don’t have servers
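A small sketch for two of the topics above: the base rate fallacy worked through Bayes' rule, and reservoir sampling. The test numbers (1% base rate, 99% sensitivity, 5% false positive rate) are assumptions chosen for illustration, and the function names are hypothetical. The reservoir sampler is the classic Algorithm R, which is one common way to do it.

```python
import random


def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def reservoir_sample(stream, k, rng=random):
    """Uniform sample of k items from an iterable of unknown length (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Assumed numbers: rare condition, accurate-looking test.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")

# Seeded generator so the run is repeatable.
sample = reservoir_sample(range(1000), k=5, rng=random.Random(0))
print(sample)
```

With these assumed numbers the posterior is 1/6, far below the 99% sensitivity, because the 1% base rate dominates. That gap is the base rate fallacy.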

python pandas

aggregate in pandas https://pythonforbiologists.com/when-to-use-aggregatefiltertransform-in-pandas/
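A quick sketch of the difference between aggregate, transform, and filter on a groupby. The DataFrame and its column names (`group`, `value`) are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 3, 10, 20, 30],
})
grouped = df.groupby("group")["value"]

# aggregate: one row per group.
agg = grouped.agg(["mean", "sum"])

# transform: same length as the input; each row gets its group's statistic.
group_mean = grouped.transform("mean")

# filter: keep or drop whole groups based on a group-level condition.
big = df.groupby("group").filter(lambda g: g["value"].sum() > 10)

print(agg)
print(df.assign(group_mean=group_mean))
print(big)
```

Rule of thumb: use aggregate for a per-group summary, transform to add a per-group statistic as a new column aligned with the original rows, and filter to select rows by a property of their group.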