    Test

    > (Statistics|Probability|Machine Learning|Data Mining|Data and Knowledge Discovery|Pattern Recognition|Data Science|Data Analysis)

    1 - List

    • Statistics - Hypothesis (Tests|Testing)
    • How to unit test machine learning code
    data_mining/test.txt · Last modified: 2017/11/01 14:22 by gerardnico

    (Statistics|Probability|Machine Learning|Data Mining|Data and Knowledge Discovery|Pattern Recognition|Data Science|Data Analysis) 302 pages
  • The 1 Percent Rule
  • (Absolute|True) Zero
  • (Parameters|Model) (Accuracy|Precision|Fit|Performance) Metrics
  • Adjusted R^2
  • Akaike information criterion (AIC)
  • Algorithms
  • (Anomaly|outlier) Detection
  • Apriori algorithm
  • Association (Rules Function|Model) - Market Basket Analysis
  • Attribute (Importance|Selection) - Affinity Analysis
  • Area under the curve (AUC)
  • Automatic Discovery
  • Bootstrap aggregating (bagging)
  • (Base rate fallacy|Bonferroni's principle)
  • (Baseline|Naive) classification (Zero R)
  • Bayes’ Theorem (Probability)
  • Bayesian
  • Benford's law (frequency distribution of digits)
  • Best Subset Selection Regression
  • Bias (Sampling error)
  • Bias-variance trade-off (between overfitting and underfitting)
  • Bayesian Information Criterion (BIC)
  • R (Big R)
  • Bimodal Distribution
  • Binary logistic regression
  • Mathematics - (Combination|Binomial coefficient|n choose k)
  • (Probability|Statistics) - Binomial Distribution
  • Data Mining, Book
  • (Boosting|Gradient Boosting|Boosting trees)
  • Decision boundary Visualization
  • (C4.5|J48) algorithm
  • (Case-control|retrospective) sampling
  • Causation - Causality (Cause and Effect) Relationship
  • Cumulative Distribution Function (CDF)
  • Centering Continuous Predictors
  • Central limit theorem (CLT)
  • Centroid (center of gravity)
  • Chance
  • Customer - Churn Analysis (Customer retention)
  • (Class|Category|Label) Target
  • (Classifier|Classification Function)
  • Clustering (Function|Model)
  • (Prediction|Recommender System) - Collaborative filtering
  • Competitions (Kaggle and others)
  • Statistics - (Confidence|likelihood) (Prediction probabilities|Probability classification)
  • Confidence Interval
  • Confounding (factor|variable) - (Confound|Confounder)
  • Confusion Matrix
  • Content Analysis and Acquisition
  • Continuous Variable
  • Convex
  • Correlation (Coefficient analysis)
  • Cosine Similarity (Measure of Angle)
  • Covariance
  • Mallows's Cp
  • Cross Product (of X and Y) (CP|SP)
  • (Statistics|Data Mining) - (K-Fold) Cross-validation (rotation estimation)
  • (Periodicity|Periodic phenomena|Cycle)
  • (Data|Knowledge) Discovery - Statistical Learning
  • Data Point
  • Data (Preparation | Wrangling | Munging)
  • Data Product
  • Data - Science
  • Data Scientist
  • Decision Tree (DT) Algorithm
  • Decision Stump
  • Deep Learning (Network)
  • (Degree|Level) of confidence
  • Degree of freedom (df)
  • (dependent|paired sample) t-test
  • Math - Derivative (Sensitivity to Change, Differentiation)
  • Design Matrix (X)
  • Deviance
  • Deviation Score (for one observation)
  • Dimensionality (number of variables, parameters) (P)
  • (Dimension|Feature) (Reduction)
  • (Data|Text) Mining - Word-sense disambiguation (WSD)
  • (Discretizing|binning) (bin)
  • Discriminant analysis
  • Quadratic discriminant analysis (QDA)
  • (Discriminative|conditional) models
  • Distance
  • (Probability|Sampling) Distribution
  • Dummy (Coding|Variable) - One-hot-encoding (OHE)
  • Effects (between predictor variables)
  • Effect Size
  • Elastic Net Model
  • Ensemble Learning (meta set)
  • Entropy (Information Gain)
  • Prediction Error (Training versus Test)
  • (Error|misclassification) Rate - false (positives|negatives)
  • (Estimation|Approximation)
  • (Estimator|Point Estimate) - Predicted (Score|Target|Outcome|...)
  • Data analysis - Explanatory
  • Exponential Distribution
  • F-distributions
  • (F-Statistic|F-test|F-ratio)
  • Face Recognition
  • (Factor Variable|Qualitative Predictor)
  • Factor Analysis
  • Factorial Anova
  • Feature Engineering
  • (Feature|Attribute) Extraction Function
  • Feature Hashing
  • (Attribute|Feature) (Selection|Importance)
  • Fraud Detection
  • (Frequency|Rate)
  • (Frequent itemsets|co-occurring items)
  • Frequentist
  • Data Model - Fudge factor
  • Fuzzy Logic (Partial Truth)
  • Generalized additive model (GAM)
  • Gaussian processes (modelling probability distributions over functions)
  • Generalized Boosted Regression Models
  • Generative Model
  • Getting Started
  • Generalized Linear Models (GLM) - Extensions of the Linear Model
  • (Stochastic) Gradient descent (SGD)
  • User Group
  • Grouping
  • Head
  • Hierarchical Clustering
  • Hierarchy
  • High Dimension (Curse of Dimensionality)
  • Data Science - History
  • Homoscedasticity
  • Hypothesis (Tests|Testing)
  • ID3 Algorithm
  • Intrusion detection systems (IDS)
  • Image classification
  • Independent t-test
  • Statistical - Inference
  • Information Gain
  • Information Retrieval
  • (Interaction|Synergy) effect
  • Intercept - Regression (coefficient|constant) <math>B_0</math>
  • Model Interpretation
  • (Interval|Delta) (Measurement)
  • Java API for data mining (JDM)
  • K-Means Clustering algorithm
  • Kernel
  • K-Nearest Neighbors (KNN) algorithm - Instance based learning
  • Knots (Cut points)
  • Kurtosis (Distribution Tail extremity)
  • Statistical Learning - Lasso
  • Standard Least Squares Fit (Gaussian linear model)
  • Leptokurtic distribution
  • (Level|Label)
  • (Lying|Lie)
  • (Life cycle|Project|Data Pipeline)
  • Lift Chart
  • Statistical Learning - Simple Linear Discriminant Analysis (LDA)
  • Fisher (Multiple Linear Discriminant Analysis|multivariate Gaussian)
  • Linear (Regression|Model)
  • (Linear spline|Piecewise linear function)
  • Little r - (Pearson product-moment Correlation coefficient)
  • Global vs Local
  • LOcal (Weighted) regrESSion (LOESS|LOWESS)
  • Log-likelihood function (cross-entropy)
  • Logistic regression (Classification Algorithm)
  • (Logit|Logistic) (Function|Transformation)
  • Loss functions (Incorrect predictions penalty)
  • Data Science - (Kalman Filtering|Linear quadratic estimation (LQE))
  • Machine Learning
  • Main Effect
  • Probability mass function (PMF)
  • Function - Maximum (Max)
  • Maximum Entropy Algorithm
  • Maximum likelihood
  • Measure
  • (Scales of measurement|Type of variables)
  • (Missing Value|Not Available)
  • Model Size (d)
  • Model vs Expert
  • Moderator Variable (Z) - Moderation
  • Monte Carlo (method|experiment) (stochastic process simulations)
  • (Average|Mean) Squared (MS) prediction error (MSE)
  • Multivariate logistic regression
  • Multi-class (classification|problem)
  • (Multiclass Logistic|multinomial) Regression
  • Multidimensional scaling (similarity of individual cases in a dataset)
  • Multiple Linear Regression
  • Naive Bayes (NB)
  • (Probabilistic?) Neural Network (PNN)
  • (No Predictor|Mean|Null) Model
  • Noise (Unwanted variation)
  • Non-linear (effect|function|model)
  • Non-Negative Matrix Factorization (NMF) Algorithm
  • Multi-response linear regression (Linear Decision trees)
  • (Normal|Gaussian) Distribution - Bell Curve
  • Orthogonal Partitioning Clustering (O-Cluster or OC) algorithm
  • Odds (Ratio)
  • (One|Simple) Rule - (One Level Decision Tree)
  • Outliers Cases
  • (Overfitting|Overtraining|Robust|Generalization) (Underfitting)
  • Data Science - Over-generalization
  • (Paretian|Power law) distribution
  • Pareto ( Principle | Distribution )
  • Pattern
  • Principal Component (Analysis|Regression) (PCA)
  • (Probability) Density Function (PDF)
  • Mathematics - Permutation (Ordered Combination)
  • Piecewise polynomials
  • Partial least squares (PLS)
  • Predictive Model Markup Language (PMML)
  • Poisson (Process|distribution)
  • (Global) Polynomial Regression (Degree)
  • Population Parameter
  • Post-hoc test
  • Power of a test
  • (Prediction|Guess)
  • (Machine|Statistical) Learning - (Predictor|Feature|Regressor|Characteristic) - (Independent|Explanatory) Variable (X)
  • Privacy (Anonymization)
  • Probability
  • Probit Regression (probability on binary problem)
  • Problem
  • Process control (SPC)
  • Pruning (a decision tree, decision rules)
  • R-squared (<math>R^2</math>|Coefficient of determination) for Model Accuracy
  • Random forest
  • Random Variable
  • Range
  • Rare Event
  • (Fraction|Ratio|Percentage|Share) (Variable|Measurement)
  • Raw score
  • Regression
  • (Regression Coefficient|Weight|Slope) (B)
  • Assumptions underlying correlation and regression analysis (Never trust summary statistics alone)
  • (Machine learning|Inverse problems) - Regularization
  • Reinforcement learning
  • Sampling - Sampling (With|without) replacement (WR|WOR)
  • Research
  • (Residual|Error Term|Prediction error|Deviation) (e|<math>\epsilon</math>)
  • Resistant
  • Result Considerations
  • Ridge regression
  • Root Mean Square (RMS)
  • Root mean squared (Error|Deviation) (RMSE|RMSD)
  • ROC Plot and Area under the curve (AUC)
  • Rote Classifier
  • Residual sum of Squares (RSS) = Squared loss ?
  • (Decision) Rule
  • Sampling
  • Sampling Distribution
  • Sampling Error
  • Scale
  • Scoring (Applying)
  • (Random) Seed
  • (Shrinkage|Regularization) of Regression Coefficients
  • Signal (Wanted Variation)
  • Significance level
  • (Significance | Significant) Effect
  • Similarity
  • Simple Effect
  • (Univariate|Simple) Logistic regression
  • (Univariate|Simple|Basic) Linear Regression
  • Skew (-ed Distribution|Variable)
  • ( Spread | Variability ) of a sample
  • Stacking
  • Standard Deviation (SD|s|<math>\sigma</math>|RMS width)
  • Standard Error (SE)
  • (Normalize|Standardize)
  • Statistic
  • Forward and Backward Stepwise (Selection|Regression)
  • (Stochastic|random) process
  • (Supervised|Directed) Learning ("Training") (Problem)
  • Support Vector Machines (SVM) algorithm
  • Singular Value Decomposition (SVD)
  • (Student's) t-test (Mean Comparison)
  • T-distributions
  • Tail
  • (Machine|Statistical) Learning - (Target|Learned|Outcome|Dependent|Response) (Attribute|Variable) (Y|DV)
  • Test
  • (Test|Expected|Generalization) Error
  • Test Set
  • (Threshold|Cut-off) of binary classification
  • Titanic Data Set
  • Training Error
  • Training (Data|Set)
  • Nested (Transactional|Historical) Data
  • Transform
  • Treatments (Combination of factor levels)
  • True score (Classical test theory)
  • (True Function|Truth)
  • (Total) Sum of the square (TSS|SS)
  • Tuning Parameter
  • (two class|binary) classification problem
  • Statistical Learning - Two-fold validation
  • Data - Uncertainty
  • Uniform Distribution (platykurtic)
  • Unsupervised Learning ("Mining")
  • Resampling through Random Percentage Split
  • Validity (Valid Measures)
  • (Variance|Dispersion|Mean Square) (MS)
  • Variation (Change?)
  • Probability and Visualization
  • Statistics vs (Machine Learning|Data Mining)
  • Random Walk
  • (Golf|Weather) Data Set
  • Z Scale
  • Z Score (Zero Mean) or Standard Score
    Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International