Predixion Insight supports a multitude of algorithms to help you Predict Everything™. Predixion enables you to simultaneously run multiple algorithms from different libraries, compare their performance and accuracy, and automatically or manually select a ‘champion model’ based on your specific goals.
To support this workflow, Predixion Insight natively integrates the following algorithms for model creation, in-process scoring, and collaboration:
Classification involves assigning a new observation to one of several pre-determined classes by analyzing information about the observation. Applications of classification include:
- Identifying a new email as spam or non-spam based on information about the sender and content.
- Predicting whether a device is likely to fail.
| Algorithm | Description |
| --- | --- |
| Decision Forest | An ensemble learning method that builds a multitude of decision trees, each on a different sample drawn with replacement from the training set, and then uses a voting mechanism to output a class. The implementation is based on the paper authored by Breiman and Cutler. |
| Boosted Trees | An ensemble learning method that successively builds multiple decision trees, reweighting the data after each iteration to increase the weight of misclassified cases. A voting mechanism is used to output a class. The implementation is based on the “Multi-class AdaBoost” paper by Zhu, Rosset, Zou, and Hastie. |
| Principal Component Analysis | A linear transformation technique that re-expresses the data as components ordered by how much of the original variance they explain. It can be used for data analysis on its own or as a pre-processing step before a regression, clustering, or classification algorithm; it is frequently combined with the Decision Forest, Boosted Trees, or Neural Network algorithms to improve accuracy. |
| Logistic Regression with L1 Regularization | Create a logistic regression that classifies two or more outcome states from one or more input attributes, with L1 regularization to improve accuracy. |
| MS/R Classification Rules | Create rules that classify two or more outcome states from one or more input attributes. |
| MS/R Decision Trees | Create a decision tree that classifies two or more outcome states from one or more input attributes. |
| MS/R Logistic Regression | Create a logistic regression that classifies two or more outcome states from one or more input attributes. |
| MS/R Naive Bayes | Create a naïve Bayes classifier that classifies two or more outcome states from one or more input attributes. |
| MS/R Neural Network | Create a single-hidden-layer neural network that classifies two or more outcome states from one or more input attributes. |
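For a concrete feel for the ensemble-plus-PCA pattern described above, here is a minimal sketch using scikit-learn as a stand-in; the library, dataset, and parameters are illustrative assumptions, not Predixion Insight's own implementation or API:

```python
# Illustrative sketch only: scikit-learn stand-ins for the concepts above,
# not Predixion Insight's implementation.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Bundled binary-classification dataset, standing in for any two-state outcome.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA as a pre-processing step in front of a decision forest, mirroring the
# "PCA + Decision Forest" combination mentioned above. The forest trains each
# tree on a bootstrap sample (drawn with replacement) and classifies by vote.
model = make_pipeline(
    PCA(n_components=10),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))
```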
Estimation predicts the value of a parameter based on observed values of other parameters. Examples include:
- Predicting a person’s weight based on their height, using the known values of several other people’s height and weight.
- Estimating lifetime value of a customer based on demographic and behavioral information.
| Algorithm | Description |
| --- | --- |
| MS/R Gaussian Trees | Create a Gaussian regression tree that estimates a continuous outcome from one or more input attributes. |
| MS/R Linear Regression | Create a linear regression that estimates a continuous outcome from one or more continuous input attributes. |
| MS/R Neural Network | Create a single-hidden-layer neural network that estimates a continuous outcome from one or more input attributes. |
| MS Logistic Regression | Create a logistic regression that classifies two or more outcome states from one or more input attributes. |
| MS Regression Trees | Create a regression tree that uses multiple linear regressions to estimate a continuous outcome from one or more input attributes. |
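As a generic illustration of the estimation task (echoing the height-and-weight example above), the sketch below fits an ordinary linear regression on synthetic data with scikit-learn; the data and coefficients are invented for illustration and this is not Predixion's implementation:

```python
# Illustrative sketch only: a minimal linear-regression estimator on synthetic
# height/weight data, not Predixion Insight's implementation.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
height_cm = rng.uniform(150, 200, size=200).reshape(-1, 1)
# Synthetic "true" relationship with noise, purely for demonstration.
weight_kg = 0.9 * height_cm.ravel() - 90 + rng.normal(0, 5, size=200)

model = LinearRegression().fit(height_cm, weight_kg)
print("predicted weight for 180 cm:", model.predict([[180.0]])[0])
```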
Segmentation separates a group into subgroups (segments, or clusters) based on shared characteristics that are not known in advance. Members of one segment are more similar to one another than to members of a different segment. For example:
- Identifying groups of customers with similar purchasing behaviors so that marketing strategies can then be targeted to specific segments.
- Grouping website visitors by browsing behavior for improving user interface design.
- Assisting in fraud detection and prevention through anomaly and outlier detection.
| Algorithm | Description |
| --- | --- |
| MS/R K-means Clustering | Create a k-means segmentation that clusters observations with continuous and nominal inputs. |
| MS/R Probabilistic Clustering | Create a multivariate Gaussian segmentation that clusters observations with continuous and nominal inputs. |
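The sketch below clusters synthetic data with scikit-learn's KMeans and GaussianMixture, used here only as stand-ins for the k-means and multivariate Gaussian segmentation algorithms listed above; the data and cluster counts are illustrative assumptions:

```python
# Illustrative sketch only: k-means and Gaussian-mixture clustering with
# scikit-learn, standing in for the MS/R segmentation algorithms above.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic "customer" data with four latent groups.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=4, random_state=0).fit_predict(X)

print("k-means cluster sizes:", [int((kmeans_labels == k).sum()) for k in range(4)])
print("GMM cluster sizes:", [int((gmm_labels == k).sum()) for k in range(4)])
```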
Association is a data-mining technique that identifies items or events that occur together. For example:
- Analyzing supermarket transaction records and determining that milk and cereal are often purchased together.
- Determining which specific sets of faults frequently occur together during a vehicle failure.
| Algorithm | Description |
| --- | --- |
| MS/R Association Rules | Create an association model that mines frequent itemsets and association rules, where items are grouped into itemsets by a transaction ID. |
| MS/R Associative Trees | Create a decision tree that associates frequent itemsets with association rules, where items are grouped into itemsets by a transaction ID. |
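To make the milk-and-cereal example concrete, here is a sketch that mines frequent itemsets and rules from a handful of invented transactions using the third-party mlxtend library; it demonstrates the general technique, not Predixion's association model:

```python
# Illustrative sketch only: frequent itemsets and association rules with the
# mlxtend library on invented transactions, not Predixion Insight's model.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Each inner list is one transaction (in Predixion's terms, items grouped by a transaction ID).
transactions = [
    ["milk", "cereal", "bread"],
    ["milk", "cereal"],
    ["bread", "butter"],
    ["milk", "cereal", "butter"],
]

te = TransactionEncoder()
onehot = te.fit_transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```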
Forecasting is the technique used to predict future values based on previously observed values. For example:
- Predicting the volume of sales and required warehouse space for the upcoming holiday season based on previous patterns of holiday sales and this year’s consumer spending.
- Predicting ahead of time whether an equipment malfunction will occur based on historical sensor readings.
| Algorithm | Description |
| --- | --- |
| R ARIMA | Create an ARIMA model from a univariate or multivariate time series. |
| MS ARTXP | Create an ARTXP model from a multivariate time series. |
| MS Blended ARIMA and ARTXP | Create a blended ARTXP and ARIMA model from a multivariate time series. |
| MS ARIMA/R Auto ARIMA | Create the best ARIMA model from a univariate time series, selected by comparing AIC, AICc, and BIC. |
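The sketch below fits a hand-specified ARIMA model to a synthetic monthly sales series with statsmodels and forecasts six periods ahead; the series, the (p, d, q) order, and the library are all illustrative assumptions, and an auto-ARIMA search such as the one listed above would instead compare candidate orders by AIC, AICc, and BIC:

```python
# Illustrative sketch only: an ARIMA forecast with statsmodels on synthetic
# monthly sales data, not Predixion Insight's ARIMA/ARTXP implementation.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=48, freq="MS")
trend = np.linspace(100, 160, 48)
sales = pd.Series(trend + rng.normal(0, 5, size=48), index=index)

# The (p, d, q) order is chosen by hand here; an auto-ARIMA search would
# compare many candidate orders using AIC, AICc, and BIC.
model = ARIMA(sales, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))
```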