Home
TABLE OF CONTENTS
1.1.
Introduction
1.2.
Getting Started
1.2.1.
Installation
1.2.2.
Install as permanent functions
1.2.3.
Input Format
1.3.
List of Functions
1.4.
Tips for Effective Hivemall
1.4.1.
Explicit add_bias() for better prediction
1.4.2.
Use rand_amplify() for better prediction results
1.4.3.
Real-time prediction on RDBMS
1.4.4.
Ensemble learning for stable prediction
1.4.5.
Mixing models for better prediction convergence (MIX server)
1.4.6.
Run Hivemall on Amazon Elastic MapReduce
1.5.
General Hive/Hadoop Tips
1.5.1.
Adding rowid for each row
1.5.2.
Hadoop tuning for Hivemall
1.6.
Troubleshooting
1.6.1.
OutOfMemoryError in training
1.6.2.
SemanticException generate map join task error: Cannot serialize object
1.6.3.
Asterisk argument for UDTF does not work
1.6.4.
The number of mappers is less than input splits in Hadoop 2.x
1.6.5.
Map-side join causes ClassCastException on Tez
Part II - Generic Features
2.1.
List of Generic Hivemall Functions
2.2.
Efficient Top-K Query Processing
2.3.
Text Tokenizer
2.4.
Approximate Aggregate Functions
Part III - Feature Engineering
3.1.
Feature Scaling
3.2.
Feature Hashing
3.3.
Feature Selection
3.4.
Feature Binning
3.5.
Feature Pairing
3.5.1.
Polynomial features
3.6.
Feature Transformation
3.6.1.
Feature vectorization
3.6.2.
Quantify non-number features
3.6.3.
Binarize label
3.6.4.
One-hot encoding
3.7.
Term Vector Model
3.7.1.
TF-IDF Term Weighting
3.7.2.
Okapi BM25 Term Weighting
Part IV - Evaluation
4.1.
Binary Classification Metrics
4.1.1.
Area under the ROC curve
4.2.
Multi-label Classification Metrics
4.3.
Regression Metrics
4.4.
Ranking Measures
4.5.
Data Generation
4.5.1.
Logistic Regression data generation
Part V - Supervised Learning
5.1.
How Prediction Works
5.2.
Step-by-Step Tutorial on Supervised Learning
Part VI - Binary Classification
6.1.
Binary Classification
6.2.
a9a Tutorial
6.2.1.
Data Preparation
6.2.2.
General Binary Classifier
6.2.3.
Logistic Regression
6.2.4.
Mini-batch Gradient Descent
6.3.
News20 Tutorial
6.3.1.
Data Preparation
6.3.2.
Perceptron, Passive Aggressive
6.3.3.
CW, AROW, SCW
6.3.4.
General Binary Classifier
6.3.5.
Bagging classifiers
6.3.6.
AdaGradRDA, AdaGrad, AdaDelta
6.3.7.
Random Forest
6.3.8.
XGBoost
6.4.
KDD2010a Tutorial
6.4.1.
Data Preparation
6.4.2.
PA, CW, AROW, SCW
6.5.
KDD2010b Tutorial
6.5.1.
Data Preparation
6.5.2.
AROW
6.6.
Webspam Tutorial
6.6.1.
Data Preparation
6.6.2.
PA1, AROW, SCW
6.7.
Kaggle Titanic Tutorial
6.8.
Criteo Tutorial
6.8.1.
Data Preparation
6.8.2.
Field-Aware Factorization Machines
Part VII - Multiclass Classification
7.1.
News20 Multiclass Tutorial
7.1.1.
Data Preparation
7.1.2.
Data Preparation for one-vs-the-rest classifiers
7.1.3.
PA
7.1.4.
CW, AROW, SCW
7.1.5.
XGBoost
7.1.6.
Ensemble learning
7.1.7.
one-vs-the-rest Classifier
7.2.
Iris Tutorial
7.2.1.
Data Preparation
7.2.2.
SCW
7.2.3.
Random Forest
7.2.4.
XGBoost
Part VIII - Regression
8.1.
Regression
8.2.
E2006-tfidf Regression Tutorial
8.2.1.
Data Preparation
8.2.2.
General Regressor
8.2.3.
Passive Aggressive, AROW
8.2.4.
XGBoost
8.3.
KDDCup 2012 Track 2 CTR Prediction Tutorial
8.3.1.
Data Preparation
8.3.2.
Logistic Regression, Passive Aggressive
8.3.3.
Logistic Regression with amplifier
8.3.4.
AdaGrad, AdaDelta
Part IX - Recommendation
9.1.
Collaborative Filtering
9.1.1.
Item-based Collaborative Filtering
9.2.
News20 Related Article Recommendation Tutorial
9.2.1.
Data Preparation
9.2.2.
LSH/MinHash and Jaccard Similarity
9.2.3.
LSH/MinHash and Brute-force Search
9.2.4.
kNN search using b-Bits MinHash
9.3.
MovieLens Movie Recommendation Tutorial
9.3.1.
Data Preparation
9.3.2.
Item-based Collaborative Filtering
9.3.3.
Matrix Factorization
9.3.4.
Factorization Machine
9.3.5.
SLIM for fast top-k Recommendation
9.3.6.
10-fold Cross Validation (Matrix Factorization)
Part X - Anomaly Detection
10.1.
Outlier Detection using Local Outlier Factor (LOF)
10.2.
Change-Point Detection using Singular Spectrum Transformation (SST)
10.3.
ChangeFinder: Detecting Outlier and Change-Point Simultaneously
Part XI - Clustering
11.1.
Latent Dirichlet Allocation
11.2.
Probabilistic Latent Semantic Analysis
Part XII - GeoSpatial Functions
12.1.
Lat/Lon functions
Part XIII - Hivemall on SparkSQL
13.1.
Getting Started
13.1.1.
Installation
13.2.
Binary Classification
13.2.1.
a9a Tutorial for SQL
13.3.
Regression
13.3.1.
E2006-tfidf Regression Tutorial for SQL
Part XIV - Hivemall on Docker
14.1.
Getting Started
Part XV - External References
15.1.
Hivemall on Apache Pig
Getting Started
Summary