A trainer learns a function f(x)=y, that is, the weights W, of the following form to predict a label y from a feature vector x: y=f(x)=Wx
Without a bias clause (or regularization), f(x) cannot produce a hyperplane that separates (1,1) from (2,2), because the hyperplane f(x)=0 always passes through the origin (0,0).
With a bias clause b, a trainer learns the following f(x): f(x)=Wx+b. The learned model then accounts for bias present in the dataset, and the learned hyperplane does not have to pass through the origin.
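The argument above can be checked with a short worked example. The weights W=(1,1) and the bias b=-3 in the second part are values chosen only for illustration; they do not come from a trained model.

```latex
% Without a bias: for any W = (w_1, w_2),
\begin{align*}
f\big((1,1)\big) &= w_1 + w_2, \\
f\big((2,2)\big) &= 2w_1 + 2w_2 = 2\, f\big((1,1)\big),
\end{align*}
% so both points always get the same sign and cannot be separated.

% With a bias, e.g. W = (1, 1) and b = -3 (illustrative values):
\begin{align*}
f\big((1,1)\big) &= 1 + 1 - 3 = -1 < 0, \\
f\big((2,2)\big) &= 2 + 2 - 3 = +1 > 0.
\end{align*}
```

The hyperplane x1 + x2 = 3 does not pass through the origin, which is exactly what the bias makes possible.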
Hivemall's add_bias() adds a bias to a feature vector. To enable a bias clause, apply add_bias() to both (important!) the training and the test data, as follows. By default, the bias b is represented as a feature named "0" ("-1" before v0.3). See AddBiasUDF for details.
Note that the bias is expressed as a feature that is found in all training and test examples.
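To make the idea of "bias as a feature" concrete, here is a minimal Python sketch. It is not Hivemall's implementation: `add_bias_sketch` and `predict` are hypothetical helpers, the constant value 1.0 for the bias feature is an assumption, and the weights are made up for illustration. It only shows why the bias feature must be present at both training and prediction time.

```python
from typing import Dict, List

BIAS_FEATURE = "0"  # default bias feature name, per the text above


def add_bias_sketch(features: List[str], bias_value: float = 1.0) -> List[str]:
    """Return a copy of `features` with a constant bias feature appended.

    Assumption: the bias feature carries the constant value 1.0.
    """
    return list(features) + [f"{BIAS_FEATURE}:{bias_value}"]


def predict(features: List[str], weights: Dict[str, float]) -> float:
    """Sum weight * value over 'feature:value' strings (a plain linear model)."""
    total = 0.0
    for item in features:
        name, value = item.rsplit(":", 1)
        total += weights.get(name, 0.0) * float(value)
    return total


# Illustrative weights; the weight of feature "0" plays the role of b.
weights = {"1": 1.0, "2": 1.0, "0": -3.0}

point_a = ["1:1.0", "2:1.0"]  # the point (1,1)
point_b = ["1:2.0", "2:2.0"]  # the point (2,2)

with_bias_a = predict(add_bias_sketch(point_a), weights)
with_bias_b = predict(add_bias_sketch(point_b), weights)
without_bias_a = predict(point_a, weights)  # bias feature missing at test time

print(with_bias_a, with_bias_b, without_bias_a)
```

With the bias feature added, the two points land on opposite sides of zero. If the bias feature is omitted from the test data, the learned weight for feature "0" is never applied, so the prediction for (1,1) shifts by the full bias.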
Adding a bias clause to test data
create table e2006tfidf_test_exploded as
select
  rowid,
  target,
  split(feature,":")[0] as feature,
  cast(split(feature,":")[1] as float) as value
  -- extract_feature(feature) as feature, -- hivemall v0.3.1 or later
  -- extract_weight(feature) as value -- hivemall v0.3.1 or later
from
  e2006tfidf_test LATERAL VIEW explode(add_bias(features)) t AS feature;
Adding a bias clause to training data
create table e2006tfidf_pa1a_model as
select
  feature,
  avg(weight) as weight
from
  (select
     pa1a_regress(add_bias(features),target) as (feature,weight)
   from
     e2006tfidf_train_x3
  ) t
group by feature;