Using the E2006 tfidf regression example, we explain how to evaluate the prediction model on Hive.
Scoring by evaluation metrics
select avg(actual), avg(predicted) from e2006tfidf_pa2a_submit;
-3.8200363760415414 -3.9124877451612488
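-- mean of actual, taken from the query result above; referenced only by the commented-out manual R2 formula
set hivevar:mean_actual=-3.8200363760415414;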
select
-- Root Mean Squared Error
rmse(predicted, actual) as RMSE,
-- sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE,
-- Mean Squared Error
mse(predicted, actual) as MSE,
-- sum(pow(predicted - actual,2.0))/count(1) as MSE,
-- Mean Absolute Error
mae(predicted, actual) as MAE,
-- sum(abs(predicted - actual))/count(1) as MAE,
-- Coefficient of Determination (R^2)
r2(predicted, actual) as R2
-- 1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - ${mean_actual},2.0)) as R2
from
e2006tfidf_pa2a_submit;
0.38538660838804495 0.14852283792484033 0.2466732002711477 0.48623913673053565
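For reference, the commented-out expressions in the query above correspond to the definitions sketched below. The symbols are introduced here only for illustration and do not appear in the original query: $y_i$ is `actual`, $\hat{y}_i$ is `predicted`, $n$ is the row count, and $\bar{y}$ is the mean of `actual` (the `mean_actual` variable).

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \\
R^2           &= 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\end{aligned}
```

As a quick sanity check on the result row, the second column should equal the square of the first: $0.38538660838804495^2 \approx 0.1485228$, which agrees with the reported MSE.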
Logarithmic Loss
Logarithmic Loss for binary predictions can be computed either from its definition or with the built-in logloss function:
WITH t as (
select
0 as actual,
0.01 as predicted
union all
select
1 as actual,
0.02 as predicted
)
select
-SUM(actual*LN(predicted)+(1-actual)*LN(1-predicted))/count(1) as logloss1,
logloss(predicted, actual) as logloss2 -- supported since Hivemall v0.4.2-rc.1
from
t;
1.9610366706408238 1.9610366706408238
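The result can be checked by hand from the two rows of `t`. The following is only a worked sketch of the arithmetic behind the `logloss1` expression, with intermediate values rounded to seven decimal places. The first row (`actual = 0`, `predicted = 0.01`) contributes $\ln(1 - 0.01)$ and the second row (`actual = 1`, `predicted = 0.02`) contributes $\ln(0.02)$:

```latex
\begin{aligned}
\mathrm{logloss} &= -\frac{1}{2}\Bigl[\ln(1 - 0.01) + \ln(0.02)\Bigr] \\
                 &\approx \frac{0.0100503 + 3.9120230}{2} \\
                 &\approx 1.9610367
\end{aligned}
```

The second row dominates the loss because it assigns a probability of only 0.02 to an example whose actual label is 1.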