Predictions

In this section you will learn how to use your model do precitions over new data and data stored on your database. The examples below assume a table named iris as follow:

CREATE TABLE iris (
	id integer generated by default as identity,
	sepal_length float,
	sepal_width float,
	petal_length float,
	petal_width float,
	class VARCHAR(20) -- This column will be filled by pgpyml
);

With your model prepared you can easily make predictions inside your database with pgpyml. You can use the predict function to apply your model over the data.

The predict function is the easiest way to deploy your model, you only need to specificate the path on your disk to the model and pass and array of features. This function can be used to predict a single instance or multiple instances at same time. The features must be passed as a nested array, like you usually do in sklearn.

SELECT * FROM pgpyml.predict(
    '/home/vagrant/examples/iris/models/iris_decision_tree.joblib', 
    '{{5.2,3.5,1.5,0.2}}'
); 
-- Output: {Iris-setosa} (or any other class your model predict)

-- Prediction multiple instances at once
SELECT * FROM pgpyml.predict(
    '/home/vagrant/examples/iris/models/iris_decision_tree.joblib', 
    '{{5.2,3.5,1.5,0.2}, {7.7,2.8,6.7,2.0}}'
);
-- Output: {Iris-setosa,Iris-virginica}

-- You can also use the ARRAY notation
SELECT * FROM pgpyml.predict(
    '/home/vagrant/examples/iris/models/iris_decision_tree.joblib', 
    ARRAY[[5.2,3.5,1.5,0.2], [7.7,2.8,6.7,2.0]]
);
-- Output: {Iris-setosa,Iris-virginica}
        

The predict function expects two arguments, the first argument is the path to your trained model, this path must be reachable by your Postgres server. The second argument is a nested array of instances to be predicted, each element of the array will have an element on the output. The output are an text array with the predictions of your model.

Casts

The predict function gives the output as an array of text, this way you can cast the produced output to the suitable type. You can use the "casted" version of predict to obtain the output casted to int or real.

SELECT * FROM pgpyml.predict_int(
    '/path/to/model.joblib', 
    '{{1.0, 0.0, 1.0}}'
); 
-- Output as INT

SELECT * FROM pgpyml.predict_real(
    '/path/to/model.joblib', 
    '{{1.0, 0.0, 1.0}}'
); 
-- Output as REAL

In-table Prediction

You may want to use your model over data that are already stored on your database. To do that you can use the predict_table_row function.

This function expects as the first argument the model you want to use, the second argument is the name of the table where the data is stored, the third argument is an array with the name of the columns that will be used as features by your model, and finally the forth argument is the id of the row you want to classify:

SELECT * FROM pgpyml.predict_table_row(
    '/home/vagrant/examples/iris/models/iris_decision_tree.joblib', -- The trained model
    'iris', -- Table with the data
    '{"sepal_length", "sepal_width", "petal_length", "petal_width"}', -- The columns used as feature
    1 -- The ID of your data
);
    

Predict on Insertion

You can use pgpyml to predict data on insertion. You can create a trigger to classify the new data and populate a column on the same table.

Use the classification_trigger function to create a trigger and classify the data when a insert or update occurs:

CREATE TRIGGER classify_iris
BEFORE INSERT OR UPDATE ON "iris"
FOR EACH ROW 
EXECUTE PROCEDURE pgpyml.classification_trigger(
    '/home/vagrant/examples/iris/models/iris_decision_tree.joblib', -- The trained model
    'class', -- Column name where the prediction will be saved
    'sepal_length', -- Feature 1
    'sepal_width', -- Feature 2
    'petal_length', -- Feature 3
    'petal_width'-- Feature 4
);

The first argument is the path to the model, the second argument is the name of the column where the prediction will be stored. Any argument after the second one will be used as the name of the column that holds the features values to be used by the model.

After creating the trigger you can insert new data on the table, and the result of the classification will be saved on the column specified in the second argument:

-- Notice that the class is not being inserted, but will be added by the trigger function
INSERT INTO iris (sepal_length, sepal_width, petal_length, petal_width) VALUES (5.2,3.5,1.5,0.2);

-- Check the last inserted row, it will have the column 'class' filled
SELECT * FROM iris WHERE id = (SELECT MAX(id) FROM iris);

Next Steps

You can read more about loading and unloading models in the IO Page.