NeuronLabs

Model Deployment & REST APIs

Difficulty: M.Tech
Read Time: ~15 min

The MLOps Lifecycle

Building a model is only a small part of the work, often put at around 10%. The rest is deploying, serving, monitoring, and maintaining that model in production.

Serving Models with FastAPI

FastAPI is a modern, high-performance web framework for building APIs in Python, based on standard Python type hints.

Building a Prediction API

First, you must serialize (save) your trained model using a library like joblib or pickle.

python
import joblib

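# clf is assumed to be an already-trained estimator (e.g. a fitted scikit-learn classifier)
# Save the model to disk
joblib.dump(clf, 'model.joblib')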
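
The save-then-load round trip can be sketched with the standard-library pickle module, which the text names as an alternative to joblib. This is a minimal illustration, not a real training pipeline: `model_params`, `predict`, and the file name `model.pkl` are hypothetical stand-ins for a trained model and its artifact.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: learned parameters of a linear classifier
model_params = {"weights": [0.5, -0.25], "bias": 0.1}


def predict(params, rows):
    """Return 1 for each row whose weighted sum plus bias is positive, else 0."""
    return [
        int(sum(w * x for w, x in zip(params["weights"], row)) + params["bias"] > 0)
        for row in rows
    ]


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"

    # Save: the training side writes the model to disk
    with open(model_path, "wb") as f:
        pickle.dump(model_params, f)

    # Load: the serving side reads it back, typically once at startup
    with open(model_path, "rb") as f:
        restored_params = pickle.load(f)

# The restored model should give the same predictions as the original
sample = [[1.0, 1.0], [0.0, 2.0]]
round_trip_ok = predict(restored_params, sample) == predict(model_params, sample)
```

Unpickling can execute arbitrary code, so only load model files that come from a source you trust. The same caution applies to joblib files.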

Then, load the model inside a FastAPI application to serve predictions over HTTP.

python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()

# Load the model globally so it's loaded only once on startup
model = joblib.load('model.joblib')

class PredictionRequest(BaseModel):
    # list[float] requires Python 3.9+; on older versions use typing.List[float]
    features: list[float]

@app.post("/predict")
def predict(request: PredictionRequest):
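    # Convert the flat feature list to a 2D array of shape (1, n_features)
    input_data = np.array(request.features).reshape(1, -1)

    # Generate the prediction; predict() returns one value per input row
    prediction = model.predict(input_data)

    # Cast to a built-in int (assumes integer class labels) so the value is JSON-serializable
    return {"prediction": int(prediction[0])}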

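To test this API, send a POST request to /predict with a JSON payload of the form {"features": [...]}, where the list holds the input values in the order the model was trained on. The response is a JSON object with a single prediction field. A frontend communicates with your machine learning model in the same way.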
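
A client for the endpoint above might look like the following sketch, which uses only the standard library. The base URL `http://127.0.0.1:8000`, the helper names `build_predict_request` and `send_predict_request`, and the four example feature values are assumptions for illustration; adjust them to wherever the app is actually served and to the features your model expects.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a POST request for the /predict endpoint."""
    # The body must match the PredictionRequest schema: {"features": [...]}
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_predict_request(features, base_url="http://127.0.0.1:8000"):
    """Send the request and return the decoded JSON response; needs the API to be running."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical feature vector; building the request does not touch the network
req = build_predict_request([5.1, 3.5, 1.4, 0.2])
```

With the server running, calling `send_predict_request([5.1, 3.5, 1.4, 0.2])` would return the parsed response dictionary. Keeping request construction separate from sending makes the payload easy to inspect without a live server.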