How to use my model?
We propose a simple decision tree to help you choose how to integrate a machine learning model in a Marcelle application.
The tree walks you through a few questions:
- Do you have a pre-existing model?
- What do you want to do?
- What framework do you use?
- What language do you want to use?
Using PyTorch Models
There are three possible solutions to use a PyTorch model in a Marcelle application.
Solution 1: server-side inference with Ray (recommended for most cases)
In most cases, the simplest solution is to use a Python web framework or a serving library to expose the model at an HTTP endpoint. Several possibilities exist, including Torch-specific libraries such as TorchServe or generic web frameworks such as Starlette.
We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.
Pros:
- High compatibility: it is possible to run any Python code, with any ML framework
- Scalability: Ray facilitates scaling and the use of various architectures
- Independent of the client's capabilities
Cons:
- Requires setting up and managing an HTTP server
- Requires sending client data to the server
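As an illustration, here is a minimal sketch of how such an endpoint could be queried from the web client. The endpoint URL, the `/predict` route, and the payload shape are assumptions to adapt to your own Ray Serve deployment; the call can then be wrapped in a custom Marcelle component.

```ts
// Minimal sketch: querying a Ray Serve HTTP endpoint from the browser.
// The URL, the '/predict' route and the payload/response shapes are
// hypothetical and depend on how your Ray Serve deployment is configured.
export interface Prediction {
  label: string;
  confidence: number;
}

export async function predict(input: number[]): Promise<Prediction> {
  const response = await fetch('http://localhost:8000/predict', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ input }),
  });
  if (!response.ok) {
    throw new Error(`Inference request failed with status ${response.status}`);
  }
  return (await response.json()) as Prediction;
}

// Usage, e.g. inside a custom Marcelle component:
// const result = await predict([0.2, 0.7, 0.1]);
```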
Solution 2: client-side inference with ONNX (recommended for small models)
In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.
For these scenarios, your PyTorch model can be converted to the ONNX format (Open Neural Network Exchange), so that inference is performed in the web browser using onnxruntime.
Pros:
- Privacy: no user data needs to be sent to the server for inference
- Low latency: once the model is loaded, predictions do not depend on the internet connection, which is useful for high-throughput applications
- Easy deployment: no need to manage an inference server
Cons:
- Limited compatibility: ONNX Runtime Web remains experimental and not all operators are supported, which can prevent some models from being converted or executed in the browser
- Dependence on the user's device can limit performance
- Not appropriate for large models, both because of download size and in-browser performance
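For reference, the sketch below shows what browser-side inference with onnxruntime-web can look like, assuming the model has already been exported to ONNX and is served as a static asset. The model path, input name, and tensor shape are placeholders for your own model.

```ts
import * as ort from 'onnxruntime-web';

// Minimal sketch of client-side inference with ONNX Runtime Web.
// '/models/model.onnx', the input name 'input' and the [1, n] shape are
// placeholders: use the names and shapes of your exported model.
export async function runOnnxInference(values: number[]): Promise<Float32Array> {
  // Load the exported model, served as a static file.
  const session = await ort.InferenceSession.create('/models/model.onnx');

  // Wrap the input data in an ONNX tensor with the expected shape.
  const input = new ort.Tensor('float32', Float32Array.from(values), [1, values.length]);

  // Run inference; the keys of the feeds object must match the model's input names.
  const outputs = await session.run({ input });

  // Return the data of the first output tensor.
  const outputName = session.outputNames[0];
  return outputs[outputName].data as Float32Array;
}
```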
Solution 3: server-side inference through the backend (recommended for long inference)
This solution is recommended when inference is long-running (e.g. for generative tasks) and benefits from real-time monitoring. In that case, a data service on the Marcelle backend can be used to manage and monitor inference runs: predictions are requested by the web client, and the status of each run can be updated from Python over a websocket connection, enabling real-time monitoring in the web client.
Pros:
- Real-time monitoring: the Python code can update the run's status during inference
- Python scripts are websocket clients, so the machine running the Python code does not need to run a server or expose a public endpoint
Cons:
- Complex for simple use cases
- Experimental: stability is not ensured
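On the client side, the pattern could look roughly like the sketch below: the web client creates an inference run through a data service and listens for real-time updates while the Python script, connected over websockets, patches the run's status. The 'inference-runs' service name, the record fields, and the exact data-store calls are assumptions; refer to the Marcelle backend documentation for the actual interface.

```ts
import { dataStore } from '@marcellejs/core';

// Rough sketch only. Assumptions: the 'inference-runs' service name, the
// record fields (status, output), and a Feathers-style service API
// (create/patch + real-time events) exposed by the Marcelle backend.
const store = dataStore('http://localhost:3030');
const runs = store.service('inference-runs');

export async function requestInference(input: unknown): Promise<void> {
  // Create a run; the Python worker, connected as a websocket client,
  // is expected to pick it up and process it.
  const run = await runs.create({ input, status: 'pending' });

  // React to real-time updates pushed over the websocket connection as the
  // Python side patches the run's status and, eventually, its output.
  runs.on('patched', (updated: { id: string; status: string; output?: unknown }) => {
    if (updated.id === run.id) {
      console.log(`Run ${updated.id}: ${updated.status}`, updated.output);
    }
  });
}
```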
Using TensorFlow or Keras Models
There are three possible solutions to use a TensorFlow or Keras model in a Marcelle application.
TODO
Write docs
Solution 1: client-side inference with TensorFlow.js
In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.
For these scenarios, your TensorFlow model can be converted to a web-friendly format to run inference in the web browser using the TensorFlow.js library.
Pros:
- Privacy: no user data needs to be sent to the server for inference
- Low latency: once the model is loaded, predictions do not depend on the internet connection
- Easy deployment: no need to manage an inference server
Cons:
- Dependence on the user's device can limit performance
- Not appropriate for large models, both because of download size and in-browser performance
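As a rough illustration, the sketch below loads a converted Keras model with TensorFlow.js and runs a prediction in the browser. The model URL and input shape are placeholders; a model converted from a SavedModel would use tf.loadGraphModel instead.

```ts
import * as tf from '@tensorflow/tfjs';

// Minimal sketch of client-side inference with TensorFlow.js.
// '/models/model.json' and the [1, n] input shape are placeholders for the
// output of the tensorflowjs_converter step and your model's actual inputs.
export async function runTfjsInference(values: number[]): Promise<number[]> {
  // Load the converted model (a model.json file plus weight shards).
  const model = await tf.loadLayersModel('/models/model.json');

  // Build an input tensor with the shape the model expects.
  const input = tf.tensor2d([values], [1, values.length]);

  // Run the prediction and read the result back into a plain array.
  const output = model.predict(input) as tf.Tensor;
  const data = Array.from(await output.data());

  // Release the memory held by the tensors.
  input.dispose();
  output.dispose();
  return data;
}
```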
Solution 2: server-side inference with Ray
Another solution is to use a Python web framework or a serving library to expose the model at an HTTP endpoint. Several possibilities exist, including TensorFlow-specific libraries such as TFX's TensorFlow Serving or generic web frameworks such as Starlette.
We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.
Pros:
- High compatibility: it is possible to run any Python code, with any ML framework
- Scalability: Ray facilitates scaling and the use of various architectures
- Independent of the client's capabilities
Cons:
- Requires setting up and managing an HTTP server
- Requires sending client data to the server
Solution 3: server-side inference through the backend (recommended for long inference)
This solution is recommended when inference is long-running (e.g. for generative tasks) and benefits from real-time monitoring. In that case, a data service on the Marcelle backend can be used to manage and monitor inference runs: predictions are requested by the web client, and the status of each run can be updated from Python over a websocket connection, enabling real-time monitoring in the web client.
Pros:
- Real-time monitoring: the Python code can update the run's status during inference
- Python scripts are websocket clients, so the machine running the Python code does not need to run a server or expose a public endpoint
Cons:
- Complex for simple use cases
- Experimental: stability is not ensured
Using Scikit-Learn Models
There are two possible solutions to use a Scikit-Learn model in a Marcelle application.
Solution 1: server-side inference with Ray (recommended for most cases)
In most cases, the simplest solution is to use a Python web framework or a serving library to expose the model at an HTTP endpoint. Several possibilities exist, including generic web frameworks such as Starlette.
We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.
Pros:
- High compatibility: it is possible to run any Python code, with any ML framework
- Scalability: Ray facilitates scaling and the use of various architectures
- Independent of the client's capabilities
Cons:
- Requires setting up and managing an HTTP server
- Requires sending client data to the server
Solution 2: client-side inference with ONNX (recommended for small models)
In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.
For these scenarios, your Scikit-Learn model can be converted to the ONNX format (Open Neural Network Exchange), so that inference is performed in the web browser using onnxruntime.
Pros:
- Privacy: no user data needs to be sent to the server for inference
- Low latency: once the model is loaded, predictions do not depend on the internet connection, which is useful for high-throughput applications
- Easy deployment: no need to manage an inference server
Cons:
- Limited compatibility: ONNX Runtime Web remains experimental and not all operators are supported, which can prevent some models from being converted or executed in the browser
- Dependence on the user's device can limit performance
- Not appropriate for large models, both because of download size and in-browser performance
Using HuggingFace Models
There are four possible solutions to use a HuggingFace model in a Marcelle application.
TODO
Write docs
Solution 1: use a model hosted on Huggingface.co (recommended)
With this solution, the model remains hosted on the Hugging Face Hub, and the Marcelle application sends prediction requests directly to Hugging Face's hosted inference service, so you do not need to deploy or maintain your own inference server.
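Below is a minimal sketch of querying a hosted model over HTTP through Hugging Face's Inference API. The model id and payload are examples; most text models accept a JSON body of the form `{ inputs: ... }`, and a user access token is required for private models or higher rate limits.

```ts
// Minimal sketch: querying a model hosted on the Hugging Face Hub through the
// hosted Inference API. The model id and the { inputs } payload are examples;
// check the model card for the exact input/output format of your model.
const HF_API_URL = 'https://api-inference.huggingface.co/models';

export async function queryHostedModel(
  modelId: string,
  inputs: string,
  token?: string
): Promise<unknown> {
  const response = await fetch(`${HF_API_URL}/${modelId}`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      ...(token ? { Authorization: `Bearer ${token}` } : {}),
    },
    body: JSON.stringify({ inputs }),
  });
  if (!response.ok) {
    throw new Error(`Hugging Face Inference API error: ${response.status}`);
  }
  return response.json();
}

// Usage:
// const result = await queryHostedModel(
//   'distilbert-base-uncased-finetuned-sst-2-english',
//   'Marcelle makes ML prototyping fun!'
// );
```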
Solution 2: client-side inference with Transformers.js (recommended for small transformers)
In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.
For these scenarios, your HuggingFace Transformer model can be converted to a web-friendly format to run inference in the web browser using the Transformers.js library.
Pros:
- Privacy: no user data needs to be sent to the server for inference
- Low latency: once the model is loaded, predictions do not depend on the internet connection
- Easy deployment: no need to manage an inference server
Cons:
- Dependence on the user's device can limit performance
- Not appropriate for large models, both because of download size and in-browser performance
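For example, here is a minimal sketch using the Transformers.js pipeline API; the task and model id are examples, and the library downloads and caches the converted weights directly in the browser.

```ts
import { pipeline } from '@xenova/transformers';

// Minimal sketch of client-side inference with Transformers.js.
// The task and model id are examples; see the Transformers.js documentation
// for the list of supported tasks and pre-converted models.
export async function classifySentiment(text: string) {
  // Loading the pipeline downloads and caches the model weights in the browser.
  const classifier = await pipeline(
    'sentiment-analysis',
    'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
  );

  // Run inference entirely on the client.
  // Returns something like [{ label: 'POSITIVE', score: 0.99 }].
  return classifier(text);
}
```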
Solution 3: server-side inference with Ray
Another solution is to use a Python web framework or a serving library to expose the model at an HTTP endpoint. Several possibilities exist, including generic web frameworks such as Starlette.
We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.
Pros:
- High compatibility: it is possible to run any Python code, with any ML framework
- Scalability: Ray facilitates scaling and the use of various architectures
- Independent of the client's capabilities
Cons:
- Requires setting up and managing an HTTP server
- Requires sending client data to the server
Solution 4: server-side inference through the backend (recommended for long inference)
This solution is recommended when inference is long-running (e.g. for generative tasks) and benefits from real-time monitoring. In that case, a data service on the Marcelle backend can be used to manage and monitor inference runs: predictions are requested by the web client, and the status of each run can be updated from Python over a websocket connection, enabling real-time monitoring in the web client.
Pros:
- Real-time monitoring: the Python code can update the run's status during inference
- Python scripts are websocket clients, so the machine running the Python code does not need to run a server or expose a public endpoint
Cons:
- Complex for simple use cases
- Experimental: stability is not ensured
Using Machine Learning Models implemented in Python
If your model was developed using Python, there are two options.
Solution 1: server-side inference with Ray (recommended for most cases)
In most cases, the simplest solution is to use a Python web framework or a serving library to expose the model at an HTTP endpoint. Several possibilities exist, including generic web frameworks such as Starlette.
We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.
Pros:
- High compatibility: it is possible to run any Python code, with any ML framework
- Scalability: Ray facilitates scaling and the use of various architectures
- Independent of the client's capabilities
Cons:
- Requires setting up and managing an HTTP server
- Requires sending client data to the server
Solution 2: server-side inference through the backend (recommended for long inference)
This solution is recommended when inference is long-running (e.g. for generative tasks) and benefits from real-time monitoring. In that case, a data service on the Marcelle backend can be used to manage and monitor inference runs: predictions are requested by the web client, and the status of each run can be updated from Python over a websocket connection, enabling real-time monitoring in the web client.
Pros:
- Real-time monitoring: the Python code can update the run's status during inference
- Python scripts are websocket clients, so the machine running the Python code does not need to run a server or expose a public endpoint
Cons:
- Complex for simple use cases
- Experimental: stability is not ensured
Using Machine Learning Models implemented in JavaScript
If your model was developed using a JavaScript library, you can integrate it directly into your Marcelle application, for instance by wrapping it in a custom component.
TODO
Tutorial and example with MLJS?
Good for you!
Train your model and come back to the start.