
How to use my model? ​

We propose a simple decision tree to help you decide how to integrate a machine learning model into a Marcelle application.

The decision tree walks through a few questions:

  • Do you have a pre-existing model?
  • What do you want to do?
  • What framework do you use?
  • What language do you want to use?

Using PyTorch Models ​

There are three possible solutions to use a PyTorch model in a Marcelle application.

Solution 1: server-side inference with Ray

In most cases, the simplest solution is to use a Python web framework or serving library to expose the model at an HTTP endpoint. Several possibilities exist, including PyTorch-specific libraries such as TorchServe or generic web frameworks such as Starlette.

We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.
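
As a rough illustration, a Ray Serve endpoint for a PyTorch model could look like the sketch below. This is not the guide's code: the model file, route, and JSON payload format are placeholders.

```python
# Hypothetical sketch: serving a PyTorch model over HTTP with Ray Serve.
# The model file, route and payload format are placeholders.
import torch
from ray import serve
from starlette.requests import Request


@serve.deployment
class TorchPredictor:
    def __init__(self):
        # Load your trained model; "model.pt" is a placeholder TorchScript file.
        self.model = torch.jit.load("model.pt")
        self.model.eval()

    async def __call__(self, request: Request) -> dict:
        # Expect a JSON body such as {"input": [[0.1, 0.2, 0.3]]}.
        payload = await request.json()
        x = torch.tensor(payload["input"], dtype=torch.float32)
        with torch.no_grad():
            y = self.model(x)
        return {"prediction": y.tolist()}


# Serve application; start it with: serve run my_module:app
app = TorchPredictor.bind()
```

On the Marcelle side, a lightweight custom component can then send the input to this endpoint as a plain HTTP request and use the JSON response as its prediction.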

Pros:

  • High compatibility: it is possible to run any Python code, with any ML framework
  • Scalability: Ray facilitates scaling and the use of various architectures
  • Independent of the client's capabilities

Cons:

  • Requires setting up and managing an HTTP server
  • Requires sending client data to the server
See the guide

Solution 2: client-side inference with ONNX Runtime Web

In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.

For these scenarios, your PyTorch model can be converted to the ONNX format (Open Neural Network Exchange), so that inference is performed in the web browser using onnxruntime.
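
As an illustration, exporting a PyTorch model to ONNX typically looks like the sketch below; the tiny stand-in model, input shape, and opset version are placeholder assumptions.

```python
# Hypothetical sketch: exporting a PyTorch model to ONNX so it can be loaded
# by onnxruntime in the browser. The stand-in model and shapes are placeholders.
import torch

# Stand-in for your trained torch.nn.Module.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 10),
)
model.eval()

# Dummy input with the shape the model expects (here, one 3x32x32 image).
dummy_input = torch.randn(1, 3, 32, 32)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)
```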

Pros:

  • Privacy: no user data needs to be sent to the server for inference
  • Low latency: once the model is loaded, predictions do not depend on the internet connection, which is useful for high-throughput applications
  • Easy deployment: no need to manage an inference server

Cons:

  • Limited compatibility: ONNX Runtime Web remains experimental and not all operators are supported, which can prevent some models from running in the browser
  • Dependence on the user's device can limit performance
  • Not appropriate for large models, both regarding download size and runtime performance
See the guide

Solution 3: server-side inference with real-time monitoring

This solution is recommended when inference is long (e.g. in generative tasks) and requires real-time monitoring. In that case, it is possible to use a data service on the Marcelle backend to manage and monitor inference runs. Predictions are requested by the web client, and the status of the processing can be updated from Python to enable real-time monitoring in the web client, thanks to websocket communication.
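
To make the flow more concrete, here is a purely hypothetical sketch of the Python side of such a setup. The backend URL, the runs data service, and its fields are invented for illustration; the actual guide relies on the Marcelle backend and websocket communication so that status updates are pushed to the web client in real time.

```python
# Purely hypothetical sketch: a Python worker that processes an inference run
# stored in a backend data service and updates its status as it progresses.
# The URL, the "runs" service and the record fields are assumptions.
import time

import requests

BACKEND = "http://localhost:3030"  # assumed backend URL


def process_run(run_id: str) -> None:
    for step in range(10):
        time.sleep(1)  # stand-in for one step of a long-running generation
        # Update the run's status so the web client can display progress.
        requests.patch(
            f"{BACKEND}/runs/{run_id}",
            json={"status": "running", "progress": (step + 1) / 10},
            timeout=5,
        )
    requests.patch(
        f"{BACKEND}/runs/{run_id}",
        json={"status": "completed"},
        timeout=5,
    )
```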

Pros:

  • Real-time monitoring: during inference, the Python code can update the run's status
  • Python scripts are websocket clients, meaning that the machine running the Python code does not need to run a server and expose an endpoint publicly.

Cons:

  • Complex for simple use cases
  • Experimental: stability is not guaranteed
See the guide

Using TensorFlow or Keras Models ​

There are three possible solutions to use a TensorFlow model in a Marcelle application.

TODO

Write docs

Solution 1: client-side inference with TensorFlow.js ​

In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.

For these scenarios, your TensorFlow model can be converted to a web-friendly format to run inference in the web browser using the TensorFlow.js library.
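
As an illustration, converting a Keras model with the tensorflowjs Python package might look like the sketch below; the stand-in model and output directory are placeholders, and the tensorflowjs_converter command-line tool is an equivalent alternative.

```python
# Hypothetical sketch: converting a Keras model to the TensorFlow.js format.
# The stand-in model and output directory are placeholders.
import tensorflow as tf
import tensorflowjs as tfjs

# Stand-in for your trained Keras model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Writes model.json and weight files that can be loaded from the browser
# with tf.loadLayersModel().
tfjs.converters.save_keras_model(model, "web_model")
```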

Pros:

  • TODO

Cons:

  • TODO
See the guide

Solution 2: server-side inference with Ray ​

Another solution is to use a Python web framework or serving library to expose the model at an HTTP endpoint. Several possibilities exist, including TensorFlow-specific libraries such as TFX's TensorFlow Serving or generic web frameworks such as Starlette.

We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.

Pros:

  • High compatibility: it is possible to run any Python code, with any ML framework
  • Scalability: Ray facilitates scaling and the use of various architectures
  • Independent of the client's capabilities

Cons:

  • Requires setting up and managing an HTTP server
  • Requires sending client data to the server
See the guide

Solution 3: server-side inference with real-time monitoring

This solution is recommended when inference is long (e.g. in generative tasks) and requires real-time monitoring. In that case, it is possible to use a data service on the Marcelle backend to manage and monitor inference runs. Predictions are requested by the web client, and the status of the processing can be updated from Python to enable real-time monitoring in the web client, thanks to websocket communication.

Pros:

  • Real-time monitoring: during inference, the Python code can update the run's status
  • Python scripts are websocket clients, meaning that the machine running the Python code does not need to run a server and expose an endpoint publicly.

Cons:

  • Complex for simple use cases
  • Experimental: stability is not guaranteed
See the guide

Using Scikit-Learn Models ​

There are two possible solutions to use a Scikit-Learn model in a Marcelle application.

Solution 1: server-side inference with Ray

In most cases, the simplest solution is to use a Python web framework or serving library to expose the model at an HTTP endpoint. Several possibilities exist, including generic web frameworks such as Starlette.

We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.

Pros:

  • High compatibility: it is possible to run any Python code, with any ML framework
  • Scalability: Ray facilitates scaling and the use of various architectures
  • Independent of the client's capabilities

Cons:

  • Requires setting up and managing an HTTP server
  • Requires sending client data to the server
See the guide

Solution 2: client-side inference with ONNX Runtime Web

In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.

For these scenarios, your Scikit-Learn model can be converted to the ONNX format (Open Neural Network Exchange), so that inference is performed in the web browser using onnxruntime.
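
As an illustration, a Scikit-Learn model can typically be converted with the skl2onnx package, as in the sketch below; the iris classifier and output file are stand-ins.

```python
# Hypothetical sketch: converting a scikit-learn model to ONNX for in-browser
# inference. The iris classifier and output file are stand-ins.
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=500).fit(X, y)

# Declare the input signature: a float tensor with 4 features per sample.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```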

Pros:

  • Privacy: no user data needs to be sent to the server for inference
  • Low latency: once the model is loaded, predictions do not depend on the internet connection, which is useful for high-throughput applications
  • Easy deployment: no need to manage an inference server

Cons:

  • Limited compatibility: ONNX Runtime Web remains experimental and not all operators are supported, which can prevent some models from running in the browser
  • Dependence on the user's device can limit performance
  • Not appropriate for large models, both regarding download size and runtime performance
See the guide

Using HuggingFace Models ​

There are four possible solutions to use a HuggingFace model in a Marcelle application.

TODO

Write docs

TODO

Solution 2: client-side inference in the browser

In some cases, it can be useful to run the inference on the client side, to avoid, for instance, sending private client data to the server. It also simplifies the deployment of the application, as it is not necessary to run and maintain a web server that performs inference in real-time, and a static website might be enough.

For these scenarios, your HuggingFace Transformer model can be converted to a web-friendly format to run inference in the web browser.
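
As an illustration, one possible route (an assumption here, not necessarily the one used in the guide) is to export the model to ONNX with the optimum library so that a browser runtime such as onnxruntime-web or Transformers.js can load it; the model name and output directory are placeholders.

```python
# Hypothetical sketch: exporting a HuggingFace Transformers model to ONNX with
# the optimum library. The model name and output directory are placeholders.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the original PyTorch checkpoint to ONNX on the fly.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("onnx_model")
tokenizer.save_pretrained("onnx_model")
```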

Pros:

  • TODO

Cons:

  • TODO
See the guide

Solution 3: server-side inference with Ray ​

Another solution is to use a Python web framework or serving library to expose the model at an HTTP endpoint. Several possibilities exist, including generic web frameworks such as Starlette.

We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.

Pros:

  • High compatibility: it is possible to run any Python code, with any ML framework
  • Scalability: Ray facilitates scaling and the use of various architectures
  • Independent of the client's capabilities

Cons:

  • Requires setting up and managing an HTTP server
  • Requires sending client data to the server
See the guide

Solution 4: server-side inference with real-time monitoring

This solution is recommended when inference is long (e.g. in generative tasks) and requires real-time monitoring. In that case, it is possible to use a data service on the Marcelle backend to manage and monitor inference runs. Predictions are requested by the web client, and the status of the processing can be updated from Python to enable real-time monitoring in the web client, thanks to websocket communication.

Pros:

  • Real-time monitoring: during inference, the Python code can update the run's status
  • Python scripts are websocket clients, meaning that the machine running the Python code does not need to run a server and expose an endpoint publicly.

Cons:

  • Complex for simple use cases
  • Experimental: stability is not guaranteed
See the guide

Using Machine Learning Models implemented in Python ​

If your model was developed in Python, there are two possible solutions.

Solution 1: server-side inference with Ray

In most cases, the simplest solution is to use a Python web framework or serving library to expose the model at an HTTP endpoint. Several possibilities exist, including generic web frameworks such as Starlette.

We recommend using Ray Serve, a framework-agnostic model serving library for building online inference APIs. Ray enables you to expose your prediction function over an HTTP endpoint, which can be queried from a lightweight custom Marcelle component and seamlessly integrated into a Marcelle application.

Pros:

  • High compatibility: it is possible to run any Python code, with any ML framework
  • Scalability: Ray facilitates scaling and the use of various architectures
  • Independent of the client's capabilities

Cons:

  • Requires setting up and managing an HTTP server
  • Requires sending client data to the server
See the guide

Solution 2: server-side inference with real-time monitoring

This solution is recommended when inference is long (e.g. in generative tasks) and requires real-time monitoring. In that case, it is possible to use a data service on the Marcelle backend to manage and monitor inference runs. Predictions are requested by the web client, and the status of the processing can be updated from Python to enable real-time monitoring in the web client, thanks to websocket communication.

Pros:

  • Real-time monitoring: during inference, the Python code can update the run's status
  • Python scripts are websocket clients, meaning that the machine running the Python code does not need to run a server and expose an endpoint publicly.

Cons:

  • Complex for simple use cases
  • Experimental: stability is not guaranteed
See the guide

Using Machine Learning Models implemented in JavaScript ​

If your model was developed with a JavaScript library, you can integrate it directly into your Marcelle application with some JavaScript code.

TODO

Tutorial and example with MLJS?

Good for you! ​

Train your model and come back to the start.

Training Models from Marcelle data in Python ​

Everything is good in Python.

See the guide

Training Models from Marcelle data in JavaScript ​

TODO

See the guide