Training Models from Marcelle Data
Some applications require training machine learning models from user data. Sometimes, it is important to provide users with personalized models that are fine-tuned on their own data. In other cases, models can be improved over time by periodically retraining them on data collected across several users.
This guide describes how to train or fine-tune machine learning models from data stored in a Marcelle data store. As described in the dedicated guides (see Data Management), Marcelle provides a server-side data management system that can be easily configured and deployed. Marcelle Data Stores are structured in services that store collections of data. A Marcelle Dataset is typically a collection of instances, which can be manipulated with CRUD operations from Python or JavaScript.
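To make the store / service / instance vocabulary concrete, here is a minimal sketch of that structure in plain JavaScript. It is an in-memory stand-in written for this guide, not the Marcelle API: the `SketchStore` class is hypothetical, and its method names (`create`, `get`, `find`, `patch`, `remove`) are only assumed to mirror the usual CRUD verbs.

```javascript
// Hypothetical in-memory stand-in for a data store. This is NOT the Marcelle
// API; it only illustrates "a store holds services, a service holds instances".
class SketchStore {
  constructor() {
    this.services = new Map(); // service name -> Map(id -> instance)
    this.nextId = 1;
  }

  // Return CRUD helpers for the named collection, creating it on first use.
  service(name) {
    if (!this.services.has(name)) this.services.set(name, new Map());
    const items = this.services.get(name);
    return {
      create: async (data) => {
        const instance = { ...data, id: String(this.nextId++) };
        items.set(instance.id, instance);
        return instance;
      },
      get: async (id) => items.get(id),
      // Keep the instances whose fields strictly equal every field of `query`.
      find: async (query = {}) =>
        [...items.values()].filter((it) =>
          Object.entries(query).every(([key, value]) => it[key] === value),
        ),
      patch: async (id, changes) => {
        if (!items.has(id)) throw new Error(`No instance with id ${id}`);
        const updated = { ...items.get(id), ...changes };
        items.set(id, updated);
        return updated;
      },
      remove: async (id) => {
        const removed = items.get(id);
        items.delete(id);
        return removed;
      },
    };
  }
}

async function demo() {
  const store = new SketchStore();
  const trainingSet = store.service('training-set');
  // An instance pairs features (x) with a label (y).
  const first = await trainingSet.create({ x: [0.1, 0.9], y: 'cat' });
  await trainingSet.create({ x: [0.8, 0.2], y: 'dog' });
  await trainingSet.patch(first.id, { y: 'kitten' });
  const kittens = await trainingSet.find({ y: 'kitten' });
  await trainingSet.remove(first.id);
  const remaining = await trainingSet.find();
  return { kittens, remaining };
}

demo().then((result) => console.log(result));
```

Every operation is asynchronous in this sketch because a server-side store is reached over the network; the real dataset methods and their signatures are described in the Data Management guides.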
The rest of this guide describes how to train models in JavaScript or in Python. We will take image classification as a simple example, and describe both the JavaScript and the Python implementations of a user-defined image classifier trained from the webcam.
Collecting Data with the Webcam
Let's start by generating a new Marcelle application:
With npm:
npm init marcelle marcelle-training
cd marcelle-training
npm install
With yarn:
yarn create marcelle marcelle-training
cd marcelle-training
yarn
With pnpm:
pnpm create marcelle marcelle-training
cd marcelle-training
pnpm i
Select the default options of the CLI. Then, make sure the application works by running the development server:
npm run dev
And open http://localhost:5173 in your browser.
Then, update the main application entry point (src/index.js) with the following code:
src/index.js
import '@marcellejs/core/dist/marcelle.css';
import '@marcellejs/gui-widgets/dist/marcelle-gui-widgets.css';
import '@marcellejs/layouts/dist/marcelle-layouts.css';
import { dataset, datasetBrowser, dataStore, webcam } from '@marcellejs/core';
import { button, textInput } from '@marcellejs/gui-widgets';
import { dashboard } from '@marcellejs/layouts';
import { mobileNet } from '@marcellejs/tensorflow';
import { filter, from, map, mergeMap, zip } from 'rxjs';
// -----------------------------------------------------------
// INPUT PIPELINE & DATA CAPTURE
// -----------------------------------------------------------
const input = webcam();
const featureExtractor = mobileNet();
const label = textInput();
label.title = 'Instance label';
const capture = button('Hold to record instances');
capture.title = 'Capture instances to the training set';
const store = dataStore('localStorage');
const trainingSet = dataset('training-set-dashboard', store);
const trainingSetBrowser = datasetBrowser(trainingSet);
const $instances = zip(input.$images, input.$thumbnails).pipe(
  filter(() => capture.$pressed.getValue()),
  map(async ([img, thumbnail]) => ({
    x: await featureExtractor.process(img),
    y: label.$value.getValue(),
    thumbnail,
  })),
  mergeMap((x) => from(x)),
);
$instances.subscribe(trainingSet.create);
// -----------------------------------------------------------
// DASHBOARDS
// -----------------------------------------------------------
const dash = dashboard({
  title: 'Marcelle Example - Training Models',
  author: 'Myself',
});
dash
  .page('Data Management')
  .sidebar(input, featureExtractor)
  .use([label, capture], trainingSetBrowser);
dash.settings.datasets(trainingSet);
dash.show();
in JavaScript
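Before training anything, it helps to be clear about what the capture pipeline in src/index.js stores. The sketch below replays the same logic in plain JavaScript, without RxJS or a webcam: frames are paired with thumbnails, only frames captured while the button is pressed are kept, and each kept frame becomes an instance `{ x, y, thumbnail }`. Everything here is an assumption made for illustration: `fakeExtractor` is a hypothetical stand-in for MobileNet, the frames are hand-written arrays, and frames are processed one by one, whereas the RxJS version processes them concurrently.

```javascript
// Hypothetical stand-in for a feature extractor. It is NOT MobileNet: it just
// averages the pixel values so that the example runs anywhere.
const fakeExtractor = {
  async process(img) {
    const mean = img.reduce((sum, value) => sum + value, 0) / img.length;
    return [mean];
  },
};

// Keep only the frames recorded while the button was pressed (the `filter`
// step), then turn each one into an instance (the `map` step).
async function buildInstances(frames, extractor) {
  const instances = [];
  for (const { img, thumbnail, pressed, label } of frames) {
    if (!pressed) continue;
    instances.push({
      x: await extractor.process(img), // features used for training
      y: label, // label typed by the user
      thumbnail, // small image shown in the dataset browser
    });
  }
  return instances;
}

// Hand-written frames; each already pairs an image with its thumbnail,
// which is what `zip` does in the real pipeline.
const frames = [
  { img: [0, 0, 0, 0], thumbnail: 'thumb-0', pressed: false, label: 'A' },
  { img: [1, 3, 1, 3], thumbnail: 'thumb-1', pressed: true, label: 'A' },
  { img: [4, 4, 4, 4], thumbnail: 'thumb-2', pressed: true, label: 'B' },
];

buildInstances(frames, fakeExtractor).then((instances) => console.log(instances));
```

The point to remember is that the dataset stores features (`x`) rather than raw images, so any model trained on it takes the feature extractor's output as input.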