This repository contains the code for the framework used to train ML models for the research project BeACTIVE.
To effectively use the framework you will need the following:
- ACC data using the unisens format, either raw or preprocessed with the Movisens™ DataAnalyzer
- Mood values as described by Wilhelm & Schoebi (2007) [1]
- All software packages required by the framework
- Install all requirements from the `requirements.txt` file: `pip install -r requirements.txt`
- Change `requirements-torch.txt` depending on your hardware and the CUDA version you use, then install the requirements: `pip install -r requirements-torch.txt`
Hardware with CUDA support is not required, but highly recommended.
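As a quick sanity check (a minimal sketch, assuming the PyTorch build from `requirements-torch.txt` is installed), you can verify that PyTorch can see a CUDA device:

```python
import torch

# True if this PyTorch build detects a CUDA-capable GPU
print(torch.cuda.is_available())
```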
A start-to-finish example is provided in `example.py`.
First, the path to each study is associated with the corresponding adapter:
```python
studys = {
    "your\\path\\to\\Study_1": 1,
}
```

Second, the `Extractor` class is used to create a dataset based on the provided studies. Invalid paths will result in an error, but will not cause the script to exit. The flags used to create a dataset are as follows:
- use_raw_data: This flag determines whether to use the raw acceleration data or the features extracted by the Movisens™ DataAnalyzer.
- use_lagged_mood: This flag determines whether the mood from the previous EMA should be included as a feature.
- window: This determines the length of the time window in minutes.
- locations: This list determines the sensor positions used for the dataset. The given string should be present in the corresponding folder name for the parameter to work as intended. If multiple locations are given, but the study does not contain all of them, the study is skipped.
```python
my_extractor = Extractor(studys)
my_dataset = my_extractor.create_dataset(use_raw_data=True, use_lagged_mood=False, window=15, locations=['hip'])
```
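For comparison, a dataset of preprocessed features with lagged mood and two sensor positions could be requested as follows (a sketch; the location names are placeholders, and any study missing one of the positions would be skipped):

```python
# Sketch: DataAnalyzer features, previous EMA mood included,
# 30-minute windows, two (hypothetical) sensor positions.
my_dataset_2 = my_extractor.create_dataset(use_raw_data=False, use_lagged_mood=True, window=30, locations=['hip', 'wrist'])
```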
The `ModelFactory` class will create and train an ML algorithm based on the previously created dataset:

```python
my_factory = ModelFactory(my_dataset)
```

To get information about the implemented algorithms, the following can be used:

```python
available_models = my_factory.get_available_model_types()
```

A model can be created with the default parameters as follows:

```python
my_model_1 = my_factory.create_model("lstm")
```

However, a model may also be created without training it straight away:

```python
my_model_1 = my_factory.create_model("lstm", train=False)
```

Or by changing the length of the training:

```python
my_model_1 = my_factory.create_model("lstm", epochs=5)
```

The performance of the model can be accessed as follows:

```python
performance = my_model_1.get_performance()
```

To use the framework, the following steps have to be performed:
- Adapters
- If your data source is already covered by one of the available adapters, you can continue with step 2.
- If your data source is not covered by any available adapter, you need to get an overview of your data/folder structure and then implement an adapter for it. A more detailed guide can be found here.
- Add your adapter to the extractor, i.e. assign an ID for your adapter and add it to the list. Then make sure to concatenate the extracted data to the corresponding lists. This includes your samples (data), the ground truth (labels) and the list that maps the samples to individuals (id).
- Data extraction
- To create a dataset, you will have to select the following parameters:
- use_raw_data: This determines whether raw data or processed features are used
- use_lagged_mood: This determines if the last mood is added to the sample data
- window: This determines the length of the samples in minutes
- locations: This determines which sensor locations should be used. Value should be given as a list of locations. If only one location is to be used, the list should only have one element.
- A dataset with the given characteristics is returned
- Model creation
- To create a model, first create a model factory with your dataset created in step 2.
- Then select a model to be trained, the percentage of the data to be used as the test set, and the number of epochs the model should be trained for
- The model factory can create multiple models with the provided dataset. If a different dataset is needed, you have to create a new model factory.
- Validation
- To further validate your model, you can use the validate function provided by each model. The function requires information about the study to save the validation statistics. This includes:
- The dataset to validate the model on (this can be a different dataset than the one the model was trained with)
- The study IDs that were used to train the model
- The window size used to create the training dataset
- The used model
- The percentage of the dataset used for the test set
- The number of epochs the model was trained for
- The sensor position used to create the training dataset
- The function then creates two CSV files, one containing the overall performance of the model and one containing the performance per individual along with how many samples were included during training (see the annotated call below).
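For illustration, the positional arguments of `validate` map onto this information as follows (a sketch reusing the values from the example further down):

```python
# Annotated sketch of a validate call; values mirror the example below.
my_model_1.validate(
    my_dataset,  # dataset to validate the model on
    1,           # study ID(s) used to train the model
    15,          # window size (in minutes) of the training dataset
    'lstm',      # the used model
    0.8,         # split between training and test data
    10,          # number of training epochs
    'hip',       # sensor position of the training dataset
)
```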
A complete start-to-finish example:
```python
available_studies = {
    "E:\\data\\Studie_2": 2,
    "E:\\data\\Studie_1": 1,
}
# get selected studies
selected_studies_indices = [1]
used_studies = {key: value for key, value in available_studies.items() if value in selected_studies_indices}
# create a dataset from selected studies
my_extractor = Extractor(used_studies)
my_dataset = my_extractor.create_dataset(use_raw_data=True, use_lagged_mood=False, window=15, locations=['hip'])
# create a model based on the dataset
my_factory = ModelFactory(my_dataset)
available_models = my_factory.get_available_model_types()  # get a list of implemented model types to choose from
desired_model_1 = available_models[0]
desired_model_2 = available_models[1]
my_model_1 = my_factory.create_model("lstm", split=0.8, epochs=10)  # we can now create multiple models featuring our dataset
my_model_2 = my_factory.create_model(desired_model_2)  # because the dataset is tied to the ModelFactory object
my_model_1.validate(my_dataset, 1, 15, 'lstm', 0.8, 10, 'hip')
```

The following section explains how to extend the framework for your needs.
An adapter has to implement a single function that loads your data. The parameters and return values of the function are as follows:
| Parameter | Description | Data type |
|---|---|---|
| study_path | Path to the root directory of the study | String |
| use_raw_data | Flag for switching between raw data and feature engineering | Boolean |
| use_lagged_mood | Flag for switching between adding previous mood values to the data samples or not | Boolean |
| window | Lookback length in time, i.e. how long the time series should be; given in minutes | Integer |
| location | Sensor position | String |
| id_creator | Handles creating UUIDs for each subject across multiple studies | IDCreator object |
| Return value | Description | Data type |
|---|---|---|
| data | Contains all the extracted samples from the study | Numpy array |
| mood | Contains the mood labels (EA = energetic arousal, C = calmness, V = valence) corresponding to the samples in data | Numpy array |
| id_list | List of UUIDs linking individual samples to subjects of the study | List |
The sampling process of the data is described below.
*Figure: visual description of how to create the return values.*
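Based on the tables above, a minimal adapter skeleton might look like this (a sketch; the function name and body are placeholders, not part of the framework):

```python
import numpy as np

def load_my_study(study_path, use_raw_data, use_lagged_mood, window, location, id_creator):
    """Hypothetical adapter skeleton; fill in the parsing logic for your folder structure."""
    data_samples, mood_labels, id_list = [], [], []
    # Walk study_path, cut time windows of `window` minutes for the given
    # sensor `location`, and request one UUID per subject from id_creator.
    # If use_lagged_mood is set, append the previous EMA mood to each sample.
    data = np.asarray(data_samples)  # all extracted samples
    mood = np.asarray(mood_labels)   # (EA, C, V) labels, one row per sample
    return data, mood, id_list       # id_list maps each sample to a subject
```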
The adapter must then be included in the Extractor class as well:
```python
if studyID == 1:
    logging.info("Study 1")
    data, mood, ids = load_study1(path, use_raw_data=use_raw_data, use_lagged_mood=use_lagged_mood,
                                  window=window, location=location, id_creator=self.ID_creator)
    data_list.append(data)
    mood_list.append(mood)
    id_list = id_list + ids
```

To implement a new model, the wrapper class has to implement the following functions:
- train(self, dataset, [model specific parameters]) → None
  - This function trains the model on the given dataset.
- validate(self, dataset, studies, time_window, model, split, epochs, locations, silent=False) → [test results]
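A minimal sketch of such a wrapper, assuming the method bodies are filled in for your algorithm (the class name is a placeholder):

```python
class MyModelWrapper:
    """Hypothetical wrapper for a new algorithm."""

    def train(self, dataset, epochs, split=0.8):
        # Fit the underlying model on the dataset; returns None.
        ...

    def validate(self, dataset, studies, time_window, model, split, epochs, locations, silent=False):
        # Evaluate on the dataset and write the two validation CSV files.
        ...
```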
The model must also be added to the Model class, in the constructor as well as in the validate and train functions, as shown below:
```python
# get model based on available implementations
if model_type == "lstm":
    self.model = LSTMModel(input_shape, output_shape)
elif model_type == "gru":
    self.model = GRUModel(input_shape, output_shape)
elif model_type == "xgb":
    self.model = XGBmodel()
elif model_type == "svm":
    self.model = SVMModel()
else:
    self.model = None
    logging.error(f"No model implementation available for {model_type}")
```

```python
def validate(self, dataset, studies, time_window, model, split, epochs, locations):
    if self.model_type == 'lstm':
        self.validate_nn2(dataset, studies, time_window, model, split, epochs, locations)
    elif self.model_type == 'gru':
        self.validate_nn2(dataset, studies, time_window, model, split, epochs, locations)
    else:
        self.model.validate(dataset, studies, time_window, model, split, epochs, locations)
```

```python
def train(self, dataset, epochs, split=0.8):
    if self.model_type == 'lstm':
        self.train_nn(dataset, epochs, split)
    elif self.model_type == 'gru':
        self.train_nn(dataset, epochs, split)
    else:
        self.model.train(dataset, epochs, split)
```

Furthermore, the model has to be added to the `ModelFactory` class:
```python
def __init__(self, dataset):
    self.dataset = dataset
    self.available_model_types = ["lstm", "gru", "xgb", "svm"]
    self.input_shape, self.output_shape = self.dataset.get_io()
    self.model = None
```
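A hypothetical new model type would then be registered by extending this list and adding a matching branch to the `Model` constructor (a sketch; `"mymodel"` and `MyModelWrapper` are placeholders, not part of the framework):

```python
# In ModelFactory.__init__: add the new identifier.
self.available_model_types = ["lstm", "gru", "xgb", "svm", "mymodel"]

# In the Model constructor: add a matching branch.
elif model_type == "mymodel":
    self.model = MyModelWrapper()
```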