# Dataset Preparation Commands

## Overview

This document provides the commands to prepare various EMG datasets for pretraining and downstream tasks. Each dataset preparation script takes in raw data, processes it into overlapping windows, and saves the processed data in HDF5 format for efficient loading during model training.
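The windowing step described above can be sketched as follows. This is a minimal illustration, not the code used by the preparation scripts: the function name `sliding_windows` is hypothetical, the signal is a plain Python list, and dropping an incomplete trailing window is an assumption.

```python
def sliding_windows(signal, window_size, stride):
    """Split a 1-D sequence into fixed-length windows (hypothetical helper).

    Windows start every `stride` samples. A trailing segment shorter than
    `window_size` is dropped (an assumption; the real scripts may differ).
    """
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = len(signal) - window_size
    return [signal[start:start + window_size]
            for start in range(0, last_start + 1, stride)]


# Toy signal of 10 samples, window of 4 samples, stride of 2 (50% overlap).
windows = sliding_windows(list(range(10)), window_size=4, stride=2)
for w in windows:
    print(w)
```

With a stride equal to half the window, each sample (apart from those at the edges) appears in two consecutive windows.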

Add the `--download_data` flag if the dataset has not been downloaded yet.

Set the `$DATA_PATH` environment variable to the directory where the datasets should be saved, or replace it in the commands with your own path.

The `window_size` parameter (`seq_len` in `scripts/uci.py`) sets the window length in samples, and the `stride` parameter sets the step between consecutive windows in samples. The pretraining datasets are sampled at 2 kHz; the downstream datasets are sampled at either 200 Hz or 2 kHz, depending on the dataset.
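The relationship between a window duration, an overlap fraction, and the sample-based arguments can be sketched as below. The helper names `window_params` and `describe` are hypothetical and are not part of the scripts; they only illustrate the arithmetic.

```python
def window_params(duration_s, overlap, fs_hz):
    """Convert a window duration (s) and fractional overlap to sample counts."""
    window_size = round(duration_s * fs_hz)
    stride = round(window_size * (1.0 - overlap))
    if stride < 1:
        raise ValueError("overlap too large: stride would be zero")
    return window_size, stride


def describe(window_size, stride, fs_hz):
    """Inverse mapping: window duration in seconds and fractional overlap."""
    return window_size / fs_hz, 1.0 - stride / window_size


print(window_params(0.5, 0.5, 2000))  # 0.5 s, 50% overlap at 2 kHz -> (1000, 500)
print(describe(200, 200, 2000))       # 200 samples, stride 200 at 2 kHz -> (0.1, 0.0)
```

A stride equal to the window size means no overlap; a stride of half the window size means 50% overlap.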

The libraries required to run the scripts are listed in `requirements.txt`.

## Pretraining Datasets

For the pretraining datasets, we use a window size of 0.5 seconds (1000 samples at 2 kHz) with 50% overlap (stride of 500 samples).

### emg2pose (0.5 sec, 50% overlap)

```bash
python scripts/emg2pose.py \
    --data_dir $DATA_PATH/datasets/emg2pose_data/ \
    --save_dir $DATA_PATH/datasets/emg2pose_data/h5/ \
    --window_size 1000 \
    --stride 500
```

### Ninapro DB6 (0.5 sec, 50% overlap)

```bash
python scripts/db6.py \
    --data_dir $DATA_PATH/datasets/ninapro/DB6/ \
    --save_dir $DATA_PATH/datasets/ninapro/DB6/h5/ \
    --window_size 1000 \
    --stride 500
```

### Ninapro DB7 (0.5 sec, 50% overlap)

```bash
python scripts/db7.py \
    --data_dir $DATA_PATH/datasets/ninapro/DB7/ \
    --save_dir $DATA_PATH/datasets/ninapro/DB7/h5/ \
    --window_size 1000 \
    --stride 500
```

---

## Downstream Datasets

For the downstream tasks, gesture classification is performed on the Ninapro DB5, EMG-EPN612, and UCI EMG datasets (200 Hz), while regression is performed on Ninapro DB8 (2 kHz).

### Ninapro DB5 (1 sec, 75% overlap)

```bash
python scripts/db5.py \
    --data_dir $DATA_PATH/datasets/ninapro/DB5/ \
    --save_dir $DATA_PATH/datasets/ninapro/DB5/h5/ \
    --window_size 200 \
    --stride 50
```

### Ninapro DB5 (5 sec, 75% overlap)

```bash
python scripts/db5.py \
    --data_dir $DATA_PATH/datasets/ninapro/DB5/ \
    --save_dir $DATA_PATH/datasets/ninapro/DB5/h5/ \
    --window_size 1000 \
    --stride 250
```

### EMG-EPN612 (1 sec, no overlap)

```bash
python scripts/epn.py \
    --data_dir $DATA_PATH/datasets/EPN612/ \
    --source_training $DATA_PATH/datasets/EPN612/trainingJSON/ \
    --source_testing $DATA_PATH/datasets/EPN612/testingJSON/ \
    --dest_dir $DATA_PATH/datasets/EPN612/h5/ \
    --window_size 200
```

### EMG-EPN612 (5 sec, no overlap)

```bash
python scripts/epn.py \
    --data_dir $DATA_PATH/datasets/EPN612/ \
    --source_training $DATA_PATH/datasets/EPN612/trainingJSON/ \
    --source_testing $DATA_PATH/datasets/EPN612/testingJSON/ \
    --dest_dir $DATA_PATH/datasets/EPN612/h5/ \
    --window_size 1000
```

### UCI EMG (1 sec, 75% overlap)

```bash
python scripts/uci.py \
    --data_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/ \
    --save_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/h5/ \
    --seq_len 200 \
    --stride 50
```

### UCI EMG (5 sec, 75% overlap)

```bash
python scripts/uci.py \
    --data_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/ \
    --save_dir $DATA_PATH/datasets/UCI_EMG/EMG_data_for_gestures-master/h5/ \
    --seq_len 1000 \
    --stride 250
```

### Ninapro DB8 (100 ms, no overlap)

```bash
python scripts/db8.py \
    --data_dir $DATA_PATH/datasets/ninapro/DB8/ \
    --save_dir $DATA_PATH/datasets/ninapro/DB8/h5/ \
    --window_size 200 \
    --stride 200
```

### Ninapro DB8 (500 ms, no overlap)

```bash
python scripts/db8.py \
    --data_dir $DATA_PATH/datasets/ninapro/DB8/ \
    --save_dir $DATA_PATH/datasets/ninapro/DB8/h5/ \
    --window_size 1000 \
    --stride 1000
```