DEV Community

Raj

python group multiple

I have a dataset (attached image) that contains device and sensor data with timestamps. I want to generate aggregates for every minute, device, and sensor. I tried the expression below, but it does not work: since there are two devices and three sensors, I should get only 6 records per minute, yet the expression returns more than that. Can anyone help me fix my code, please?

deviceid (S):timestamp (S):datatype (S):value (N)
BSM_DEV01:2021-06-27T23-13-12.788845:Temperature:101.4
BSM_DEV01:2021-06-27T23-13-13.333129:SPO2:90
BSM_DEV01:2021-06-27T23-13-13.925137:HeartRate:106
BSM_DEV01:2021-06-27T23-13-14.187335:HeartRate:74
BSM_DEV01:2021-06-27T23-13-14.796972:HeartRate:90
BSM_DEV01:2021-06-27T23-13-15.794070:HeartRate:96
BSM_DEV02:2021-06-27T23-13-47.802607:HeartRate:78
BSM_DEV02:2021-06-27T23-13-48.800688:HeartRate:91
BSM_DEV02:2021-06-27T23-13-49.802867:HeartRate:68
BSM_DEV02:2021-06-27T23-13-50.789980:HeartRate:77
BSM_DEV02:2021-06-27T23-13-51.799726:HeartRate:95
BSM_DEV02:2021-06-27T23-13-52.801113:SPO2:89
BSM_DEV02:2021-06-27T23-13-53.112701:HeartRate:95
BSM_DEV02:2021-06-27T23-13-53.790853:HeartRate:84
BSM_DEV02:2021-06-27T23-13-54.794649:HeartRate:84
BSM_DEV02:2021-06-27T23-13-55.794330:HeartRate:107
BSM_DEV02:2021-06-27T23-13-56.792531:HeartRate:106
BSM_DEV01:2021-06-27T23-14-00.790879:HeartRate:89
BSM_DEV01:2021-06-27T23-14-01.789911:HeartRate:102
BSM_DEV01:2021-06-27T23-14-02.789042:SPO2:89
BSM_DEV01:2021-06-27T23-14-03.083248:HeartRate:99
BSM_DEV01:2021-06-27T23-14-03.795986:HeartRate:104
BSM_DEV01:2021-06-27T23-14-04.800193:HeartRate:85
BSM_DEV01:2021-06-27T23-14-05.799052:HeartRate:80
BSM_DEV01:2021-06-27T23-14-06.800894:HeartRate:83
BSM_DEV01:2021-06-27T23-14-07.803821:HeartRate:94
BSM_DEV01:2021-06-27T23-14-08.804032:HeartRate:75
BSM_DEV01:2021-06-27T23-14-09.794357:HeartRate:75
BSM_DEV01:2021-06-27T23-14-10.799049:HeartRate:85
BSM_DEV01:2021-06-27T23-14-11.796780:HeartRate:88
BSM_DEV01:2021-06-27T23-14-12.796486:Temperature:98
BSM_DEV02:2021-06-27T23-14-13.104431:SPO2:91
BSM_DEV02:2021-06-27T23-14-13.392373:HeartRate:81
BSM_DEV02:2021-06-27T23-14-13.792080:HeartRate:98
BSM_DEV02:2021-06-27T23-14-14.791361:HeartRate:78
BSM_DEV02:2021-06-27T23-14-15.801512:HeartRate:77
BSM_DEV02:2021-06-27T23-14-16.802908:HeartRate:80
BSM_DEV02:2021-06-27T23-14-17.802081:HeartRate:92
BSM_DEV02:2021-06-27T23-14-18.801312:HeartRate:88
BSM_DEV02:2021-06-27T23-14-19.803425:HeartRate:115
BSM_DEV02:2021-06-27T23-14-20.797984:HeartRate:64
BSM_DEV02:2021-06-27T23-14-21.792599:HeartRate:85
BSM_DEV02:2021-06-27T23-14-22.794500:SPO2:87
BSM_DEV02:2021-06-27T23-14-23.093992:HeartRate:84
BSM_DEV02:2021-06-27T23-14-23.793901:HeartRate:54
BSM_DEV02:2021-06-27T23-14-24.802120:HeartRate:80
BSM_DEV02:2021-06-27T23-14-25.790220:HeartRate:69
BSM_DEV02:2021-06-27T23-14-26.795034:HeartRate:108
BSM_DEV02:2021-06-27T23-14-27.791092:Temperature:99.1

items_by_minute = itertools.groupby(
    items,
    key=lambda x: (x["timestamp"][:16], x['deviceid'], x['datatype'])
)

# Calculate the statistics for each minute
for minute, items in items_by_minute:
    values_per_minute = [item["value"] for item in items]

    avg = statistics.mean(values_per_minute)
    min_value = min(values_per_minute)
    max_value = max(values_per_minute)

    print(f"Minute: {minute} / Average {avg} | Min {min_value} / Max {max_value}")

Top comments (1)

Raj

I fixed it myself by calling sorted() on the items before calling groupby().
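
For anyone hitting the same thing: itertools.groupby only groups consecutive items that share a key, so when HeartRate and SPO2 readings are interleaved, the same (minute, device, sensor) key starts a new group each time it reappears. Sorting by that same key first puts equal keys next to each other. Below is a minimal sketch of that fix, not necessarily the exact code I ended up with. The names group_key and aggregate are made up for illustration, the five sample rows are copied from the data above, and it assumes each item is a dict with a numeric "value".

```python
import itertools
import statistics

# A few rows taken from the dataset in the question, in their original
# (interleaved) order. Assumes "value" is already numeric.
items = [
    {"deviceid": "BSM_DEV02", "timestamp": "2021-06-27T23-13-51.799726", "datatype": "HeartRate", "value": 95},
    {"deviceid": "BSM_DEV02", "timestamp": "2021-06-27T23-13-52.801113", "datatype": "SPO2", "value": 89},
    {"deviceid": "BSM_DEV02", "timestamp": "2021-06-27T23-13-53.112701", "datatype": "HeartRate", "value": 95},
    {"deviceid": "BSM_DEV01", "timestamp": "2021-06-27T23-13-13.925137", "datatype": "HeartRate", "value": 106},
    {"deviceid": "BSM_DEV01", "timestamp": "2021-06-27T23-13-14.187335", "datatype": "HeartRate", "value": 74},
]


def group_key(item):
    """Hypothetical helper: (minute, device, sensor) for one reading."""
    # The first 16 characters of the timestamp are "YYYY-MM-DDTHH-MM".
    return (item["timestamp"][:16], item["deviceid"], item["datatype"])


def aggregate(readings):
    """Return {key: (mean, min, max)} with one entry per minute/device/sensor."""
    results = {}
    # Sort by the grouping key first so that equal keys are adjacent;
    # groupby starts a new group every time the key changes.
    ordered = sorted(readings, key=group_key)
    for key, group in itertools.groupby(ordered, key=group_key):
        values = [reading["value"] for reading in group]
        results[key] = (statistics.mean(values), min(values), max(values))
    return results


# Without sorting, the DEV02 HeartRate key is split into two groups.
unsorted_group_count = sum(1 for _ in itertools.groupby(items, key=group_key))
print("groups without sorting:", unsorted_group_count)

for key, (avg, min_value, max_value) in aggregate(items).items():
    print(f"Minute: {key} / Average {avg} | Min {min_value} / Max {max_value}")
```

Using one shared key function for both sorted() and groupby() keeps the two in sync. If the key passed to sorted() differs from the one passed to groupby(), the duplicate groups can come back.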