DEV Community

Max Myroshnychenko


Pandas tools you didn’t know you needed, part 1: Apply

A dataframe cell can hold a whole array rather than a single value. This is the case with DataJoint's "blobs." Let's look at one such dataframe, which I fetched from a DataJoint table:

   fid                        brain_region  single_unit  single_unit_phy  spike_time  waveform  snr
0  bucket_1_m026_1568757659_  pfc           0            12               [1.71006667 2.14576667 3.35306667 3.6982 22.4359 22.80743333 26.00536667 26.61663333 28.39046667 29.45536667]  []  []
1  bucket_1_m026_1568757659_  pfc           1            15               [17.8181 37.5422 37.90253333 39.36106667 46.77196667 60.7908 77.62783333 78.71806667 82.33186667 93.15203333]  []  []
2  bucket_1_m026_1568757659_  pfc           2            19               [1.552 115.68683333 115.69416667 160.0344 342.9736 346.51526667 346.8301 348.25513333 348.2767 348.29066667]  []  []
3  bucket_1_m026_1568757659_  pfc           3            37               [64.1048 145.53183333 185.3421 187.57793333 281.31683333 304.80466667 326.5742 339.3119 348.08556667 350.2595]  []  []
4  bucket_1_m026_1568757659_  pfc           4            40               [0.66776667 3.9761 4.8187 16.72106667 22.01286667 25.6975 29.069 29.25703333 30.07223333 31.55446667]  []  []

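We'd like to get the interspike intervals (ISIs), that is, the differences between consecutive spike times within each row. Calling np.diff(df['spike_time']) directly will not work, because it takes differences between consecutive rows rather than within each row's array: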

array([array([16.10803333, 35.39643333, 34.54946667, 35.66286667, 24.33606667,
37.98336667, 51.62246667, 52.10143333, 53.9414 , 63.69666667]),
array([-16.2661 , 78.14463333, 77.79163333, 120.67333333,
296.20163333, 285.72446667, 269.20226667, 269.53706667,
265.94483333, 255.13863333]),
array([ 62.5528 , 29.845 , 69.64793333, 27.54353333,
-61.65676667, -41.7106 , -20.2559 , -8.94323333,
-0.19113333, 1.96883333]),
...,
array([29.80516667, 38.70003333, 40.26796667, 53.5446 , 45.7118 ,
55.4116 , 55.9258 , 79.12003333, 77.03993333, 83.48073333]),
array([-11.45576667, -11.35606667, -9.69276667, -23.187 ,
-22.03416667, -31.00213333, -29.06093333, -51.92653333,
-51.33696667, -57.59203333]),
array([ 72.3652 , 92.1018 , 157.1723 , 173.74453333,
185.13646667, 188.13473333, 187.84116667, 187.3681 ,
206.6341 , 206.1082 ])], dtype=object)
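
To make the row-versus-row behaviour easier to see, here is a minimal sketch on made-up data. The Series `toy` and its values are hypothetical and only stand in for the `spike_time` column. It assumes every row holds the same number of spikes, since rows of different lengths cannot be subtracted from each other.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration): two units with three spike times each
toy = pd.Series([np.array([1.0, 2.0, 4.0]), np.array([10.0, 20.0, 40.0])])

# np.diff sees a 1-D sequence of array objects, so it subtracts
# row 0 from row 1 instead of differencing inside each array
across_rows = np.diff(toy)

print(len(across_rows))  # one fewer entry than there are rows
print(across_rows[0])    # row 1 minus row 0, element by element
```

The result has one entry fewer than the number of rows, and each entry is a difference between two neighbouring units' spike trains, which is not a meaningful quantity here.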

You can use a for loop, but there is an easier way.
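
For comparison, the loop version might look roughly like the sketch below. The dataframe `toy_df` and its values are made up for illustration and only mimic the `spike_time` column of the fetched table.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the fetched table (values made up for illustration)
toy_df = pd.DataFrame({
    "single_unit": [0, 1],
    "spike_time": [
        np.array([1.0, 2.0, 4.0]),
        np.array([10.0, 20.0, 40.0, 45.0]),
    ],
})

# Difference the spike times one row at a time
isi_per_row = []
for spikes in toy_df["spike_time"]:
    isi_per_row.append(np.diff(spikes))

# Store the per-row results as a new column
toy_df["ISI"] = isi_per_row
print(toy_df["ISI"])
```

This works even when rows hold different numbers of spikes, but it takes several lines of bookkeeping for what is conceptually a single operation.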

To get our operation of interest to work row by row without looping, use apply:

df['ISI']=df['spike_time'].apply(np.diff)
df['ISI']
   fid                        brain_region  single_unit  single_unit_phy  spike_time  waveform  snr  ISI
0  bucket_1_m026_1568757659_  pfc           0            12               [1.71006667 2.14576667 3.35306667 3.6982 22.4359 22.80743333 26.00536667 26.61663333 28.39046667 29.45536667]  []  []  [0.4357 1.2073 0.34513333 18.7377 0.37153333 3.19793333 0.61126667 1.77383333 1.0649]
1  bucket_1_m026_1568757659_  pfc           1            15               [17.8181 37.5422 37.90253333 39.36106667 46.77196667 60.7908 77.62783333 78.71806667 82.33186667 93.15203333]  []  []  [19.7241 0.36033333 1.45853333 7.4109 14.01883333 16.83703333 1.09023333 3.6138 10.82016667]
2  bucket_1_m026_1568757659_  pfc           2            19               [1.552 115.68683333 115.69416667 160.0344 342.9736 346.51526667 346.8301 348.25513333 348.2767 348.29066667]  []  []  [1.14134833e+02 7.33333333e-03 4.43402333e+01 1.82939200e+02 3.54166667e+00 3.14833333e-01 1.42503333e+00 2.15666667e-02 1.39666667e-02]
3  bucket_1_m026_1568757659_  pfc           3            37               [64.1048 145.53183333 185.3421 187.57793333 281.31683333 304.80466667 326.5742 339.3119 348.08556667 350.2595]  []  []  [81.42703333 39.81026667 2.23583333 93.7389 23.48783333 21.76953333 12.7377 8.77366667 2.17393333]
4  bucket_1_m026_1568757659_  pfc           4            40               [0.66776667 3.9761 4.8187 16.72106667 22.01286667 25.6975 29.069 29.25703333 30.07223333 31.55446667]  []  []  [3.30833333 0.8426 11.90236667 5.2918 3.68463333 3.3715 0.18803333 0.8152 1.48223333]

This computes the interspike intervals for every row without having to loop over rows. If you have a more complex procedure in mind, simply define it as a function before using apply:

def inverse_isi(spikes):
    """
    Inverse mean ISI of spikes is an alternative measure of firing rate
    """
    return 1/np.mean(np.diff(spikes))

and then apply it to the array column in the same way:

df['ISI-based firing rate']=df['spike_time'].apply(inverse_isi)
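
One thing to watch for: a row with fewer than two spikes has no interval to average, so the mean of an empty array comes out as NaN with a runtime warning. Below is a minimal sketch of a guarded variant, run on made-up data. The name `inverse_isi_safe` and the dataframe `toy_df` are hypothetical, and returning NaN for such rows is one possible choice rather than the only reasonable one. If spike times are in seconds (an assumption, since the units are not stated here), the result is in spikes per second.

```python
import numpy as np
import pandas as pd

def inverse_isi_safe(spikes):
    """Inverse mean ISI; NaN when there are fewer than two spikes."""
    spikes = np.asarray(spikes, dtype=float)
    if spikes.size < 2:
        return np.nan  # no interval to average
    return 1 / np.mean(np.diff(spikes))

# Toy data (made up for illustration)
toy_df = pd.DataFrame({
    "spike_time": [
        np.array([0.0, 0.5, 1.0, 1.5]),  # regular spiking, every ISI is 0.5
        np.array([2.0]),                 # single spike, no ISI available
    ],
})

# apply calls the function once per row and collects the scalar results
toy_df["ISI-based firing rate"] = toy_df["spike_time"].apply(inverse_isi_safe)
print(toy_df["ISI-based firing rate"])
```

Because the function returns one number per row, apply produces an ordinary float column that can be filtered or plotted like any other.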
