Statistical Analysis

This notebook contains routines for analyzing the output of keypoint-MoSeq.

Note

The interactive widgets require jupyterlab launched from the keypoint_moseq environment. They will not work properly in jupyter notebook.

Setup

We assume you have already have keypoint-MoSeq outputs that are organized as follows.

<project_dir>/               ** current working directory
└── <model_name>/            ** model directory
    ├── results.h5           ** model results
    └── grid_movies/         ** [Optional] grid movies folder

Use the code below to enter in your project directory and model name.

import keypoint_moseq as kpms

project_dir='path/to/project' # the full path to the project directory
model_name='model_name' # name of model to analyze (e.g. something like `2023_05_23-15_19_03`)

Assign Groups

The goal of this step is to assign group labels (such as “mutant” or “wildtype”) to each recording. These labels are important later for performing group-wise comparisons.

  • The code below creates a table called {project_dir}/index.csv and launches a widget for editing the table. To use the widget:

    • Click cells in the “group” column and enter new group labels.

    • Hit Save group info when you’re done.

  • If the widget doesn’t appear, you also edit the table directly in Excel or LibreOffice Calc.

kpms.interactive_group_setting(project_dir, model_name)

Generate dataframes

Generate a pandas dataframe called moseq_df that contains syllable labels and kinematic information for each frame across all the recording sessions.

moseq_df = kpms.compute_moseq_df(project_dir, model_name, smooth_heading=True) 
moseq_df
name centroid_x centroid_y heading angular_velocity velocity_px_s syllable frame_index group onset
0 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 245.691668 210.796020 -1.217558 0.000000 0.000000 7 0 mutant True
1 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 246.797705 208.926666 -1.217558 -0.079308 65.161529 7 1 mutant False
2 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 246.880092 208.750297 -1.227725 -0.160751 5.839875 7 2 mutant False
3 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 247.338747 206.761270 -1.240335 -0.246197 61.236711 7 3 mutant False
4 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 248.073132 205.021514 -1.240335 -0.336099 56.652130 7 4 mutant False
... ... ... ... ... ... ... ... ... ... ...
643906 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 217.514933 196.355583 0.193480 -0.618441 565.377392 12 53618 default False
643907 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 202.928966 183.149695 0.086206 -0.318911 590.280706 12 53619 default False
643908 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 187.950492 169.656667 0.193808 -0.126241 604.793226 12 53620 default False
643909 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 173.977080 155.679578 0.302726 -0.026824 592.919672 12 53621 default False
643910 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 158.353112 143.198606 0.136897 0.004577 599.912268 12 53622 default False

643911 rows × 10 columns

Next generate a dataframe called stats_df that contains summary statistics for each syllable in each recording session, such as its usage frequency and its distribution of kinematic parameters.

stats_df = kpms.compute_stats_df(
    project_dir,
    model_name,
    moseq_df, 
    min_frequency=0.005,       # threshold frequency for including a syllable in the dataframe
    groupby=['group', 'name'], # column(s) to group the dataframe by
    fps=30)                    # frame rate of the video from which keypoints were inferred

stats_df
group name syllable heading_mean heading_std heading_min heading_max angular_velocity_mean angular_velocity_std angular_velocity_min angular_velocity_max velocity_px_s_mean velocity_px_s_std velocity_px_s_min velocity_px_s_max frequency duration
0 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 0 -0.185726 1.675327 -3.141268 3.141216 0.006244 5.969984 -188.391070 188.423102 23.873621 18.251097 0.052929 250.661913 0.193452 1.278718
1 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 1 -0.186075 1.545966 -3.138770 3.141024 -0.047082 9.326394 -188.077053 3.634033 39.028218 31.174287 0.688662 216.440283 0.113839 0.884532
2 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 2 -0.270518 1.744452 -3.141173 3.140562 -0.237998 12.575613 -187.851585 188.203382 61.339926 45.208260 0.699143 249.923269 0.130208 0.846095
3 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 3 0.300797 1.392815 -3.139225 3.140805 -0.005278 10.043521 -188.285953 188.290531 35.085909 27.308359 0.488694 234.898832 0.112351 1.552759
4 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 4 0.004180 1.850509 -3.136158 3.138170 0.292483 13.734939 -188.161345 188.074944 38.752501 31.086649 0.536752 204.338343 0.068452 1.219203
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 12 -0.083118 2.002411 -3.127063 3.121900 1.286797 16.501619 -11.525765 186.041027 54.491124 35.084645 1.652698 219.963610 0.020379 0.280460
164 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 13 -0.460300 1.651639 -2.784471 2.309956 0.353800 1.015509 -2.823881 4.302075 41.289938 28.827368 1.202097 199.331569 0.011244 0.675000
165 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 14 -1.045921 2.058440 -3.125407 3.129775 -0.195097 18.098854 -187.265397 184.470467 29.156940 16.905312 0.490859 93.623653 0.010541 0.473333
166 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 15 -0.480634 1.747967 -2.962049 2.290607 -0.912562 1.141117 -2.711569 2.690855 38.201989 19.694319 2.358622 92.955113 0.004216 0.588889
167 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 16 -1.007293 1.526657 -3.033172 3.117420 0.956545 14.582857 -2.738624 186.765689 39.685048 27.693812 4.199067 161.559462 0.004919 0.785714

168 rows × 17 columns

Optional: Save dataframes to csv

Uncomment the code below to save the dataframes as .csv files

# import os

# # save moseq_df
# save_dir = os.path.join(project_dir, model_name) # directory to save the moseq_df dataframe
# moseq_df.to_csv(os.path.join(save_dir, 'moseq_df.csv'), index=False)
# print('Saved `moseq_df` dataframe to', save_dir)

# # save stats_df
# save_dir = os.path.join(project_dir, model_name)
# stats_df.to_csv(os.path.join(save_dir, 'stats_df'), index=False)
# print('Saved `stats_df` dataframe to', save_dir)

Label syllables

The goal of this step is name each syllable (e.g., “rear up” or “walk slowly”).

  • The code below creates an empty table at {project_dir}/{model_name}/syll_info.csv and launches an interactive widget for editing the table. To use the widget:

    • Select a syllable from the dropdown to display its grid movie.

    • Enter a name into the label column of the table (and optionally a short description too).

    • When you are done, hit Save syllable info at the bottom of the table.

  • If the widget doesn’t appear, you can also edit the file directly in Excel or LibreOffice Calc.

kpms.label_syllables(project_dir, model_name, moseq_df)