Statistical Analysis

This notebook contains routines for analyzing the output of keypoint-MoSeq.

Note

The interactive widgets require jupyterlab launched from the keypoint_moseq environment. They will not work properly in jupyter notebook.

Setup

We assume you have already have keypoint-MoSeq outputs that are organized as follows.

<project_dir>/               ** current working directory
└── <model_name>/            ** model directory
    ├── results.h5           ** model results
    └── grid_movies/         ** [Optional] grid movies folder

Use the code below to enter in your project directory and model name.

import keypoint_moseq as kpms

project_dir='path/to/project' # the full path to the project directory
model_name='model_name' # name of model to analyze (e.g. something like `2023_05_23-15_19_03`)

Assign Groups

The goal of this step is to assign group labels (such as “mutant” or “wildtype”) to each recording. These labels are important later for performing group-wise comparisons.

  • The code below creates a table called {project_dir}/index.csv and launches a widget for editing the table. To use the widget:

    • Click cells in the “group” column and enter new group labels.

    • Hit Save group info when you’re done.

  • If the widget doesn’t appear, you also edit the table directly in Excel or LibreOffice Calc.

kpms.interactive_group_setting(project_dir, model_name)

Generate dataframes

Generate a pandas dataframe called moseq_df that contains syllable labels and kinematic information for each frame across all the recording sessions.

moseq_df = kpms.compute_moseq_df(project_dir, model_name, smooth_heading=True) 
moseq_df
name centroid_x centroid_y heading angular_velocity velocity_px_s syllable frame_index group onset
0 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 245.691668 210.796020 -1.217558 0.000000 0.000000 7 0 mutant True
1 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 246.797705 208.926666 -1.217558 -0.079308 65.161529 7 1 mutant False
2 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 246.880092 208.750297 -1.227725 -0.160751 5.839875 7 2 mutant False
3 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 247.338747 206.761270 -1.240335 -0.246197 61.236711 7 3 mutant False
4 21_11_8_one_mouse.top.irDLC_resnet50_moseq_exa... 248.073132 205.021514 -1.240335 -0.336099 56.652130 7 4 mutant False
... ... ... ... ... ... ... ... ... ... ...
643906 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 217.514933 196.355583 0.193480 -0.618441 565.377392 12 53618 default False
643907 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 202.928966 183.149695 0.086206 -0.318911 590.280706 12 53619 default False
643908 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 187.950492 169.656667 0.193808 -0.126241 604.793226 12 53620 default False
643909 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 173.977080 155.679578 0.302726 -0.026824 592.919672 12 53621 default False
643910 22_27_04_cage4_mouse2_0.top.irDLC_resnet50_mos... 158.353112 143.198606 0.136897 0.004577 599.912268 12 53622 default False

643911 rows × 10 columns

Next generate a dataframe called stats_df that contains summary statistics for each syllable in each recording session, such as its usage frequency and its distribution of kinematic parameters.

stats_df = kpms.compute_stats_df(
    project_dir,
    model_name,
    moseq_df, 
    min_frequency=0.005,       # threshold frequency for including a syllable in the dataframe
    groupby=['group', 'name'], # column(s) to group the dataframe by
    fps=30)                    # frame rate of the video from which keypoints were inferred

stats_df
group name syllable heading_mean heading_std heading_min heading_max angular_velocity_mean angular_velocity_std angular_velocity_min angular_velocity_max velocity_px_s_mean velocity_px_s_std velocity_px_s_min velocity_px_s_max frequency duration
0 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 0 -0.185726 1.675327 -3.141268 3.141216 0.006244 5.969984 -188.391070 188.423102 23.873621 18.251097 0.052929 250.661913 0.193452 1.278718
1 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 1 -0.186075 1.545966 -3.138770 3.141024 -0.047082 9.326394 -188.077053 3.634033 39.028218 31.174287 0.688662 216.440283 0.113839 0.884532
2 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 2 -0.270518 1.744452 -3.141173 3.140562 -0.237998 12.575613 -187.851585 188.203382 61.339926 45.208260 0.699143 249.923269 0.130208 0.846095
3 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 3 0.300797 1.392815 -3.139225 3.140805 -0.005278 10.043521 -188.285953 188.290531 35.085909 27.308359 0.488694 234.898832 0.112351 1.552759
4 default 21_12_10_def6a_1_1.top.irDLC_resnet50_moseq_ex... 4 0.004180 1.850509 -3.136158 3.138170 0.292483 13.734939 -188.161345 188.074944 38.752501 31.086649 0.536752 204.338343 0.068452 1.219203
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 12 -0.083118 2.002411 -3.127063 3.121900 1.286797 16.501619 -11.525765 186.041027 54.491124 35.084645 1.652698 219.963610 0.020379 0.280460
164 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 13 -0.460300 1.651639 -2.784471 2.309956 0.353800 1.015509 -2.823881 4.302075 41.289938 28.827368 1.202097 199.331569 0.011244 0.675000
165 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 14 -1.045921 2.058440 -3.125407 3.129775 -0.195097 18.098854 -187.265397 184.470467 29.156940 16.905312 0.490859 93.623653 0.010541 0.473333
166 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 15 -0.480634 1.747967 -2.962049 2.290607 -0.912562 1.141117 -2.711569 2.690855 38.201989 19.694319 2.358622 92.955113 0.004216 0.588889
167 mutant 22_04_26_cage4_1_1.top.irDLC_resnet50_moseq_ex... 16 -1.007293 1.526657 -3.033172 3.117420 0.956545 14.582857 -2.738624 186.765689 39.685048 27.693812 4.199067 161.559462 0.004919 0.785714

168 rows × 17 columns

Optional: Save dataframes to csv

Uncomment the code below to save the dataframes as .csv files

# import os

# # save moseq_df
# save_dir = os.path.join(project_dir, model_name) # directory to save the moseq_df dataframe
# moseq_df.to_csv(os.path.join(save_dir, 'moseq_df.csv'), index=False)
# print('Saved `moseq_df` dataframe to', save_dir)

# # save stats_df
# save_dir = os.path.join(project_dir, model_name)
# stats_df.to_csv(os.path.join(save_dir, 'stats_df'), index=False)
# print('Saved `stats_df` dataframe to', save_dir)

Label syllables

The goal of this step is name each syllable (e.g., “rear up” or “walk slowly”).

  • The code below creates an empty table at {project_dir}/{model_name}/syll_info.csv and launches an interactive widget for editing the table. To use the widget:

    • Select a syllable from the dropdown to display its grid movie.

    • Enter a name into the label column of the table (and optionally a short description too).

    • When you are done, hit Save syllable info at the bottom of the table.

  • If the widget doesn’t appear, you can also edit the file directly in Excel or LibreOffice Calc.

kpms.label_syllables(project_dir, model_name, moseq_df) 

Compare between groups

Test for statistically significant differences between groups of recordings. The code below takes a syllable property (e.g. frequency or duration), plots its disribution for each syllable across for each group, and also tests whether the property differs significantly between groups. The results are summarized in a plot that is saved to {project_dir}/{model_name}/analysis_figures.

There are two options for setting the order of syllables along the x-axis. When order='stat', syllables are sorted by the mean value of the statistic. When order='diff', syllables are sorted by the magnitude of difference between two groups that are determined by the ctrl_group and exp_group keywords. Note ctrl_group and exp_group are not related to significance testing.

kpms.plot_syll_stats_with_sem(
    stats_df, project_dir, model_name,
    plot_sig=True,    # whether to mark statistical significance with a star
    thresh=0.05,      # significance threshold
    stat='frequency', # statistic to be plotted (e.g. 'duration' or 'velocity_px_s_mean')
    order='stat',     # order syllables by overall frequency ("stat") or degree of difference ("diff")
    ctrl_group='a',   # name of the control group for statistical testing
    exp_group='b',    # name of the experimental group for statistical testing
    figsize=(8, 4),   # figure size    
    groups=stats_df['group'].unique(), # groups to be plotted
);
Saved figure to ../../testing/demo_project/2023_08_01-10_16_25/figures/frequency_stat_stats.png
_images/7980604d1088487f8edb4c8cc1f4e9a956c558c9b74fd792cd3bef5137bb40aa.png

Transition matrices

Generate heatmaps showing the transition frequencies between syllables.

normalize='bigram' # normalization method ("bigram", "rows" or "columns")

trans_mats, usages, groups, syll_include=kpms.generate_transition_matrices(
    project_dir, model_name, normalize=normalize,
    min_frequency=0.005 # minimum syllable frequency to include
)    

kpms.visualize_transition_bigram(
    project_dir, model_name, groups, trans_mats, syll_include, normalize=normalize, 
    show_syllable_names=True # label syllables by index (False) or index and name (True)
)
Group(s): default, mutant
Saved figure to ../../testing/demo_project/2023_08_01-10_16_25/figures/transition_matrices.png
_images/7e1d3b28fffbe89c7f6f4d4bc9cfa85de768e1ebfecddb317d1e50e120400fc0.png

Syllable Transition Graph

Render transition rates in graph form, where nodes represent syllables and edges represent transitions between syllables, with edge width showing transition rate for each pair of syllables (secifically the max of the two transition rates in each direction).

# Generate a transition graph for each single group

kpms.plot_transition_graph_group(
    project_dir, model_name, 
    groups, trans_mats, usages, syll_include, 
    layout='circular',        # transition graph layout ("circular" or "spring")
    show_syllable_names=False # label syllables by index (False) or index and name (True)
)
Saved figure to ../../testing/demo_project/2023_08_01-10_16_25/figures/transition_graphs.png
_images/0554133120da3c1a2aa7b083b37b61952ab1db8587aa926e78821d52b5f6bd04.png
# Generate a difference-graph for each pair of groups.

kpms.plot_transition_graph_difference(project_dir, model_name, 
                                      groups, trans_mats, usages, syll_include, 
                                      layout='circular') # transition graph layout ("circular" or "spring")
Saved figure to ../../testing/demo_project/2023_08_01-10_16_25/figures/transition_graphs_diff.png
_images/542a3eb806cac606d8da7541661950fa9c9a061a8aceb2d00a85f599737d148b.png