4. π¦ MISAW Dataset PreprocessingΒΆ
This notebook is intended to preprocess data from the MISAW dataset, commonly used for computer vision and machine learning tasks. It covers:
Downloading and extracting the dataset
Exploring the data structure
Parsing image and label files
π§° Importing Required LibrariesΒΆ
This section loads essential Python libraries like os, cv2, glob, and pandas which are needed for handling files, images, and data manipulation.
[1]:
import os
import cv2
import os
import glob
import pandas as pd
π₯ Downloading the DatasetΒΆ
Here we download the MISAW dataset in .zip format from a Dropbox link. This dataset contains videos and annotations for surgical workflow analysis.
[2]:
!wget -O MISAW.zip https://www.dropbox.com/scl/fi/psmlokrc5ms958ggqyv3u/MISAW.zip?rlkey=v91dz437npon5zz10olrbcqcd&st=54qvf31m&dl=0
--2025-03-26 11:57:17-- https://www.dropbox.com/scl/fi/psmlokrc5ms958ggqyv3u/MISAW.zip?rlkey=v91dz437npon5zz10olrbcqcd
Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.18, 2620:100:6021:18::a27d:4112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com/cd/0/inline/CmmpI3wyR5_tPJy7n5y9tf_oS3KPpKLx_9U8HksV7ZvYG2aPwmk5bBZhGZpp_MkbzVHVAvgLvkhCXhopOfhORd2U-8LXcbpvHfXK5FqGlsGgiONJp0xEnFCGp8J8snwtylxxHt7HAHw_za1Or3Pggoo0/file# [following]
--2025-03-26 11:57:17-- https://ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com/cd/0/inline/CmmpI3wyR5_tPJy7n5y9tf_oS3KPpKLx_9U8HksV7ZvYG2aPwmk5bBZhGZpp_MkbzVHVAvgLvkhCXhopOfhORd2U-8LXcbpvHfXK5FqGlsGgiONJp0xEnFCGp8J8snwtylxxHt7HAHw_za1Or3Pggoo0/file
Resolving ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com (ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com)... 162.125.65.15, 2620:100:6021:15::a27d:410f
Connecting to ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com (ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com)|162.125.65.15|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /cd/0/inline2/CmmM0ycv4drPTVNKeAud9nrLuqUaZKKYAJxQQ0rD75yitrkRrUMvgvjkVkpVy-P1sbUWqcbV5ZnDa9mPv2CL5NW4sh0nB3eET26PVJSObiiBia3LqmjTXaEfgTGkc4HfviZ3fy5gToa_TGZSn2dEfT3CBR6WcwZeCKq00CXRBmvBcLWX0lF6KGz4uaiUHQ528YKbAytcAk5FsGWcAFIEO0YWps_7bG5dj_Ca-KJb751ndhx3eCtPrnIQFy7SxaZherKvIaY9DJcVS8QI3bXHdEbZV9rKnhTYYKgo-0goqVZ_X8hKFgrO1SP3DKjU4OlVQYwpL62a9aXv2X069GfNqq5-i9faASF6ln7ECTGqEpK9oqHR0QBG8tvzHNWJXF7Ov1Y/file [following]
--2025-03-26 11:57:18-- https://ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com/cd/0/inline2/CmmM0ycv4drPTVNKeAud9nrLuqUaZKKYAJxQQ0rD75yitrkRrUMvgvjkVkpVy-P1sbUWqcbV5ZnDa9mPv2CL5NW4sh0nB3eET26PVJSObiiBia3LqmjTXaEfgTGkc4HfviZ3fy5gToa_TGZSn2dEfT3CBR6WcwZeCKq00CXRBmvBcLWX0lF6KGz4uaiUHQ528YKbAytcAk5FsGWcAFIEO0YWps_7bG5dj_Ca-KJb751ndhx3eCtPrnIQFy7SxaZherKvIaY9DJcVS8QI3bXHdEbZV9rKnhTYYKgo-0goqVZ_X8hKFgrO1SP3DKjU4OlVQYwpL62a9aXv2X069GfNqq5-i9faASF6ln7ECTGqEpK9oqHR0QBG8tvzHNWJXF7Ov1Y/file
Reusing existing connection to ucdefc310fd419b691f38459327c.dl.dropboxusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 965450623 (921M) [application/zip]
Saving to: βMISAW.zipβ
MISAW.zip 100%[===================>] 920.72M 32.0MB/s in 26s
2025-03-26 11:57:44 (36.0 MB/s) - βMISAW.zipβ saved [965450623/965450623]
π¦ Extracting the DatasetΒΆ
This cell unzips the downloaded file to access the raw data.
[3]:
!unzip -qq MISAW.zip
ποΈ Frame Extraction from VideosΒΆ
This section reads the videos and extracts every N-th frame (controlled by resample_rate). Each video gets its own subdirectory of frames, organized as:
MISAW/
βββ train/
βββ Frames/
βββ <video_id>/frame_0000.jpg
These images will be later paired with annotations.
[4]:
import os
import cv2 # OpenCV is used for handling video files and frame extraction
# ----------------------------------------
# 1. Configuration
# ----------------------------------------
# Choose how often to save a frame (e.g., every 120th frame)
# A higher number results in fewer frames being saved. Start high for fast testing.
resample_rate = 120
# Path to the folder containing input videos
video_folder = 'MISAW/train/Video'
# Path where extracted frames will be saved
frames_folder = 'MISAW/train/Frames'
# Create the frames folder if it doesn't exist
os.makedirs(frames_folder, exist_ok=True)
# ----------------------------------------
# 2. Process each video file in the folder
# ----------------------------------------
for video_file in os.listdir(video_folder):
# Check for supported video formats
if video_file.endswith(('.mp4', '.avi', '.mov')):
video_path = os.path.join(video_folder, video_file) # Full path to video file
video_name = os.path.splitext(video_file)[0] # Extract video name without extension
# Create a subfolder for the frames of this video
video_frames_path = os.path.join(frames_folder, video_name)
os.makedirs(video_frames_path, exist_ok=True)
# Open the video file
cap = cv2.VideoCapture(video_path)
# Counters to keep track of frames read and saved
frame_count = 0 # Total number of frames read
saved_count = 0 # Number of frames saved
# ----------------------------------------
# 3. Read frames one-by-one
# ----------------------------------------
while True:
success, frame = cap.read() # Read one frame
if not success:
break # End of video
# Save every N-th frame (based on resample_rate)
if frame_count % resample_rate == 0:
# Save frame with a 4-digit padded name (e.g., frame_0001.jpg)
# `:04d` formats the integer with 4 digits, padded with zeros
frame_name = f"frame_{saved_count:04d}.jpg"
# Full path to save the frame
frame_path = os.path.join(video_frames_path, frame_name)
# Save the frame as an image file
cv2.imwrite(frame_path, frame)
# Increment saved frame counter
saved_count += 1
# Increment total frame counter
frame_count += 1
# Release the video capture object
cap.release()
# Print a summary for this video
print(f"Saved {saved_count} frames from {video_file}")
# ----------------------------------------
# 4. Final message
# ----------------------------------------
print("Done!")
Saved 123 frames from 2_3.mp4
Saved 100 frames from 2_2.mp4
Saved 57 frames from 1_3.mp4
Saved 40 frames from 3_2.mp4
Saved 50 frames from 2_6.mp4
Saved 79 frames from 2_1.mp4
Saved 32 frames from 6_1.mp4
Saved 63 frames from 1_1.mp4
Saved 47 frames from 1_2.mp4
Saved 49 frames from 3_3.mp4
Saved 61 frames from 6_2.mp4
Saved 108 frames from 2_5.mp4
Saved 51 frames from 6_3.mp4
Saved 37 frames from 3_1.mp4
Saved 83 frames from 1_4.mp4
Saved 36 frames from 6_4.mp4
Saved 63 frames from 2_4.mp4
Done!
π‘ Explanation: f-strings and formattingΒΆ
An f-string allows inline expressions in strings:
name = "Benjamin"
print(f"Hello, {name}!") # Outputs: Hello, Benjamin!
When saving frame names, we want them zero-padded to 4 digits so theyβre sorted correctly:
frame_num = 7
print(f"frame_{frame_num:04d}.jpg") # Outputs: frame_0007.jpg
0β pad with zeros4β total length of 4 digitsdβ itβs a decimal (integer)
This ensures your filenames look like: frame_0000.jpg, frame_0001.jpg, β¦, frame_0123.jpg.
π§Ύ Parsing Annotations and Building Frame-Label PairsΒΆ
Here we parse the .txt annotation files (one per video) to extract surgical phases, then pair each extracted frame with a corresponding label. We also map textual phase labels to numeric IDs, which is important for training machine learning models.
[5]:
# Folder where the annotation .txt files are stored (one per video)
annotation_folder = 'MISAW/train/Procedural decription'
# Folder where extracted frames from videos are stored (in subfolders per video)
frames_folder = 'MISAW/train/Frames'
# This is the same rate you used to extract frames from videos (e.g., every 120th frame)
# It must match or your labels won't align with the frames!
resample_rate = 120
# --- Collect all unique phases first ---
# We'll store all unique phase labels (like "Idle", "Suturing", etc.) in this set
all_phases = set()
# Find all annotation files in the folder (e.g., 1_1_annotation.txt, 1_2_annotation.txt, etc.)
annotation_files = sorted(glob.glob(os.path.join(annotation_folder, '*_annotation.txt')))
# Loop through each annotation file and collect unique phase names
for anno_file in annotation_files:
df = pd.read_csv(anno_file, sep='\t') # Load the .txt file into a DataFrame
all_phases.update(df['Phase'].unique()) # Add unique phases to the set
# Create a dictionary to map each phase string to a unique integer ID
# Useful for training machine learning models
phase_to_id = {name: i for i, name in enumerate(sorted(all_phases))}
# Create the inverse mapping (ID to phase name) for visualization later
id_to_phase = {i: name for name, i in phase_to_id.items()}
# --- Build (frame_path, label) pairs ---
# This list will hold tuples like: ("path/to/frame.jpg", 2) β (frame, label_id)
all_data = []
# Go through each annotation file (one per video)
for anno_file in annotation_files:
# Extract the video ID from the filename (e.g., "1_1_annotation.txt" β "1_1")
video_id = os.path.basename(anno_file).replace('_annotation.txt', '')
# Construct the path to the corresponding frame folder
frame_dir = os.path.join(frames_folder, video_id)
# Read the annotation file into a DataFrame
df = pd.read_csv(anno_file, sep='\t')
# Get the full list of phases (one per original video frame)
phases = df['Phase'].tolist()
# Resample: only keep every N-th label (e.g., every 120th label)
sampled_phases = phases[::resample_rate]
# Convert phase strings to numeric labels using our earlier mapping
sampled_ids = [phase_to_id[p] for p in sampled_phases]
# Get the list of frame image paths, sorted so they match the order of labels
frame_paths = sorted(glob.glob(os.path.join(frame_dir, '*.jpg')))
# Sanity check: if the number of frames doesnβt match the number of labels, skip this video
if len(frame_paths) != len(sampled_ids):
print(f"β οΈ Mismatch for {video_id}: {len(frame_paths)} frames vs {len(sampled_ids)} labels")
continue
# Add all (frame_path, label_id) pairs to our global list
all_data.extend(zip(frame_paths, sampled_ids))
# Final print to confirm total number of samples loaded
print(f"β
Total samples: {len(all_data)}")
β
Total samples: 1079
Visualize images and phasesΒΆ
[6]:
import random
import matplotlib.pyplot as plt
from PIL import Image
from collections import defaultdict
# CONFIGURATION
# Number of sample images to display per phase (i.e., class/label)
samples_per_phase = 3
# Create a dictionary to group image paths by their label (phase ID)
# defaultdict(list) automatically creates an empty list for new keys
label_to_paths = defaultdict(list)
# all_data is assumed to be a list of (frame_path, label_id) pairs
# Here we organize all frame paths under their respective label IDs
for path, label_id in all_data:
label_to_paths[label_id].append(path)
# Determine the number of unique phases (rows in the final plot)
n_rows = len(label_to_paths)
# Number of columns equals the number of samples we want to show per phase
n_cols = samples_per_phase
# Create a grid of subplots (n_rows x n_cols)
# figsize sets the overall size of the figure (in inches)
fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 4 * n_rows))
# Add a main title above all the subplots
fig.suptitle("Random Samples per Phase", fontsize=18)
# Get all label IDs in sorted order (for consistent row display)
sorted_labels = sorted(label_to_paths.keys())
# Loop over each row (i.e., each unique label/phase)
for row, label_id in enumerate(sorted_labels):
# Convert label ID back to its readable name using id_to_phase mapping
label_name = id_to_phase[label_id]
# Randomly sample a few images from this label
# Ensure we don't try to sample more images than we actually have
samples = random.sample(label_to_paths[label_id], min(samples_per_phase, len(label_to_paths[label_id])))
# For each column (i.e., each sampled image for this label)
for col in range(samples_per_phase):
# Access the correct subplot (row x col)
# If there's only one row, axes may be 1D
ax = axes[row, col] if n_rows > 1 else axes[col]
# Only plot if we have enough samples for this column
if col < len(samples):
# Open the image file using PIL
img = Image.open(samples[col])
# Display the image in the subplot
ax.imshow(img)
# Set the subplot title to the phase name
ax.set_title(f"{label_name}", fontsize=10)
# Remove axis ticks and labels for a cleaner look
ax.axis('off')
# Adjust layout to prevent overlaps
plt.tight_layout()
# Adjust top spacing to make room for the main title
plt.subplots_adjust(top=0.95)
# Display the final grid of images
plt.show()
π TODO: Customize the Annotation GranularityΒΆ
"Phase" column in the annotation files."Verb_Left", "Tool_Right", etc.)."Phase" with the desired column name when reading annotations.*_annotation.txt file inside the folder:MISAW/train/Procedural description/π Dataset Formats for Surgical Phase RecognitionΒΆ
In this notebook, we explore three common dataset formats used in computer vision and deep learning. All formats are based on the original MISAW/train/Frames and MISAW/train/Procedural description data, and are designed to support different model architectures and training needs:
1. COCO-style (JSON-based)ΒΆ
annotations.json file, where each entry includes the image path, video ID, frame name, and label.2. ImageNet-style (Folder-based)ΒΆ
Suturing/, Idle/, etc.), as expected by libraries like torchvision.datasets.ImageFolder.3. CSV-based (Flat Index)ΒΆ
annotations.csv file, which can be loaded with pandas for quick filtering, sampling, and data manipulation.Each format comes with TODO sections in the scripts β the idea is for you to:
π§ Fill in the missing parts (e.g., replace
"Phase"with another label like"Step")β Use the validation cells provided to check if the format is correctly built
Once completed, these formats will allow you to train and evaluate models easily using PyTorch or other frameworks.
ποΈ Converting to ImageNet-style FormatΒΆ
Organizes the dataset in a class-wise folder structure suitable for image classification tasks, like this:
MISAW_imnet/
βββ Phase1/
β βββ video1_frame_0000.jpg
βββ Phase2/
βββ video2_frame_0000.jpg
[7]:
import os # For directory operations
import glob # For listing files using patterns
import pandas as pd # For reading annotation files
import shutil # For copying images
# --- CONFIGURATION ---
resample_rate = 120 # Use the same value you used when extracting frames
# Path to folder containing annotation files (one .txt file per video)
annotation_folder = 'MISAW/train/Procedural decription'
# Path to the folder where extracted frames are stored
frames_folder = 'MISAW/train/Frames'
# Path to the output ImageNet-style dataset
output_root = 'MISAW_imnet'
os.makedirs(output_root, exist_ok=True) # Create output folder if it doesn't exist
# --- BUILD IMAGE FOLDER STRUCTURE BY PHASE (LABEL) ---
# Loop through all annotation files (e.g., 1_1_annotation.txt, 1_2_annotation.txt, ...)
for anno_file in glob.glob(f'{annotation_folder}/*_annotation.txt'):
# Extract the video ID (e.g., "1_1" from "1_1_annotation.txt")
video_id = os.path.basename(anno_file).replace('_annotation.txt', '')
# Read the annotation file as a DataFrame
df = pd.read_csv(anno_file, sep='\t')
# Extract the 'Phase' column and resample it (e.g., every 120th label)
phases = df['Phase'].tolist()[::resample_rate]
# Get the corresponding frame image paths (already extracted frames)
frame_paths = sorted(glob.glob(f"{frames_folder}/{video_id}/*.jpg"))
# Check that the number of labels matches the number of frames
if len(phases) != len(frame_paths):
print(f"β οΈ Skipping {video_id}: mismatched {len(phases)} labels vs {len(frame_paths)} frames")
continue # Skip this video if the lengths don't match
# For each (frame, phase) pair
for i, (frame_path, phase) in enumerate(zip(frame_paths, phases)):
# Create the folder for this phase if it doesn't exist
phase_folder = os.path.join(output_root, phase)
os.makedirs(phase_folder, exist_ok=True)
# Create a consistent new filename (e.g., "1_1_frame_0004.jpg")
new_name = f"{video_id}_frame_{i:04d}.jpg"
# Copy the frame to the corresponding phase folder
shutil.copy(frame_path, os.path.join(phase_folder, new_name))
# Final message
print(f"β
ImageNet-style folders created in {output_root}/")
β
ImageNet-style folders created in MISAW_imnet/
[8]:
import os
from collections import defaultdict
# Path to ImageNet root
imnet_root = 'MISAW_imnet'
phases = [d for d in os.listdir(imnet_root) if os.path.isdir(os.path.join(imnet_root, d))]
assert phases, "β No class folders found in ImageNet-style structure"
# Collect file counts per phase
phase_counts = defaultdict(int)
total = 0
for phase in phases:
path = os.path.join(imnet_root, phase)
files = [f for f in os.listdir(path) if f.endswith(('.jpg', '.png'))]
phase_counts[phase] = len(files)
total += len(files)
print(f"β
Found {len(phases)} phases with a total of {total} images")
# Show sample per phase
for phase in sorted(phase_counts.keys())[:3]:
print(f"π {phase}: {phase_counts[phase]} images")
sample = os.listdir(os.path.join(imnet_root, phase))[0]
print(f" e.g., {sample}")
β
Found 3 phases with a total of 1079 images
π Idle: 28 images
e.g., 2_4_frame_0001.jpg
π Knot tying: 659 images
e.g., 2_2_frame_0027.jpg
π Suturing: 392 images
e.g., 6_4_frame_0003.jpg
π Converting to CSV-style FormatΒΆ
Stores the dataset in a flat folder of frames and uses a CSV file to keep track of the mapping between frame paths and their class labels:
MISAW_csv/
βββ Frames/
β βββ video1_frame_0000.jpg
βββ annotations.csv
[9]:
import os # For working with directories
import glob # For listing files with wildcard patterns
import pandas as pd # For reading/writing CSV and annotation files
import shutil # For copying image files
# --- CONFIGURATION ---
resample_rate = 120 # Must match the rate used when extracting frames
# Folder where annotation .txt files are stored (e.g., "1_1_annotation.txt")
annotation_folder = 'MISAW/train/Procedural decription'
# Folder where extracted frames are stored (per video folder)
frames_folder = 'MISAW/train/Frames'
# Folder where the final CSV-based dataset will be saved
output_root = 'MISAW_csv'
# Subfolder to store all copied frames in a flat format (not per phase or per video)
frames_out_root = os.path.join(output_root, 'Frames')
os.makedirs(frames_out_root, exist_ok=True) # Create folder if it doesn't exist
# List to store all rows of the CSV (each row = one frame and its label)
rows = []
# --- PROCESS EACH VIDEO ANNOTATION FILE ---
# Loop through all annotation files (one per video)
for anno_file in glob.glob(f'{annotation_folder}/*_annotation.txt'):
# Extract the video ID from the filename (e.g., "1_1" from "1_1_annotation.txt")
video_id = os.path.basename(anno_file).replace('_annotation.txt', '')
# Load the annotation file as a DataFrame
df = pd.read_csv(anno_file, sep='\t')
# Get the 'Phase' labels and resample them (e.g., every 120th label)
phases = df['Phase'].tolist()[::resample_rate]
# Get corresponding resampled frame paths
frame_paths = sorted(glob.glob(f"{frames_folder}/{video_id}/*.jpg"))
# Ensure that the number of frames matches the number of labels
if len(phases) != len(frame_paths):
print(f"β οΈ Skipping {video_id}: mismatched {len(phases)} labels vs {len(frame_paths)} frames")
continue
# Process each (frame, label) pair
for i, (frame_path, phase) in enumerate(zip(frame_paths, phases)):
# Create a consistent new filename (e.g., "1_1_frame_0004.jpg")
new_frame_name = f"{video_id}_frame_{i:04d}.jpg"
# Path where this frame will be copied
new_frame_path = os.path.join(frames_out_root, new_frame_name)
# Copy the frame to the output folder
shutil.copy(frame_path, new_frame_path)
# Create a new row entry for the CSV file
rows.append({
"video_id": video_id,
"frame_name": new_frame_name,
"path": new_frame_path,
"label": phase
})
# --- SAVE THE CSV FILE ---
# Convert the list of rows to a DataFrame
df_out = pd.DataFrame(rows)
# Save the DataFrame as a CSV file
df_out.to_csv(os.path.join(output_root, 'annotations.csv'), index=False)
# Final confirmation message
print(f"β
CSV-style index saved in {output_root}/annotations.csv")
β
CSV-style index saved in MISAW_csv/annotations.csv
[10]:
import os
import pandas as pd
# Path to CSV structure
csv_root = 'MISAW_csv'
csv_file = os.path.join(csv_root, 'annotations.csv')
frames_folder = os.path.join(csv_root, 'Frames')
# Check existence
assert os.path.exists(csv_file), "β annotations.csv not found"
assert os.path.isdir(frames_folder), "β Frames folder not found"
# Load CSV and validate structure
df = pd.read_csv(csv_file)
# Check column names
expected_cols = {'video_id', 'frame_name', 'path', 'label'}
missing_cols = expected_cols - set(df.columns)
assert not missing_cols, f"β Missing columns in CSV: {missing_cols}"
# Check path existence
missing = df[~df['path'].apply(os.path.exists)]
# Print summary
print(f"β
CSV loaded with {len(df)} rows")
print(f"π Sample row:\n{df.iloc[0]}")
if not missing.empty:
print(f"β οΈ {len(missing)} listed frames are missing from disk")
else:
print("β
All listed frame paths exist")
β
CSV loaded with 1079 rows
π Sample row:
video_id 1_4
frame_name 1_4_frame_0000.jpg
path MISAW_csv/Frames/1_4_frame_0000.jpg
label Idle
Name: 0, dtype: object
β
All listed frame paths exist
π Converting to COCO-style FormatΒΆ
This section creates a new structure suitable for COCO-style datasets used in deep learning. It includes:
MISAW_coco/
βββ Frames/
β βββ video1_frame_0000.jpg
βββ annotations.json
βββ phase_to_id.json
This format is helpful for multi-label or object detection tasks.
[11]:
import os # For file and directory operations
import glob # For finding files using wildcard patterns
import pandas as pd # For reading annotation files
import json # For saving annotations in JSON format
import shutil # For copying images
# --- CONFIGURATION ---
resample_rate = 120 # Match this with your frame extraction step size
# Folder where annotation .txt files are stored
annotation_folder = 'MISAW/train/Procedural decription'
# Folder where frames are stored (e.g., MISAW/train/Frames/1_1/frame_0000.jpg)
frames_folder = 'MISAW/train/Frames'
# Output root folder for the new COCO-style dataset
output_root = 'MISAW_coco'
# Inside this, all frames will be copied to one flat "Frames" folder
frames_out_root = os.path.join(output_root, 'Frames')
os.makedirs(frames_out_root, exist_ok=True) # Create the folder if it doesn't exist
# --- STEP 1: COLLECT UNIQUE PHASES FROM ALL ANNOTATIONS ---
# Set to store all phase names (e.g., Idle, Suturing, etc.)
all_phases = set()
# Loop through each annotation file and collect phase names
for anno_file in glob.glob(f'{annotation_folder}/*_annotation.txt'):
df = pd.read_csv(anno_file, sep='\t') # Load tab-separated .txt file as a DataFrame
all_phases.update(df['Phase'].unique()) # Add unique phases to the set
# Create dictionaries to map phases to integer IDs and back
phase_to_id = {p: i for i, p in enumerate(sorted(all_phases))} # e.g., "Suturing" β 2
id_to_phase = {i: p for p, i in phase_to_id.items()} # e.g., 2 β "Suturing"
# --- STEP 2: BUILD JSON ENTRIES AND COPY FRAMES ---
entries = [] # This will hold all annotation entries
# Process each video annotation
for anno_file in glob.glob(f'{annotation_folder}/*_annotation.txt'):
video_id = os.path.basename(anno_file).replace('_annotation.txt', '') # e.g., "1_1"
# Read annotation file
df = pd.read_csv(anno_file, sep='\t')
# Resample phase labels to match saved frames (e.g., take every 120th label)
phases = df['Phase'].tolist()[::resample_rate]
# Get list of resampled frame image paths
frame_dir = os.path.join(frames_folder, video_id)
frame_paths = sorted(glob.glob(os.path.join(frame_dir, '*.jpg')))
# Ensure the number of frames and labels match
if len(phases) != len(frame_paths):
print(f"β οΈ Skipping {video_id}: mismatched {len(phases)} labels vs {len(frame_paths)} frames")
continue
# Loop over each frame-label pair
for i, (frame_path, phase) in enumerate(zip(frame_paths, phases)):
# Create a new frame name (e.g., "1_1_frame_0003.jpg")
new_frame_name = f"{video_id}_frame_{i:04d}.jpg"
# Full destination path for the copied frame
new_frame_path = os.path.join(frames_out_root, new_frame_name)
# Copy frame to the output folder
shutil.copy(frame_path, new_frame_path)
# Add a dictionary entry for this frame in COCO-style format
entries.append({
"video": video_id,
"frame": new_frame_name,
"path": new_frame_path,
"label": phase,
"label_id": phase_to_id[phase]
})
# --- STEP 3: SAVE TO DISK ---
# Save all frame annotations to a JSON file
with open(os.path.join(output_root, 'annotations.json'), 'w') as f:
json.dump(entries, f, indent=2)
# Save the phase-to-ID mapping for future use
with open(os.path.join(output_root, 'phase_to_id.json'), 'w') as f:
json.dump(phase_to_id, f, indent=2)
# Final confirmation
print(f"β
COCO-style structure created in {output_root}/ with {len(entries)} annotated frames.")
β
COCO-style structure created in MISAW_coco/ with 1079 annotated frames.
[12]:
import os
import json
# Path to COCO folder
coco_root = 'MISAW_coco'
json_path = os.path.join(coco_root, 'annotations.json')
frames_folder = os.path.join(coco_root, 'Frames')
# Check paths
assert os.path.exists(json_path), "β annotations.json not found"
assert os.path.isdir(frames_folder), "β Frames folder not found"
# Load and check content
with open(json_path, 'r') as f:
annotations = json.load(f)
assert isinstance(annotations, list), "β annotations.json should contain a list"
assert 'frame' in annotations[0] and 'label' in annotations[0], "β Missing keys in annotation entries"
# Check file existence and print sample
missing = [ann for ann in annotations if not os.path.exists(ann['path'])]
print(f"β
{len(annotations)} total entries")
print(f"π Sample entry:\n{annotations[0]}")
print(f"β
Found {len(annotations) - len(missing)} valid frame paths")
if missing:
print(f"β οΈ {len(missing)} frames listed but not found on disk")
β
1079 total entries
π Sample entry:
{'video': '1_4', 'frame': '1_4_frame_0000.jpg', 'path': 'MISAW_coco/Frames/1_4_frame_0000.jpg', 'label': 'Idle', 'label_id': 0}
β
Found 1079 valid frame paths