Introducing tidylens: An R Package for Computational Visual Analysis

tidylens brings validated image and video analysis algorithms into the R tidyverse ecosystem, making computational visual culture research accessible to humanities scholars.

Nabeel Siddiqui

Over the past few months, I’ve been building tidylens, an R package that extracts visual features from images and videos. If you’ve ever tried to measure something like color or shot length across a corpus of images or films in R, you know the pain: you end up stitching together Python scripts and assorted backends just to get a number out. With tidylens, every function returns a tibble. That’s it. That’s the pitch.

The short version: tidylens takes images, video, or audio and gives you back a tidy data frame of features you can immediately use with dplyr, ggplot2, and the rest of the R ecosystem you already know.

What’s in the Box?

  • Color analysis — extract_colourfulness() computes the Hasler-Suesstrunk colourfulness metric. extract_color_moments() gives you the Stricker-Orengo color moments: a compact 9-number summary of an image’s color distribution (mean, standard deviation, and skewness per channel) that’s great for similarity search. See the batch sketch after this list.

  • Aesthetic fluency — extract_fluency_metrics() computes four measures from Mayer and Landwehr’s work on processing fluency. These predict aesthetic preferences and are pretty fun to play around with on art corpora.

  • Shot detection and cinemetrics — detect_shot_changes() finds cuts in video files, and film_compute_asl() computes average shot length (ASL).

  • Deep learning embeddings — extract_embeddings() uses pretrained models via torch to generate vector representations of images. A ResNet-50 gives you a 2048-dimensional embedding per image. Good for clustering, similarity search, and all the usual things you’d want embeddings for. See the clustering sketch after this list.

  • Audio features — Because film is more than just pictures, the package also extracts acoustic features from video soundtracks. Admittedly, I don’t know much about audio, but with a mix of Googling and ChatGPT, I think I have put together a decent solution.
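
As a taste of the batch workflow, here’s a minimal sketch of extracting color moments across a folder of images. The function names come from the list above; the assumption that each call returns a one-row tibble matches the package’s design as described here, but the exact output columns (and whether a file path is already included) are my guesses, so check the documentation.

```r
# A minimal sketch: color moments for every image in a folder.
# Assumes extract_color_moments() returns a one-row tibble per image;
# the output column names are not confirmed against the real API.
library(tidylens)
library(purrr)
library(dplyr)

images <- list.files("corpus", pattern = "\\.(jpg|png)$", full.names = TRUE)

moments <- map(images, extract_color_moments) |>
  list_rbind() |>        # one row per image
  mutate(path = images)  # attach file paths, assuming row order is preserved
```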
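And here’s the clustering sketch promised above. I’m assuming extract_embeddings() accepts a vector of file paths and returns one row per image with the embedding spread across numeric columns; if it takes one path at a time, wrap it in the same map()/list_rbind() pattern as before.

```r
# A sketch of clustering images by visual similarity.
# Assumption: extract_embeddings() returns one row per image with
# 2048 numeric embedding columns (ResNet-50); not confirmed API.
library(tidylens)
library(dplyr)

images <- list.files("corpus", pattern = "\\.(jpg|png)$", full.names = TRUE)

emb <- extract_embeddings(images)

# k-means over the numeric embedding columns to find visually similar groups
km <- emb |>
  select(where(is.numeric)) |>
  kmeans(centers = 6)

emb$cluster <- km$cluster
```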

Why Tibbles?

This might sound boring, but it’s honestly the most important design decision. A call to extract_colourfulness("image.jpg") gives you a one-row tibble. Shot detection gives you a multi-row tibble with frame numbers and timestamps. You can pipe either directly into dplyr or ggplot2.
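
To make those shapes concrete, here’s a sketch. The post promises frame numbers and timestamps in the shot-detection output; the column name timestamp (in seconds) is my assumption.

```r
library(tidylens)
library(dplyr)

# One image in, one row of features out:
extract_colourfulness("image.jpg")

# One video in, one row per detected cut -- already tidy, so dplyr
# verbs apply directly. `timestamp` (seconds) is an assumed column name.
detect_shot_changes("film.mp4") |>
  mutate(minute = floor(timestamp / 60)) |>
  count(minute, name = "cuts")
```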

In practice, this means an art historian can extract color features from a thousand paintings, group by century, and make a ggplot in like five lines of code. A film scholar can compute ASL across hundreds of films and join the results with production metadata in a single script. Everything stays in R, and everything is reproducible.
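
Here’s roughly what that art-historian pipeline looks like. The metadata file, its century column, and the colourfulness output column are stand-ins I made up for illustration; the film-scholar version is the same shape, with film_compute_asl() in place of the color call and a join against production metadata.

```r
library(tidylens)
library(purrr)
library(dplyr)
library(ggplot2)

paintings <- list.files("paintings", pattern = "\\.jpg$", full.names = TRUE)
meta <- readr::read_csv("paintings_metadata.csv")  # hypothetical: `path`, `century`

features <- map(paintings, extract_colourfulness) |>
  list_rbind() |>
  mutate(path = paintings) |>
  left_join(meta, by = "path")

ggplot(features, aes(factor(century), colourfulness)) +
  geom_boxplot() +
  labs(x = "Century", y = "Colourfulness")
```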

If you work with visual materials in R, I’d love to hear what you think.