Introducing tidylens: An R Package for Computational Visual Analysis
tidylens brings validated image and video analysis algorithms into the R tidyverse ecosystem, making computational visual culture research accessible to humanities scholars.
Over the past few months, I’ve been building tidylens, an R package that extracts visual features from images and videos. If you’ve ever tried to measure something like color or shot length across a corpus of images or films in R, you know the pain: you end up stitching together Python scripts and assorted external backends just to get a number out. With tidylens, every function returns a tibble. That’s it. That’s the pitch.
The short version: tidylens takes images, video, or audio and gives you back a tidy data frame of features you can immediately use with dplyr, ggplot2, and the rest of the R ecosystem you already know.
What’s in the Box?
- **Color analysis:** `extract_colourfulness()` computes the Hasler-Suesstrunk colourfulness metric. `extract_color_moments()` gives you the Stricker-Orengo color moments: a compact nine-number summary of an image’s color distribution that’s great for similarity search.
- **Aesthetic fluency:** `extract_fluency_metrics()` computes four measures from Mayer and Landwehr’s work on processing fluency. These predict aesthetic preferences and are pretty fun to play around with on art corpora.
- **Shot detection and cinemetrics:** `detect_shot_changes()` finds cuts in video files, and `film_compute_asl()` computes Average Shot Length (ASL).
- **Deep learning embeddings:** `extract_embeddings()` uses pretrained models via torch to generate vector representations of images. A ResNet-50 gives you a 2048-dimensional embedding per image. Good for clustering, similarity search, and all the usual things you’d want embeddings for.
- **Audio features:** Because film is more than just pictures, the package also extracts acoustic features from video soundtracks. Admittedly, I don’t know much about audio, but with a mix of Googling and ChatGPT, I think I’ve put together a decent solution.
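If you’re curious what the colourfulness number actually measures, here’s a minimal base-R sketch of the Hasler-Suesstrunk metric as described in their paper — this is an illustration of the idea, not tidylens’s actual implementation, and it assumes channel matrices scaled to [0, 1]:

```r
# Hasler-Suesstrunk colourfulness, sketched in base R.
# R, G, B: numeric matrices of equal size with values in [0, 1].
colourfulness <- function(R, G, B) {
  rg <- R - G                   # red-green opponent channel
  yb <- 0.5 * (R + G) - B       # yellow-blue opponent channel
  # Combine the spread and the mean magnitude of the opponent channels.
  sigma <- sqrt(sd(rg)^2 + sd(yb)^2)
  mu    <- sqrt(mean(rg)^2 + mean(yb)^2)
  sigma + 0.3 * mu
}

gray <- matrix(0.5, 4, 4)
colourfulness(gray, gray, gray)   # a grayscale image scores 0
```

The appeal of the metric is exactly this simplicity: two opponent channels, two summary statistics, one number per image.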
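ASL itself is simple arithmetic once you have the cut positions. A sketch, assuming you have shot start times in seconds (the timestamps below are made up, not output from `detect_shot_changes()`):

```r
# Average Shot Length from shot boundary timestamps (seconds).
cuts     <- c(0, 4.2, 9.8, 12.0, 20.5)  # hypothetical shot start times
film_end <- 24.0                         # total runtime in seconds

shot_lengths <- diff(c(cuts, film_end))  # duration of each shot
asl <- mean(shot_lengths)
asl   # 4.8 seconds per shot on average
```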
Why Tibbles?
This might sound boring, but it’s honestly the most important design decision. A call to `extract_colourfulness("image.jpg")` gives you a one-row tibble. Shot detection gives you a multi-row tibble with frame numbers and timestamps. You can pipe either directly into dplyr or ggplot2.
In practice, this means an art historian can extract color features from a thousand paintings, group by century, and make a ggplot in like five lines of code. A film scholar can compute ASL across hundreds of films and join the results with production metadata in a single script. Everything stays in R, and everything is reproducible.
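To make the shape of that workflow concrete, here’s a sketch using a mocked feature tibble in place of real tidylens output (the column names and values are invented for illustration):

```r
library(dplyr)

# Stand-in for a per-image feature tibble, as tidylens would return:
# one row per painting, feature columns joined with catalog metadata.
features <- tibble(
  painting      = c("a.jpg", "b.jpg", "c.jpg", "d.jpg"),
  century       = c(17, 17, 18, 18),
  colourfulness = c(30.1, 42.5, 55.0, 61.2)
)

by_century <- features |>
  group_by(century) |>
  summarise(mean_colourfulness = mean(colourfulness))

# From here it is one more line to a plot, e.g.:
# ggplot(by_century, aes(century, mean_colourfulness)) + geom_col()
```

Because the features arrive as ordinary tibble columns, nothing about this pipeline is specific to images: the same grouping and joining idioms work for shot lengths, embeddings, or audio features.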
If you work with visual materials in R, I’d love to hear what you think.