This tweet resonated with me while reading “Why small pull requests are better”:
500 lines of code = looks fine.
But is this actually true? In this post, I check the claim against pull requests from Flutter inner source projects.
Understanding the dataset
I extracted pull request and review data from the Flutter-Global organisation since its creation – 80000 pull requests, 3100 repositories and 180000 reviews.
For this blog post, I use only pull requests that have been merged, along with a selection of properties from these three datasets, which are explained in the table below.
|Dataset|Property|Description|
|Repositories|Primary Language|Primary language used in the repository|
|Pull requests|Changes|Sum of additions and deletions|
|Reviews|State|Review state, which can be Approved, Commented, Changes Requested or Dismissed|
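To make the "Changes" property and the merged-only filter concrete, here is a minimal Python sketch. The record shape, the field names (`repository`, `additions`, `deletions`, `merged`) and the sample values are assumptions made for illustration; they are not the actual export format used for this analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical record shape; the field names are assumptions.
    repository: str
    additions: int
    deletions: int
    merged: bool

    @property
    def changes(self) -> int:
        # "Changes" as defined in the table: additions plus deletions.
        return self.additions + self.deletions


def merged_only(pull_requests):
    """Keep only the pull requests that have been merged."""
    return [pr for pr in pull_requests if pr.merged]


# Made-up sample data, for illustration only.
sample = [
    PullRequest("app-service", additions=40, deletions=10, merged=True),
    PullRequest("app-service", additions=900, deletions=300, merged=False),
    PullRequest("chef-config", additions=5, deletions=2, merged=True),
]

merged = merged_only(sample)
merged_changes = [pr.changes for pr in merged]
print(merged_changes)
```

The unmerged pull request is dropped before any size is computed, so it cannot skew the distribution.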
Pull request size distribution
Let’s look at the PR size distribution:
Most pull requests have up to 1000 changes. However, the pull request size distribution can vary with the purpose of the repository. For example, 100 changes in an application repository may be considered a small pull request, but in a configuration repository 100 changes can represent a significant configuration change.
In order to determine if repository type influences pull request distribution, let’s look at the pull request distribution for Java* and Ruby** repositories:
The pull request distribution does, indeed, differ depending on the repository's primary language. This blog post therefore focuses only on Java repositories, to avoid drawing a conclusion so general that it would be inaccurate for any single repository type.
Having our dataset with only Java repositories and their respective pull requests and reviews, I decided to categorise pull requests into three size categories, according to the following rules:
- small pull requests have up to 100 changes
- medium pull requests have more than 100 and fewer than 1000 changes
- large pull requests have at least 1000 changes
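The three rules above can be sketched as a small Python function. The function name `size_category` is an assumption for illustration; the boundaries follow the rules as stated (100 is still small, 1000 is already large).

```python
def size_category(changes: int) -> str:
    """Classify a pull request by its number of changes (additions + deletions)."""
    if changes <= 100:
        return "small"   # up to 100 changes
    if changes < 1000:
        return "medium"  # more than 100 and fewer than 1000 changes
    return "large"       # at least 1000 changes


# Boundary values, to show where each category starts and ends.
for changes in (100, 101, 999, 1000):
    print(changes, size_category(changes))
```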
The vast majority of pull requests fall under the small category, which is reassuring.
* Commonly used language for application repositories
** Commonly used language for Chef repositories, which are configuration repositories
Pull request size impact on reviews
To start, let’s take a look at the review state distribution per pull request size category:
The first thing to note is that the CHANGES_REQUESTED type of review is not widely used, which suggests that most reviewers use the COMMENTED type of review to suggest changes.
Bigger pull requests have more COMMENTED type reviews than smaller pull requests, which suggests that the bigger the pull request, the more effort it takes to get it into a ready-for-approval state. On the positive side, this also shows that review effort does increase as pull requests get bigger.
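As a rough illustration of how such a breakdown could be computed, the sketch below counts review states per size category. It assumes each review is available as a `(pull_request_changes, review_state)` pair; that input shape, the function names and the sample data are hypothetical and not taken from the actual analysis.

```python
from collections import Counter, defaultdict


def size_category(changes: int) -> str:
    """Size rules from this post: small <= 100, medium < 1000, large >= 1000."""
    if changes <= 100:
        return "small"
    if changes < 1000:
        return "medium"
    return "large"


def review_states_by_size(reviews):
    """Count review states per size category.

    `reviews` is an iterable of (pull_request_changes, review_state) pairs,
    which is an assumed input shape for this sketch.
    """
    counts = defaultdict(Counter)
    for changes, state in reviews:
        counts[size_category(changes)][state] += 1
    return counts


# Made-up sample data, for illustration only.
sample_reviews = [
    (50, "APPROVED"),
    (50, "COMMENTED"),
    (400, "COMMENTED"),
    (400, "COMMENTED"),
    (400, "APPROVED"),
    (2500, "COMMENTED"),
]

distribution = review_states_by_size(sample_reviews)
for category in ("small", "medium", "large"):
    print(category, dict(distribution[category]))
```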
Although bigger pull requests have more reviews, it doesn’t mean those reviews are as thorough as the ones for smaller pull requests. Let’s take a look at review count distribution per 100 changes:
It’s possible to see that as the pull request size increases, the number of reviews per 100 changes decreases.
This blog post showed that:
- pull request size distribution depends on the repository type (application, config, etc.)
- the majority of pull requests of Java repositories have up to 100 changes
- bigger pull requests have more reviews of type COMMENTED, which most likely represent suggestions for change
- as pull request size increases, the number of reviews per 100 changes decreases, which suggests that reviews tend to become less thorough
by: Tiago Almeida
tags: Pull Requests