Solving intractable rebase conflicts with git-bisect

Maintaining a long-lived series of changes against a fast-moving open source project can be a headache. Ideally we wouldn't put ourselves in this situation; instead we'd work with upstream to merge the changes and find better things to do with life. However, that's not always feasible.

Some of the bigger headaches are caused by merge conflicts. Sometimes conflicts are invisible because git's clever merge strategies resolve them automatically. Other times there's been significant rework of the code on both sides and you are forced to demonstrate you know what you're doing.

There comes a point where the size of the series you're maintaining and the pace of upstream development yield rebase conflicts that are practically intractable to resolve. This was the situation we found ourselves in while maintaining a set of changes implementing Swordfish on top of OpenBMC's bmcweb.

The problem of conflicts comes in two flavours: overwhelming volume, or having to reason about too many problems simultaneously to have confidence that your resolutions are consistent and correct. With a large number of out-of-tree changes we frequently encounter both at once.

A source of pain in this situation is the behaviour of git rebase, which:

1. Rolls off your changes back to your branch's fork-point
2. Resets your current branch to the state of the target branch
3. Replays the rolled-off patches on top of your new branch state
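To make these three steps concrete, here is a deliberately simplified model of git rebase <upstream> <branch>, written as a shell function. The name naive_rebase is hypothetical, and real git rebase does considerably more (it skips patches already present upstream, supports autostash and interactive editing), so treat this as an illustration of the behaviour rather than a reimplementation.

```shell
# Hypothetical, simplified model of "git rebase <upstream> <branch>".
# Real git rebase also skips patches already upstream, autostashes, etc.
naive_rebase() {
  upstream=$1 branch=$2
  # 1. Roll off: find the fork-point and list our commits, oldest first.
  #    Merge commits are skipped, as git rebase does by default.
  base=$(git merge-base "$upstream" "$branch") || return 1
  commits=$(git rev-list --reverse --no-merges "$base..$branch")
  # 2. Reset the branch to the state of the target branch.
  git checkout -q -B "$branch" "$upstream" || return 1
  # 3. Replay each rolled-off commit on top of the new branch state.
  #    Every commit is applied against the final upstream tree, so all
  #    upstream changes touching the same code surface here at once.
  for c in $commits; do
    git cherry-pick "$c" >/dev/null || return 1
  done
}
```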

Steps 2 and 3 together are the source of our intractable conflicts: each of our patches is replayed against the final state of the target branch, with no accounting for the intermediate upstream commits that changed the code underneath it.

The 'work' branch rebased on 'origin/main'. The rebase fails with conflicts that are intractable to solve.

Managing the conflicts often becomes tractable if we address them as they occur at these intermediate commits. This strategy reduces the number of problems that we need to solve simultaneously to only those introduced by the offending commit.

So, what options do we have for that?

A tedious approach is to rebase our changes on top of each new upstream commit, for N rebase operations against N new upstream commits. This is guaranteed to find the first upstream commit causing a conflict, which is the fundamental property we're after. However, it requires that we perform the rebase operation even when the current upstream commit doesn't generate any conflicts.
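The tedious approach can be sketched as a loop. This is a minimal sketch under stated assumptions: the function name rebase_one_by_one is hypothetical, and it follows only the first-parent (mainline) history of upstream so that we never rebase onto a commit from a merged side branch. Each iteration moves the series from the previous upstream commit to the next one, and the loop stops at the first rebase that conflicts, leaving git mid-rebase for us to resolve.

```shell
# Hypothetical sketch: rebase <branch> onto each commit of <base>..<upstream>
# in turn, oldest first, stopping at the first upstream commit that conflicts.
rebase_one_by_one() {
  base=$1 upstream=$2 branch=$3
  # Assumption: follow only the first-parent (mainline) history of upstream.
  for c in $(git rev-list --reverse --first-parent "$base..$upstream"); do
    # Move our series from the previous upstream commit onto the next one.
    # On conflict git rebase stops, so we only resolve against $c.
    git rebase -q --onto "$c" "$base" "$branch" || return 1
    base=$c
  done
}
```

For example, rebase_one_by_one work-base origin/main work. The cost is plain from the loop: one full rebase per upstream commit, whether or not that commit conflicts.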

An alternative approach is to rebase our changes directly onto the first upstream change that causes conflicts. However, this idea just shifts the problem: we now need to estimate which upstream changes cause conflicts ahead of time.

The 'origin/main' branch contains commits that cause the conflicts, but which are they?

Luckily, this shifted problem is equivalent to finding the commit that introduced a code defect, except that our defect is a merge conflict rather than a compile-time or runtime bug. The tool for this search problem is git bisect.

The algorithm

We'll give ourselves some git tags to make commits easier to track as we ratchet our way through. We'll also assume we have a branch called work and that we're trying to rebase it to origin/main.

With that in mind, we:

1. Tag the fork-point of work as work-base, such that git log work-base..work captures the commits we wish to rebase
2. Run git bisect start origin/main work-base to begin bisection
3. Run git bisect run sh -c '( git cherry-pick -n work-base..work && git reset --hard ) || ( git diff --stat && git cherry-pick --abort && exit 1 )' to identify the first upstream commit introducing conflicts against our series
4. Run git tag -f work-base-next bisect/bad to tag the conflicting upstream commit
5. Run git bisect reset to terminate bisection
6. Run git rebase --onto work-base-next work-base work to move the changes on work onto the conflicting commit
7. Resolve the conflicts. These are now only the conflicts between a specific change of our own and the specific upstream commit pointed at by work-base-next.
8. Run git rebase --continue to apply the remainder of the series, resolving any further conflicts as usual
9. Run git rebase -x "..." work-base-next, with your build or test command in place of "...", to exercise your conflict resolutions on each commit
10. Run git tag -f work-base work-base-next to ratchet work-base forwards
11. Return to step 2, and iterate until work-base points to origin/main
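Steps 2 to 5 are mechanical enough to wrap in a shell function. This is a sketch under assumptions: the name bisect_next_base is hypothetical, the work-base tag must already exist (on the first round, step 1 could be done with git tag work-base "$(git merge-base work origin/main)"), and the working tree must be clean. It stops short of step 6, since the rebase and conflict resolution need a human.

```shell
# Hypothetical helper for one round of the algorithm (steps 2 to 5).
# Assumes the tag work-base exists and the working tree is clean.
# Usage: bisect_next_base <upstream> <branch>, e.g. origin/main work
bisect_next_base() {
  upstream=$1 branch=$2
  # Step 2: upstream is known bad, work-base is known good.
  git bisect start "$upstream" work-base || return 1
  # Step 3: a commit is bad if our series does not cherry-pick cleanly onto it.
  # The branch name reaches the inner shell as $1, avoiding nested quoting.
  if ! git bisect run sh -c '
      ( git cherry-pick -n work-base.."$1" && git reset --hard ) ||
      ( git diff --stat && git cherry-pick --abort && exit 1 )' sh "$branch"
  then
    git bisect reset
    return 1
  fi
  # Step 4: record the first conflicting upstream commit.
  git tag -f work-base-next bisect/bad
  # Step 5: end the bisection and return to the original branch.
  git bisect reset
}
```

A round might then look like bisect_next_base origin/main work followed by git rebase --onto work-base-next work-base work, with the remaining steps carried out by hand as listed above.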

Unpacking the algorithm

Does rebasing 'work' to this commit conflict?

Setting aside the general process, we should probably work through what's happening in step 3:

git bisect run \
  sh -c '( git cherry-pick -n work-base..work && git reset --hard ) || \
         ( git diff --stat && git cherry-pick --abort && exit 1 )'

git bisect run gives us automation when we have a programmatic way to determine whether a commit is problematic. The exit status of the provided script drives the decision: 0 marks the commit as good, any status from 1 to 127 other than 125 marks it as bad, 125 skips it, and anything else aborts the bisection. Beyond that, git bisect run carries out the usual bisection process of its own accord.

As mentioned, we care about conflicts against our changes, so the first requirement is that we actually apply them. Here we use git cherry-pick for the job. We can't use git rebase in this instance, as its own manipulation of HEAD and branches would clash with the bisection git bisect has in progress. Further, we use git cherry-pick -n to apply the changes to the working tree and index without making any commits.

git cherry-pick uses its exit status to indicate success or failure of the operation, and we exploit this to determine what to do next. If the cherry-pick does not generate conflicts, it exits successfully. This is the uninteresting case¹, so we use git reset --hard to remove the uncommitted, cherry-picked changes. The first subshell then exits successfully, so the || short-circuits and the rest of the command expression never runs. git bisect run marks the commit as "good" and picks the next bisection point.

Rebasing 'work' to this commit does not conflict

If the cherry-pick fails, the second subshell shows a diffstat to give some context for the conflicts, unwinds the intermediate cherry-pick state with git cherry-pick --abort, and signals failure with exit 1. The unsuccessful exit status causes git bisect run to mark the commit as "bad" and to pick the next bisection point.

Rebasing 'work' to this commit does conflict
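The one-liner can also be unpacked into a standalone function, which is easier to read and to test outside of a bisection. This is a hypothetical variant rather than the command used in step 3: the name conflict_check is ours, it behaves the same for the clean and conflicting cases, and it returns 128 if cleaning up the tree fails, since git bisect run aborts the bisection on such a status rather than recording a good or bad result.

```shell
# Hypothetical standalone variant of the step 3 test.
# Usage: conflict_check <base> <branch>, run at the commit under test.
# Returns 0 (good) if <base>..<branch> cherry-picks cleanly, 1 (bad) if it
# conflicts, and 128 (abort the bisection) if the tree cannot be cleaned up.
conflict_check() {
  if git cherry-pick -n "$1..$2"; then
    # Clean apply: discard the uncommitted, cherry-picked changes.
    git reset -q --hard || return 128
    return 0
  fi
  # Conflict: show which files are affected, then unwind the cherry-pick.
  git diff --stat
  git cherry-pick --abort || return 128
  return 1
}
```

To use it, save it outside the repository so bisection checkouts don't touch it (say, a hypothetical $HOME/bin/conflict-check.sh) and run git bisect run sh -c '. "$HOME/bin/conflict-check.sh" && conflict_check work-base work'.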

By iteratively using bisection to find the problematic upstream commits, the strategy reduces the number of conflict resolutions we need to make simultaneously. It trades the intractable conflicts of git rebase's big-bang approach for minimised conflicts across multiple rounds of conflict resolution.

Reflection

To some degree it feels like this is the rebase process that git actually needs. In the happy case, when there are no conflicts between our local series and upstream, it degenerates to the rebase behaviour git already has today. It shines when there are conflicts.

¹ Uninteresting in the sense that we're using bisect to find the conflicting changes!