Solving intractable rebase conflicts with git-bisect
Maintaining a long-lived series of changes against a fast-moving open source project can be a headache. Ideally we wouldn't put ourselves in this situation - instead we'd work with upstream to merge the changes and find better things to do with life. However, that's not always feasible.
Some of the bigger headaches are caused by merge conflicts. Sometimes conflicts are invisible as they are solved by clever merge strategies. Other times there's been significant rework of the code on both sides and you are forced to demonstrate you know what you're doing.
There comes a point where the size of the series you're maintaining and the progress of upstream yield rebase conflicts that are practically intractable to resolve. This was the situation we found ourselves in while maintaining a set of changes implementing Swordfish on top of OpenBMC's bmcweb.
The problem of conflicts comes in two flavours: overwhelming volume, or having to reason about too many problems simultaneously to have confidence that your resolutions are consistent and correct. With a large number of out-of-tree changes we frequently encounter both at once.
A source of pain in this situation is the behaviour of git rebase, which:
1. Rolls off your changes back to your branch's fork-point
2. Resets your current branch to the state of the target branch
3. Replays the rolled off patches on top of your new branch state
Step 2 combined with 3 is the source of our intractable conflicts, as there's no accounting for our changes relative to intermediate changes from upstream.
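To make those three steps concrete, here is a rough sketch of a rebase done by hand. It is an approximation rather than how git implements it: the function name rebase_by_hand is hypothetical, the fork-point is assumed to be the merge base, and the history is assumed to be linear.

```shell
# Rough manual equivalent of rebasing BRANCH onto TARGET.
# Hypothetical helper; assumes linear history and that the fork-point
# is the merge base of the two branches.
rebase_by_hand() {
    branch=$1
    target=$2
    # 1. Find the fork-point and remember the tip of our series
    base=$(git merge-base "$target" "$branch") || return
    tip=$(git rev-parse "$branch") || return
    # 2. Reset the branch to the state of the target
    git checkout -q "$branch" || return
    git reset -q --hard "$target" || return
    # 3. Replay our series on top; every conflict against the whole
    #    span of upstream history surfaces here, all at once
    git cherry-pick "$base..$tip"
}
```

Nothing in step 3 knows which upstream commit a given conflict came from, which is the gap the rest of this post works around.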
Managing the conflicts often becomes tractable if we address them as they occur at these intermediate commits. This strategy reduces the number of problems that we need to solve simultaneously to only those introduced by the offending commit.
So, what options do we have for that?
A tedious approach is to rebase our changes on top of each new upstream commit, for N rebase operations against N new upstream commits. This is guaranteed to find the first upstream commit causing a conflict, which is the fundamental property we're after. However it requires that we perform the rebase operation even if the current upstream commit doesn't generate any conflicts.
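As a minimal sketch of that tedious approach, assuming linear upstream history: the helper below (rebase_stepwise is a hypothetical name, as are its arguments) walks the upstream commits oldest-first and rebases the series onto each one, stopping at the first that conflicts.

```shell
# Hypothetical helper: rebase BRANCH, currently based on BASE, onto every
# commit between BASE and TARGET in turn. Assumes linear upstream history.
rebase_stepwise() {
    branch=$1
    base=$2
    target=$3
    for commit in $(git rev-list --reverse "$base..$target"); do
        # One rebase per upstream commit, whether or not it conflicts
        git rebase -q --onto "$commit" "$base" "$branch" || {
            # The rebase is left in progress so the conflict can be resolved
            echo "conflict at upstream commit $commit" >&2
            return 1
        }
        base=$commit
    done
}
```

Something like `rebase_stepwise work work-base origin/main` would stop with the rebase in progress at the first conflicting upstream commit, at the cost of one rebase for every commit before it.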
An alternative approach is to rebase our changes directly to the first upstream change that causes conflicts. However, this idea just shifts the problem: we now need to work out ahead of time which upstream changes cause conflicts.
Luckily, this shifted problem is equivalent to finding the commit that introduced a code defect, just our defect is a merge conflict rather than a compile-time or runtime bug. The tool for this search problem is git bisect.
The algorithm
We'll give ourselves some git tags to make commits easier to track as we ratchet our way through. We'll also assume we have a branch called work and that we're trying to rebase it to origin/main.
With that in mind, we:
1. Tag the fork-point of work as work-base, such that git log work-base..work captures the commits we wish to rebase
2. Run git bisect start origin/main work-base to begin bisection
3. Run git bisect run sh -c '( git cherry-pick -n work-base..work && git reset --hard ) || ( git diff --stat && git cherry-pick --abort && exit 1 )' to identify the first change introducing conflicts against our series
4. Run git tag -f work-base-next to tag the conflicting upstream commit
5. Run git bisect reset to terminate bisection
6. Run git rebase --onto work-base-next work-base work to move the changes on work to the conflicting commit
7. Resolve the conflicts. The conflicts are now only those between a specific change of our own and the specific upstream commit pointed at by work-base-next.
8. Run git rebase --continue to apply the remainder of the series, resolving any further conflicts as usual
9. Use git rebase -x "..." work-base-next to exercise your conflict resolutions.
10. Run git tag -f work-base work-base-next to ratchet work-base forwards
11. Return to step 2, and iterate until work-base points to origin/main
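Steps 2 to 5 lend themselves to a small wrapper. The following is a sketch under stated assumptions rather than a definitive implementation: find_next_base is a hypothetical name, and the work branch, the work-base tag and origin/main are assumed to exist as described above. One deliberate difference from step 4 is that the sketch tags refs/bisect/bad, the ref that git bisect leaves pointing at the first bad commit, because HEAD may be left on whichever commit was tested last.

```shell
# Hypothetical wrapper around steps 2 to 5: tag the first upstream commit
# that conflicts with work-base..work as work-base-next.
find_next_base() {
    git bisect start origin/main work-base || return
    # Same predicate as step 3, split over lines for readability
    git bisect run sh -c '
        ( git cherry-pick -n work-base..work && git reset --hard ) ||
        ( git diff --stat && git cherry-pick --abort && exit 1 )
    ' || { git bisect reset; return 1; }
    # refs/bisect/bad points at the first bad commit once bisection ends
    git tag -f work-base-next refs/bisect/bad || return
    git bisect reset
}
```

After it returns, the manual part of the round remains: the rebase onto work-base-next, conflict resolution, testing, and moving work-base forwards.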
Unpacking the algorithm
Setting aside the general process, we should probably work through what's happening in step 3:
git bisect run \
sh -c '( git cherry-pick -n work-base..work && git reset --hard ) || \
( git diff --stat && git cherry-pick --abort && exit 1 )'
git bisect run gives us automation when we have a programmatic way to determine whether a commit is problematic. The exit status of the provided script drives the decision to mark a commit as good or bad. Beyond that, git bisect run executes the usual bisection process of its own accord.
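For reference, the git-bisect documentation describes how the script's exit status is interpreted: 0 marks the commit good, 125 skips it, any other status from 1 to 127 marks it bad, and anything else aborts the bisection. The helper below only illustrates that mapping; bisect_verdict is a hypothetical name and not part of git.

```shell
# Illustration only: map a script's exit status to the verdict that
# git bisect run would reach, per the git-bisect documentation.
bisect_verdict() {
    case $1 in
        0) echo good ;;
        125) echo skip ;;
        # 1 to 127; 125 was already handled above
        [1-9]|[1-9][0-9]|1[01][0-9]|12[0-7]) echo bad ;;
        *) echo abort ;;
    esac
}
```

This is why the command in step 3 ends with an explicit exit 1: it pins the failure case to a status that unambiguously means "bad".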
As mentioned, we care about conflicts against our changes, so the first requirement is that we actually apply them. Here we use git cherry-pick for the job. We can't use git rebase in this instance as it conflicts with the branch-manipulation in progress with git bisect. Further, we use git cherry-pick -n to apply the changes to the working tree without making any commits.
git cherry-pick uses its exit status to indicate success or failure of the operation. We exploit this to determine what to do next. If the cherry-pick does not generate conflicts it exits successfully. This is the uninteresting case[1], so we use git reset --hard to remove the uncommitted, cherry-picked changes. At this point we short-circuit out from the rest of the command expression as the exit status of the first command list is success. git bisect run marks the commit as "good" and picks the subsequent bisection point.
If the cherry-pick fails, the implementation shows a diffstat to give some context for the conflicts, unwinds the cherry-pick intermediate state with git cherry-pick --abort, and signals the failure of the original cherry-pick operation with exit 1 from the subshell. The unsuccessful exit status causes git bisect run to mark the change as "bad", and to pick the subsequent bisection point.
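The same logic can be spelled out as a shell function, which some may find easier to read than the one-liner. This is a sketch, not the form used above: series_applies_cleanly is a hypothetical name, and returning 128 when the clean-up itself fails is an added assumption intended to make git bisect run abort rather than record a misleading verdict.

```shell
# Hypothetical predicate: succeed if BASE..TIP applies to the current
# checkout without conflicts, leaving the working tree clean either way.
series_applies_cleanly() {
    if git cherry-pick -n "$1..$2"; then
        # Good: discard the uncommitted, cherry-picked changes
        git reset -q --hard || return 128
        return 0
    fi
    # Bad: show which files conflicted, then unwind the cherry-pick
    git diff --stat
    git cherry-pick --abort || return 128
    return 1
}
```

To use it with git bisect run, the function would need to be sourced from a file outside the repository's tracked content, so that checking out different commits during bisection leaves it alone.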
By iteratively using bisection to find problematic commits in the upstream changes the strategy reduces the number of simultaneous resolutions we need to create. It trades off intractable conflicts from the big-bang approach of git rebase for minimised conflicts across multiple rounds of conflict resolution.
Reflection
To some degree it feels like this is the rebase process that git actually needs. The happy case degenerates to the rebase behaviour git already has today, when there are no conflicts between our local series and upstream. It shines when there are conflicts.
[1] We're using bisect to find the conflicting changes!