I would rely on consumer contract tests here, which SHOULD (RFC 2119) test both syntax and semantics (aka behaviour) sufficiently for a consumer to operate correctly with your API. The name semantic versioning hopefully reinforces this idea :)
If none of your API consumers have provided a contract test that relies on a bug, then it can be fixed without that major version uptick; otherwise it's comms time with those who rely on the bug.
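To make that concrete, here is a minimal sketch of what a consumer contract test can look like (all names here are hypothetical, not a real contract-testing framework): the consumer pins down both the response shape it parses and the behaviour it relies on, so a provider change that breaks either fails this test before it breaks production.

```python
def lookup_user(user_id, transport):
    """Hypothetical consumer-side client; transport stands in for the real API."""
    response = transport(f"/users/{user_id}")
    # Syntax the consumer depends on: these keys must exist in the response.
    return {"id": response["id"], "active": response["active"]}

def fake_provider(path):
    """Stub provider; in a real setup this exchange would be verified
    against the actual provider (e.g. a provider-verification run)."""
    return {"id": 42, "active": True, "extra": "ignored by this consumer"}

def test_contract():
    user = lookup_user(42, fake_provider)
    # Semantics the consumer depends on: an existing user comes back active.
    assert user == {"id": 42, "active": True}

test_contract()
print("contract holds")
```

Note the test deliberately ignores fields the consumer doesn't read (`extra`), so purely additive provider changes don't trip it.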
The fly in this ointment, of course, is that API consumers rarely provide decent contract tests when they are your paying customers; you end up writing those yourself from the invariably opaque 'product requirements', and thus end up in the pickle described here.
Something we've toyed with here in GBG, with thousands of customers (most of whom have no idea what a consumer contract test is) is deriving those contract tests from operational monitoring, effectively capturing and replaying their real activity to ensure our responses remain the same while we make changes elsewhere such as adding new features.
Interesting - yeah, in cases like an API where you can track how people use it and what the responses look like, that gives you the tools to be confident about how people actually use it and what it would take to break them.
From the perspective of an open source library, though, you really don't have either of those options. Really, everything is on the table except maybe using reflection to access internal code (as that hardly seems fair and is against the spirit of consuming the library). Something as seemingly trivial as the error message on an exception object could change the flow of the consumer's code. I don't think depending on that is obviously a good idea, but I guess what I am getting at is the grey area of "breaking change".
It might start with something like an exception message, and then you find out some code is depending on the actual stack trace of an exception (again, I think that would be a really bad idea, BUT it is a valid property on an exception - in C# at least - accessible without needing to use reflection).
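A quick sketch of that grey area (in Python here, though the comment above talks about C#; the library and exception names are made up): consumer code that quietly couples itself to an exception's message text, so even a wording tweak in the library becomes a breaking change for this caller.

```python
class LibraryError(Exception):
    """Stand-in for an exception type raised by the hypothetical library."""

def library_call(key):
    # Imagine this lives inside the library: the message text is not part
    # of any documented contract, yet it is perfectly observable.
    raise LibraryError(f"key '{key}' not found")

def consumer(key):
    try:
        return library_call(key)
    except LibraryError as exc:
        # Fragile: branching on message text. If the library rewords this
        # to "no such key", the consumer silently takes the wrong branch.
        if "not found" in str(exc):
            return None
        raise

print(consumer("missing"))  # prints None, but only while the wording holds
```

Nothing in semver's usual rules says the message wording is covered, yet from this consumer's point of view changing it is a behavioural break.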
Going back to your example of actually monitoring, capturing and replaying activity against an API - that is quite fascinating. It completely makes sense, but at the same time it seems like quite an undertaking. Would love to know more; anything you can share (e.g. is it a custom solution you made? Something off-the-shelf? Do you need to worry about sensitive data?) would be great.
Ah the joys of maintaining Open Source libraries :)
In this case, possibly caveat utilitor (user beware) applies: provided their tests don't fail, they can use a new version of the library. They chose to use your library after all, and there is no commercial contract keeping them there or forcing their use of a specific version (unlike many SaaS things with APIs!). Serious users may want to write some test cases for you, so you both know when their contract is broken - they may even like to fix that breakage themselves. This leverages the value of open source to provide visibility and options for all parties.
Regarding the operational sampling / replay thing - we don't do this yet (I did say toying with the idea, not shipping :)), but we've been looking at putting what amounts to a transaction recorder in the sidecars that terminate TLS and manage request routing in our stack. The problems are less technical than legal/privacy for us, as we're a major processor of sensitive info. We already record API call failures into an incident log for investigation, which gives us another option: building requests that exhibit the same failure but with synthetic data, which we can push back up the pipeline into open development areas.
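The shape of that record-and-replay idea can be sketched in a few lines (everything here is hypothetical and simplified; a real recorder sits in the sidecar, persists the log, and would have to scrub or synthesise sensitive fields before replay): capture request/response pairs from the live build, then re-issue the requests against a candidate build and diff the responses.

```python
def record(handler, request, log):
    """Wrap a live handler, appending each exchange to a transaction log."""
    response = handler(request)
    log.append({"request": request, "response": response})
    return response

def replay(handler, log):
    """Re-run recorded requests against a candidate build; return mismatches."""
    mismatches = []
    for entry in log:
        got = handler(entry["request"])
        if got != entry["response"]:
            mismatches.append(
                {"request": entry["request"],
                 "expected": entry["response"],
                 "got": got})
    return mismatches

# Old behaviour, captured in production.
def old_handler(req):
    return {"status": "ok", "echo": req["q"]}

log = []
record(old_handler, {"q": "hello"}, log)

# Candidate build that changes an existing field's value: replay flags it,
# which is exactly the conversation you want before shipping.
def new_handler(req):
    return {"status": "ok", "echo": req["q"].upper()}

print(len(replay(new_handler, log)))  # prints 1
```

Strict equality is deliberately conservative here; a real diff would likely ignore fields no consumer reads, which loops back to needing some notion of the consumers' contracts.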