DEV Community

Ben Halpern

What's the longest you've ever spent debugging a single bug?

Latest comments (60)

Kim Kulling

In total: three weeks with 2-3 developers. It was a corrupted pointer in a medical device. The issue was really hard to reproduce, and the root cause was even harder to understand. In the end we found several threads that tried to release a pointer, and only one of the three thread implementations was broken. And it was a legacy codebase without support from the original authors.

Kristian R.

Depends on how you count, probably. Weeks to months I'd say, for a quite peculiar persistency bug that brought an application server to a screeching halt every now and then for no obvious reason. Fixing it was rather trivial as soon as we actually understood what went wrong. 🙂

Nicolás Omar González Passerino

A week and a half, trying to override a CSS rule from a .NET Core app that was causing styling issues in a child React-based app. I needed help from another dev for 4 days until we got the fix. Man... that was a challenge on another level for me.

Audiophile

Three weeks. This was when I first started out in web development. I fixed a bug that prevented the project from building on Heroku, but I kept pushing to the wrong git branch (I used git push instead of git push origin master). So when I pushed again to Heroku it would fail over and over again. I have never made that mistake again.

Mark Gawler

Over the years, time scales have shrunk for release cycles as well as debugging. I started programming on hard real-time systems, using assembly code. Back then we would normally achieve one (occasionally two) releases a year. The system was expected to be in service for a minimum of 10 years.

I can think of one intermittent fault on an interface between two systems which took me nearly a decade to find. This wasn't continuous effort, but I had at least three attempts at resolving it. By the time I started looking at the bug the system was already a legacy system, with a replacement contracted through our competitor on its way. I was a junior engineer known for having an aptitude for low-level coding, so I was put to work. Not having access to one of the two systems, I could only review the code and write a report.

Two or three years later our customer dug up my report and agreed we could have access to both systems. The catch was I only had a single day on site at the opposite end of the country. The nature of intermittent faults is that the fault will not occur while you are debugging. True to form, the system ran perfectly for the whole day and no useful information was gained.

As luck would have it, our competitor failed to deliver on its promises, and our legacy system got a life extension and was rehosted on new hardware. I led the software effort and it went into service, at which point I left the project. Once in service, the original intermittent fault came back with a vengeance, and our customer was not happy. I got seconded back to help fix the issue. We enhanced our simulator to emulate the other system and started debugging. Eventually I found the issue, which we traced to the use of the wrong entry point in an error recovery routine in the real-time OS. The programmer, some 22 years earlier, had typed a 5 instead of a 3. The junior engineer who modified the simulator for me was younger than the bug! Having fixed the bug, I was reminded of the report I had written nearly 10 years before, which correctly pointed to the exact error routine at fault.

Glenn Stovall

3 days, turned out there was a bug in PHP itself. We had to come up with some creative workarounds for that one.

Riccardo Bernardini

A couple of days; an issue with pointers in a C program.

It began as usual: the program dies with a segmentation fault, you open it in the debugger to check where the fault happened and... the stack is a nonsensical mess. Ouch. This is not a good sign; it stinks of dangling pointers or similar.

In cases like this the actual error can be anywhere, and it can take a veeery long time to find the actual bug. It turned out that the problem was not with just a pointer, but with a pointer to pointer to pointer to ... three or four levels deep.

I am soooo happy that I now code in Ada and not in C anymore.

Matthew Bidewell

Ha! I have a painful one. Took me nearly a good 3 days to find it. (between tackling other stuff when I hit a wall)

Note: I'm in GMT timezone.

The company I work for has an analytics view which takes a deep dive into the analytics of the media the company serves. In November 2019, we got a message from a client saying the numbers from our Excel download functionality didn't match those of their internal systems.

The numbers started off fine, but then massively increase after an arbitrary date. (clue 1)
The client was on the west coast of America, we provide all our analytics in UTC time (clue 2)
The client had multiple occurrences where the analytics were wrong after the arbitrary date. (clue 3)
I didn't have a problem when getting the data. (clue 4).

The problem?
Daylight saving.
Without going into specifics: the problem was that going back an hour and then calling .startOfDay() on that date meant we would end up with two days' worth of data after the daylight saving change.

Painful to find...easy to fix.
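The comment leaves out the specifics, so the following is only a rough Python sketch of how this class of bug can arise, not the original code. It assumes a day boundary computed with the stale summer offset and then truncated to the start of the day in the winter offset. The `report_window` helper, the fixed PDT/PST offsets, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for US Pacific time around the November 2019 change.
PDT = timezone(timedelta(hours=-7), "PDT")  # summer offset, before clocks go back
PST = timezone(timedelta(hours=-8), "PST")  # winter offset, after clocks go back


def start_of_day(moment):
    """Truncate a datetime to midnight in its own UTC offset."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def report_window(day_boundary):
    """Hypothetical helper: (start, end) of one reporting day, viewed in PST."""
    start = start_of_day(day_boundary.astimezone(PST))
    end = day_boundary + timedelta(days=1)
    return start, end


def window_hours(day_boundary):
    start, end = report_window(day_boundary)
    return (end - start) / timedelta(hours=1)


correct_boundary = datetime(2019, 11, 4, tzinfo=PST)  # midnight, winter offset
stale_boundary = datetime(2019, 11, 4, tzinfo=PDT)    # midnight, stale summer offset

print(window_hours(correct_boundary))  # 24.0
print(window_hours(stale_boundary))    # 47.0
```

With the stale offset, the boundary is 23:00 on the previous day in winter time, so truncating to the start of the day snaps back a whole extra day and the window covers almost two days of data.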

Meir Gabay

10 years. I still can't figure out why providing the wrong password on Windows login takes 1 minute to process

Jesse Phillips

I don't have a means to know, but I recall one which would have been around 1 month, but most of that time was ignoring the bug.

I had just come on to the project, the bugs had been mentioned but were not something I could directly start investigating.

I had to build out my test infrastructure, with mocks of our integrated component. This meant reading 3rd party API docs and building the correct communication lines.

After all of this was worked out, replicating the bug was easy and being specific with the cause was just a matter of describing what the code was doing.

Being QA I sent it off for someone else to fix.

edA‑qa mort‑ora‑y

I recently had a bug in my message stack, which took a few days to isolate what was happening, then took 2-3 weeks to fix, as I had to build a new message stack.

I wrote about it in dev.to/mortoray/high-throughput-ga...

My game has encountered several major defects, usually in libraries or the browsers, which required a lot of effort to work around.

diedoman

I have some nice embedded programming stories for ya:

Two old colleagues of mine spent about one week on a particular issue:
They were working on a SIP stack (for audio connections/sessions), when suddenly it stopped working completely. After one week it turned out that the PBX (a kind of phone/SIP router) had blacklisted their device for too many failed calls...

I spent about 1.5 months on another issue with a driver for flash memory. TLDR: some bit in a settings register was not set/reset by our driver, so depending on whether the device had used an older driver before, it would either work perfectly OR shift everything by 1 byte.

The unfortunate part of embedded programming (at least back then) was that:

  • it took around 2 minutes to compile and flash ANY CHANGE that you had.
  • many errors show up when the linker gets involved, which is at 99% of the compilation process (so after 1 minute and 45 seconds, something like that)
  • especially in the beginning, none of us knew how to debug/profile embedded software.

Later on we added profilers and proper debugging setups (and a hardfault handler that printed stacktraces. GAMECHANGER!)

Good old arm-none-eabi-gcc days :P

ferceg

It was a very, very long time ago, in the late 90s.
I wrote a little game in Watcom C (somewhere between Wolfenstein and Doom: only walls and a simple floor, but the walls didn't have to be perpendicular). Trigonometric functions were very expensive, so I used a generated sin table. I copy-pasted it into the source, but it looked ugly, so I lined it up with leading zeros. That was a mistake, because numbers written as 0****** are octal in C, so very strange things started to happen on the screen. It took a few hours to debug this, and after that I was literally banging my head on the desk.
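
For anyone who has not hit this one: a small Python sketch that mimics how a C compiler reads an unsuffixed integer literal, to show what zero-padding does to table entries. The `c_integer_literal` helper and the sample values are made up for illustration; they are not the original sin table.

```python
def c_integer_literal(text):
    """Return the value C assigns to an unsuffixed integer literal."""
    if text.lower().startswith("0x"):
        return int(text, 16)  # hexadecimal
    if len(text) > 1 and text.startswith("0"):
        # A leading 0 means octal. Digits 8 and 9 raise ValueError here,
        # much as they would be rejected by a C compiler.
        return int(text, 8)
    return int(text, 10)  # plain decimal


# Hypothetical fixed-point table entries, before and after "lining them up".
raw = ["0", "100", "707", "1000"]
padded = [value.zfill(4) for value in raw]  # ['0000', '0100', '0707', '1000']

print([c_integer_literal(v) for v in raw])     # [0, 100, 707, 1000]
print([c_integer_literal(v) for v in padded])  # [0, 64, 455, 1000]
```

Only the entries that gained a leading zero change value, which would explain why the table looked fine at a glance while the rendering went wrong.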

Bruno Paz

Once I was working on a project that used ElasticSearch. I was changing some things in a list page and noticed that the results were pretty random.

After maybe 2 or 3 days of trying to understand what was going on, I discovered that my local ES config had the default cluster name and was open on the network, so it automatically formed a cluster with a colleague's machine and I was seeing his data.

I don't think it was the bug I spent the most time on, but it's one I won't forget.

Now that I think about it, it's pretty funny, but it was definitely not at the time. ;)