DEV Community

Ben Halpern

Posted on

Tell me a bug story

Bugs are inevitable and debugging is painful, but the experience makes us better developers.

So let's hear it! Tell us about some of the bugs you've encountered and how you dealt with them.

Top comments (53)

Federico Ramirez

I had a bug and tried to fix it using threads. Nnoww II hhaavvee ttwwoo bbuugss.

briwa

Haha yup, just like what they said when solving a problem with Regex, now you have two problems...

Riccardo Bernardini

Regex are great! ☺ Seriously, I think I am one of the few that really like them.

I agree that as they get a bit complex, some regexes look like line noise (I always try to find a less obscure syntax), but for searching and extracting text they are really powerful.

Phil Nash

I used to work on a site that you could log into with your Instagram account, pick a bunch of pictures and buy them as fridge magnets.

This worked mostly well, but for the occasional pack where some or all of the images would fail to download. We spent ages working with the code that downloaded the image, trying to find where the bug was. (I feel more confident with downloading images in Ruby now!)

Ultimately, we decided the code that was downloading the images wasn't the issue, so perhaps it was the Instagram API? Or a flaky connection from our server?

More investigation led to the discovery that on occasion a user would just delete their picture from Instagram, leading to our failed download.

So, we moved the downloading from on demand when the print job was run to a background worker once the user made their purchase. Jobs would still occasionally fail.

We moved the job to before the user even completed their purchase. This helped, but jobs would still occasionally fail.

I'm not even sure you could call it a bug in the end. Some users were uploading pictures to Instagram just to get them printed and then deleting them immediately. It didn't matter how many workers we ran against the job queue, there was always a user that was faster at deleting their images. The eventual fix was the loosening of the Instagram restriction, instead allowing users to use Facebook photos or upload photos from their computer/phone. When users no longer had to use only Instagram to get their images on to our site things became better. This was more work for us (Instagram photos were just square at the time, which fit the magnets, opening up to non-square photos meant we needed an image cropper and just a lot more UI) but was better for the user.

Am I calling users bugs here? Of course not! But understanding the ways that user actions can affect the way your site works is just as important as an esoteric language exception. And if something is failing, there are more ways to fix it than just inspecting the code.

Kamran Ayub

Recently we had a performance issue on a web app I was working on. We showed a series of questions where one question's answer may lead to a few new ones showing up.

The issue was that after answering the first question, answering the second question took 2 MINUTES to render the next set of questions. The UI was blocked the whole time.

I started to dig in. Our app let users manage 50 different products at once so this issue only started to manifest when you managed a ton of products. I also counted that the second question triggered 13 additional questions to show up.

When I started to log the number of Redux actions being dispatched, I was amazed to see we were dispatching (and re-rendering) 6000+ actions which is what was causing the slowdown. Each action dispatch was about 20ms (x 6000 =~ 2mins).

The questions had a graph structure where questions could relate to one another. Turns out we were following this graph structure and dispatching actions even if no actual data was changing, so I updated the logic to compare previous values; that cut it down to about 650 actions (50 products x 13 questions) which is what it took to make the new questions visible.

This reduced the time from 2 minutes to 25 seconds. Because the rest of the actions were technically needed to change state, I introduced the redux-batched-actions package to batch all those actions and dispatch one single action. Doing so reduced the time down to about 2 seconds. Much better!
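The two ideas above (skip dispatches that change nothing, then batch the rest) can be sketched roughly as follows. This is a toy in Python, not the app's Redux code: the `Store` class, its method names, and the three-pass loop standing in for a graph walk are all assumptions made for illustration.

```python
class Store:
    """Toy store (hypothetical) that counts notifications, i.e. re-renders."""

    def __init__(self):
        self.state = {}
        self.notifications = 0

    def dispatch(self, key, value):
        # Always writes and always notifies, even if nothing changed.
        self.state[key] = value
        self.notifications += 1

    def dispatch_if_changed(self, key, value):
        # Compare with the previous value and skip no-op dispatches.
        if self.state.get(key) != value:
            self.dispatch(key, value)

    def dispatch_batch(self, updates):
        # Apply all real changes, then notify subscribers once.
        changed = {k: v for k, v in updates.items() if self.state.get(k) != v}
        if changed:
            self.state.update(changed)
            self.notifications += 1


def run_demo():
    # 50 products x 13 questions, as in the story above.
    updates = {(p, q): True for p in range(50) for q in range(13)}

    naive, guarded, batched = Store(), Store(), Store()
    for _ in range(3):  # assumption: the graph walk revisits each node 3 times
        for key, value in updates.items():
            naive.dispatch(key, value)
            guarded.dispatch_if_changed(key, value)
    batched.dispatch_batch(updates)
    return naive.notifications, guarded.notifications, batched.notifications


print(run_demo())  # (1950, 650, 1)
```

The counts only show the shape of the saving. The real timings depend on how expensive each re-render is.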

Eventually, what I discovered was adding significant time was the JS ... spread operator. What! Turns out because we needed to support IE11, the spread operator was being polyfilled and this implementation was slow as heck. We switched some critical code to using assign instead and it reduced one function's total execution time from 1s to 12ms.

Overall, I got it down from 2 minutes to <100ms by doing these optimizations and simplifying the complexity of some functions to faster O(1) and O(N) implementations.

What a doozy! That took a good week to work through but the improvements were to core code in the app so the entire app benefited from it!

Phil Nash

Wow! That is quite the performance improvement. If only redux warned if that many actions were dispatched, it feels to me like 6000 actions for one request is unlikely to ever be on purpose!

Nice job tracking it all down though. 2 minutes to under 100ms is amazing!

Dave Jacoby

We are a lab in a school that does science for other labs. We had a page that had been working fine for most people for a while, but we got a report that a person was trying to use this page and the links wouldn't work, which meant that user couldn't do her work.

I take a look, click links, and everything looks fine. And then I look at the logs, and by process of elimination, I figure out that this user is a Mac user using Safari.

I take a look at the source, and ...

We all know that URIs are (protocol)(server)(path), and this page would be (https://)(dev.to)(/ben/tell-me-a-bug-story-59e2), and you can do "absolute" links with just the (path), but did you know that you can include URLs that are protocol and path without server?

No, you didn't. Because that is an abomination before Tim.

But someone did. Someone, or a significant number of someones, did this, enough to make it so IE accepted it. And Firefox. And Chrome. But Safari didn't.

This was hard for me to test, because I have almost no Apple in my life.

So, to clarify: The bug isn't that we use this abomination (although that's what I fixed) but that it is industry standard for browsers to accept this, and Safari didn't. Be liberal in what you accept, I guess.
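A rough way to hunt for such links in your own pages is sketched below in Python. The comment doesn't show the actual markup, so the `https:/path` shape and the helper name `has_scheme_but_no_host` are assumptions.

```python
from urllib.parse import urlsplit


def has_scheme_but_no_host(url: str) -> bool:
    """Return True for links that name a protocol but no server."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and not parts.netloc


links = [
    "https://dev.to/ben/tell-me-a-bug-story-59e2",  # full URL
    "/ben/tell-me-a-bug-story-59e2",                # ordinary path-only link
    "https:/ben/tell-me-a-bug-story-59e2",          # protocol + path, no server (assumed shape)
]

suspicious = [link for link in links if has_scheme_but_no_host(link)]
print(suspicious)
```

Running a check like this over a page's links would flag the odd ones without needing a Safari install to test against.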

So, we:

  1. changed our code so we didn't continue in sin
  2. tried to report it: I created an Apple Dev account so I had access to the issue tracker, and when I tried to submit this bug, the bug tracker crashed. :shrug:

Jack Harner 🚀

"I created an Apple Dev account so I had access to the issue tracker, and when I tried to submit this bug, the bug tracker crashed. :shrug:"

That's the best part of this one 😂

I definitely feel your pain on debugging Apple issues. I have access to an iPod Touch at work, but that's about it. Debugging on that without a Mac to connect it to is basically just trial and error.

Ben Halpern

I’ve never been heavily involved in the Apple ecosystem but I’ve been around enough to feel this pain 😭

Ken Bellows • Edited

Several years ago, I was working on a python application. Tbh I don't remember much of anything about the actual program, but it's not important. I started seeing some weeeeiiirrdd behaviors in a particular function after the first time it was called. Let's pretend this was the function:

def add_score(obj={'type': 'default', 'score': 0}):
    obj['score'] += 10
    return obj

The code calling the function in question often called it without supplying an argument, and expected to get back the same result every time:

add_score() #=> {'type': 'default', 'score': 10}

But what was actually happening was much stranger:

agent_1 = add_score()
agent_1['type'] = 'red'

agent_2 = add_score()
agent_2['type'] = 'blue'

agent_3 = add_score()
agent_3['type'] = 'orange'

print(agent_1)
print(agent_2)
print(agent_3)

What would you expect to see? I expected this:

{'type': 'red', 'score': 10}
{'type': 'blue', 'score': 10}
{'type': 'orange', 'score': 10}

But what I actually got was this:

{'type': 'orange', 'score': 30}
{'type': 'orange', 'score': 30}
{'type': 'orange', 'score': 30}

⁉⁉😵⁉⁉

It took me SO LONG to understand the problem. The problem is that the default object in the function signature, the {'type': 'default', 'score': 0} object, is parsed and defined at function definition time, and it exists in the scope surrounding the function, when I thought it was defined each time you called the function, within the function scope. NOPE! 🤦‍♂️

So every time I called the function with no arguments, it was operating on and returning the same object! So all the agent_x variables in the code up there are referring to the same thing!!!
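For anyone who hits the same thing, the usual Python idiom is to default to `None` and build the dict inside the function. A minimal sketch, reusing the pretend `add_score` function from above:

```python
def add_score(obj=None):
    # Build a fresh default dict on every call instead of sharing one
    # object that was created when the function was defined.
    if obj is None:
        obj = {'type': 'default', 'score': 0}
    obj['score'] += 10
    return obj


agent_1 = add_score()
agent_1['type'] = 'red'

agent_2 = add_score()
agent_2['type'] = 'blue'

print(agent_1)  # {'type': 'red', 'score': 10}
print(agent_2)  # {'type': 'blue', 'score': 10}
```

Each call with no arguments now gets its own dict, so changing one agent no longer touches the others.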

Oh my god the amount of time I wasted on this bug... but on the plus side, the very first article I wrote on dev.to was on this very bug (and how JavaScript's default parameters work the way I had expected Python's to work), so it sorta got me into tech blogging! Thanks bug!

Ben Halpern

We had a bug in the DEV sign-in process forever: it would randomly show an "invalid credentials" error (or something like that) passed back in the OAuth response. It was very uncommon, but it never went away.

It turned out it wasn't invalid credentials, but the error message was actually just more of a catch all for "something went wrong". It turned out to be a timeout error. It sometimes took more than ~9 seconds to create an account and the request timed out.

That was a doozy. @andy mostly figured it out.

Manda Putra

Oh it is, when my connection is slow I always get that Invalid Cred message 😂

Andy Zhao (he/him)

Oh yeah!! That bug was a trip. 😥

Vince Ramces Oliveros

I still experience the same thing when signing in with my GitHub account. I do want to submit an issue, but... it's a feature.

Andy Zhao (he/him)

We have some recently reported bugs about sign in issues, but each case has been a little bit different so far.

If you're having issues with your account, feel free to submit an issue for any bugs or feature requests to the repo: github.com/thepracticaldev/dev.to

We also provide support via our email: yo@dev.to

Serge Stupachenko • Edited

Sometimes bugs can be fun.

I used to work on a geo-aware mobile app a while ago. We had an offshore QA team located in India.

They reported a bug about inconsistent behavior between iOS and Android apps. The restaurant was showing as open in one app and as closed in another.

We spent about a day trying to figure out what's wrong and about an hour in a conference call with them. Nobody was able to reproduce it again, so it got deferred.

As we figured out later, they ran their test at 8:59 PM on one device, when the restaurant was still open. Their second run on another device was done at 9:01 PM, when the restaurant was actually closed.

The funny part is the name of the restaurant - "The Blind Pig." 😅

A few years later, we developed a tool aimed to help distributed teams to inspect and debug mobile apps faster.

briwa • Edited

The recent one was about Highcharts. You were supposed to have an animation when hovering the mouse on the legend.

jsfiddle.net/y92haq35 (somehow the fiddle can't be embedded)

It was fine in an isolated environment, but somehow the animation didn't appear on our page. I thought it was a configuration issue, so I copied and pasted the exact same config on our page. It didn't appear either. I was having trouble inspecting the styles because the hover state isn't triggered by CSS but by JS.

I inspected the source code, but every hover class was firing up properly. I tried it on a different page in the app, and the animation was there. So something on the original page was causing it.

After painstakingly removing the components/modules on that page one by one to see which one was causing the problem, I found a line of CSS that goes like this:

/*
* (a note about a bug it's trying to fix)
*/
.highcharts-series-hover {
  opacity: 1 !important;
}

Basically this line says all Highchart series would have an opacity of 1, so even if the animation kicks in, this line overrides it (with the !important) so that it looks like there is no animation. Should've fixed the actual bug from the issue tracker...

And that concludes 3 hours of debugging. I think I didn't do a good job debugging it, any suggestions? 😂

On another note, how do you all prevent these kinds of CSS bugs? Visual regression tests? Eye tests?

I'm Luis! \^-^/

I was using one of MaterializeCSS's date pickers. This one has a prop called minDate which allows you to set what's the minimal date for this selection. This worked wonderfully.

I was passing the variable to Materialize's date picker, which was a date object. This date object had the date and time selected by the user, so Materialize was working properly.

The problem was sending the data to the server. The date variable was somehow changing. It was losing the time. Time was 00:00:00.
I checked my code, from the very beginning, down to my Redux store. Everywhere. My date variable was OK and it HAD the proper time.
I spent hours checking why it was changing. Is Javascript suddenly crazy? Why is this happening to me? Could it be a bug IN javascript?

Turns out Materialize was mutating the date object. The solution was just cloning it: minDate = new Date(selectedDate)

That fixed the issue. Lesson learned: be careful with mutation.

Tim Downey

I work on a Ruby API that acts as a core piece of the control plane for an open-source PaaS platform called Cloud Foundry. Our users install and operate the platform on the infrastructure of their choice (on-prem vSphere, AWS, GCP, Azure, etc.) and everyone tends to use it a little bit differently. This leads to lots of possible configurations and makes certain types of bugs hard to triage and even harder to reproduce.

One bug (or unforeseen usage pattern) we had seems really obvious in hindsight, but ended up taking weeks of investigation. We had some users report that their APIs were consuming huge amounts of memory and every six minutes would reach ~8GB of RAM usage and restart. Now Ruby isn't the most lightweight programming language, but it shouldn't be that bad! We initially suspected a bad memory leak, but pausing the interpreter and manually forcing garbage collection was able to free up most of the memory. So we ended up crawling through heap dumps (wrote a blog post on this process with my team) and eventually found out there were tons and tons of User model objects in memory.

Turns out that this installation had all of their users (10,000+) belonging to the same organizational unit (called a Space) and we had a frequently-accessed line of code that was loading this full array of users into memory every time an API endpoint was hit. It was simply trying to do an existence check to see if a particular user was a member of the Space in question, but because of how we used our ORM it was instantiating and loading all users within that space into memory. Since our test environments (and many other production environments) tend to only put dozens or hundreds of users in a Space we hadn't encountered this.

The fix ended up being super simple:
Do the existence check in SQL instead of in Ruby
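The difference can be sketched like this. It is an illustration in Python with an in-memory SQLite database, not the project's Ruby or its ORM: the `space_users` table, its columns, and both function names are made up for the example.

```python
import sqlite3

# Hypothetical schema: one row per (space, user) membership.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE space_users (space_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO space_users VALUES (?, ?)",
    [(1, user_id) for user_id in range(10_000)],
)


def is_member_slow(space_id, user_id):
    # Pulls every membership row for the space into memory, then searches it.
    rows = conn.execute(
        "SELECT user_id FROM space_users WHERE space_id = ?", (space_id,)
    ).fetchall()
    return any(row[0] == user_id for row in rows)


def is_member_fast(space_id, user_id):
    # Lets the database answer yes/no, so only one value comes back.
    row = conn.execute(
        "SELECT EXISTS("
        "SELECT 1 FROM space_users WHERE space_id = ? AND user_id = ?)",
        (space_id, user_id),
    ).fetchone()
    return bool(row[0])


print(is_member_slow(1, 42), is_member_fast(1, 42))  # True True
```

Both functions give the same answer. The slow one builds 10,000 rows in memory per call, which is the kind of cost that only shows up once a Space gets large.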

😂

Daniel Shotonwa

I worked with a team and one of my teammates had a bug with an editor package in React; he struggled just to add a placeholder to that editor. He literally tried to solve the bug for over 3 days and even reached out to me for solutions, all to no avail. One night I was just thinking about the bug and added placeholder as a prop and it worked.

Javier Guerra

My favorite story is one time we were preparing for a big presentation on a new feature, which involved rendering lots of things with Handlebars in the browser. Everything was working perfectly. We showed it to the manager, he liked it, and asked us for a rehearsal of the presentation later that day.

During the rehearsal, the page stopped working. The browser didn't render a thing. After some debugging, it turned out that the machine where the presentation was happening did an auto Chrome update that bloated the GPU usage, but only with OSX Yosemite. Took us many hours to find this out. But the fix was easy: use another machine for the presentation, and wait a couple of weeks for another Chrome update.

Dave Jacoby

Batch processing is for when the job can take days to weeks. Our system uses Torque, which redirects STDOUT to (task name).o(process id) and STDERR to (task name).e(process id).

In Torque, you can use -n to pass that name, which allows you to send data to the shell script that does everything, like the project id that you're about to package into a 5TB tarball.

So, that happened, and I wanted to see the output, so I ran: code 1234.o123456789000 1234.e123456789000

This gave me a VSCode tab for 1234.o123456789000 and an empty tab titled "Infinity".

Code is Electron which is JS and CSS, and if you give JS a number bigger than it can handle, it says Infinity. And Code uses Minimist to handle ARGV and didn't specify that the standard entry is a string. I get it, because your standard suffixes won't trip the number detection tools.

But 1234.e123456789000 can be read as 1234.0 * 10 ** 123456789000, and that's a really big number.
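The same trap is easy to reproduce elsewhere. Here is a small sketch using Python's `float` as a stand-in for the number detection (it is not Minimist's actual logic, and `looks_numeric` is a made-up helper):

```python
import math


def looks_numeric(arg: str) -> bool:
    """Mimic a naive 'is this argument a number?' check."""
    try:
        float(arg)
        return True
    except ValueError:
        return False


stdout_file = "1234.o123456789000"
stderr_file = "1234.e123456789000"

print(looks_numeric(stdout_file))      # False: "o" is not valid in a number
print(looks_numeric(stderr_file))      # True: read as 1234. with an exponent
print(math.isinf(float(stderr_file)))  # True: the exponent overflows a double
```

Treating positional arguments as strings unless told otherwise avoids the guess altogether.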

Fixed by adding ,"_". Bug report and commit available on request.

Thomas C. Haflich

This one was a bug with a third party vendor. I found the bug for them, and it probably remains unfixed to this day.

We were sending off sheets to be printed, and these sheets included randomly generated unique passcodes of five alphanumeric digits (that is, matching /^[a-z0-9]{5}$/). We had users log into a portal with these passcodes, along with other authentication information. For years, this was the case without issue.

Then one day, I get a support call...

CALLER: Hi, I can't log onto the portal. It says it can't find me?
ME: Hold on one second, let me look you up.
  [I get the CALLER's information and search for them in the database.]
ME: Okay. It looks like your information is all correct on my end. 
ME: Are you seeing any error messages on the screen?
  [We go through the standard debugging steps. You know the drill.]
ME: Can you read me the code on your printout?
CALLER: One, two, three, zero, zero, zero, zero, zero...
ME: Sorry, the five digit code on your printout. 
ME: It should be under the heading "Passcode," in green.
CALLER: Yeah, that's it.
CALLER: It looks weird like it's going over the box or something though.

At this point, I have a suspicion. I look up the details in the database again.

+---------+--------+----------------+----------+
| fake_id |   name | account_number | passcode |
+---------+--------+----------------+----------+
|    9001 | CALLER | ASDF1234FOOBAR |    123E4 |
+---------+--------+----------------+----------+

Some of you may have spotted the issue already. For those who haven't, let's zoom in.

The passcode is listed in our database as 123E4.

ME: I'm very sorry, but can I call you back?

I had to confirm that we sent the passcode out as plain text to the vendor - we in fact did. Whatever process they used to lay out the prints had somehow interpreted our string as exponential notation all on its own.

We couldn't convince them that it was their issue, or tell them how to fix it, so our solution was...

To stop including the letter "e" in our codes ¯\_(ツ)_/¯
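A sketch of that workaround in Python follows. The alphabet comes from the regex above with "e" removed. The generator itself, including the names `SAFE_ALPHABET` and `new_passcode`, is an assumption and not our production code.

```python
import secrets
import string

# Assumption: lowercase letters and digits, as in the regex, minus "e".
SAFE_ALPHABET = (string.ascii_lowercase + string.digits).replace("e", "")


def new_passcode(length: int = 5) -> str:
    """Return a random passcode that cannot contain the letter 'e'."""
    return "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))


# What a number-guessing tool makes of the code from the story:
print(float("123E4"))  # 1230000.0

print(new_passcode())
```

Note that this only removes the exponent case. An all-digit code can still be read as a plain number by a tool that guesses types.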

Garrett Green

On a small island with a large tree, a colony of ants is preparing food for the arrival of a band of grasshoppers. Of the ants that are working, one of them that stands out is an industrious one named Flik. Flik is constantly inventing new things for the colony to reduce labor, but his ideas are often shouted down and shunned by the colony, who feel the old-fashioned way of preparing the grasshoppers' "offering" is the only way to do things. The only one who seems to believe in Flik is the young Princess of the colony named Dot. Secretly, Flik is attracted to her older sister, Atta, who is next in line for the throne.

As the time for the grasshoppers to arrive approaches, the colony heads down into the anthill, intent to wait for the grasshoppers to eat the offering, and leave. Unfortunately, Flik is the last one to put his items on the offering stone, and ends up causing the platform to collapse with his piece of machinery. All the food they've gathered spills into the stream.

When the grasshoppers finally arrive and find nothing, they break into the anthill to terrorize the ants. The leader of the grasshoppers, Hopper, demands that the offering be replenished by the Fall season, and terrorizes Dot, before Flik comes forward to try and defend her. Hopper then commands that the offering be doubled, due to Flik's speaking up against them. During the meeting, Hopper's dimwitted brother, Molt, lets it slip that Hopper is afraid of birds. Hopper silences his brother and the grasshoppers leave, promising to return when the last leaf of Autumn falls.

The colony is now in trouble, as there isn't enough food to fulfill Hopper's request and provide sustenance for the colony. Flik is brought before a tribunal in regards to his causing the trouble. As the group convenes, Flik then thinks up a new idea: if the colony could find bigger bugs to help them defend the colony, they could be instrumental in scaring off the grasshoppers from ever returning. The others think this is a bad idea, until Flik volunteers to look. Thinking that his search will take a long time, the council decides to accept Flik's request, figuring he'll be away from them long enough to keep from causing further trouble.

In another part of the region, a motley crew of bugs are performing in a circus led by a flea, PT Flea. The circus isn't well-attended, most of the audience being flies who jeer and cajole the performers, especially when their acts fail. PT, desperate to keep the flies from leaving, announces a new act called "Flaming Death" where his crew will work together to keep him from being burned. The act fails miserably because the bugs can't coordinate their efforts and PT is burned anyway. He fires them all on the spot.

Flik sets off the next day to head for the big city, eventually finding his way into a 'bug bar,' asking around for tough 'warrior bugs.' His attention is suddenly drawn to a group of bugs in a corner, who it seems are preparing to take on a small gang of flies and their huge member, Thud. They make a valiant effort, posing as medieval knights but their ruse fails and the 'bug bar' ends up being wrecked, and Flik misses much of the fight. However, in the aftermath, Flik thinks he's found the perfect guardians to help his colony.

He pleads his case to the group, saying how he's been looking for bugs with their talent, and asking for their help regarding the incoming group of grasshoppers. The group eagerly accepts, thinking Flik wants them to perform at a dinner theatre and, hoping to avoid trouble from the bar's owner & the flies they fought with, they head off for Ant Island.

When Flik returns, Atta and the elders are shocked that Flik actually found 'warrior bugs.' Atta is at first unsure, but the ladybug of the group, Francis (Denis Leary), promises that they will knock the grasshoppers 'dead' when they come.

A party is then held in honor of the group, including a tribute and art showing the warriors fighting off the grasshoppers. The group then grows leery, realizing they're meant to fight a war for the ants instead of merely performing. Rosie whispers to Flik that they're actually just circus bugs and Flik is horrified, accusing the group of tricking them. When Atta appears, Flik convinces her that the circus bugs will fight for them.

Discovering their mutual misunderstanding, the circus bugs attempt to leave, but are forced back by a bird. They work together to save Princess Dot, the Queen's daughter and Atta's sister, from the bird as they flee, gaining the ants' trust in the process. They continue the ruse of being "warriors" so the troupe can continue to enjoy the attention and hospitality of the ants. The bird encounter inspires Flik into creating an artificial bird to scare away Hopper, leader of the grasshoppers. The bird is constructed from sticks and leaves, but the circus bugs are exposed by their former ringmaster, P.T. Flea, when he arrives searching for them. Angered at Flik's deception, the ants exile him and desperately try to pull together enough food for a new offering to the grasshoppers, but fail to do so.

When the grasshoppers discover a meager offering upon their arrival, they take control of the entire colony and begin eating the ants' winter store of food. After overhearing Hopper's plan to kill the queen, Dot leaves in search of Flik and the warrior bugs and convinces them to return and save the colony with his original plan. The plan nearly works, but P.T. Flea lights the artificial bird on fire, causing it to crash and be revealed as a fake. Hopper has Flik beaten by his thug, Thumper, in retaliation, but Flik defies Hopper and inspires the entire colony and the warriors to stand up to the grasshoppers and drive them out of the colony.

Before Hopper can be disposed of, it begins to rain where the drops of water are like large bombs. In the chaos, Hopper viciously pursues Flik, who leads him to the actual bird's nest. Hopper mistakes the real bird for another fake bird, and taunts it, attracting its attention. The grasshopper is eaten by the bird and its chicks.

Some time later, Flik has been welcomed back to the colony, and he and Atta are now a couple. As the troupe departs with the last grasshopper, Molt, as an employee, Atta is crowned the new Queen, while Dot gets the princess' crown. The circus troupe then departs as Flik, Atta and Dot watch and wave farewell in a tree branch.
