Hey! I'm not very well-versed in calculus, but are you sure that's the derivative? As I understand it, self.sigmoid(s) * (1 - self.sigmoid(s)) takes the input s, runs it through the sigmoid function, and then uses that output as the input to the derivative. I tested it out and it works, but if I run the code the way it is right now (using the derivative from the article), I get a super low loss and it's more or less accurate after training ~100k times.
I'd really love to know what's actually wrong here. Could you explain why the derivative is wrong, perhaps from a calculus perspective?
There is nothing wrong with your derivative. max is talking about the actual derivative definition, but he's forgetting that you already calculated sigmoid(s) and stored it in the layers, so there's no need to calculate it again when applying the derivative.
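To make the distinction concrete, here's a minimal sketch of the two equivalent forms (assuming a NumPy setup like the article's; the function and variable names here are illustrative, not the article's own):

```python
import numpy as np

def sigmoid(x):
    # Standard logistic function: sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime_from_input(x):
    # Derivative computed from the raw pre-activation input x:
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def sigmoid_prime_from_output(a):
    # The same derivative, but expressed in terms of the activation
    # a = sigmoid(x) that was already computed and cached during the
    # forward pass -- no need to call sigmoid a second time.
    return a * (1.0 - a)

x = np.array([-2.0, 0.0, 2.0])
a = sigmoid(x)  # cached during the forward pass
assert np.allclose(sigmoid_prime_from_input(x),
                   sigmoid_prime_from_output(a))
```

Both versions compute the same number; the article's version just expects the cached activation as its argument, which is why feeding it the raw input would look wrong from a pure calculus standpoint but is correct in context.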
The derivation for the sigmoid prime function can be found here.
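For anyone who wants it inline, the derivation is a short chain-rule computation (written out here in LaTeX for readability):

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
           = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
           = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
```

The last step uses the identity $\frac{e^{-x}}{1 + e^{-x}} = 1 - \frac{1}{1 + e^{-x}} = 1 - \sigma(x)$, which is exactly why the derivative can be written purely in terms of the sigmoid's output.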