In this post I aim to convey a general understanding of what's generally intended as "functional programming". I will also interject subjective value judgements about the usefulness of certain features or paradigms. Finally, I will explain why you can't "just not use side-effects" as I see it asked a lot.
I have not included code samples, because using imperative constructs to describe functional programming is, in my opinion, entirely the wrong way of going about it.
If you have any questions regarding this post or related to (purely) functional programming, please do ask in a comment!
The "functional" in "functional programming" refers to mathematical functions, rather than "what it should do" as in "functional requirements".
You've probably encountered functions in school, e.g. for line equations such as
y = 3 - 0.5x.
In this equation, we can represent the function
f defining the value of
y for every
f(x) = 3 - 0.5x.
As you can see, the value of
f(x) depends entirely on input variable
x, not on any hidden state, randomness, user input or IO. This is an essential property of mathematical functions. In computer science, we often pretend procedures / routines are functions too, so we must distinguish between regular functions (procedures) and pure functions (mathematical functions).
This in itself is not particularly useful yet, but there's more! We can also use functions as arguments to other functions.
To do this, let's look at the thing that started it all, lambda calculus:
λ x . x + 1
The above expression says: if you give me some value x ("λ x ."), I will substitute "x" with that value in "x + 1". So, if we pass it a "7", it substitutes "x" for "7" and we get "7 + 1". Here's how it works step by step:
f = λ x . x + 1
We are trying to evaluate f 7, so first let's substitute the function for its value:
= (λ x . x + 1) 7
then we apply the lambda expression to the value 7. We do this by substituting the argument x with 7:
= 7 + 1
A lambda expression defines a function, but does not name it, like regular functions do. A lambda expression is just a value. As such, it can be the return type of a function. We write can write the function add(x, y) = x + y as a lambda expressions.
add = λ x . λ y . x + y
Here, the first lambda (λ x) denotes that it takes an "x", and returns a second lambda expression! To that second lambda expression we can then pass the second argument or our addition.
If we want to add 3 to 4, we would call this function as such:
add 3 4
Don't worry, we don't write functional code like this in practice. This is merely educational.
So let's evaluate this expression:
add 3 4
= (λ x . λ y . x + y) 3 4
Apply 3 to the lambda expression first:
= (λ y . 3 + y) 4
then apply 4:
= 3 + 4
We don't need to evaluate every lambda expression immediately, the following is perfectly valid:
successor = add 1
which evaluates to:
successor = add 1
= (λ x . λ y . x + y) 1
= λ y . 1 + y
successor is just another function!
We can also accept functions as inputs:
applyTo7 = λ f . f 7
We can evaluate this e.g. like so:
= (λ f . f 7) successor
= successor 7
= (λ y . 1 + y) 7
= 1 + 7
It turns out there is a lot of cool stuff you can do when using functions like regular variables. For now, you should probably give you mind a break if you've never seen lambda expressions.
I've previously mentioned how in CS we often call routines "functions". Unfortunately, this misnomer continues into "functional programming", and so we must distinguish between functional style and functional purity.
Functional style just means treating functions as variables (more or less), like we did above. Functionally purity is the same as earlier, indicating that your functions are mathematical functions.
When people talk about functional programming, they often intend functional style, but not necessarily functional purity. On the other hand, when people talk about the mathematical properties of functional programming, they probably mean purely functional programming. The inverse is not true, somebody who is talking about functional purity most likely also intends functional style.
Personally, I am (pun alert!1) categorically uninterested in impure programming, regardless of style. To get an idea why, let's look at the following impure pseudocode:
routine addLogged(a, b) log a return a + b addOne = addLogged 1 seven = addLogged 3 + 4 eight = addOne seven
Will the log be
"1\n3" or will it be
"3\n1"? Lambda calculus does not specify evaluation order beyond dependencies, so there is no way to tell!
An astute defender of impure FP might say "Oh but this is just a consequence of currying, which is not required for FP!". This is true, but only because I've given a very simple example for the sake of clarity. The core issue is that you're creating non-obvious dependencies between functions that can go all over the place. It's
goto spaghetti code all over again, just not for the instruction pointer.
Another argument in favor of impure functions is "a computer without effects is just a space heater!". Which technically isn't true, without effects, your program never even gets evaluated, so no heat is produced. Jokes aside, we're getting close to the point where every effect has a better representation than as a hidden side-effect of a seemingly pure computation. But this is a far too advanced topic for this post. The tools to easily use this new knowledge also aren't there yet.
Functional programming (typed FP that is) has been shown to be "right" in 3 different ways in mathematics. We've already seen lambda calculus, but it also corresponds to logic (constructive and linear) and category theory.
This correspondence allows functional programmers to take lessons not zome of the most fundamental fields in mathematics. This can make the discussions about functional programming seem daunting from time to time, but you don't need to understand why "the
Set type constructor is not monadic in the Set category"2 to write functional code. This discrepancy is known as the Haskell pyramid.
Nevertheless, advanced math is used to make functional programming easier. For instance, there is no documentation as reliable as the type signature of a pure function. Indeed, by the Curry-Howard isomorphism, the type of a variable is a proposition, the proof of which is its term (its body, if you will). If it weren't, the compiler would throw an error.
Looking from the category theory perspective, because category theory is the study of composition, we know that the way we are writing (composing from more basic instruction) our program is valid. At least, for the category we're in.
Loops are inherently stateful, we set a variable, and continuously change it. Without loops of some sort, surely a language cannot be Turing complete? Or even useful for that matter!
The answer for untyped lambda calculus doesn't interest us right now, because function programmers tend to really like types. In the case of typed lambda calculus, we use recursion!
f = λ x .
if x > 0
then 1 : f (x - 1)
This creates an list of 1's with the size of x. ":" is like the push_front function, except it's stateless, so "x : xs" is saying "create a list that starts with x, followed by some other list xs". In our example, "1 : f (x - 1)" creates a list starting with 1, followed by whatever comes out of "f (x - 1)". We're using f in its own body/term! Surely that's illegal if terms are proofs?
You're right, in regular functional programming the proof only holds if the program terminates. But by the halting problem we know that we cannot prove termination of a Turing complete language. So most people just accept the risk of non-termination. Others require proof of termination. In our case, we can get such a proof, because our integer x gradually descends, and we have a stop criterion (if x > 0) that will trigger at some point the recursion will reach. Going back to the example:
= (λ x . if x > 0 then 1 : f (x - 1) else ) 3
= if 3 > 0 then 1 : f (3 - 1) else 
= 1 : f (3 - 1)
= 1 : f 2
= 1 : ((λ x . if x > 0 then 1 : f (x - 1) else ) 2)
= 1 : (if 2 > 0 then 1 : f (2 - 1) else )
= 1 : (1 : f (2 - 1))
= 1 : (1 : f 1)
= 1 : (1 : ((λ x . if x > 0 then 1 : f (x - 1) else ) 1))
= 1 : (1 : (if 1 > 0 then 1 : f (1 - 1) else ) )
= 1 : (1 : (1 : f (1 - 1)))
= 1 : (1 : (1 : f 0))
= 1 : (1 : (1 : (λ x . if x > 0 then 1 : f (x - 1) else ) 0))
= 1 : (1 : (1 : (if 0 > 0 then 1 : f (0 - 1) else )))
= 1 : (1 : (1 : ))
= 1 : 1 : 1 : 
= [1, 1, 1] sytactic sugar
There really isn't much else to know about recursion. A function calls itself, that's it.
There is still a lot to learn about types, a cornerstone of safe functional programming, but they are a bit too broad of a topic to add to this already long post.
This is a question I've seen a lot. Why can I not just follow the convention of not using side-effects and keep using the language I know? Don't get me wrong, not having side-effects is generally a good convention, but it's not the same thing as language-enforced functional purity.
First, if you're not going to use side-effects, why do you want the option to use them?
Imagine somebody asking for your banking details and telling you "oh I'm not going to use it though". Would you trust them? Future and past you are assholes with no respect for the plight of current you, and should never be trusted.
The problem becomes more obvious in large organizations or when outsourcing. In these situations not everybody is as competent, or even just as willing. Knowledge is better than trust. With purity, I know that neither I nor others have fallen into old habits and snuk in a nasty one because of some imagined deadline.
Second, with purity restrictions, your tools know you have no side-effects. This makes automatic versioning, code verification, merge requests, documentation etc. etc. a lot simpler. Tools don't know which convention you've followed or where you've made an exception. There is a whole range of errors that in Node.js, for instance, would happen at runtime, but only sometimes, probably not during test, which in Haskell prevent your code from being compiled, and thus deployed, at all.
 The category of impure programming is a Kleisli category where the 'results' morphisms are decorated with side-effects and bottoms. I do not find this category a useful mindset for programming because it hides a lot of information, as Kleisli categories do. I did however mean the colloquial meaning of "categorically".
 In Haskell, the Set datastructure requires its elements to have some kind of ordered relation, such as ≤ for numbers, but not every type has such an order. Monads have no such restriction, so a Set is not a monad. Mathematical sets do not require order, only equality (incidentally, good luck finding the "right" definition for equality).