Introduction
Python is a simple language, and many people pick it as their first language to learn. While you are happily coding in Python, have you ever been curious about how a class is instantiated?
TL;DR: Learn how Python instantiates a class, from the concept level down to reading the source code.
Starting Point
Let's take a look at this code snippet. We define a function that returns an integer, and then we call the function to get the value.
def my_method():
    return 1

a = my_method()
Now let's take a look at this code snippet. We define a class, and then we instantiate it.
class MyClass:
    def __init__(self):
        pass

myclass = MyClass()
Inside the function, we need to use return to hand the value back. But the class has no return statement. Why can we still get an instance of the class? There must be something hidden.
The hidden base class
Python is an OOP language, i.e. we can use inheritance.
In Python, we implement inheritance like this, where the derived class MyClass inherits from the base class SuperClass.
class SuperClass():
    pass

class MyClass(SuperClass):
    pass
Now I want you to know that every class in Python 3 has a default base class, which is called object.
If you have been using Python for a while, you might have seen people define a class in the following way. This is because Python 2 does not automatically apply object as the base class.
class MyClass(object):
    pass
We can use the dir function to verify it. The dir function returns the names of all attributes and methods of the specified object, without the values.
>>> dir(object)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>> dir(MyClass)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']
We can confirm that the class MyClass inherits everything from the class object.
You can also use issubclass(MyClass, object) to verify it.
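As a quick check of the claim above, here is a minimal sketch that asks Python directly. It reuses the MyClass name from the article. The exact set of extra attributes on MyClass depends on the Python version, so the sketch only relies on nothing from object being missing.

```python
class MyClass:
    pass

# The implicit base class shows up in __bases__ and at the end of the MRO.
print(MyClass.__bases__)               # (<class 'object'>,)
print(MyClass.__mro__[-1] is object)   # True
print(issubclass(MyClass, object))     # True

# Every name that dir(object) reports is also reported for MyClass.
missing = set(dir(object)) - set(dir(MyClass))
print(missing)                         # set()

# Names that only MyClass has (version dependent, e.g. __dict__, __module__).
extra = sorted(set(dir(MyClass)) - set(dir(object)))
print(extra)
```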
In the result of dir, there is one method we need to be aware of, which is __new__. The __new__ method is responsible for creating an instance of the class by allocating memory and initialising the necessary fields.
Now we know the hidden return statement is inside the __new__ method.
# Conceptual representation.
class object:
    def __new__(cls):
        instance = create_and_initialise(cls)
        return instance
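The conceptual code cannot be run as-is, but the real object.__new__ can be called by hand. The following is a small sketch (the ready attribute is only an illustration) showing that __new__ alone already hands back an instance, and that __init__ has not run at that point.

```python
class MyClass:
    def __init__(self):
        self.ready = True

# Call __new__ by hand: we get a real instance back ...
instance = object.__new__(MyClass)
print(isinstance(instance, MyClass))   # True

# ... but __init__ has not run, so the attribute it sets is missing.
print(hasattr(instance, "ready"))      # False

# The normal call MyClass() runs both steps.
print(hasattr(MyClass(), "ready"))     # True
```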
OK. You should have spotted two open questions. I will answer them one by one.
- Where is the cls.__init__ method called?
- What do I mean by conceptual?
Introduction to metaclasses
I mentioned that every class has a hidden base class, object.
Now I also want you to know that every class has a hidden metaclass, which is called type.
Generally speaking, a metaclass is the class of a class: it defines the behaviour of the classes that are its instances. Before you can create an instance from a class, Python needs to create the class itself first, as an instance of the metaclass.
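The relationship between an object, its class and the metaclass can be inspected with type(). This is a minimal sketch; the Dynamic class and its answer attribute are made up for illustration.

```python
class MyClass:
    pass

obj = MyClass()

print(type(obj) is MyClass)    # True: obj is an instance of MyClass
print(type(MyClass) is type)   # True: MyClass is an instance of type
print(type(type) is type)      # True: type is its own metaclass

# Because a class is an instance of type, type can build one directly
# from a name, a tuple of bases and a namespace dict.
Dynamic = type("Dynamic", (object,), {"answer": 42})
print(Dynamic().answer)        # 42
```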
The metaclass defines how a class creates instances of itself in its __call__ method, which triggers the class's __new__ method and then its __init__ method.
# Conceptual representation.
class type:
    def __call__(cls, *args, **kwargs):
        # Create the instance first, then initialise that same instance.
        instance = cls.__new__(cls, *args, **kwargs)
        cls.__init__(instance, *args, **kwargs)
        return instance
Now we have the full picture of how the __init__ method is triggered.
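To watch this order in a running program, one option is a metaclass that records each step. TracingMeta and the calls list are hypothetical names used only for this sketch; the built-in type.__call__ still does the real work through super().

```python
calls = []

class TracingMeta(type):
    # Hypothetical metaclass, used only to record the order of calls.
    def __call__(cls, *args, **kwargs):
        calls.append("type.__call__")
        return super().__call__(*args, **kwargs)

class MyClass(metaclass=TracingMeta):
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value

obj = MyClass(7)
print(calls)       # ['type.__call__', '__new__', '__init__']
print(obj.value)   # 7
```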
Deep dive into Python implementation source code
We know Python is an interpreted programming language: it needs an interpreter to translate your Python code into bytecode and run it.
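The bytecode can be inspected with the standard dis module. The sketch below (the make function is a made-up helper) lists the instructions compiled for a function that instantiates a class. Opcode names differ between Python versions, so it only checks that some call instruction is present.

```python
import dis

class MyClass:
    def __init__(self):
        pass

def make():
    return MyClass()

# Collect the opcode names the compiler produced for make().
opnames = [ins.opname for ins in dis.get_instructions(make)]
print(opnames)

# Instantiating a class compiles to an ordinary call instruction;
# the exact name (CALL_FUNCTION, CALL, ...) depends on the version.
has_call = any(name.startswith("CALL") for name in opnames)
print(has_call)
```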
The interpreter is itself a program. So we need to ask how the Python interpreter is developed. The answer is CPython.
CPython is the official implementation of the interpreter. As the name implies, it is written in C. When you download Python from the official page, you are using the CPython interpreter.
There are other Python implementations, such as Jython (a Java-based implementation) and PyPy (a Python-based implementation).
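If you are not sure which implementation you are running, the standard library can tell you. This is a small sketch; the printed values depend on your installation.

```python
import platform
import sys

# Name of the running implementation, e.g. "CPython", "PyPy" or "Jython".
implementation = platform.python_implementation()
print(implementation)

# The same information is available from sys as a lower-case identifier.
print(sys.implementation.name)

# Version of the Python language this interpreter implements.
print(platform.python_version())
```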
How is CPython related to our topic?
CPython does not only translate your Python source code; it also includes the Python built-ins and standard library.
For example, when you use print in Python, have you noticed that you never need to import anything, while in C you need to include a standard library header first? This is because the Python interpreter (CPython) does it for you.
The implementations of the classes object and type are part of CPython. They are not written in Python, which is why I called my Python versions a conceptual representation.
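One way to see that these names are built into the interpreter, rather than written in Python, is to ask the inspect module for their source. A minimal sketch, assuming a standard CPython build where no Python source exists for object:

```python
import builtins
import inspect

# print lives in the builtins module, which is always available.
print(builtins.print is print)              # True

# object and type live there as well.
print(object.__module__, type.__module__)   # builtins builtins

# inspect can only show source that was written in Python.
try:
    inspect.getsource(object)
    has_python_source = True
except (TypeError, OSError):
    has_python_source = False
print(has_python_source)
```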
Here is the most exciting part: we are going to look into the CPython source code.
The CPython implementation of __new__ method
Let us visit the CPython GitHub repository.
At the time of writing, the latest version is Python 3.11.0 alpha 7.
The first file we need to look at is /Include/object.h. It defines a struct called _object. Every instance of a class is an _object in the C implementation.
struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
};
There are three fields:
- _PyObject_HEAD_EXTRA: It is for debug usage. We can skip it.
- Py_ssize_t ob_refcnt: Stores the reference count, which is used for garbage collection management.
- PyTypeObject *ob_type: The type of the object, i.e. the class of the object.
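Both fields can be observed from Python without touching C. The sketch below assumes a standard CPython build: type() reports what ob_type points to, and sys.getrefcount() reports the reference count (including the temporary reference held by the call itself).

```python
import sys

class MyClass:
    pass

obj = MyClass()

# ob_type: the class of the object, as reported by type().
print(type(obj) is MyClass)   # True

# ob_refcnt: binding a second name to the object adds one reference.
before = sys.getrefcount(obj)
alias = obj
after = sys.getrefcount(obj)
print(after - before)         # 1 on a standard CPython build
```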
In the /Include/pytypedefs.h file, we can see that it defines PyObject as an alias for struct _object. The rest of the CPython source code references PyObject instead of _object.
typedef struct _object PyObject;
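In the /Objects/object.c file, we can find _PyObject_New, which shows the allocate-and-initialise step behind the __new__ method. (Strictly speaking, object.__new__ is the function object_new in /Objects/typeobject.c, which allocates through the type's tp_alloc slot, but the pattern is the same.)
It returns a pointer to a PyObject, i.e. an instance of the class.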
PyObject *
_PyObject_New(PyTypeObject *tp)
{
    PyObject *op = (PyObject *) PyObject_Malloc(_PyObject_SIZE(tp));
    if (op == NULL) {
        return PyErr_NoMemory();
    }
    _PyObject_Init(op, tp);
    return op;
}
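The size passed to PyObject_Malloc comes from _PyObject_SIZE(tp), which reads the type's tp_basicsize field. Python exposes that field as the __basicsize__ attribute of a class, so it can be inspected without C. This is a rough sketch; Plain and Slotted are made-up classes, and the exact numbers depend on the platform and build.

```python
import sys

class Plain:
    pass

class Slotted:
    __slots__ = ("x", "y")   # two extra pointer-sized fields per instance

# Size in bytes of the fixed part of an instance, per type.
print(object.__basicsize__)
print(Plain.__basicsize__)
print(Slotted.__basicsize__)

# Every instance starts with the object header, and slots add to it.
print(Slotted.__basicsize__ > object.__basicsize__)   # True

# sys.getsizeof reports the size of one concrete instance.
print(sys.getsizeof(object()))
```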
In the /Objects/call.c file, we can find _PyObject_Call, the generic function behind calling an object. It looks up the tp_call slot, which is the C counterpart of the __call__ method.
PyObject *
_PyObject_Call(PyThreadState *tstate, PyObject *callable,
               PyObject *args, PyObject *kwargs)
{
    ternaryfunc call;
    PyObject *result;

    /* PyObject_Call() must not be called with an exception set,
       because it can clear it (directly or indirectly) and so the
       caller loses its exception */
    assert(!_PyErr_Occurred(tstate));
    assert(PyTuple_Check(args));
    assert(kwargs == NULL || PyDict_Check(kwargs));

    vectorcallfunc vector_func = _PyVectorcall_Function(callable);
    if (vector_func != NULL) {
        return _PyVectorcall_Call(tstate, vector_func, callable, args, kwargs);
    }
    else {
        call = Py_TYPE(callable)->tp_call;
        if (call == NULL) {
            _PyErr_Format(tstate, PyExc_TypeError,
                          "'%.200s' object is not callable",
                          Py_TYPE(callable)->tp_name);
            return NULL;
        }

        if (_Py_EnterRecursiveCall(tstate, " while calling a Python object")) {
            return NULL;
        }

        result = (*call)(callable, args, kwargs);

        _Py_LeaveRecursiveCall(tstate);

        return _Py_CheckFunctionResult(tstate, callable, result, NULL);
    }
}
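The tp_call check in the C code above has a visible effect in Python: an object whose type has no tp_call slot cannot be called, and the error message has the same wording as the format string. A minimal sketch (the variable names are only for illustration):

```python
class MyClass:
    pass

# A class is callable because its type (the metaclass) defines __call__.
print(callable(MyClass))   # True

# An int is not: its type has no call slot.
number = 1
print(callable(number))    # False

# Calling it anyway raises the TypeError with the familiar message.
try:
    number()
except TypeError as exc:
    message = str(exc)
print(message)             # 'int' object is not callable
```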
The line result = (*call)(callable, args, kwargs); in _PyObject_Call is where the call actually happens. When the callable is a class, call points to another function called type_call, which is defined in the /Objects/typeobject.c file.
static PyObject *
type_call(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    PyObject *obj;
    PyThreadState *tstate = _PyThreadState_GET();

#ifdef Py_DEBUG
    /* type_call() must not be called with an exception set,
       because it can clear it (directly or indirectly) and so the
       caller loses its exception */
    assert(!_PyErr_Occurred(tstate));
#endif

    /* Special case: type(x) should return Py_TYPE(x) */
    /* We only want type itself to accept the one-argument form (#27157) */
    if (type == &PyType_Type) {
        assert(args != NULL && PyTuple_Check(args));
        assert(kwds == NULL || PyDict_Check(kwds));
        Py_ssize_t nargs = PyTuple_GET_SIZE(args);

        if (nargs == 1 && (kwds == NULL || !PyDict_GET_SIZE(kwds))) {
            obj = (PyObject *) Py_TYPE(PyTuple_GET_ITEM(args, 0));
            Py_INCREF(obj);
            return obj;
        }

        /* SF bug 475327 -- if that didn't trigger, we need 3
           arguments. But PyArg_ParseTuple in type_new may give
           a msg saying type() needs exactly 3. */
        if (nargs != 3) {
            PyErr_SetString(PyExc_TypeError,
                            "type() takes 1 or 3 arguments");
            return NULL;
        }
    }

    if (type->tp_new == NULL) {
        _PyErr_Format(tstate, PyExc_TypeError,
                      "cannot create '%s' instances", type->tp_name);
        return NULL;
    }

    obj = type->tp_new(type, args, kwds);
    obj = _Py_CheckFunctionResult(tstate, (PyObject*)type, obj, NULL);
    if (obj == NULL)
        return NULL;

    /* If the returned object is not an instance of type,
       it won't be initialized. */
    if (!PyObject_TypeCheck(obj, type))
        return obj;

    type = Py_TYPE(obj);
    if (type->tp_init != NULL) {
        int res = type->tp_init(obj, args, kwds);
        if (res < 0) {
            assert(_PyErr_Occurred(tstate));
            Py_DECREF(obj);
            obj = NULL;
        }
        else {
            assert(!_PyErr_Occurred(tstate));
        }
    }
    return obj;
}
Let us translate this function into Pythonic style:
- Trigger the __new__ method to get an instance of the class.
- Check whether the returned object is an instance of the class.
- If not, return the object immediately. If yes, call the __init__ method and then return the instance.
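The steps above can be sketched in Python. type_call_sketch, Point and Weird are hypothetical names for illustration only. The sketch uses isinstance where the C code uses PyObject_TypeCheck and leaves out the error handling, so treat it as an approximation rather than the real implementation.

```python
def type_call_sketch(cls, *args, **kwargs):
    """Rough Python rendering of type_call; not the real implementation."""
    # Special case: type(x) with one argument returns the type of x.
    if cls is type and len(args) == 1 and not kwargs:
        return type(args[0])
    # Step 1: trigger __new__ to get an object.
    obj = cls.__new__(cls, *args, **kwargs)
    # Step 2: if it is not an instance of cls, return it uninitialised.
    if not isinstance(obj, cls):
        return obj
    # Step 3: otherwise run __init__ from the object's own type.
    type(obj).__init__(obj, *args, **kwargs)
    return obj


class Point:
    def __init__(self, x):
        self.x = x


class Weird:
    initialised = False

    def __new__(cls):
        return 42                    # not an instance of Weird

    def __init__(self):
        Weird.initialised = True     # never reached


p = type_call_sketch(Point, 3)
print(p.x)                                  # 3
print(type_call_sketch(type, p) is Point)   # True
print(type_call_sketch(Weird))              # 42

# The real call behaves the same way: __init__ is skipped for Weird.
print(Weird(), Weird.initialised)           # 42 False
```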
Summary
Reading the CPython source code is not a trivial task, and there are lots of details I did not cover. Anyway, I hope you learned more about Python from this article.
If you like my article, please give me some reactions as encouragement. Thank you :)