I must confess: I write bugs.
But I’m not ashamed: bugs have a very short life expectancy in my programs, and I’m usually the only one to see them.
The secret with bugs is to eradicate them as soon as they appear, before they can grow big, hide, and spawn other bugs.
I assume you have bugs too, just like everybody.
In this post, I would like to tell you how to live with them, and how to avoid shipping your bugs to other people.
You will learn how to:
To run the code, you need python 3. If you don't have it, you can get it by installing anaconda for python 3.X .
Before each code block below, a file name is indicated. Create a file with this name and put the code in this file.
Don’t trust yourself.
Accept that you will write bugs, and that they can appear in the most simple piece of code that you will write.
Test very often.
Don't trust yourself.
Python is an interpreted language . This means that the code is compiled under the hood just before being executed, at runtime. In fact, you can see the compiled python modules as .pyc files next to your python modules.
Developers coming from a compiled language like C++ are used to the following workflow:
Most often the first objection these developers would do to python is that the lack of a compiler prevents them from using this workflow, and thus to write reliable code.
But that’s actually not an issue, since a similar workflow can be used, to some extent, in python. Here is what a developer would do:
The only problem with the python workflow above is that you’re only going to catch issues in the part of the code that you actually run. For example, if a function is not used by your program, you will not be able to find and fix problems in this function.
That’s why, in python, one needs to run the whole code during development, and that is the role of unit tests.
In compiled languages, unit tests are a must have for serious development. In python, they are absolutely indispensable.
Before we get into unit testing, I would like to comment about the most basic testing that you can do. Let’s say you want to write a class to describe a circle.
You might start like this:
import math class Circle(object): def __init__(self, center=(0.,0.), radius=1.): self.center = center self.radius = radius def area(self): return math.pi * self.radius**2
The first thing you want to do is to try to use your brand new class. To do this, you can add a main section at the end of the module, like this:
if __name__ == '__main__': c = Circle() print(c.center,c.radius) print (c.area()) c.radius = 2. print (c.area())
The main section is executed when the module is directly executed, and not when it’s imported in another module. You can now execute your script, and you get:
(0.0, 0.0) 1.0 3.141592653589793 12.566370614359172
That’s good, we can visually check that the default arguments are correctly stored in the circle object, and that the area method seems to work properly.
This is a perfectly acceptable way to test your code, and I do use this technique at times. But:
It’s much easier to use unit tests, as we will see now.
The main unit testing framework in python is unittest .
We start with a test that always succeeds, just to try the unittest framework.
import unittest class TestAlwaysOk(unittest.TestCase): def test_true(self): self.assertTrue(True) def test_false(self): self.assertFalse(False) if __name__ == '__main__': unittest.main()
See? it's easy. writing this takes 10s at most.
You can execute the tests in this module by doing:
python test_alwaysok.py .. ----------------------------------------------------- Ran 2 tests in 0.000s OK
Another way to run all tests in a directory and its subdirectories is to do:
python -m unittest discover
What I do, except in single-module projects maybe, is to write the tests while I’m developing. I’m using the tests to run the parts of the code I’m writing. And when I’m done, the tests exist, so I can run them later again whenever I want to change anything.
Actually, unit tests make you much more confident. You know that you can engage in a major refactoring of the code without fear. If the unit tests pass. everything's going to be ok.
I usually put the test modules just next to the code they test, with a name starting with test_. But you can feel free to do otherwise. Just make sure to stick to a naming scheme of your choice for your test modules.
Let’s assume our Circle class does not exist yet, and let’s start by writing the test module. The first method to create for the Circle class is the constructor. So we write the test before even writing the constructor:
import unittest from circle import Circle class TestCircle(unittest.TestCase): def test_constructor(self): '''simply tests that a circle can be built''' c = Circle( center=(0,0), radius=2) if __name__ == '__main__': unittest.main()
You may execute this test.
From now on, I will only mention the test methods that are added, and omit all the boilerplate code.
Now can you think of a way a circle would become unusable or ill defined?
I see an obvious way: if the radius is negative. Probably, the Circle class should be protected against negative radii. So we modify the constructor of the Circle class in this way:
class Circle(object): def __init__(self, center=(0.,0.), radius=1.): if radius < 0: raise ValueError('radius must be >= 0') self.center = center self.radius = radius
And we modify our test method to check that this exception is indeed raised properly. First we write a test that fails (the exception is not raised).
def test_constructor(self): '''tests that a circle can be built, and that negative radii are disallowed''' c = Circle( center=(0,0), radius=2) with self.assertRaises(ValueError): Circle(radius=1)
Run the test. The reason why we write a test that fails is to be sure that the test works in case of problems. Then, we implement the actual test.
def test_constructor(self): '''simply tests that a circle can be built''' c = Circle( center=(0,0), radius=2) with self.assertRaises(ValueError): Circle(radius=-1)
Run again. You should see that this time the test passes (the exception is indeed raised).
Write a test to check the output of the Circle.area method for several input values. Here is the list of the available assert methods.
Hint: You will want to compare floats, and comparing floats for equality is unreliable. So you should use the assertAlmostEqual method.
You already know enough to use unit tests effectively in your projects. In this section, I just want to mention a few useful tricks.
Sometimes, you care about the order in which tests are executed. For example, you might want to do the easiest and least time-consuming tests first.
In this case, please note that the tests are run according to the lexicographical order of their method names.
Here is a demonstration.
import unittest class TestOrder(unittest.TestCase): def test_1(self): print('i run first') def test_2(self): print('i run second') if __name__ == '__main__': unittest.main()
Run the test. test_1 runs first because test_1 < test_2 in lexicographical order.
Some tests require an initialization. For example, you might want to create a test input file that you want to use in all tests. You can do this in the setUp method, that is called before every test. Additionally, if you want do do something after every test, you can use the tearDown method.
Here is a typical pattern:
import unittest import os class TestSetup(unittest.TestCase): def setUp(self): self.testfname = 'testfile.txt' if not os.path.isfile(self.testfname): with open(self.testfname, 'w') as ifile: print('creating test file') testlines = ['first line\n', 'second line\n'] ifile.writelines(testlines) def test_nlines(self): with open(self.testfname) as ifile: self.assertEqual(len(ifile.readlines()),2) def test_lines(self): with open(self.testfname) as ifile: self.assertListEqual(ifile.readlines(), ['first line\n', 'second line\n']) if __name__ == '__main__': unittest.main()
Please note that setUp is called before every test. But the test file is created only once, in case the test file does not already exist. If you rerun these tests, the file is not re-created since it is already there.
This might not look very useful in this simple example, but creating or dowloading test data can take a long time. That’s where you do need to use this kind of technique.
In some cases, you might want to disable some tests. For example, depending on the availability of an external package: if the package is there, you test the parts of your code depending on this package. On the contrary, if it's not there, you skip these tests.
You can disable tests with a decorator, that you can use to decorate a TestCase class or a test method like this:
import unittest import datetime import getpass now = datetime.datetime.now().time() noon = datetime.time(12,0) evening = datetime.time(19,0) user = getpass.getuser() class TestSkip1(unittest.TestCase): @unittest.skipIf(now<noon or now>evening, 'only testing in the afternoon') def test_1(self): '''tested in the afternoon only''' self.assertTrue(True) @unittest.skipIf(user!='cbernet',"these are colin's private tests") class TestSkip2(unittest.TestCase): def test_1(self): self.assertTrue(True) def test_2(self): self.assertTrue(True) if __name__=='__main__': unittest.main()
If you want more control on the tests to run, you can write a test suite, instead of relying on the automatic discovery from
python -m unittest discover
Also, this will allow you to start your tests from a python script if needed. Here is a simple example.
import unittest from test_circle import TestCircle from test_skip import TestSkip1, TestSkip2 from test_alwaysok import TestAlwaysOk testcases = [ TestAlwaysOk, TestCircle, TestSkip1, TestSkip2 ] suite = unittest.TestSuite() loader = unittest.TestLoader() for test_class in testcases: tests = loader.loadTestsFromTestCase(test_class) suite.addTests(tests) if __name__ == '__main__': import sys runner = unittest.TextTestRunner(verbosity=2) runner.run(suite)
If you take unit tests seriously, you will easily find bugs.
But how can you understand and fix them?
Because python does not need to be compiled, it is really easy to just add printouts in the code for debugging. A lot of people, especially those coming from the research field, are fine with this archaic debugging method.
The problem with this method is that it’s tedious and a real waste of your time:
Here is a small buggy script.
a = list(range(10)) for val in a: if val % 2 == 0: # replacing all even values in a by 0 val = 0 print(val) print(a)
This script is supposed to replace all even values in a by a zero. But if you execute it, you get [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] !
That’s a classic 🙂! Can you guess what’s the problem?
We’re going to investigate with the debugger. You can start it with:
python -m pdb bugged.py
Alternatively, you could add a debugging anchor in the code, where you want debugging to start. For example, you could set the anchor after the if:
a = list(range(10)) for val in a: if val % 2 == 0: # replacing all even values in a by 0 import pdb; pdb.set_trace() #<<<< anchor val = 0 print(val) print(a)
Then, you can just run the script as usual. In the following, I assume you use the first method.
So after starting the script in debug mode, you end up on the pdb prompt, at the beginning of the execution:
-> a = list(range(10)) (Pdb)
The line on which you are has not been executed yet. The main pdb commands are:
List the code. You should get:
(Pdb) l 1 -> a = list(range(10)) 2 3 for val in a: 4 if val % 2 == 0: 5 # replacing all even values in a by 0 6 val = 0 7 print(val) 8 print(a) [EOF]
Add a breakpoint on line 6, and continue to this breakpoint:
(Pdb) b 6 Breakpoint 1 at /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py:6 (Pdb) c > /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py(6)<module>() -> val = 0
We can see that val is equal to zero even before the line above:
(Pdb) val 0
That’s normal since we are looping on array a, which starts with 0. Let’s continue till we hit the same breakpoint again:
(Pdb) c 0 > /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py(6)<module>() -> val = 0 (Pdb) val 2
Obviously, we didn’t see val = 1 since our breakpoint is under the if. Now let’s get some information on array a
(Pdb) p a 2
That's expected since we haven’t assigned val to 0 yet. let’s go to the next line:
(Pdb) n > /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py(7)<module>() -> print(val) (Pdb) p val 0 (Pdb) p a 2
So we have set val to 0 but a is still equal to 2!
My goal here was simply to illustrate the use of the python debugger. But if you don’t know why a has not been set to 0, here is the explanation.
In python, variables are like labels. When we do val=0, it means: take label val and stick it (we say bind it) to value 0. So in the case of our buggy program, what we have done is the following:
In other words, we just moved a label from one value to another, and there is no reason why this would have changed a.
Now let's print the addresses of val, a, and 0 in the debugger:
(Pdb) print(id(val), id(a), id(0)) 4326745488 4326745552 4326745488
We see that val has the same address as 0 since it’s bound to this value. a has a different address corresponding to a different value.
You are now well equipped to fight bugs effectively.
You should accept that you’re going to write bugs, just like everybody else. Never trust yourself, test a lot, and test often.
Unit tests are essential, especially in python. We have seen how easy it is to write them, and how you can integrate unit tests in your development workflow. Now you can simply go ahead and use them!
The python debugger is extremely convenient, and will save you a lot of time with respect to cluttering your code with debug printouts.
Please let me know what you think in the comments! I’ll try and answer all questions.
And if you liked this article, you can subscribe to my newsletter to be notified of new posts (no more than one mail per week I promise.)
You can join my newsletter to learn more about machine learning and data: