Fighting Bugs in Python

Everybody writes bugs. Let's see how to make sure they don't live long, using debugging and unit tests. An introduction to pdb and unittest, and a few pieces of advice for easy and reliable development.

About this post

I must confess: I write bugs.

But I’m not ashamed: bugs have a very short life expectancy in my programs, and I’m usually the only one to see them.

The secret with bugs is to eradicate them as soon as they appear, before they can grow big, hide, and spawn other bugs.

I assume you have bugs too, just like everybody.

In this post, I would like to tell you how to live with them, and how to avoid shipping your bugs to other people.

You will learn how to:

adopt the right frame of mind when you’re coding
deal with the lack of a compiler in Python
use the unittest package
debug your code with pdb

Running the code

To run the code, you need Python 3. If you don't have it, you can get it by installing Anaconda for Python 3.x.

Before each code block below, a file name is indicated. Create a file with this name and put the code in this file.

The right attitude

Don’t trust yourself.

Accept that you will write bugs, and that they can appear in even the simplest piece of code that you write.

Test very often.

Don't trust yourself.

No compiler, what can I do?

Python is an interpreted language. This means that the code is compiled to bytecode under the hood just before being executed, at runtime. In fact, you can see the compiled Python modules as .pyc files, stored in a __pycache__ directory next to your Python modules.

Developers coming from a compiled language like C++ are used to the following workflow:

write code
run the compiler
fix compilation errors, catching a lot of issues like syntax errors, wrong types, etc.
run the program

Most often, the first objection these developers raise against Python is that the lack of a compiler prevents them from using this workflow, and thus from writing reliable code.

But that’s actually not an issue, since a similar workflow can be used, to some extent, in python. Here is what a developer would do:

write code
run the program
fix runtime exceptions, catching a lot of issues like syntax errors, wrong types, etc.

The only problem with the python workflow above is that you’re only going to catch issues in the part of the code that you actually run. For example, if a function is not used by your program, you will not be able to find and fix problems in this function.
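To make this concrete, here is a minimal sketch (the function names used and unused are made up for illustration). The script runs without complaint even though unused contains a misspelled variable name. The NameError only shows up once the function is actually called.

```python
def used():
    return 1 + 1


def unused():
    result = 41
    # typo: 'reslt' does not exist, but Python only finds out
    # when this line is actually executed
    return reslt + 1


# the program only calls used(), so the bug stays hidden
print(used())

# the bug surfaces only once unused() is called
try:
    unused()
except NameError as err:
    print('caught at runtime:', err)
```

A compiler for a statically typed language would typically reject the misspelled name before the program ever runs. In Python, something has to call the function.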

That’s why, in python, one needs to run the whole code during development, and that is the role of unit tests.

In compiled languages, unit tests are a must-have for serious development. In Python, they are absolutely indispensable.

The most basic test

Before we get into unit testing, I would like to comment on the most basic testing that you can do. Let’s say you want to write a class to describe a circle.

You might start like this:

circle.py:

import math

class Circle(object):
    
    def __init__(self, center=(0.,0.), radius=1.):
        self.center = center
        self.radius = radius
        
    def area(self):
        return math.pi * self.radius**2

The first thing you want to do is to try to use your brand new class. To do this, you can add a main section at the end of the module, like this:

if __name__ == '__main__':

    c = Circle()
    print(c.center, c.radius)
    print(c.area())
    c.radius = 2.
    print(c.area())

The main section is executed when the module is directly executed, and not when it’s imported in another module. You can now execute your script, and you get:

(0.0, 0.0) 1.0
3.141592653589793
12.566370614359172

That’s good, we can visually check that the default arguments are correctly stored in the circle object, and that the area method seems to work properly.

This is a perfectly acceptable way to test your code, and I do use this technique at times. But:

the more features you add to the Circle class, the more complex the main section becomes
as soon as you start to use several modules in your project, you need to think about executing them all every time you want to test.
we rely on visual analysis of the results, so the more tests, the more time it takes to verify the results.

It’s much easier to use unit tests, as we will see now.

Your first unit test

The main unit testing framework in Python is unittest.

We start with a test that always succeeds, just to try the unittest framework.

test_alwaysok.py:

import unittest

class TestAlwaysOk(unittest.TestCase):
    
    def test_true(self):
        self.assertTrue(True)
        
    def test_false(self):
        self.assertFalse(False)

if __name__ == '__main__':
    unittest.main()

See? It's easy. Writing this takes ten seconds at most.

You can execute the tests in this module by doing:

python test_alwaysok.py 
..
-----------------------------------------------------
Ran 2 tests in 0.000s

OK

Another way to run all tests in a directory and its subdirectories is to do:

python -m unittest discover

Development with unit testing

What I do, except in single-module projects maybe, is to write the tests while I’m developing. I’m using the tests to run the parts of the code I’m writing. And when I’m done, the tests exist, so I can run them later again whenever I want to change anything.

Actually, unit tests make you much more confident. You know that you can engage in a major refactoring of the code without fear. If the unit tests pass, everything's going to be OK.

I usually put the test modules just next to the code they test, with a name starting with test_. But you can feel free to do otherwise. Just make sure to stick to a naming scheme of your choice for your test modules.

Let’s assume our Circle class does not exist yet, and let’s start by writing the test module. The first method to create for the Circle class is the constructor. So we write the test before even writing the constructor:

test_circle.py:

import unittest
from circle import Circle

class TestCircle(unittest.TestCase): 
    
    def test_constructor(self):
        '''simply tests that a circle can be built'''
        c = Circle( center=(0,0), radius=2)       
        
if __name__ == '__main__':
    unittest.main()

You may execute this test.

From now on, I will only mention the test methods that are added, and omit all the boilerplate code.

Now can you think of a way a circle would become unusable or ill defined?

I see an obvious way: if the radius is negative. Probably, the Circle class should be protected against negative radii. So we modify the constructor of the Circle class in this way:

circle.py:

class Circle(object):
    
    def __init__(self, center=(0.,0.), radius=1.):
        if radius < 0: 
            raise ValueError('radius must be >= 0')
        self.center = center
        self.radius = radius

And we modify our test method to check that this exception is indeed raised properly. First we write a test that fails (the exception is not raised).

test_circle.py:

    def test_constructor(self):
        '''tests that a circle can be built, and that negative radii are disallowed'''
        c = Circle( center=(0,0), radius=2)
        with self.assertRaises(ValueError):
            Circle(radius=1)

Run the test: it fails, because Circle(radius=1) is a valid circle and no exception is raised. The reason why we write a test that fails is to be sure that the test really detects problems. Then, we implement the actual test.

test_circle.py:

    def test_constructor(self):
        '''tests that a circle can be built, and that negative radii are disallowed'''
        c = Circle(center=(0, 0), radius=2)
        with self.assertRaises(ValueError):
            Circle(radius=-1)

Run again. You should see that this time the test passes (the exception is indeed raised).
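If you also want to check the error message, or the boundary case radius=0, something along these lines could work. This is only a sketch: the Circle class is copied here so that the block runs on its own, and the test class and method names are made up for the example. The tests are run explicitly instead of through unittest.main() so that the result object can be inspected.

```python
import unittest


class Circle(object):
    '''copy of the constructor shown above, so that this block is self-contained'''

    def __init__(self, center=(0., 0.), radius=1.):
        if radius < 0:
            raise ValueError('radius must be >= 0')
        self.center = center
        self.radius = radius


class TestCircleErrors(unittest.TestCase):

    def test_negative_radius_message(self):
        # the context manager keeps the raised exception in .exception
        with self.assertRaises(ValueError) as context:
            Circle(radius=-1)
        self.assertIn('radius', str(context.exception))

    def test_zero_radius_is_allowed(self):
        # boundary case: the check is radius < 0, so zero must not raise
        c = Circle(radius=0)
        self.assertEqual(c.radius, 0)


suite = unittest.TestLoader().loadTestsFromTestCase(TestCircleErrors)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```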

Exercise:

Write a test to check the output of the Circle.area method for several input values. The unittest documentation lists the available assert methods.

Hint: You will want to compare floats, and comparing floats for equality is unreliable. So you should use the assertAlmostEqual method.
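Without giving away the exercise, here is a small sketch of why the hint matters. It uses plain floats rather than the Circle class, and the test class name is made up for the example.

```python
import unittest


class TestFloatComparison(unittest.TestCase):

    def test_sum(self):
        total = 0.1 + 0.2
        # exact equality fails because of binary rounding errors
        self.assertNotEqual(total, 0.3)
        # by default, assertAlmostEqual rounds the difference
        # to 7 decimal places and compares it to zero
        self.assertAlmostEqual(total, 0.3)


suite = unittest.TestLoader().loadTestsFromTestCase(TestFloatComparison)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

If you need a tighter or looser tolerance, assertAlmostEqual accepts a places or a delta keyword argument.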

Unittest tricks

You already know enough to use unit tests effectively in your projects. In this section, I just want to mention a few useful tricks.

Ordering tests

Sometimes, you care about the order in which tests are executed. For example, you might want to do the easiest and least time-consuming tests first.

In this case, please note that the tests are run according to the lexicographical order of their method names.

Here is a demonstration.

test_order.py:

import unittest

class TestOrder(unittest.TestCase):
    
    def test_1(self):
        print('i run first')
        
    def test_2(self):
        print('i run second')
        
if __name__ == '__main__':
    unittest.main()

Run the test. test_1 runs first because test_1 < test_2 in lexicographical order.
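One consequence of lexicographical ordering is worth a quick sketch: names are compared as strings, not as numbers, so test_10 sorts before test_2. The class below is made up for the example. It asks the test loader for the method names in the order in which they would be run.

```python
import unittest


class TestOrderPitfall(unittest.TestCase):

    def test_2(self):
        pass

    def test_10(self):
        pass

    def test_1(self):
        pass


# the loader returns the test method names sorted as strings
names = list(unittest.TestLoader().getTestCaseNames(TestOrderPitfall))
print(names)  # ['test_1', 'test_10', 'test_2']
```

If the order matters, zero-padded names such as test_01, test_02 and test_10 avoid this surprise.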

Initializing and finalizing tests

Some tests require an initialization. For example, you might want to create a test input file that you want to use in all tests. You can do this in the setUp method, which is called before every test. Additionally, if you want to do something after every test, you can use the tearDown method.

Here is a typical pattern:

test_setup.py:

import unittest
import os 

class TestSetup(unittest.TestCase): 
    
    def setUp(self): 
        self.testfname = 'testfile.txt'
        if not os.path.isfile(self.testfname):
            with open(self.testfname, 'w') as ifile: 
                print('creating test file')
                testlines = ['first line\n', 'second line\n']
                ifile.writelines(testlines)
                
    def test_nlines(self):
        with open(self.testfname) as ifile:
            self.assertEqual(len(ifile.readlines()),2)
            
    def test_lines(self): 
        with open(self.testfname) as ifile:
            self.assertListEqual(ifile.readlines(),
                                 ['first line\n', 'second line\n'])
               
if __name__  == '__main__':
    unittest.main()

Please note that setUp is called before every test. But the test file is created only once, in case the test file does not already exist. If you rerun these tests, the file is not re-created since it is already there.

This might not look very useful in this simple example, but creating or downloading test data can take a long time. That’s where you do need to use this kind of technique.
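The example above does not use tearDown. When the test data is cheap to create, the opposite strategy is possible: create a fresh file before every test and delete it afterwards. Here is a sketch under that assumption, with a made-up class name and a temporary file from the standard tempfile module.

```python
import os
import tempfile
import unittest


class TestTearDown(unittest.TestCase):

    def setUp(self):
        # create a fresh temporary file before every test
        handle, self.testfname = tempfile.mkstemp(suffix='.txt')
        with os.fdopen(handle, 'w') as ofile:
            ofile.writelines(['first line\n', 'second line\n'])

    def tearDown(self):
        # remove the file after every test, whether it passed or failed
        os.remove(self.testfname)

    def test_nlines(self):
        with open(self.testfname) as ifile:
            self.assertEqual(len(ifile.readlines()), 2)

    def test_file_exists(self):
        self.assertTrue(os.path.isfile(self.testfname))


suite = unittest.TestLoader().loadTestsFromTestCase(TestTearDown)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

With this pattern, tests cannot influence each other through leftover files, at the price of redoing the setup every time.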

Skipping tests

In some cases, you might want to disable some tests. For example, depending on the availability of an external package: if the package is there, you test the parts of your code depending on this package. On the contrary, if it's not there, you skip these tests.

You can disable tests with a decorator that can be applied to a TestCase class or to a test method, like this:

test_skip.py:

import unittest
import datetime
import getpass

now = datetime.datetime.now().time()
noon = datetime.time(12,0)
evening = datetime.time(19,0)

user = getpass.getuser()

class TestSkip1(unittest.TestCase): 
    
    @unittest.skipIf(now<noon or now>evening, 
                     'only testing in the afternoon')
    def test_1(self): 
        '''tested in the afternoon only'''
        self.assertTrue(True)
        
@unittest.skipIf(user!='cbernet',"these are colin's private tests")
class TestSkip2(unittest.TestCase):
    
    def test_1(self):
        self.assertTrue(True)
    def test_2(self):
        self.assertTrue(True)
        
if __name__=='__main__':
    unittest.main()
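The case mentioned at the start of this section, skipping tests when an external package is missing, could be sketched as follows. The has_module helper and the choice of numpy as the optional package are assumptions made for the example.

```python
import importlib.util
import unittest


def has_module(name):
    '''hypothetical helper: True if the top-level module can be imported'''
    return importlib.util.find_spec(name) is not None


class TestOptionalPackage(unittest.TestCase):

    @unittest.skipUnless(has_module('numpy'), 'numpy is not installed')
    def test_with_numpy(self):
        # only runs when numpy is available
        import numpy
        self.assertEqual(numpy.arange(3).sum(), 3)


suite = unittest.TestLoader().loadTestsFromTestCase(TestOptionalPackage)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

A skipped test is reported as such in the output, and it does not make the test run fail.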

Writing a test suite

If you want more control over the tests to run, you can write a test suite, instead of relying on the automatic discovery from

python -m unittest discover

Also, this will allow you to start your tests from a python script if needed. Here is a simple example.

suite.py:

import unittest

from test_circle import TestCircle
from test_skip import TestSkip1, TestSkip2
from test_alwaysok import TestAlwaysOk

testcases = [
    TestAlwaysOk,
    TestCircle,
    TestSkip1,
    TestSkip2    
]

suite = unittest.TestSuite()

loader = unittest.TestLoader()
for test_class in testcases:
    tests = loader.loadTestsFromTestCase(test_class)
    suite.addTests(tests)

if __name__ == '__main__':
    runner = unittest.TextTestRunner(verbosity=2)
    runner.run(suite)

Debugging with pdb

If you take unit tests seriously, you will easily find bugs.

But how can you understand and fix them?

Because python does not need to be compiled, it is really easy to just add printouts in the code for debugging. A lot of people, especially those coming from the research field, are fine with this archaic debugging method.

The problem with this method is that it’s tedious and a real waste of your time:

you are forced to edit the code, possibly in several different modules, to print out the information you need.
when you run the program, you usually realize that you need more printouts
when you’re done debugging, you need to think about removing all the printouts

Here is a small buggy script.

bugged.py:

a = list(range(10))

for val in a:
    if val % 2 == 0:
        # replacing all even values in a by 0
        val = 0
        print(val)
print(a)

This script is supposed to replace all even values in a with zero. But if you execute it, the final printout is still [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]!

That’s a classic 🙂! Can you guess what the problem is?

We’re going to investigate with the debugger. You can start it with:

python -m pdb bugged.py

Alternatively, you could add a debugging anchor in the code, where you want debugging to start. For example, you could set the anchor after the if:

a = list(range(10))

for val in a:
    if val % 2 == 0:
        # replacing all even values in a by 0
        import pdb; pdb.set_trace() #<<<< anchor
        val = 0
        print(val)
print(a)

Then, you can just run the script as usual. In the following, I assume you use the first method.

So after starting the script in debug mode, you end up on the pdb prompt, at the beginning of the execution:

-> a = list(range(10))
(Pdb)

The line on which you are has not been executed yet. The main pdb commands are:

l: list the code around your position
b: set a breakpoint
n: go to next line
c: continue until the next breakpoint
p: print something
+ most Python statements

List the code. You should get:

(Pdb) l
  1  -> a = list(range(10))
  2     
  3     for val in a:
  4         if val % 2 == 0:
  5             # replacing all even values in a by 0
  6             val = 0
  7             print(val)
  8     print(a)
[EOF]

Add a breakpoint on line 6, and continue to this breakpoint:

(Pdb) b 6 
Breakpoint 1 at /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py:6
(Pdb) c
> /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py(6)<module>()
-> val = 0

We can see that val is equal to zero even before the line above:

(Pdb) val
0

That’s normal since we are looping on list a, which starts with 0. Let’s continue till we hit the same breakpoint again:

(Pdb) c
0
> /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py(6)<module>()
-> val = 0
(Pdb) val
2

Obviously, we didn’t see val = 1 since our breakpoint is under the if. Now let’s get some information on list a:

(Pdb) p a[2]
2

That's expected since we haven’t assigned 0 to val yet. Let’s go to the next line:

(Pdb) n
> /Users/cbernet/Google Drive/Colab Notebooks/maldives/bugs/test/bugged.py(7)<module>()
-> print(val)
(Pdb) p val
0
(Pdb) p a[2]
2

So we have set val to 0 but a[2] is still equal to 2!

My goal here was simply to illustrate the use of the python debugger. But if you don’t know why a[2] has not been set to 0, here is the explanation.

In python, variables are like labels. When we do val=0, it means: take label val and stick it (we say bind it) to value 0. So in the case of our buggy program, what we have done is the following:

loop on list a, and bind label val to the values in a sequentially
if the value is even, take label val and bind it to value 0.

In other words, we just moved a label from one value to another, and there is no reason why this would have changed a[2].
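The label picture can also be checked outside the debugger. In the short sketch below, the is operator tells whether two labels are bound to the same object.

```python
a = list(range(10))

# bind label val to the object stored in a[2]
val = a[2]
same_before = val is a[2]
print(same_before)  # True: both labels point to the same object

# move label val to another object; the list is untouched
val = 0
same_after = val is a[2]
print(same_after)   # False: val now points to 0
print(a[2])         # 2: the list element has not changed
```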

Now let's print the addresses of val, a[2], and 0 in the debugger (in CPython, the id function returns the memory address of the object):

(Pdb) print(id(val), id(a[2]), id(0))
4326745488 4326745552 4326745488

We see that val has the same address as 0 since it’s bound to this value. a[2] has a different address corresponding to a different value.
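For completeness, here is one possible way to fix the script, as a sketch: assign to the list element through its index instead of rebinding the loop label. A list comprehension that builds a new list is shown as an alternative.

```python
a = list(range(10))

for index, val in enumerate(a):
    if val % 2 == 0:
        # assign to the list slot itself, not to the label val
        a[index] = 0
print(a)  # [0, 1, 0, 3, 0, 5, 0, 7, 0, 9]

# alternative: build a new list instead of modifying one in place
b = [0 if val % 2 == 0 else val for val in range(10)]
print(b)  # [0, 1, 0, 3, 0, 5, 0, 7, 0, 9]
```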

Conclusion

You are now well equipped to fight bugs effectively.

Attitude:

You should accept that you’re going to write bugs, just like everybody else. Never trust yourself, test a lot, and test often.

unittest:

Unit tests are essential, especially in python. We have seen how easy it is to write them, and how you can integrate unit tests in your development workflow. Now you can simply go ahead and use them!

pdb:

The Python debugger is extremely convenient, and will save you a lot of time compared to cluttering your code with debug printouts.

Please let me know what you think in the comments! I’ll try and answer all questions.

And if you liked this article, you can subscribe to my mailing list to be notified of new posts (no more than one mail per week I promise.)
