The Three of Wands

My free software stuff.

attrs III: Frozen Classes

This is the third post in my series on the inner workings of attrs.

What are Frozen Classes

Frozen, in this context, is a synonym for immutable. The term frozen was chosen because there's precedent in the standard library - frozensets are immutable sets.

Frozen classes are, then, classes that can't be modified after they've been created. In order to have a fully frozen class, all attributes of the class should hold immutable values too.

Attrs supports frozen classes (all examples are Python 3):

import attr

@attr.s(frozen=True)
class C:
    a = attr.ib()
>>> i = C(1)
>>> i
C(a=1)
>>> i.a = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tin/pg/attrs/src/attr/_make.py", line 240, in _frozen_setattrs
    raise FrozenInstanceError()
attr.exceptions.FrozenInstanceError
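A minimal sketch of the "fully frozen" caveat mentioned above: frozen=True blocks rebinding an attribute, but it does nothing about a mutable value stored in that attribute. The class name Shallow is made up for this illustration.

```python
import attr
from attr.exceptions import FrozenInstanceError


@attr.s(frozen=True)
class Shallow:  # hypothetical name, for illustration only
    items = attr.ib()


s = Shallow([1, 2])

# Rebinding the attribute is blocked by the frozen class...
try:
    s.items = [3]
    rebound = True
except FrozenInstanceError:
    rebound = False

# ...but the list itself is still mutable, so the instance can still change.
s.items.append(3)

# Holding a tuple instead makes the instance fully frozen.
t = Shallow((1, 2))
```

After this runs, rebound is False, yet s.items is [1, 2, 3]. Only t behaves like a plain value.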

What's the Use

Fully frozen classes are generally easier to reason about - they can be treated as simple values, like integers. Once you have obtained an integer somehow, you don't have to care about what happens to it when you pass it to a function as an argument or cache it somewhere, because you know exactly what will happen to it - nothing.

When you write a function that takes an immutable object as an argument there are no questions of ownership (it doesn't matter) or whether it's safe to change the object (you can't).

If your class has internal state and you have to share it somehow, you'll often wonder whether to share the data directly or return a copy. If you expose frozen data, you don't have to consider this any more - it's safe to share. You wouldn't think twice about returning a string you're holding a reference to, so why think twice about other types?
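As a rough illustration of the copy-or-share question, here is a sketch using only the standard library. The Registry class and its methods are invented for this example; the point is only that handing out a tuple needs no defensive copy.

```python
class Registry:  # hypothetical class, for illustration only
    def __init__(self):
        self._names = ()  # internal state kept as an immutable tuple

    def add(self, name):
        # "Changing" the state means building a new tuple and rebinding.
        self._names = self._names + (name,)

    @property
    def names(self):
        # Safe to return directly: callers cannot modify a tuple.
        return self._names


r = Registry()
r.add("a")
snapshot = r.names   # shared, not copied
r.add("b")           # later changes do not affect the snapshot
```

Had _names been a list, names would have to return a copy, or every caller would be able to change the registry's internal state.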

All of this removes a cognitive burden.

In some languages immutable objects are also useful for thread safety reasons; in Python this is less so because of the GIL. Still, sharing immutable data is always preferable to sharing mutable data.

Your class also needs to be fully frozen if you want to use instances of it in sets or as keys in dictionaries. Technically, your class should just have a well-defined __hash__ method, but this method should depend on the values of the attributes, and the value returned from it shouldn't change once you've put your instance in a set or dictionary. If it does, the set or dictionary will malfunction, and silently. Observe:

# Don't do this at home.
@attr.s(frozen=False, hash=True)
class W:                # 'W' for wrong.
    a = attr.ib()
>>> i = W(1)
>>> d = {i: 1}
>>> i in d
True
>>> i.a = 2
>>> i in d
False

This means it doesn't really make sense to implement __hash__ without the class being frozen, since __hash__ is almost exclusively used for dictionaries and sets.
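For contrast, a sketch of the same experiment with a frozen class. It relies on attrs generating __hash__ when a class is both frozen and comparable, which is the default; the class name R is made up here.

```python
import attr
from attr.exceptions import FrozenInstanceError


@attr.s(frozen=True)
class R:                # 'R' for right (hypothetical name).
    a = attr.ib()


i = R(1)
d = {i: 1}

found_same = i in d        # lookup with the very same instance
found_equal = R(1) in d    # lookup with an equal instance

# The mutation that broke the dictionary before is now rejected.
try:
    i.a = 2
    mutated = True
except FrozenInstanceError:
    mutated = False

still_found = i in d
```

Because the instance can't change, its hash can't change either, and the lookup keeps working.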

There are several drawbacks to using frozen classes though.

When you want to change a frozen instance, you need to make a copy. Creating a new instance is obviously less efficient than changing the attribute of an existing instance. Additionally, attrs frozen classes incur a small penalty to the speed of their __init__ compared to non-frozen classes; more on this later. My personal experience is that frozen classes are worth it in the vast majority of cases. If you're unsure whether the speed penalty is prohibitive, chances are very good it's not.

Creating a new instance is usually more awkward to express in code. Attrs happens to come with attr.evolve to help with this. Just give evolve the instance you want to change, and tell it which attributes to change.

>>> i = C(1)
>>> i
C(a=1)
>>> attr.evolve(i, a=2)
C(a=2)

If your class contains dictionaries, you'll have a bad time. Lists can be replaced with tuples and sets with frozensets, but for some reason Python still lacks a frozendict. There was once talk of adding a frozendict to Python - the rejected PEP 416. The PEP itself lists the reasons for its rejection, none of which are especially convincing to me.

Python 3 contains an immutable dict wrapper, types.MappingProxyType. Alas, while MappingProxyType will indeed wrap a dictionary and effectively make it immutable, it won't make it hashable. Therefore your only recourse if your class needs a dictionary attribute is to exempt the attribute from __hash__ (which could lead to subtle bugs) or use a far more awkward data structure, like a tuple of key/value tuples. Sorry.
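A short sketch of both options, using only the standard library. The settings dictionary is invented for the example, and the tuple conversion assumes the keys can be sorted and the values are hashable.

```python
from types import MappingProxyType

settings = {"host": "localhost", "port": 8080}  # made-up example data
proxy = MappingProxyType(settings)

# The proxy rejects item assignment...
try:
    proxy["port"] = 1
    writable = True
except TypeError:
    writable = False

# ...but a proxy over a plain dict is still not hashable.
try:
    hash(proxy)
    hashable = True
except TypeError:
    hashable = False

# The awkward alternative: a sorted tuple of key/value tuples.
frozen_items = tuple(sorted(settings.items()))
items_hash = hash(frozen_items)         # works: tuples of hashables hash fine
port = dict(frozen_items)["port"]       # lookups need a conversion back
```

Sorting makes two tuples built from equal dictionaries compare and hash the same regardless of insertion order. The price is that every lookup goes through a conversion or a linear scan.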

How attrs Creates Frozen Classes

Note: at the time of writing, the most recent attrs release is v17.2.0.

Attrs sticks two additional methods onto your class to make it frozen (example simplified for brevity):

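from attr.exceptions import FrozenInstanceError

def _frozen_setattrs(self, name, value):
    raise FrozenInstanceError()

def _frozen_delattrs(self, name):  # __delattr__ receives no value
    raise FrozenInstanceError()

C.__setattr__ = _frozen_setattrs
C.__delattr__ = _frozen_delattrs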

This will make your class raise an exception if you try modifying it in a straightforward way.

Very few things are 100% immutable if you're willing to hack at them, in Python or any other language. For example, if your class is a dict class, you can modify the underlying instance dictionary easily.

>>> i = C(1)
>>> i.a = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tin/pg/attrs/src/attr/_make.py", line 240, in _frozen_setattrs
    raise FrozenInstanceError()
attr.exceptions.FrozenInstanceError
>>> i.__dict__['a'] = 2
>>> i
C(a=2)

Another way to bypass __setattr__, which works for both slot and dict classes, is to use object.__setattr__ directly. This is the approach attrs itself uses in __init__ if the class is frozen. The generated __init__ is equivalent to:

class C:
    def __init__(self, a):
        object.__setattr__(self, 'a', a)

We currently have an open issue dealing with the post-init hook, __attrs_post_init__, in frozen classes; until we figure out an elegant solution, just use the object.__setattr__ trick I demonstrated.
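Here is a sketch of what that workaround might look like for a derived attribute. The Rect class and its attributes are invented for the example; the only attrs features used are frozen=True, init=False and the __attrs_post_init__ hook.

```python
import attr


@attr.s(frozen=True)
class Rect:  # hypothetical class, for illustration only
    width = attr.ib()
    height = attr.ib()
    area = attr.ib(init=False)  # computed after init, not passed in

    def __attrs_post_init__(self):
        # self.area = ... would raise FrozenInstanceError here,
        # so go around the frozen __setattr__ instead.
        object.__setattr__(self, "area", self.width * self.height)


r = Rect(2, 3)
```

r.area is 6 afterwards, and assigning to any attribute of r from the outside still raises FrozenInstanceError.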

This approach carries with it a small speed penalty. Let's compare using the pyperf library (pip install pyperf).

$ pyperf timeit -g --rigorous --duplicate 5 -s "import attr; C = attr.make_class('C', ['a'])" "C(1)"
.........................................
487 ns:  4 ###################
491 ns:  2 #########
494 ns:  5 #######################
497 ns:  8 #####################################
500 ns: 12 ########################################################
503 ns:  9 ##########################################
506 ns: 11 ###################################################
509 ns: 17 ###############################################################################
512 ns:  6 ############################
515 ns:  7 #################################
518 ns:  8 #####################################
522 ns:  3 ##############
525 ns:  6 ############################
528 ns: 13 ############################################################
531 ns:  5 #######################
534 ns:  1 #####
537 ns:  0 |
540 ns:  2 #########
543 ns:  0 |
546 ns:  0 |
549 ns:  1 #####

Mean +- std dev: 513 ns +- 13 ns

$ pyperf timeit -g --rigorous --duplicate 5 -s "import attr; C = attr.make_class('C', ['a'], frozen=True)" "C(1)"
.........................................
829 ns:  1 #####
835 ns:  2 #########
840 ns:  7 #################################
846 ns:  6 ############################
851 ns:  9 ##########################################
857 ns: 11 ###################################################
862 ns:  9 ##########################################
868 ns: 17 ###############################################################################
873 ns: 15 ######################################################################
879 ns: 16 ##########################################################################
884 ns:  7 #################################
890 ns:  3 ##############
895 ns:  7 #################################
901 ns:  3 ##############
906 ns:  1 #####
912 ns:  3 ##############
917 ns:  1 #####
923 ns:  0 |
929 ns:  1 #####
934 ns:  0 |
940 ns:  1 #####

Mean +- std dev: 873 ns +- 20 ns

The __init__ is 70% slower. Note two things, however: we're dealing with nanoseconds, and this is a trivial example. If the class has six attributes instead of one, and all the attributes have a very simple converter and validator, the calculation changes to just 12% slower.
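The 70% figure comes straight from the two means reported above; the arithmetic, spelled out:

```python
# Means reported by the two pyperf runs above, in nanoseconds.
plain_ns = 513
frozen_ns = 873

slowdown = frozen_ns / plain_ns - 1   # relative extra cost of the frozen __init__
extra_ns = frozen_ns - plain_ns       # absolute extra cost per instantiation

print(f"{slowdown:.0%} slower, {extra_ns} ns extra per call")
# prints "70% slower, 360 ns extra per call"
```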

Still, we can do better, and we plan to.

For example, we could make frozen dict classes use the instance __dict__ directly in their __init__s; this eliminates the speed penalty completely. But what about slot classes?
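For dict classes, the idea might look roughly like the hand-written sketch below. This is not code attrs generates; it is a plain class, with a stand-in exception named FrozenError, showing that writing to the instance __dict__ never touches the frozen __setattr__.

```python
class FrozenError(AttributeError):
    """Stand-in for attr.exceptions.FrozenInstanceError (hypothetical)."""


class C:
    def __init__(self, a):
        # A plain dictionary store: __setattr__ is never involved.
        self.__dict__["a"] = a

    def __setattr__(self, name, value):
        raise FrozenError()

    def __delattr__(self, name):
        raise FrozenError()


i = C(1)

try:
    i.a = 2
    changed = True
except FrozenError:
    changed = False
```

Reading i.a gives 1 and changed ends up False, so the class is still frozen from the outside while __init__ avoids the object.__setattr__ call.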

Issue #133 is about individual frozen attributes. This is an interesting proposal, but not really at the top of my priority list. While brainstorming a solution, however, I figured out a way to make a fast, Cython-based wrapper for slot descriptors that makes attributes read-only. Since slot classes don't support extra attributes anyway, we could make slot classes frozen in the following way:

  • Recognize that a slot class with every attribute individually frozen is equivalent to a frozen class.
  • Replace the ordinary, read/write slot descriptors with the read-only wrappers. That way, there is no overhead for reading, and modifying will throw exceptions.
  • Stash the ordinary, read/write descriptors under a private, "secret" name, like C.__a_frozen. These descriptors become effectively private setters for the public, read-only wrappers. Use these private descriptors in __init__:
class C:
    def __init__(self, a):
        self.__a_frozen = a

There, essentially zero-overhead frozen classes. The only problem is that we would need Cython for this to be fast, which introduces additional complexity. Having Cython available also opens other doors, so it's probably worth it. This is something I'm planning to tackle after attrs v17.3.0 is out. The initial effort can be seen over at PR #172, but work on it is paused at the moment.
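To make the descriptor swap concrete, here is a pure-Python sketch of the mechanism. It is not the Cython wrapper described above: a Python-level wrapper adds overhead on every read, which is exactly what Cython is meant to avoid. The names ReadOnly and _a_rw are invented for this illustration.

```python
class ReadOnly:
    """Pure-Python stand-in for the read-only wrapper (hypothetical name)."""

    def __init__(self, slot):
        self._slot = slot  # the original read/write slot descriptor

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return self._slot.__get__(instance, owner)

    def __set__(self, instance, value):
        raise AttributeError("attribute is read-only")

    def __delete__(self, instance):
        raise AttributeError("attribute is read-only")


class C:
    __slots__ = ("a",)

    def __init__(self, a):
        # Resolves to the stashed read/write descriptor on the class.
        self._a_rw = a


slot = C.__dict__["a"]   # the member descriptor created by __slots__
C._a_rw = slot           # stash it under a private name (hypothetical)
C.a = ReadOnly(slot)     # the public name now points at the wrapper

c = C(1)
```

Reading c.a returns 1, while c.a = 2 raises AttributeError. Assigning an unknown attribute such as c.b also raises AttributeError, because slot classes have no instance __dict__. That is why freezing every slot is enough to freeze the whole class.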
