Sets
In Python, sets are a distinct data type inspired by mathematical sets. They are a collection of elements, like a list or dict, but there is no ordering of the elements, and there can be no duplicate elements. With that in mind, it should be clear that the main use of a set is to hold a unique set of values, ideal when dealing with data prone to duplication, and then to test membership against that set.
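As a quick illustration of both uses, here is a minimal sketch; the hostnames and variable names are hypothetical and chosen only for this example:

```python
# Hypothetical list of hostnames containing repeats.
seen_hosts = ["host1", "host2", "host1", "host3", "host2"]

# Converting the list to a set drops the duplicates.
unique_hosts = set(seen_hosts)

print(len(seen_hosts))           # 5
print(len(unique_hosts))         # 3

# Membership testing against the set.
print("host3" in unique_hosts)   # True
print("host9" in unique_hosts)   # False
```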
Like mathematical sets, Python sets support operations such as union, intersection, difference, and containment.
Rusty on mathematical sets? Check out these videos!
General Set Theory
Union & Intersection of Sets
Difference of Sets
Set Theory: Containment
Some programmers think of sets in Python as keyless dictionaries because a set must contain unique values, the order is not (yet) guaranteed, the elements are iterable, and they must be hashable.
Hashable?
A hashable object has a consistent hash value throughout its lifetime, and it can be compared for equality with other objects.
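A hash value is a fixed-size value produced by a hash function from input data; in Python, the built-in hash() function returns an integer. Objects that compare equal must have the same hash value, although two different objects can occasionally share one.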
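A small sketch of what hashable means in practice, using only built-in types; the variable name list_is_hashable is made up for this example:

```python
# Immutable built-ins such as str and tuple are hashable.
print(hash("host1") == hash("host1"))   # True: consistent within a single run
print(hash((1, 2)) == hash((1, 2)))     # True: equal tuples share a hash value

# Mutable containers such as list are not hashable.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(list_is_hashable)                 # False
```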
Sets are mutable objects in Python, similar to a list, which, somewhat ironically, makes sets non-hashable (see note above) and therefore a set cannot be an element of a set. 😕
Frozenset
Check out the built-in frozenset type, an immutable and hashable variant of set, if you do require nested sets.
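A minimal sketch of the difference, with made-up values: nesting a plain set fails, while frozenset elements are accepted.

```python
# A plain set is unhashable, so it cannot be stored inside another set.
try:
    nested = {{1, 2}, {3, 4}}
except TypeError as err:
    print(f"TypeError: {err}")

# frozenset is immutable and hashable, so it can be an element of a set.
nested = {frozenset({1, 2}), frozenset({3, 4})}

print(frozenset({2, 1}) in nested)   # True: element order is irrelevant
print(len(nested))                   # 2
```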
Creating Sets
In Python, a set is declared with curly braces (hence one of the comparisons to keyless dictionaries):
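A minimal sketch of such declarations. The literals, including which values are repeated, are assumptions chosen to match the output that follows; the order in which a set prints its elements may differ from run to run.

```python
# The repeated "red" and "pawn" literals are assumptions made for illustration.
colours = {"red", "green", "blue", "orange", "red"}
chess_pieces = {"pawn", "pawn", "knight", "bishop", "rook", "queen", "king"}

print(f"Colours: {colours}")
print(f"Object type of colours variable: {type(colours)}")
print(f"Chess Pieces: {chess_pieces}")
print(f"Object type of chess_pieces variable: {type(chess_pieces)}")
```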
Outputs:
Colours: {'green', 'blue', 'red', 'orange'}
Object type of colours variable: <class 'set'>
Chess Pieces: {'pawn', 'knight', 'rook', 'king', 'queen', 'bishop'}
Object type of chess_pieces variable: <class 'set'>
Notice how even though duplicate values were passed in, all the elements of the set are unique and duplicates are ignored.
Tip
To create an empty set, use variable = set(), as variable = {} would create an empty dict object.
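This is easy to check with type(); the variable names below are placeholders:

```python
empty_set = set()
empty_dict = {}

print(type(empty_set))    # <class 'set'>
print(type(empty_dict))   # <class 'dict'>
```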
Sets support most of the familiar constructs used with Python iterables, such as for loops and membership testing. Other familiar operations also work, such as set.clear(), set.copy(), len(set), and so on.
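A short sketch of these operations on a hypothetical set of colours; sorted() is used only to make the printed order predictable:

```python
colours = {"red", "green", "blue"}

# Iterate with a for loop; sorted() gives a predictable order for display.
for colour in sorted(colours):
    print(colour)

print("red" in colours)   # True
print(len(colours))       # 3

# copy() returns a new, independent set.
backup = colours.copy()

# clear() removes every element in place.
colours.clear()

print(len(colours))       # 0
print(len(backup))        # 3
```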
Managing Set Elements
Because of their mathematical underpinnings, sets have functionality that other iterables do not.
Adding Elements
Unlike lists, sets have no append() or insert() methods, as the order is not guaranteed. Instead, sets use add() and update():
Adding Elements to Sets
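A minimal sketch of both methods. The names s1 and s2 follow the note after this listing, while the hostnames are hypothetical:

```python
s1 = {"host1", "host2"}
s2 = {"host2", "host3", "host4"}

# add() inserts a single element.
s1.add("host5")

# update() adds every element of another iterable: the union of s1 and s2
# is assigned to s1.
s1.update(s2)
print(sorted(s1))   # ['host1', 'host2', 'host3', 'host4', 'host5']

# Adding an element that already exists leaves the set unchanged.
s1.add("host1")
print(len(s1))      # 5
```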
- Adding two sets is referred to as the union of sets. In this case, the union of s1 and s2 is assigned to s1. See Merging Dictionaries for usage of the update() method.
It is important to understand that adding a duplicate element to a set is silently ignored, as the element already exists.
Removing Elements
Python offers two methods to remove a specific element from a set: remove() and discard().
Discarding Elements
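A minimal sketch of both methods. The set name and most hostnames are hypothetical; "host13" is taken from the note after this listing:

```python
cluster_nodes = {"host1", "host2", "host3"}

# remove() deletes an element that is known to be present.
cluster_nodes.remove("host2")

# discard() deletes an element if it is present and does nothing otherwise.
cluster_nodes.discard("host13")

print(sorted(cluster_nodes))   # ['host1', 'host3']
```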
- This will not raise an error if "host13" is not present. See tip below.
Remove with Caution!
Trying to use the set.remove() method on an element that doesn't exist in the set will result in a KeyError exception. This may be useful in some situations, so it is good to be aware of both options.
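One way to handle the exception is sketched below, with hypothetical hostnames:

```python
cluster_nodes = {"host1", "host3"}

# remove() raises KeyError because "host13" is not in the set.
try:
    cluster_nodes.remove("host13")
except KeyError as err:
    print(f"KeyError: {err}")   # KeyError: 'host13'

# discard() on the same missing element completes silently.
cluster_nodes.discard("host13")
print(sorted(cluster_nodes))    # ['host1', 'host3']
```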
Comparing Sets
Subsets and Supersets
It is possible to compare sets, returning a boolean value. When all the elements of one set are contained in a second set, the first set is called a subset of the second, and the second is called a superset of the first. Testing for subsets and supersets is common. Below is a table showing comparison operators applied to set literals:
Statement | True/False? | Explanation |
---|---|---|
{2,3} < {2,3,4} | ✅ | The first set is a proper subset of the second. |
{2,3} < {2,3} | ❌ | The sets are equal, so the first is not a proper subset of the second. |
{2,3} <= {2,3,4} | ✅ | The second set is a superset of the first. |
{2,3} >= {2,3,4} | ❌ | The first set is not a superset of the second. |
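The rows of the table can be checked directly in an interpreter. The third line is an extra case added here to show that <= accepts equal sets:

```python
print({2, 3} < {2, 3, 4})    # True
print({2, 3} < {2, 3})       # False
print({2, 3} <= {2, 3})      # True
print({2, 3} <= {2, 3, 4})   # True
print({2, 3} >= {2, 3, 4})   # False
```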
Rather than depending on comparison operators, Python also includes methods that evaluate set relationships, issubset() and issuperset():
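A minimal sketch of these methods. The variable names and the way the sets are built are assumptions; the values are chosen to match the output that follows, though the printed element order may differ.

```python
# Hypothetical node inventories, chosen to match the output below.
manager_nodes = {"host1", "host7", "host12"}
cluster_nodes = {f"host{n}" for n in range(1, 13)}   # host1 through host12

print(f"Manager Nodes: {manager_nodes}")
print(f"Cluster Nodes: {cluster_nodes}")
print(f"Manager Nodes are present in Cluster Nodes: {manager_nodes.issubset(cluster_nodes)}")
print(f"Cluster Nodes contain all Manager Nodes: {cluster_nodes.issuperset(manager_nodes)}")
```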
Yields:
Manager Nodes: {'host7', 'host1', 'host12'}
Cluster Nodes: {'host3', 'host12', 'host5', 'host1', 'host2', 'host10', 'host7', 'host6', 'host8', 'host4', 'host9', 'host11'}
Manager Nodes are present in Cluster Nodes: True
Cluster Nodes contain all Manager Nodes: True
Unions and Intersections
The union of two sets is all the elements of both sets. The intersection of two sets is only the elements that appear in both sets. It is possible to use the | and & operators to perform unions and intersections (union is set1 | set2, while intersection is set1 & set2), but Python also provides the union() and intersection() methods for these operations. Below is a (somewhat contrived) example:
Set Unions and Intersections
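A minimal sketch using the method forms. The variable names and the split of tools between the two trades are assumptions, picked to be consistent with the output shown after the note below:

```python
# Hypothetical tool sets; only "screwdriver" and "hammer" are shared.
mechanic_tools = {"wrench", "jack", "pliers", "screwdriver", "hammer"}
carpenter_tools = {"saw", "level", "chisel", "screwdriver", "hammer"}

print(f"Intersection: {mechanic_tools.intersection(carpenter_tools)}")
print(f"Union: {mechanic_tools.union(carpenter_tools)}")
```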
- Maybe mechanics and carpenters can get along after all! 😄
Returns:
Intersection: {'screwdriver', 'hammer'}
Union: {'saw', 'level', 'screwdriver', 'chisel', 'jack', 'hammer', 'pliers', 'wrench'}
Difference of Sets
The difference of sets is all the elements in one set that do not exist in another. This can be calculated using the subtraction operator: set1 - set2 would yield all the elements in set1 that are not in set2. Python also offers handy methods, such as difference(), which are more explicit than the operator (for most programmers). To continue our contrived example from above:
Difference of Sets
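A minimal sketch of both forms. The tool sets are hypothetical, chosen to be consistent with the union and intersection output shown earlier; sorted() is used only to make the printed order predictable.

```python
# Hypothetical tool sets; only "screwdriver" and "hammer" are shared.
mechanic_tools = {"wrench", "jack", "pliers", "screwdriver", "hammer"}
carpenter_tools = {"saw", "level", "chisel", "screwdriver", "hammer"}

# Method form: tools that only the mechanic has.
only_mechanic = mechanic_tools.difference(carpenter_tools)

# Operator form: tools that only the carpenter has.
only_carpenter = carpenter_tools - mechanic_tools

print(sorted(only_mechanic))    # ['jack', 'pliers', 'wrench']
print(sorted(only_carpenter))   # ['chisel', 'level', 'saw']
```

Note that difference is not symmetric: swapping the operands gives a different result.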
Disjoint
If two sets do not have any elements in common, they are disjoint, meaning their intersection is empty. In Python, this can be a useful check to ensure there is no duplication between two sets of data.
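The isdisjoint() method performs this check. A minimal sketch, with hypothetical hostnames and variable names:

```python
existing_nodes = {"host1", "host2", "host3"}
new_nodes = {"host20", "host21"}

# No shared elements, so the sets are disjoint and the intersection is empty.
print(new_nodes.isdisjoint(existing_nodes))   # True
print(new_nodes & existing_nodes)             # set()

# A single shared element is enough to make the check fail.
overlapping_nodes = {"host1", "host20"}
print(overlapping_nodes.isdisjoint(existing_nodes))   # False
```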