Python – Using a bytearray rather than a string to store a password in memory

python

Using a bytearray to store a password in memory has an advantage over using a string: a bytearray is mutable, so once the password is no longer needed it can be overwritten with 0x00 bytes instead of lingering until the object is garbage collected. It's not clear to me, however, whether overwriting with 0x00 values achieves what is intended. When a password is no longer needed, it should be wiped from memory before garbage collection. Is this possible in Python 3.4 using a bytearray?

Best Answer

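At least in CPython, overwriting a bytearray is the way forward. Like most systems, CPython does not clear memory when an object is freed, so a manual erase is required. It also does not copy a bytearray's data behind your back, but many ordinary operations do create copies (slicing, bytes(buf), buf.decode(), and resizing the array, which may move its buffer), and those copies are not wiped.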
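A minimal sketch of the overwrite-before-release pattern described above, assuming CPython 3. The name `wipe` is a hypothetical helper, not a standard-library function. It zeroes the bytearray's own buffer in place; it cannot reach copies that were made earlier.

```python
def wipe(buf: bytearray) -> None:
    """Overwrite every byte of buf with 0x00 (hypothetical helper).

    Item assignment writes into the buffer the bytearray already owns, so
    no new copy of the secret is allocated. Copies made earlier (slices,
    bytes(buf), buf.decode(), or old buffers freed when the array grew)
    are not touched.
    """
    for i in range(len(buf)):
        buf[i] = 0


# Demo only: the bytes literal is itself an immutable object holding the
# secret, and it cannot be wiped. Real code would fill the bytearray
# directly from its source instead.
password = bytearray(b"hunter2")
try:
    # Use `password` here, avoiding operations that copy it.
    pass
finally:
    # Runs even if the code above raises, so the buffer is always zeroed.
    wipe(password)

print(password)  # bytearray(b'\x00\x00\x00\x00\x00\x00\x00')
```

The try/finally is a design choice: it guarantees the wipe happens on every exit path, which matters more than the exact zeroing technique.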

Be wary of other Python implementations, however. They often differ in how memory is managed and may, for example, use a copying or compacting garbage collector. In that case, manually erasing the password may not be enough, because the garbage collector may have moved the object, leaving a copy of the array's contents behind at the old location.

Related Solutions

Documentation – Why Use ‘Equivalent to’ Instead of ‘Is’?

Because the standard's authors don't want to commit to a particular implementation. They want to define what a function does, but not necessarily how it does it. For example, if you look at the GNU C++ library's version of find_if, you will see that its implementation differs slightly from the one you quote, which is based on the C++ standard:

template<typename _InputIterator, typename _Predicate>
  inline _InputIterator
  __find_if(_InputIterator __first, _InputIterator __last,
            _Predicate __pred, input_iterator_tag)
  {
    while (__first != __last && !bool(__pred(*__first)))
      ++__first;
    return __first;
  }

This is functionally equivalent to what the standard has, but not exactly the same. This gives compiler writers flexibility. There may be a better way to do it for a particular platform. The implementor may wish to use a different coding style.
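To make "functionally equivalent, but not the same" concrete, here is a Python sketch of two interchangeable versions of find_if. Both function names are hypothetical; each returns the index of the first element satisfying the predicate, or len(seq) to play the role of the C++ `last` iterator.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_if_loop(seq: Sequence[T], pred: Callable[[T], bool]) -> int:
    """Explicit loop, mirroring the structure of the libstdc++ code above."""
    i = 0
    while i != len(seq) and not pred(seq[i]):
        i += 1
    return i


def find_if_next(seq: Sequence[T], pred: Callable[[T], bool]) -> int:
    """Same contract, written with a generator expression and next()."""
    return next((i for i, x in enumerate(seq) if pred(x)), len(seq))


def is_even(n: int) -> bool:
    return n % 2 == 0


for data in ([1, 3, 4, 5], [1, 3, 5], []):
    # Callers cannot tell the two apart: same inputs, same results.
    assert find_if_loop(data, is_even) == find_if_next(data, is_even)
```

A specification that says "equivalent to" lets an implementer ship either version, or a third one, as long as this kind of check keeps passing.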

This is particularly true for scripting languages like Python, where an implementor may decide to write parts of the library in a completely different language for performance reasons. CPython, for instance, implements itertools.chain(*iterables) in C. That is perfectly fine if the standard says "equivalent to", as long as the code behaves the same as the Python given there. If the standard said "is" instead, implementors would be required either to write it in that language or to fall short of the standard.
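The Python documentation describes itertools.chain as roughly equivalent to a short generator. A sketch along those lines (named chain_equivalent here, a hypothetical name chosen to avoid shadowing the real function) can be checked against CPython's C implementation: identical output is what "equivalent to" promises, while the code producing it is free to differ.

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chain_equivalent(*iterables: Iterable[T]) -> Iterator[T]:
    """Pure-Python generator with the documented behaviour of itertools.chain."""
    for iterable in iterables:
        for element in iterable:
            yield element


# The C implementation and the Python sketch yield the same elements.
inputs = ("ABC", [1, 2], (), range(3))
print(list(chain_equivalent(*inputs)))  # ['A', 'B', 'C', 1, 2, 0, 1, 2]
assert list(chain_equivalent(*inputs)) == list(itertools.chain(*inputs))
```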

In summary:

Because they don't want to prevent implementers from writing better code than the standard provides
Because they don't want to prevent implementers from using an entirely different language to improve performance

Teaching – Why Do Textbooks Use Pseudocode Rather Than Real Languages?

No. The point of pseudocode is that it doesn't have to compile, so I can quickly gloss over irrelevant details. In contrast, even languages that look like pseudocode at first glance can have counterintuitive details that would just distract from the algorithm. Take, for example, Quicksort in Haskell:

qs :: Ord a => [a] -> [a]
qs [] = []
qs (pivot:xs) = (qs smaller) ++ pivot:(qs larger)
  where smaller = [x | x <- xs, x <= pivot]
        larger  = [x | x <- xs, x > pivot]

or the same in Python:

def qs(array):
    if not array:
        return []
    pivot = array[0]
    xs = array[1:]
    smaller = [x for x in xs if x <= pivot]
    larger = [x for x in xs if x > pivot]
    return qs(smaller) + [pivot] + qs(larger)

The advantage in both cases is that this is executable code, and as such can be tested, typechecked, and toyed with by students. However, they both include syntactic details that are distracting. Students would usually be better served by pseudocode that illustrates the intention of the algorithm, not implementation details:

algorithm QUICKSORT(array)
  return [] if array is empty
  pivot ← array[0]
  xs ← array[1, ...] -- the rest of the array without the pivot
  smaller ← [x | x ∈ xs, x <= pivot] -- all smaller or equal elements
  larger ← [x | x ∈ xs, x  > pivot] -- all larger elements
  return [QUICKSORT(smaller)..., pivot, QUICKSORT(larger)...]

Notable differences:

I can just make up a list comprehension syntax that looks like maths rather than having to explain why Python has a for and if here.
I don't have to explain that language's syntax for list concatenation. Why does Python use + (addition) for it? What is : in Haskell? I can just choose a syntax that gets the point across more clearly.
The type signature Ord a => [a] -> [a] is just an implementation detail. While possibly helpful in this case, the type signatures Haskell sometimes requires can get absurd.
I don't have to explain why Python considers empty collections to be false, and what array[1:] is supposed to mean.
I avoid clever students pointing out that I should really use yield in the Python example.
Haskell sucks for explaining mutable data structures like hash tables, red-black trees, and so on.
Things start getting very language-specific once we need complex records to express our algorithms. E.g. Python's object system has a few surprises that are just distracting.

That said, it can be very valuable to use one of these languages in addition to pseudocode, just carefully label what is what.
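As an example of the "executable, so it can be tested" advantage, the Python qs can be checked against the built-in sorted on random inputs. This is a sketch of such a check; the function name check_against_sorted, the trial count, and the value range are arbitrary choices, not part of the original answer.

```python
import random


def qs(array):
    """The Python quicksort from the answer, unchanged in behaviour."""
    if not array:
        return []
    pivot = array[0]
    xs = array[1:]
    smaller = [x for x in xs if x <= pivot]
    larger = [x for x in xs if x > pivot]
    return qs(smaller) + [pivot] + qs(larger)


def check_against_sorted(trials: int = 200) -> None:
    """Compare qs with sorted() on random lists, including duplicates."""
    rng = random.Random(0)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        data = [rng.randint(-10, 10) for _ in range(rng.randint(0, 30))]
        assert qs(data) == sorted(data), data


check_against_sorted()
print("qs agrees with sorted() on all trials")
```

Pseudocode offers no equivalent of this check, which is one reason to pair it with a clearly labelled executable version.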
