C++ – vector::erase and reverse_iterator


I have a collection of elements in a std::vector that are sorted in a descending order starting from the first element. I have to use a vector because I need to have the elements in a contiguous chunk of memory. And I have a collection holding many instances of vectors with the described characteristics (always sorted in a descending order).

Now, sometimes, when I find out that I have too many elements in the greater collection (the one that holds these vectors), I discard the smallest elements from these vectors in a way similar to this pseudo-code:

grand_collection: collection that holds these vectors
T: type argument of my vector
C: the type that is a member of T, that participates in the < comparison (this is what sorts data before they hit any of the vectors).

std::map<C, std::pair<std::vector<T>::const_reverse_iterator, std::vector<T>&>> what_to_delete;
iterate(it = grand_collection.begin() -> grand_collection.end())
{
     iterate(vect_rit = it->rbegin() -> it->rend())
     {
         // ...
          what_to_delete <- (vect_rit->C, pair(vect_rit, *it))
          if (what_to_delete.size() > threshold)
               what_to_delete.erase(what_to_delete.begin());
         // ...  
     }
}

Now, after running this code, what_to_delete holds a collection of iterators pointing to the elements that I want to remove from the original vectors (the overall smallest values). Remember, the original vectors are sorted before they hit this code, which means that for any two entries of what_to_delete at positions n - m and n (where m > 0) that refer to the same vector, the iterator at position n - m never points to an element further from the beginning of that vector than the iterator at position n.

When erasing elements from the original vectors, I have to convert a reverse_iterator to iterator. To do this, I rely on C++11's §24.4.1/1:

The relationship between reverse_iterator and iterator is
&*(reverse_iterator(i)) == &*(i - 1)

Which means that to delete a vect_rit, I use:

vector.erase(--vect_rit.base());
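As a rough sketch of that conversion (the helper name erase_via_reverse is made up for illustration): std::next(rit).base() refers to the same element as *rit, and it avoids decrementing the temporary returned by base(), which would not compile if the vector's iterator happened to be a raw pointer. Rebuilding the reverse iterator from the value returned by erase() sidesteps the invalidation issue for that one iterator.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: erase the element that `rit` refers to and return a
// reverse iterator to the next element in reverse order.
// Precondition: rit != v.rend().
inline std::vector<int>::reverse_iterator
erase_via_reverse(std::vector<int>& v, std::vector<int>::reverse_iterator rit) {
    // The identity quoted above, rearranged: the forward iterator for
    // *rit is the base of the *next* reverse iterator.
    assert(&*std::next(rit).base() == &*rit);

    // erase() returns a valid iterator to the element after the erased one.
    std::vector<int>::iterator fwd = v.erase(std::next(rit).base());

    // Wrapping it again yields a reverse iterator to the element before
    // the erased one, which is the next element in reverse order.
    return std::vector<int>::reverse_iterator(fwd);
}
```

Any other reverse iterator stored elsewhere (as in what_to_delete) is not repaired by this; only the returned one is known to be valid.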

Now, according to C++11 standard §23.3.6.5/3:

iterator erase(const_iterator position); Effects: Invalidates
iterators and references at or after the point of the erase.

How does this work with reverse_iterators? Are reverse_iterators internally implemented with a reference to a vector's real beginning (vector[0]) and transforming that vect_rit to a classic iterator so then erasing would be safe? Or does reverse_iterator use rbegin() (which is vector[vector.size()]) as a reference point and deleting anything that is further from vector's 0-index would still invalidate my reverse iterator?

Edit:

Looks like reverse_iterator uses rbegin() as its reference point. Erasing elements the way I described was giving me errors about a non-dereferenceable iterator after the first element was deleted, whereas storing classic iterators (converting to const_iterator) while inserting into what_to_delete worked correctly.

Now, for future reference, does The Standard specify what should be treated as a reference point in the case of a random-access reverse_iterator? Or is this an implementation detail?

Thanks!

Best Answer

From a standardese point of view (and I'll admit, I'm not an expert on the standard): From §24.5.1.1:

namespace std {
    template <class Iterator>
    class reverse_iterator ...
    {
        ...
            Iterator base() const; // explicit
        ...
        protected:
            Iterator current;
        ...
    };
}

And from §24.5.1.3.3:

Iterator base() const; // explicit
    Returns: current.

Thus it seems to me that so long as you don't erase anything in the vector at or before the position held in a reverse_iterator's current (which is one past the element it refers to), said reverse_iterator should remain valid.

Of course, given your description, there is one catch: if you have two contiguous elements in your vector that you end up wanting to delete, the fact that you vector.erase(--vect_rit.base()) means that you've invalidated the reverse_iterator "pointing" to the immediately preceding element, and so your next vector.erase(...) is undefined behavior.

Just in case that's clear as mud, let me say that differently:

std::vector<T> v=...;
...
// it_1 and it_2 refer to contiguous elements (v needs at least two)
std::vector<T>::reverse_iterator it_1=v.rbegin();
std::vector<T>::reverse_iterator it_2=it_1;
++it_2;

// Erase it_1's pointee:

// convert from reverse_iterator to iterator
std::vector<T>::iterator tmp_it=it_1.base();

// but that points one too far in, so decrement;
--tmp_it;

// of course, now tmp_it points at it_2's base:
assert(tmp_it == it_2.base());

// perform erasure
v.erase(tmp_it);  // invalidates all iterators pointing at or past *tmp_it
                  // (like, say it_2.base()...)

// now delete it_2's pointee:
std::vector<T>::iterator tmp_it_2=it_2.base(); // note, invalid iterator!

// undefined behavior:
--tmp_it_2;
v.erase(tmp_it_2);

In practice, I suspect that you'll run into two possible implementations: more commonly, the underlying iterator will be little more than a (suitably wrapped) raw pointer, and so everything will work perfectly happily. Less commonly, the iterator might actually try to track invalidations/perform bounds checking (didn't Dinkumware STL do such things when compiled in debug mode at one point?), and just might yell at you.
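One way to stay clear of the invalidation rules altogether is to remember positions rather than iterators and to erase from the highest position downwards. This is only a sketch under that assumption, not something taken from the question's code; erase_positions is a made-up helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: erase the elements of `v` at the given indices.
inline void erase_positions(std::vector<int>& v, std::vector<std::size_t> idx) {
    // Highest index first: each erase only shifts elements that sit after
    // the erased one, and those positions have already been handled.
    std::sort(idx.begin(), idx.end(), std::greater<std::size_t>());
    // Drop duplicate indices so no position is erased twice.
    idx.erase(std::unique(idx.begin(), idx.end()), idx.end());

    for (std::size_t i : idx) {
        assert(i < v.size());
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(i));
    }
}
```

Adjacent positions, the problematic case described above, are handled the same way as any others because no iterator is kept across an erase.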

Setting a bit

Use the bitwise OR operator (|) to set a bit.

number |= 1UL << n;

That will set bit n of number. Bits are counted from zero: use n = 0 to set the 1st (least significant) bit, and so on up to n - 1 to set the nth bit.

Use 1ULL if number is wider than unsigned long; promotion of 1UL << n doesn't happen until after evaluating 1UL << n, where it's undefined behaviour to shift by the width of a long or more. The same applies to all the rest of the examples.
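A small sketch of the wide case, assuming a 64-bit number. It uses std::uint64_t{1} rather than 1ULL so that the shifted operand has exactly the width of the target; the function name set_bit64 is invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: set bit n (counted from zero) of a 64-bit value.
inline std::uint64_t set_bit64(std::uint64_t number, unsigned n) {
    assert(n < 64);  // shifting by 64 or more would be undefined
    // The 1 is already 64 bits wide before the shift happens.
    return number | (std::uint64_t{1} << n);
}
```

With 1UL on a platform where unsigned long is 32 bits, the same shift by 40 would be undefined, which is the trap described above.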

Clearing a bit

Use the bitwise AND operator (&) to clear a bit.

number &= ~(1UL << n);

That will clear the nth bit of number. You must invert the bit string with the bitwise NOT operator (~), then AND it.

Toggling a bit

The XOR operator (^) can be used to toggle a bit.

number ^= 1UL << n;

That will toggle the nth bit of number.

Checking a bit

You didn't ask for this, but I might as well add it.

To check a bit, shift number to the right by n, then bitwise AND it with 1:

bit = (number >> n) & 1U;

That will put the value of the nth bit of number into the variable bit.

Changing the nth bit to x

Setting the nth bit to either 1 or 0 can be achieved with the following on a 2's complement C++ implementation:

number ^= (-x ^ number) & (1UL << n);

Bit n will be set if x is 1, and cleared if x is 0. If x has some other value, you get garbage. x = !!x will booleanize it to 0 or 1.

To make this independent of 2's complement negation behaviour (where -1 has all bits set, unlike on a 1's complement or sign/magnitude C++ implementation), use unsigned negation.

number ^= (-(unsigned long)x ^ number) & (1UL << n);

or

unsigned long newbit = !!x;    // Also booleanize to force 0 or 1
number ^= (-newbit ^ number) & (1UL << n);

It's generally a good idea to use unsigned types for portable bit manipulation.

Alternatively, clear the bit and then OR in the new value:

number = (number & ~(1UL << n)) | (x << n);

(number & ~(1UL << n)) will clear the nth bit and (x << n) will set the nth bit to x.
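To see that the two forms agree, here is a minimal sketch that wraps each one in a function. The names change_bit_xor and change_bit_mask are made up, and x is taken as bool so it is always 0 or 1.

```cpp
#include <cassert>
#include <climits>

// XOR form, using unsigned negation so that -newbit is either 0 or all ones.
inline unsigned long change_bit_xor(unsigned long number, unsigned n, bool x) {
    assert(n < sizeof(unsigned long) * CHAR_BIT);
    unsigned long newbit = x;
    return number ^ ((-newbit ^ number) & (1UL << n));
}

// Clear-then-OR form; newbit is unsigned long so the shift has full width.
inline unsigned long change_bit_mask(unsigned long number, unsigned n, bool x) {
    assert(n < sizeof(unsigned long) * CHAR_BIT);
    unsigned long newbit = x;
    return (number & ~(1UL << n)) | (newbit << n);
}
```

Both leave every bit other than bit n untouched; which one reads better is mostly a matter of taste.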

It's also generally a good idea not to copy/paste code, so many people use preprocessor macros or some sort of encapsulation.
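As one possible form of encapsulation, the operations above could be wrapped in small constexpr function templates instead of macros. This is only a sketch; the namespace bits and all function names in it are invented for illustration.

```cpp
#include <type_traits>

namespace bits {  // hypothetical namespace

// Single-bit mask for bit n; restricted to unsigned types.
template <class T>
constexpr T mask(unsigned n) {
    static_assert(std::is_unsigned<T>::value, "use an unsigned type");
    return static_cast<T>(T{1} << n);
}

template <class T>
constexpr T set(T number, unsigned n) {
    return static_cast<T>(number | mask<T>(n));
}

template <class T>
constexpr T clear(T number, unsigned n) {
    return static_cast<T>(number & ~mask<T>(n));
}

template <class T>
constexpr T toggle(T number, unsigned n) {
    return static_cast<T>(number ^ mask<T>(n));
}

template <class T>
constexpr bool test(T number, unsigned n) {
    return ((number >> n) & 1U) != 0;
}

}  // namespace bits

// Compile-time usage check.
static_assert(bits::set(0x0Au, 0) == 0x0Bu, "bit 0 should be set");
```

Because the mask is built from T{1}, the shift is done at the width of the argument type (after the usual integer promotions), which avoids the 1UL versus 1ULL question for wider types.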

C++ – What are the differences between a pointer variable and a reference variable in C++

A pointer can be re-assigned:

int x = 5;
int y = 6;
int *p;
p = &x;
p = &y;
*p = 10;
assert(x == 5);
assert(y == 10);

A reference cannot be re-bound, and must be bound at initialization:

int x = 5;
int y = 6;
int &q; // error
int &r = x;

A pointer variable has its own identity: a distinct, visible memory address that can be taken with the unary & operator and a certain amount of space that can be measured with the sizeof operator. Using those operators on a reference returns a value corresponding to whatever the reference is bound to; the reference’s own address and size are invisible. Since the reference assumes the identity of the original variable in this way, it is convenient to think of a reference as another name for the same variable.
```
int x = 0;
int &r = x;
int *p = &x;
int *p2 = &r;

assert(p == p2); // &x == &r
assert(&p != &p2);
```
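The same point can be made for sizeof. In this small sketch (the type Big and the function name are made up), sizeof applied to a reference reports the size of the referred-to object, while sizeof applied to a pointer reports the size of the pointer itself.

```cpp
#include <cassert>

struct Big { char data[64]; };  // hypothetical type, just to have a large object

inline bool reference_mirrors_referent() {
    Big b{};
    Big &r = b;
    Big *p = &b;
    return sizeof(r) == sizeof(Big)    // size of the object, not of the reference
        && sizeof(p) == sizeof(Big *)  // size of the pointer variable itself
        && &r == &b;                   // address of the object, as shown above
}
```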

You can have arbitrarily nested pointers to pointers offering extra levels of indirection. References only offer one level of indirection.

int x = 0;
int y = 0;
int *p = &x;
int *q = &y;
int **pp = &p;

**pp = 2;
pp = &q; // *pp is now q
**pp = 4;

assert(y == 4);
assert(x == 2);

A pointer can be assigned nullptr, whereas a reference must be bound to an existing object. If you try hard enough, you can bind a reference to nullptr, but this is undefined and will not behave consistently.

/* the code below is undefined; your compiler may optimise it
 * differently, emit warnings, or outright refuse to compile it */

int &r = *static_cast<int *>(nullptr);

// prints "null" under GCC 10
std::cout
    << (&r != nullptr
        ? "not null" : "null")
    << std::endl;

bool f(int &r) { return &r != nullptr; }

// prints "not null" under GCC 10
std::cout
    << (f(*static_cast<int *>(nullptr))
        ? "not null" : "null")
    << std::endl;

You can, however, have a reference to a pointer whose value is nullptr.
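A short sketch of that last case, with an invented function name: the reference is bound to the pointer variable, which is a perfectly valid object, and it is the pointer's value that happens to be null.

```cpp
#include <cassert>

inline bool null_pointer_through_reference() {
    int *p = nullptr;
    int *&rp = p;                      // valid reference; its referent is the pointer p
    bool was_null = (rp == nullptr);   // reads p's value through the reference

    int x = 7;
    rp = &x;                           // assigns through the reference, so p changes
    return was_null && p == &x && *rp == 7;
}
```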

Pointers can iterate over an array; you can use ++ to move a pointer to the next item, and + 4 to reach the 5th element. This works regardless of the size of the object the pointer points to.
A pointer needs to be dereferenced with * to access the memory location it points to, whereas a reference can be used directly. A pointer to a class/struct uses -> to access its members, whereas a reference uses the . operator.
References cannot be put into an array, whereas pointers can be (mentioned by user @litb).
Const references can be bound to temporaries. Pointers cannot (not without some indirection):
```
const int &x = int(12); // legal C++
int *y = &int(12); // illegal to take the address of a temporary.
```
This makes const & more convenient to use in argument lists and so forth.
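The points about pointer arithmetic and member access can be illustrated with a small sketch. The struct Point and both function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

struct Point { int x; int y; };  // hypothetical type

// Walk an array with a pointer: ++p advances by one whole Point,
// whatever sizeof(Point) happens to be, and members are reached with ->.
inline int sum_x(const Point *first, std::size_t count) {
    assert(first != nullptr || count == 0);
    int total = 0;
    for (const Point *p = first; p != first + count; ++p) {
        total += p->x;
    }
    return total;
}

// p + 4 is the 5th element; a reference to it uses . instead of ->.
inline int fifth_y(const Point (&arr)[5]) {
    const Point *p = arr;
    const Point &r = *(p + 4);
    return r.y;
}
```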