initializer_list proxy function – insert iterator version

So, the last blog entry focused on optimizing initializer_list insertions by using a proxy function. Oh, and about that – proxy functions, as I term them, can be thought of as intermediary or convenience functions (if it helps).  Anyway, I realized after I wrote that entry that, in terms of good programming practices, the code wasn’t exactly up to par.

When using iterators to insert items into a container, the preference is to use insert iterators.  The reason is quite obvious – they are there to insert items into a container!  Plus, they are safer: an insert iterator doesn’t get invalidated by its own insertions, which is what would have happened to the iterator in that vector proxy function if it had failed to save the distance between the iterator position and the container start (and again, std::distance() could have been used there).  Nonetheless, I accounted for that and also made sure to reassign the iterator after every insert.  The net result was that it worked, but it might not have been clear what it did.  So, let’s consider the 3 types of insert iterators, which are very clear about what they do.

  1. Back inserter (back_insert_iterator): This iterator does just one simple thing – it calls a container’s push_back() function every time an assignment operator is used with it.
  2. Front inserter (front_insert_iterator): Like the back inserter, this does one thing as well – it calls a container’s push_front() function with each assignment.
  3. General inserter (insert_iterator): This iterator is a little different. It also calls only one function – insert() – however, it then increments the iterator that insert() returned by 1, so the next item lands after the one just inserted.  This is just like the code in the last proxy function (++pos).
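To make the three behaviors concrete, here is a minimal sketch that uses each iterator type once on a std::deque (chosen only because it supports push_back(), push_front(), and insert()). The function name demoInserters is made up for this illustration.

```cpp
#include <cassert>
#include <deque>
#include <iterator>
#include <string>

// Hypothetical demo function (not from the original post).
std::deque<std::string> demoInserters()
{
    std::deque<std::string> d{"b", "c"};

    std::back_insert_iterator<std::deque<std::string>> back(d);
    back = "d";    // d.push_back("d")   -> b c d

    std::front_insert_iterator<std::deque<std::string>> front(d);
    front = "a";   // d.push_front("a")  -> a b c d

    // Start inserting just before "c" (index 2).
    std::insert_iterator<std::deque<std::string>> mid(d, d.begin() + 2);
    mid = "x";     // insert, then step past "x" -> a b x c d
    mid = "y";     // lands after "x"            -> a b x y c d

    assert(d.size() == 6);
    return d;
}
```

Note how the general inserter keeps the inserted items in the order they were assigned, which is exactly what the extra increment buys us.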

Each of the above iterators uses a simple, common interface, just like every other iterator. *it, it++, ++it, and *it = data are what we use to interact with them, though what happens behind the scenes is a little different.  For example, given this code:

std::vector<std::string> myVec;

std::back_insert_iterator<std::vector<std::string>> myIns(myVec);

*myIns++ = "strA";
*myIns++ = "strB";

We might expect that for each *myIns++, the item at the inserter position is dereferenced, assigned to, and then the inserter is moved on to the next position.  But with all insert iterators, only one operator here makes any changes – the assignment operator.  The pre-increment (++it), post-increment (it++), and even the dereference (*it) operators all do nothing but hand back the insert iterator itself (a reference to it, or a copy in the case of post-increment).  Which means the below code gives exactly the same results:

std::back_insert_iterator<std::vector<std::string>> myIns(myVec);

myIns = "strA"; // calls push_back()
myIns = "strB"; // and again

I’ve gone off topic as you may have noticed, but mainly because I find it interesting to plumb the depths of C++’s inner workings.  Insert iterators are clever in that they allow code to be used in template functions that expect the typical ‘*it++ = data‘ to work in a consistent manner.  And it does, of course, just not in the way some might imagine.  So that’s just a bit of neat trivia for you: insert iterators are mainly single-function container calls disguised as iterator objects.  Well, with the exception of insert_iterator – which also does an increment.  In fact, that code generally looks like this:

insIter& operator=(const T& val) // (note - there's also an rvalue version)
{
   it = container->insert(it, val);
   ++it;
   return *this;
}

As you might notice, insert iterators store a pointer to a container.  That’s how and why they can be used wherever a template function takes an output iterator. =)
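To show how little is needed, here is a rough sketch of a hand-rolled back inserter. MiniBackInserter and demoMiniBackInserter are names invented for this example, and the class only approximates what std::back_insert_iterator does; it is not how any particular standard library implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical class: a stripped-down imitation of std::back_insert_iterator.
template <typename Container>
class MiniBackInserter
{
public:
    // Typedefs so generic algorithms can treat this as an output iterator.
    typedef std::output_iterator_tag iterator_category;
    typedef void                     value_type;
    typedef std::ptrdiff_t           difference_type;
    typedef void                     pointer;
    typedef void                     reference;

    explicit MiniBackInserter(Container& c) : container(&c) {}

    // Assignment is the only operator that does real work.
    MiniBackInserter& operator=(const typename Container::value_type& val)
    {
        container->push_back(val);
        return *this;
    }
    MiniBackInserter& operator=(typename Container::value_type&& val)
    {
        container->push_back(std::move(val));
        return *this;
    }

    // Dereference and increment are no-ops that hand back the iterator.
    MiniBackInserter& operator*()     { return *this; }
    MiniBackInserter& operator++()    { return *this; }
    MiniBackInserter  operator++(int) { return *this; }

private:
    Container* container;  // the stored pointer to the container
};

std::vector<int> demoMiniBackInserter()
{
    const std::vector<int> src{1, 2, 3};
    std::vector<int> dst;
    // std::copy only ever writes '*out = value; ++out;', which is all we support.
    std::copy(src.begin(), src.end(), MiniBackInserter<std::vector<int>>(dst));
    assert(dst.size() == src.size());
    return dst;
}
```

The algorithm never learns that it is appending to a vector; it just sees something that accepts assignments.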

Oookaay, so since I’ve wasted your precious time with that background on insert iterators, let’s just go ahead and throw the new initializer_list proxy function out there:

// initializer_list Proxy - Using Insert Iterators
template <typename InsIter, typename LType>
void insertInitListProxy(InsIter inserter, std::initializer_list<LType> il)
{
   // construct & move the items into the container
   for (auto it = il.begin(); it != il.end(); ++it)
      *inserter++ = *it;
}

Looks simple enough?  In theory we can remove the * dereference and ++ increment operators from inserter, but then that would make the code less understandable – and more importantly, less flexible.  There’s no telling what other things could be thrown at it, or what new types of insert iterators could be used.  And that’s one of the great things about template functions – they offer generic support for both known and unknown types of objects that might be thrown their way.
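As a quick way to try the function with each kind of insert iterator, here is a small sketch that spells the iterator types out in full. The StrList typedef and the demoProxyWithIterators function are made up for the example; the proxy function itself is repeated so the snippet compiles on its own.

```cpp
#include <cassert>
#include <initializer_list>
#include <iterator>
#include <list>
#include <string>

// The proxy function from above, repeated so this snippet is self-contained.
template <typename InsIter, typename LType>
void insertInitListProxy(InsIter inserter, std::initializer_list<LType> il)
{
    for (auto it = il.begin(); it != il.end(); ++it)
        *inserter++ = *it;
}

typedef std::list<std::string> StrList;  // hypothetical alias for brevity

// Hypothetical demo function (not from the original post).
StrList demoProxyWithIterators()
{
    StrList l{"m"};

    // push_back for each item: m y z
    insertInitListProxy(std::back_insert_iterator<StrList>(l), {"y", "z"});

    // push_front for each item, so the items end up reversed: a b m y z
    insertInitListProxy(std::front_insert_iterator<StrList>(l), {"b", "a"});

    // insert before "m" (index 2), keeping list order: a b c d m y z
    insertInitListProxy(
        std::insert_iterator<StrList>(l, std::next(l.begin(), 2)), {"c", "d"});

    assert(l.size() == 7);
    return l;
}
```

Writing out those iterator types gets long-winded fast, which is worth keeping in mind for what comes next.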

Ah, and another nice thing with insert iterators – there are convenience functions available to make passing them to other functions that much easier.  These functions are back_inserter, inserter, and front_inserter – and what they do is obvious: they create the matching xyz_insert_iterator objects.  So the call to insert objects using my proxy function with a back inserter would now look like this:

insertInitListProxy(std::back_inserter(myVec), {"str1", "str2", "str3"});

Simple enough?  The only other thing I’d like to note is that, although push_back, insert, and push_front are used, the C++11 standard gives us RValue versions of these.  So what happens here is the strings are constructed and then moved into the container.  Waste not, want not!
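One way to check the "constructed, then moved" claim is to count copies and moves with a small instrumented type. Tracker below is a hypothetical stand-in written for this sketch (it is not the StringHolder class from the linked code), and the vector reserves space up front so reallocation doesn't add extra moves to the tally.

```cpp
#include <cassert>
#include <initializer_list>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracker
{
    std::string text;
    static int copies;
    static int moves;

    Tracker(const char* s) : text(s) {}
    Tracker(const Tracker& o) : text(o.text) { ++copies; }
    Tracker(Tracker&& o) noexcept : text(std::move(o.text)) { ++moves; }
    Tracker& operator=(const Tracker&) = default;
    Tracker& operator=(Tracker&&) = default;
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// The proxy function from above, repeated so this snippet is self-contained.
template <typename InsIter, typename LType>
void insertInitListProxy(InsIter inserter, std::initializer_list<LType> il)
{
    for (auto it = il.begin(); it != il.end(); ++it)
        *inserter++ = *it;
}

// Returns (copies, moves) observed while inserting three items.
std::pair<int, int> countProxyCopiesAndMoves()
{
    Tracker::copies = 0;
    Tracker::moves = 0;

    std::vector<Tracker> v;
    v.reserve(3);  // avoid reallocation so only the insertions are counted
    insertInitListProxy(std::back_inserter(v), {"str1", "str2", "str3"});

    assert(v.size() == 3);
    return std::make_pair(Tracker::copies, Tracker::moves);
}
```

Each const char* is converted to a temporary Tracker, which then binds to the rvalue overload of the inserter's assignment operator and is moved into the vector.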

Okay, as is the norm, I have code up @ Coliru demonstrating the above, using that StringHolder object from last time. And the code @ PasteBin.

Ah, and as always – one last thing to add. ostream_iterator and ostreambuf_iterator also ignore the * and ++ operations, and only call a function when an assignment operator is used. (istream_iterator and istreambuf_iterator work a little differently, however)
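A tiny sketch of that behavior with ostream_iterator, writing into a string stream so the result can be inspected (demoOstreamIterator is a name made up for the example):

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical demo function (not from the original post).
std::string demoOstreamIterator()
{
    std::ostringstream oss;
    std::ostream_iterator<int> out(oss, ",");

    *out++ = 1;  // the usual iterator spelling
    out = 2;     // plain assignment writes just the same
    ++out;       // increment alone writes nothing
    out = 3;

    const std::string result = oss.str();
    assert(!result.empty());
    return result;  // "1,2,3,"
}
```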

initializer_list frustrations – Solving the deep-copy issue

So I’ve messed about a bit with C++11’s initializer_list objects and thought they were a great idea.  Since I’ve created my own container, I included them to mimic the new standard constructor, assignment, and insert() overloads.  However, there’s one (big) flaw I find with them – for types that we might use to construct objects in a container, the objects themselves are constructed before they are placed into the initializer_list object.

So for example, say we have a container – a vector – of strings.  Nothing fancy. Now, if we’d say, want to append a group of strings at the end of the vector using string literals, we’d use something like this:

std::vector<std::string> MyStrVec;

MyStrVec.insert(MyStrVec.end(), {"str1", "str2"});

Relatively harmless, yes? So, if we consider this for a moment, the most efficient way to insert those elements into the container is to construct them in-place, or to move-construct them with temporary objects (RValues).  But why, why why did the C++11 standards committee decide, especially in the light of the significance of RValues, to make everything copy-constructed?!

What happens above is this sequence:

  1. First, “str1” and “str2” are used to construct temporary string objects
  2. The insert function gets called with an initializer_list that refers to an array of const strings.  Note the const here – it means there is no chance of moving anything.
  3. In the insert function, the objects in the initializer_list are traversed and inserted into the vector one by one using copy construction
  4. Following the call, the temporary string objects are destroyed.
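The sequence above can be observed with a small instrumented type. Tracker here is a hypothetical class written for this sketch, and space is reserved beforehand so that only the copies made by insert() itself are counted.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instrumented type: counts copy constructions.
struct Tracker
{
    std::string text;
    static int copies;

    Tracker(const char* s) : text(s) {}
    Tracker(const Tracker& o) : text(o.text) { ++copies; }
    Tracker(Tracker&& o) noexcept : text(std::move(o.text)) {}
    Tracker& operator=(const Tracker&) = default;
    Tracker& operator=(Tracker&&) = default;
};
int Tracker::copies = 0;

// Returns how many copy constructions the initializer_list insert performed.
int countInitListInsertCopies()
{
    Tracker::copies = 0;

    std::vector<Tracker> v;
    v.reserve(2);  // no reallocation, so only insert()'s own copies count

    // Steps 1-4: temporaries are built, then copied out of the const array.
    v.insert(v.end(), {"str1", "str2"});

    assert(v.size() == 2);
    return Tracker::copies;
}
```

Even though a move constructor is available, the elements in the list are const, so each one has to be copied.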

So, even though a huge aspect of the C++11 standard has been focused on minimizing wasteful copies by using RValues (moves) and in-place construction (see emplace, which uses variadic templates), here we have initializer_lists fouling everything up.  By using this new convenient feature, we just went where C++11 generally tries not to take us – the deep-copy route.  In the above code, the temporary string objects allocate memory and initialize their members with strings, then the insert code constructs new string objects which also allocate and initialize their data with strings (using copying).  Therefore by using that initializer_list object we basically doubled the amount of memory (and time) needed to insert strings into a vector. W. T. F. right?

This is all very vexing, and perplexing.  Why waste memory and time when we have much better means of addressing the situation?  Meh, there’s nothing we can do – it’s now standard.  However!  Having said all that, this doesn’t preclude us from creating our own workarounds, which despite my negative feelings towards them, use initializer_lists.  How so?  By using a proxy function of course!

Now, keep in mind that initializer_lists can only take objects of a single type – this is very different from variadic templates, which can take any number of different types.  In fact, that could be one way to solve the problem of inserting items initialized by different object types, but it gets annoying using recursive functions.  Plus, you’re still limited to one parameter per constructed object.  So, no, for me I find it much better to focus on a simple initializer_list proxy function.
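For comparison, the variadic route might look roughly like the sketch below. emplaceAll is a hypothetical helper invented for this illustration, and it shows both the recursion and the one-argument-per-object limitation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: base case, nothing left to add.
template <typename Container>
void emplaceAll(Container&) {}

// Hypothetical helper: construct one element in place, then recurse on the rest.
template <typename Container, typename First, typename... Rest>
void emplaceAll(Container& c, First&& first, Rest&&... rest)
{
    c.emplace_back(std::forward<First>(first));  // one argument per object
    emplaceAll(c, std::forward<Rest>(rest)...);
}

std::vector<std::string> demoEmplaceAll()
{
    std::vector<std::string> v;
    const char* third = "str3";

    // Three different argument types in one call.
    emplaceAll(v, "str1", std::string("str2"), third);

    assert(v.size() == 3);
    return v;
}
```

It works, and it accepts mixed types, but it needs two overloads and only appends at the back.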

So, given the insert() call shown earlier, and the goal of not wasting any time or memory, we can create a function like this:

template <typename VecT, typename Iterator, typename LType>
void insertInitListProxy(std::vector<VecT> & Cnt, Iterator pos, std::initializer_list<LType> il)
{
   // save iterator 'distance' from begin() in case a resize operation happens
   auto diff = pos - Cnt.begin();
   // pre-allocate space
   Cnt.reserve(Cnt.size() + il.size());
   // reset iterator based on any changes that occurred due to a resize
   // from here on out, we don't need to worry about resizes
   pos = Cnt.begin() + diff;
   // construct the items in-place
   for (auto it = il.begin(); it != il.end(); ++it)
   {
      pos = Cnt.emplace(pos, *it);
      ++pos;  // move to position *after* what was just inserted
   }
}

Simple enough, yes?  We allocate the space and construct the items in-place using whatever type ‘LType’ is – in our case, ‘const char*’ strings. There’s some management code in there to deal with resizes causing iterators to be invalidated, which is why we need to calculate the offset of the iterator based on the vector beginning. The vector may change location in memory, so we save that offset, then change it back into a normal iterator after the resize. So, given the above, here’s what the call would look like now:

insertInitListProxy(myVec, myVec.end(), {"str1", "str2"});

That’s not quite as convenient, but it’s close enough!  And it avoids the extra copies: “str1” and “str2” are passed as pointers and then constructed in-place in the vector.  So there’s my proposed workaround =)  However, keep in mind that all the objects in those curly braces must deduce to the same type; the compiler won’t convert them to a common type when deducing LType.  If they differ, you’ll need to use casts or somesuch to make them all the same type (or just explicitly specify the template parameters).  Oh, I might as well include a more generic container version for everything other than vectors:

template <typename Container, typename Iterator, typename LType>
void insertInitListProxy(Container & Cnt, Iterator pos, std::initializer_list<LType> il)
{
   for (auto it = il.begin(); it != il.end(); ++it)
   {
      pos = Cnt.emplace(pos, *it);
      ++pos;  // move to position *after* what was just inserted
   }
}

As long as both versions are declared before the call, the order they appear in shouldn’t matter: for a vector, the compiler picks the vector-specific one because it is the more specialized template.  Note that you could in fact use just the latter one (which is admittedly simpler), but then you can’t call reserve() to optimize things even further.
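Here is a short sketch of the generic version at work on a std::list, where reserve() doesn't exist and iterators aren't random access. The demoGenericProxy name is made up; the function is repeated so the snippet compiles on its own.

```cpp
#include <cassert>
#include <initializer_list>
#include <iterator>
#include <list>
#include <string>

// The generic proxy from above, repeated so this snippet is self-contained.
template <typename Container, typename Iterator, typename LType>
void insertInitListProxy(Container& Cnt, Iterator pos,
                         std::initializer_list<LType> il)
{
    for (auto it = il.begin(); it != il.end(); ++it)
    {
        pos = Cnt.emplace(pos, *it);
        ++pos;  // move to position *after* what was just inserted
    }
}

// Hypothetical demo function (not from the original post).
std::list<std::string> demoGenericProxy()
{
    std::list<std::string> l{"a", "d"};

    // Insert before "d"; list iterators only support ++, not + 1.
    insertInitListProxy(l, std::next(l.begin()), {"b", "c"});

    assert(l.size() == 4);
    return l;
}
```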

Since it might be hard to see what is going on without a complete example, I’ve created one at Coliru – the Initializer_List Proxy Test!  (also available at PasteBin) You’ll be able to see all the copy-construction going on after the “Now invoking initializer_list insertion” line, which should be contrasted with what happens after the “Now invoking initializer_list proxy” line.
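For something small that compiles locally, a sketch like the following exercises the vector version with an insertion in the middle, which is the case where the saved offset matters because reserve() may move the storage. This is not the linked test program, just a minimal stand-in, and demoVectorProxy is a made-up name.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// The vector proxy from above, repeated so this snippet is self-contained.
template <typename VecT, typename Iterator, typename LType>
void insertInitListProxy(std::vector<VecT>& Cnt, Iterator pos,
                         std::initializer_list<LType> il)
{
    // save iterator 'distance' from begin() in case a resize happens
    auto diff = pos - Cnt.begin();
    Cnt.reserve(Cnt.size() + il.size());
    // rebuild the iterator, since reserve() may have invalidated it
    pos = Cnt.begin() + diff;
    for (auto it = il.begin(); it != il.end(); ++it)
    {
        pos = Cnt.emplace(pos, *it);
        ++pos;  // move to position *after* what was just inserted
    }
}

// Hypothetical demo function (not from the original post).
std::vector<std::string> demoVectorProxy()
{
    std::vector<std::string> v{"a", "d"};

    // Insert in the middle of the vector.
    insertInitListProxy(v, v.begin() + 1, {"b", "c"});

    assert(v.size() == 4);
    return v;
}
```

Keep in mind that emplacing in the middle of a vector still shifts the later elements along, so the "no wasted work" argument is strongest when appending at end().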

Anyway, hope that helps someone.  If not, it helped me!

*edit: Note: an insert_iterator would be the preferred way to insert items in a list, but emplace here increases the container size by 1 and returns an iterator to the inserted element, so it’s safe to use ++pos to index the next location to insert (i.e. it’s legal to insert at end()).

*edit2: it’s important to note it’s not safe to write ‘pos + 1‘ as it was originally, since that will only work with random access iterators! (oops – that’s what I get for focusing on vectors). ++pos is guaranteed to work though.