initializer_list frustrations – Solving the deep-copy issue

So I’ve messed about a bit with C++11’s initializer_list objects and thought they were a great idea.  Since I’ve created my own container, I included them to mimic the new standard functions for constructor, assignment, and insert() functions.  However, there’s one (big) flaw I find with them – for types that we might use to construct objects in a container, the objects themselves are constructed before they are placed into the initializer_list object.

So for example, say we have a container – a vector – of strings.  Nothing fancy. Now, if we’d say, want to append a group of strings at the end of the vector using string literals, we’d use something like this:

std::vector<std::string> MyStrVec;

MyStrVec.insert(MyStrVec.end(), {"str1", "str2"});

Relatively harmless, yes? So, if we consider this for a moment, the most efficient way to insert those elements into the container is to construct them in-place, or to move-construct them with temporary objects (RValues).  But why, why why did the C++11 standards committee decide, especially in the light of the significance of RValues, to make everything copy-constructed?!

What happens above is this sequence:

  1. First, “str1” and “str2” are used to construct temporary string objects
  2. The insert function gets called with an initializer_list consisting of pointers to const strings.  Note the const here – it means there is no chance of moving anything.
  3. In the insert function, the objects in the initializer_list are traversed and inserted into the vector one by one using copy construction
  4. Following the call, the temporary string objects are destroyed.

So, even though a huge aspect of the C++11 standard has been focused on minimizing wasteful copies by using RValues (moves) and in-place construction (see emplace, which uses variadic templates), here we have initializer_list’s fouling everything up.  By using this new convenient feature, we just went where C++11 generally tries not to take us – the deep-copy route.  In the above code, the temporary string objects allocate memory and initialize their members with strings, then the insert code constructs new string objects which also allocate and initialize their data with strings (using copying).  Therefore by using that initializer_list object we basically doubled the amount of memory (and time) needed to insert strings into a vector. W. T. F. right?

This is all very vexing, and perplexing.  Why waste memory and time when we have much better means of addressing the situation?  Meh, there’s nothing we can do – its now-standard.  However!  Having said all that, this doesn’t preclude us from creating our own workarounds, which despite my negative feelings towards them, use initializer_lists.  How so?  By using a proxy function of course!

Now, keep in mind that initializer_lists can only take an object of a single type – this is very different than variadic templates, which can take any number of different types.  In fact, that could be one way to solve the problem of inserting items initialized by different object types, but it gets annoying using recursion functions.  Plus, you’re still limited to one parameter per constructed object.  So, no, for me I find it much better to focus on a simple initializer_list proxy function.

So given the above code, and the concept that we want to not waste any time or memory, we can create a function like such:

template <typename VecT, typename Iterator, typename LType>
void insertInitListProxy(std::vector<VecT> & Cnt, Iterator pos, std::initializer_list<LType> il)
{
   // save iterator 'distance' from begin() in case a resize operation happens
   auto diff = pos - Cnt.begin();
   // pre-allocate space
   Cnt.reserve(Cnt.size() + il.size());
   // reset iterator based on any changes that occurred due to a resize
   // from here on out, we don't need to worry about resizes
   pos = Cnt.begin() + diff;
   // construct the items in-place
   for (auto it = il.begin(); it < il.end(); ++it)
   {
      pos = Cnt.emplace(pos, *it);
      ++pos;  // move to position *after* what was just inserted
   }
}

Simple enough, yes?  We allocate the space and construct the items in-place using whatever type ‘LType’ is – in our case, ‘const char*’ strings. There’s some management code in there to deal with resizes causing iterators to be invalidated, which is why we need to calculate the offset of the iterator based on the vector beginning. The vector may change location in memory, so we save that offset, then change it back into a normal iterator after the resize. So, given the above, here’s what the call would look like now:

insertInitListProxy(myVec, myVec.end(), {"str1", "str2"});

That’s not quite as convenient, but its close enough!  And it wastes no memory or time. “str1” and “str2” are passed as pointers and then constructed in-place in the vector.  So there’s my proposed workaround =)  However, keep in mind that all the objects in those curly braces must be of the same type, or promotable to a common type.  If not, you’ll either need to use casts or somesuch to let the compiler know that these are all the same type. (Or just explicitly specify the template parameters).  Oh, I might as well include a more generic container version for everything other than vectors:

template <typename Container, typename Iterator, typename LType>
void insertInitListProxy(Container & Cnt, Iterator pos, std::initializer_list<LType> il)
{
   for (auto it = il.begin(); it < il.end(); ++it)
   {
      pos = Cnt.emplace(pos, *it);
      ++pos;  // move to position *after* what was just inserted
   }
}

That last one should probably appear before the vector-specific one in your source code, so as not to confuse the compiler.  Note that you could in fact use just the latter one (which is admittedly simpler), but you can’t actually call the reserve() function to optimize things even further.

Since it might be hard to see what is going on without a complete example, I’ve created one at Coliru – the Initializer_List Proxy Test!  (also available at PasteBin) You’ll be able to see all the copy-construction going on after the “Now invoking initializer_list insertion” line. which should be contrasted with what happens after the “Now invoking initializer_list proxy” line.

Anyway, hope that helps someone.  If not, it helped me!

*edit: Note: an insert_iterator would be the preferred way to insert items in a list, but emplace here increases the container size by 1 and returns an iterator to the inserted element, so its safe to use ++pos to index the next location to insert (i.e. its legal to insert at end()).

*edit2: its important to note it’s not safe to write ‘pos + 1‘ as it was originally, since that will only work with random access iterators! (oops – that’s what I get for focusing on vectors). ++pos is guaranteed to work though.

Advertisements