Range-Limited Range-For Loops

There’s one feature of C++11 that I haven’t been making good use of, and that’s the range-based for loop.  The problem I’ve found with using that is it iterates through an entire container or array (or initializer_list). And to be honest – I sometimes use ‘old’ compilers like Microsoft’s Visual C++ 2010, which doesn’t support it. Plus the 2012 version has issues using range-based for with initializer_lists (which require the Nov 2012 CTP compiler preview – a buggy alpha build).

But a recent remark I saw the other day about using a range limit support function with range-for got me to thinking – maybe I could and probably should start to use it more.

For a simple example of where I could have been using them, you can just look at the code from the last blog entry, and see how it can easily be transformed:

  // Regular for-loop version:
  for (auto it = il.begin(); it != il.end(); ++it)
     *inserter++ = *it;

  // Range-based for loop version:
  for (const auto& elem : il)
     *inserter++ = elem;

The clean minimalistic syntax of range-for is nice, and the fact it saves typing is even better. (Note that most references recommend you use auto & or const auto & to avoid copying and to maintain const-ness where needed.)

Range-for is all good and well, until you find you need to work on elements x through y, and then its back to using the old for-loop syntax.  Unless, that is, we find a workaround. And as always, templates and helper functions can alleviate some of that problem.  What we want now is a range-for that looks like this:

  for(auto &elem : range(first, last))
     elem = val;

That has a nice simple syntax, and actually makes it pretty clear what we are working on, no?  The method of achieving that result actually took me maybe 15 minutes to work out and code, because the rules for what range-based for works on are rather nice.  If it’s not using an initializer_list or an array, a range-for loop must take an object that has both begin() and end() member functions – or an object on which the std::begin() and std::end() global functions could be used.  With that bare minimum requirement, all that is needed then is a simple interim object class:

template <typename Iter>
class IterRange
   IterRange();    // prevent this class from being constructed without any parameters
   Iter _beg, _end;
   IterRange(Iter beg_, Iter end_) : _beg(beg_), _end(end_) {}

   // The necessary functions:
   Iter begin() { return _beg; }
   Iter end() { return _end; }

The above class only holds 2 members, the beginning and end of the range, and for the most part, acts as a struct, where the data is assigned and returned in the same way.  Both copy and move constructors and assignment operators are unneeded, as the default behavior of the compiler is adequate. The default empty constructor, however, is made impossible to use.  We don’t want an object with undefined members, and in fact the iterators types are required to actually implement the class.. so maybe its not necessary at all?  Meh, I prefer to err on the side of caution.

The only thing remaining now would be that ‘range()’ function.  Well, given the class, wouldn’t you expect it to be quite simple?  Turns out it is:

template <typename Iter>
IterRange<Iter> range(Iter beg, Iter end)
   return IterRange<Iter>(beg, end);

With the above function and class object, range-limited range-based for loops become a reality.  And in fact, quite intuitive. Probably the best part of this is that compilers optimize out both the class object and the function when used as part of a range-for loop (as tested with G++ and MSVC).

One of the perks of working with C++ is that you learn more about the language while creating workarounds to perceived hurdles to productivity.  And, I suppose, that’s one of the drawbacks too – we waste time adding things that maybe should be part of the language standard?  Then again, we know that the C++ standard is actively evolving, and some things they didn’t get to implement in C++11 will probably see the light of day in the next standard release..

Hm, I don’t think I need to provide a full working example for this post, do I?  It’s a simple matter of plugging in range(start, end) on the right-hand side of a range-for loop. So, I’ll leave it at that for now =)

initializer_list proxy function – insert iterator version

So, the last blog entry focused on optimizing initializer_list insertions by using a proxy function. Oh, and about that – proxy functions as I term them can be thought of as an intermediary or convenience functions (if it helps).  Anyway, I realized after I wrote that entry that, in terms of good programming practices, the code wasn’t exactly up to par.

When using iterators to insert items into a container, the preference is to always use insert iterators.  The reason is quite obvious – they are there to insert items into a container!  Plus, there’s a guarantee of safety with these iterators.  They don’t get invalidated – which is what would happen to that vector proxy function if it failed to take a difference of the distance between the iterator position and the container start. (and again here, std::distance() could have been used).  Nonetheless, I accounted for that and also made sure to reassign the iterator after every insert.  The net result was that it worked, but it might not have been clear as to what it did.  So.. lets consider the 3 types of insert iterators, which are very clear about what they do..

  1. Back inserter (back_inserter_iterator): This iterator does just one simple thing – it calls a container’s push_back() function every time an assignment operator is used with it.
  2. Front inserter (front_insert_iterator): Like back_inserter, this does one thing as well – it calls a containers push_front() function with each assignment.
  3. General inserter (insert_iterator): This iterator is a little different. It does call only one function – insert() – however, it also increments the returned iterator by 1.  This is just like the code in the last proxy function (++pos).

Each of the above iterators use a simple, common interface, just like every other iterator. *it, it++, ++it, and **it = data are what we use to interact with them, though what happens behind the scenes is a little different.  For example, given this code:

std::vector<std::string> myVec;

std::back_insert_iterator<std::vector<std::string>> myIns(myVec);

*myIns++ = "strA";
*myIns++ = "strB";

We might expect that for each *myIns++, the item at the inserter position is dereferenced, assigned to, and then the inserter is moved on to the next position.  But with all insert iterators, only one operator here makes any changes – the assignment operator.  The pre-increment (++it), post-increment (it++), and even the dereference (*it) operators all do nothing but return a reference to the insert iterator.  Which means the below code gives exactly the same results:

std::back_insert_iterator<std::vector<std::string>> myIns(myVec);

myIns= "strA"; // calls push_back()
myIns= "strB"; // and again

I’ve gone off topic as you may have noticed, but mainly because I find it interesting to plumb into the depths of C++’s inner workings.  Insert iterators are clever in that they allow code to be used in template functions that expect the typical ‘*it++ = data‘ to work in a consistent manner.  And it does, of course.. just not in the way some might imagine.  So that’s just a bit of neat trivia for you.. insert iterators are mainly single-function container calls disguised as iterator objects.  Well, with the exception of insert_iterator – which also does an increment.  In fact, that code generally looks like this:

insIter& operator=(const T& val) // (note - theres also an RValue version)
   it = container->insert(it, val);
   return *this;

As you might notice, iterators store a pointer to a container.  That’s how and why they can be used wherever a template function takes an iterator. =)

Oookaay, so since I’ve wasted your precious time with that background on insert iterators, let’s just go ahead and throw the new initializer_list proxy function out there:

// initializer_list Proxy - Using Insert Iterators
template <typename InsIter, typename LType>
void insertInitListProxy(InsIter inserter, std::initializer_list<LType> il)
   // construct & move the items into the container
   for (auto it = il.begin(); it != il.end(); ++it)
      *inserter++ = *it;

Looks simple enough?  In theory we can remove the *dereference and ++ increment operators from inserter, but then that would make the code less understandable – and more importantly, less flexible.  There’s not telling what other things could be thrown at it, or what new types of insert iterators could be used.  And that’s one of the great things about template functions – we offer generic support for both known and unknown types of objects that might be thrown its way.

Ah, and another nice thing with insert iterators – there’s convenience functions available to make passing them to other functions that much easier.  These functions are back_inserter, inserter, and front_inserter – and what they do is obvious, they create xyz_insert_iterator objects.  So the call to insert objects using my proxy function with a back inserter would now look like this:

insertInitListProxy(std::back_inserter(myVec), {"str1", "str2", "str3"});

Simple enough?  The only other thing I’d like to note is that, although push_back, insert, and push_front are used, the C++11 standard gives us RValue versions of these.  So what happens here is the strings are constructed and then moved into the container.  Waste not, want not!

Okay, as is the norm, I have code up @ Coliru demonstrating the above, using that StringHolder object from last time. And the code @ PasteBin.

Ah, and as always – one last thing to add. ostream_iterator and ostreabuf_iterator also ignore the * and ++ operations, and only call a function when an assignment operator is used. (istream_iterator and istreambuf_iterator work a little differently, however)

initializer_list frustrations – Solving the deep-copy issue

So I’ve messed about a bit with C++11’s initializer_list objects and thought they were a great idea.  Since I’ve created my own container, I included them to mimic the new standard functions for constructor, assignment, and insert() functions.  However, there’s one (big) flaw I find with them – for types that we might use to construct objects in a container, the objects themselves are constructed before they are placed into the initializer_list object.

So for example, say we have a container – a vector – of strings.  Nothing fancy. Now, if we’d say, want to append a group of strings at the end of the vector using string literals, we’d use something like this:

std::vector<std::string> MyStrVec;

MyStrVec.insert(MyStrVec.end(), {"str1", "str2"});

Relatively harmless, yes? So, if we consider this for a moment, the most efficient way to insert those elements into the container is to construct them in-place, or to move-construct them with temporary objects (RValues).  But why, why why did the C++11 standards committee decide, especially in the light of the significance of RValues, to make everything copy-constructed?!

What happens above is this sequence:

  1. First, “str1” and “str2” are used to construct temporary string objects
  2. The insert function gets called with an initializer_list consisting of pointers to const strings.  Note the const here – it means there is no chance of moving anything.
  3. In the insert function, the objects in the initializer_list are traversed and inserted into the vector one by one using copy construction
  4. Following the call, the temporary string objects are destroyed.

So, even though a huge aspect of the C++11 standard has been focused on minimizing wasteful copies by using RValues (moves) and in-place construction (see emplace, which uses variadic templates), here we have initializer_list’s fouling everything up.  By using this new convenient feature, we just went where C++11 generally tries not to take us – the deep-copy route.  In the above code, the temporary string objects allocate memory and initialize their members with strings, then the insert code constructs new string objects which also allocate and initialize their data with strings (using copying).  Therefore by using that initializer_list object we basically doubled the amount of memory (and time) needed to insert strings into a vector. W. T. F. right?

This is all very vexing, and perplexing.  Why waste memory and time when we have much better means of addressing the situation?  Meh, there’s nothing we can do – its now-standard.  However!  Having said all that, this doesn’t preclude us from creating our own workarounds, which despite my negative feelings towards them, use initializer_lists.  How so?  By using a proxy function of course!

Now, keep in mind that initializer_lists can only take an object of a single type – this is very different than variadic templates, which can take any number of different types.  In fact, that could be one way to solve the problem of inserting items initialized by different object types, but it gets annoying using recursion functions.  Plus, you’re still limited to one parameter per constructed object.  So, no, for me I find it much better to focus on a simple initializer_list proxy function.

So given the above code, and the concept that we want to not waste any time or memory, we can create a function like such:

template <typename VecT, typename Iterator, typename LType>
void insertInitListProxy(std::vector<VecT> & Cnt, Iterator pos, std::initializer_list<LType> il)
   // save iterator 'distance' from begin() in case a resize operation happens
   auto diff = pos - Cnt.begin();
   // pre-allocate space
   Cnt.reserve(Cnt.size() + il.size());
   // reset iterator based on any changes that occurred due to a resize
   // from here on out, we don't need to worry about resizes
   pos = Cnt.begin() + diff;
   // construct the items in-place
   for (auto it = il.begin(); it < il.end(); ++it)
      pos = Cnt.emplace(pos, *it);
      ++pos;  // move to position *after* what was just inserted

Simple enough, yes?  We allocate the space and construct the items in-place using whatever type ‘LType’ is – in our case, ‘const char*’ strings. There’s some management code in there to deal with resizes causing iterators to be invalidated, which is why we need to calculate the offset of the iterator based on the vector beginning. The vector may change location in memory, so we save that offset, then change it back into a normal iterator after the resize. So, given the above, here’s what the call would look like now:

insertInitListProxy(myVec, myVec.end(), {"str1", "str2"});

That’s not quite as convenient, but its close enough!  And it wastes no memory or time. “str1” and “str2” are passed as pointers and then constructed in-place in the vector.  So there’s my proposed workaround =)  However, keep in mind that all the objects in those curly braces must be of the same type, or promotable to a common type.  If not, you’ll either need to use casts or somesuch to let the compiler know that these are all the same type. (Or just explicitly specify the template parameters).  Oh, I might as well include a more generic container version for everything other than vectors:

template <typename Container, typename Iterator, typename LType>
void insertInitListProxy(Container & Cnt, Iterator pos, std::initializer_list<LType> il)
   for (auto it = il.begin(); it < il.end(); ++it)
      pos = Cnt.emplace(pos, *it);
      ++pos;  // move to position *after* what was just inserted

That last one should probably appear before the vector-specific one in your source code, so as not to confuse the compiler.  Note that you could in fact use just the latter one (which is admittedly simpler), but you can’t actually call the reserve() function to optimize things even further.

Since it might be hard to see what is going on without a complete example, I’ve created one at Coliru – the Initializer_List Proxy Test!  (also available at PasteBin) You’ll be able to see all the copy-construction going on after the “Now invoking initializer_list insertion” line. which should be contrasted with what happens after the “Now invoking initializer_list proxy” line.

Anyway, hope that helps someone.  If not, it helped me!

*edit: Note: an insert_iterator would be the preferred way to insert items in a list, but emplace here increases the container size by 1 and returns an iterator to the inserted element, so its safe to use ++pos to index the next location to insert (i.e. its legal to insert at end()).

*edit2: its important to note it’s not safe to write ‘pos + 1‘ as it was originally, since that will only work with random access iterators! (oops – that’s what I get for focusing on vectors). ++pos is guaranteed to work though.

C++11’s Chrono: Learning about C++11’s Time(ing) features

So, the <chrono> header classes and functions are a bit of a confusing mess.  Different clock types, different duration formats, and weird-a$$ syntax (which is more common as C++ gets more features) all combine to make it a confusing mess to actually use calculations in code.

However, since it’s pretty important to get used to new C++11 features that can make cross-platform development easier, this is one area that deserves further study…

Luckily, one of my favorite reference books (which includes C++11 coverage) has a section that’s actually available online that helps illuminate this mysterious area: 5.7 Clocks and Timers | The C++ Standard Library – InformIt (Nicolai M. Josuttis).  Still, that’s not for the faint of heart.  It requires a few rereads..

Alright, so using my understanding of how this all works, lets break down the 3 logical components of time in <chrono>:

  • duration: a measurement of time as in “length of n ‘ticks’ over a particular time unit”. This can be represented as milliseconds, seconds, minutes, etc.  The calculations of duration revolve around ratios (i.e. 1/1000 for millisecond) and an integral or floating point type used to formally represent the duration when queried. (For more on ratios, also see the above link’s previous page, 5.6 Compile-Time Fractional Arithmetic with Class ratio<>.
  • timepoint: a point in time expressed in n ‘ticks’ of duration from a special base point (“beginning of time”, or the earliest representable form of time on the computer [based on which clock is being referred to]). This class is templatized by both a clock and a duration.
  • clock: a logical “clock” that represents the ever-changing motion of time.  There are 3 of these, one of which represents the ‘official’ O/S clock (system_clock), another which is guaranteed to move forward at a predictable rate [steady_clock] (unlike the official clock which can be manipulated by outside forces), and a low-level measurement of time useful for fine-grained measurements (high_resolution_clock).
    Since these are the foundation of every calculation and manipulation, they contain representations of a duration and time_point, and of course a function indicating what time it is ‘now()’.

So, hopefully that is understandable.  We need at least a basic grasp of these features to utilize time-based measurements and manipulation in our programs.

The next step would be to provide some real-world use of this stuff to get this ‘visually’.  So, for example, say we wanted to I dunno, time a piece of code.  We’d need a good unchangeable clock to do this, of course, so steady_clock and high_resolution_clock are our only options.  Since it is well, code, we’ll need a fine-grained measurement, so high_resolution_clock, here we come!

Now, on with the ugly syntax!  The predefined class for our chosen clock the high_resolution clock is:


To make things simpler in code, its probably best to use type aliasing.  So let’s make it ‘hrClock_t’:

typedef std::chrono::high_resolution_clock hrClock_t;

Now, since we’ll be needing a starting time point, we’ll have to ask that clock what the time is ‘now’:

auto clkStart = hrClock_t::now();

While ‘auto’ helps make coding easier, it may hinder understanding of what’s going on.  So, let me reassure you: we are getting a time_point:

// more verbose version of the above:
std::chrono::high_resolution_clock::time_point = std::chrono::high_resolution_clock.now();

Okay, that’s all well and good.  We now have a starting time point, and will eventually need an end time point.  So, after running through the test code, we’ll gather a second time_point:

auto clkEnd = hrClock_t::now();

Next up: Actually figuring out how much time went by, in a format we desire.  So here comes duration. We’re going to typedef this one so it doesn’t give us carpal tunnel, and then do the assignment:

typedef std::chrono::duration<double, std::milli> millisec_t;
millisec_t clkDuration(std::chrono::duration_cast<millisec_t>(clkEnd - clkStart));

Okay.. wow. Wtf is up with that code?  Well, yes that’s a bit confusing at first, so let’s see what’s happening here:

Each time_point is specific to it’s clock – in this case, the high_resolution_clock.  What we want, however, is that time converted to milliseconds (1/1000th of a second).  To do that, we first make the time_point calculation (end-start) and put it into a special template function that will do what we TTIT (‘template-tell it to’?).

And what we are telling it with ‘duration_cast‘ is that we need a duration of the form <double, milli> (millisec_t remember).  That tells the compiler to generate code that converts ‘time_duration’ into something representable in thousandths of a second (1/1000), using floating point math to achieve the result.  Of course, the floating point return stored in clkDuration is double as that was the first template parameter to our millisec_t duration type.  The only thing left to do is actually grab that value from clkDuration.  One simple line retrieves the ‘elapsed time’ double from clkDuration:

double fElapsed = clkDuration.count();

Are we any closer to understanding how this works?  I know I am.  But hey,  I’m typing up a blog post while working out the details.. so maybe I’m a bit biased.

But, as examples are one of the best forms of understanding things, let me direct your attention to a simple ‘cout vs. printf benchmark’ @ Coliru. (also available at Pastebin)

Oh, just a couple of more notes:

  1. There are some predefined duration typedefs, which you may prefer to use over your own.  For our example, there is a std::chrono::milliseconds typedef in the headers which may or may not suit you – it’s templated as duration<int_type, milli> so you lose floating point precision in using it.  The list of the rest is available on the link above to section 5.7 of the C++ Standard Template Library excerpt.
  2. There’s a handy little feature for inserting ‘sleeps’ in your code by using sleep_for with durations.  It’s normally covered in C++11 ‘concurrency’ articles & references, but it may prove useful in other scenarios.  If you program in Windows you are probably most certainly familiar with ‘Sleep’.  Well, the cross-platform compatible version of that would be (note this requires header <thread>):

    with an example being:

    // sleep (pause execution) for 10 ms

So, there it is.  Hope that helped somebody understand one of the new useful (but cryptic) features of C++11!

(by the way, there’s other feature of clocks and durations that I didn’t cover – calculating future times, waiting until a specific time, etc.. but that’s something you can discover on your own =)

Variadic Templates – Transformations!

Wow, I just accidentally exposed another feature of Variadic Templates – you can transform them, and then create a new Variadic Template on-the-fly!  This is so cool!

Let me explain:

Given this call to TestVariadic:

  TestVariadic(0, 4, 'a', '\0', p, 0, "string");

and given this function:

template <typename... Args>
void TestVariadic(const Args&... args)
    std::cout << "vals (in reverse):" << std::endl;

We call (for each argument) ‘boolizer()’, which does nothing but return a true/false boolean value:

template <typename T>
bool boolizer(const T& val)
    std::cout <<  val << "	";
    return (val != 0);

Nothing fancy eh?  And note we’re working in reverse-order again (see my edit on last post), so we’ll see the list printed out right-to-left.  However, the cool thing is – we call a ‘bool_variadic()’ function with the return values from boolizer().  And this ain’t no ‘dummy()’ function.. it’s special =).  Here it is:

template <typename... Args>
void bool_variadic(const Args&... args)
    std::cout << "\nbool states (in FORWARD order):\n" << std::boolalpha;
    //dummy(reportBoolState(args)...);  // this'd be reverse order (right-to-left)
    std::cout << std::endl;

Okay, I know – more variadic template functions – annoying!!  But check it out – ‘reportBoolState()’ is a templated recursive function (well, not such a new thing), that ONLY operates on bool values (now we’re getting somewhere).  So here’s the two necessary functions required of any recursive variadic template design (non-variadic overload before the main val+variadic recursive func):

int reportBoolState(bool bVal)
    std::cout << bVal << "	";
    return 0;

template <typename... Args>
int reportBoolState(bool bVal, const Args&... args)
    std::cout << bVal << "	";
    return 0;

So, take a moment to gather this in.. We’ve literally transformed a set of varying types (that’d be the Args pack) into something completely different – a simple list of booleans!  That’s just awesome!  Oh.. and note that with the recursive variadic functions, we do actually move left-to-right, so the booleans are in the same order as the original variadic template list.(this is why the output doesn’t match up).  You could uncomment that ‘dummy’ line to see things in right-to-left order, however.  Man, this stuff is funn =)

Oh, and here’s the full example from above at Coliru’s online compiler.  Also available on Pastebin.com.

P.S. initializer_list’s are one of my other favorite new features – but quite limited in they only accept a compile-time-defined list of one type, and can’t be used dynamically with, say, a boolean list of variadic templates.. =P  (it’d be cool if someone proved me wrong though)

Variadic template fun

So I’ve been messing around with Variadic Templates in C++11 recently, and have been blown away but what I can do with this feature.  Its not just about those lame recursive ‘print’ examples, or in-place construction (see emplace) – nooo, there’s far more to it than that! Once you understand a little about what can be injected into the ‘unpacking’ of these variadic template parameters, you’ll start thinking up way cool stuff.

For example, look at the below snippet:

inline void dummy(...) {}

void TestVariadic(const Args&... args)
    int n = 0;
    dummy(repeater(n, args)...);

    std::cout << "\nValue of n:" << n << std::endl;

That ‘dummy’ function, for some reason or another, is required in order to use this special form of ‘…’.  When ‘…’ is outside of the function call, it means the given function is called once for each and every argument (and type).  This is as opposed to using the ‘…’ inside of the function (by the args var), which just expands and forwards unpacked variadic template variables as extra parameters. (sorry if I’m not too clear on the wording, its f’ing confusing).

As I mentioned, the ‘dummy’ function needs to be there to convince the compiler to do multiple function calls.  An additional requirement is that the ‘repeater’ function must also return a value so that the compiler sees that you are actually passing something to dummy().  It’s pretty annoying, and I’d very much like to do repeater(n, args)…; but its not possible with the standard the way it is.

Anyway, the unique bonus of doing a repeater function is that you can pass pointers or references to local variables, etc, for each repetition.  This is where I see variadic templates being the most useful, especially to me.  For the above, the repeater could be as simple as this:

int repeater(int &n, const T& val)
    std::cout << '.';
    // Silence compiler warnings about unused val:
    return 0;

Here I’m just incrementing ‘n’ and returning a value that will be thrown away by ‘dummy’.  This also prints a silly ‘.’ output each time its called.  The nice thing to notice though is that there is only one template argument required for each call to this function, and that I am actively modifying the calling function’s ‘n’ value.

Perhaps it’s not quite clear from this context just how powerful this is, but once you start down this path, hopefully you’ll begin imagining all kinds of wild tricks that can be done.

The ‘trick’ I personally got to work with variadic templates is a type of comma-separated ‘printf’ call that creates all the “%xx” format strings, puts them in an array along with the matching values, and makes one call to vprintf.  And suprisingly, with Mirosoft VC++ 2012 (November CTP) compiler, the code that is generated is super efficient – as in, nearly as good as typing out the complete printf() call yourself, and very close to the space needed for cout calls.

MSVC++ takes my template metaprogramming mess and optimizes away any calls and directly embeds a complete string-assembly sequence plus value stack assembly inline!  Plus, in timed results on both GCC and MSVC, variadic-printf beats out cout calls quite consistently.  It’s really quite cool.

Anyway, if you’d like to start experimenting, I’ve put a simple but more complete example of the above on the ‘Coliru Editor’ page, which is an actual live C++ editor linked to a G++ 4.7 C++11 compiler (running on a unix box). Fun stuff! (example also now available on Pastebin.com)

Now.. back to obsessing about other things..

Edit:  Note, one of the issues with using that dummy(…) function is that the compiler will work through the arguments backwards..  the reason for this is that a function accepting a variable number of arguments is called with each argument pushed from right-to-left.

That means that the return from the right-most parameter must be retrieved before it can be pushed – which typically winds up calling that function first.  It’s not recommended to rely on this, though – as the C/C++ standards say that the evaluation of each parameter can happen in any order.  Only the order they are pushed on the stack is guaranteed.