C++ – Fastest way to get a positive modulo in C/C++

cperformance

Often in my inner loops I need to index an array in a "wrap-around" way, so that (for example) if the array size is 100 and my code asks for element -2, it should be given element 98. In many high level languages such as Python, one can do this simply with my_array[index % array_size], but for some reason C's integer arithmetic (usually) rounds toward zero instead of consistently rounding down, and consequently its modulo operator returns a negative result when given a negative first argument.

Often I know that index will not be less than -array_size, and in these cases I just do my_array[(index + array_size) % array_size]. However, sometimes this can't be guaranteed, and for those cases I would like to know the fastest way to implement an always-positive modulo function. There are several "clever" ways to do it without branching, such as

inline int positive_modulo(int i, int n) {
    return (n + (i % n)) % n;
}

inline int positive_modulo(int i, int n) {
    return (i % n) + (n * (i < 0));
}

Of course I can profile these to find out which is the fastest on my system, but I can't help worrying that I might have missed a better one, or that what's fast on my machine might be slow on a different one.

So is there a standard way to do this, or some clever trick that I've missed that's likely to be the fastest possible way?

Also, I know it's probably wishful thinking, but if there's a way of doing this that can be auto-vectorised, that would be amazing.

Best Answer

The standard way I learned is

inline int positive_modulo(int i, int n) {
    return (i % n + n) % n;
}

This function is essentially your first variant without the abs (which, in fact, makes it return the wrong result). I wouldn't be surprised if an optimizing compiler could recognize this pattern and compile it to machine code that computes an "unsigned modulo".

Edit:

Moving on to your second variant: First of all, it contains a bug, too -- the n < 0 should be i < 0.

This variant may not look as if it branches, but on a lot of architectures, the i < 0 will compile into a conditional jump. In any case, it will be at least as fast to replace (n * (i < 0)) with i < 0? n: 0, which avoids the multiplication; in addition, it's "cleaner" because it avoids reinterpreting the bool as an int.

As to which of these two variants is faster, that probably depends on the compiler and processor architecture -- time the two variants and see. I don't think there's a faster way than either of these two variants, though.

Related Solutions

Unix – the fastest way to get the value of π

The Monte Carlo method, as mentioned, applies some great concepts but it is, clearly, not the fastest, not by a long shot, not by any reasonable measure. Also, it all depends on what kind of accuracy you are looking for. The fastest π I know of is the one with the digits hard coded. Looking at Pi and Pi[PDF], there are a lot of formulae.

Here is a method that converges quickly — about 14 digits per iteration. PiFast, the current fastest application, uses this formula with the FFT. I'll just write the formula, since the code is straightforward. This formula was almost found by Ramanujan and discovered by Chudnovsky. It is actually how he calculated several billion digits of the number — so it isn't a method to disregard. The formula will overflow quickly and, since we are dividing factorials, it would be advantageous then to delay such calculations to remove terms.

enter image description here

where,

enter image description here

Below is the Brent–Salamin algorithm. Wikipedia mentions that when a and b are "close enough" then (a + b)² / 4t will be an approximation of π. I'm not sure what "close enough" means, but from my tests, one iteration got 2 digits, two got 7, and three had 15, of course this is with doubles, so it might have an error based on its representation and the true calculation could be more accurate.

let pi_2 iters =
    let rec loop_ a b t p i =
        if i = 0 then a,b,t,p
        else
            let a_n = (a +. b) /. 2.0 
            and b_n = sqrt (a*.b)
            and p_n = 2.0 *. p in
            let t_n = t -. (p *. (a -. a_n) *. (a -. a_n)) in
            loop_ a_n b_n t_n p_n (i - 1)
    in 
    let a,b,t,p = loop_ (1.0) (1.0 /. (sqrt 2.0)) (1.0/.4.0) (1.0) iters in
    (a +. b) *. (a +. b) /. (4.0 *. t)

Lastly, how about some pi golf (800 digits)? 160 characters!

int a=10000,b,c=2800,d,e,f[2801],g;main(){for(;b-c;)f[b++]=a/5;for(;d=0,g=c*2;c-=14,printf("%.4d",e+d/a),e=d%a)for(b=c;d+=f[b]*a,f[b]=d%--g,d/=g--,--b;d*=b);}

C++ – How to get the list of files in a directory using C or C++

UPDATE 2017:

In C++17 there is now an official way to list files of your file system: std::filesystem. There is an excellent answer from Shreevardhan below with this source code (This code may throw):

#include <string>
#include <iostream>
#include <filesystem>
namespace fs = std::filesystem;

int main()
{
    std::string path = "/path/to/directory";
    for (const auto & entry : fs::directory_iterator(path))
        std::cout << entry.path() << std::endl;
}

Old Answer:

In small and simple tasks I do not use boost, I use dirent.h. It is available as a standard header in UNIX, and also available for Windows via a compatibility layer created by Toni Ronkko.

DIR *dir;
struct dirent *ent;
if ((dir = opendir ("c:\\src\\")) != NULL) {
  /* print all the files and directories within directory */
  while ((ent = readdir (dir)) != NULL) {
    printf ("%s\n", ent->d_name);
  }
  closedir (dir);
} else {
  /* could not open directory */
  perror ("");
  return EXIT_FAILURE;
}

It is just a small header file and does most of the simple stuff you need without using a big template-based approach like boost(no offence, I like boost!).

Best Answer

Related Solutions

Unix – the fastest way to get the value of π

C++ – How to get the list of files in a directory using C or C++

Related Topic