Performance

This page shows performance tests and their results. For element-by-element pipelines, performance is as good as that of hand-crafted Python code written without the same flexibility. For chunking pipelines (see numpy_chunking), performance is better than that of either hand-crafted C code that processes the data element by element or NumPy code that processes all of the data as a single array.
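The exact harness used to collect the timings is not shown here. A minimal sketch of how comparable wall-clock timings could be gathered with the standard timeit module (the helper name _time is illustrative only) is:

import timeit

def _time(fn, repeat = 3):
    # Run a zero-argument function, e.g. _dict_corr below, several times
    # and report the best wall-clock time.
    return min(timeit.repeat(fn, number = 1, repeat = repeat))

# print _time(_dict_corr), 'seconds'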

CSV Element By Element

The following shows the performance of three implementations that find the correlation between two CSV columns, with outliers pruned:

[figure: no-chunks performance]

The hand-crafted CSV code using csv.reader is:

r = csv.reader(open(_f_name, 'r'))

fields = r.next()
ind0, ind1 = fields.index('0'), fields.index('1')

sx, sxx, sy, syy, sxy, n = 0, 0, 0, 0, 0, 0
try:
    while True:
        row = r.next()
        x, y = float(row[ind0]), float(row[ind1])
        if x < 0.5 and y < 0.5:
            sx += x
            sxx += x * x
            sy += y
            sxy += x * y
            syy += y * y
            n += 1
except StopIteration:
    pass
c = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)
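All of the correlation implementations on this page accumulate the running sums sx, sy, sxx, syy, and sxy of x, y, x², y², and x·y, together with the count n of retained rows, and then compute Pearson's correlation coefficient:

c = \frac{n\,s_{xy} - s_x s_y}{\sqrt{n\,s_{xx} - s_x^{2}}\;\sqrt{n\,s_{yy} - s_y^{2}}}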

The hand-crafted CSV code using csv.DictReader is:

def _dict_corr():
    r = csv.DictReader(open(_f_name, 'r'), ('0', '1'))

    sx, sxx, sy, syy, sxy, n = 0, 0, 0, 0, 0, 0
    for row in r:
        x, y = float(row['0']), float(row['1'])
        if x < 0.5 and y < 0.5:
            sx += x
            sxx += x * x
            sy += y
            sxy += x * y
            syy += y * y
            n += 1
    c = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)

The pipeline code is:

def _csv_pipes_corr():
    c = csv_vals(open(_f_name, 'r'), ('0', '1')) | \
        filt(pre = lambda (x, y) : x < 0.5 and y < 0.5) | \
        corr()
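Note that the tuple-unpacking parameter in the lambda above is Python 2 only syntax. Under Python 3, an equivalent formulation (a sketch, indexing the tuple explicitly) would be:

c = csv_vals(open(_f_name, 'r'), ('0', '1')) | \
    filt(pre = lambda xy : xy[0] < 0.5 and xy[1] < 0.5) | \
    corr()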

CSV Chunking

The following shows the performance of two implementations finding the mean of a CSV column using direct NumPy and dagpype:

[figure: chunks CSV performance]

The NumPy implementation processing all data in a single chunk is:

x = numpy.genfromtxt(_f_name, usecols = (0), delimiter = ',')
a = numpy.mean(x)

The pipeline implementation is:

c = np.chunk_stream_vals(_f_name, '0') | np.mean()
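dagpype's chunk reader is not shown here; conceptually, the chunked computation amounts to something like the following sketch, which keeps only one bounded chunk in memory at a time (chunk size and structure are illustrative, not the library's actual implementation):

total, count = 0.0, 0
f = open(_f_name, 'r')
f.readline()                          # Skip the header row ('0,1').
while True:
    lines = f.readlines(10 ** 6)      # Read roughly 1 MB of lines per chunk.
    if not lines:
        break
    chunk = numpy.array([float(l.split(',')[0]) for l in lines])
    total += chunk.sum()
    count += chunk.shape[0]
a = total / count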

Binary File Chunking

Correlation

The following shows the performance of three implementations finding the correlation between two columns of binary data, using C code, direct NumPy, and dagpype:

[figure: chunks performance]

The C implementation is:

double c_corr(const char f_name[])
{
    FILE *const pf = fopen(f_name, "rb");
    assert(pf != NULL);
    double sx = 0, sxx = 0, sy = 0, syy = 0, sxy = 0;
    size_t n = 0;
    while(1)
    {
        double x, y;
        if(fread(&x, sizeof(double), 1, pf) != 1 || fread(&y, sizeof(double), 1, pf) != 1 || feof(pf))
        {
            fclose(pf);
            break;
        }
        sx += x;
        sxx += x * x;
        sy += y;
        sxy += x * y;
        syy += y * y;
        ++n;
    }

    // printf("C %ld values\n", n);

    return (n * sxy - sx * sy) / sqrt(n * sxx - sx * sx) / sqrt(n * syy - sy * sy);
}

The NumPy implementation processing all data in a single chunk is:

s = open(_f_name, 'rb').read()
a = numpy.fromstring(s)
xy = a.reshape(a.shape[0] / 2, 2)

# Column sums give the sums of x and of y.
s = numpy.sum(xy, axis = 0)
sx = s[0]
sy = s[1]

# xy.T * xy is the 2x2 matrix of sums of products:
# [[sum(x * x), sum(x * y)], [sum(x * y), sum(y * y)]]
c = numpy.dot(xy.T, xy)

sxx = c[0, 0]
sxy = c[0, 1]
syy = c[1, 1]

n = xy.shape[0]
# print 'numpy core', n, 'values'
res = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)

The pipeline implementation is:

c = np.chunk_stream_bytes(_f_name, num_cols = 2) | np.corr()
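All of the binary-file examples in this section read the same layout: interleaved (x, y) pairs stored as raw doubles in native byte order. A small sketch (illustrative only) that generates such a test file with NumPy:

# Write 10**6 random (x, y) pairs as raw native doubles, row-major,
# i.e. x0, y0, x1, y1, ... as expected by the readers above.
xy = numpy.random.rand(10 ** 6, 2)
open(_f_name, 'wb').write(xy.tostring())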

Pruned Correlation

The following shows the performance of three implementations finding the correlation between two columns of binary data, pruning pairs in which either value is 0.25 or larger, using C code, direct NumPy, and dagpype:

[figure: chunks + pruning performance]

The C implementation is:

double c_corr_prune(const char f_name[])
{
    FILE *const pf = fopen(f_name, "rb");
    assert(pf != NULL);
    double sx = 0, sxx = 0, sy = 0, syy = 0, sxy = 0;
    size_t n = 0;
    while(1)
    {
        double x, y;
        if(fread(&x, sizeof(double), 1, pf) != 1 || fread(&y, sizeof(double), 1, pf) != 1 || feof(pf))
        {
            fclose(pf);
            break;
        }
        if(x >= 0.25 || y >= 0.25)
            continue;
        sx += x;
        sxx += x * x;
        sy += y;
        sxy += x * y;
        syy += y * y;
        ++n;
    }

    // printf("C %ld values\n", n);

    return (n * sxy - sx * sy) / sqrt(n * sxx - sx * sx) / sqrt(n * syy - sy * sy);
}

The NumPy implementation processing all data in a single chunk is:

s = open(_f_name, 'rb').read()
a = numpy.fromstring(s)
xy = a.reshape(a.shape[0] / 2, 2)
xy = xy[numpy.logical_and(xy[:, 0] < 0.25, xy[:, 1] < 0.25), :]

s = numpy.sum(xy, axis = 0)
sx = s[0]
sy = s[1]

c = numpy.dot(xy.T, xy)

sxx = c[0, 0]
sxy = c[0, 1]
syy = c[1, 1]

n = xy.shape[0]
# print 'numpy core', n, 'values'
res = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)


The pipeline implementation is:

c = np.chunk_stream_bytes(_f_name, num_cols = 2) | \
    filt(lambda a : a[numpy.logical_and(a[:, 0] < 0.25, a[:, 1] < 0.25), :]) | \
    np.corr()

Truncated Correlation

The following shows the performance of three implementations finding the correlation between two columns of binary data, truncating values at 0.25, using C code, direct NumPy, and dagpype:

[figure: chunks + truncation performance]

The C implementation is:

double c_corr_trunc(const char f_name[])
{
    FILE *const pf = fopen(f_name, "rb");
    assert(pf != NULL);
    double sx = 0, sxx = 0, sy = 0, syy = 0, sxy = 0;
    size_t n = 0;
    while(1)
    {
        double x, y;
        if(fread(&x, sizeof(double), 1, pf) != 1 || fread(&y, sizeof(double), 1, pf) != 1 || feof(pf))
        {
            fclose(pf);
            break;
        }
        x = fmin(x, 0.25);
        y = fmin(y, 0.25);
        sx += x;
        sxx += x * x;
        sy += y;
        sxy += x * y;
        syy += y * y;
        ++n;
    }

    // printf("C %ld values\n", n);

    return (n * sxy - sx * sy) / sqrt(n * sxx - sx * sx) / sqrt(n * syy - sy * sy);
}

The NumPy implementation processing all data in a single chunk is:

s = open(_f_name, 'rb').read()
a = numpy.fromstring(s)
xy = a.reshape(a.shape[0] / 2, 2)
xy = numpy.where(xy < 0.25, xy, 0.25)

s = numpy.sum(xy, axis = 0)
sx = s[0]
sy = s[1]

c = numpy.dot(xy.T, xy)

sxx = c[0, 0]
sxy = c[0, 1]
syy = c[1, 1]

n = xy.shape[0]
# print 'numpy core', n, 'values'
res = (n * sxy - sx * sy) / math.sqrt(n * sxx - sx * sx) / math.sqrt(n * syy - sy * sy)

The pipeline implementation is:

c = np.chunk_stream_bytes(_f_name, num_cols = 2) | \
    filt(lambda a : numpy.where(a < 0.25, a, 0.25)) | \
    np.corr()
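If preferred, the truncation step can equivalently be written with numpy.minimum, which takes the element-wise minimum against the constant:

c = np.chunk_stream_bytes(_f_name, num_cols = 2) | \
    filt(lambda a : numpy.minimum(a, 0.25)) | \
    np.corr()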
