Object-oriented vs. vector-based programming

design-patterns, object-oriented, python, speed

I am torn between object-oriented and vector-based design. I love the expressiveness, structure and safety that objects give to the whole architecture. But at the same time, speed is very important to me, and having simple float variables in an array really helps in vector-based languages/libraries like MATLAB or NumPy in Python.

Here is a piece of code I wrote to illustrate my point.

Problem: adding two volatility numbers. If x and y are two volatility numbers, their combined volatility is (x^2 + y^2)^0.5 (assuming certain mathematical conditions, but that's not important here).

I want to perform this operation very fast, and at the same time I need to ensure that people don't just add the volatilities the wrong way (x + y). Both of these are important.
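For concreteness, a quick numeric check with plain floats (nothing to do with the design yet): with x = 3 and y = 4 the combined volatility is 5, not 7.

x, y = 3.0, 4.0
combined = (x**2 + y**2) ** 0.5   # 5.0 -- the correct combination
naive = x + y                     # 7.0 -- the mistake I want to prevent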

The OO-based design would be something like this:

from datetime import datetime
import numpy
from pandas import Series

class Volatility:
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return "Volatility: " + str(self.value)

    def __add__(self, other):
        # combine as sqrt(x^2 + y^2) instead of plain x + y
        return Volatility(pow(self.value*self.value + other.value*other.value, 0.5))

(Aside: For those who are new to Python, __add__ is just a method that overloads the + operator.)
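So, with the class above, the + operator quietly applies the square-root rule. A quick usage check (not part of the timed benchmark below):

v = Volatility(3.0) + Volatility(4.0)
print(v)   # Volatility: 5.0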

Let's say I add two lists of volatility values:

n = 1000000
vs1 = Series([Volatility(2.0*x - 1.0) for x in range(n)])
vs2 = Series([Volatility(2.0*x + 1.0) for x in range(n)])

(Aside: Again, a Series in pandas is sort of a list with an index.)

Now I want to add the two:

t1 = datetime.now()
vs3 = vs1 + vs2
t2 = datetime.now()
print(t2 - t1)

Just the addition runs in 3.8 seconds on my machine. The timings I have given don't include the object initialization time at all; only the addition code has been timed. If I run the same thing using plain floats and numpy operations instead:

nv1 = Series([2.0*x - 1.0 for x in range(n)])
nv2 = Series([2.0*x + 1.0 for x in range(n)])

t3 = datetime.now()
nv3 = numpy.sqrt(nv1*nv1 + nv2*nv2)
t4 = datetime.now()
print(t4 - t3)

It runs in 0.03 seconds. That's more than 100 times faster!
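(If you want to reproduce the comparison a bit more robustly than with datetime.now() deltas, the standard timeit module works too; a small sketch reusing vs1/vs2 and nv1/nv2 from above:)

import timeit

print(timeit.timeit(lambda: vs1 + vs2, number=1))                       # object-dtype addition
print(timeit.timeit(lambda: numpy.sqrt(nv1*nv1 + nv2*nv2), number=1))   # vectorized version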

As you can see, the OOP way gives me a lot of security that people won't be adding Volatility values the wrong way, but the vector method is just so crazy fast! Is there a design in which I can get both? I am sure a lot of you have run into similar design choices; how did you work it out?

The choice of language here is immaterial. I know a lot of you would advise using C++ or Java, and that the code might run faster than the vector-based version anyway. But that's not the point. I need to use Python, because I have a host of libraries that aren't available in other languages. That's my constraint, and I need to optimize within it.

And I know a lot of people would suggest parallelization, GPGPU, etc. But I want to maximize single-core performance first, and then I can parallelize both versions of the code.

Thanks in advance!

Best Answer

As you can see, the OOP way gives me a lot of security that people won't be adding Volatility values the wrong way, but the vector method is just so crazy fast! Is there a design in which I can get both? I am sure a lot of you have run into similar design choices; how did you work it out?

Design bigger objects. A Pixel object has no breathing room for a parallelized loop or GPU image transformations or anything like that. An Image does, provided it doesn't have to go through the barrier of a teeny Pixel object to get at the data.
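Applied to the question, that suggests wrapping the whole array in one typed container rather than putting one object around each number. A minimal sketch (the VolatilitySeries name and its methods are my own illustration, not code from the answer):

import numpy

class VolatilitySeries:
    """One object wraps a whole array of volatilities; arithmetic stays vectorized."""
    def __init__(self, values):
        self.values = numpy.asarray(values, dtype=float)

    def __add__(self, other):
        if not isinstance(other, VolatilitySeries):
            return NotImplemented
        # element-wise sqrt(x^2 + y^2), all inside numpy's C loops
        return VolatilitySeries(numpy.sqrt(self.values**2 + other.values**2))

    def __repr__(self):
        return "VolatilitySeries(%r)" % (self.values,)

# usage: the same guarantee as Volatility.__add__, but one Python-level
# dispatch per series instead of a million tiny objects
n = 1000000
vs1 = VolatilitySeries(2.0 * numpy.arange(n) - 1.0)
vs2 = VolatilitySeries(2.0 * numpy.arange(n) + 1.0)
vs3 = vs1 + vs2

The invariant ("volatilities only combine the square-root way") is enforced once, at the container boundary, while the per-element work stays vectorized.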
