Electronic – Which filter should I use? Just want to keep high count values

filter, signal-processing

I am asking this question because I am not quite sure which filter I should be using.

My signal is simply a sequence of discrete values, such as s = [1 2 2 2 3 4 2 4 3 4 5 3 2 3 3]. I would like a filtered signal for a given window size. For example, with a window size of 5 for s, I would get s_filtered = [2 2 2 2 2 4 4 4 4 4 3 3 3 3 3]. In other words, I want to keep the value that occurs most often within each window.

Currently I am just using a median filter, but I do not think this is the correct approach.

Here is some Python code to demonstrate what I am doing (which, as I said, I think is wrong).

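import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import medfilt

# Random integer test signal with values in 0..9
test = np.random.randint(10, size=1000)

fig, ax1 = plt.subplots(1, sharey=True, sharex=True, figsize=(15, 5))
ax1.plot(test)
# Median-filtered signal (window size 99) drawn in red
ax1.plot(medfilt(test, [99]), 'r')
plt.show()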

Figure: random data with the median-filtered signal in red.

The red line is the filtered signal for a window size of 99.

Best Answer

If I understand your requirements, you are correct that a median filter won't fulfil them.

Consider the following sequence of values (sorted, as the median filter algorithm does, for a window of length 9):

(1 2 3 4 5 6 6 6 9)

The median value is 5 but the highest frequency value (statistically) is 6.

Now if you did a population count

Bin    0 1 2 3 4 5 6 7 8 9
Count  - 1 1 1 1 1 3 - - 1

you can identify the highest frequency value as 6. I don't know if this approach has a formal name, but "mode filter" seems to describe it well, by analogy with "median filter".
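As a minimal sketch of the population count above, assuming the values are small non-negative integers, NumPy's `np.bincount` can build the bins and `np.argmax` can pick the fullest one. The variable names are just for illustration.

```python
import numpy as np

# The example window from above (values assumed to lie in 0..9).
window = np.array([1, 2, 3, 4, 5, 6, 6, 6, 9])

# Count how often each integer value 0..9 occurs in the window.
counts = np.bincount(window, minlength=10)

median_value = int(np.median(window))  # middle element of the sorted window
mode_value = int(np.argmax(counts))    # bin with the highest count

print(counts.tolist())
print(median_value, mode_value)
```

Here the median comes out as 5 and the most frequent value as 6, matching the table.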

Recounting every window from scratch could get computationally expensive.

However, if you use a sliding window (say, length 9), maintaining the population counts is cheap: as each new value enters the window you increment its count, and as each old value leaves the window you decrement its count. In other words, before calculating sample N, you add sample (N+4) to the popcount bins and remove sample (N-5) from them, since the window centred on N spans samples N-4 to N+4.

Then you loop over the bins to find the bin with the maximum count.
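The sliding-window bookkeeping could be sketched roughly as follows. This is only one possible reading of the idea: `mode_filter` is a hypothetical helper, it assumes integer samples in the range 0..n_bins-1 and an odd window length, it shortens the window at the edges of the signal, and it resolves ties by taking the smallest value.

```python
import numpy as np

def mode_filter(signal, window=9, n_bins=10):
    """Sliding-window mode filter (hypothetical helper, not a library call).

    Assumes integer samples in 0..n_bins-1 and an odd window length.
    Edges use a truncated window; ties go to the smallest value because
    np.argmax returns the first maximum.
    """
    signal = np.asarray(signal)
    half = window // 2
    n = len(signal)
    counts = np.zeros(n_bins, dtype=int)
    out = np.empty(n, dtype=signal.dtype)

    # Preload the samples covered by the first window (centred on index 0).
    for value in signal[:half + 1]:
        counts[value] += 1

    for i in range(n):
        # Loop over the bins to find the one with the maximum count.
        out[i] = np.argmax(counts)

        # Slide the window so that it is centred on i + 1.
        entering = i + 1 + half
        leaving = i - half
        if entering < n:
            counts[signal[entering]] += 1
        if leaving >= 0:
            counts[signal[leaving]] -= 1
    return out

s = [1, 2, 2, 2, 3, 4, 2, 4, 3, 4, 5, 3, 2, 3, 3]
print(mode_filter(s, window=5))
```

Each step costs one increment, one decrement and one pass over the bins, so the work per sample depends on the number of bins rather than on the window length.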

How you resolve cases where two or more bins have the same max count, e.g. in the sequence

(1 2 3 3 5 6 6 8 9)

giving counts

Bin    0 1 2 3 4 5 6 7 8 9
Count  - 1 1 2 - 1 2 - 1 1

where you could take the answer as either 3 or 6 ... that's up to you.
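To make the choice concrete, here is a small sketch of three possible tie-breaking policies for the window above. None of them is the "right" one; they are only illustrative assumptions, and the variable names are made up for this example.

```python
import numpy as np

window = np.array([1, 2, 3, 3, 5, 6, 6, 8, 9])
counts = np.bincount(window, minlength=10)

# All bins that share the maximum count.
tied = np.flatnonzero(counts == counts.max())

lowest = int(tied[0])    # policy 1: smallest tied value
highest = int(tied[-1])  # policy 2: largest tied value

# Policy 3: the tied value closest to the window median.
median_value = np.median(window)
closest = int(tied[np.argmin(np.abs(tied - median_value))])

print(tied.tolist(), lowest, highest, closest)
```

For this window the tied bins are 3 and 6, and since the median is 5, the third policy picks 6.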