R – How to run a high pass or low pass filter on data points in R

frequency-analysisrsignal processing

I am a beginner in R and I have tried to find information about the following without finding anything.

The green graph in the picture is composed by the red and yellow graphs. But let's say that I only have the data points of something like the green graph. How do I extract the low/high frequencies (i.e. approximately the red/yellow graphs) using a low pass/high pass filter?

low frequency sinus curve with high frequency sinus curve modulated onto

Update: The graph was generated with

number_of_cycles = 2
max_y = 40

x = 1:500
a = number_of_cycles * 2*pi/length(x)

y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)

plot(x, y, type="l", col="red", ylim=range(-1.5*max_y,1.5*max_y,5))
points(x, y + noise1, col="green", pch=20)
points(x, noise1, col="yellow", pch=20)

Update 2: Using the Butterworth filter in the signal package suggested I get the following:

Original picture with filtered graphs added

library(signal)

bf <- butter(2, 1/50, type="low")
b <- filter(bf, y+noise1)
points(x, b, col="black", pch=20)

bf <- butter(2, 1/25, type="high")
b <- filter(bf, y+noise1)
points(x, b, col="black", pch=20)

The calculations was a bit work, signal.pdf gave next to no hints about what values W should have, but the original octave documentation at least mentioned radians which got me going. The values in my original graph was not chosen with any specific frequency in mind, so I ended up with the following not so simple frequencies: f_low = 1/500 * 2 = 1/250, f_high = 1/500 * 2*10 = 1/25 and the sampling frequency f_s = 500/500 = 1. Then I chose a f_c somewhere inbetween the low and high frequencies for the low/high pass filters (1/100 and 1/50 respectively).

Best Answer

I bumped into similar problem recently and did not find the answers here particularly helpful. Here is an alternative approach.

Let´s start by defining the example data from the question:

number_of_cycles = 2
max_y = 40

x = 1:500
a = number_of_cycles * 2*pi/length(x)

y = max_y * sin(x*a)
noise1 = max_y * 1/10 * sin(x*a*10)
y <- y + noise1

plot(x, y, type="l", ylim=range(-1.5*max_y,1.5*max_y,5), lwd = 5, col = "green")

enter image description here

So the green line is the dataset we want to low-pass and high-pass filter.

Side note: The line in this case could be expressed as a function by using cubic spline (spline(x,y, n = length(x))), but with real world data this would rarely be the case, so let's assume that it is not possible to express the dataset as a function.

The easiest way to smooth such data I have came across is to use loess or smooth.spline with appropriate span/spar. According to statisticians loess/smooth.spline is probably not the right approach here, as it does not really present a defined model of the data in that sense. An alternative is to use Generalized Additive Models (gam() function from package mgcv). My argument for using loess or smoothed spline here is that it is easier and does not make a difference as we are interested in the visible resulting pattern. Real world datasets are more complicated than in this example and finding a defined function for filtering several similar datasets might be difficult. If the visible fit is good, why to make it more complicated with R2 and p values? To me the application is visual for which loess/smoothed splines are appropriate methods. Both of the methods assume polynomial relationships with the difference that loess is more flexible also using higher degree polynomials, while cubic spline is always cubic (x^2). Which one to use depends on trends in a dataset. That said, the next step is to apply a low-pass filter on the dataset by using loess() or smooth.spline():

lowpass.spline <- smooth.spline(x,y, spar = 0.6) ## Control spar for amount of smoothing
lowpass.loess <- loess(y ~ x, data = data.frame(x = x, y = y), span = 0.3) ## control span to define the amount of smoothing

lines(predict(lowpass.spline, x), col = "red", lwd = 2)
lines(predict(lowpass.loess, x), col = "blue", lwd = 2)

enter image description here

Red line is the smoothed spline filter and blue the loess filter. As you see results differ slightly. I guess one argument of using GAM would be to find the best fit, if the trends really were this clear and consistent among datasets, but for this application both of these fits are good enough for me.

After finding a fitting low-pass filter, the high-pass filtering is as simple as subtracting the low-pass filtered values from y:

highpass <- y - predict(lowpass.loess, x)
lines(x, highpass, lwd =  2)

enter image description here

This answer comes late, but I hope it helps someone else struggling with similar problem.

Related Solutions

R – How to join (merge) data frames (inner, outer, left, right)

By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)

~~Just as with the inner join, you would probably want to explicitly pass "CustomerId" to R as the matching variable.~~ I think it's almost always best to explicitly state the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly and easier to read later on.

You can merge on multiple columns by giving by a vector, e.g., by = c("CustomerId", "OrderId").

If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2" where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)

Python – How to implement band-pass Butterworth filter with Scipy.signal.butter

You could skip the use of buttord, and instead just pick an order for the filter and see if it meets your filtering criterion. To generate the filter coefficients for a bandpass filter, give butter() the filter order, the cutoff frequencies Wn=[low, high] (expressed as the fraction of the Nyquist frequency, which is half the sampling frequency) and the band type btype="band".

Here's a script that defines a couple convenience functions for working with a Butterworth bandpass filter. When run as a script, it makes two plots. One shows the frequency response at several filter orders for the same sampling rate and cutoff frequencies. The other plot demonstrates the effect of the filter (with order=6) on a sample time series.

from scipy.signal import butter, lfilter


def butter_bandpass(lowcut, highcut, fs, order=5):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    b, a = butter(order, [low, high], btype='band')
    return b, a


def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = lfilter(b, a, data)
    return y


if __name__ == "__main__":
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import freqz

    # Sample rate and desired cutoff frequencies (in Hz).
    fs = 5000.0
    lowcut = 500.0
    highcut = 1250.0

    # Plot the frequency response for a few different orders.
    plt.figure(1)
    plt.clf()
    for order in [3, 6, 9]:
        b, a = butter_bandpass(lowcut, highcut, fs, order=order)
        w, h = freqz(b, a, worN=2000)
        plt.plot((fs * 0.5 / np.pi) * w, abs(h), label="order = %d" % order)

    plt.plot([0, 0.5 * fs], [np.sqrt(0.5), np.sqrt(0.5)],
             '--', label='sqrt(0.5)')
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Gain')
    plt.grid(True)
    plt.legend(loc='best')

    # Filter a noisy signal.
    T = 0.05
    nsamples = T * fs
    t = np.linspace(0, T, nsamples, endpoint=False)
    a = 0.02
    f0 = 600.0
    x = 0.1 * np.sin(2 * np.pi * 1.2 * np.sqrt(t))
    x += 0.01 * np.cos(2 * np.pi * 312 * t + 0.1)
    x += a * np.cos(2 * np.pi * f0 * t + .11)
    x += 0.03 * np.cos(2 * np.pi * 2000 * t)
    plt.figure(2)
    plt.clf()
    plt.plot(t, x, label='Noisy signal')

    y = butter_bandpass_filter(x, lowcut, highcut, fs, order=6)
    plt.plot(t, y, label='Filtered signal (%g Hz)' % f0)
    plt.xlabel('time (seconds)')
    plt.hlines([-a, a], 0, T, linestyles='--')
    plt.grid(True)
    plt.axis('tight')
    plt.legend(loc='upper left')

    plt.show()

Here are the plots that are generated by this script:

Frequency response for several filter orders

enter image description here

Best Answer

Related Solutions

R – How to join (merge) data frames (inner, outer, left, right)

Python – How to implement band-pass Butterworth filter with Scipy.signal.butter

Related Topic