Time Series Analysis #2: Recognizing Patterns within a Time Series

February 18, 2014 Caleb Welton

featured-TimeSeriesThe previous blog post in this series provided a brief introduction to the SQL Window Function construct, which can be used as a basis for many sorts of ordered calculations within SQL. This post elaborates on how this query capability can be used for a specific type of problem that frequently shows up in time series analysis, which is the recognition of simple patterns of movement within a series. Time series data can be found in many real world applications, including click stream processing, financial analysis, and sensor data.

Within a financial time series, one common problem of interest is identifying a V pattern, e.g. a strictly falling price followed by a strictly rising price. To identify this pattern within a SQL query it is useful to build up to a solution using several layers of subquery to facilitate the processing. The first stage of the query marks each row as either rising or falling, which is done using the window function lag() to compare the current row to the preceding row.

SELECT
  ticker, ts, price,
  price - lag(price, 1) OVER (w) > 0 as rising,
  price - lag(price, 1) OVER (w) < 0 as falling
FROM market_data
WINDOW w as (PARTITION BY ticker ORDER BY ts)
ORDER BY ticker, ts;

 ticker |             ts             | price | rising | falling
--------+----------------------------+-------+--------+---------
 abcd   | 2014-01-14 11:00:44.863071 | 57.98 |        |
 abcd   | 2014-01-14 11:00:47.727358 | 57.96 | f      | t
 abcd   | 2014-01-14 11:00:50.631657 | 57.96 | f      | f
 abcd   | 2014-01-14 11:00:53.624203 | 57.95 | f      | t
 abcd   | 2014-01-14 11:00:56.584564 | 57.93 | f      | t
 abcd   | 2014-01-14 11:00:58.776706 | 57.94 | t      | f
 abcd   | 2014-01-14 11:01:01.99315  | 57.95 | t      | f
 abcd   | 2014-01-14 11:01:03.289345 | 57.94 | f      | t

In this example, the ticker starts at 57.98, drops down to 57.93, and then rises to 57.95 before it drops again. This query identifies which trades are moving in which direction. The next step is to use this initial query as a subquery and filter out instances where the stock is holding steady, which can be done by adding an outer where clause, which will restrict the input where the stock is either rising or falling. It is also possible to begin the next stage of processing–detecting if next record is moving in the same direction as the current record.

WITH
  market_rising_falling as (
    SELECT
      ticker, ts, price,
      price - lag(price, 1) OVER (w) > 0 as rising,
      price - lag(price, 1) OVER (w) < 0 as falling
    FROM market_data
    WINDOW w as (PARTITION BY ticker ORDER BY ts)
  )
SELECT ticker, ts, price, rising, falling,
    rising != lead(rising, 1) OVER (w2) as change
FROM market_rising_falling
WHERE rising is null or rising or falling
WINDOW w2 as (PARTITION BY ticker ORDER BY ts);

 ticker |             ts             | price | rising | falling | change
--------+----------------------------+-------+--------+---------+--------
 abcd   | 2014-01-14 11:00:44.863071 | 57.98 |        |         |
 abcd   | 2014-01-14 11:00:47.727358 | 57.96 | f      | t       | f
 abcd   | 2014-01-14 11:00:53.624203 | 57.95 | f      | t       | f
 abcd   | 2014-01-14 11:00:56.584564 | 57.93 | f      | t       | t
 abcd   | 2014-01-14 11:00:58.776706 | 57.94 | t      | f       | f
 abcd   | 2014-01-14 11:01:01.99315  | 57.95 | t      | f       | t
 abcd   | 2014-01-14 11:01:03.289345 | 57.94 | f      | t       |

In this query peaks are now marked as true; 57.93 is the low point of the V pattern and 57.95 is the right side peak. There are also two rows that are NULL with respect to the calculated change field, which are our first and last records within the window. For these rows it is not possible to know if they are peaks as the calculation of a peak requires knowing the value of the previous/next record in the series; these records will need some special case handling.

Download the Pivotal Data Science Lab datasheet.

The next step is to make one more pass to filter out rows that are not changing direction, and then determine the low and high marks for direction change. This is accomplished using lag and lead to retrieve the previous/next peak after filtering out the non-peak records.

 WITH
  market_rising_falling as (
    SELECT
      ticker, ts, price,
      price - lag(price, 1) OVER (w) > 0 as rising,
      price - lag(price, 1) OVER (w) < 0 as falling
    FROM market_data
    WINDOW w as (PARTITION BY ticker ORDER BY ts)
  ),
  market_change as (
    SELECT ticker, ts, price, rising, falling,
      rising != lead(rising, 1) OVER (w2) as change
    FROM market_rising_falling
  WHERE rising is null or rising or falling
  WINDOW w2 as (PARTITION BY ticker ORDER BY ts)
  )
SELECT ticker, ts,
  lag(price, 1) OVER (w3) as prev_peak,
  price,
  lead(price, 1) OVER (w3) as next_peak
FROM market_change
WHERE change is null or change
WINDOW w3 as (PARTITION BY ticker ORDER BY ts);

 ticker |             ts             | prev_peak | price | next_peak
--------+----------------------------+-----------+-------+-----------
 abcd   | 2014-01-14 11:00:44.863071 |           | 57.98 |     57.93
 abcd   | 2014-01-14 11:00:56.584564 |     57.98 | 57.93 |     57.95
 abcd   | 2014-01-14 11:01:01.99315  |     57.93 | 57.95 |     57.94
 abcd   | 2014-01-14 11:01:03.289345 |     57.95 | 57.94 |

This gives both “strictly rising then strictly falling” and “strictly falling then strictly rising” patterns. A simple filter will restrict this to just the “strictly falling then strictly rising” V pattern. A decision also needs to be made how to handle the edge conditions at the beginning and end of the window. In this next example the record will be kept so long as one of prev_peak or next_peak indicates that the record is the bottom of a V pattern.

WITH
  market_rising_falling as (
    SELECT
      ticker, ts, price,
      price - lag(price, 1) OVER (w) > 0 as rising,
      price - lag(price, 1) OVER (w) < 0 as falling
    FROM market_data
    WINDOW w as (PARTITION BY ticker ORDER BY ts)
  ),
  market_change as (
    SELECT ticker, ts, price, rising, falling,
      rising != lead(rising, 1) OVER (w2) as change
    FROM market_rising_falling
  WHERE rising is null or rising or falling
  WINDOW w2 as (PARTITION BY ticker ORDER BY ts)
  ),
  market_peaks as (
    SELECT ticker, ts,
      lag(price, 1) OVER (w3) as prev_peak,
      price,
      lead(price, 1) OVER (w3) as next_peak
    FROM market_change
    WHERE change is null or change
    WINDOW w3 as (PARTITION BY ticker ORDER BY ts)
  )
SELECT * FROM market_peaks
WHERE price < prev_peak or price < next_peak
ORDER BY ticker, ts;

 ticker |             ts             | prev_peak | price | next_peak
--------+----------------------------+-----------+-------+-----------
 abcd   | 2014-01-14 11:00:56.584564 |     57.98 | 57.93 |     57.95
 abcd   | 2014-01-14 11:01:03.289345 |     57.95 | 57.94 |
 

If, instead, only the complete V patterns are desired, e.g. excluding the last partial result, then the “price < prev_peak or price < next_peak” expression can be changed to an AND expression “price < prev_peak AND price < next_peak”.

This concludes the introduction to Time Series Pattern Matching, part two of the Time Series Analysis blog posts. Stay tuned for the third article in the series, where resampling and interpolation will be discussed.

About the Author

Biography

Previous
Using Hologram with Rails to Auto-Generate Styleguides
Using Hologram with Rails to Auto-Generate Styleguides

Styleguides are handy because they document conventions for projects; making communication easier across th...

Next
The 3R System: How Retailers Can Minimize Customer Acquisition Costs
The 3R System: How Retailers Can Minimize Customer Acquisition Costs

This post originally appeared on the Mobile Retail Blog.   It’s never been more important for retailers to ...