Draft Technical Memorandum

 

DATE:     December 17, 2015

TO:         Congestion Management Process Committee

FROM:    Ryan Hicks and Scott Peterson

                Metropolitan Planning Organization Staff

RE:          Analysis of the Massachusetts Department of Transportation Roadway-Monitoring Dataset and the National Performance Management Research Dataset

 

1        Purpose and Background

1.1    Purpose

The purpose of this memorandum is to examine the merits of two travel time datasets that could serve as cost-effective alternatives to the more costly INRIX travel time data. The two datasets examined are the National Performance Management Research Dataset (NPMRDS), provided by the Federal Highway Administration (FHWA), and the Massachusetts Department of Transportation (MassDOT) roadway-monitoring dataset, both of which the Boston Region Metropolitan Planning Organization (MPO) obtained at no cost. Each dataset was examined to determine its suitability for future Congestion Management Process (CMP) and freight-planning work, in comparison to the 2012 INRIX dataset that the MPO purchased previously. This analysis was requested by MPO staff and approved by the CMP Committee for Federal Fiscal Year 2015.

1.2    Background

In 2013, the Boston Region MPO purchased a roadway-monitoring dataset from INRIX. This dataset contains travel times for every minute of 2012 for most major roadways in the Boston region. MPO staff used this dataset successfully for several tasks, including the CMP and the Long-Range Transportation Plan (LRTP), and created two interactive dashboards that display average AM and PM peak-period congestion on various roadways.

In early 2015, the Boston Region MPO acquired both the NPMRDS and the MassDOT dataset at no cost. The analysis discussed herein was conducted to determine whether either dataset would be useful for the CMP or freight-planning work. Two sample roadways were identified for comparison: a freeway and a major arterial roadway. The northbound and southbound directions of I-93 between Route 60 and Route 129 were chosen to represent the freeway, and the eastbound and westbound directions of Route 9 from Brookline Avenue to I-95 were analyzed to gauge data consistency on a major arterial roadway. The average annual daily traffic (AADT) for the I-93 corridor is between 112,000 and 192,000 vehicles; the AADT for the Route 9 corridor is between 43,000 and 63,000 vehicles. The AM peak period, from 6:30 AM to 9:30 AM, was chosen for analysis.

2        Description of Datasets

2.1    INRIX

INRIX is a private company that collects and processes vehicle-probe data.¹ Once the data are processed, INRIX sells them to government organizations and other transportation entities for planning purposes. INRIX provides travel times in both real-time and aggregated form. In 2013, the Boston Region MPO purchased a dataset from INRIX at a cost of $53,900. The purchased dataset contains travel time records for every minute of 2012 on most roadways in the Boston region, including collector roadways, arterial roadways, and freeways. INRIX data records are provided in one-minute increments for 24 hours per day, 365 days per year. This dataset is currently used for the CMP, the LRTP, and the Travel Demand Model.

2.2    NPMRDS

In 2013, the FHWA entered into an agreement with HERE, a private company, to provide travel time data to MPOs at no cost in the form of the NPMRDS. The contract between the FHWA and HERE consists of four one-year options, which will continue until June 2017. NPMRDS data are provided monthly for download from the HERE website. Data records are available in five-minute increments for 24 hours per day, 365 days per year, and cover the period from July 2013 to the present. NPMRDS records contain three travel time values: all vehicles, passenger vehicles only, and freight vehicles only. However, the NPMRDS does not report the sample size recorded for each five-minute interval. The NPMRDS is the only dataset examined in this analysis that separates freight data from passenger-vehicle data. Passenger data are obtained from several sources, including mobile phones, individual vehicles, portable navigation devices, and vehicle transponders. Freight probe data are obtained from the American Transportation Research Institute (ATRI), which draws on embedded fleet systems. ATRI freight data are maintained in a database separate from the NPMRDS database; this database monitors probe points from several hundred thousand freight fleets, including tractor-trailer combination trucks, medium-to-large fleet trucks, dry van trailers, and flatbed trailers.²

2.3    MassDOT

Over the last few years, MassDOT has installed BlueTOAD Bluetooth readers to monitor travel times on Massachusetts freeways. The readers record time stamps for passing Bluetooth devices, which they identify by their unique media access control (MAC) addresses; because the same device is detected at successive readers, the difference between its time stamps gives a travel time between those readers. The data are then smoothed into five-minute travel time averages for each Bluetooth pair ID.³ The data are collected to provide real-time traffic monitoring (RTTM) travel time information on dynamic message signs throughout Massachusetts. MassDOT provided its Bluetooth data to the Boston Region MPO in aggregated form at no cost. Currently, the MassDOT dataset has the smallest network coverage of the three datasets analyzed.
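The matching and smoothing described above can be illustrated with a minimal Python sketch. It is not MassDOT's or BlueTOAD's actual processing, which this memorandum does not document. It assumes that raw detections are available as hypothetical (MAC address, timestamp) pairs for two adjacent readers, matches each downstream detection to the most recent earlier upstream detection of the same device within an assumed one-hour limit, and averages the resulting travel times into five-minute bins keyed by departure time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def pair_travel_times(upstream, downstream, max_travel=timedelta(hours=1)):
    """Match devices seen at two readers and return (departure, seconds) tuples.

    upstream, downstream: iterables of (mac_address, timestamp) detections.
    A downstream detection is matched to the latest earlier upstream
    detection of the same MAC address no more than max_travel before it.
    """
    seen = defaultdict(list)
    for mac, ts in upstream:
        seen[mac].append(ts)

    trips = []
    for mac, ts_down in downstream:
        candidates = [t for t in seen.get(mac, [])
                      if t <= ts_down and ts_down - t <= max_travel]
        if candidates:
            t_up = max(candidates)
            trips.append((t_up, (ts_down - t_up).total_seconds()))
    return trips


def five_minute_averages(trips):
    """Average travel times (seconds) into five-minute bins by departure time."""
    bins = defaultdict(list)
    for t_up, seconds in trips:
        start = t_up.replace(minute=t_up.minute - t_up.minute % 5,
                             second=0, microsecond=0)
        bins[start].append(seconds)
    return {start: sum(v) / len(v) for start, v in sorted(bins.items())}


# Made-up detections for one reader pair: devices "aa" and "bb" are matched;
# "cc" never reaches the downstream reader and "dd" was not seen upstream.
t0 = datetime(2014, 5, 6, 7, 0)
upstream = [("aa", t0), ("bb", t0 + timedelta(minutes=2)),
            ("cc", t0 + timedelta(minutes=6))]
downstream = [("aa", t0 + timedelta(minutes=4)),
              ("bb", t0 + timedelta(minutes=8)),
              ("dd", t0 + timedelta(minutes=9))]
print(five_minute_averages(pair_travel_times(upstream, downstream)))
# {datetime.datetime(2014, 5, 6, 7, 0): 300.0}
```

Binning by departure time at the upstream reader is one reasonable convention; binning by arrival time would shift congested travel times into later intervals.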

3        Segment Definition of Each Dataset

Each analyzed dataset defines a measured roadway segment differently. Table 1 shows the average length of each dataset's roadway segments for the analyzed corridors. Both INRIX and the NPMRDS use the Traffic Message Channel (TMC)⁴ network to determine the locations and lengths of roadway segments. One notable difference between them is that INRIX separates center TMCs from noncenter TMCs.⁵ For INRIX freeway monitoring, MPO staff kept the center and noncenter TMCs separate. However, the center and noncenter TMCs were combined on the INRIX arterial roadway network to eliminate data disparities that may exist at certain intersections. As a result, INRIX and NPMRDS arterial segments have the same average length, but NPMRDS freeway segments are roughly twice as long as INRIX freeway segments. MPO staff created MassDOT segments in a geographic information system (GIS); a MassDOT segment runs between two Bluetooth readers, as determined by a pair ID. Unfortunately, MassDOT Bluetooth readers are located about four miles apart on average, which is too far apart to measure congestion in detail.
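The center/noncenter distinction can also be applied programmatically. The sketch below assumes the common nine-character TMC code layout (for example, a hypothetical code such as 110P04567) in which the fourth character is the direction flag described in note 5: P or N for center TMCs and + or - for noncenter TMCs. The function names and the segment lengths in the usage example are illustrative only, not values from the analyzed corridors.

```python
from collections import defaultdict


def classify_tmc(code):
    """Return (is_center, direction) for a TMC code such as '110P04567'.

    Assumes the fourth character is the direction flag:
    'P'/'N' mark center TMCs; '+'/'-' mark noncenter TMCs.
    """
    flag = code[3]
    if flag == "P":
        return True, "positive"
    if flag == "N":
        return True, "negative"
    if flag == "+":
        return False, "positive"
    if flag == "-":
        return False, "negative"
    raise ValueError(f"unrecognized direction flag in TMC code {code!r}")


def average_length_by_type(segments):
    """Average segment length (miles) grouped by (is_center, direction)."""
    groups = defaultdict(list)
    for code, miles in segments:
        groups[classify_tmc(code)].append(miles)
    return {key: sum(v) / len(v) for key, v in groups.items()}


# Hypothetical codes and lengths (miles).
segments = [("110+04567", 0.8), ("110P04567", 0.2),
            ("110+04568", 1.0), ("110-04569", 0.6)]
print(average_length_by_type(segments))
```

Keeping center TMCs as their own short segments, as was done for INRIX freeways, lowers the average segment length relative to a network such as the NPMRDS, which does not separate them.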

Table 1

Average Roadway Segment Length for Analyzed Corridors

| Data Source | Average Segment Length (Freeway) | Average Segment Length (Arterial Roadway) |
|---|---|---|
| INRIX | 0.64 miles | 0.47 miles |
| NPMRDS | 1.22 miles | 0.47 miles |
| MassDOT | 4.05 miles | N/A |

 

4        Roadway Network Coverage

Figures 1, 2, and 3 in the Appendix show data coverage for the INRIX, NPMRDS, and MassDOT datasets, respectively. The datasets vary in the extent of their network coverage. INRIX has the most widespread coverage: most roadways classified as collectors or higher are covered. The NPMRDS covers all roadways that are part of the National Highway System (NHS), which includes all freeways and most arterial roadways in the Boston region. The MassDOT dataset has the smallest roadway network of the three datasets; it currently covers only I-93, I-90, and portions of I-95, I-495, Massachusetts Route 3, and Massachusetts Route 24.

5        Number and Quality of Samples

Tables 2 and 3 show the number of samples in the analyzed corridors for the INRIX dataset in May 2012 and for the MassDOT and NPMRDS datasets in May 2014. INRIX data are provided in one-minute increments, while both the NPMRDS and MassDOT datasets are provided in five-minute increments. For a dataset to be considered statistically valid for congestion monitoring in the Boston Region MPO, it must yield at least 15 valid AM peak-period samples per roadway segment for the month analyzed. For this analysis, every INRIX record used had to meet the previously established CMP criteria: a confidence score⁶ of 30⁷ and a C-value⁸ of 75⁹ or above. No outliers were excluded from the NPMRDS or MassDOT datasets for the initial analysis; however, Section 7.4 (below) presents and discusses the results of potential outlier removal methods for the NPMRDS.
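The INRIX record screening and the 15-sample validity rule can be expressed compactly. In this minimal sketch, each record is assumed to be a dictionary with hypothetical keys tmc, confidence, and c_value; the field names in the actual INRIX files may differ. Because the confidence score never exceeds 30, the ">= 30" test is equivalent to requiring real-time data.

```python
from collections import Counter

MIN_CONFIDENCE = 30  # real-time data only (note 7)
MIN_C_VALUE = 75     # threshold selected by MPO staff (note 9)
MIN_SAMPLES = 15     # valid AM peak-period samples per segment per month


def valid_inrix_records(records):
    """Keep records meeting the CMP confidence-score and C-value criteria."""
    return [r for r in records
            if r["confidence"] >= MIN_CONFIDENCE and r["c_value"] >= MIN_C_VALUE]


def segments_meeting_threshold(records):
    """Map each TMC present in records to True if it has enough samples.

    TMCs with no valid records do not appear and should be treated as failing.
    """
    counts = Counter(r["tmc"] for r in records)
    return {tmc: n >= MIN_SAMPLES for tmc, n in counts.items()}


# Made-up example: TMC "A" passes; TMC "B" falls one valid sample short.
records = ([{"tmc": "A", "confidence": 30, "c_value": 80}] * 15
           + [{"tmc": "A", "confidence": 30, "c_value": 50}]
           + [{"tmc": "B", "confidence": 30, "c_value": 90}] * 14
           + [{"tmc": "B", "confidence": 20, "c_value": 90}] * 5)
print(segments_meeting_threshold(valid_inrix_records(records)))
# {'A': True, 'B': False}
```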

5.1    Freeways

Table 2 shows the number of samples used from each dataset to analyze the I-93 corridor between Route 60 and Route 129. Overall, INRIX provided the greatest number of samples of any dataset. The MassDOT dataset had the second-highest number: 360 five-minute samples in May 2014. NPMRDS data for all vehicles provided 334 to 360 five-minute samples. The NPMRDS freight-only data provided far fewer samples (18 to 186) than the all-vehicles data.

Table 2

Samples for I-93 Between Route 60 and Route 129

| Data Source | Number of Samples |
|---|---|
| INRIX (C-value of 75 or above) | 1,854 to 2,134 samples in one-minute increments |
| NPMRDS (all vehicles) | 334 to 360 samples in five-minute increments |
| NPMRDS (passenger vehicles) | 329 to 360 samples in five-minute increments |
| NPMRDS (freight vehicles) | 18 to 186 samples in five-minute increments |
| MassDOT | 360 samples in five-minute increments |

 

5.2    Arterial Roadways

Table 3 shows the number of samples used from each dataset to analyze the Route 9 corridor from I-95/Route 128 to Brookline Avenue. Overall, the INRIX dataset provided the greatest number of samples for the arterial corridor. NPMRDS data yielded between 88 and 329 five-minute AM peak-period samples, depending on the TMC. The NPMRDS freight-only data provided a maximum of 17 samples, and several TMCs had no samples. MassDOT data are not currently collected for arterial roadways.

 

Table 3

Samples for Route 9 from I-95/Route 128 to Brookline Avenue

| Data Source | Number of Samples |
|---|---|
| INRIX (C-value of 75 or above) | 1,230 to 1,835 samples in one-minute increments |
| NPMRDS (all vehicles) | 88 to 329 samples in five-minute increments |
| NPMRDS (passenger vehicles) | 86 to 329 samples in five-minute increments |
| NPMRDS (freight vehicles) | 0 to 17 samples in five-minute increments |
| MassDOT | N/A |

 

6        Methodology for Analysis

6.1    Steps

  1. The INRIX, NPMRDS, and MassDOT datasets were uploaded to Google BigQuery so that MPO staff could easily access the specific data records needed for the analysis. Google BigQuery is an SQL-based remote querying service designed to query trillion-record databases in seconds. MPO staff selected Google BigQuery to address the MPO's difficulty in querying large databases in a timely manner.
  2. For both the INRIX and NPMRDS datasets, a roadway network shapefile created by the dataset provider was opened in ArcMap. Each TMC representing a portion of the corridors selected for this analysis was exported into a separate shapefile, and any errors in directionality or line work were corrected in this step. The TMCs for the sampled roadways in each dataset were then listed for use in querying the data.
  3. MassDOT provided a CSV file containing the latitude and longitude of each Bluetooth reader, and these reader locations were opened in ArcMap. Using this information, MPO staff drew roadway links between the Bluetooth readers; these links represent the MassDOT roadway network for this analysis.
  4. Data were queried for each of the three datasets. The query selected all records between 6:30 AM and 9:30 AM on nonholiday Tuesdays, Wednesdays, and Thursdays in May 2014 (May 2012 for the INRIX dataset). The INRIX query also excluded records that did not have a confidence score of 30 and a C-value of 75 or above.
  5. Data from each dataset were saved in Microsoft Excel, where formulas were used to compute each dataset's average travel time for every five-minute interval between 6:30 AM and 9:30 AM and to count the number of samples in each interval. Because NPMRDS data contain separate fields for all vehicles, passenger vehicles only, and freight vehicles only, averages were calculated for each of these fields.
  6. Free-flow travel times were calculated for both the NPMRDS and MassDOT datasets.¹⁰ The purpose of this calculation was to determine whether each dataset has enough records to calculate free-flow speeds accurately for all roadways, because free-flow speed is necessary to calculate reliability-based performance measures. Speed limits were used to estimate the travel time through a corridor for a vehicle traveling at the speed limit. The free-flow travel times for the NPMRDS and MassDOT datasets should be slightly less than these speed-limit travel times, because free-flow (85th percentile) speeds are typically somewhat higher than posted speed limits.
  7. Average speeds were also calculated for the entire AM peak period for every roadway segment in each dataset. These average AM peak-period speeds were uploaded to ArcGIS, and maps comparing the travel speeds of the datasets side by side were created. These maps are displayed as Figures 4 and 5 in the Appendix.
  8. Graphs were created to compare the travel times of the datasets. The average travel time for every five-minute increment between 6:30 AM and 9:30 AM was plotted for each dataset, and a trend line was added for each dataset to show the progression of travel times throughout the peak period. One graph was created for each direction of each corridor.
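Step 6 can be sketched as follows, using the free-flow definition in note 10 (85th percentile speed of records between 12:00 AM and 2:00 AM). This is an illustration under stated assumptions, not the spreadsheet formulas staff used: records are assumed to be hypothetical (timestamp, travel time in seconds) pairs for a single segment of known length, and the 85th percentile is computed with Python's statistics.quantiles using the inclusive method, which applies the same linear interpolation as Excel's PERCENTILE.INC.

```python
import statistics
from datetime import datetime


def speed_mph(length_miles, travel_time_s):
    return length_miles * 3600 / travel_time_s


def travel_time_s(length_miles, speed):
    return length_miles * 3600 / speed


def free_flow_speed(records, length_miles):
    """85th percentile speed (mph) of records timestamped 12:00-2:00 AM."""
    speeds = [speed_mph(length_miles, tt) for ts, tt in records if ts.hour < 2]
    if len(speeds) < 2:
        raise ValueError("too few overnight records to estimate free-flow speed")
    # quantiles(n=100) returns the 1st..99th percentiles; index 84 is the 85th.
    return statistics.quantiles(speeds, n=100, method="inclusive")[84]


def check_free_flow(records, length_miles, speed_limit_mph):
    """Return (free-flow time, speed-limit time, free-flow time is shorter)."""
    ff_time = travel_time_s(length_miles, free_flow_speed(records, length_miles))
    limit_time = travel_time_s(length_miles, speed_limit_mph)
    return ff_time, limit_time, ff_time < limit_time


# Made-up one-mile segment: three overnight records and one AM peak record.
records = [(datetime(2014, 5, 6, 0, 30), 72.0),   # 50 mph
           (datetime(2014, 5, 6, 1, 0), 60.0),    # 60 mph
           (datetime(2014, 5, 6, 1, 30), 48.0),   # 75 mph
           (datetime(2014, 5, 6, 7, 30), 300.0)]  # 12 mph, outside 12-2 AM
print(free_flow_speed(records, 1.0))          # 70.5
print(check_free_flow(records, 1.0, 65)[2])   # True
```

A ValueError here would correspond to the situation the step is designed to detect: a segment without enough overnight records to support a free-flow estimate.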

7        Data Findings

7.1    Freeways

The INRIX, NPMRDS, and MassDOT databases were compared by measuring the travel time on I-93 northbound and southbound between Route 60 and Route 129. Figures 6 and 7 in the Appendix show the travel times for each dataset in the northbound and southbound travel directions, respectively. The travel time for each dataset was averaged for every five minutes between 6:30 AM and 9:30 AM for all nonholiday Tuesdays, Wednesdays, and Thursdays in May 2014 (May 2012 for the INRIX dataset). Data points that represent every five minutes between 6:30 AM and 9:30 AM were plotted for each dataset. A trend line was added to indicate the change in travel time for each dataset over the three-hour period.
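The five-minute averaging and trend line described above can be sketched in Python. The sketch assumes records are hypothetical (timestamp, travel time in seconds) pairs for a whole corridor, that the list of holidays is supplied by the user, and that all qualifying days are pooled into one profile. It fits an ordinary least-squares straight line; the memorandum does not state which trend-line type was used in the graphs, so the linear fit is an assumption.

```python
from collections import defaultdict
from datetime import date, datetime, time

PEAK_START, PEAK_END = time(6, 30), time(9, 30)
MIDWEEK = {1, 2, 3}  # datetime.weekday(): Tuesday=1, Wednesday=2, Thursday=3


def in_analysis_window(ts, holidays):
    """True for nonholiday Tuesday-Thursday records from 6:30 to 9:30 AM."""
    return (ts.weekday() in MIDWEEK and ts.date() not in holidays
            and PEAK_START <= ts.time() < PEAK_END)


def five_minute_profile(records, holidays=frozenset()):
    """Map each five-minute slot to (average travel time, sample count)."""
    slots = defaultdict(list)
    for ts, tt in records:
        if in_analysis_window(ts, holidays):
            slots[time(ts.hour, ts.minute - ts.minute % 5)].append(tt)
    return {slot: (sum(v) / len(v), len(v)) for slot, v in sorted(slots.items())}


def linear_trend(profile):
    """Least-squares slope (s/min) and intercept vs. minutes after 6:30 AM."""
    xs = [slot.hour * 60 + slot.minute - (6 * 60 + 30) for slot in profile]
    ys = [avg for avg, _ in profile.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


records = [(datetime(2014, 5, 6, 7, 2), 600.0),    # Tuesday
           (datetime(2014, 5, 7, 7, 3), 700.0),    # Wednesday
           (datetime(2014, 5, 8, 7, 10), 800.0),   # Thursday
           (datetime(2014, 5, 5, 7, 2), 9999.0),   # Monday: excluded
           (datetime(2014, 5, 6, 9, 30), 9999.0)]  # after the peak: excluded
profile = five_minute_profile(records)
print(profile)
print(linear_trend(profile))  # (15.0, 200.0)
print(five_minute_profile(records, holidays={date(2014, 5, 8)}))  # Thursday dropped
```

The sample count kept alongside each average is what allows a slot's reliability to be judged, since the NPMRDS itself does not report sample sizes.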

Northbound

Southbound (Peak Direction)

Conclusions

7.2    Arterial Roadways

For the arterial roadway analysis, the INRIX and NPMRDS datasets were compared by measuring travel times on Route 9 between I-95/Route 128 and Brookline Avenue. Figures 8 and 9 in the Appendix show the travel times for each dataset in the eastbound and westbound travel directions, respectively. As with the I-93 freeway corridor, the travel time for each dataset was averaged for every five minutes between 6:30 AM and 9:30 AM on all nonholiday Tuesdays, Wednesdays, and Thursdays in May 2012 for INRIX data and in May 2014 for NPMRDS data. A trend line was added for each dataset to indicate the change in travel time over the three-hour period. Data points are absent from the graphs where the NPMRDS had no travel time values for a five-minute increment in the corridor.

Eastbound (Peak Direction)

Westbound

Conclusions

7.3    NPMRDS: Passenger-Only Data Versus Freight-Only Data

The NPMRDS includes three fields: all vehicles, passenger vehicles only, and freight vehicles only. An analysis was conducted to determine how these values compare with one another. The corridor analyzed was I-93 in both directions between Route 60 and Route 129. The freight-only data were not used for the arterial roadway analysis because of the small number of samples: only 2.8 percent of the records in the Route 9 corridor contained freight data. Figures 10 and 11 in the Appendix show the travel times for each of the three values in the northbound and southbound travel directions, respectively.

The travel time for each value was averaged for every 15 minutes between 6:30 AM and 9:30 AM on all nonholiday Tuesdays, Wednesdays, and Thursdays in May 2014. Fifteen-minute increments were used for this analysis because of the small number of samples in the freight-only data. Data points representing each 15-minute interval between 6:30 AM and 9:30 AM were plotted for each value, and a trend line was added for each value to indicate the change in travel time over the three-hour period. Data points are absent from the graphs where the NPMRDS had no travel time values for a 15-minute increment in the corridor.

Northbound

Southbound (Peak Direction)

Conclusions

7.4    Removing Outliers from the NPMRDS Dataset 

The NPMRDS showed extreme fluctuations in travel time because of outliers, particularly in the vehicle-only data for arterial roadways and the freight-only data for freeways. Therefore, another analysis was conducted to determine what the NPMRDS travel times would be with the outliers removed. MPO staff can label and exclude outliers from a large dataset fairly easily using the outlier removal methods described below. This analysis was conducted to determine which method more effectively produces consistent travel time results.

For this analysis, two methods were used to flag and remove outliers from the passenger-only data for the Route 9 corridor, and the same tests were performed on the freight-only data for the I-93 corridor. These two samples of NPMRDS data were selected because they showed the most travel time variability among the five-minute data points in the initial analysis. The outlier tests indicate whether these two samples can be salvaged for congestion monitoring.

Outlier Method 1: Remove all data outside 1.5 times the interquartile range (IQR)

For this method, any record whose travel time fell more than 1.5 times the interquartile range (IQR = Quartile 3 - Quartile 1) below Quartile 1 or above Quartile 3 for the corresponding TMC was labeled as an outlier and excluded from the analysis. Data plots were then recalculated without the outliers.
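Outlier Method 1 can be sketched as a per-TMC Tukey-fence filter. The sketch assumes records are hypothetical (TMC, travel time in seconds) pairs and computes quartiles with statistics.quantiles using the inclusive method; other quartile conventions, including whichever one a given spreadsheet uses, can move the fences slightly. TMCs with fewer than two records are left unfiltered, which is an added assumption.

```python
import statistics
from collections import defaultdict


def iqr_filter(records, k=1.5):
    """Drop records outside [Q1 - k*IQR, Q3 + k*IQR] for their own TMC."""
    by_tmc = defaultdict(list)
    for tmc, tt in records:
        by_tmc[tmc].append(tt)

    fences = {}
    for tmc, values in by_tmc.items():
        if len(values) < 2:  # too few values to estimate quartiles
            fences[tmc] = (float("-inf"), float("inf"))
            continue
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        fences[tmc] = (q1 - k * iqr, q3 + k * iqr)

    return [(tmc, tt) for tmc, tt in records
            if fences[tmc][0] <= tt <= fences[tmc][1]]


# Made-up travel times: on TMC "A", Q1 = 62.5, Q3 = 67.5, so the fences
# are 55 and 75 seconds and the 300-second record is dropped.
records = [("A", t) for t in (60, 62, 64, 66, 68, 300)] + [("B", 90)]
print(iqr_filter(records))
```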

Outlier Method 2: Remove all data that has speeds less than five miles per hour or greater than 75 miles per hour

For this method, all data records that displayed speeds less than five miles per hour or greater than 75 miles per hour (10 miles per hour more than the maximum speed limit in Massachusetts) were labeled as outliers and excluded from the analysis. Data plots were then recalculated excluding the outliers. 
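Outlier Method 2 converts each travel time to a speed using the segment length and discards records outside the 5 to 75 mph band. In this sketch, records are hypothetical (TMC, travel time in seconds) pairs and segment lengths come from an assumed TMC-to-miles lookup; records with a nonpositive travel time are also dropped as invalid, which is an added assumption.

```python
MIN_MPH, MAX_MPH = 5, 75  # 75 mph is 10 mph above the Massachusetts maximum limit


def speed_filter(records, lengths_miles):
    """Keep records whose implied speed is between MIN_MPH and MAX_MPH."""
    kept = []
    for tmc, tt in records:
        if tt <= 0:  # cannot compute a speed; treat as invalid
            continue
        mph = lengths_miles[tmc] * 3600 / tt
        if MIN_MPH <= mph <= MAX_MPH:
            kept.append((tmc, tt))
    return kept


lengths = {"A": 1.0}
records = [("A", 36), ("A", 60), ("A", 720), ("A", 900)]  # 100, 60, 5, 4 mph
print(speed_filter(records, lengths))  # [('A', 60), ('A', 720)]
```

Unlike Method 1, this rule does not depend on each TMC's own distribution, so it can be applied record by record without first grouping the data.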

Results: Removal of outliers

Route 9 Arterial Corridor: All-Vehicles Dataset

Figures 12 and 13 show the travel times of the NPMRDS all-vehicles dataset on the Route 9 corridor before and after the outliers were removed. Outlier Method 1 resulted in slightly lower travel times than Outlier Method 2; however, both methods produced less skewed and more consistent data. Both methods lowered the travel time by approximately 9 minutes eastbound and approximately 18 minutes westbound. With the outliers removed, NPMRDS travel times are still four minutes longer than INRIX travel times eastbound and two minutes longer westbound. Removing the outliers with either method makes NPMRDS data usable for arterial roadways. Outlier Method 1 was the more effective method for removing outliers from the passenger-only database.

I-93 Freeway Corridor: Freight-Only Dataset

Figures 14 and 15 show the travel times of the NPMRDS freight-only dataset along the I-93 corridor before and after the outliers were removed. Outlier Method 1 resulted in slightly lower travel times than Outlier Method 2. Both methods lowered the average travel time by eight minutes northbound and approximately seven minutes southbound, and both reduced the variation between time increments. Removing the outliers with either method makes the NPMRDS freight-only dataset usable for freeways. Outlier Method 1 was the more effective method for removing outliers from the freight-only database.

8        Findings from Other Transportation Agencies That Have Used NPMRDS Data

Some MPOs currently use NPMRDS data with online tools such as the Vehicle Probe Project (VPP) Suite to analyze congestion. These analytical tools automatically remove outliers, but they require either an additional cost or an affiliation with an entity that has a close relationship with the provider, such as a state department of transportation. Other transportation agencies use only performance measures based on cumulative distribution functions (e.g., Travel Time Index, Planning Time Index), which eliminates the need to remove outliers. Because these measures are read from percentiles of the travel time distribution, a few extreme outliers have minimal effect on the results.

VPP Suite is a dashboard that allows agencies to conduct planning and performance monitoring with vehicle-probe data combined with other transportation data, such as traffic volumes or incident data. VPP Suite was created through a collaboration between the I-95 Corridor Coalition and the University of Maryland to give I-95 Corridor Coalition¹¹ members access to reliable travel time and speed data on roadways. VPP Suite lets the user choose among INRIX, NPMRDS, and TomTom datasets for analysis, and the dashboard includes analytics such as maps, system performance reports, and congestion trend analysis. Depending on the dataset, one year of VPP Suite for the Boston Region MPO model region costs between $387,000 and $478,000. This cost is too high for an individual MPO; however, the tool would become more cost-effective if purchased jointly by multiple agencies (e.g., MassDOT and the City of Boston).

Some MPOs have been using NPMRDS data to create quarterly reports. These reports describe congested conditions in the region as a whole by monitoring progress toward the performance targets set under the Moving Ahead for Progress in the 21st Century Act (MAP-21).

The most common performance measures are Travel Time Index, Planning Time Index, and Congested Hours.
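The point above about percentile-based measures can be checked with a small example. The sketch assumes percentile-based definitions: Planning Time Index as the 95th percentile travel time divided by free-flow travel time and, as an assumption since many agencies use the mean instead, a Travel Time Index taken from the median (50th percentile). A mean-based ratio is shown for contrast, and all data are made up. Replacing one record with an extreme error shifts the mean-based ratio noticeably but leaves both percentile-based indices unchanged.

```python
import statistics


def percentile(times, pct):
    """pct-th percentile (1-99) with linear interpolation (inclusive method)."""
    return statistics.quantiles(times, n=100, method="inclusive")[pct - 1]


def percentile_index(times, free_flow_s, pct):
    """Percentile travel time divided by free-flow travel time."""
    return percentile(times, pct) / free_flow_s


def mean_index(times, free_flow_s):
    """Mean travel time divided by free-flow travel time, for contrast."""
    return statistics.mean(times) / free_flow_s


free_flow = 60.0
clean = [float(60 + i) for i in range(100)]   # travel times of 60 to 159 s
with_outlier = clean[:-1] + [3600.0]          # slowest record replaced by an error

for label, times in (("clean", clean), ("with outlier", with_outlier)):
    print(label,
          round(percentile_index(times, free_flow, 50), 3),   # median-based TTI
          round(percentile_index(times, free_flow, 95), 3),   # PTI
          round(mean_index(times, free_flow), 3))             # mean-based ratio
```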

Several MPOs have also used NPMRDS freight-only data. Other MPO regions may carry more freight traffic than the Boston region, so their NPMRDS data may contain more freight samples. Boston is not a major freight vehicle hub, which could explain the small number of NPMRDS freight samples available on Boston roadways.

9        Recommendations

9.1    Freeway Data

Overall, it is strongly recommended that the Boston Region MPO continue to use INRIX data for freeway monitoring.

If INRIX data are not available, NPMRDS data should be used as a substitute for congestion monitoring.

It is not recommended that MassDOT data be used to analyze travel times and speeds on freeways.

9.2    Arterial Roadway Data

It is strongly recommended that the Boston Region MPO continue to use INRIX data to monitor arterial roadways.

If the INRIX dataset is not available, it is recommended that NPMRDS data be used to monitor congestion on arterial roadways on the NHS.

9.3      NPMRDS Freight Data

Currently, it is recommended that NPMRDS freight data be used to analyze freight congestion on freeways only.

It is not recommended that the NPMRDS freight-only data be used to monitor freight congestion on arterial roadways because of the small number of available samples.

10        Conclusion

Staff reviewed three travel time datasets for this study: INRIX, NPMRDS, and MassDOT Bluetooth reader data. Based on this review, staff recommend that the Boston Region MPO continue to use INRIX data, if possible. INRIX offered superior temporal and spatial resolution on all freeways and on a substantial number of arterial roadways in the MPO's model area (164 communities).

Our proposal to update the travel time data consists of the following sequence of steps:

  1. Attempt to acquire the 2014 INRIX dataset at the same temporal and spatial resolution as the 2012 acquisition via the following options.

  2. If the Boston Region MPO is unable to acquire the 2014 INRIX dataset, the MPO could use NPMRDS data, although they are a less-than-ideal substitute for the INRIX dataset, assuming that the FHWA continues to provide them at no cost (unless it is renewed, the current contract between the FHWA and HERE will end in June 2017).

 

RH/rh

 

Appendix

 

1 Vehicle-probe data are speed or travel time statistics that are collected in bulk from vehicle fleets or from individual travelers whose vehicles are equipped with GPS tracking devices. The data are then averaged for a certain time frame by private vendors and are made available to transportation entities for purchase. Vehicle-probe data can be collected by either contracted fleets or volunteers.

2 Oklahoma Department of Transportation, Travel Time Based Oklahoma Congestion Analysis: Pilot Study, available online at http://www.okladot.state.ok.us.

3 A pair ID is the field used to join the tabular travel time data to the accompanying spatial data. Each pair ID identifies two Bluetooth readers and represents the distance and travel time between them.

4 The Traffic Message Channel (TMC) location code is a common industry convention, developed and maintained by the leading electronic map database vendors, for uniquely defining road segments. For freeways, a TMC location is defined as the segment between two interchanges; for arterial roadways, the TMC segment definition often varies.

5 Center TMC locations are indicated by either a “P” or an “N” in the TMC code. Center TMC locations represent a roadway segment that is located at an interchange or intersection. “P” stands for positive directionality and “N” stands for negative directionality. Center TMC locations that have a “P” in their code are aligned in either a northbound or eastbound direction. Center TMC locations that have an “N” in their code are aligned in either a southbound or westbound direction. Noncenter TMCs are indicated by a “+” or a “–” in the TMC code: a “+” stands for positive directionality and a “–” stands for negative directionality. Noncenter TMC locations typically represent a roadway segment that lies between interchanges or intersections. Noncenter TMC locations that have a “+” in their code are aligned in either a northbound or eastbound direction. Noncenter TMC locations that have a “–” in their code are aligned in either a southbound or westbound direction.

6 The confidence score is a metric that INRIX uses to indicate the source of a data record. The confidence score ranges from 10 to 30. A confidence score of 30 indicates that the data are collected in real time. A confidence score of 20 indicates that the data are a combination of real-time and historical data. A confidence score of 10 indicates that the data are exclusively historical.

7 A minimum confidence score requirement of 30 was determined by MPO staff because of the preference to use exclusively real-time collected data for the CMP.

8 The C-value is a metric that INRIX uses to indicate the reliability of a data record. C-values for data records range from 0 to 100. A low C-value indicates that there may have been a sudden change in speed at a particular location, usually caused by an incident.

9 MPO staff selected the threshold of 75 because it excludes extreme outliers while still allowing every TMC to retain a statistically valid sample size.

10 Under the Boston Region MPO definition, free-flow speed and free-flow travel time are calculated from the 85th percentile speed of all records in each dataset between 12:00 AM and 2:00 AM (rather than during the AM and PM peak periods).

11 The I-95 Corridor Coalition is a partnership of transportation agencies, toll authorities, public safety, and related organizations, from the State of Maine to the State of Florida, with affiliate members in Canada. More information is available online at http://i95coalition.org (accessed October 27, 2015).