# Abstract

Liquid transportation fuels require costly and time-consuming tests to characterize metrics, such as Research Octane Number (RON) for gasoline. If fuel sale restrictions requiring use of standard Cooperative Fuel Research (CFR) testing procedures do not apply, these tests may be avoided by using multivariate statistical models to predict RON and other quantities. Existing techniques inform these models using information about existing, similar fuels—for example, training a model for gasoline RON with a large number of characterized gasoline samples. While this yields the most accurate predictive models for these fuels, this approach lacks the ability to predict characteristics of fuels outside the training data set. Here we show that an accurate statistical model for the RON of gasoline and gasoline-like fuels can be constructed by ensuring the representation of key functional groups in the spectroscopic data set are used to train the model. We found that a principal component regression model for RON based on IR absorbance and informed using neat and 134 mixtures of n-heptane, isooctane, toluene, ethanol, methylcyclohexane, and 1-hexene could predict RON for the 10 Coordinating Research Council (CRC) Fuels for Advanced Combustion Engine (FACE) gasolines and 12 FACE gasoline blends with ethanol within 34.8±36.1 on average and 51.2 in the worst case. We next studied the effect of adding 28 additional minor components found in the FACE gasolines to the statistical model, and determined that it was necessary to add additional representatives of the branched alkane and aromatics classes to reduce model error. For example, adding 2,3-dimethylpentane and xylene to the previous model allowed it to predict RON for the 22 target fuels within 0.3±4.4 on average and 7.9 in the worst case. However, we determined that the specific choice of fuel in those classes mattered less than ensuring the representation of the relevant functional group. This work builds upon previous efforts by creating models informed by neat and surrogate fuels—rather than complex real fuels—that could predict the performance of complex unknown fuels.