Predicting financial markets with Google Trends and not so random keywords

Damien Challet
Chaire de finance quantitative, Laboratoire de mathématiques appliquées aux systèmes, École Centrale Paris, Grande Voie des Vignes, 92295 Châtenay-Malabry, France and Encelade Capital SA, Parc Scientifique C, EPFL, 1015 Lausanne, Switzerland

Ahmed Bel Hadj Ayed
Encelade Capital SA, Parc Scientifique C, EPFL, 1015 Lausanne, Switzerland

We discuss the claims that data from Google Trends contain enough data to predict future financial index returns. We first review the many subtle (and less subtle) biases that may affect the backtest of a trading strategy, particularly when based on such data. Expectedly, the choice of keywords is crucial: by using an industry-grade backtest system, we verify that random finance-related keywords do not contain more exploitable predictive information than random keywords related to illnesses, classic cars and arcade games. However, other keywords applied on suitable assets yield robustly profitable strategies, thereby confirming the intuition of [24].

I. INTRODUCTION

Taking the pulse of society with unprecedented frequency and accuracy is becoming possible thanks to data from various websites. In particular, data from Google Trends (GT hereafter) report the historical search volume interest (SVI) of given keywords and have been used to predict the present [7] (called nowcasting in [5]), that is, to improve estimates of quantities that are being created but whose figures are to be revealed at the end of a given period. They include unemployment, travel and consumer confidence figures [7], quarterly company earnings (from searches about their salient products) [8], GDP estimates [5] and influenza epidemics [15].

Asset prices are determined by traders. Some traders look for, share and ultimately create information on a variety of websites. Therefore asset prices should be related to the behavior of website users. This syllogism has been investigated in detail in [9]: the price returns of the components of the Russell 3000 index are regressed on many factors, including GT data, and these factors are averaged over all of the 3000 assets. Interestingly, the authors find inter alia a significant correlation between changes in SVI and individual investors' trading activity. In addition, on average, variations of SVI are negatively correlated with price returns over a few weeks during the period studied (i.e., in sample). The need to average over many stocks is due to the amount of noise in both price returns and GT data, and to the fact that only a small fraction of the people who search for a given keyword actually trade later.

The claim of [24] is much stronger: it states that future returns of the Dow Jones Industrial Average are negatively correlated with SVI surprises related to some keywords, hence that GT data contain enough data to predict financial indices. Several subtle (and not so subtle) biases prevent their conclusions from being as forceful as they could be. Using a robust backtest system, we are able to confirm that GT data can be used to predict future asset price returns, thereby placing their conclusions on a much more robust footing.

II. DATA AND STRATEGY

Raw asset prices are well described by suitable random walks that contain no predictability whatsoever. However, they may be predictable if one is able to determine a set of conditions using either only asset returns (see e.g. [21] for conditions based on asset cross-correlations) or external sources of information. Google Trends provide normalized time series of the number of searches for given keywords with a weekly time resolution [1], denoted by $v_t$. [24] propose the following trading strategy: defining the previous base-line search interest as $\bar v_{t-1}$, where $\bar v_t = \frac{1}{k}\sum_{t'=t-k+1}^{t} v_{t'}$ is the average of the last $k$ values, the SVI surprise is $\delta_t = v_t - \bar v_{t-1}$, and the position to take on a related asset during week $t+1$ is $s_{t+1} = -\mathrm{sign}\,\delta_t$. Nothing prevents one from considering the inverse strategy, but an average price reversion over the next one or two weeks following a change of SVI was already noticed by other authors [9, 11].
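The strategy described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the implementation used for the results: the function names `trailing_mean`, `svi_positions` and `strategy_returns` are hypothetical, `svi` is assumed to be a plain numeric vector of weekly values $v_t$, and `fut_ret[t]` is assumed to hold the return of the week following `svi[t]`. Note that `sign()` gives a flat position when the surprise is exactly zero.

```r
# Trailing mean of the last k values ending at t (NA until k values exist).
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Position for week t+1: minus the sign of the SVI surprise at week t.
svi_positions <- function(svi, k) {
  baseline <- trailing_mean(svi, k)                     # \bar v_t
  prev_baseline <- c(NA, baseline[-length(baseline)])   # \bar v_{t-1}
  surprise <- svi - prev_baseline                       # delta_t
  -sign(surprise)                                       # s_{t+1}
}

# Strategy returns: position times the return of the following week.
strategy_returns <- function(svi, fut_ret, k) {
  perf <- svi_positions(svi, k) * fut_ret
  perf[!is.na(perf)]
}

# Toy example: steadily rising search interest leads to short positions.
toy_svi <- c(1, 2, 3, 4, 5)
toy_ret <- c(0.01, 0.02, -0.01, 0.03, 0.00)
toy_pos <- svi_positions(toy_svi, k = 2)
toy_perf <- strategy_returns(toy_svi, toy_ret, k = 2)
```

The first $k$ positions are undefined because the lagged base-line needs $k$ past values, so they are dropped from the performance series.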

Instead of trying to predict the Dow Jones Industrial Average index, we use the time series of SPY, which mirrors the Standard and Poor's 500 index. This provides a weak form of cross-validation, the two time series being highly correlated but not identical. For the same reason, we compute returns from Monday to Friday close prices instead of Monday to Monday, which keeps index returns in sync with GT data (they range from Sundays to Saturdays).

III. METHODOLOGICAL BIASES

Prediction is hard, especially about the future. But prediction about the future in the past is even harder. This applies in particular to the backtesting of a trading strategy, that is, to the computation of its virtual gains in the past. It is prone to many kinds of biases that may significantly alter its reliability, often positively [14, 20]. Most of them are due to the regrettable and possibly inevitable tendency of the future to creep into the past.

A. Tool bias

This is the most overlooked bias. It explains in part why backtest performances are often very good in the 80s and 90s, but less impressive since about 2003, even when one accounts for realistic estimates of total transaction costs. Finding predictability in old data with modern tools is indeed easier than it ought to be. Think of applying computationally cpu- or memory-intensive methods on pre-computer era data. The best known law of the increase in computational power is named after Gordon Moore, who noticed that the optimal number of transistors in integrated circuits increases exponentially with time (with a doubling time of about 2 years) [23]. But other important aspects of computation have been improving exponentially with time, so far, such as the amount of computing per unit of energy (Koomey's law [18]) or the price of storage (Kryder's law, about 2 years [19]). Remarkably, these technological advances are mirrored by the evolution of a minimal reaction timescale in financial data [16]. In addition, the recent ability to summon and unleash almost at once deluges of massive cloud computing power on large data sets has changed the ways financial data can be analyzed.

It is very hard to account for this bias. For educational purposes, one can familiarize oneself with past computer abilities with virtual machines such as qemu [2] tuned to emulate the speed and memory of computers available at a given time for a given sum of money. The same kind of bias extends to progress in the statistics and machine learning literature, and even to the way one understands market dynamics: using a particular method is likely to give better results before its publication than, say, one or two years later. One can stretch this argument to the historicity of the methods tested on financial data at any given time because they follow fashions. At any rate, this is an aspect of backtesting that deserves a more systematic study.

B. Data biases

Data are biased in two ways. First, when backtesting a strategy that depends on external signals, one must first ask oneself whether the signal was available at the dates that it contains. GT data was not reliably available before 6 August 2008, being updated randomly every few months [27]. Backtests at previous dates include an inevitable part of science fiction, but are still useful to calibrate strategies.

The second problem is that data is revised, for several reasons. Raw financial data often contains gross errors (erroneous or missing prices, volumes, etc.), but this is the data one would have had to use in the past. Historical data downloaded afterwards has often been partly cleaned. [10] give good advice about high-frequency data cleaning. Revisions are also very common for macro-economic data. For example, Gross Domestic Product estimates are revised several times before the definitive figure is reached (about revision predictability, see e.g. [13]).

More perversely, data revision includes format changes: the type of data that GT returns was tweaked at the end of 2012. It used to be made of real numbers whose normalization was not completely transparent; it also gave uncertainties on these numbers. Quite consistently, the numbers themselves would change within the given error bars every time one would download data for the same keyword. Nowadays, GT returns integer numbers between 0 and 100, 100 being the maximum of the time series and 0 its minimum; small changes of GT data are therefore hidden by the rounding process; error bars are no longer available, but it is fair to assume that a fluctuation of $\pm 1$ should be considered irrelevant. In passing, the process of rounding the final decimals of prices sometimes introduces spurious predictability, which is well known for FX data [17].
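To make the effect of rounding concrete, the following base R sketch rescales a made-up raw series to integers between 0 and 100 and checks which week-to-week changes disappear. The helper `rescale_to_gt` and the numbers in `raw` are hypothetical; the rescaling rule (minimum mapped to 0, maximum to 100) simply follows the description above and is not claimed to be the exact procedure used by Google Trends.

```r
# Hypothetical helper: map a raw series to integers in 0..100,
# with the minimum sent to 0 and the maximum to 100.
rescale_to_gt <- function(raw) {
  stopifnot(max(raw) > min(raw))
  round(100 * (raw - min(raw)) / (max(raw) - min(raw)))
}

# Made-up raw search volumes for five consecutive weeks.
raw <- c(10.0, 10.04, 10.06, 12.0, 20.0)
gt <- rescale_to_gt(raw)

# TRUE where the raw series moved but the rounded series did not.
hidden <- diff(raw) != 0 & diff(gt) == 0
```

In this toy series the first weekly change is lost after rounding, so a surprise computed from the rounded data can differ in sign (or vanish) compared with one computed from the raw data.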

Revised data also concerns the investible universe. Freely available historical data does not include deceased stocks. This is a real problem as assets come and go at a rather steady rate: today's set of investible assets is not the same as last week's. Accordingly, components of indices also change. Analyzing the past behavior of today's index components is a common way to force-feed a backtest with future information and has therefore an official name: survivor(ship) bias. This is a real problem known to bias considerably measures of average performance. For instance [14] shows that it causes an overestimation of backtest performance in 90% of the cases of long-only portfolios in a well chosen period. This is coherent since, by definition, companies that have survived have done well. Early concerns were about the performance of mutual funds, and various methods have been devised to estimate the strength of this bias given the survival fraction of funds [3, 12].

Finally, one must mention that backtesting strategies on untradable indices, such as the Nasdaq Composite Index, is not a wise idea since no one could even try to remove predictability from them.

C. Choice of keywords

What keywords to choose is of course a crucial ingredient when using GT for prediction. It seems natural to think that keywords related to finance are more likely to be related to financial indices, hence, to be more predictive. Accordingly, [24] build a keyword list from the Financial Times, a financial journal, aiming at biasing the keyword set. But this bias needs to be controlled with a set of random keywords unrelated to finance, which was neglected.

Imagine indeed that some word related to finance was the most relevant in the in-sample window. Our brain is hardwired to find a story that justifies this apparent good performance. Statistics is not: to test that the average performance of a trading strategy is different from zero, one uses a t-test, whose result will be called t-stat in the following, and is defined as $z = \frac{\mu}{\sigma}\sqrt{N}$, where $\mu$ stands for the average of strategy returns, $\sigma$ their standard deviation and $N$ is the number of returns; for $N > 20$, $z$ looks very much like a Gaussian variable with zero average and unit variance. [24] wisely compute t-stats: the best keyword, debt, has a t-stat of 2.3. The second best keyword is color and has a t-stat of 2.2. Both figures are statistically indistinguishable, but debt is commented upon in the paper and in the press; color is not, despite having equivalent predictive power.

Let us now play with random keywords that were known before the start of the backtest period (2004). We collected GT data for 200 common medical conditions/ailments/illnesses, 100 classic cars and 100 all-time best arcade games (reported in Appendix A) and applied the strategy described above with $k = 10$ instead of $k = 5$. Table I reports the t-stats of the best 3 positive and negative performances (which can be made positive by inverting the prescription of the strategy) for each set of keywords. We leave the reader pondering about what (s)he would have concluded had bone cancer or Moon Patrol been more finance-related. This table also illustrates that the best t-stats reported in [24] are not significantly different from what one would obtain by chance: the t-stats reported here being mostly equivalent to Gaussian variables, one expects 5% of their absolute values to be larger than 1.96, which explains why keywords such as color also have a good t-stat.

keyword               | t-stat | keyword           | t-stat | keyword         | t-stat | keyword    | t-stat
----------------------|--------|-------------------|--------|-----------------|--------|------------|-------
multiple sclerosis    | -2.1   | Chevrolet Impala  | -1.9   | Moon Buggy      | -2.1   | labor      | -1.5
muscle cramps         | -1.9   | Triumph 2000      | -1.9   | Bubbles         | -2.0   | housing    | -1.2
premenstrual syndrome | -1.8   | Jaguar E-type     | -1.7   | Rampage         | -1.7   | success    | -1.2
alopecia              | 2.2    | Iso Grifo         | 1.7    | Street Fighter  | 2.3    | bonds      | 1.9
gout                  | 2.2    | Alfa Romeo Spider | 1.7    | Crystal Castles | 2.4    | Nasdaq     | 2.0
bone cancer           | 2.4    | Shelby GT500      | 2.4    | Moon Patrol     | 2.7    | investment | 2.0

Table I. Keywords and associated t-stats of the performance of a simple strategy using Google Trends time series to predict SPY from Monday close to Friday close prices.

Finally, debt is not among the three best keywords when applied to SPY from Monday to Friday: its performance is unremarkable and unstable, as shown in more detail below.

Nevertheless, the t-stats of finance-related terms reported in [24] are biased towards positive values, which is compatible with the reversal observed in [9, 11], and with the results of Table I. This may show that the proposed strategy is able to extract some amount of the possibly weak information contained in GT data.

D. Coding errors

Another explanation for this bias could have been coding errors (it is not). Time series prediction is easy when one mistakenly uses future data as current data in a program, e.g. by incorrectly shifting time series; we give the code used in the appendix. A very simple and effective way of avoiding this problem is to replace, alternately, price returns and external data (GT here) by random time series. If backtests persist in giving positive performance, there are bugs somewhere.

E. No out-of-sample

The aim of [24] was probably not to provide us with a profitable trading strategy, but to attempt to illustrate the relationship between collective searches and future financial returns. It is however striking that no in- and out-of-sample periods are considered (this is surprisingly but decreasingly common in the literature). We therefore cannot assess the trading performance of the proposed strategy nor, equivalently, its information content and viability, which can only be judged by robustness and consistency out-of-sample. We refer the reader to [20] for an entertaining account of the importance of in- and out-of-sample periods.

F. Keywords from the future

[24] use keywords that have been taken from the editions of the FT dated from August 2004 to June 2011, determined ex post. This means that keywords from 2011 editions are used to backtest returns in e.g. 2004. Therefore, the set of keywords injects information about the future into the past. A more robust solution would have been to use editions of the FT available at or before the time at which the performance evaluation took place. This is why we considered sets of keywords known before 2004.

G. Parameter tuning/data snooping

Each set of parameters, which include keywords, defines one or more trading strategies. Trying to optimize parameters or keywords is called data snooping and is bound to lead to unsatisfactory out-of-sample performance. When backtest results are presented, it is often impossible for the reader to know if the results suffer from data snooping. A simple remedy is not to touch a fraction of historical data when testing strategies and then to use it to assess the consistency of performance (cross-validation) [14]. More sophisticated remedies include White's reality check [26] (see e.g. [25] for an application of this method). Data snooping is equivalent to having no out-of-sample period, even when backtests are properly done with sliding in- and out-of-sample periods.

Let us perform some in-sample parameter tuning. The strategy proposed has only one parameter once the financial asset has been chosen, the number of time-steps $k$ over which the moving average is performed. Figure 1 reports the t-stat of the performance associated with keyword debt as a function of $k$. Its sign is relatively robust against changes of $k$ over a range extending up to 30, but its typical value in this interval is not particularly exceptional (between 1 and 2). Let us take now the absolute best keyword from the four sets, Moon Patrol. Both the values and stability range of its t-stat are way better than those of debt (see Figure 2), but this is most likely due to pure chance. There is therefore no reason to trust one keyword more than the other.

Figure 1. Left plot: t-stat as a function of the length of the moving average $k$. Right plot: cumulated performance for various values of $k$ (keyword debt; k = 3, 5, 10, 20, 30, 50, 100). Transaction costs set to 2bps per transaction.

Figure 2. T-stats of the performance associated with keywords debt and Moon Patrol versus the length of the moving average $k$. Transaction costs set to 2bps per transaction.

H. No transaction fees

Assuming an average cost of 2bps (0.02%) per trade, 104 trades per year and 8 years of trading (2004-2011), transaction fees diminish the performance associated with any keyword by about 20%. As a beneficial side effect, periods of flat fee-less performance suddenly become negative performance periods when transaction costs are accounted for, which provides more realistic expectations. Costs related to spread and price impact should also be included in a proper backtest.

IV. THE PREDICTIVE POWER OF GOOGLE TRENDS

Given the many methodological weaknesses listed above, one may come to doubt the conclusions of [24]. We show here that they are correct. The first step is to avoid the methodological problems listed above. One of us has used an industrial-grade backtest system and more sophisticated strategies (which therefore cause tool bias).

First, let us compare the resulting cumulated performance of the three random keyword sets that we defined, plus the set of keywords from the Financial Times. For each set of keywords, we choose as inputs the raw SVI, lagged SVI, and various moving averages of SVI, together with past index returns. It turns out that none of the keyword sets brings information able to predict index movements significantly (see Fig. 4). This is not incompatible with the results of [9, 11, 24]. It simply means that the signal is probably too weak to be exploitable in practice. The final part of the performances is of course appealing, but this comes from the fact that Monday close to Friday close SPY returns have been mostly positive during this period: any machine learning algorithm applied on returns alone would likely yield the same result. So far we can only conclude that a given proper (and not overly stringent) backtest system was not able to find any exploitable information from the four keyword sets, not that the keyword sets do not contain enough predictive information.

To conclude, we use the same backtest system using some GT data with exactly the same parameters and input types as before. The resulting preliminary performance, reported in Fig. 4, is more promising and shows that there really is consistently some predictive information in GT data. It is not particularly impressive when compared to the performance of SPY itself, but is nevertheless interesting since the net exposure is always close to zero (see [6] for more information).
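The t-stat of Section III.C and the random-series check of Section III.D can be combined in one small experiment, sketched below in base R. Everything here is an illustrative assumption rather than the backtest system used above: the helper names, the seed, the number of weeks and of fake keywords, and the use of Gaussian noise for both search volumes and returns. The point is only that, with no information at all in the signal, t-stats should scatter like standard Gaussian variables, so that roughly one random keyword in twenty exceeds 1.96 in absolute value.

```r
# t-stat of the mean strategy return: z = mean / sd * sqrt(N).
t_stat <- function(perf) {
  mean(perf) / sd(perf) * sqrt(length(perf))
}

# Returns of the sign strategy for one signal, with a k-week trailing baseline.
# fut_ret[t] is assumed to be the return of the week following svi[t].
sign_strategy_returns <- function(svi, fut_ret, k) {
  n <- length(svi)
  baseline <- as.numeric(stats::filter(svi, rep(1 / k, k), sides = 1))
  surprise <- svi[-1] - baseline[-n]          # v_t - \bar v_{t-1}
  perf <- -sign(surprise) * fut_ret[-1]
  perf[!is.na(perf)]
}

# Null experiment: both the "search volumes" and the returns are pure noise.
set.seed(1)
n_weeks <- 400
n_keywords <- 500
fut_ret <- rnorm(n_weeks, sd = 0.02)
null_t_stats <- replicate(
  n_keywords,
  t_stat(sign_strategy_returns(rnorm(n_weeks), fut_ret, k = 10))
)

# Share of random keywords that look "significant" at the 5% level.
share_large <- mean(abs(null_t_stats) > 1.96)
```

If a backtest run on such random inputs produces a share of large t-stats far above 5%, or a systematically positive average, the most likely explanation is a bug such as a misaligned time shift.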
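The cost figures quoted in Section III.H can be tallied directly. The sketch below, in base R, is only an arithmetic illustration: the variable and function names are hypothetical, and charging two trades per week (one to enter, one to exit) is one reading of the 104 trades per year mentioned in the text.

```r
# Figures taken from the text.
cost_per_trade <- 2e-4    # 2 bps
trades_per_year <- 104
years <- 8                # 2004-2011

# Total cost paid over the whole period, as a fraction of traded capital.
total_cost <- cost_per_trade * trades_per_year * years

# Net weekly returns, assuming two trades per week (hypothetical reading).
net_returns <- function(gross, cost = cost_per_trade, trades_per_week = 2) {
  gross - trades_per_week * cost
}

# A year of flat gross performance turns negative once costs are charged.
flat_year_net <- sum(net_returns(rep(0, 52)))
```

With these inputs the cumulated cost is 832 trades times 2bps, and a year of flat gross performance ends up about 2% in the red.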
Figure 3. Cumulative performance associated with keyword debt for $k = 3$ with and without transaction costs, set to 2bps.

Figure 4. Left plot: cumulated performance associated with each of the four keyword sets (cars, illness, games, Fin. Times) from 2005-12-23 to 2013-06-14. Right plot: cumulated performance of suitable keywords applied on suitable assets. Transaction costs set at 2bps per trade.

V. DISCUSSION

Sophisticated methods coupled with careful backtests are needed to show that Google Trends contains enough exploitable information. This is because such data include too many searches that are probably unrelated to the financial assets for a given keyword, and even more that are unrelated to actual trading. When one restricts the searches by providing more keywords, GT data often only contain information at a monthly time scale, or no information at all.

If one goes back to the algorithm proposed by [24] and the compatible findings of [9, 11], it is hard to understand why future prices should systematically revert one week after a positive SVI surprise, and vice versa. The reversal is weak and only valid on average. It may be the most frequent outcome, but profitability is much higher if one knows what triggers reversal or trend following.

There is some evidence that supplementing GT data with news leads to much improved trading performance (see e.g. [4]). Another paper by the group of [24] suggests a much more promising source of information: it links the changes in the number of visits on Wikipedia pages of given companies to future index returns [22]. Further work will investigate the predictive power of this type of data.
We acknowledge stimulating discussions with Frédéric Abergel, Marouanne Anane and Thierry Bochud.

[1] When requesting data restricted to a given quarter, GT returns daily data.
[2] Fabrice Bellard. Qemu, a fast and portable dynamic translator. USENIX, 2005.
[3] Stephen J Brown, William Goetzmann, Roger G Ibbotson, and Stephen A Ross. Survivorship bias in performance studies. Review of Financial Studies, 5(4):553-580, 1992.
[4] Rochester Cahan. Quant 3.0 - harnessing the mood of the web, 2012. URL DBQQWAFAFEW201212.pdf. [Online; accessed 16-July-2013].
[5] Jennifer L Castle, Nicholas WP Fawcett, and David F Hendry. Nowcasting is not just contemporaneous forecasting. National Institute Economic Review, 210(1):71-89, 2009.
[6] D. Challet. 2013. Encelade Capital Internal Report. Final version available on request in September.
[7] Hyunyoung Choi and Hal Varian. Predicting the present with Google Trends. Economic Record, 88(s1):2-9, 2012.
[8] Zhi Da, Joseph Engelberg, and Pengjie Gao. In search of earnings predictability. Technical report, Working Paper, 2010.
[9] Zhi Da, Joseph Engelberg, and Pengjie Gao. In search of attention. The Journal of Finance, 66(5):1461-1499, 2011.
[10] Michel M. Dacorogna, Ramazan Gencay, Ulrich A. Müller, Richard B. Olsen, and Olivier V. Pictet. An Introduction to High-Frequency Finance. Academic Press, London, 2001.
[11] Michal Dzielinski. Measuring economic uncertainty and its impact on the stock market. Finance Research Letters, 9(3):167-175, 2012.
[12] Edwin J Elton, Martin Jay Gruber, and Christopher R Blake. Survivor bias and mutual fund performance. Review of Financial Studies, 9(4):1097-1120, 1996.
[13] Jon Faust, John H Rogers, and Jonathan H Wright. News and noise in G-7 GDP announcements. Journal of Money, Credit and Banking, pages 403-419, 2005.
[14] John D Freeman. Behind the smoke and mirrors: Gauging the integrity of investment simulations. Financial Analysts Journal, pages 26-31, 1992.
[15] Jeremy Ginsberg, Matthew H Mohebbi, Rajan S Patel, Lynnette Brammer, Mark S Smolinski, and Larry Brilliant. Detecting influenza epidemics using search engine query data. Nature, 457(7232):1012-1014, 2008.
[16] Stephen J Hardiman, Nicolas Bercot, and Jean-Philippe Bouchaud. Critical reflexivity in financial markets: a Hawkes process analysis. arXiv preprint arXiv:1302.1405, 2013.
[17] N. F. Johnson. Private communication.
[18] Jonathan G Koomey, Stephen Berard, Marla Sanchez, and Henry Wong. Implications of historical trends in the electrical efficiency of computing. Annals of the History of Computing, IEEE, 33(3):46-54, 2011.
[19] Mark H Kryder and Chang Soo Kim. After hard drives - what comes next? Magnetics, IEEE Transactions on, 45(10):3406-3413, 2009.
[20] David J Leinweber. Stupid data miner tricks: overfitting the S&P 500. The Journal of Investing, 16(1):15-22, 2007.
[21] M. Marsili. Dissecting financial markets: sectors and states. Quant. Fin., 2, 2002.
[22] Helen Susannah Moat, Chester Curme, Adam Avakian, Dror Y Kenett, H Eugene Stanley, and Tobias Preis. Quantifying Wikipedia usage patterns before stock market moves. Scientific Reports, 3, 2013.
[23] Gordon E Moore et al. Cramming more components onto integrated circuits. Electronics, (39):14, 1965.
[24] Tobias Preis, Helen Susannah Moat, and H Eugene Stanley. Quantifying trading behavior in financial markets using Google Trends. Scientific Reports, 3, 2013.
[25] Ryan Sullivan, Allan Timmermann, and Halbert White. Data-snooping, technical trading rule performance, and the bootstrap. The Journal of Finance, 54(5):1647-1691, 1999.
[26] Halbert White. A reality check for data snooping. Econometrica, 68(5):1097-1126, 2000.
[27] Wikipedia. Google trends - Wikipedia, The Free Encyclopedia, 2013. URL Trends. [Online; accessed 4-June-2013].

Appendix A: Keywords

We have downloaded GT data for the following keywords, without any manual editing.

1. Illnesses

Source:, accessed on 27 May 2013

AIDS, Acne, Acute bronchitis, Allergy, Alopecia, Altitude sickness, Alzheimer's disease, Andropause, Anorexia nervosa, Antisocial personality disorder, Arthritis, Asperger syndrome, Asthma, Attention deficit hyperactivity disorder, Autism,

Avoidant personality disorder, Back pain, Bad Breath, Bedwetting, Benign prostatic hyperplasia, Bipolar disorder, Bladder cancer, Bleeding, Body dysmorphic disorder, Bone cancer, Borderline personality disorder, Bovine spongiform encephalopathy, Brain Cancer, Brain tumor, Breast cancer, Burns, Bursitis, Cancer, Canker Sores, Carpal tunnel syndrome, Cervical cancer, Cholesterol, Chronic Childhood Arthritis, Chronic Obstructive Pulmonary Disease, Coeliac disease, Colorectal cancer, Conjunctivitis, Cradle cap, Crohn's disease, Dandruff, Deep vein thrombosis, Dehydration, Dependent personality disorder, Depression, Diabetes mellitus, Diabetes mellitus type 1, Diaper rash, Diarrhea, Disabilities, Dissociative identity disorder, Diverticulitis, Down syndrome, Drug abuse, Dysfunctional uterine bleeding, Dyslexia, Ear Infections, Ear Problems, Eating Disorders, Eczema, Edwards syndrome, Endometriosis, Epilepsy, Erectile dysfunction, Eye Problems, Fibromyalgia, Flu, Fracture, Freckle, Gallbladder Diseases, Gallstone, Gastroesophageal reflux disease, Generalized Anxiety Disorder, Genital wart, Glomerulonephritis, Gonorrhoea, Gout, Gum Diseases, Gynecomastia, HIV, Head Lice, Headache, Hearing impairment, Heart Disease, Heart failure, Heartburn, Heat Stroke, Heel Pain, Hemorrhoid, Hepatitis, Herniated Discs, Herpes simplex, Hiatus hernia, Histrionic personality disorder, Hyperglycemia, Hyperkalemia, Hypertension, Hyperthyroidism, Hypothyroidism, Infectious Diseases, Infectious mononucleosis, Infertility, Influenza, Iron deficiency anemia, Irritable Male Syndrome, Irritable bowel syndrome, Itching, Joint Pain, Juvenile Diabetes, Kidney Disease, Kidney stone, Leukemia, Liver tumour, Lung cancer, Malaria, Melena, Memory Loss, Menopause, Mesothelioma, Migraine, Miscarriage, Mucus In Stool, Multiple sclerosis, Muscle Cramps, Muscle Fatigue, Muscle Pain, Myocardial infarction, Nail Biting, Narcissistic personality disorder, Neck Pain, Obesity, Obsessive-compulsive disorder, Osteoarthritis, Osteomyelitis, Osteoporosis, Ovarian cancer, Pain, Panic attack, Paranoid personality disorder, Parkinson's disease, Penis Enlargement, Peptic ulcer, Peripheral artery occlusive disease, Personality disorder, Pervasive developmental disorder, Peyronie's disease, Phobia, Pneumonia, Poliomyelitis, Polycystic ovary syndrome, Post-nasal drip, Post-traumatic stress disorder, Premature birth, Premenstrual syndrome, Propecia, Prostate cancer, Psoriasis, Reactive attachment disorder, Renal failure, Restless legs syndrome, Rheumatic fever, Rheumatoid arthritis, Rosacea, Rotator Cuff, Scabies, Scars, Schizoid personality disorder, Schizophrenia, Sciatica, Severe acute respiratory syndrome, Sexually transmitted disease, Sinusitis, Skin Eruptions, Skin cancer, Sleep disorder, Smallpox, Snoring, Social anxiety disorder, Staph infection, Stomach cancer, Strep throat, Sudden infant death syndrome, Sunburn, Syphilis, Systemic lupus erythematosus, Tennis elbow, Termination Of Pregnancy, Testicular cancer, Tinea, Tooth Decay, Traumatic brain injury, Tuberculosis, Ulcers, Urinary tract infection, Urticaria, Varicose veins.

2. Classic cars

Source: est-1960_s-cars, accessed on 27 May 2013

1960 Aston Martin DB4 Zagato, 1960 Ford, 1961 Ferrari 250 SWB, 1961 Ferrari 250 GT California, 1963 Corvette, 1963 Iso Griffo A3L, 1964 Ferrari 250 GTL (Lusso), 1965 Bizzarrini 5300 Strada, 1965 Ford GT40, 1965 Maserati Mistral, 1965 Shelby Cobra, 1966 Ferrari 365P, 1966 Maserati Ghibli, 1967 Alfa Romeo Stradale, 1967 Ferrari 275 GTB/4, 1967 Shelby Mustang KR500, 1968 Chevrolet Corvette L88, 1968 DeTomaso Mangusta, 1969 Pontiac Trans Am, 1969 Yenko Chevelle, 57 Chevy, 68 Ferrari 365 GTB/4 Daytona Spyder, 69 Yenko Camaro Z28, AC Cobra, Alfa Romeo Spider, Aston Martin DB5, Austin Mini Saloon 1959, BMW E9, Buick Riviera, Buick Wildcat, Cane, Chevrolet Camaro, Chevrolet Chevelle, Chevrolet Impala, Chevy Chevelle, Chrysler Valiant, Corvette Stingray, Dodge Challenger, Dodge Charger, Dodge Dart Swinger, Facel Vega Facel II, Ferrari 250, Ferrari 250 GTO, Ferrari 250 GTO, Ferrari 275, Ferrari Daytona, Fiat 500, Ford Corsair, Ford Cortina, Ford GT40, Ford Mustang, Ford Ranchero, Ford Thunderbird, Ford Torino, Ford Zephyr MKIII, Iso Grifo, Jaguar E-type, Jeep CJ, Lamborghini Miura, Lamborghini Miura SV, Lincoln Continental, Lotus Elan, Maserati Ghibli, Mercedes Benz 220SE, Mercedes-Benz 300SL, Mercury Cougar, Plymouth Barracuda, Pontiac GTO, Porsche 356, Porsche 911, Porsche 911, Porsche 911 classic, Rambler Classic, Rover 2000, Shelby Daytona Coupe, Shelby GT350, Shelby GT500, Studebaker Avanti, Sunbeam Tiger, Toyota 2000GT, Triumph 2000, Vauxhall Velox 1960, Vauxhall Victor 1963, Wolseley 15/60

3. Arcade Games

Source:, accessed on 27 May 2013

1942, 1943, 720, After Burner, Airwolf, Altered Beast, Arkanoid, Asteroids, Bad Dudes Vs. DragonNinja, Bagman, Battlezone, Beamrider, Berzerk, Bionic Commando, Bomb Jack, Breakout, Bubble Bobble, Bubbles, BurgerTime, Centipede, Circus Charlie, Commando, Crystal Castles, Cyberball, Dangar - Ufo Robo, Defender, Dig Dug, Donkey Kong, Donkey Kong 3, Donkey Kong Junior, Double Dragon, Dragon's Lair, E.T. (Atari 2600), Elevator Action, Final Fight, Flashback, Food Fight, Frogger, Front Line, Galaga, Galaxian, Gauntlet, Geometry Wars, Gorf, Gorf, Gyruss, Hogan's Alley, Ikari Warriors, Joust, Kangaroo, Karate Champ,

Kid Icarus, Lode Runner, Lunar Lander, Manic Miner, Mappy, Marble Madness, Mario Bros., Millipede, Miner 2049er, Missile Command, Moon Buggy, Moon Patrol, Ms. Pac-Man, Naughty Boy, Pac-Man, Paperboy, Pengo, Pitfall!, Pole Position, Pong, Popeye, Punch-Out!!, Q*bert, Rampage, Red Baron, Robotron: 2084, Rygar: The Legendary Adventure, Sewer Sam, Snow Bros, Space Invaders, Spy Hunter, Star Wars, Stargate, Street Fighter, Super Pac-Man, Tempest, Tetris, The Adventures of Robby Roto!, The Simpsons, Time Pilot, ToeJam & Earl, Toki, Track & Field, Tron, Wizard Of Wor, Xevious

Appendix B: Source code

Here is a simple implementation in R of the strategy given in [24]. We do mean = instead of <-.

    library(zoo)  # provides rollmeanr() and lag() for zoo series

    # loadGTdata(), loadYahooData() and getFutureReturns() are helper
    # functions that are not listed here.
    computePerfStats = function(filename, k = 10, getPerf = FALSE) {
      gtdata = loadGTdata(filename)
      if (is.null(gtdata) || length(gtdata) < 100) {
        return(NULL)
      }
      spy = loadYahooData('SPY')
      spy_rets = getFutureReturns(spy)             # spy_rets is a zoo object, contains r_{t+1}
      gtdata_mean = rollmeanr(gtdata, k)           # \bar v_t
      gtdata_mean_lagged = lag(gtdata_mean, -1)    # \bar v_{t-1}
      pos = 2 * (gtdata > gtdata_mean_lagged) - 1  # +1 if v_t > \bar v_{t-1}, else -1
      perf = -pos * spy_rets                       # position is minus the sign of the surprise
      perf = perf[which(!is.na(perf))]             # drop weeks without a defined position
      if (getPerf) {
        return(perf)
      } else {
        return(t.test(perf)$statistic)
      }
    }