How I Develop Trading Strategies | Permutation Tests and Trading Strategy Development with Python

neurotrader
This is how I develop trading strategies. Code: https://github.com/neurotrader888/mcpt Strategy D...
Video Transcript:
In this video, I'll share the thought process and the four steps I always use when developing a new trading strategy. The approach is generic and compatible with almost every strategy. I've shown bits of this process in past videos, but I wanted to make a standalone, dedicated video I can reference back to in the future. The four steps are: in-sample excellence, the in-sample Monte Carlo permutation test, the walk-forward test, and the walk-forward Monte Carlo permutation test. I'll show an example going through all four steps and generalize the concepts so you can try them with your own strategies.

First, I'll show how I assess a trading strategy, using a moving average crossover as an example. We load in candlestick data and compute a fast and a slow moving average. At each bar we check whether the fast moving average is above the slow moving average, and we create a signal that denotes the position of the strategy at each bar: one means we have a long position following that bar, and zero means we have no position. Regardless of what the strategy is, one should be able to create a similar signal: a value denoting the position of the strategy after each bar. Now, if we compute close-to-close returns and shift them forward by one bar, we can multiply the position signal by the shifted returns to get a return for each bar that is attributable to the strategy.

These strategy returns, at the same granularity as the bars, are what I use to compute objective functions such as the profit factor or the Sharpe ratio. By having a return for each bar instead of each trade, objective functions are fed much more data, and the calculations and results are much more stable. The book Testing and Tuning Market Trading Systems provides convincing arguments for why these higher-granularity returns are superior; it's where I got the idea, and I'm thoroughly convinced, both by what the book says and by my own experience.

I'll scrap the moving average crossover because it's lame, and instead use the slightly more interesting Donchian channel breakout strategy I showed in my previous video. Briefly, the strategy goes long when the current close is the highest over a given lookback, and it goes short when the current close is the lowest over a given lookback. The idea is to trade in the direction of range breakouts so we can ride the trend when extended trends occur. The code for the Donchian breakout is short, and the output is a signal vector: it has a value on each bar denoting the strategy's position, one or negative one.

The strategy needs a lookback, so we can test several and pick the best. Here's a grid search to do that: we loop through a wide variety of lookback values and keep the one with the best profit factor. I ran the optimization on hourly Bitcoin data from 2016 through 2019. Over those four years, the best lookback was 19, with a profit factor of 1.08.
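The actual implementations live in the repo linked above. As a rough, minimal sketch of the pieces described so far (the 'close' column name, the exact channel construction, and the 12-168 search range are my assumptions, not necessarily what the repo uses):

```python
import numpy as np
import pandas as pd

def donchian_breakout(ohlc: pd.DataFrame, lookback: int) -> pd.Series:
    # Long (1) after a close above the rolling high, short (-1) after a
    # close below the rolling low; hold the position in between.
    upper = ohlc['close'].rolling(lookback - 1).max().shift(1)
    lower = ohlc['close'].rolling(lookback - 1).min().shift(1)
    signal = pd.Series(np.nan, index=ohlc.index)
    signal[ohlc['close'] > upper] = 1.0
    signal[ohlc['close'] < lower] = -1.0
    return signal.ffill()

def strategy_returns(ohlc: pd.DataFrame, signal: pd.Series) -> pd.Series:
    # Next bar's close-to-close log return, attributed to the current position.
    next_log_ret = np.log(ohlc['close']).diff().shift(-1)
    return signal * next_log_ret

def profit_factor(returns: pd.Series) -> float:
    # Gross gains divided by gross losses over all bar-level returns.
    losses = returns[returns < 0].abs().sum()
    return returns[returns > 0].sum() / losses if losses > 0 else np.inf

def optimize_donchian(ohlc: pd.DataFrame):
    # Grid search over lookbacks, keeping the one with the best profit factor.
    best_pf, best_lookback = 0.0, -1
    for lookback in range(12, 169):
        rets = strategy_returns(ohlc, donchian_breakout(ohlc, lookback)).dropna()
        pf = profit_factor(rets)
        if pf > best_pf:
            best_pf, best_lookback = pf, lookback
    return best_pf, best_lookback
```

The crude in-sample backtest mentioned next is then just the cumulative sum of the bar-level log returns, e.g. `strategy_returns(ohlc, donchian_breakout(ohlc, 19)).dropna().cumsum().plot()`.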
I first like to look at the in-sample performance. Since these are log returns, the cumulative sum of the strategy returns gives us a crude backtest. Cool, so what are we actually doing here? Well, we're optimizing a mediocre trend follower. But more generally, we have an idea for a trading strategy, a set of development or in-sample data, and a way to fit, optimize, or select the best version of the strategy for the data. These are the generic components of essentially every trading strategy. Whether we're optimizing a lookback for a trend follower, selecting the best chart patterns, or training a fancy neural network, this is how trading strategies are optimized.

So with that in mind, let's look at our in-sample performance again. Currently we're in the development stage. Here I ask myself two questions: is this excellent, and is it obviously overfit? These are in-sample results, so they should be pretty damn good. Maybe you could come up with some threshold for an objective function to decide if it is excellent, but really I think it depends on the nature of the strategy. Mainly I look at consistency: if the strategy has periods of inconsistency, I drill down and really look at what is happening when the strategy is working poorly versus working well. Maybe there are ways one could improve this strategy; maybe some YouTuber made a video about improving the very strategy you're looking at. But this is the development stage. This is the time to study your strategy, test potential improvements, and refine your optimization process. If this were a strategy I was working on, I'd study it and try to improve it further. I would not say this is excellent; I expect a bit more from in-sample performance than this. But to keep the video going, I'll say it looks good enough.

Now the other question: is this obviously overfit? This one is a bit harder to answer; I suppose one gets a feel for it eventually. But if your results are suspiciously good, like a 100% win rate, you're probably overfitting, or maybe you just accidentally allowed a future leak, which obviously needs to be fixed. If you suspect overfitting, you may want to dial back the complexity of the strategy. Ultimately, the answer to this question in the development stage should be either "yes" or "not obviously". I'll show an obviously overfit strategy later.

Once we're satisfied with our in-sample results, the question becomes: was this excellent in-sample performance found due to patterns intrinsic in the data, or was it found just because our optimization process was powerful enough to find something in noise? In other words, is data mining bias the main contributor to our excellent in-sample performance? The problem with optimization is that it works: if we compare multiple configurations of a strategy, one will be the best, but it will always carry a data mining or selection bias. Of course, if we optimize even a great strategy, there will be some data mining bias, but a good strategy's in-sample performance will mostly come from patterns in the data. If our strategy is trash, its in-sample performance will be entirely due to data mining bias. Our null hypothesis is that our strategy is garbage, and we will use the in-sample Monte Carlo permutation test to disprove it.

So how does this test work? We optimized the Donchian breakout on these four years of price data, and our optimized strategy has a profit factor of 1.08.
But if we create a random permutation of this data, we get something like this. In the random permutation, any legitimate patterns that existed in the real data are no longer present; the permutation is just noise with nearly identical statistical properties. If we optimize the Donchian breakout on this permuted data, we get a profit factor of 1.02.
The optimized strategy did better on real data than it did on permuted data. This gives a small amount of evidence that our null hypothesis is false, because if the optimized strategy did just as well or better on random data, we could presume the main contributor to the good in-sample performance was data mining bias. But this was only one permutation. If we created many more, and on each permutation we optimized the strategy, we could get an idea of how powerful the data mining bias induced by our optimization is. If the optimized strategy's profit factor of 1.08 that we found on real data is better than what we found on the vast majority of permutations, we can disprove our null hypothesis. I don't think there's any value in looking at the equity curves of the permutations, since we can simply compute the objective function on each of them, but I think it helps visualize what we're actually doing, and it looks cool.

Now I'll show the algorithm I'm using to generate these permutations of price. This function has a few parameters. The first is the data: it takes either a DataFrame of open, high, low, close prices or a list of such DataFrames. The list option is for permuting multiple markets; I'll talk about it later. The start index is where the permutation starts: when set to the default of zero, the function permutes all the data given. We'll use this parameter later when we talk about the walk-forward permutation test.

Since the data is either a list of DataFrames or a single DataFrame, we handle that first: if it's a single DataFrame, I put it into a list and set the number of markets to one. If we have multiple markets, we ensure their indexes are identical. Then we allocate space to store the relative prices for each market and bar, and the first bar of each market. The first bar will be unchanged; it has a size of four to hold the open, high, low, and close prices.

Now we compute prices on each bar relative to that bar's open. We loop through each of the markets, get the logarithmic prices, and copy the first bar at the start index. Here, the open is subtracted from the high, low, and close prices; since we're dealing with log prices, we're essentially recording how far off the open each of these prices is, in percentage terms. The relative open is the current open minus the prior close: the gap. Then we copy these values into the arrays we made earlier.

Now we get the indices of the real data; we'll use these to shuffle the relative prices. We shuffle the indices once for the intrabar quantities and again for the gaps. The gaps have little effect on crypto data, because the crypto market never actually closes, so the open of one bar will usually be at most a few ticks away from the prior bar's close; but for daily stock data, the open can be quite far from the prior close.

After shuffling, we can string together our permutation. We loop through each market, allocate space to store the permuted bars, get the log prices of the real data, and copy them into the permuted data before the start index (if the start index is zero, nothing happens here). Then we copy the start bar, the first bar of the permutation, and loop from the start bar to the end of the data. We first set the open price (the zero is the index of the open, and the three is the index of the close): to get the permuted bar's open, we add the relative open value to the prior bar's close. Then we add the relative high, low, and close to that open value to get the rest of the permuted bar's prices. After the loop, we exponentiate the prices to get them back to the normal scale and add the bars to a DataFrame. The function returns either a single DataFrame or a list of DataFrames, the same as what was passed in.
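Here is a minimal single-market sketch of that procedure, assuming columns named 'open', 'high', 'low', 'close'; the repo's version also accepts a list of DataFrames for the multi-market case described below:

```python
def get_permutation(ohlc: pd.DataFrame, start_index: int = 0,
                    seed=None) -> pd.DataFrame:
    # Single-market sketch of the bar-permutation idea described above.
    rng = np.random.default_rng(seed)
    log_bars = np.log(ohlc[['open', 'high', 'low', 'close']].to_numpy())
    n = len(ohlc)

    # Intrabar quantities: high/low/close relative to that bar's open.
    rel_high = log_bars[:, 1] - log_bars[:, 0]
    rel_low = log_bars[:, 2] - log_bars[:, 0]
    rel_close = log_bars[:, 3] - log_bars[:, 0]
    # Gaps: each bar's open relative to the prior bar's close.
    rel_open = log_bars[1:, 0] - log_bars[:-1, 3]

    # Shuffle only the region after start_index, once for the intrabar
    # quantities and independently again for the gaps.
    idx = np.arange(start_index + 1, n)
    intrabar_perm = rng.permutation(idx)
    gap_perm = rng.permutation(idx)

    out = log_bars.copy()   # bars up to start_index stay unchanged
    for i, (j, k) in enumerate(zip(intrabar_perm, gap_perm),
                               start=start_index + 1):
        out[i, 0] = out[i - 1, 3] + rel_open[k - 1]  # open = prior close + gap
        out[i, 1] = out[i, 0] + rel_high[j]          # high
        out[i, 2] = out[i, 0] + rel_low[j]           # low
        out[i, 3] = out[i, 0] + rel_close[j]         # close

    # Exponentiate back to the normal price scale.
    return pd.DataFrame(np.exp(out), index=ohlc.index,
                        columns=['open', 'high', 'low', 'close'])
```

Because the two shuffles only rearrange the same sets of log increments, the sums are unchanged, which is why the first open and last close survive intact. A quick sanity check of the moment-matching claim below: compare `np.log(df['close']).diff()` of the real and permuted frames with `.mean()`, `.std()`, `.skew()`, and `.kurt()`.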
Now we can pass in a DataFrame of real data and get back a DataFrame of permuted data. The first open and the last close are exactly the same in both the real and the permuted data, so the overall trend of the data is preserved, but the path the price takes between those two points is completely different. The goal of a permutation algorithm is to create bars that have the same statistical properties as the original. If we compute some close-to-close returns, we can see that the mean, standard deviation, skew, and kurtosis are all nearly identical.

Now let's load in Ethereum data for the same time period. Here's a plot of Bitcoin and Ethereum in 2018 and 2019; we can see that they're obviously correlated, and if we permute them together, the correlation between the two markets stays the same. I won't cover the multi-market case beyond this, but if your strategy involves two or more markets, the permutation tests can still be applied.

While the algorithm produces permutations with many statistical properties similar to the original's, it is not without flaws, because price is not a random walk. Real prices have volatility clustering and long memory, both of which could be a topic for another time, and the permutation algorithm destroys both of these properties. If your strategy is heavily focused on one of these properties, or some other property the permutation algorithm doesn't preserve, the Monte Carlo permutation tests can be optimistically biased. But this really isn't a horrid problem: if your strategy cannot pass the permutation tests even with a potential optimistic bias, then you know it is probably overfit.

Now that we've gone over the bar permutation algorithm, we can return to where we were. We optimized the Donchian breakout on hourly Bitcoin data from 2016 through 2019, and the best lookback gave a profit factor of 1.08. When we optimized the strategy on many different price permutations, we found that the results were worse than what we got on real prices. We've essentially already done the in-sample permutation test, but now I'll show you the code and how to apply it.

First we load our data and take the four years we're using to train. We call the optimize Donchian function, and this gives us our real profit factor. Now we can do the permutation test: we set the number of permutations, then loop that many times; on each iteration we get a permutation of the bars and optimize our strategy on it to get a permuted profit factor. If the permuted profit factor is just as good as or better than the real profit factor, we increment the "permutation was better" count. After the loop, we can calculate a quasi-p-value: the number of times the permutation was better, divided by the total number of permutations. This value is roughly the probability that our real profit factor was found mainly due to data mining bias. The next part isn't strictly necessary, but I like to plot a histogram of the profit factors (or whatever objective function was used) from the permutations, then add a line showing where in the distribution the real profit factor fell.
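Putting it together, a minimal sketch of the in-sample test, reusing the optimize_donchian and get_permutation sketches from above (the >= comparison and the bin count are my choices):

```python
import matplotlib.pyplot as plt

def insample_mcpt(ohlc: pd.DataFrame, n_permutations: int = 1000) -> float:
    # Optimize on the real data once, then re-run the same optimization on
    # each permutation and count how often the permutation does as well.
    real_pf, _ = optimize_donchian(ohlc)

    perm_pfs, perm_better = [], 0
    for _ in range(n_permutations):
        perm_pf, _ = optimize_donchian(get_permutation(ohlc))
        perm_pfs.append(perm_pf)
        if perm_pf >= real_pf:
            perm_better += 1

    p_value = perm_better / n_permutations   # quasi-p-value
    plt.hist(perm_pfs, bins=50)              # should look roughly bell-shaped
    plt.axvline(real_pf, color='red', label=f'real PF, p = {p_value:.4f}')
    plt.legend()
    plt.show()
    return p_value
```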
I ran this test with 1,000 permutations and got this: only a couple of permutations did better than the original, so the p-value is very low, 0.3%. If a sufficient number of permutations are done, the permutation distribution should be roughly bell-shaped; if the distribution looks really weird, there's probably an issue with your code. I like to see a p-value below 1%, so I would call this a pass.

Now I'll quickly show you a strategy that is overfit. This function fits a decision tree. We compute three indicators, just basic price differences, then create a classification target: whether the next 24 hours go up or down. Then we create a decision tree, train it with our indicators and target, and return the model. Notice that I've set the minimum samples per leaf very low; this is one of the key regularization parameters, and set like this, it's pretty much guaranteed to overfit. To test the model we use this function: we compute the same indicators and feed them to the model's predict method, then use the model's predictions to create a position vector. We go long when the tree predicts the price will go up and short when the tree predicts the price will go down. Finally, we compute the profit factor of that signal.

Here are the in-sample results for our decision tree. Again, this is where I ask myself: is this obviously overfit? The answer is yes. Generally speaking, if your backtest ever looks like this, you have a future leak or you're horribly overfit. But if we didn't know any better, we could use the in-sample permutation test to crush our dreams, and the test does the job: the model performs just as well or better on the permutations. When you see this, it's time to throw your strategy idea in the trash.

Ideally, you should run the test with as many permutations as possible; I think 1,000 is a reasonable minimum. This of course means we have to optimize our strategy 1,000 times, which will probably take some time. If optimizing your strategy 1,000 times is simply not feasible, you probably have a very complex strategy or a very poorly coded one, in which case I suppose 100 would be sufficient, but I would call that a hard minimum. This test provides a quasi-p-value that roughly indicates the probability that your in-sample results were primarily found through data mining bias. I generally don't continue if it is over 1%, but don't treat that like a target: this is a measure, and if a measure becomes a target, it is no longer a good measure. Basically, if you fiddle with your strategy enough, you could probably make this test pass on anything, so don't overuse it.

We've now seen the in-sample permutation test pass the Donchian channel breakout and reject the decision tree nonsense.
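For reference, here's a minimal sketch of an overfit tree strategy in the spirit of the one described above; the exact indicator horizons and the min_samples_leaf value are my assumptions:

```python
from sklearn.tree import DecisionTreeClassifier

HORIZONS = (6, 12, 24)  # hypothetical indicator horizons

def make_features(ohlc: pd.DataFrame) -> pd.DataFrame:
    # Three toy indicators: log price differences over a few horizons.
    log_close = np.log(ohlc['close'])
    x = pd.concat([log_close.diff(h) for h in HORIZONS], axis=1)
    x.columns = [f'diff_{h}' for h in HORIZONS]
    return x

def train_tree(ohlc: pd.DataFrame) -> DecisionTreeClassifier:
    log_close = np.log(ohlc['close'])
    x = make_features(ohlc)
    y = (log_close.shift(-24) > log_close).astype(int)  # up over next 24 bars?
    mask = x.notna().all(axis=1) & log_close.shift(-24).notna()
    # A tiny min_samples_leaf removes almost all regularization, letting
    # the tree memorize the training data, i.e. overfit.
    model = DecisionTreeClassifier(min_samples_leaf=2, random_state=0)
    model.fit(x[mask], y[mask])
    return model

def tree_signal(ohlc: pd.DataFrame, model: DecisionTreeClassifier) -> pd.Series:
    # Long when the tree predicts up, short when it predicts down.
    x = make_features(ohlc).fillna(0.0)
    pred = model.predict(x)
    return pd.Series(np.where(pred == 1, 1.0, -1.0), index=ohlc.index)
```

Scoring it in-sample is then the same pipeline as before: `profit_factor(strategy_returns(ohlc, tree_signal(ohlc, train_tree(ohlc))).dropna())`.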
But why even do this? Couldn't we just try the strategy on 2020 data? If it worked on data that wasn't used for the optimization, then the strategy is probably not overfit. Well, sure, we could, but once out-of-sample data is used even once, it is no longer truly out of sample. Suppose we optimize strategy A on 2016 through 2019, then test it on 2020 data, and we find our out-of-sample results to be decent. But then we come up with another idea, strategy B. We optimize strategy B just the same on 2016 through 2019, then we also test it on 2020, and we find that strategy B did better than strategy A. One might think the new idea was better, but now there is a selection bias: strategy B did better compared to strategy A, so the results of B on 2020 data are inflated by selection bias. Realistically, I've already tested many things on 2020 data, and it definitely isn't out of sample for me; rather, it is a validation set.

It is a good idea to walk-forward an optimized strategy, testing it on data it did not use to optimize; that is how the strategy will have to trade in reality, after all. And if we test a strategy on data it did not use to optimize, the results will not benefit from any data mining bias. However, if we walk-forward test 100 different strategies on 2020 data and select the best one, there will be a massive selection bias. Selection bias can allow us to effectively overfit the validation data despite it never being used for strategy optimization, and every time we reuse out-of-sample data, or rather validation data, the selection bias adds up. This is why we use the in-sample permutation test: we can detect that an idea is bad before we waste the out-of-sample data or stack even more selection bias onto the validation data.

Anyway, with all that in mind: our optimization of the Donchian breakout's lookback passed the in-sample permutation test, so now let's walk-forward the Donchian breakout. This function returns the walk-forward signal. One of its parameters is the train lookback: how much data to optimize on. I have it set to optimize on the last four years by default, assuming hourly data is used. The train step is how often we re-optimize; I set it to 30 days. Ideally you should re-optimize strategies as often as is feasible, but to make the code run fast enough to accommodate my brain rot, I used 30 days. We set the index of the next optimization and loop through all the data; every time the index is equal to the next-train variable, we re-optimize, compute the new signal, and increment the optimization index by the train step. This is pretty inefficient code, but it's simple and it works.
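A minimal sketch of such a walk-forward loop, reusing the earlier helpers. Computing the full-history Donchian signal at each re-fit and reading it bar by bar is my simplification; it introduces no lookahead, because the signal at bar i depends only on closes up to bar i:

```python
def walkforward_donchian(ohlc: pd.DataFrame,
                         train_lookback: int = 24 * 365 * 4,
                         train_step: int = 24 * 30) -> pd.Series:
    # Hourly bars assumed: optimize on the trailing 4 years,
    # re-optimizing every 30 days.
    signal = pd.Series(np.nan, index=ohlc.index)
    next_train = train_lookback
    current = None   # Donchian signal built with the latest optimized lookback
    for i in range(train_lookback, len(ohlc)):
        if i == next_train:
            _, best_lb = optimize_donchian(ohlc.iloc[i - train_lookback:i])
            current = donchian_breakout(ohlc, best_lb)
            next_train += train_step
        signal.iloc[i] = current.iloc[i]
    return signal
```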
Here are the results of the walk-forward signal on 2020 data. It had a profit factor of 1.04, which is worse than what we saw in sample. Generally that is to be expected, as these results do not benefit from any data mining bias; the only bias in play here is the potential selection bias, if we had already walk-forward tested other strategies using this data. At this stage I ask myself: is this worth trading? The answer is subjective; it depends on your standards, and maybe you have higher or lower standards than me. For me, I wouldn't bother with this, it kind of sucks; but the line did go up, and to keep the video going, I'll say this is good enough.

To get these results, we optimized the Donchian breakout on these four years of data; those four years are the first training fold of the walk-forward. After the first training fold, the walk-forward function can output a signal that we can test. Since our walk-forward results were satisfactory, we're assuming that whatever patterns the strategy learned or optimized on from past data were also present in this future, unseen data. But what if our optimized strategy is actually worthless? What is the chance that a worthless strategy could have achieved walk-forward results just as good as what we found? If we generate a permutation of the data after the first training fold, any legitimate patterns in that data will no longer be present; there are no legitimate patterns in the permutation. If we walk-forward the same strategy on this permutation and compute its profit factor, we get an estimate of a profit factor that a worthless strategy could produce; and if we generate many permutations, we get a distribution of what worthless strategies can produce. If our real walk-forward results are to be attributed to patterns learned from past data recurring in this future data, then our real walk-forward profit factor should be better than the vast majority of the profit factors produced by worthless strategies. This is the walk-forward permutation test.

You'll notice the code is very similar to the in-sample permutation test. We load in our data and set the train window to four years, then we compute the walk-forward signal. With the signal, we can compute our real walk-forward profit factor. Then we set the number of permutations and loop through them. We call the same get-permutation function, but we set the start index to the train window so that only the data after the first training fold is permuted. We compute the profit factor of the walk-forward signal in the same way, then compare the profit factor found on the permutation to the profit factor we found on real data. We compute our quasi-p-value and make a histogram of the permutation profit factors. I ran the test with 200 permutations and got this: the p-value is 22%, roughly meaning there's a 22% chance that a worthless strategy could have matched our walk-forward profit factor of 1.04.
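And a minimal sketch of the walk-forward test itself, again reusing the sketches above; the only real differences from the in-sample test are start_index=train_window and that the strategy is walked forward instead of optimized in-sample:

```python
def walkforward_mcpt(ohlc: pd.DataFrame,
                     train_window: int = 24 * 365 * 4,
                     n_permutations: int = 200) -> float:
    # Walk-forward permutation test: permute only the data after the first
    # training fold, then walk the strategy forward on the permutation.
    real_rets = strategy_returns(ohlc, walkforward_donchian(ohlc, train_window))
    real_pf = profit_factor(real_rets.dropna())

    perm_better = 0
    for _ in range(n_permutations):
        perm_ohlc = get_permutation(ohlc, start_index=train_window)
        perm_sig = walkforward_donchian(perm_ohlc, train_window)
        perm_rets = strategy_returns(perm_ohlc, perm_sig).dropna()
        if profit_factor(perm_rets) >= real_pf:
            perm_better += 1

    return perm_better / n_permutations   # quasi-p-value
```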