Sunday, January 8, 2017

On the relevance of the probability distribution.

Some time ago I presented some results on simple games. Now a friend of mine has started to trade, and I want to help him understand something.

Imagine a game in which you bet 100 and get back either 110 or 90, each with 50% probability.

Suppose you play this game 200 times, and in each game you bet a fraction f (0 < f ≤ 1) of your current equity. What happens to your equity after 200 games?

The simple answer is: on average you keep your initial equity, because in each game you win or lose the same amount with equal probability. But a more detailed answer is given in the next figure, which shows a simulation of 100k experiments:

As there are probabilities involved, we can only get statistical results. The green line depicts the mean value of the equity, which is 100 irrespective of the fraction that you bet. But the shades of blue also indicate the distribution! They are chosen so that the light blue area contains 90% of the population, and so on, as given in the legend. The blue line represents the median: the separation between the top 50% and the bottom 50%. (The boundaries between the shades of blue are the 95th, 87.5th, 75th, 62.5th, 50th, 37.5th, 25th, 12.5th and 5th percentiles.)
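
If you want to reproduce these numbers yourself, here is a minimal sketch of such a Monte Carlo simulation in Python (it assumes numpy; the two bet fractions shown are just the illustrative values discussed below, and more percentiles can be added to match the figure):

import numpy as np

n_games = 200            # games played per experiment
n_experiments = 100_000  # number of simulated experiments
start_equity = 100.0

for f in (0.4, 1.0):     # fraction of the current equity bet in each game
    # a win multiplies the equity by (1 + 0.1*f), a loss by (1 - 0.1*f),
    # each with 50% probability
    wins = np.random.rand(n_experiments, n_games) < 0.5
    factors = np.where(wins, 1 + 0.1 * f, 1 - 0.1 * f)
    equity = start_equity * factors.prod(axis=1)

    print(f"f={f}: mean={equity.mean():.1f}, median={np.median(equity):.1f}, "
          f"fraction below initial equity={np.mean(equity < start_equity):.2f}")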

So, the interesting thing is that even though the mean is 100, if you bet your full equity (f=1) your final equity is below the initial one in 75% of the experiments. If you bet 40% of your equity (f=0.4), you end up below the mean in only 62.5% of the cases. The reason is that the game is multiplicative: a win multiplies your equity by (1 + 0.1f) and a loss by (1 - 0.1f), and since (1 + 0.1f)(1 - 0.1f) = 1 - 0.01f² < 1, the typical (median) outcome drifts below the mean, the more so the larger f is.

So, be careful with the probability distribution of your equity. The mean may look like the important number, but it can be very misleading in these long-tailed cases!

Friday, January 6, 2017

Getting data for analysis in SQLite

Steps to make the latest data available for analysis:

  • Download the historical data (historicos) from megabolsa.com.
  • Extract it to a directory named historicos.
  • Download the additional daily data from here and place it in a directory named diarios.
  • First, process the historical data and create all.txt.cut with these two commands (the cut/awk pipeline keeps the first seven fields and pads each row with zero-valued columns so that it matches the stockprices table layout defined below):
$ ../atem-0.3.5-static-linux/atem64  --field-separator='|'   --format=-time ../historicos > all.txt
$ cut -d'|' -f -7 all.txt | awk -F'|' '{$(NF)="0|0|0" FS $(NF);}1' OFS='|' | awk -F'|' '{$(3)="0" FS $(3);}1' OFS='|' > all.txt.cut

  • Then combine the daily close files into tmp.txt.cut with these two commands (the resulting row format, common to both files, is sketched after them):
$ awk -F',' '{x=$2;yy=substr(x,1,4);mm=substr(x,5,2);dd=substr(x,7,2); $2=yy"-"mm"-"dd}1' OFS='|' ../diarios/*.txt | sort > tmp.txt
$ awk -F'|' '{$(NF)="0|0|0" FS $(NF);}1' OFS='|' tmp.txt | awk -F'|' '{$(3)="0" FS $(3);}1' OFS='|' > tmp.txt.cut
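
After these steps every row in all.txt.cut and tmp.txt.cut should contain eleven pipe-separated fields in the same order as the columns of the stockprices table defined below, schematically:

SYMBOL|YYYY-MM-DD|0|open|high|low|close|0|0|0|volume

The literal zeros fill the previous_close, day_change, bid and ask columns, which are not present in the source files.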


  • Finally, create the database:

$ rm val.db
$ sqlite3 val.db < 0create.sql

$ sqlite3 val.db < 1fill.sql


The contents of 0create.sql:
CREATE TABLE stockprices (
        symbol          varchar(12) not null default '',
        date            date default null,
        previous_close  real default null,
        day_open        real default null,
        day_high        real default null,
        day_low         real default null,
        day_close       real default null,
        day_change      real default null,
        bid             real default null,
        ask             real default null,
        volume          numeric default null
    );
CREATE UNIQUE INDEX stockprices_pkey
        on stockprices (symbol, date);



The contents of 1fill.sql:
.import tmp.txt.cut stockprices
.import all.txt.cut stockprices
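
Once both imports have run, the data can be queried directly for analysis. As a minimal sketch, assuming Python with the standard sqlite3 module (the symbol 'TEF' is only a placeholder):

import sqlite3

con = sqlite3.connect("val.db")
cur = con.cursor()

# closing prices for one symbol, oldest first ('TEF' is only a placeholder)
rows = cur.execute(
    "SELECT date, day_close FROM stockprices WHERE symbol = ? ORDER BY date",
    ("TEF",),
).fetchall()

for date, close in rows[-5:]:   # print the last five sessions
    print(date, close)

con.close()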