I have had a difficult time figuring out when to stop learning and to start trying to work on something of my own. I feel a pretty strong urge to make something original. I downloaded years of data from the Florida Department of Education, but that whole thing seemed so daunting. Also, I don’t feel like I know enough yet to really discover anything. I can make some graphs and do some exploratory data analysis, but I haven’t dived very deep into the statistical thinking or machine learning required to really find insights.
So here is a little tidbit to go along with my own path which has brought me here: a random walk.
From a starting point, in this case the origin, randomly take a step forward, backward, to the right, or to the left. Each choice has the same probability. Then repeat. The image above shows a random walk with one thousand steps. I generated it with some Python code using a visualization package called Bokeh. You can find the code on my GitHub page.
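The procedure above can be sketched in a few lines of Python. This is not the code from the GitHub page, just a minimal stand-in that uses only the standard library and skips the Bokeh plotting. The names `random_walk` and `STEPS` and the `seed` parameter are made up for illustration.

```python
import random

# The four equally likely unit steps: right, left, forward, backward.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def random_walk(n_steps, seed=None):
    """Return every (x, y) position visited, starting at the origin."""
    rng = random.Random(seed)  # a seed makes the walk reproducible
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(n_steps):
        dx, dy = rng.choice(STEPS)  # each direction has probability 1/4
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

walk = random_walk(1000, seed=42)
print(len(walk))   # the origin plus one position per step
print(walk[-1])    # where the walk ended up
```

The list of positions is presumably what a plotting package would need: split it into x and y coordinates and draw a line through them.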
Random walks are a common feature in nature. A photon produced by a fusion reaction in the core of the sun can take an enormously long time to reach the sun’s surface (estimates range from thousands of years to over a million) as it is absorbed and reemitted in a random direction again and again by the dense matter around it. Once freed from its walk, the photon reaches earth only about eight minutes and twenty seconds later. Brownian motion, described by Einstein in one of his “miracle year” papers of 1905, can also be modeled as a random walk.
I decided to explore random walks with Bokeh because the package can create interactive visualizations. I got it all working on my home machine, but I haven’t figured out how to host the visualizations on this site. So for now, instead of creating your own walks, you must find satisfaction in this gallery of randomly colored random walks. I made the lines semi-transparent so you can see the overlap. Once I get the interactive version working online, I’ll update this post. It lets you slide through the walk and watch the whole thing unfold. Very cool!
I did some simple analysis of the distribution of the final distances traveled by the walks. Since each walk begins at the origin, the straight-line distance it has traveled is just found with your favorite equation from algebra: a² + b² = c². I ran 10,000 walks with 100 steps each. Here are the results:
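For anyone who wants to reproduce the experiment, here is a rough sketch of the simulation, assuming the same four-direction walk described earlier. The function names and the `seed` parameter are illustrative and are not taken from the original code. `math.hypot` applies the a² + b² = c² formula to the final x and y coordinates.

```python
import math
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def final_distance(n_steps, rng):
    """Run one walk from the origin and return its straight-line distance."""
    x = y = 0
    for _ in range(n_steps):
        dx, dy = rng.choice(STEPS)
        x += dx
        y += dy
    return math.hypot(x, y)  # sqrt(x**2 + y**2)

def simulate_distances(n_walks=10_000, n_steps=100, seed=0):
    """Return the final distance of each of n_walks independent walks."""
    rng = random.Random(seed)
    return [final_distance(n_steps, rng) for _ in range(n_walks)]

distances = simulate_distances()
print(len(distances), min(distances), max(distances))
```

A histogram of `distances` should look something like the blue bars in the plot, though the exact counts depend on the seed.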
In blue are the empirical results and in green is a normal distribution with identical mean and standard deviation. At first, I thought the heavy counts to the left of the mean might be noise, but after running the simulations several times, the distribution continued to be skewed the same way. The distances are not normally distributed! A walk is a little more likely to end up closer to the origin than the average distance than farther from it. In other words, your chances of landing on the low side of the mean are better than your chances of landing on the high side.
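One quick way to check the skew without a plot is to compare the mean with the median and count how many walks finish below the mean. The sketch below does that under the same assumptions as before (four equally likely unit steps, 10,000 walks of 100 steps); `summarize` is a hypothetical helper name. As a side note, for long walks the final distance is usually approximated by a Rayleigh distribution, which is skewed in the same direction.

```python
import math
import random
import statistics

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def simulate_distances(n_walks, n_steps, seed=0):
    """Return the final straight-line distance of each walk."""
    rng = random.Random(seed)
    distances = []
    for _walk in range(n_walks):
        x = y = 0
        for _step in range(n_steps):
            dx, dy = rng.choice(STEPS)
            x += dx
            y += dy
        distances.append(math.hypot(x, y))
    return distances

def summarize(distances):
    """Return (mean, median, share of walks that ended below the mean)."""
    mean = statistics.fmean(distances)
    median = statistics.median(distances)
    below = sum(d < mean for d in distances) / len(distances)
    return mean, median, below

mean, median, below = summarize(simulate_distances(10_000, 100))
print(f"mean={mean:.2f}  median={median:.2f}  share below mean={below:.1%}")
```

A median that sits below the mean, together with a share above one half, would match the lopsided histogram.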
Perhaps that makes sense: to get far from the origin, most of your steps need to be in the same direction (or at least in the same two directions, e.g. if you keep going up and to the right). This is less likely than having a roughly equal number of steps in each direction. Maybe I can do some further analysis and count the steps each walk takes in each direction and see where that leads me.
I’m sure that this is all well documented in the analytic literature, but it has been fun running through it on my own. Currently, I’m working through some of the stats courses on DataCamp, after which comes the machine learning stuff. Hopefully I’ll have more tools at my disposal soon to actually dive into some real data.