My Random Walk

I have had a difficult time figuring out when to stop learning and to start trying to work on something of my own. I feel a pretty strong urge to make something original. I downloaded years of data from the Florida Department of Education, but that whole thing seemed so daunting. Also, I don’t feel like I know enough yet to really discover anything. I can make some graphs and do some exploratory data analysis, but I haven’t dived very deep into the statistical thinking or machine learning required to really find insights.

So here is a little tidbit to go along with my own path which has brought me here: a random walk.

[Figure: 1 walk, 1000 steps]

From a starting point, in this case the origin, randomly take a step either forward, backward, to the right, or to the left. Each choice is equally likely. Then repeat. The image above shows a random walk with one thousand steps. I generated it with some Python code using a visualization package called Bokeh. You can find the code on my GitHub page.
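The stepping rule above can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard library, not the Bokeh code mentioned above; the function name `random_walk` and its optional `seed` parameter are made up for this example.

```python
import random

# The four equally likely moves: right, left, forward (up), backward (down).
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def random_walk(n_steps, seed=None):
    """Return the list of (x, y) positions visited, starting at the origin."""
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(n_steps):
        dx, dy = rng.choice(STEPS)  # each direction has probability 1/4
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


walk = random_walk(1000, seed=42)
print(len(walk))  # 1001 positions: the origin plus one per step
print(walk[-1])   # where this particular walk ended up
```

Keeping the whole path, rather than just the end point, is what makes it possible to draw the walk as a line afterwards.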

Random walks are a common feature in nature. A photon released by a fusion reaction in the core of the sun can, by some estimates, take over a million years to reach the sun’s surface, because it is absorbed and re-emitted in a random direction countless times along the way. Once freed from its walk, the photon reaches Earth only about eight minutes later. Brownian motion, described by Einstein in one of his “miracle year” papers of 1905, can also be modeled as a random walk.

I decided to explore random walks with Bokeh because the package can create interactive visualizations. I got it all working on my home machine, but I haven’t figured out how to host the interactive plots on this site. So for now, instead of creating your own walks, you must find satisfaction in this gallery of randomly colored random walks. I made the lines semi-transparent so you can see the overlap. Once I get the interactive version working online, I’ll update this post. It lets you slide through the walk and watch the whole thing unfold. Very cool!
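For anyone who wants to try a static version of the gallery, here is a rough sketch of how many walks could be drawn as semi-transparent, randomly colored lines. It assumes Bokeh is installed and uses its `figure`, `line`, and `show` calls; the helper names (`random_color`, `make_walks`, `plot_walks`) and the default alpha of 0.3 are assumptions for illustration, not the settings used for the images in this post.

```python
import random


def random_color(rng):
    """Return a random hex color string of the form '#rrggbb'."""
    return "#{:06x}".format(rng.randrange(0x1000000))


def make_walks(n_walks, n_steps, seed=None):
    """Return a list of (xs, ys) coordinate lists, one pair per walk."""
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    walks = []
    for _ in range(n_walks):
        xs, ys = [0], [0]  # every walk starts at the origin
        for _ in range(n_steps):
            dx, dy = rng.choice(moves)
            xs.append(xs[-1] + dx)
            ys.append(ys[-1] + dy)
        walks.append((xs, ys))
    return walks


def plot_walks(walks, alpha=0.3, seed=None):
    """Draw every walk as a semi-transparent line in a random color."""
    # Imported here so the helpers above also work without Bokeh installed.
    from bokeh.plotting import figure, show

    rng = random.Random(seed)
    fig = figure(title=f"{len(walks)} random walks", match_aspect=True)
    for xs, ys in walks:
        fig.line(xs, ys, line_color=random_color(rng), line_alpha=alpha)
    show(fig)  # opens the plot in a browser tab
```

Calling `plot_walks(make_walks(100, 1000))` should give something like the "100 walks, 1000 steps each" picture. The low alpha is what lets heavily traveled regions near the origin show up darker than the stray outer paths.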

[Figure: 100 walks, 100 steps each]
[Figure: 100 walks, 1000 steps each]
[Figure: 100 walks, 2000 steps each]
[Figure: detail of 100 walks, 2000 steps each]
[Figure: 100 walks, 10,000 steps each]
[Figure: 1,000 walks, 2,000 steps each]
[Figure: 1,000 walks, 10,000 steps each]

I did some simple analysis of the distribution of the final distances traveled by the walks. Since each walk begins at the origin, the straight-line distance it has traveled is just found with your favorite equation from algebra: a² + b² = c². I ran 10,000 walks with 100 steps each. Here are the results:

[Figure: normalized histogram of the distances to the origin]
[Figure: cumulative distribution function]

In blue are the empirical results, and in green is a normal distribution with the same mean and standard deviation. At first, I thought the heavy counts to the left of the mean might be noise, but after I ran the simulation several times, the distribution stayed skewed the same way. The distances are not normally distributed! A walk is a little more likely to end up closer to the origin than the mean distance than farther from it.
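One way to check the skew without any plots is to compare the mean with the median and count how many walks finish below the mean. The sketch below reruns the experiment described above (10,000 walks of 100 steps each) with only the standard library. The helper name `final_distance` and the fixed seed are assumptions for illustration, and the numbers will differ a little from the run behind the figures.

```python
import math
import random
import statistics

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def final_distance(n_steps, rng):
    """Walk n_steps on the grid and return the straight-line distance from the origin."""
    x = y = 0
    for _ in range(n_steps):
        dx, dy = rng.choice(MOVES)
        x += dx
        y += dy
    return math.hypot(x, y)  # c = sqrt(a^2 + b^2)


rng = random.Random(0)  # fixed seed so the run is repeatable
distances = [final_distance(100, rng) for _ in range(10_000)]

mean = statistics.mean(distances)
median = statistics.median(distances)
below = sum(d < mean for d in distances) / len(distances)

print(f"mean distance:   {mean:.2f}")
print(f"median distance: {median:.2f}")
print(f"share of walks ending below the mean: {below:.1%}")
```

If the distribution were symmetric, the median would sit on the mean and about half the walks would finish below it. A median below the mean and a share above 50% is what a right-skewed distribution looks like. For walks with many steps, the distance from the origin in two dimensions is usually modeled with a Rayleigh distribution, which has this kind of shape.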

Perhaps that makes sense: to get far from the origin, your steps overall need to be mostly in the same direction (or at least in the same two directions, e.g. if you keep going up and to the right). This is less likely than having a roughly equal number of steps in each direction. Maybe I can do some further analysis, count the steps each walk takes in each direction, and see where that leads me.
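As a starting point for that kind of follow-up, here is a small sketch that records only how many steps a walk takes in each direction and recovers the final distance from those counts. The names `step_counts` and `distance_from_counts` are hypothetical, and this is just one possible way to set up the analysis.

```python
import math
import random
from collections import Counter

DIRECTIONS = ["right", "left", "up", "down"]


def step_counts(n_steps, rng):
    """Return a Counter of how many steps one walk took in each direction."""
    return Counter(rng.choice(DIRECTIONS) for _ in range(n_steps))


def distance_from_counts(counts):
    """Distance from the origin, computed only from the direction counts."""
    net_x = counts["right"] - counts["left"]  # opposite steps cancel out
    net_y = counts["up"] - counts["down"]
    return math.hypot(net_x, net_y)


rng = random.Random(1)  # fixed seed so the run is repeatable
walks = [step_counts(100, rng) for _ in range(10_000)]
walks.sort(key=distance_from_counts)

nearest, farthest = walks[0], walks[-1]
print("closest walk: ", dict(nearest), round(distance_from_counts(nearest), 2))
print("farthest walk:", dict(farthest), round(distance_from_counts(farthest), 2))
```

Because opposite steps cancel, only the imbalance between right and left and between up and down matters. Comparing the counts of the closest and farthest walks shows how lopsided a walk has to be to end up far from the origin.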

I’m sure that this is all well documented in the analytic literature, but it has been fun running through it on my own. Currently, I’m working through some of the stats courses on DataCamp, after which comes the machine learning stuff. Hopefully I’ll have more tools at my disposal soon to actually dive into some real data.

Until then!

Where to start? How to proceed?

I’ve been toying with all of this for about a month now. I have so many bookmarks for blogs, podcasts, free courses, paid courses, and on and on. I’ve checked out books from the library, bought others on Amazon, and downloaded open source texts. I have to admit that I’m a bit daunted. There are a million places to begin, and I have plenty of work to do before even becoming mildly employable. We’ll see…

Here’s my game plan:

  • Books to read
    • Data Science from Scratch by Joel Grus
    • Python for Data Analysis by Wes McKinney
    • Hands-On Machine Learning by Aurélien Géron
    • The Art of Data Science by Elizabeth Matsui and Roger Peng
    • OpenIntro Statistics by David Diez et al.
  • Paid Online Courses
    • The entire Data Scientist Career Track from DataCamp.
    • Udemy
      • Python Megacourse
      • Python for Data Analysis & Visualization
      • Python for Machine Learning
      • Deep Learning x 4
  • Free Online Courses
    • Udacity
      • Intro to Computer Science
      • Intro to Data Science
    • Stanford
      • Statistical Inference
      • Probability and Statistics
      • Statistical Learning (certified)
      • Mining Massive Datasets (certified)
      • Algorithms 1 & 2 (certified)

This should all take me the better part of a year. I hope to get enough dirt under my nails to start some simple projects soon, which will be posted here for all your viewing pleasure.

Introduction

Hello world! My name is Chad Gardner. I am a former AP Physics teacher, with an educational background in astronomy, philosophy, and religion(?!). I am currently a stay-at-home dad, spending what spare time I can muster learning Python for data science. I hope to use this site as a place to dump my brain, share my progress, keep myself accountable, and all those other reasons people start blogs. Eventually, this will morph into a portfolio filled with beautiful insights, graphs, and stories from the world of data. I have another, somewhat neglected blog, where I post the odd poem, philosophical insight, or political rant. Here, I hope to stick to data science, Python, and the questions that can be answered with them. I hope to be employed in this new field within a year or so, ideally without having to go back to school. Enjoy the ride!

-Chad