Module 4: Data Analysis

Once you have output data from a simulation, you need to be able to do something with it. This section covers the basics of working with simulation data, including some examples from real-world studies. We will discuss various methods of data visualization, as well as the basics of applying statistical models to your data. Lastly, we will discuss some best practices for preparing data for publication and sharing so that others can interpret (and reproduce) your simulations as easily as possible.

4.1: Data Visualization

The first step in working with data is to visualize the output so that you can assess system behavior over time (or some other variable of choice). This section will walk through several examples of how to use basic Python scripts to visualize a data set in various ways.

Loading data files

Numerical data can be loaded from a data file using the loadtxt function of numpy; i.e., the command is np.loadtxt. You need to make sure the file is in the same directory as your notebook, or provide the full path. The filename (or path plus filename) needs to be between quotes.
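As a minimal sketch of the loading step, the example below first writes a small text file and then reads it back with np.loadtxt. The file name example_data.dat and its values are made up for illustration and are not part of the course data.

```python
import os
import tempfile

import numpy as np

# Write a small whitespace-separated text file so the example is self-contained.
# The file name and the values are hypothetical, chosen only for illustration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example_data.dat')
with open(path, 'w') as f:
    f.write('1.0 2.0 3.0\n')
    f.write('4.0 5.0 6.0\n')

# The path (or file name) is passed as a quoted string.
# loadtxt returns a numpy array with one row per line of the file.
data = np.loadtxt(path)
print(data.shape)
print(data[0])
```

If the file sits in the same directory as your notebook, the file name alone (between quotes) is enough in place of the full path.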

Exercise 4.1.#, Loading data and adding a legend

You are provided with data files containing the mean monthly temperature of Holland, New York City, and Beijing. The Dutch data is stored in holland_temperature.dat, and the other filenames are similar. Plot the temperature for each location against the number of the month (starting with 1 for January), all in a single graph. Add a legend using the function plt.legend(['line1','line2']), etc., but with more descriptive names. Find out about the legend command using plt.legend?. Place the legend in an appropriate spot (the upper left-hand corner may be nice, or let Python figure out the best place).

! git clone https://github.com/akmadamanchi/ThermoData.git

### If you get the error "fatal: destination path 'ThermoData' already exists and is not an empty directory",
### you can handle it as follows: 1) Open the menu on the left side of the screen to bring up the table of contents.
### 2) Choose the Files tab in the table of contents. 3) NOTE: THIS IS NOT the File menu at the top of the screen.
### 4) Check whether there is a folder named ThermoData.
### If there is, uncomment and run the 'rm -rf ThermoData/' command in the following cell.
#rm -rf ThermoData/
import numpy as np
import matplotlib.pyplot as plt

# Load the mean monthly temperatures for each location
holland = np.loadtxt('/content/ThermoData/holland_temperature.dat')
newyork = np.loadtxt('/content/ThermoData/newyork_temperature.dat')
beijing = np.loadtxt('/content/ThermoData/beijing_temperature.dat')

# Month numbers 1 through 12 for the horizontal axis
months = np.arange(1, 13)
plt.plot(months, holland)
plt.plot(months, newyork)
plt.plot(months, beijing)
plt.xlabel('Number of the month')
plt.ylabel('Mean monthly temperature (Celsius)')
plt.xticks(months)
plt.legend(['Holland','New York','Beijing'], loc='best');

Exercise 4.1.#, Subplots and fancy tick markers

Load the average monthly air temperature and seawater temperature for Holland. Create one figure with two graphs stacked vertically using the subplot command (use plt.subplot? to find out how). On the top graph, plot the air and sea temperature. Label the ticks on the horizontal axis as ‘jan’, ‘feb’, ‘mar’, etc., rather than 0, 1, 2, etc. Use plt.xticks? to find out how. In the bottom graph, plot the difference between the air and seawater temperature. Add legends, axis labels, the whole shebang.
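The sketch below shows one possible layout for this kind of figure. The temperature values are placeholders invented for illustration, not the course data; in your own solution, replace them with arrays loaded via np.loadtxt. The variable names (air, sea, month_names) are likewise only assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder values, NOT the course data: replace with np.loadtxt(...) calls.
months = np.arange(1, 13)
air = np.array([3, 3, 6, 9, 13, 16, 18, 18, 15, 11, 7, 4], dtype=float)
sea = np.array([6, 5, 6, 8, 11, 14, 17, 18, 17, 14, 11, 8], dtype=float)
month_names = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

plt.figure(figsize=(6, 6))

# Top graph: a grid of 2 rows and 1 column, panel number 1.
ax_top = plt.subplot(2, 1, 1)
plt.plot(months, air, label='air')
plt.plot(months, sea, label='sea')
plt.xticks(months, month_names)  # tick positions, then the text labels
plt.ylabel('Temperature (Celsius)')
plt.legend(loc='best')

# Bottom graph: same grid, panel number 2.
ax_bottom = plt.subplot(2, 1, 2)
plt.plot(months, air - sea, 'C2', label='air - sea')
plt.xticks(months, month_names)
plt.xlabel('Month')
plt.ylabel('Difference (Celsius)')
plt.legend(loc='best')

plt.tight_layout();  # keep the labels of the two graphs from overlapping
```

The three arguments of plt.subplot are the number of rows, the number of columns, and the index of the panel that subsequent plotting commands draw into.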

Colors

If you don’t specify a color for a plotting statement, matplotlib will use its default colors. The first three default colors are particular shades of blue, orange, and green. The names of the default colors are a capital C followed by a number, starting with 0. For example:

plt.plot([0, 1], [0, 1], 'C0')
plt.plot([0, 1], [1, 2], 'C1')
plt.plot([0, 1], [2, 3], 'C2')
plt.legend(['default blue', 'default orange', 'default green']);

There are several ways to specify your own colors in matplotlib plotting; they are described in the matplotlib documentation on specifying colors. A useful way is to use the HTML color names, which are listed in most HTML/CSS color references.

color1 = 'fuchsia'
color2 = 'lime'
color3 = 'DodgerBlue'
plt.plot([0, 1], [0, 1], color1)
plt.plot([0, 1], [1, 2], color2)
plt.plot([0, 1], [2, 3], color3)
plt.legend([color1, color2, color3]);

The coolest (and nerdiest) way is probably to use the xkcd color names, which need to be prefixed with xkcd:. The list of names comes from the xkcd color survey and includes favorites such as ‘baby puke green’ and a number of browns ranging from poo to poop brown and baby poop brown. Try it out:

plt.plot([1, 2, 3], [4, 5, 2], 'xkcd:baby puke green');
plt.title('xkcd color baby puke green');

Exercise 4.1.#, Pie Chart

At the 2012 London Olympics, the top ten countries (plus the rest) receiving gold medals were ['USA', 'CHN', 'GBR', 'RUS', 'KOR', 'GER', 'FRA', 'ITA', 'HUN', 'AUS', 'OTHER']. They received [46, 38, 29, 24, 13, 11, 11, 8, 8, 7, 107] gold medals, respectively. Make a pie chart (use plt.pie? or go to the pie charts in the matplotlib gallery) of the top 10 gold medal winners plus the others at the London Olympics. Try some of the keyword arguments to make the plot look nice. You may want to give the command plt.axis('equal') to make the scales along the horizontal and vertical axes equal so that the pie actually looks like a circle rather than an ellipse. Use the colors keyword in your pie chart to specify a sequence of colors. The sequence must be between square brackets, each color name must be between quotes, and the names must be separated by commas, like ['MediumBlue','SpringGreen','BlueViolet']; the sequence is repeated if it is not long enough.
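A minimal sketch of such a pie chart is given below, using the medal counts from the exercise and the three example colors. The startangle value and the title are arbitrary choices made here for illustration; other keyword arguments may well give a nicer result.

```python
import matplotlib.pyplot as plt

# Medal data as given in the exercise text.
countries = ['USA', 'CHN', 'GBR', 'RUS', 'KOR', 'GER',
             'FRA', 'ITA', 'HUN', 'AUS', 'OTHER']
gold = [46, 38, 29, 24, 13, 11, 11, 8, 8, 7, 107]

plt.figure()
# Only three colors are given for eleven wedges, so the sequence repeats.
plt.pie(gold,
        labels=countries,
        colors=['MediumBlue', 'SpringGreen', 'BlueViolet'],
        startangle=90)  # arbitrary choice: start the first wedge at the top
plt.axis('equal')  # equal axis scales, so the pie is drawn as a circle
plt.title('Gold medals, London 2012');
```

With only three colors, neighboring wedges are still distinguishable, but a longer sequence makes the individual countries easier to tell apart.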

Exercise 4.1.#, Fill between

Load the air and sea temperature, as used in the subplots exercise above, but this time make one plot of temperature vs. the number of the month and use the plt.fill_between command to fill the space between each curve and the horizontal axis. Specify the alpha keyword, which defines the transparency. Some experimentation will give you a good value for alpha (stay between 0 and 1). Note that you need to specify the color using the color keyword argument.
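The following sketch illustrates plt.fill_between with the alpha and color keywords. As before, the temperature values are placeholders invented for illustration rather than the course data, and alpha=0.4 is just one value you might arrive at by experimenting.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder values, NOT the course data: replace with np.loadtxt(...) calls.
months = np.arange(1, 13)
air = np.array([3, 3, 6, 9, 13, 16, 18, 18, 15, 11, 7, 4], dtype=float)
sea = np.array([6, 5, 6, 8, 11, 14, 17, 18, 17, 14, 11, 8], dtype=float)

plt.figure()
# With only x and one curve given, the area is filled down to y = 0,
# i.e. to the horizontal axis. alpha < 1 lets overlapping areas show through.
plt.fill_between(months, air, color='C1', alpha=0.4, label='air')
plt.fill_between(months, sea, color='C0', alpha=0.4, label='sea')
plt.xticks(months)
plt.xlabel('Number of the month')
plt.ylabel('Temperature (Celsius)')
plt.legend(loc='upper left');
```

Because both filled areas are semi-transparent, the region where they overlap appears in a blended color, which makes it easy to see which curve is higher in each month.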

4.2: Statistical Analysis Methods (?)

[In Progress: To be included in update on 09/12/25]

4.3: Model Fitting & Tuning: Examples

[In Progress: To be included in update on 09/12/25]

4.4: Preparing Data for Publication & Sharing

[In Progress: To be included in update on 09/12/25]