Module 4: Data Analysis¶
Once you have output data from a simulation, you need to be able to do something with it. This section covers the basics of working with simulation data, including some examples from real-world studies. We will discuss several methods of data visualization, as well as the basics of applying statistical models to your data. Lastly, we will discuss best practices for preparing data for publication and sharing so that others can interpret (and reproduce) your simulations as easily as possible.
4.1: Data Visualization¶
The first step in working with data is to visualize the output so that you can assess system behavior over time (or some other variable of choice). This section will walk through several examples of how to use basic Python scripts to visualize a data set in various ways.
Loading data files¶
Numerical data can be loaded from a data file using the loadtxt
function of numpy; i.e., the command is np.loadtxt. You need to
make sure the file is in the same directory as your notebook, or provide
the full path. The filename (or path plus filename) needs to be between
quotes.
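As a minimal sketch of the call described above, the following writes a small text file to a temporary directory and reads it back with np.loadtxt. The file name example_values.dat and its three values are made up purely for illustration; they are not part of the course data.

```python
import os
import tempfile

import numpy as np

# Create a small example file so the snippet is self-contained.
# The file name and its contents are hypothetical.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example_values.dat')
with open(path, 'w') as f:
    f.write('1.5\n2.5\n4.0\n')

# The filename (here including the full path) is passed as a quoted string.
values = np.loadtxt(path)
print(values)
print(values.shape)  # (3,)
```

If the file sits in the same directory as the notebook, passing just the quoted file name is enough.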
Exercise 4.1.#, Loading data and adding a legend¶
You are provided with data files containing the mean monthly
temperature of Holland, New York City, and Beijing. The Dutch data is
stored in holland_temperature.dat, and the other filenames are
similar. Plot the temperature for each location against the number of
the month (starting with 1 for January), all in a single graph. Add a
legend using the function plt.legend(['line1','line2']), etc.,
but with more descriptive names. Find out about the legend
command using plt.legend?. Place the legend in an appropriate spot
(the upper left-hand corner may be nice, or let Python figure out the
best place).
! git clone https://github.com/akmadamanchi/ThermoData.git
### If you get the error "fatal: destination path 'ThermoData' already exists and is not an empty directory."
### you can handle this by 1) opening the menu on the left side of the screen to bring up the table of contents,
### 2) choosing the Files tab in the table of contents (NOTE: THIS IS NOT the File menu at the top of the screen),
### 3) checking whether there is a folder named ThermoData.
### If there is, you can uncomment and run the 'rm -rf ThermoData/' command in the following cell
#rm -rf ThermoData/
import numpy as np
import matplotlib.pyplot as plt
# Load the mean monthly temperature for each location
holland = np.loadtxt('/content/ThermoData/holland_temperature.dat')
newyork = np.loadtxt('/content/ThermoData/newyork_temperature.dat')
beijing = np.loadtxt('/content/ThermoData/beijing_temperature.dat')
# Month numbers 1 (January) through 12 (December)
months = np.arange(1, 13)
plt.plot(months, holland)
plt.plot(months, newyork)
plt.plot(months, beijing)
plt.xlabel('Number of the month')
plt.ylabel('Mean monthly temperature (Celsius)')
plt.xticks(months)
plt.legend(['Holland','New York','Beijing'], loc='best');
Exercise 4.1.#, Subplots and fancy tick markers¶
Load the average monthly air temperature and seawater temperature for
Holland. Create one plot with two graphs, one above the other, using the
subplot command (use plt.subplot? to find out how). On the top
graph, plot the air and sea temperature. Label the ticks on the
horizontal axis as ‘jan’, ‘feb’, ‘mar’, etc., rather than 0, 1, 2, etc. Use
plt.xticks? to find out how. In the bottom graph, plot the
difference between the air and seawater temperature. Add legends, axes
labels, the whole shebang.
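The layout asked for here can be sketched as follows. This is only an illustration of plt.subplot and plt.xticks: the air and sea curves below are synthetic cosine shapes invented for the sketch, not the Holland measurements, so replace them with the loaded data when you do the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

months = np.arange(1, 13)
month_names = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

# Synthetic stand-in curves, NOT the real Holland data.
air = 10 - 8 * np.cos(2 * np.pi * (months - 1) / 12)
sea = 11 - 6 * np.cos(2 * np.pi * (months - 2) / 12)

plt.subplot(2, 1, 1)  # 2 rows, 1 column, first (top) graph
plt.plot(months, air, label='air')
plt.plot(months, sea, label='sea')
plt.xticks(months, month_names)  # tick positions, then tick labels
plt.ylabel('Temperature (Celsius)')
plt.legend(loc='best')

plt.subplot(2, 1, 2)  # second (bottom) graph
plt.plot(months, air - sea, label='air - sea')
plt.xticks(months, month_names)
plt.xlabel('Month')
plt.ylabel('Difference (Celsius)')
plt.legend(loc='best')
plt.tight_layout();
```

Passing label= to each plot call and then calling plt.legend() without a list is an alternative to plt.legend(['line1','line2']); both produce the same kind of legend.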
Colors¶
If you don’t specify a color for a plotting statement, matplotlib
will use its default colors. The first three default colors are special
shades of blue, orange and green. The names of the default colors are a
capital C followed by the number, starting with number 0. For
example
plt.plot([0, 1], [0, 1], 'C0')
plt.plot([0, 1], [1, 2], 'C1')
plt.plot([0, 1], [2, 3], 'C2')
plt.legend(['default blue', 'default orange', 'default green']);
There are several ways to specify your own colors in matplotlib plotting; you may read about them in the matplotlib documentation on specifying colors. A useful way is to use the HTML color names, lists of which are easy to find online.
color1 = 'fuchsia'
color2 = 'lime'
color3 = 'DodgerBlue'
plt.plot([0, 1], [0, 1], color1)
plt.plot([0, 1], [1, 2], color2)
plt.plot([0, 1], [2, 3], color3)
plt.legend([color1, color2, color3]);
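Names are only one of the accepted color formats. As a small sketch, the snippet below writes the same pure blue as a name, a hex string, and an RGB tuple, and uses matplotlib.colors.to_rgba to confirm that matplotlib resolves all three to the same color. The variable names are chosen for this illustration only.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

# Three ways to write the same pure blue.
blue_by_name = 'blue'
blue_by_hex = '#0000FF'
blue_by_rgb = (0.0, 0.0, 1.0)

# to_rgba converts any accepted color format to an (r, g, b, a) tuple.
for spec in (blue_by_name, blue_by_hex, blue_by_rgb):
    print(spec, '->', mcolors.to_rgba(spec))

# Passing the color with the color keyword works for every format.
plt.plot([0, 1], [0, 1], color=blue_by_name)
plt.plot([0, 1], [1, 2], color=blue_by_hex)
plt.plot([0, 1], [2, 3], color=blue_by_rgb)
plt.legend(['name', 'hex string', 'RGB tuple']);
```

Using the color keyword rather than a positional argument avoids any ambiguity with matplotlib's format strings.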
The coolest (and nerdiest) way is probably to use the xkcd names, which
need to be prefixed with xkcd:. The list of color names comes from the
xkcd color survey and includes favorites
such as ‘baby puke green’ and a number of brown colors ranging from poo
to poop brown and baby poop brown. Try it out:
plt.plot([1, 2, 3], [4, 5, 2], 'xkcd:baby puke green');
plt.title('xkcd color baby puke green');
Gallery of graphs¶
The plotting package matplotlib allows you to make very fancy
graphs. Check out the matplotlib gallery to get an overview of many of
the options. The following exercises use several of the matplotlib
options.
Exercise 4.1.#, Pie Chart¶
At the 2012 London Olympics, the top ten countries (plus the rest)
receiving gold medals were
['USA', 'CHN', 'GBR', 'RUS', 'KOR', 'GER', 'FRA', 'ITA', 'HUN', 'AUS', 'OTHER'].
They received [46, 38, 29, 24, 13, 11, 11, 8, 8, 7, 107] gold
medals, respectively. Make a pie chart (use plt.pie? or go to the
pie charts in the matplotlib gallery) of the top 10 gold medal winners
plus the others at the London Olympics. Try some of the keyword
arguments to make the plot look nice. You may want to give the command
plt.axis('equal') to make the scales along the horizontal and
vertical axes equal so that the pie actually looks like a circle rather
than an ellipse. Use the colors keyword in your pie chart to specify
a sequence of colors. The sequence must be between square brackets, each
color must be between quotes preserving upper and lower case, and they
must be separated by commas, like
['MediumBlue','SpringGreen','BlueViolet']; the sequence is repeated
if it is not long enough.
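A minimal sketch of such a pie chart, using the medal counts given above, might look like this. The choice of autopct, startangle, and the title text are illustrative assumptions, not requirements of the exercise.

```python
import matplotlib.pyplot as plt

countries = ['USA', 'CHN', 'GBR', 'RUS', 'KOR', 'GER',
             'FRA', 'ITA', 'HUN', 'AUS', 'OTHER']
golds = [46, 38, 29, 24, 13, 11, 11, 8, 8, 7, 107]

# Only three colors are given, so the sequence is repeated
# for the remaining wedges.
colors = ['MediumBlue', 'SpringGreen', 'BlueViolet']

plt.pie(golds, labels=countries, colors=colors,
        autopct='%1.0f%%',   # print each share as a whole percentage
        startangle=90)       # start the first wedge at the top
plt.axis('equal')            # equal scales so the pie is a circle
plt.title('Gold medals, 2012 London Olympics');
```

With only three colors, neighboring wedges can end up sharing a color where the sequence wraps around; supplying one color per wedge avoids that.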
Exercise 4.1.#, Fill between¶
Load the air and sea temperature, as used in the subplots exercise above, but this time
make one plot of temperature vs the number of the month and use the
plt.fill_between command to fill the space between the curve and the
horizontal axis. Specify the alpha keyword, which defines the
transparency. Some experimentation will give you a good value for alpha
(stay between 0 and 1). Note that you need to specify the color using
the color keyword argument.
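The call pattern for plt.fill_between can be sketched as below. As before, the air and sea curves are synthetic cosine shapes invented for this sketch, not the Holland measurements, and alpha=0.3 is just a starting value to experiment with.

```python
import numpy as np
import matplotlib.pyplot as plt

months = np.arange(1, 13)

# Synthetic stand-in curves, NOT the real Holland data.
air = 10 - 8 * np.cos(2 * np.pi * (months - 1) / 12)
sea = 11 - 6 * np.cos(2 * np.pi * (months - 2) / 12)

plt.plot(months, air, color='C0', label='air')
plt.plot(months, sea, color='C1', label='sea')

# With only x and y1 given, the fill runs from the curve down to y = 0.
plt.fill_between(months, air, color='C0', alpha=0.3)
plt.fill_between(months, sea, color='C1', alpha=0.3)

plt.xticks(months)
plt.xlabel('Number of the month')
plt.ylabel('Temperature (Celsius)')
plt.legend(loc='best');
```

Because both fills are semi-transparent, the overlapping region shows a blend of the two colors, which makes it easy to see where one curve lies above the other.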
4.2: Statistical Analysis Methods (?)¶
[In Progress: To be included in update on 09/12/25]
4.3: Model Fitting & Tuning: Examples¶
[In Progress: To be included in update on 09/12/25]
4.4: Preparing Data for Publication & Sharing¶
[In Progress: To be included in update on 09/12/25]