The Automated Processes

This note describes the automated process that updates the Deep Creek Answers website daily.

Updating the website involves quite a few steps, but they boil down to three activities that must be performed each day:

  1. Scraping - Extract information from website(s) on a regular basis during the day.

    1. Collect lake level and generator status data from the deepcreekhydro website at regular intervals. We started (and continue) by collecting (scraping) that data every 10 minutes; it turns out that these items are updated by Brookfield every 5 minutes.
    2. Extract the desired quantities and store the results in a file
  2. Preprocessing - Process all of the data into meaningful web pages and data files

    1. Correct the extracted data for “glitches” (see the first sketch after this list)
    2. Collect data from other sites that is already available as daily series
    3. Process ALL data and make the appropriate web pages and data sets
  3. Uploading - Upload all of the information onto the web server (see the upload sketch after this list)

    1. The constructed web pages that contain the “Highcharts JS” scripts and data
    2. The constructed web pages and images that are created with R
    3. The PHP file that updates the randomly selected image and link on the home page

All of these processes currently run on a 2 x 3 GHz Quad-Core Intel Xeon Mac with 16 GB of memory. The processes described under items 1 and 2 take about 1 minute; item 1, which runs every 10 minutes, takes just a few seconds.
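The “glitch” correction in step 2 is not spelled out above, so the following is only a rough illustration of the idea in Ruby: drop any reading whose lake level jumps implausibly relative to the previous kept reading. The file layout (tab-separated timestamp, level, generator status) and the threshold are assumptions, not the actual rules used.

```ruby
#!/usr/bin/env ruby
# Illustrative glitch filter -- assumes tab-separated lines of
# "timestamp<TAB>level<TAB>generator status" and no header row.
MAX_JUMP = 0.5 # assumed maximum plausible level change between samples (feet)

prev_level = nil
clean = File.readlines("hydroreadings.txt").select do |line|
  level = line.split("\t")[1].to_f
  keep  = prev_level.nil? || (level - prev_level).abs <= MAX_JUMP
  prev_level = level if keep # only trust readings we kept
  keep
end

File.write("hydroreadingsCLEAN.txt", clean.join)
```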
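The upload step is essentially just copying the generated files to the web server. As a minimal sketch only, using Ruby's standard FTP client with made-up host, credentials, and file names:

```ruby
#!/usr/bin/env ruby
# Minimal upload sketch; host, login, and file names are placeholders,
# not the site's real configuration.
require "net/ftp"

FILES = ["lakelevel.html", "waterlevels.png", "randomimage.php"] # examples only

Net::FTP.open("ftp.example.com") do |ftp|
  ftp.login("username", "password")
  ftp.chdir("/public_html")               # assumed remote directory
  FILES.each { |f| ftp.putbinaryfile(f) } # upload each generated artifact
end
```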

There are a variety of ways that the above can be accomplished; the tools used here are simply the ones the author has learned and is comfortable with.

The three activities mentioned above make use of the following technologies:

  1. iCal - a free Apple application that schedules events. It is used here to trigger the application that controls the various tasks, namely Automator.
  2. Automator - a native Apple application that automates all kinds of tasks (more on those tasks later). It is basically used to execute shell scripts and to move files.
  3. Shell Scripts - Shell scripting is a wildly useful and powerful way to manipulate a lot of files and to automate behind-the-scenes tasks in Mac OS X. Here, the shell scripts contain the command-line entries that execute the Ruby and R scripts and move files on the local machine.
  4. Ruby - A very popular scripting language that is used to manipulate data records, write web pages and perform mathematical calculations. Execution of Ruby scripts is done via shell script(s).
  5. R - Another fairly popular functional programming language, primarily used for statistical analyses and for manipulating spatial data (involving latitudes and longitudes). It has a very strong graphical component that makes producing beautiful charts easy, although it has a very steep learning curve. Execution of R scripts is done via shell script(s).
  6. Highcharts JS - a library of functions written in JavaScript (JS) that makes it intuitive to create beautiful charts. It is used to display the data that is regularly collected from various websites; Ruby scripts generate the web pages that contain the Highcharts JS calls.

First there is a Ruby script, “lakelevel.rb”, that is put in motion every 10 minutes to retrieve the water level and generator status data from the deepcreekhydro.com website, specifically from the information box in the upper right-hand corner of the page. The retrieved data is stored in a file called “hydroreadingsRAW.txt”. The script then extracts the relevant information and appends it to “hydroreadings.txt”. A snippet of that file is shown in the figure below.
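For concreteness, here is a minimal sketch of that scrape-and-append cycle in Ruby. It is not the actual lakelevel.rb: the URL, the text patterns used to pull the two quantities out of the page, and the record format are all assumptions.

```ruby
#!/usr/bin/env ruby
# Sketch of the lakelevel.rb cycle described above (not the real script).
require "open-uri"
require "time"

# Fetch the page and keep a raw copy (URL is hypothetical).
page = URI.open("https://www.deepcreekhydro.com/").read
File.write("hydroreadingsRAW.txt", page)

# Pull the two quantities out of the info box (patterns are made up).
level = page[/Lake Level:\s*([\d.]+)/, 1]
gens  = page[/Generators?:\s*(\w+)/, 1]

# Append one timestamped, tab-separated record per run.
if level && gens
  File.open("hydroreadings.txt", "a") do |f|
    f.puts [Time.now.iso8601, level, gens].join("\t")
  end
end
```

In practice this runs unattended: iCal triggers an Automator workflow, which executes a shell script, which in turn runs the Ruby script.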