MediaWiki-based online task management for the Ingest Tool

From Biowikifarm Metawiki


The MediaWiki view of the Ingest Tool's task management completes a workflow that begins with metadata submission in MediaWiki and ends with metadata ingestion into the Fedora Commons repository. The Ingest Tool uses transactional task management for all of its operations. See also: Implementation_details_of_Java-based_Fedora_Ingestion_code#Task_management

During ingestion, the workflow is driven by a MySQL database, fedoralogs. It has five tables, which contain:

  • tasks - concurrent tasks, each working towards one of the goals: ingestion, thumbnail generation, harvesting;
  • successful jobs - jobs that succeeded;
  • waiting jobs - jobs still to be done. A harvesting task may generate new jobs (ingest). An ingest task may also generate new jobs (thumbnail generation);
  • failed jobs - jobs that failed for some reason;
  • last harvesting time.
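
The hand-off between task types described in the list above can be pictured as a small mapping. The following Java sketch is illustrative only: the enum TaskType and the method followUp are hypothetical names, not taken from the Ingest Tool's code, and it assumes that thumbnail generation queues no further jobs.

```java
import java.util.Optional;

/** Hypothetical task types; the names are illustrative, not the Ingest Tool's own. */
enum TaskType {
    HARVESTING, INGESTION, THUMBNAIL_GENERATION;

    /** Type of waiting job that a task of this type may generate, if any. */
    Optional<TaskType> followUp() {
        switch (this) {
            case HARVESTING:
                return Optional.of(INGESTION);            // harvesting may queue ingest jobs
            case INGESTION:
                return Optional.of(THUMBNAIL_GENERATION); // ingest may queue thumbnail jobs
            default:
                return Optional.empty();                  // assumption: nothing follows thumbnails
        }
    }
}
```

For example, TaskType.HARVESTING.followUp() yields INGESTION, while TaskType.THUMBNAIL_GENERATION.followUp() is empty.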

Technically, the online view of task management has two components:

  • a Java-based XML export component, which periodically exports the Ingest Tool's working database;
  • the MediaWiki extension JSHTMLWidget.

Java-based XML export component

A simple Java-based application periodically reads the fedoralogs database and exports all five tables as XML. The application can be started manually or scheduled through a crontab entry.
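
As a rough illustration of the export step, the sketch below turns the rows of one table into XML using the JDK's DOM API. It is not the actual exporter: reading from MySQL is left out (rows are passed in as maps of column name to value), the class name TableXmlExporter is hypothetical, and the wrapping "table" and "row" elements are assumptions. Only the field element with its name attribute follows the time.xml listing on this page.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper: serialises already-loaded table rows to XML. */
class TableXmlExporter {

    /** Each row maps a column name to its value; null values become empty fields. */
    static String toXml(String tableName, List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element table = doc.createElement("table");   // assumed wrapper element
            table.setAttribute("name", tableName);
            doc.appendChild(table);
            for (Map<String, String> row : rows) {
                Element rowEl = doc.createElement("row"); // assumed wrapper element
                table.appendChild(rowEl);
                for (Map.Entry<String, String> col : row.entrySet()) {
                    Element field = doc.createElement("field");
                    field.setAttribute("name", col.getKey());
                    field.setTextContent(col.getValue() == null ? "" : col.getValue());
                    rowEl.appendChild(field);
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML export failed", e);
        }
    }
}
```

A real exporter would fill the rows from a JDBC query against fedoralogs and write the result to the configured time.xml location. Building the document through the DOM API rather than by string concatenation means that special characters in the values are escaped automatically.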

XML file structure

The XML file has a fixed name, "time.xml", and is exported only while k2n.xml.transmit=true. It is stored in the location set in the configuration file, as follows:

# export working database fedoralogs
# storage for xml file generated from fedoralogs

time.xml file structure:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
         <field name="Last_Time">2009-12-28 22:03:38.0</field>
         <field name="IDTask">K2N_28-12-09T22-03-34</field>
         <field name="E_Date_O">2009-09-25 17:32:29.0</field>
         <field name="Name">Interactive_Flora_of_the_British_Isles_(ETI)</field>
         <field name="HarvestedSource"></field>
         <field name="Base"></field>
         <field name="IDTasks">K2N_25-09-09T17-49-20</field>
         <field name="Modified">2009-08-18 14:23:47.0</field>
         <field name="Number">12265</field>
         <field name="Running">6</field>
         <field name="Images">1</field>
         <field name="type">3</field>
         <field name="B_Date_P">2009-09-25 15:06:17.0</field>
          <field name="Name">Database_of_Invertebrate_Pictures_(PMSL)</field>
          <field name="HarvestedSource"></field>
          <field name="Base"></field>
          <field name="IDTasks">K2N_28-12-09T22-03-34</field>
          <field name="Modified">2009-12-23 23:02:31.0</field>
          <field name="Images">0</field>
          <field name="B_Date_P">2009-12-28 22:03:38.0</field>
          <field name="E_Date_O">2000-01-01 00:00:00.0</field>
          <field name="Name">Butterflies_and_Moths_of_the_World_(NHM</field>
          <field name="HarvestedSource"></field>
          <field name="Base"></field>
          <field name="E_Date_P">2009-11-10 09:03:25.0</field>
          <field name="IDTasks">K2N_10-11-09T09-03-23</field>
          <field name="Modified">2009-11-03 00:06:56.0</field>
          <field name="Images">0</field>
          <field name="type">1</field>
          <field name="B_Date_P">2009-11-10 09:03:24.0</field>
          <field name="E_Date_O">2009-12-13 16:40:09.0</field>
          <field name="Name">Postcode_Plants_Database_(NHM)</field>
          <field name="HarvestedSource"></field>
          <field name="Base"></field>
          <field name="E_Date_P">2009-12-13 16:40:36.0</field>
          <field name="IDTasks">K2N_13-12-09T15-47-19</field>
          <field name="Modified">2009-12-10 15:06:06.0</field>
          <field name="Number">5215</field>
          <field name="Images">1</field>
          <field name="type">1</field>
          <field name="B_Date_P">2009-12-13 15:47:41.0</field>
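
To check the exporter's output outside the browser, the field elements can be read back with the JDK's DOM parser. The sketch below is an assumption-laden example, not part of the tool: the class name TimeXmlReader is hypothetical, and it assumes that the complete time.xml is well-formed XML with a single root element around the field elements listed above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists the name/value pairs of all field elements. */
class TimeXmlReader {

    /** Returns one "name=value" string per field element, in document order. */
    static List<String> fields(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("field");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                result.add(field.getAttribute("name") + "=" + field.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

Empty elements such as the HarvestedSource and Base fields in the listing come back with an empty value after the equals sign.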

JSHTMLWidget Extension

JSHTMLWidget is an extension that instantiates an HTML widget and includes the JS/CSS files it depends on. For security reasons, the HTML files must be placed in special directories. The JS and CSS files referenced from the resulting page must also reside in a special location (to prevent cross-site scripting attacks).

Use of JSHTMLWidget

<JSHTMLWidget htmlbody="body.html" jsincludes="file1.js|file2.js|...|filen.js" cssincludes="file1.css|file2.css|...|filen.css">

The above markup is expanded with the content of the file named in htmlbody (here “body.html”). The files listed in jsincludes and cssincludes are referenced in the <head> section of the rendered page.

  • Requirements

To include the specified JS and CSS files, the $wgAllowUserJs and $wgAllowUserCss variables must be set to true in LocalSettings.php (or whichever configuration file is used).

TaskList Widget

The TaskList widget uses the JSHTMLWidget extension. The widget uses the following files:

  • TaskMonitor/body.html – the HTML body of the widget;
  • TaskMonitor/refresh.js – the JavaScript file that sends AJAX requests to the server to update the status of the tasks;
  • TaskMonitor/taskmonitor.css – the CSS rules used by the HTML elements in the widget’s body;
  • it also uses the jQuery library for DOM manipulation and jTemplates for handling HTML templates in JavaScript.

The TaskList widget periodically sends an AJAX request to the server to update the status of the tasks displayed on the page.

Use of TaskList widget

<JSHTMLWidget htmlbody="TaskMonitor/body.html" 

Note: Special thanks to my friend Cristian Botau, who helped me develop this extension. --Lia Veja 12:05, 6 January 2010 (CET)

Installation of the online view of task management

The following steps are necessary to install the application:

  • on Debian, create the directory /usr/share/mediawiki/phase3/js2/Widgets/TaskMonitor to hold the /lib folder and the refresh.js and taskmonitor.css files, so that they can be accessed from the browser;
  • the JavaScript file refresh.js tries to access "/metawiki/js2/Widgets/TaskMonitor/time.xml". A symbolic link at this location gives it access to the time.xml file;
  • create the symbolic link (from within the TaskMonitor directory):
 ln -s /var/www/tools/FedoraIngestEngine/time.xml time.xml
  • the Java-based application reads the fedoralogs database and generates the time.xml file at the fixed location /var/www/tools/FedoraIngestEngine/time.xml;
  • we have no real explanation for it, but LocalSettings.php should be "touched" (saved again) after these steps:
touch /var/www/metawiki/LocalSettings.php