TTV-Tutorial: Web Data Collection – Part 2 (Data Mining)

In our previous post about Web Data Collection (Part 1), we explained how to set up a Template for data extraction when you already have a ready-made list of websites to pull specific information from. In this post we’re going to show how TTV can be used for data mining, asking our Workers to search the web for the data.
 
As the guiding example of a data mining task, we have a CSV (Comma-Separated Values) spreadsheet full of movie titles and their corresponding posters. You can see a sample of the CSV spreadsheet here: [Image: sample CSV]
We are looking to gather the following information for each movie title: production company, release year, director, genre(s), and MPAA rating.
 
Below you can find a link to the Template we created on our platform, as well as a screen capture of it:
 
Tip: This Template is available for cloning and further customization. Simply click the green “Clone” button, edit the Template, and save it under a title of your choice.
 
Template Preview (requires a Microworkers account login): Collect Information About Popular Movies
 
Screen capture: [Image]

____________________________________________________

 
Let’s now go through each part of the Template in turn.

  • Task Objectives, Task Instructions and Helpful Tip(s) Panel

To showcase the wide range of options Bootstrap styling offers, we experimented a bit more with panels and tables, placing them inline with the text. Sometimes you need to include a lot of guidance while still keeping your Template organized.
 
The topics below show the Template’s source code, with all vital parts marked in red.
 
Tip: Well-designed, well-organized Templates have proven to significantly increase Workers’ participation, as they make the task more attractive.


[Image: info panels]

<!-- Instructions -->
<!-- Task Objectives -->
<div class="panel panel-primary">
<div class="panel-heading">
<strong>Task Objectives:</strong></div>
<div class="panel-body">
<h5><strong>Help us collect specific information about the most popular movie titles of the past 40 years</strong>
</h5>
<p>&nbsp;
</p>
<!-- End Task Objectives -->
<!-- Task Instructions -->
<div class="panel panel-info">
<div class="panel-heading">
<strong>Task Instructions:</strong></div>
<table class="table">
<tbody>
<tr>
<td class="col-sm-6">

<ul>
<li>Search the web for the given movie title and collect the specific information we're asking for</li>
<li>Compare the poster image to make sure you've found the exact movie release</li>
<li>Enter as much information as you can find for the movie's &quot;Production Company&quot; and/or &quot;Genre(s)&quot;. If you can't find any, leave the fields/checkboxes <u>blank</u></li>
</ul>
</td>
<!-- End Task Instructions -->

<!-- Helpful Tip(s) -->
<td class="col-sm-6">
<div class="panel panel-warning">
<div class="panel-heading">
<strong>Helpful Tip(s):</strong></div>
<div class="panel-body">
<ul>
<li>To speed things up, you may look up the data on these websites: <a href="http://www.imdb.com/" target="_blank">IMDb</a>, <a href="http://www.rottentomatoes.com/" target="_blank">Rotten Tomatoes</a>, <a href="http://www.wikipedia.org" target="_blank">Wikipedia</a>, <a href="http://www.movies.com/" target="_blank">Movies.com</a></li>
</ul>
</div>
</div>
</td>
<!-- End Helpful Tip(s) -->
</tr>
</tbody>
</table>
</div>
</div>
</div>
<!-- End Instructions -->

(Coding Source: BS Panels, BS Tables, BS Grid System, BS List Groups, HTML a href Attribute)

  • Movie Title

To make the Template dynamic and give each Worker a unique movie title, we set the parameter ${movie_title}, which takes its values from the movie_title column of the CSV file mentioned at the beginning. (Read more about the CSV approach here.)
 
Tip: The option to attach a CSV to a Template is given during campaign setup. To ensure the CSV format is compliant, we recommend using “Download Sample CSV” once the Template is ready: the system will generate a blank CSV file with Template-matching column(s) for you to fill in with your values. The “Download Sample CSV” link can be found on the individual Template page once the Template is saved.
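If you prefer to fill in the CSV with a script rather than by hand, here is a minimal sketch using Python's standard csv module. It assumes the two column headers our Template uses, movie_title and poster_url; the titles and URLs are made-up placeholders, and in practice you would start from the “Download Sample CSV” file so the headers match your Template exactly.

```python
import csv
import io

# Column headers must match the Template's parameter names
# (${movie_title} and ${poster_url}), as in the generated sample CSV.
FIELDNAMES = ["movie_title", "poster_url"]

# Hypothetical rows: titles and URLs are placeholders for illustration only.
rows = [
    {"movie_title": "Example Movie One",
     "poster_url": "https://example.com/posters/one.jpg"},
    {"movie_title": "Example Movie, Part Two",
     "poster_url": "https://example.com/posters/two.jpg"},
]

def build_campaign_csv(records):
    """Return CSV text: a header row followed by one row per movie."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()

csv_text = build_campaign_csv(rows)
print(csv_text)

# To save it for upload, write the same text to a file, e.g.:
# with open("movies.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

Letting the csv module do the writing means a title that contains a comma, such as the second one, is quoted automatically instead of breaking the columns.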
 
[Image]

<!-- Csv Movie Title -->
<div class="well well-sm">
<p class="text-center">
<strong>Movie: </strong>
<mark>${movie_title}</mark></p>
</div>
<!-- End Csv Movie Title -->

(Coding Source: HTML p align Attribute, BS Wells)
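The platform performs the ${movie_title} substitution itself. Purely as an analogy, Python's string.Template happens to use the same ${name} placeholder syntax, so the sketch below models what Workers see: one rendered copy of the Movie Title well per CSV row. It is not Microworkers' implementation, and the sample titles are hypothetical.

```python
import csv
import html
import io
from string import Template

# The Movie Title snippet from the Template above; ${movie_title} is the CSV parameter.
TITLE_SNIPPET = Template(
    '<div class="well well-sm">\n'
    '<p class="text-center">\n'
    '<strong>Movie: </strong>\n'
    '<mark>${movie_title}</mark></p>\n'
    '</div>'
)

# Tiny stand-in for the campaign CSV (hypothetical titles, one per task).
CSV_TEXT = "movie_title\nExample Movie One\nExample & Sequel\n"

def render_tasks(csv_text):
    """Return one rendered snippet per CSV row (i.e. per Worker task)."""
    tasks = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape the value so characters such as "&" stay valid HTML.
        title = html.escape(row["movie_title"])
        tasks.append(TITLE_SNIPPET.substitute(movie_title=title))
    return tasks

for task_html in render_tasks(CSV_TEXT):
    print(task_html)
```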

  • Movie Poster

The only difference from the process described above is that, instead of text, we now need to load a dynamic parameter for an image. This is solved simply by placing the parameter ${poster_url} in the src attribute of an img tag (i.e. <img src="${poster_url}" />). The system then takes the values stored in the poster_url column of the CSV.
 

Visual Editor: Click Here

Tip: A CSV file may have many columns (separate variables). When several of them are used in the same Template, they are always filled in row by row, so each task shows values taken from the same CSV row.
 
[Image]

<!-- Csv Movie Poster -->
<div class="row">
<div class="col-sm-3"><img src="${poster_url}" />

</div>
<!-- End Csv Movie Poster -->

(Coding Source: BS Grid System, HTML IMG src Attribute)
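Continuing the same analogy (Python's string.Template standing in for the platform's own substitution), the sketch below first checks that every ${...} parameter in a Template fragment has a matching CSV column, then fills both ${movie_title} and ${poster_url} from a single row, so each title stays paired with its own poster. The fragment, helper names, and sample rows are our own illustrative assumptions.

```python
import csv
import html
import io
import re
from string import Template

# Simplified fragment using both CSV-driven parameters from the Template above.
FRAGMENT_TEXT = (
    '<mark>${movie_title}</mark>\n'
    '<div class="col-sm-3"><img src="${poster_url}" /></div>'
)

# Stand-in campaign CSV; the titles and URLs are placeholders.
CSV_TEXT = (
    "movie_title,poster_url\n"
    "Example Movie One,https://example.com/posters/one.jpg\n"
    "Example Movie Two,https://example.com/posters/two.jpg\n"
)

def placeholders(template_text):
    """Return the set of ${name} parameters that appear in the template text."""
    return set(re.findall(r"\$\{(\w+)\}", template_text))

def render_all(template_text, csv_text):
    """Render one task per CSV row after checking that no column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = placeholders(template_text) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    template = Template(template_text)
    # All placeholders in one task are filled from the same row,
    # so a title is always shown next to its own poster.
    return [
        template.substitute({key: html.escape(value) for key, value in row.items()})
        for row in reader
    ]

for task in render_all(FRAGMENT_TEXT, CSV_TEXT):
    print(task)
```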

  • Data Input

As in Web Data Collection (Part 1), we took advantage of Bootstrap's handy Table styling, but with some changes to how the input forms are aligned. Based on the type of information required, we used text fields for Production Company and Director, a dropdown for Release Year, checkboxes for Genre(s), and radio buttons for Rating.
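Typing out dozens of year options by hand is tedious and easy to get wrong. Assuming you are comfortable running a short Python script, a sketch like the following can generate the Release Year options (1975 to 2015, the range our Template uses) for pasting into the dropdown's HTML; the function name is our own.

```python
def year_options(first_year, last_year):
    """Build one <option> line per year, including both end years."""
    return "\n".join(
        f'<option value="{year}">{year}</option>'
        for year in range(first_year, last_year + 1)
    )

options_html = year_options(1975, 2015)

# Wrap the options in the same select element the Template uses.
select_html = (
    '<select class="form-control" name="year_released">\n'
    + options_html
    + "\n</select>"
)
print(select_html)
```

Changing the two arguments is all it takes to cover a different range of years.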

Visual Editor (“Text Fields”): Click Here

Visual Editor (“Dropdown”): Click Here

Visual Editor (“Checkboxes”): Click Here

Visual Editor (“Radio Buttons”): Click Here

 
Tip: Each question form needs a proper name and/or value when you insert it. Otherwise, your collected results could easily overlap. Keep in mind that the generated CSV results are listed under the names you give (see the CSV example at the bottom of this post).
 
[Image: info panels]

<!-- Collected Data -->
<div class="col-sm-9">
<div class="panel panel-success">
<div class="panel-heading">
<strong>Collected Data:</strong></div>
<div class="panel-body">
<!-- Production Company -->
<div class="row">
<div class="col-sm-3">
<p class="form-control-static">
<strong>Production Company:</strong>
</p>
</div>
<div class="col-sm-3">
<div class="form-group">
<input class="form-control input-sm" name="production_co" placeholder="Production Co:" type="text" />

</div>
</div>
<div class="col-sm-3">
<div class="form-group">
<input class="form-control input-sm" name="production_co_2" placeholder="Production Co:" type="text" />

</div>
</div>
<div class="col-sm-3">
<div class="form-group">
<input class="form-control input-sm" name="production_co_3" placeholder="Production Co:" type="text" />

</div>
</div>
</div>
<!-- End Production Company -->
<!-- Release Year -->
<div class="row">
<div class="col-sm-3">
<p class="form-control-static">
<strong>Release Year:</strong>
</p>
</div>
<div class="col-sm-3">
<div class="form-group">
<select class="form-control" name="year_released">
<option value="1975">1975</option>
<option value="1976">1976</option>
<option value="1977">1977</option>
<option value="1978">1978</option>
<option value="1979">1979</option>
<option value="1980">1980</option>
<option value="1981">1981</option>
<option value="1982">1982</option>
<option value="1983">1983</option>
<option value="1984">1984</option>
<option value="1985">1985</option>
<option value="1986">1986</option>
<option value="1987">1987</option>
<option value="1988">1988</option>
<option value="1989">1989</option>
<option value="1990">1990</option>
<option value="1991">1991</option>
<option value="1992">1992</option>
<option value="1993">1993</option>
<option value="1994">1994</option>
<option value="1995">1995</option>
<option value="1996">1996</option>
<option value="1997">1997</option>
<option value="1998">1998</option>
<option value="1999">1999</option>
<option value="2000">2000</option>
<option value="2001">2001</option>
<option value="2002">2002</option>
<option value="2003">2003</option>
<option value="2004">2004</option>
<option value="2005">2005</option>
<option value="2006">2006</option>
<option value="2007">2007</option>
<option value="2008">2008</option>
<option value="2009">2009</option>
<option value="2010">2010</option>
<option value="2011">2011</option>
<option value="2012">2012</option>
<option value="2013">2013</option>
<option value="2014">2014</option>
<option value="2015">2015</option>
</select>
</div>
</div>
</div>
<!-- End Release Year -->
<!-- Director -->
<div class="row">
<div class="col-sm-3">
<p class="form-control-static">
<strong>Director:</strong>
</p>
</div>
<div class="col-sm-9">
<div class="form-group">
<input class="form-control input-sm" name="director" placeholder="Directed by" type="text" />

</div>
</div>
</div>
<!-- End Director -->
<!-- Genre(s) -->
<div class="row">
<div class="col-sm-3">
<p class="form-control-static">
<strong>Genre(s):</strong>
</p>
</div>
<div class="col-sm-3">
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Sci-Fi" />Sci-Fi
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Thriller" />Thriller
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Comedy" />Comedy
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Adventure" />Adventure
</label>
</div>
</div>
<div class="col-sm-3">
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Family" />Family
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Mystery" />Mystery
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Drama" />Drama
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Animation" />Animation
</label>
</div>
</div>
<div class="col-sm-3">
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Fantasy" />Fantasy
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Crime" />Crime
</label>
</div>
<div class="checkbox"><label>
<input name="genre" type="checkbox" value="Romance" />Romance
</label>
</div>
</div>
</div>
<!-- End Genre(s) -->
<!-- Rating -->
<div class="row">
<div class="col-sm-3">
<p class="form-control-static">
<strong>Rating
<abbr title="Motion Picture Association of America's (MPAA) film-rating system"> (MPAA):
</abbr></strong>
</p>
</div>
<div class="col-sm-9">
<div class="form-group"><label class="radio-inline">
<input name="rating" type="radio" value="g" />
&nbsp;<u>G</u>
</label> <label class="radio-inline">
<input name="rating" type="radio" value="pg" />&nbsp;<u>PG</u></label> <label class="radio-inline">
<input name="rating" type="radio" value="pg-13" />
&nbsp;<u>PG-13</u></label> <label class="radio-inline">
<input name="rating" type="radio" value="r" />
&nbsp;<u>R</u></label>
</div>
</div>
</div>
<!-- End Rating -->
</div>
</div>
</div>
</div>
<!-- End Collected Data -->

(Coding Source: BS Panels, BS Grids, BS Form Inputs, BS Labels, HTML abbr tag)
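To see why distinct name attributes matter, recall how a browser encodes an HTML form: each answered field becomes a name=value pair, and fields that share a name are grouped together. The sketch below decodes a hypothetical submission of the Collected Data form with Python's urllib.parse.parse_qs. It illustrates standard HTML form encoding only; how Microworkers turns the answers into CSV columns is not modeled here.

```python
from urllib.parse import parse_qs

# Hypothetical body a browser would send for the Collected Data form
# (application/x-www-form-urlencoded; "+" encodes a space).
submission = (
    "production_co=Studio+A&production_co_2=Studio+B&production_co_3="
    "&year_released=1994&director=Jane+Doe"
    "&genre=Drama&genre=Crime&rating=r"
)

# keep_blank_values=True keeps the empty third Production Company field.
answers = parse_qs(submission, keep_blank_values=True)
for name, values in answers.items():
    print(f"{name}: {values}")

# If two different questions accidentally shared one name (here "info"),
# their answers would be lumped under a single key and could no longer
# be told apart: which value is the company and which is the director?
clash = parse_qs("info=Studio+A&info=Jane+Doe")
print(clash)
```

The distinct names production_co, production_co_2, and production_co_3 keep the three company fields apart, while the shared name genre deliberately gathers every ticked checkbox into one list.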
 
 
____________________________________________________
 
 
Once your Template is ready and saved, you need to configure the behavior of its Question(s) by ticking or unticking the checkboxes beside each question your Template includes.
 
[Image]
 
When the “Required” option is ticked, that question becomes mandatory in the task.
 
Tip: Due to the nature of Checkbox elements, questions of this type come up unticked by default.
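One standard HTML behavior is worth keeping in mind here: a checkbox that is left unticked, or a radio group with nothing selected, is simply not sent with the form, so a Genre(s) group with nothing ticked produces no genre value at all. The sketch below is our own helper, not a Microworkers feature; it checks a decoded submission against a hypothetical set of questions marked as Required.

```python
from urllib.parse import parse_qs

# Hypothetical choice of questions marked "Required" in the campaign setup.
REQUIRED = {"production_co", "year_released", "director", "rating"}

def missing_required(submission, required=REQUIRED):
    """Return the required question names that have no non-empty answer."""
    answers = parse_qs(submission, keep_blank_values=True)
    return sorted(
        name for name in required
        if not any(value.strip() for value in answers.get(name, []))
    )

# No genre is ticked, so "genre" does not appear in the submission at all,
# and no Rating radio button was selected either.
submission = "production_co=Studio+A&year_released=1994&director=Jane+Doe"
print(missing_required(submission))
```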
____________________________________________________
 

If you need to download your campaign reports, you can do so from the individual campaign page by clicking ‘Results in CSV’. When you export a campaign report, we generate a spreadsheet with all of its relevant information (take a look at the improvised example below).
 
[Image]
 
 
Example of CSV with results:
 
[Image]
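Once downloaded, the results file can be processed in any spreadsheet tool or with a short script. The sketch below tallies MPAA ratings and genres with Python's collections.Counter. It assumes, for illustration only, that the report has one column per question name (production_co, year_released, director, genre, rating) and that several ticked genres share one cell separated by commas; check your actual export, since the exact layout of ‘Results in CSV’ may differ.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: how several ticked genres are stored in one cell.
GENRE_SEPARATOR = ","

# Stand-in for a downloaded results file (made-up rows, assumed layout).
RESULTS_CSV = (
    "movie_title,production_co,year_released,director,genre,rating\n"
    'Example Movie One,Studio A,1994,Jane Doe,"Drama,Crime",r\n'
    'Example Movie Two,Studio B,2001,John Roe,"Comedy,Family",pg\n'
    "Example Movie Three,Studio A,2010,Jane Doe,Drama,pg-13\n"
)

def summarize(csv_text):
    """Count how often each rating and each genre appears in the results."""
    ratings = Counter()
    genres = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["rating"]:
            ratings[row["rating"].upper()] += 1
        for genre in row["genre"].split(GENRE_SEPARATOR):
            if genre.strip():
                genres[genre.strip()] += 1
    return ratings, genres

ratings, genres = summarize(RESULTS_CSV)
print(ratings.most_common())
print(genres.most_common())
```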
 




If you need any help with your TTV projects, please don't hesitate to send us a message. We always enjoy helping you out.
 
Until the next article, stay tuned!



You might also be interested in these TTV Tutorial Articles:
 
TTV-Tutorial: Data Extraction
TTV-Tutorial: Template With Images
TTV-Tutorial: Transcribe Data From an Image
TTV Tutorial: Embedding Videos In Template
