Personal DH Methods Blog | A blog by Anderson Evans

London

By Anderson Evans | Published: April 1, 2013

Just before spring break I went to London (great timing, I know). While I was there I took a lot of pictures and made a webpage to display them. I used JavaScript and some basic HTML/CSS to build the page. These are not necessarily the DH methods we have been looking at, but I think they are relevant to the DH crowd. I contend that basic HTML, whether the task is simple or hard, should be practiced regularly. If one wants real control over how one’s DH work is displayed, CMSes (while time-saving) are not always the answer.

London Photos:

http://www.weekendpublisher.com/LondonPhotos/photos2.html

The JavaScript slider I used for the display can be found at the link below. It is a great slider, and if you are using an iPad or other tablet you can easily view the display full screen:

http://www.scriptiny.com/2008/12/javascript-slideshow/

This week’s assigned tasks centered on further exploration of Spatial Analysis. I’m hoping that the GeoSoftware we examined, along with a return to the mapping software we used earlier in the semester, can get me to a point where I am comfortable using my higher-res copies of the photographs. I’m certain these techniques will lend themselves to building a more dynamic website that shows geographically where each of the photographed locations can be found.

Posted in Uncategorized | Comments closed

Cruel Java

By Anderson Evans | Published: March 8, 2013

Heaps of praise are not being thrown at any part of the Java programming language lately. See, for instance, these recent articles:

What Is Java, Is It Insecure, and Should I Use It? [Lifehacker]
Thanks, Oracle: New Java malware protection undone by old-school attack [Ars Technica]
Google search hijack may be a symptom of bad Java [Blog.Chron.Com]

And these are just a few cherry-picked examples. No, Java is not in the good graces of tech bloggers for the time being, and I, too, have been struggling with issues I attribute to the language.

I have been trying to up my skillset in relation to XML and XSLT, and in reading the book XSLT 2.0 and XPath 2.0 Programmer’s Reference by Michael Kay, I found myself in my own nightmarish run-in with Java. Early in the book the author describes the software he will be using for the exercises, and one suggested piece of software, Kernow (a graphical interface for XML/XSLT transformations), needs the Java Runtime to operate. I did not think this would be a problem, as several weeks ago I struggled for an entire weekend to optimize Java on this machine to run the data-vis program Mondrian.

I thought my installation of Kernow had been an immediate success, but this was not so. I had no problem getting the introductory splash graphic to load, but the program never moved beyond this static image. After getting up close and personal with the command line for a while, cutting and pasting specific errors into Google, it seems my headaches are most properly attributed to villainous Java: thanks to its hodgepodge of security flaws, many of its manifestations have been yanked out of the online software libraries, and the often-touted alternative, “OpenJDK” with the “IcedTea” plugin, which is running on my machine, seems incompatible with the Kernow software.

Once I realized this, I began what is so far a fruitless attempt to change from OpenJDK to “Sun Java” (Oracle Java? Commercial Java.), and while I have no problem finding resources that address this need, these resources have not yet proved at all helpful.

EDIT: I was finally able to run the program after using the “Easy Way” method from MakeTechEasier.com. It seems OpenJDK was definitely the problem when attempting to run Kernow.

EDIT(2): This just takes the whole “Java is bad” credo and pushes it to the next level: http://java-0day.com/ (thanks to http://labs.untropy.net/ for the link!)

Posted in XML | Tagged java, Kernow, Michael Kay, OpenJDK, Sun JDK, XML, XSLT | Comments closed

HelloWorld.XML + XSLT + SAXON XSLT Processor = HelloWorld.HTML

By Anderson Evans | Published: March 2, 2013

I have logged a lot of hours this semester trying to learn some things that make the concepts we’ve been discussing in our Digital Humanities classes somewhat practical. Save for the few computer science people peppered through the MALS roster, I have little doubt these steps seem dubious to the bulk of the class. Last week I attempted to write a rather hefty blog post on my initial experience with XML and XSLT, but for some reason the post would not display correctly, and it remains in draft form. I hope to fix this issue at some point.

A simple XSLT stylesheet in oXygen 14.2

This week I attempted something more involved than strictly adding XSLT styling to a small sample of a dataset encoded with XML: I turned an XML file into an HTML file using an XSLT processor. What I found most difficult about this process was the combination of optimizing Java on my OS X 10.8.2 machine and using this machine’s command line. After a long day of trying to get it right, I was finally able to generate a simple “Hello, World!” HTML file.
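For readers who want to see the shape of that transformation without installing an XSLT processor, here is a minimal sketch in plain Python. It is not XSLT and it is not what SAXON does internally; it is a hand-written stand-in that performs the same kind of XML-in, HTML-out step. The `<greeting>` element, the `HELLO_XML` constant, and the `greeting_to_html` function are assumptions invented for illustration, since the actual files are not reproduced in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the real HelloWorld.XML file is not shown in the post.
HELLO_XML = "<greeting>Hello, World!</greeting>"


def greeting_to_html(xml_text: str) -> str:
    """Wrap the text of the root element in a minimal HTML page."""
    root = ET.fromstring(xml_text)
    message = (root.text or "").strip()

    # Build the output tree: <html><body><h1>...</h1></body></html>
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    heading = ET.SubElement(body, "h1")
    heading.text = message

    # method="html" serializes with HTML rules instead of XML rules.
    return ET.tostring(html, encoding="unicode", method="html")


if __name__ == "__main__":
    print(greeting_to_html(HELLO_XML))
```

An XSLT stylesheet expresses the same mapping declaratively, as templates matched against the input, which is what makes it reusable across many files in a way this hard-coded function is not.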

This image via http://today.java.net/images/2004/05/figure2.gif looks relevant enough to me!

I am currently trying to slog through an O’Reilly book on XSLT, and it suggests I utilize an XSLT processor called XALAN, which seems to have been left behind sometime in 2007. I then realized the book I was reading has a 2001 publication date, so I worry I might be learning a slightly outmoded way of doing things. While I was not able to get an archived version of XALAN up and running, I was able to configure another open source XSLT processor called SAXON to run via the terminal.

SAXON running via Terminal… It may be super tiny, but at least it isn’t Comic Sans, amIright?

I am now going to attempt to take a dataset that is significantly larger than “Hello, World!” and use XSLT to create a dataset.

Now, while I have explained in a few short paragraphs what took me 6 to 8 hours, the process was far from smooth. Anyone else who is dipping their toes into these waters without a lot of swimming lessons will probably find that many of the resources they turn up seem outdated. XML seems to be something many programmers are currently trying to move beyond. Twitter’s API (Application Programming Interface) only recently went from offering its data in both XML and JSON formats to strictly JSON. JSON is a format you start seeing across the board when it comes to practical uses of APIs; however, in the Digital Humanities realm, this is not the case… or is it?

Once I started seeing the correlation between XML’s vanishing traces and JSON’s sweep, I began searching specifically for JSON and TEI together, and found some interesting thoughts on this subject, such as:

JSON is a format that stands in relative opposition to TEI. Since TEI can be reduced to XML[3], which is arguably a very strong format across many kinds of computing, it is no wonder that an XML based format would be a natural choice for marking up texts. However, JSON can trace its roots back to Javascript and, ultimately, conventions used by the entire C-family of programming languages.[4] In this, JSON is arguably more universal than XML. via Strange Bedfellows
Unfortunately, XML is not well suited to data-interchange, much as a wrench is not well-suited to driving nails. It carries a lot of baggage, and it doesn’t match the data model of most programming languages. When most programmers saw XML for the first time, they were shocked at how ugly and inefficient it was. It turns out that that first reaction was the correct one. There is another text notation that has all of the advantages of XML, but is much better suited to data-interchange. That notation is JSON. via JSON.org
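To make the data-model point in these quotes a little more concrete, here is a small Python sketch that reads one record from XML and writes it back out as JSON, using only the standard library. The `<book>` record, the `BOOK_XML` constant, and the `element_to_dict` helper are hypothetical and invented for illustration; none of them come from the projects or sources mentioned in this post.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, made up for this example only.
BOOK_XML = """<book id="1">
  <title>Frankenstein</title>
  <author>Mary Shelley</author>
  <year>1818</year>
</book>"""


def element_to_dict(elem: ET.Element) -> dict:
    """Flatten one element: its attributes plus the text of each child."""
    record = dict(elem.attrib)
    for child in elem:
        record[child.tag] = (child.text or "").strip()
    return record


if __name__ == "__main__":
    book = ET.fromstring(BOOK_XML)
    print(json.dumps(element_to_dict(book), indent=2))
```

The flattening only works because this record is shallow and has no mixed content. Marked-up prose of the kind TEI handles, with tags nested inside running text, does not reduce to a dictionary this neatly, which may be one reason DH projects keep XML around.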

Yet, all of the DH projects I found used JSON alongside XML, rather than as an alternative to it:

William Godwin’s Diary via Oxford University
Concordia – Online Resource for Ancient Studies via King’s College London

That Godwin/Wollstonecraft/Shelley family sure could write a damn good novel.

I’ve also noticed that both in the DH Methods class and in Lev Manovich’s Big Data class I am having to use Java applications for necessary software (Mondrian and now XSLT processors). I’d like to talk more about Java, though I do not get the impression that it is something I should be diving into just because it runs specific applications that I need to do other things.

One final thought: I found THIS MESSAGEBOARD discussing the possibility of an “XSLT Zen Garden,” which I would love to see. In fact I’d love to see a TON of Zen gardens; the CSS Zen Garden was largely responsible for me learning CSS almost a decade ago. Definitely check that out if you haven’t seen it. Would such a thing be possible with XSLT?

Posted in XML | Tagged java, json, mondrian, oxygen, saxon, XML, XSLT | Comments closed

Basic XML Exploration and Thoughts on Moving Beyond

By Anderson Evans | Published: February 21, 2013

I’ve been practicing with XML and XSLT stylesheets. Using this list from the LA Times I’ve been able to create a very basic data set complete with book covers. I believe this is a very good first step toward being able to create something engaging with a relevant data set. I’m not sold on this theme. I believe the difficulties inherent in trying to do textual analysis of postmodern prose with all of its experimental punctuation would prove more tiring than beneficial. It is certainly the kind of data set that is worthy of further investigation at some point, but not for me, not right now.

Figure 1: XML markup

Figure 2: XML file after XSLT styling implemented

I’ve been watching the paradigm-shifting House of Cards on Netflix, and there are several aspects of the show I find interesting and prescient. The series follows a high-ranking US government politician acting as a fourth-wall-breaking antihero in a black comedy about government, a play on a trope that goes back at least as far as Robert Penn Warren’s All the King’s Men and perhaps even as far back as Shakespeare’s Richard III. How do dominant cultural literature and its audio-visual offspring inform our cultural understanding of government and politics? We often analyze the 24-hour news cycle, but is there something perhaps even more telling in our more abstracted forms of mass expression? How “real” are our political figures in this country, and how much are they disembodied icons we can only understand in the capacity of encoded media communication? And while our metanarrative of around-the-clock “news” cycles pushes on, we often neglect the study of cinema’s commentary on political theater.

Figure 3: House of Cards Promo Poster

And I do think I should emphasize cinema here, because House of Cards walks the line between a 13-hour feature film and a television series in a way that is completely new, and as the most watched show on Netflix, its performance far exceeds what must have been expected. The success of this new embodiment of long-form narrative insists that the medium has a message worth considering: this new rapidity of content delivery makes one question even more what is real and what is unreal. These are the pseudo-conclusions I have after initial close readings/viewings of the material, and I believe a kind of digitally mediated distant reading supplemented with computational analysis and more in-depth close reading might bring certain trends to light, in both fictional worlds and this other world that professes to be tangible.

NOTE:

The assigned text An Even Gentler Introduction to XML was helpful in my initial XML exploration

I was able to move into even more constructive territory after viewing this XML video tutorial:

Posted in XML | Tagged All the King's Men, House of Cards, postmodern novels, Richard III, XML, XSLT | Comments closed

NES Instruction Manuals

By Anderson Evans | Published: February 10, 2013

For experimental use of OCR software and text encoding I have decided to switch from pulling text out of comic books (which has proven difficult) to pulling it from databases of video game instruction manuals, specifically for the NES. There is still a lot of cleanup to do, but the issues are far less taxing than trying to pull from Uncle Scrooge.

Example from Adventure Island Operation Manual (NA release 1988)

THE STORY & CONTROLLER Hudson’s Adventure Island Game Story: The Evil Witch Doctor has kidnapped Princess Leilani from Master Higgins and taken her to Adventure Island in the South Pacific. It is your mission to help Master Higgins to save Princess Leilani, but it’s not going to be easy. On the island, there are forests, mountains, caves, many enemy characters, and traps waiting for you . .Can Master Higgins save Princess Leilani from the Evil Witch Doctor? CONTROL FUNCTIONS: Energy Level: Master Higgins’ energy level is displayed•i n top of the screen. The level will decrease as the time passes by. By catching the fruits on the screen you will be able to add a little bit of energy back or get a bottle of milk to fill them all up. Of course, loosing all energy means the game is over.
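The cleanup mentioned above could start with a few conservative regular-expression passes. The sketch below is one possible first pass, not a finished pipeline: the `tidy_ocr` function and the `NOISE_CHARS` list are assumptions based only on the glitches visible in this one excerpt.

```python
import re

# Marks that appear as OCR noise in the excerpt above (an assumed, partial list).
NOISE_CHARS = "•·"


def tidy_ocr(text: str) -> str:
    """Apply a few conservative cleanup passes to OCR output."""
    # Replace stray bullet-like marks with a space.
    text = re.sub(f"[{re.escape(NOISE_CHARS)}]", " ", text)
    # Collapse runs of periods, with or without spaces between them, to one.
    # Note: this also flattens a deliberate ellipsis ("...") to a single period.
    text = re.sub(r"\.(\s*\.)+", ".", text)
    # Remove whitespace that landed in front of punctuation.
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    # Put a space after a period that runs straight into a capital letter.
    # Note: this would also split abbreviations such as "U.S.A.".
    text = re.sub(r"\.(?=[A-Z])", ". ", text)
    # Collapse repeated whitespace.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


if __name__ == "__main__":
    print(tidy_ocr("traps waiting for you . .Can Master Higgins save her?"))
```

Passes like these fix spacing and stray marks, but they cannot rejoin a word the OCR split apart (“i n”) or correct a misspelling that may already be in the printed manual (“loosing”). Those need a dictionary check or a human reader.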

I think this content could serve as an entertaining dataset, and is ripe for an aesthetically pleasing set of visualizations.

Posted in Uncategorized | Tagged manual, manuals, NES, Nintendo, OCR, text extraction | Comments closed

Uncle Scrooge Experiment Session 2

By Anderson Evans | Published: February 9, 2013

Uncle Scrooge #89

As far as data sets go, Comichron.com has a detailed table presenting data on Uncle Scrooge comics distributed from 1960 to 1998. This table can be found here: http://www.comichron.com/titlespotlights/unclescrooge.html

I cut and pasted the table into Data Wrangler. The issue I found was that the amount of information seemed to overload the application. A table sprang up only for the data from 1960 to 1965, and I am still having issues moving the row that displays the year to the correct position, as pictured here:

Uncle Scrooge comics data table
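The year-row problem can also be handled outside Data Wrangler with a short script. The sketch below is a plain-Python alternative, not a Data Wrangler feature, and it assumes the pasted table arrives as a list of rows in which each year sits alone on its own row above the issues it covers. The `attach_years` function and the sample rows in the usage example are invented placeholders, not figures from Comichron.

```python
import re

# A row counts as a "year row" if its only cell is a four-digit year.
YEAR_ROW = re.compile(r"^(19|20)\d{2}$")


def attach_years(rows):
    """Turn standalone year rows into a leading year column on each data row."""
    current_year = None
    tidy = []
    for row in rows:
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue  # skip blank rows
        if len(cells) == 1 and YEAR_ROW.match(cells[0]):
            current_year = cells[0]  # remember the year, emit nothing
            continue
        tidy.append([current_year, *cells])
    return tidy


if __name__ == "__main__":
    # Placeholder rows, invented for illustration only.
    sample = [["1960"], ["issue-a", "100"], ["issue-b", "200"],
              ["1961"], ["issue-c", "300"]]
    for line in attach_years(sample):
        print(line)
```

Any data row that appears before the first year row gets `None` in the year column, which makes a misplaced header easy to spot afterward.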

Posted in Uncategorized | Tagged OCR, text extraction, Uncle Scrooge | Comments closed

Uncle Scrooge Experiment Session 1

By Anderson Evans | Published: February 9, 2013

Uncle Scrooge Corpus Project
Session 1

Uncle Scrooge #89

So my first attempt at text extraction was to take a very old Uncle Scrooge comic book, and use the OCR tools in Adobe Acrobat X. I am not completely unfamiliar with this process, as I had to use this software at a former job. I have experience with turning typewritten manuscripts into editable PDF files. I was curious as to how well text extraction would work with a comic. My initial results are aggravating.

Page 1:
The software was unable to read more than half of the text. It extracted the first speech bubble and the metadata at the bottom of the page:

1101..0 YOUR IIORS5
OONM.O l WWERE ‘
ARE ‘tOO GOING IN SUCH A HURRY l

iJSTMASTER• Please send nolfce on Form 3579 to Western Publis hing Company, Inc., orth Road, Poughkeepsie, New York 12602
? .
:·s ey UNelE SCROOGE, No. 89, October, 1970. Published bi-monthly by Western P blishlng Company, Inc.,- North Road, Poughkeep
: York 12602. Second-class postage paid at Poughkeepsie, New York. Subscr ipt ion pr ic e in t he U.S.A. $1.00 per year; foreign subscrip
s: 5 to be repr
• ·-:Jt permission of Walt Disney Productions. Authorized edit ion. Printed in U.S.A.
– 1870, 185&, by Walt Disney Productions.
GOLD KEY & DESIGN is a Trademark of Western Publis hing Company, Inc.
CHANGE OF ADDRESS should reach us six weeks in advance of the next issue date. Give both your old and
‘ · . new addnu enclosinif possible your old address label.
” • • :al may not .be sold except by authorized dealers and is sold subject to the conditions that it shall not be sold or distribute
rt of its· cover or markinis removed, nor in a mutilated ondition nor affixed to nor as part of any advertising, literary

PAGE 2
I had the same problem here. For some reason the software is picking up only the first speech bubble. I imagine there is something in the settings I can alter to fix the issue…

I WISH ‘t’OU ‘D BE LI KE lt\E, OONALOl I NEVER WASTE MONeY OR
. ANYTHING l

FAIR USE NOTICE: This study utilizes copyrighted material the use of which has not been specifically authorized by the copyright owner. I am making my study available in an effort to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. I believe this constitutes a ‘fair use’ of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for research and educational purposes. For more information go to: http://www.law.cornell.edu/uscode/17/107.shtml.

If you wish to use copyrighted material from this site for purposes of your own that go beyond ‘fair use’, you must obtain permission from the copyright owner.

Posted in Uncategorized | Tagged information extraction, OCR, text extraction, Uncle Scrooge | Comments closed

Goals:

By Anderson Evans | Published: January 31, 2013

Interested in exploring the following tools/methods:

Hoping to get more familiar with practical API use
Interested in how the move from HTML4 to HTML5/CSS3 changes what the web can offer for Humanities content curation
Learning to use GitHub in a more professional fashion

My personal interest, as far as my research goes, is in digital narrative and the digitization of narrative. I’ve been exploring programming/coding-themed MOOCs and FPS game engines in hopes of learning skills that would allow a more immersive relationship between humanities-based academics and higher-level content virtualization.

Links:

Walden Pond Virtualization from USC: http://cinema.usc.edu/interactive/research/walden.cfm
Unity3D Hadrian’s Villa: http://idialab.org/nsf-funded-virtual-simulation-of-hadrians-villa
10Print book from MIT: http://lab.softwarestudies.com/2012/11/10-print-book-from-mit-software-studies.html

Posted in Uncategorized | Comments closed
