Thursday, 30 April 2015

Semantic Web for the Working Ontologist: chapter 4

And… chapter 4 is concerned with Semantic Web application architecture.

It describes the different components of Semantic Web applications, and their use: the eponymous Working Ontologist needs to understand how applications are structured in order to create useful, useable, models.

The components are
RDF Parsers and Serializers
Parsers translate text written in N-Triples, Turtle or RDF/XML into "triples in the RDF data model".
Serializers do the reverse.

Incidentally, if you parse from a text file, and then reserialise (I have switched to using "s" instead of "z" here because I'm not quoting from the book and I'm English) you won't necessarily get an identical output file…
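To see why, here's a toy sketch (nothing like a real RDF parser — just my own few-line illustration, using the book's example qnames): once the text is parsed into triples, the surrounding whitespace, ordering and comments are gone, so reserialising produces a valid but different file.

```python
# Toy illustration of why parse -> reserialise needn't reproduce the
# input byte-for-byte: the parser keeps only the triples, not the
# whitespace, ordering or comments around them.

def parse(text):
    """Parse simplified N-Triples-style lines into a set of triples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines vanish here
        s, p, o = line.rstrip(" .").split(None, 2)
        triples.add((s, p, o))
    return triples

def serialise(triples):
    """Write the triples back out, sorted: a valid but different file."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))

original = """# my data
lit:Shakespeare   lit:wrote    lit:KingLear .
geo:Scotland geo:PartOf geo:UK .
"""
round_tripped = serialise(parse(original))
print(round_tripped)
# Same triples as the input, but the comment, ordering and extra
# spacing have not survived the round trip.
```

The triples themselves survive intact; only the presentation changes.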

RDF Stores
Enhanced databases: as well as storing/sorting data, they also "merge information from multiple data sources".

RDF Query Engines
Enable the retrieval of information from an RDF store "according to structured queries".

For the uninitiated (ie: me), a query language is simply a programming language designed for retrieving information from databases. And, the RDF query language that W3C is standardising is called SPARQL. Which is the subject of chapter 5. But, tantalisingly, the authors reveal that it "includes a protocol for communicating queries and results so that a query engine can act as a web service". My guess is that means it enables human beings to read the results: hopefully when I read the next chapter I'll learn if my attempt to serialise that sentence was successful or not…

What it definitely means is that the results provide "another source of data for the semantic web."

"An application has some work that it performs with the data it processes … using some programming language that accesses the RDF store via queries (processed with the RDF query engine)."

I also perform some work with the data I process. That makes me almost, but not quite, an application.
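A minimal sketch of what that store-plus-query-engine pairing does (SPARQL itself waits until chapter 5 — this toy pattern-matcher of mine just mimics the idea of a "structured query"): a pattern containing variables is matched against the triples in the store.

```python
# A toy query engine: match a pattern containing variables (marked
# with a leading "?") against a store of triples. The qnames are the
# book's examples; the matching code is just an illustrative sketch.

store = {
    ("lit:Shakespeare", "lit:wrote", "lit:KingLear"),
    ("lit:Shakespeare", "lit:wrote", "lit:Hamlet"),
    ("bio:AnneHathaway", "bio:married", "lit:Shakespeare"),
}

def query(store, pattern):
    """Return one {variable: value} binding per matching triple."""
    results = []
    for triple in store:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                binding[want] = got     # variable: bind it
            elif want != got:
                break                   # constant mismatch: no match
        else:
            results.append(binding)
    return results

# "What did Shakespeare write?"
for b in query(store, ("lit:Shakespeare", "lit:wrote", "?work")):
    print(b["?work"])
```

Running the query yields a binding for each of lit:KingLear and lit:Hamlet; the AnneHathaway triple fails the pattern and is skipped.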

Anyway, the authors list the following "typical RDF applications":
  • calendar/map integration, enabling information from several different people's diaries/"points of interest gathered from different web sites" to be displayed in one place
  • annotation, allowing a variety of users to tag information with keywords that have URIs
  • content indexing of resources available in various places/stores
Most RDF systems also include a converter, which allows the system to access data – like spreadsheets, relational databases and HTML – that isn't stored in a serialised form, but is readily convertible into something that can be read by a parser.

Microformats & RDFa
Microformats are ways of tagging commonly used web-page items (eg: events, business cards), so that they can be used by RDF stores. The problem with them is that they need a controlled vocabulary and a dedicated parser.

RDFa is designed to get round that issue by using "the attribute tags in HTML to embed information that can be parsed into RDF". This is good because
  1. it's easier to extract the RDF data from pages marked up for this purpose
  2. it allows the content author "to express the intended meaning of a web page inside the web page itself. This ensures that the RDF data in the document matches the intended meaning of the document itself."
And… then it's nearly the end of the chapter. But first there's an important concept to understand. Happily – assuming I've got it right – this one is fairly intuitive. It's data federation. In other words, information from a range of different sources – "spreadsheets and XML, database tables and web pages" – is converted/parsed into triples and merged, so that "all queries and inferences" work on the merged data, rather than on the original sources.
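Or, as a sketch (my own, with made-up place names in the book's geo: style): rows from a spreadsheet-like source get converted into triples, merged with triples that arrived as RDF, and any query then sees one dataset.

```python
# Data federation in miniature: convert a non-RDF source into triples,
# merge with existing triples, and query the merged set - not the
# original sources. The qnames are illustrative, not a real vocabulary.

spreadsheet = [
    {"place": "geo:Stratford", "partOf": "geo:England"},
    {"place": "geo:Scotland", "partOf": "geo:UK"},
]

def rows_to_triples(rows):
    """Each spreadsheet row becomes one subject-predicate-object triple."""
    return {(row["place"], "geo:PartOf", row["partOf"]) for row in rows}

already_rdf = {("geo:England", "geo:PartOf", "geo:UK")}

merged = rows_to_triples(spreadsheet) | already_rdf  # set union = merging

# Queries and inferences now work on the merged data:
uk_parts = {s for (s, p, o) in merged if p == "geo:PartOf" and o == "geo:UK"}
print(sorted(uk_parts))  # ['geo:England', 'geo:Scotland']
```

Once merged, it no longer matters which triple came from the spreadsheet and which came in as RDF.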

Nice to end a chapter with the sense that I actually understand something… Even though we're not in Kansas any more…

And - urk! Just noticed that chapter 5 is almost as long as chapters 1-4 put together. I may need to b

How reading this book makes me feel.

Thursday, 23 April 2015

Semantic Web for the Working Ontologist: chapter 3

Now I'm onto "RDF—The Basis of the Semantic Web"

It's when it starts getting down to this sort of nitty-gritty stuff and I have to internalise new meanings for familiar (or semi-familiar) words that I feel in need of intellectual flotation aids. But I'm gonna dive straight in and hope to avoid a belly flop and every other variety of lousy extended metaphor.

Chapter 3 starts with an explanation of what RDF – the Resource Description Framework – is for, which is "managing distributed data", so that anyone is able not only "to make a statement about any entity", but also to "specify any property of an entity". An "entity" is known, in this context, as a resource.

RDF is designed to enable that to happen by setting rules for how the properties of an entity are defined/expressed. Each statement has three parts – together called a "triple" – and follows the form subject – predicate – object, where the predicate is the quality linking subject and object. This framework allows for numerous ways to describe a resource/entity.

But this practice is only effective if entities have an agreed Uniform Resource Identifier (URI): "(which specifies things like server name, protocol, port number, file name etc) to locate a file (or a location in a file) on the Web". This "provides a global identification for a resource that is common across the web".

Ok - think I'm still hanging on in here. URIs tend to be long, so conventions have evolved for abbreviating them in print. The one the authors are using is called qnames.

Qnames have two parts - a "namespace" and an "identifier", (written Namespace:Identifier).
The namespace signifies what type of thing the identifier is; the identifier specifies the entity's, well, identity. Each part of each triple has its own qname, so that the different properties of a resource can be combined in different ways. So, in the RDF table the authors provide titled "Geographical Information as Qnames", one row is

geo:Scotland     geo:PartOf   geo:UK

And, in the following table, titled "Triples Referring to URIs with a Variety of Namespaces", we find entries including

lit:Shakespeare    lit:wrote     lit:KingLear
bio:AnneHathaway   bio:married   lit:Shakespeare
lit:Shakespeare    bio:livedIn   geo:Stratford

For these to work, there needs to be a range of standardised namespaces, so these have been specified by W3C. Incidentally, it's a complete coincidence that all these references to Shakespeare are appearing in a blog published on his birthday.
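Since a qname is just an abbreviation, expanding one back to a full URI is a simple lookup in a prefix table. A sketch (the URIs here are made up for illustration — the real namespaces are the standardised W3C ones):

```python
# Expanding qnames (namespace:identifier) back to full URIs with a
# prefix table. The example.org URIs are invented for illustration.

prefixes = {
    "geo": "http://example.org/geography#",
    "lit": "http://example.org/literature#",
    "bio": "http://example.org/biography#",
}

def expand(qname):
    """geo:Scotland -> http://example.org/geography#Scotland"""
    namespace, identifier = qname.split(":", 1)
    return prefixes[namespace] + identifier

# Each part of a triple expands independently:
triple = tuple(expand(q) for q in ("geo:Scotland", "geo:PartOf", "geo:UK"))
print(triple)
```

This is also why each part of each triple can carry a different namespace: the expansion happens term by term.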

The authors discuss one – rdf:type – at some length. "rdf" indicates that "type" (which translates into human English as "is an instance of") is an identifier used in RDF (resource description framework), rather than in RDFS (the RDF Schema language) or OWL (the Web Ontology Language).

I managed to follow that. But now, we're on to "higher-order relationships", where, for instance, we want to say that someone says something about something. The example the authors give is "Wikipedia says Shakespeare wrote Hamlet".

While we can, as the authors show, express the statement "Shakespeare wrote Hamlet in 1601" in three triples:

bio:n1   bio:author            lit:Shakespeare .
bio:n1   bio:title             "Hamlet" .
bio:n1   bio:publicationDate   1601 .
saying that Wikipedia says Shakespeare wrote Hamlet requires a whole other layer of information, rendered here as
q:n1   rdf:subject     lit:Shakespeare ;
       rdf:predicate   lit:wrote ;
       rdf:object      lit:Hamlet .

web:Wikipedia   m:says   q:n1 .
The authors suggest readers notice that the "reification triple" doesn't necessarily mean that Shakespeare did write Hamlet, just that Wikipedia says he did.
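Representing those reification triples as plain Python tuples makes the point concrete: q:n1 is a resource standing for the statement itself, so Wikipedia's claim can be recorded without the plain statement ever being asserted. (A sketch of mine, reusing the book's qnames.)

```python
# The reification pattern: four triples record that Wikipedia says
# Shakespeare wrote Hamlet - but the plain statement itself is never
# asserted, which is exactly the authors' point.

reified = {
    ("q:n1", "rdf:subject",   "lit:Shakespeare"),
    ("q:n1", "rdf:predicate", "lit:wrote"),
    ("q:n1", "rdf:object",    "lit:Hamlet"),
    ("web:Wikipedia", "m:says", "q:n1"),
}

# The direct claim would be this triple - note it is NOT in the set:
plain_statement = ("lit:Shakespeare", "lit:wrote", "lit:Hamlet")
print(plain_statement in reified)  # False
```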

What I noticed is that I am being gently introduced to some coding conventions. Things like q:n1, that are dropped in without explanation. And the space before the ".". This is explained a page or so later, during the discussion of "alternatives for serialization", including N-Triples – which refer "to resources using their fully unabbreviated URIs", making them difficult to print on paper – and Turtle: the method used in the rest of the book.

Turtle uses qnames, so before using it to express triples, each (local) qname needs to be linked to its (global) URI, using the form

"#prefix rdf: http//"

Although I'm slightly confused, as above that is an example from an earlier illustration in the book:

"#prefix mfg:
<> (the link doesn't work, btw).

So I'm not clear whether we need the < & > as in HTML coding, or not… Anyone care to enlighten me?

Nearly at the end of the chapter - just a few more things I need to remember before I attempt chapter 4: "Semantic Web application architecture" (urk!).

1. Turtle uses contractions/abbreviations so that
a. when several triples share both subject and predicate, they can be represented economically. For example
lit:Shakespeare b:hasChild b:Susanna .
lit:Shakespeare b:hasChild b:Judith .
lit:Shakespeare b:hasChild b:Hamnet .
can be boiled down to
 lit:Shakespeare b:hasChild b:Susanna, b:Judith, b:Hamnet .
Or, to represent an ordered list (birth order, in this case)
 lit:Shakespeare b:hasChild (b:Susanna b:Judith b:Hamnet) . 
b. rdf:type is usually abbreviated to "a", so
lit:Shakespeare a lit:Playwright .
rather than
lit:Shakespeare  rdf:type  lit:Playwright .
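What the comma shorthand abbreviates can be spelled out in a few lines of Python (my own sketch, using the book's b:hasChild example): one line with several objects stands for one full triple per object.

```python
# Expanding Turtle's comma shorthand: a shared subject and predicate
# with a list of objects stands for one triple per object.

def expand_objects(subject, predicate, objects):
    """lit:Shakespeare b:hasChild b:Susanna, b:Judith, b:Hamnet ."""
    return [(subject, predicate, o) for o in objects]

triples = expand_objects("lit:Shakespeare", "b:hasChild",
                         ["b:Susanna", "b:Judith", "b:Hamnet"])
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Running it prints the three full triples the shorthand boiled down.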
 The authors then quickly refer to RDF/XML - a method of representing RDF serialisations for the web, and also to "blank nodes", which allow for the representation of resources with no Web identity. They give the example of Shakespeare's mistress, the inspiration for sonnet 78 (at which point I got distracted and went off to read about poetry, as that sonnet was inspired by a young man – most likely Henry Wriothesley or William Herbert – not a woman). I will probably regret that by half way through chapter 4, but heigh-ho.

Anyway, "if we don't want to have an identifier for the mistress … RDF allows for a "blank node" or bnode for short", which "is indicated by putting all the triples of which it is a subject between square brackets", as in
[ rdf:type bio:Woman;
        bio:livedIn  geo:England ]
Or, as it should be
 [ rdf:type bio:Man;
        bio:livedIn  geo:England ]
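As I understand it, a store still needs some internal handle for a blank node; it just invents a label with no meaning outside the store. A sketch (the _:b naming is a common convention, but the code is purely illustrative):

```python
# Blank nodes sketched in Python: the store invents an internal label
# (conventionally written _:b1, _:b2, ...) for a resource that has no
# URI - the label identifies nothing outside this store.

import itertools

_counter = itertools.count(1)

def bnode():
    """Invent a store-internal label for a resource with no web identity."""
    return f"_:b{next(_counter)}"

mistress = bnode()
triples = {
    (mistress, "rdf:type", "bio:Man"),       # as it should be, apparently
    (mistress, "bio:livedIn", "geo:England"),
}
print(mistress)  # an arbitrary label such as _:b1, not a URI
```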
 And that's it for chapter 3. If I hadn't made a commitment to blog my way through this book, I'd probably have given up already…

 That's me, that is.

Wish me luck for chapter 4…

Thursday, 16 April 2015

Semantic Web for the Working Ontologist: chapter 2

Chapter 2 is titled "Semantic modelling".

It explains that models are intellectual tools designed to "help people understand their world by forming an abstract description that hides certain details while illuminating others." In other words, the key to an effective model is working out which details to illuminate and which to conceal.

Models help with communication, explanation and prediction. They also "mediate among multiple viewpoints", by making commonalities explicit and enabling discussion about differences.

There's then a discussion of community tagging as a method of document modelling, with the caveat that as more people provide content, popular tags "saturate with a wide variety of content, making them less and less useful" for people trying to identify something specific.

According to the authors (and this is a key point):
This sort of problem is inherent in information modeling systems; since there isn't an objective description of the meaning of a symbol outside the context of the provider and consumer of the symbol, the communication power of that symbol degrades as it is used in more and more contexts. (p16)
To extrapolate, this is why Semantic Web standards require "modeling formalisms": so that there's an "objective description of the meaning of a symbol" that can exist "outside the context of the provider and consumer of the symbol".

Semantic Web standards follow a similar idea of class hierarchies "for representing commonality and variability" to Object-Oriented Programming. "High-level classes represent commonality among a large variety of entities, whereas lower-level classes represent commonality among a small, specific set of things." Which, I guess, means that lower-level classes are subsets of higher-level classes.
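The OOP analogy made literal, in a tiny sketch of my own (not the book's example): the high-level class captures what everything in the set has in common, the lower-level ones capture smaller, more specific subsets.

```python
# Class hierarchy as commonality and variability: every Playwright is
# a Writer (a subset), but not every Writer is a Playwright.

class Writer:                 # high level: commonality among many things
    def writes(self):
        return True

class Playwright(Writer):     # lower level: a small, specific subset
    def writes_plays(self):
        return True

class SonnetWriter(Writer):   # a sibling subset with its own speciality
    def writes_sonnets(self):
        return True

shakespeare = Playwright()
print(isinstance(shakespeare, Writer))  # True - the subset relation
```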

But what we need to add to that is the idea that "any model can be built up from contributions from multiple sources". And these multiple sources may well use different symbols to signify the same thing – or the same symbol to signify different things.

At this stage in the Semantic Web's development, it's often not possible to make a decision about which symbol representing something should be chosen as the symbol (the authors use the example of Pluto's astronomical and astrological symbols). But it should be possible to agree that each entity should be represented by only one symbol. So, in this instance "a model can provide a framework for describing what sorts of things we can say about something".

The chapter concludes with a list of Semantic Web languages from least -> most expressive: with the caveat that more expressive models are not necessarily better than less expressive models. To illustrate this, they show 3 different ways of "expressing" a water molecule, from the formula H2O, through H–––––O–––––H (showing how the components are joined) to a drawing that shows the relative sizes of the components and how they are positioned in relation to each other. Each is useful in a different situation.

The three Semantic Web modeling languages are:
RDF – the Resource Description Framework
RDFS – the RDF Schema language
OWL – the Web Ontology Language

The next few chapters introduce these languages. And this, I suspect, is where I'll start to come unstuck. Usually, with technical books, I'm fine with the concepts. But when I start having to actually get to grips with the nuts & bolts of languages I turn into the dog in that Far Side cartoon:

Still, at least I'm not quite the cat… At least I hope I'm not… Will let you know in next week's installment…

Thursday, 9 April 2015

Semantic Web for the Working Ontologist: chapter 1

Ok - here goes.

Chapter 1: What is the Semantic Web?

I now understand the difference between

* URLs (Uniform Resource Locators), the "addresses" that enable search engines to locate and show individual web pages
* URIs (Uniform Resource Identifiers), that allow search engines to find and link individual items of information within different web pages.

I also know that the "data model" used in the drive towards international consistency in the way that individual pieces of information are described is called the Resource Description Framework abbreviated to RDF. And that an "ontology" is simply the attempt to classify information in a document (or elsewhere) in a way that fits a particular, agreed standard.

I had, of course, heard all those terms before. What I didn't understand was why the standards exist, what they enable to happen, and critically, the assumptions that underpin them…
  • the AAA slogan: Anyone can say Anything about Any topic, which means that we can't assume that
    • two entities with different names are different things
    • two entities with the same name are the same thing
    • different people will agree that the same thing should be classified in the same way
  • the Open World Assumption: there is always more that can be said - which means that we "may draw no conclusions that rely on assuming that the information available at any one point is all the information available." (p10)
A key quote for me from this chapter is "The Semantic Web is… about coping in a world where not everyone will agree, and achieving some degree of interoperability nevertheless." (p9)

Love this acceptance of messiness and the idea of such a huge classificatory project that acknowledges the eternal impossibility of getting everything right, but is still going for it, anyway…

Would be grateful if people could let me know if I've misunderstood anything…

Saturday, 4 April 2015

Getting my head round the semantic web

A big part of what will determine whether HayleyWorld will work for readers will be how effectively I use keywords to sort and order the edited quotations from William Hayley's Memoirs that I've entered into my research database, and which I'm using as the core narrative(s).

At the moment, along with details of the volume and page numbers in the Memoirs from which each extract is taken, and, where appropriate, the date(s) on which they happened, I have three sets/fields of keywords, which I've labelled –
  • "cast" ~  the people who are mentioned or referred to in the extract, or who are present at a specific event
  • "place" ~ places mentioned or referred to in the extract
  • "keyword" ~ which should, really, be called "subject", but keyword is what I called it when I originally created the database in the late 1990s as a research repository rather than a writing tool. This last field contains word lists describing what the extract is about. So, for example, an extract about mental health issues – one of Hayley's main interests – generally includes the descriptors "mental illness", "depression", "melancholy" and "madness", but may also be found alongside "sensibility", "spirits" and other words that Hayley and his circle often used/which contemporary commentators use in writing on the subject.
But – how does this all relate to the semantic web? And why am I trying to get my head around it?

Because my database is (well, sort of) a primitive take on the tools and methods developed to create the semantic web. And understanding how "ontologies" (essentially sets and hierarchies of keywords) are constructed and connected should enable me to make HayleyWorld a better, more effective thing.

So, I have bought a book – Semantic Web for the Working Ontologist by Dean Allemang and Jim Hendler – recommended by Andrew Hugill, Director of the Centre for Creative Computing at Bath Spa University.

But here's the rub. Despite my lifelong love of books, I have never managed to finish reading a technical one. At each attempt, I've run out of brain long before I've run out of pages.

So, this time, I've decided to blog about each chapter as I read it. I'm hoping this will carry me through/enable me to properly understand/internalise the concepts and information I encounter. I'm aiming to do one chapter a week, but make no promises. Other priorities may intervene. And I may run out of brain as fast as usual. But I'm gonna try.

Wish me luck…

Friday, 3 April 2015

Reader journeys

One of the most difficult aspects of making HayleyWorld to get my head round has been imagining and designing the possible reader journeys through the text. How can we (my development partner & I)
  • personalise a reader's journey while still retaining an element of control?
  • make sure that readers can take different paths through the narrative and still read/experience something that feels like a coherent narrative?
  • make readers feel like they're getting to know William Hayley in a way that mimics encountering him in real life?
I'm still near the start of the reader journey design process, but I have a start point that Contentment are in the process of implementing for our first prototype (which should be ready within the next couple of weeks).

At the moment, the idea looks like this:
Please email me for a machine-readable version

As you can see, I still have questions about how much content to deliver before giving readers a chance to either choose more, or move on to another topic. But how will they get there in the first place?

It took me a while to work out how the app should start. But then I realised that it should begin in the way that William Hayley characteristically began his relationships with friends.

So, I embarked on my first element of semi-fictionalisation. I edited together – mostly from phrases/lines Hayley actually used, but with a few of my own (mostly obvious) tweaks  – a Hayleyan letter and sonnet, addressed to the reader…

"The past


My dear Friend

Please allow me the liberty of addressing you thus as, I believe I have reason to hope, such you will soon become. Having some important Topics I wish to share with you, I have thrown them into here, and entreat your acceptance of the following history, that you may peruse at the greatest Ease to yourself.

Before you begin, I would beg you to look on my peculiarities with affectionate indulgence and I hope only that what follows may be thought worthy of you to whom I have taken the liberty of addressing it.

Ever your affectionate



Kind Friend! who reads in far off future years,

I pray thy generous heart will care to know,

A poet who lov’d to laugh yet oft shed tears

And wishes this – his story – to bestow.

Accept this volume:- through these pages flow

Sincere effusions of a long-passed mind!

Herein the untold truths my heart will shew,

Of artists, poets, friends and loves combin'd!

In this fair copy may thy eyes discern

The turns of mortal life herein outlin'd:

Whate'er the lessons from these words we learn

May all their blessings in you be entwin’d!

Thine be that joy which Friendship's bosom fills;

And thine eternal peace, devoid of ills!"
 That's the beginning that every reader will see. But then how to set each one off on their particular journey?

My initial plan was to give them a long form at the beginning. But my research into how people develop and sustain relationships led me to realise that this would be wrong. I need to create the illusion of conversation, which has two main implications.
  • the reader needs to be asked for smaller tranches of information about themselves at various (but not too many) points in the app
  • to have any chance of creating and sustaining the illusion that readers are getting to know William Hayley, they couldn't just be given a series of impersonal forms to complete. He would need to ask them questions. And would need to do so in an entirely Hayleyan manner…
So I wrote this. It is, I think, more fun than an impersonal form, and has the advantage of allowing for "conversation" between Hayley and the reader without the reader being able to ask the sort of questions that an app will be unable to answer…

please email me for a machine-readable version

Also, given that the act of exploring HayleyWorld means readers are taking an interest in William Hayley, isn't it only right and proper that he should show an interest in them?

Blogging hiatus

I haven't blogged about making HayleyWorld for months. This isn't because I haven't been doing anything to make HayleyWorld (although life has intervened and I am now a part-time, not full-time, PhD student). Rather, the work I've been focusing on left me with little to add to the blog. Eliza remains fascinating, as does the relationship between her, William Hayley, and Thomas Alphonso Hayley's mother Mary Cockerell. I will also have more to add on the subject of first/third person, as I'm in the midst of a bunch of reading around it.

Mostly, I've been working with my development partners at Contentment towards producing a first prototype. And that's taking time, as – of necessity – it's a side project for them, and one that, inevitably, throws up a bunch of unexpected and unpredictable development issues that need tackling…

But, if I'm completely honest, those are excuses.

My brain simply doesn't come up with ideas for blog posts all the time (to be fair, it has one or two other things to focus on, so I shouldn't be too cross with it). However, yesterday morning it did, and these will follow over the next few weeks, with the first coming soon after this one. Soon, in this instance, = an hour or so after this one. Unless I don't finish it before I have to go out and eat fish & chips with friends, in which case it'll = tomorrow. But you didn't want or need to know that, did you?