1 Scientific and Technical Quality

1.1 Concept and Objectives

Creative behaviour in people is multi-faceted, hugely valued by society and well studied, but still mysterious in many ways. One aspect which is clearly central to much creative behaviour is the generation of novel fictional ideas. For instance, an artist might have the idea of a bunch of flowers where every flower is in fact a snowflake, and paint this; or a writer may invent the idea of a cat that can fly and develop adventures for it to have. In the creative arts and the creative industries, the production of fictional ideas around which to write stories, paint pictures or design advertisements is an essential activity. Such ideas distort reality for humorous, playful, coercive, disturbing and generally thought-provoking reasons, and exist in the minds of their creators independently of the ways in which they are presented to audiences. Thinking up such ideas is one way in which we express ourselves creatively. For instance, here are two such ideas found recently on Twitter by searching for the phrase 'What if':

Computers cannot yet come up with such interesting ideas as these, and we would like to change that.

As described in section 1.2, in Computational Creativity research, in order to make progress towards implementing autonomously creative systems, we have largely focused on the generation of artefacts in certain application domains, such as paintings, poems, games and musical compositions, and we have broken down multi-faceted creative behaviours into component parts related mostly to the generative aspects of creativity (producing new material) and the critical aspects (determining the value of that material). It is fair to say that the majority of the research in Computational Creativity has been devoted to designing software able to produce finished artefacts without explicitly undertaking idea generation. This has not proven to be hugely problematic, as in many cases people are naturally able to read ideas into the artefacts generated by software. For example, a poem generated in a simple way using a template may contain enough information for a reader to interpret a novel idea about the world described in the poem, but it is important to note that the creative idea generation here was done by the reader, not by the software. Of course, in certain areas of Artificial Intelligence (AI) research, most notably Machine Learning, concept formation is the point of the exercise, and such concepts are a type of idea. However, the concepts formed tend to be used to describe and categorise real-life data. Hence, they were not designed explicitly to be fictional for the purpose of provoking thought in the same way that a painter might invent an aspect of an alien world, or a writer might think up an unlikely personality trait for a character. Moreover, there have been experiments in Computational Creativity circles to study the idea generation potential of theories such as concept blending in a computational setting.
In addition, there have been projects generating linguistic material such as metaphors, similes and neologisms, which can be seen as similar in nature to the kinds of communicable ideas at the heart of human creations. These projects have shown the potential for automated idea generation, but they were relatively small-scale and, while there was some evaluation of the concepts and phrases produced, there were no attempts to consider the generated artefacts as culturally interesting ideas, or to estimate their value in terms of the stories that can be told with them or the ways in which they can be expressed.

With the WHIM project, we aim to undertake the first large-scale study of how software can invent, evaluate and express fictional ideas like those presented above. There are many ways to automate idea generation (which we will often refer to as ideation), and coming up with new ideas is a difficult process which requires much world knowledge. For instance, implicit in the second What-If tweet above are the facts that birds are often found in high places; that birds sing, and screaming is different to singing; that birds are animals, as are people, and people sometimes get scared of heights; and so on. Collating the knowledge needed to state and evaluate such ideas raises major issues in gathering enough specialised and general knowledge to begin the ideation process. The first major research question we will therefore address is:

Analysing such knowledge bases - which will largely contain ostensibly truthful information about real-world situations and data - in order to opportunistically derive fictions which could be (but are not) true in the domain represented will provide the next major hurdle for computational ideation. Given the vast number of untrue, pointless things it is possible to say about the constituents and contexts of a given world view, much innovation will be required to determine the nature of valuable fictional ideas, and how to produce them. The second major research question we will address is therefore:

An idea which makes sense is not necessarily an idea which excites the mind. For instance, the idea [What if there was a chair with five legs?] is coherent, and it is salient, given that any knowledge base about chairs is unlikely to contain five-legged examples. However, it takes some work to imagine a scenario in which a five-legged chair would be of particular interest. Hence, this idea is unlikely to inspire people to play around with it in their minds by dreaming up humorous, dangerous or ridiculous scenarios in which it features. A good fictional idea distorts the world view around it in useful ways, and these distortions can be exploited to spark new ideas, to interrogate consequences and to tell stories. But it will be very difficult for software to determine which ideas are good and which are bad. Hence our third major research question will be:

Audience appreciation of the value of an idea is often relative to the way in which it is presented. While the snowflake-flowers idea mentioned above is clearly suitable for a painting, it would be nearly impossible to communicate through the medium of music, and the idea might not be substantial enough to be the central plank of a story. Even within a particular medium such as linguistic presentations of ideas (which we often call renderings), the choice of genre - such as poem, narrative, dialogue or prose - can increase or decrease how much an individual audience member engages with an idea. Hence, our fourth major research question is:

Of course, it is audience appreciation that drives us: we want to build software which generates ideas that we wouldn't have thought of, but which amuse and entertain, provoke and challenge people. As with human creators, who receive feedback about their ideas from friends, family and audiences, it should be possible for our software to automatically adjust its processes in response to audience feedback. Hence, our fifth and final major research question is:

In the text below, we will describe a particular approach to automated ideation which will endeavour to provide answers to each of the research questions above. The truth of these answers will be demonstrated via a formalisation and engineering methodology. That is, we aim to derive a formal model of creative fictional ideation which is sufficiently precise to implement in software. The results from running the software, and an appraisal of the feedback about these results, will show that ideation according to our approach can produce novel and thought-provoking ideas of real interest to people. This software will be built in a modular way, with each of the WHIM consortium teams developing code to address one of the research questions above, but the modules will be part of an overall, constantly evolving implementation called the What-If Machine. We do not claim that our answers to the above questions will be the only ones, as our approach is only one of many which we expect to be fruitful for automated ideation in the future. However, if we are successful, we do claim that the What-If Machine will be the first of its kind: creative ideation software that you can learn to trust to produce, motivate and present ideas of real cultural value. This would be a startling advance for Artificial Intelligence research and practice.

1.1.1 An Ideation Approach

Rather than opting for an all-encompassing exploration of automated ideation, we have chosen a single approach which, while risky, we believe has every chance of success, leading to the first proof that software can produce novel and culturally valuable ideas. By showing that this is possible, we hope to inspire future projects which investigate other approaches to idea generation. While it might seem obvious to state this, our approach will be idea-centric. That is, the main output of the system will be the ideas, and while other artefacts such as knowledge bases, poems and narratives might be generated, these are secondary, and all other processing around the ideas will be for the purpose of evaluating them. Having said that, as mentioned above, for experimental purposes (in order to receive the most pertinent feedback about the ideas), it may be advantageous to render them in certain ways - this is a hypothesis we aim to test. While there is experience in the consortium in the automatic generation of types of artefact such as paintings and video games, we will restrict ourselves to the linguistic domain, as this is most closely allied with the conceptual domain of ideas. That is, while the ideas produced could be represented in visual, interactive or even aural forms, we will aim to render them as natural language artefacts such as neologisms, poems, stories, dialogues and prose. There is much experience in the consortium of automating such creative language generation. We will also use linguistic analysis techniques in answer to the main research questions above, i.e., how to collate relevant world knowledge, how to form ideas and how to evaluate those ideas. To describe our approach, we will look at its three main components, namely: formalisation, implementation and experimentation.

As is common in Artificial Intelligence projects, we will employ a formalism-first methodology. This means that, prior to implementation of the whole system, and similarly prior to new code modules or major updates being implemented, we will carefully and formally (i.e., in an implementation-independent way) define or redefine: (i) the way in which we represent the information passed around the ideation system, and (ii) the processes by which that material is manipulated in order to form ideas. Central to this formalism will be the notion of an f-idea. In particular, we will define an f-idea in terms of the f-world that forms its context, how the f-idea can be expanded in terms of f-narratives about it and how the f-idea can be presented in terms of f-renderings. The contents and nature of these f-constituents will develop as the project progresses, but in general they will include raw text, analyses and summarisations of the texts, factual knowledge including concept definitions, relationships between concepts and chains of concepts in loose reasoning schemes, representations of narrative arcs and information about the nature of the rendered natural language forms.
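To make the nesting of f-constituents concrete, here is a minimal Python sketch of how an f-idea might carry its f-world, f-narratives and f-renderings. The class and field names (`Triple`, `FWorld`, `FNarrative`, `FRendering`, `FIdea`) are hypothetical illustrations, not the project's formalism, which will be fixed during the requirements-engineering task described below.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """A single fact in an f-world, e.g. ("bank", "lends", "money")."""
    subject: str
    relation: str
    obj: str


@dataclass
class FWorld:
    """Hypothetical f-world: the world-view context an f-idea comes from."""
    topic: str
    facts: list[Triple] = field(default_factory=list)


@dataclass
class FNarrative:
    """Hypothetical f-narrative: a story outline built around an f-idea."""
    events: list[str]


@dataclass
class FRendering:
    """Hypothetical f-rendering: a natural-language presentation in some genre."""
    genre: str
    text: str


@dataclass
class FIdea:
    """An f-idea embeds its f-world and is later extended with narratives and renderings."""
    statement: str
    world: FWorld
    narratives: list[FNarrative] = field(default_factory=list)
    renderings: list[FRendering] = field(default_factory=list)
```

Starting `narratives` and `renderings` as empty lists mirrors the way an f-idea is first drafted by the idea generator and only extended by later modules.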

The tools at our disposal for the formalisation of representations include various logics such as first- and second-order logic; other knowledge representation formalisms such as ontologies, databases and semantic graphs; NLP representation schemes such as n-grams, bags of words, dependency graphs and semantic frames; Artificial Intelligence schemes for information such as constraints and plans; and, of course, software-based representations such as the class-object-field schemas from object-oriented programming. The tools at our disposal for the formalisation of processes include formal algorithm presentation schemes, pseudocode, UML diagrams, flowcharts, temporal and other logics, behaviour trees, formal modelling languages such as Z and CASL and, of course, mathematical expositions of the calculations performed by the system. In a short requirements-engineering task at the start of the project, we will estimate the level of expressivity in the representation of knowledge and processes that each partner needs in order to understand the software written by the others. We will use this analysis to specify languages and formalisms for describing the data and processes. Naturally, the formalisms will change during the course of the project, but within a formalism-first research culture, each team member will be accustomed to systematically updating the documents describing the formalisms employed. This should minimise misunderstandings between partners, enabling each of us to understand the other software modules without reference to the code they contain. In addition to enabling smooth interaction between project partners, the formalisation will be published as a major deliverable from which other teams can learn how we enabled creative idea generation (again without having to dissect our code).

As is also common in AI projects, we will answer the question of whether software can fruitfully generate ideas by demonstration, i.e., by building a system that clearly does so. Ultimately, when fully implemented, the What-If Machine - which will provide such demonstrations - will sit on a server hosted at Goldsmiths College. It will be operated through a web interface, and will either take input guidance from the user (search terms, links to web material, etc.) or work with trending topics of the day, which it finds automatically from social media and news sources on the internet. Once the user clicks on the Go Button, the software will automatically form, evaluate and present linguistic renderings of ideas clearly related to the topic specified. For the implementation, we propose a relatively straightforward modular approach, employing the formalism described above, with five major processing units, each of which interacts with one or more of the others via knowledge exchanges (KEs). These provide inter-lingua functionality enabling world knowledge, ideas, narratives and renderings to be passed amongst the modules. While the formalism will add common structure to the information being passed around the system, information stored in one format will still need to be converted into another, and this will be done within the knowledge exchanges. A system overview of the modules and knowledge exchanges is given in figure 1. As described in the work plan below, there is a separate work package (WP) for the building and testing of each module. The modules are as follows:

Figure 1. What-If Machine system architecture

In response to question R1 above, we propose to build structured but relatively shallow knowledge bases via web-mining and natural language processing (NLP) techniques, in particular using wide-coverage open information extraction (Open IE) and ontology construction methods. This will draw on a wealth of freely-available NLP and data mining software, some of which has been built by WHIM consortium partners, but will also require many innovations to form the subsets of available knowledge required for ideation. The resulting knowledge bases will be instantiations of the formal f-worlds, and will consist of fact triples, semantic networks, ontologies and other information, which will feed into the idea generator and the narrative generator. The details of the technologies and research we will bring to bear on the WVB module are described in WP2.
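As a rough illustration of the fact-triple component of an f-world, the sketch below stores (subject, relation, object) facts and answers wildcard queries. The `TripleStore` class is a hypothetical toy; the real WVB module would build on existing Open IE and ontology tools rather than on anything this simple. The example facts echo the bird and person knowledge discussed earlier.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory store of (subject, relation, object) facts, indexed by subject."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, subject, relation, obj):
        self._by_subject[subject].add((relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return all matching triples, sorted; None acts as a wildcard."""
        subjects = [subject] if subject is not None else list(self._by_subject)
        results = []
        for s in subjects:
            # .get avoids creating empty entries for unknown subjects.
            for r, o in self._by_subject.get(s, ()):
                if (relation is None or r == relation) and (obj is None or o == obj):
                    results.append((s, r, o))
        return sorted(results)
```

For example, after adding ("bird", "is_a", "animal") and ("person", "is_a", "animal"), the query `query(relation="is_a")` returns both facts, which is the kind of category information the idea generator could then subvert.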

At the heart of the system will be the idea generator, which will take f-world inputs from the WVB module and produce the first draft of f-ideas, which will embed instantiated f-worlds, and be extended later to contain f-narratives and f-renderings. By extending and applying to the f-world both metaphor and joke generating techniques (which already perform limited fictional ideation), and by further subverting categorical information found in the world view, we will implement f-idea generation techniques to deliver fictional ideas appealing to semantic tension and incongruity. The details of the technologies and research we will bring to bear on the IG module are described in work package WP3 below.
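One very simple way of subverting categorical information, shown here purely as an illustration and not as the IG module's actual technique, is to take two members of the same category and give one of them a property that the knowledge base only records for the other. The function name `transfer_ideas` and the relation labels are hypothetical.

```python
def transfer_ideas(facts):
    """Propose what-if statements by giving one category member a sibling's property.

    facts: iterable of (subject, relation, object) triples. Subjects that share an
    "is_a" parent are treated as siblings; a property recorded for one sibling but
    not the other yields a candidate what-if.
    """
    parents, props = {}, {}
    for s, r, o in facts:
        if r == "is_a":
            parents.setdefault(s, set()).add(o)
        else:
            props.setdefault(s, set()).add((r, o))
    ideas = []
    for a in sorted(parents):
        for b in sorted(parents):
            # Skip identical subjects and pairs with no category in common.
            if a == b or not parents[a] & parents[b]:
                continue
            for r, o in sorted(props.get(b, set()) - props.get(a, set())):
                ideas.append(f"What if there was a {a} that {r} {o}?")
    return ideas
```

Given that birds and people are both animals, that birds lay eggs and that people fear heights, this yields "What if there was a bird that fears heights?" and "What if there was a person that lays eggs?". Most such transfers will be pointless, which is exactly why the evaluation stage discussed next is needed.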

There are many ways in which to attempt to evaluate the ideas in terms of various criteria, such as how much they distort world views, how subversive they are, how intriguing or amusing they are (independent of any joke-based rendering). In building the IE module, we will concentrate on the question of predicting whether people will be able to 'run with' the idea when they are exposed to it. By this, we mean that certain ideas have the potential for more expansion than others, where an expansion might mean a scenario within which it could be expressed and utilised or a set of other ideas which flow from it. In particular, certain ideas can be expanded through a story within which the idea features, and the IE module will take f-ideas and the f-world context they came from and employ novel idea-centric automated narrative generation techniques to build narratives around each idea. The quality and quantity of these narratives will be assessed to estimate the value of each f-idea. The details of the technologies and research we will bring to bear on the IE module are described in work package WP4.
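The text leaves open how the quality and quantity of narratives are to be combined. The sketch below assumes, purely for illustration, that each generated narrative has already been given a quality score in [0, 1], and multiplies the mean quality by a logarithmic bonus for quantity, so that additional narratives help but with diminishing returns. Both the scoring scale and the formula are our assumptions.

```python
import math


def expansion_score(narrative_qualities):
    """Hypothetical estimate of an f-idea's value from the narratives built around it.

    narrative_qualities: quality scores in [0, 1], one per generated narrative.
    Returns mean quality times log(1 + number of narratives); 0.0 if there are none.
    """
    if not narrative_qualities:
        return 0.0
    mean_quality = sum(narrative_qualities) / len(narrative_qualities)
    return mean_quality * math.log1p(len(narrative_qualities))
```

Under this rule an idea supporting three middling narratives outscores one supporting a single narrative of the same quality, reflecting the intuition that ideas people can "run with" admit more expansions.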

To properly demonstrate that the What-If Machine is producing valuable fictional ideas, we need to collect and analyse feedback from audiences asked to think about those ideas. People can consume such ideas in their raw form as a What-If statement (as per the tweets above), but often the ideas are wrapped up in certain linguistic forms. As these presentations can help people to understand and appreciate the idea better, we should be generating and presenting such linguistic forms as poems and stories in order to collect the feedback from people about the ideas in a realistic way. The IR module will take f-ideas filtered through the IE module, containing information both about the f-world context and the f-narratives available, and produce linguistic artefacts. To enhance the artefact generation, we will study ways in which relevancy to a topic and the expansion potential of an idea are appreciated by people. We will also experiment with obfuscation and affect in order to see whether it is possible to increase the general impression people have of the value of the ideas. The details of the technologies and research we will bring to bear on the IR module are described in work package WP5.

We want to be absolutely sure that we have positively answered the question of whether software can generate valuable fictional ideas. Hence, we will test the software seriously by asking audiences of various types to provide feedback about the ideas that various prototype versions of the What-If Machine produce (as described when discussing experimentation below). This provides an opportunity not just to test the system, but also to automatically learn regularities and rules about the feedback we get, and to use these to improve the way in which the What-If Machine operates, as a real-time response to feedback. The AM module will involve crowd-sourcing software to collate and analyse the opinions people have about the fictional ideas produced. These opinion texts will then be processed using sentiment analysis to find patterns of public mood. Moreover, cross-context bisociative analysis techniques will be employed to produce bisociative narratives which we hope will further help the What-If Machine to improve its functioning. The details of the technologies and research we will bring to bear on the AM module are described in work package WP6.
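To illustrate the kind of mood summary the AM module might compute from crowd-sourced comments, the sketch below scores each comment against a tiny polarity lexicon and reports the fraction of comments that are net positive. The lexicon and the `polarity` and `mood` functions are hypothetical stand-ins; a deployed system would use a trained sentiment model rather than word counting.

```python
import re

# Hypothetical polarity lexicon, for illustration only.
LEXICON = {"love": 1, "funny": 1, "clever": 1, "intriguing": 1,
           "boring": -1, "dull": -1, "hate": -1, "pointless": -1}


def polarity(comment):
    """Sum the lexicon scores of the lower-cased words in a comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def mood(comments):
    """Fraction of comments whose net polarity is positive (0.0 if there are none)."""
    if not comments:
        return 0.0
    positive = sum(1 for c in comments if polarity(c) > 0)
    return positive / len(comments)
```

For instance, of the comments "I love this, so clever!", "Boring." and "What a funny idea", two are net positive, giving a mood of 2/3.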

The sketch of the overall system architecture given in figure 1 depicts our current thinking, which is that the WVB, IG, IE and IR modules will pass around semantic information in a linear fashion. That is, the WVB runs at the start in response to some stimulus (user guidance or trending topics) and passes f-worlds to IG, which in turn passes f-ideas to IE, which in turn passes f-ideas (now containing f-narratives) to IR. However, each module will be able to pass information back via the knowledge exchanges to the modules supplying it, in order to ask for more information based on its partial processing. In particular, we think the IE module may need to go back to WVB to gain more world view knowledge than is provided by the IG module alone. Hence the WVB and IE modules are linked by a knowledge exchange. The AM module will be sent information from each of the other modules, which will identify themselves and be supplied with relevant parameterisations and with calculations pertaining to an audience model, via processes which have been machine-learned from crowd-sourced data. Naturally, we could implement each module with its own audience model, but as we expect the audience model to change dynamically in response to continual feedback, it makes sense to keep the AM module as a service which all the others use.
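The linear flow with its single back-channel can be sketched as a higher-order function in which each module is passed in as a callable. Everything here (the stage signatures and the acceptance `threshold`) is a hypothetical simplification; in particular, handing the evaluator the world-view builder is just one way of modelling the WVB-IE knowledge exchange, and the AM service is omitted.

```python
def run_pipeline(topic, build_world, generate_ideas, evaluate, render, threshold=0.5):
    """Hypothetical linear WVB -> IG -> IE -> IR flow with one back-channel.

    build_world(topic) -> world            (stands in for WVB)
    generate_ideas(world) -> list of ideas (stands in for IG)
    evaluate(idea, world, build_world)     (stands in for IE; may call build_world again)
    render(idea) -> text                   (stands in for IR)
    """
    world = build_world(topic)
    outputs = []
    for idea in generate_ideas(world):
        # The evaluator receives build_world so it can request extra world-view knowledge.
        score = evaluate(idea, world, build_world)
        if score >= threshold:
            outputs.append(render(idea))
    return outputs
```

Passing modules as parameters keeps each stage replaceable by hand-crafted fixed data, which matches the early experimental phase in which modules are tested before the others exist.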

The third major aspect of our approach is the experimentation we will perform. At the start, none of the modules will be able to provide input to the others. Hence, we will hand-craft fixed data to simulate that which will be input by the other modules. Then, as prototypes of each module are constructed and made available to the whole consortium, linking them up will allow us to discard the fixed data and perform more realistic experiments. These experiments will involve unit testing the individual processes in each module for correctness against the formal specification. In addition, we will need to determine the value of the results coming from the software. As is usual in Computational Creativity circles, for much of the time we will be the guinea pigs in the testing, i.e., we will assess the quality of the world views, ideas, narratives and renderings ourselves. However, such subjective testing only provides an approximation of the general impression people will have of the generated ideas, and it is very important to regularly test the output of the What-If Machine on members of the public. To do this, we will set up a crowd-sourcing framework within which both the general public and specialised groups of people from the creative industries will be exposed to the ideas. We will employ summary measures of the value that people assign to the ideas, along the lines described in Colton et al. (2011), where audience members are asked their opinions about the amount of time/thought they expended in grasping a concept, and whether they like the idea or not. As described in the work plan, there will be three periods of system-wide experimentation, roughly at the end of each of the three years of the project. We will work intensively up to the start of these periods in order to get a working prototype able to generate and present ideas.

We will use pilot studies with smaller numbers of people (10-20) in order to fine-tune the experiments before engaging in experiments with much larger numbers of participants. Following the pilot studies, we will carry out large-scale studies of the opinions of thousands of people. In addition, we will expose selected groups from the creative industries to the ideas, in order to get more focused feedback from professionals for whom creative ideation is part of their daily routine. These experiments will also serve as dissemination routes to the creative industries, and will be organised as activities within the PROSECCO EU-funded co-ordination action. We will further carry over the results from previous experiments and employ statistical testing to (hopefully) show that with each new prototype there is a statistically significant increase in the perceived value of the ideas being generated. The results from the experiments will be analysed in order to suggest recommendations for improving the What-If Machine, and we will implement those recommendations. As the sophistication of the ideas increases, we will implement a passive daily service: 'Idea(s) of the Day'. In particular, new ideas generated by the system will be regularly uploaded to a Facebook page so that people can 'like' them and/or leave their opinions. The What-If Machine will also tweet an idea per day, and we will analyse the retweets and hashtagged (#whatifmachine) comments it raises. At this stage, the idea generation will be guided by the trending topics of the day. In the final stages of the project, the What-If Machine will go live as an interactive service operated through a web interface. Users will be able to direct the ideation and leave behind opinions which, via the AM module, will be automatically used to improve the software.
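The proposal does not fix which statistical test will compare successive prototypes. As one reasonable option (our assumption, not the project's stated method), a one-sided permutation test compares audience ratings of ideas from an older and a newer prototype without assuming the ratings are normally distributed. The function names and the rating scale are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(old, new, n_resamples=10_000, seed=0):
    """One-sided permutation test of whether `new` ratings exceed `old` ratings on average.

    old, new: audience ratings for ideas from two successive prototypes.
    Returns (observed difference in means, estimated p-value).
    """
    rng = random.Random(seed)
    observed = _mean(new) - _mean(old)
    pooled = list(old) + list(new)
    n_new = len(new)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        # Randomly relabel ratings as "old" or "new", as if there were no real difference.
        rng.shuffle(pooled)
        if _mean(pooled[:n_new]) - _mean(pooled[n_new:]) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_resamples + 1)
```

A small p-value suggests that the newer prototype's higher average rating is unlikely to be due to chance alone; with real, overlapping ratings, the pilot studies would help decide how many participants are needed.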

1.1.2 A Worked Example: WHIM Can Imagine It For You Wholesale

Children are often excited by possibilities that adults dismiss as silly. They are not afraid to play with and talk about silly ideas. Children love puns, whereas adults often feel the need to apologise for these simple kinds of humour ("pardon the pun!"). Young children have not yet developed the internal censors that tell them to disdain silliness and to guard what they say. As a consequence, they revel in the possibility that the conventional wisdom and the rules of the grown-up world are faulty, and cannot wait to expose these faults. In sharp contrast to the many AI projects that aim to capture the rational abilities of logical adults, WHIM will aim to capture the often silly and free-wheeling imagination of uninhibited children. To freely generate silly but thought-provoking ideas that can be defended with further linguistic evidence - and elaborated into short stories and poems - we will design the What-If Machine to revel in puns and bad jokes, to over-stretch and reanimate dead metaphors, and to look at conventional language with fresh eyes, as though for the first time. As an example, consider the topic memory, and the kinds of "what if" questions that an imaginative child might pose:

The last what-if question above formed the basis for the Philip K. Dick short story "We Can Remember It for You Wholesale", which subsequently inspired the hit movie "Total Recall". The recent film "Inception" explored a related idea, that of memory thieves and memory hackers, and posed similar what-if questions. Science fiction is a genre that, like children, revels in the imaginative possibilities that common sense is all too quick to dismiss as silly.

The ideas evoked by these questions straddle the boundary between sense and nonsense. They represent precisely the kind of ideas that the What-If Machine will be designed to create - the ideas that challenge our common-sense assumptions, and which require some (but not too much) suspension of disbelief. So where do these ideas come from, and can a computer system like the What-If Machine drink from the same well? These ideas are all suggested by fragments of everyday language, and by deliberate and playful mis-interpretation or over-interpretation of those fragments. For example, the common phrase "memory bank" literally denotes an array of electronic storage devices for storing computer memories of ones and zeroes. A memory bank is not a financial institution that lends money and keeps our valuables safe from theft. Even a simple electronic dictionary like WordNet can tell our software systems as much. However, by deliberately mis-reading the "bank" of "memory bank" in its financial sense, we are prompted to generate a raft of interesting "what if" questions. Another common phrase that prompts silly-but-interesting questions in this way is "memory store". The word "store" here does not denote a shop, but another electronic storage device. However, by deliberately mis-reading "store" as meaning the same as "shop", and "memory" as denoting a human recollection rather than a computer pattern of ones and zeroes (in fact, these are the most common readings of these words) a system like the What-If Machine could generate provocative "what if" questions about the buying and selling of human memories. So,

It is easy - perhaps too easy - for a computer to generate deliberate misunderstandings like these. In effect, the What-If Machine would be turning a commonplace bug (computers frequently make such errors when assigning senses to words) into a valuable feature. However, it must do much more than generate deliberate misunderstandings: it must ponder the questions that emerge from these misunderstandings to determine which have real merit and which are mere nonsense. In effect, it must do what all Computational Creativity systems must do to earn the label "creative" - it must go beyond "mere generation" and evaluate its own what-if questions critically and rigorously, to keep the best and discard the rest. Work package WP3 in the work plan describes two strategies for focusing on the most productive misunderstandings: Taking metaphors seriously and Taking jokes seriously. Both strategies are fuelled by commonplace patterns of language that are ripe for mischievous mis-reading (such as lexical ambiguities) or over-reading (such as conventional metaphors). The What-If Machine will first retrieve patterns that are relevant to the topic at hand (such as memory) from a large corpus of everyday language (such as texts from the Web, or the Google n-grams) and filter out those that cannot support multiple competing interpretations. Of these productive candidates, it will select the phrases that are most systematically linked to a body of related linguistic evidence.
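Two of the steps described above, discarding phrases that cannot support competing interpretations and deliberately mis-reading the head of a compound such as "memory bank", can be sketched together. The sense inventory below is a hand-built, hypothetical stand-in for a lexical resource such as WordNet, and the gloss templates are ours; this illustrates the idea rather than implementing the Taking metaphors seriously or Taking jokes seriously strategies.

```python
# Hypothetical toy sense inventory standing in for a resource such as WordNet.
# Each gloss has a slot {x} for whatever the (mis-read) thing deals in.
SENSES = {
    "bank": {"storage": "an array of devices that store {x}",
             "finance": "an institution that lends and safeguards {x}"},
    "store": {"storage": "a device that holds {x}",
              "retail": "a shop that buys and sells {x}"},
    "loss": {"absence": "the absence of {x}"},
}

# The sense each head noun conventionally takes in these compounds (also hypothetical).
CONVENTIONAL = {"bank": "storage", "store": "storage", "loss": "absence"}


def productive(phrases, topic):
    """Keep '<topic> <head>' phrases whose head noun has at least two senses."""
    kept = []
    for phrase in phrases:
        words = phrase.split()
        if len(words) == 2 and words[0] == topic and len(SENSES.get(words[1], {})) >= 2:
            kept.append(phrase)
    return kept


def misreadings(phrase, plural):
    """Yield a what-if question for every non-conventional sense of the head noun."""
    head = phrase.split()[1]
    for label, gloss in sorted(SENSES[head].items()):
        if label != CONVENTIONAL.get(head):
            yield f"What if a {phrase} were {gloss.format(x=plural)}?"
```

From the candidates "memory bank", "memory loss", "piggy bank" and "memory store", only the first and last survive the filter, and mis-reading them yields "What if a memory bank were an institution that lends and safeguards memories?" and "What if a memory store were a shop that buys and sells memories?".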

For example, consider again whether we can truly buy and sell memories. Corpus analysis and information extraction will reveal that people do indeed speak (if only occasionally) of "buying memories" (this phrase has a frequency of 56 in the Google n-grams), of "selling memories" (an n-gram frequency of 190) and of "stealing memories" (a frequency of 59). The What-If Machine will focus only on the what-if possibilities that are supported by the most linguistic evidence and which can be fleshed out in the most elaborate ways. This latter point is crucial, since these possibilities must contain enough meat to later support an interesting poem or short story. But if phrases like "buying memories" and "selling memories" and even "memory thief" can already be found in Web corpora, surely the software is just exploring well-trodden ground and not actually generating novel what-if possibilities? The key lies in how the what-if questions are posed, how they tie together disparate fragments of corpus-supported ideas, and the degree of imagination displayed by the composite whole. Like a curious child, the What-If Machine will be willing to entertain silly possibilities, and to go further than is strictly supported by either logic or the corpus evidence. So, using the strategies Taking metaphors seriously and Taking jokes seriously, it will suggest that "memory stores" are real shops for buying and selling real memories, and that "memory banks" are real banks for borrowing and safeguarding real memories.
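Using the Google n-gram frequencies quoted above, the preference for well-supported, elaborable possibilities can be illustrated by grouping attested phrases into candidate frames and ranking the frames first by how many of their supporting phrases are attested, then by total frequency. The grouping into frames and this ranking rule are our assumptions, not the project's method.

```python
# Google n-gram frequencies quoted in the text above.
EVIDENCE = {"buying memories": 56, "selling memories": 190, "stealing memories": 59}

# Hypothetical grouping of supporting phrases into what-if frames.
FRAMES = {
    "memory shop": ["buying memories", "selling memories"],
    "memory thief": ["stealing memories"],
}


def frame_score(phrases, evidence):
    """Score a frame by (number of attested supporting phrases, total frequency)."""
    counts = [evidence.get(p, 0) for p in phrases]
    return (sum(1 for c in counts if c > 0), sum(counts))


def rank_frames(frames, evidence):
    """Return frame names, best supported first."""
    return sorted(frames, key=lambda name: frame_score(frames[name], evidence), reverse=True)
```

Under this rule the memory-shop frame (two attested phrases, 56 + 190 = 246 occurrences) outranks the memory-thief frame (one phrase, 59 occurrences), because more distinct supporting phrases suggest more ways to flesh the idea out.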

However, an important part of the puzzle has yet to be clarified. Before the software can stretch its imagination on fanciful topics such as "memory thieves" and "memory shops", it must already possess a common-sense understanding of the workings of real banks, real shops and real thieves. It would be easy to suggest that this understanding can also be derived from large-scale corpus analysis, but tacit common-sense knowledge is rarely articulated in explicit terms. Previous research by consortium members (Veale and Li (2011)) has explored this very issue, and shown that high-quality elements of common-sense knowledge can be harvested from the Web by specifically targeting the presupposition-rich questions that people ask of each other. To see how informative these questions are, from a computer's perspective at least, simply go to the Google search box and enter the partial question "Why do banks". In response, Google will show the most popular completions of this question, as supplied by the many other users who have posed similar questions. Popular completions include "why do banks lend money?" and "why do banks lend to each other?". Veale and Li describe how large amounts of common-sense knowledge can be milked from Google in this way. In addition, they show how the sparseness of the knowledge (Google does not provide meaningful completions for every topic) can be remedied using a practical representation called a conceptual mash-up. Most of all, however, these authors show how the questions that are harvested from the Web can be re-purposed for new topics, to allow a computer to introspectively pose new questions of itself. It is this ability that the What-If Machine will rely upon to formulate its own what-if questions and thereby explore that intriguing space between sense and nonsense.
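The harvesting and re-purposing of presupposition-rich questions can be caricatured in a few lines: parse "why do X ...?" completions into (subject, predicate) pairs, then pose the same predicates of a new topic using simple word substitutions. This is a deliberately naive sketch; it does not implement Veale and Li's conceptual mash-up, and the function names and substitution table are hypothetical. The example completions are the two quoted above.

```python
import re

# Matches completions of the form "why do <subject> <predicate>?".
QUESTION = re.compile(r"^why do (\w+) (.+?)\?$", re.IGNORECASE)


def harvest(completions):
    """Turn 'why do X P?' completions into (X, P) presupposed facts."""
    facts = []
    for q in completions:
        m = QUESTION.match(q.strip())
        if m:
            facts.append((m.group(1).lower(), m.group(2)))
    return facts


def repurpose(facts, source, target, substitutions):
    """Pose the source topic's questions of a new topic, substituting words in the predicate."""
    questions = []
    for subject, predicate in facts:
        if subject != source:
            continue
        words = [substitutions.get(w, w) for w in predicate.split()]
        questions.append(f"Why do {target} {' '.join(words)}?")
    return questions
```

Harvesting "why do banks lend money?" and "why do banks lend to each other?" and re-purposing them for "memory banks" with money replaced by memories gives "Why do memory banks lend memories?" and "Why do memory banks lend to each other?", questions that a real system would then need to evaluate.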

1.1.3 Summary of the WHIM Concept

We plan to engage in cutting-edge automated ideation research within a Computational Creativity context, in order to demonstrate that it is possible for software to create and disseminate thought-provoking ideas on chosen topics. The following is a summary of how we plan to provide this demonstration with the What-If Machine:

1.1.4 Objectives

Overall, our objective is to show the world that software can exhibit the kind of creative behaviour that produces novel and valuable fictional ideas. From a scientific perspective, we aim to propose plausible methods by which computational processes can invent, evaluate and present thought-provoking fictional ideas, and to validate these methods through thorough experimentation. To achieve this, we intend to answer research questions R1 to R5 above, following the formalisation, implementation and experimentation approach previously described. There are five main areas in which we will need to innovate: data mining over web resources, information extraction from text sources, linguistically based idea generation, narrative production and idea-centric linguistic rendering. In each of these areas, we will put forward falsifiable hypotheses about the nature of the processes that we implement and the quality of the output our software produces. Further details of these hypotheses are given in the work package descriptions below, but as examples, we expect to test the hypotheses: that shallow information extraction can produce knowledge sources rich enough for ideation; that subverting category expectations is a good basis for idea formation; that there is a positive correlation between the value people assign to ideas and the quality and/or quantity of the narratives we can automatically generate about them; and that rendering ideas in linguistic forms informed by emotional modelling can heighten audience members' appreciation of them. We will perform well-designed experiments to test these hypotheses, and we will analyse any failures to suggest fixes or alternative methods. All this will help us to build a formal, computational understanding of the particular creative process of fictional ideation.

From a technological perspective, we aim to solve the many engineering problems that arise in the design and implementation of the individual code modules, the integration of the system as a whole and the successful execution of the What-If Machine. These problems will be many and varied. They will cover issues such as real-time performance: given the time taken to construct knowledge bases, search a potentially huge space of possible ideas, construct and assess as many varied narratives as possible, and finally package all this up in a poem or a story, it may not be possible to do this on demand in reaction to a user query, or it may be possible only with massive parallelism. They will no doubt also cover knowledge representation: given the multitude of types of information required for ideation, it may be difficult to choose and implement sufficiently flexible data structures. We believe that the What-If Machine will be a technical marvel, namely the first computational system that can reliably demonstrate its creativity through a particularly difficult activity, that of coming up with fictional ideas. If this project is successful, we expect that it will become quite difficult for people to avoid using the word 'creative' to describe the system, as producing, evaluating and presenting the kinds of high-quality ideas we hope for clearly requires skill, appreciation and imagination.

Figure 2: Objective 8.1c from the Information and Communications Technologies Work Programme 2013.

1.1.5 Relevance to the ICT Programme

Figure 2 contains the extract from the EU ICT Work Programme document relating to expected outcome 8.1c from projects addressing challenge 8 (ICT for Creativity and Learning). The WHIM project fits this challenge extremely well, and especially this targeted outcome. In particular, we will produce a plausible computational model of a process which is undeniably linked to creative behaviour, namely the invention of fictional ideas. Given our formalism-first approach, this will represent progress towards a formal understanding of creativity, and lead to an advance in the measurable capability of computers to produce useful, original and surprising results, as per objective 8.1c. The contributions we aim to make to technological and theoretical insights on creativity have been mentioned above: on the technological side, the What-If Machine will represent technology far superior to any currently presented in Computational Creativity forums; on the theoretical side, the formal model of automated ideation will be available to all as a well-developed instance of creative behaviour captured in algorithmic form.

This is an Artificial Intelligence project, where the aim is to push the technological possibilities of creative software to breaking point, then innovate our way past the difficulties. As in the majority of Artificial Intelligence projects (certainly most of the ones we have been involved in), we take inspiration from our understanding of human intelligence, often turning to results from psychology, cognitive science and philosophy in our search for insights into how to simulate creativity. Much of the time, this is not just advantageous, it is necessary. For instance, try implementing computational models of humour without reference to the literature on humour theory (Raskin (1985); Attardo et al. (2002)), which addresses what people find funny and why. The majority of Computational Creativity projects are designed to produce novel artefacts for people, hence reference to the human sciences is necessary. This project is no different, and we will constantly appeal to the most recent relevant theories from neighbouring disciplines to get the most out of our software. Moreover, using analyses of crowd-sourced responses to generated ideas, we aim to give back to these disciplines some regularities we find in human idea consumption, as described in WP6 below. While this is a secondary aim of the project, we hope to show that an approach based on bisociative relation learning over the opinions people have about the What-If Machine's ideas will lead to insights about how people consume ideas, and that this will contribute to an overall understanding of human creativity. Finally, as mentioned previously, we will answer research questions R1 to R5 with technological demonstrators carried out by the What-If Machine. These will clearly be proofs of concept in innovative autonomous creative systems, as required by Objective 8.1c.
Moreover, we have identified pastiche as a particularly thorny issue in Computational Creativity. Many projects have aimed to get software to create in the style of a given artist, musician, poet, etc., and these have shown that such generative software is not necessarily ever going to be seen as truly creative. For instance, in Pease and Colton (2011b), we discussed why Turing-style evaluation tests might be damaging to Computational Creativity, as they encourage pastiche and naivety. Hence we know more than most about pastiche, and, as encouraged in objective 8.1c, the What-If Machine will be designed from the very beginning to transcend what has come before it, whether computer or human.

1.1.6 Summary of WHIM Objectives

In Computational Creativity research, we have already shown that software can produce culturally valuable artefacts: automatically produced paintings have been sold (Colton and Pérez Ferrer (2012)); people have paid to attend concerts where one of the performers is a software system (Pachet (2003)); mathematical theorems discovered by computer have been published in the mathematical literature (Colton (1999)); anthologies of computer generated poems have been published (Gervas (2010)); and board games invented by a computer program have been sold in physical form (Browne (2011)). However, as we argue in section 1.2.2 below, even the most notable software is rarely perceived as being independently creative. This is in part because creation without creativity has been the norm in Computational Creativity projects. That is, because highly valued artefacts produced by uncreative software can be interpreted as coming from an intelligent, creative being, we have largely allowed this, instead of building truly creative software. We believe that creative ideation will be a necessary condition for software purporting to be truly creative. Our main objective in this project is therefore:

There will be many hurdles to overcome along the way to achieving this aim. In summary, in addressing them, our objectives will be to show that we have answered the five research questions R1 to R5 above. In particular, through the research in work packages WP2 to WP6 respectively, we will have to show that:

In the long term, if we are to bring about a new era in which software acts as our creative collaborator, then we will need to tell the world about the results of projects such as WHIM. While crowd-sourcing opinions about ideas will form part of work package WP6, and will clearly expose many people to the notion that software can create ideas, we will, as described in work package WP7, have to work harder to change the general impression of software being uncreative. Hence, we will aim to:

1.2 Progress Beyond the State of the Art

1.2.1 Computational Creativity

In Colton and Wiggins (2012), we provide a working definition of Computational Creativity research as:

At one end of the spectrum, many researchers in Computational Creativity are interested in building creative systems as an intellectual challenge, i.e., pushing Artificial Intelligence systems to breaking point and thereby necessitating progress. At the other end of the spectrum, many researchers are interested in computational modelling of human creativity, i.e., performing cognitive science simulations to learn more about ourselves. The software we build ranges from fully autonomous systems, able to create interesting and valuable material with minimal input, to systems which are designed to help people be more creative, whether at the level of a muse, a tool or a collaborator. The overriding theme throughout the field is that the research must have a computational element, discussing software which explicitly takes on some of the creative responsibilities in arts and science projects.

In the early days of the field (with focused workshops going back around 15 years, as described in Cardoso et al. (2010)), the focus was on building software which could generate culturally interesting artefacts automatically. Over the years, there has been an explosion of systems able to generate linguistic artefacts such as poems, jokes, neologisms, plot lines, stories and dialogues; visual artefacts such as sketches, paintings and collages; musical artefacts such as melodies, harmonisations and lyrics; interactive artefacts such as artworks and games; and knowledge artefacts such as mathematical concepts and scientific hypotheses. This progress was mirrored by other developments in Artificial Intelligence geared towards knowledge generation, such as in constraint solving (the production of examples), machine learning (the production of concepts and rules) and automated theorem proving (the production of proofs). The development of such generative systems has pushed the field of Computational Creativity forward, and has led to more focused communities forming within particular application domains, such as procedural content generation for video game design and evolutionary art and music. However, to be of current interest to Computational Creativity, research generally needs to be directed towards software which can take on higher-level creative responsibilities, such as the employment and/or generation of novel aesthetic considerations, innovation at the process level, or the self-explanation of its motivations, source materials and processing.

Over the last decade, there have been a number of important steps towards formalising the assessment of progress in our field. Naturally, an important measure of progress is how much more sophisticated the software we build becomes. Under the assumption that software producing artefacts of cultural value must be exhibiting creativity of some kind, various formalisms have been derived for assessing the progress of software in terms of an increase in the novelty and value of its output, taking into account aspects such as the nature of the input, the typicality of results and the fine-tuning of processes. The best known of these formalisms are Ritchie's criteria, as described in Ritchie (2007). Under a related but subtly different assumption, namely that if software can produce artefacts indistinguishable under experimental conditions from those produced by people, then it must be exhibiting some form of creativity, a number of researchers have turned to Turing-style tests to assess the output of their creative software, as discussed in Eigenfeldt et al. (2012). In such tests, human-produced and computer-generated results are shown side by side, and it marks a milestone in the development of a system if observers are not able to reliably tell the difference.
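To make the flavour of such product-centric measures concrete, the sketch below computes averages and threshold ratios over typicality and value ratings for a set of artefacts. It is only loosely modelled on the first few of Ritchie's criteria; the exact formulations and thresholds in Ritchie (2007) differ in detail, and the ratings and the thresholds `alpha` and `gamma` here are hypothetical.

```python
def ritchie_style_summary(typ, val, alpha=0.5, gamma=0.5):
    """Summarise typicality and value ratings (each in [0, 1]) for a set of artefacts.

    alpha and gamma are thresholds for counting an artefact as 'typical' or 'valued'.
    """
    n = len(typ)
    typical = {i for i, t in enumerate(typ) if t >= alpha}
    valued = {i for i, v in enumerate(val) if v >= gamma}
    return {
        "avg_typicality": sum(typ) / n,       # loosely, criterion 1
        "ratio_typical": len(typical) / n,    # loosely, criterion 2
        "avg_value": sum(val) / n,            # loosely, criterion 3
        "ratio_valued": len(valued) / n,      # loosely, criterion 4
        # share of typical artefacts that are also valued (loosely, criterion 5)
        "valued_among_typical": len(typical & valued) / len(typical) if typical else 0.0,
    }


# Hypothetical ratings for four generated artefacts.
print(ritchie_style_summary([0.9, 0.2, 0.6, 0.4], [0.8, 0.9, 0.3, 0.1]))
```

Note that none of these numbers says anything about how the artefacts were produced or selected, which is the limitation discussed next.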

We have argued that, while artefact-based assessment methods such as Ritchie's criteria give approximations to the creativity of software systems, they do not tell the whole story. In particular, such product-centric approaches can encourage over-involvement of the programmer/user of creative systems at the selection level, i.e., using a fairly straightforward generative approach, then carefully choosing the best outputs, thus exerting more creative influence over the results than the software itself (as described using the notion of a curation coefficient in Colton and Wiggins (2012)), yet claiming an inflated level of creativity for the software. In general, it is difficult to determine the respective creative input from people and from software in generative applications. Moreover, as programmers hand over more creative responsibility to software, the quality of the output can decrease, a phenomenon similar to U-shaped learning, which we call the latent-heat issue in Computational Creativity. In Pease and Colton (2011b), we further argued that, while Turing-style imitation games can bring much-needed objectivity to an area where biases against the creativity and intelligence of software naturally arise (Moffat and Kelly (2006)), the imitation aspect of such tests encourages the simulation of two behaviours which are usually associated with uncreative behaviour, namely naivety and pastiche.

In principle, the word 'creative' as an adjective should be primarily used to describe the behaviour of people and the processing of software. The categorisation of creativity into exploratory, combinatorial and transformational types given in Boden (2003) laid the groundwork for a more formal discussion of the role of search in creative processing in Wiggins (2006a,b), which introduced a Creative Systems Framework. Our own contribution to the study of the creativity of software in terms of what it does rather than what it produces started in Colton (2008), where we proposed the Creativity Tripod as being required to support creativity. That is, we identified three main ways in which people can criticise software as being uncreative: in terms of its lack of skill, its lack of appreciation or its lack of imagination. We proposed that progress in the sophistication of creative software could be measured in terms of the increase in level of perceived skill, appreciation and imagination. In Colton et al. (2011) and Pease and Colton (2011a), we further formalised this by introducing the FACE model to describe the creative acts undertaken by software. We proposed that such acts could be categorised as producing (F)raming information to add value to creative acts, novel (A)esthetic criteria, (C)oncepts addressing those criteria and (E)xamples of such concepts. We also provided concrete calculations for quantifying the value of systems in terms of the creative acts they perform, suggestions for how the model could be utilised in various application domains, and comparisons of different creative systems performed using the model. As with the creativity tripod, we proposed the FACE model as a framework for driving forward actual implementations, as per the poetry generator described in Colton et al. (2012). We have also been at the forefront of addressing higher-level topics in Computational Creativity.
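As a rough illustration of how the FACE categories can be used to compare systems, the sketch below records the generative acts a system performs as (category, level) pairs, where level 'g' denotes generating an item and 'p' denotes generating a method for producing such items, following our reading of Colton et al. (2011). The comparison rule `broader_than` is a hypothetical simplification for illustration and is not one of the calculations given in that paper.

```python
from enum import Enum


class Act(Enum):
    FRAMING = "F"
    AESTHETIC = "A"
    CONCEPT = "C"
    EXAMPLE = "E"


def act_profile(creative_acts):
    """Return sorted distinct labels such as 'Cp' or 'Eg' for a list of (Act, level) pairs.

    level is 'g' (an item was generated) or 'p' (a method for generating items was).
    """
    return sorted({f"{act.value}{level}" for act, level in creative_acts})


def broader_than(acts_a, acts_b):
    """Hypothetical rule: A performs every generative-act type B does, and at least one more."""
    return set(act_profile(acts_b)) < set(act_profile(acts_a))


template_poet = [(Act.EXAMPLE, "g")]
framing_poet = [(Act.EXAMPLE, "g"), (Act.FRAMING, "g"), (Act.CONCEPT, "p")]
print(act_profile(framing_poet), broader_than(framing_poet, template_poet))
```

A system that only outputs examples is, under this rule, dominated by one that also frames its work and invents its own generative methods.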
These topics include: how software can show intentionality when creating artefacts (Cook and Colton (2011); Krzeczkowska et al. (2010)); how it can add value to its artefacts through the framing of its motivations, processes and products (Charnley et al. (2012)); and how people perceive software as being creative or not (Colton (2008)).

1.2.2 Projects Related to Automated Ideation

In many subfields of Artificial Intelligence research, most notably Machine Learning, the invention of concepts or the linking of concepts through associations are key activities, and both concepts and association rules can be seen as ideas. However, the aim of such projects is to find relationships in real data; we are not aware of any projects where imagined data designed to model an imagined reality rather than the true one has been used in a machine learning or data mining application. Of course, some of the rules learned may turn out to be coincidental or the result of over-fitting the data, but this makes them false, not fictional, as the latter implies that they were designed explicitly for the purpose of provoking imaginative thought. Perhaps the closest project within Machine Learning to automated ideation has been our own work in automatic mathematical concept formation, as described in Colton (2002) and Colton and Muggleton (2006). While this successfully led to the invention of mathematical concepts which were not known before, our approach was still based on learning over real datasets (albeit data about abstract mathematical objects such as numbers, graphs and groups). There have been other projects related to automated ideation based more in fictional scenarios. These have largely approached the task from a natural language generation perspective, aiming to generate irreducible linguistic items such as metaphors, similes and neologisms, and employing them in larger linguistic constructs such as poems and stories. Some such projects (many of which were carried out by members of the WHIM consortium) are described in section 1.2.3 below.

In addition, there have been a small number of projects where psychological theories of concept formation have been implemented and used for fictional ideation purposes. In particular, conceptual blending, introduced by Fauconnier and Turner (2002), has been given a computational reading in Pereira (2007) and applied to fictional ideation in various application domains, such as fictional character building, where the Divago system by Pereira and Cardoso (2006) re-invented classic ideas such as a winged horse (Pereira and Cardoso (2003)) and invented other novel characters. Via a link-up with 3D modelling software, these characters were given visual form; for instance, see the characters in figure 3. The blending in Divago was designed to increase a priori the potential value of the blended concepts which are produced. As such, the invented concepts were not evaluated in sophisticated ways, and certainly there was no attempt to tell stories about the ideas as a means of assessing them. The work presented recently in Li et al. (2012) implemented a goal-driven model of conceptual blending for use in story generation. Their aim was to invent blended objects and characters, and their initial experiments looked at two case studies, namely gadget generation and virtual characters for pretend play. This approach was story-centric, with the blended objects tested in the story: if any necessary conditions of an object could not be met in the story, then it was discarded. This contrasts somewhat with our proposed approach for the What-If Machine, which will generate and assess many narratives about each idea, rather than using a given story against which to test the functional attributes of invented objects/concepts. The input to the blending process in Li et al. (2012) came from the story itself, and in both these blending projects, it was necessary to hand-craft knowledge sources in order to get the blending processes to work.
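The following toy sketch shows the basic shape of a blend: features from a base concept are kept, selected features from a donor concept are projected in, and the space of possible blends can be enumerated for later assessment. The frames are hand-crafted and hypothetical, echoing the hand-crafted knowledge mentioned above, and the selective projection here is a deliberate simplification, not Divago's optimality-based algorithm or the goal-driven model of Li et al. (2012).

```python
from itertools import combinations

# Hand-crafted input frames (hypothetical, for illustration only).
HORSE = {"legs": 4, "covering": "hair", "locomotion": "gallops", "has_wings": False}
BIRD = {"legs": 2, "covering": "feathers", "locomotion": "flies", "has_wings": True}


def blend(base, donor, borrow):
    """Start from a copy of the base frame and project the named features from the donor."""
    result = dict(base)
    for feature in borrow:
        result[feature] = donor[feature]
    return result


def candidate_blends(base, donor, max_borrow=2):
    """Enumerate blends that borrow up to max_borrow differing features from the donor."""
    differing = [f for f in donor if base.get(f) != donor[f]]
    for k in range(1, max_borrow + 1):
        for borrow in combinations(differing, k):
            yield borrow, blend(base, donor, borrow)


winged_horse = blend(HORSE, BIRD, ["has_wings", "locomotion"])
print(winged_horse)
```

Borrowing one or two of the four differing features already gives ten candidate blends; in the What-If Machine, the intention is that candidates like these would be assessed by telling stories about them rather than fixed a priori.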
We are not aware of any project involving blending where knowledge bases were built from the internet as source material for the ideation process, as we propose here.

Figure 3: Characters from blending horse and dragon concepts.

Notwithstanding these projects, there has never been a large-scale study of fictional ideation where the point of the exercise has been to investigate the potential for software to creatively output raw ideas themselves, rather than finished artefacts such as paintings, poems or stories. The human mind is able to read ideas and emotional content into such automatically generated artefacts; indeed, if software presented five randomly chosen words, we could probably invent and justify a coherent idea linking some or all of them. However, this doesn't mean that the idea was put there by design by the software. For too long in Computational Creativity projects, the modelling of idea generation has been ignored, because artefacts produced without ideation can be appreciated as if the ideas we read/see/hear in them were put there on purpose. This means that such software is able to create without necessarily being seen to be creative (as discussed in Colton (2008)) and without implementing a recognisable formal model of creativity. We have made much progress in Computational Creativity research with such creation without creativity projects, but most of them have been domain specific (linguistic, visual, musical), and the formalisations and implementations have not carried over to other domains. We believe that without the ability to intentionally generate, assess and utilise ideas, Computational Creativity research will never reach the next level, where it is clear that software should be considered a creative partner.

The What-If Machine will be the first of its kind: able to creatively invent, assess and present ideas that will surprise, entertain, amuse and provoke people, and provide a platform for imaginative thought. As we argue in section 3, this could be a remarkable step forward for Artificial Intelligence research, on a par with landmark problem-solving systems such as IBM's Watson. Our software will create these ideas from little, if any, given background knowledge, as its knowledge will come from web searches carried out before ideation. Ideas are the currency of creative behaviour common to many different application domains. While the ideas that our software produces may not necessarily be of value to all domains (music in particular, as the communicative nature of the ideas may not translate well to this more interpretative area), it is clear that the ideas could in principle be used by most generative software in the visual arts, literature, and graphic, game and film design. The value of this will be huge, and not lost on the Computational Creativity community, and we hope to instigate a new idea-centric arm of this field.

The WHIM project will also mark an improvement in the state of the art in how we assess creative systems and how we determine progress in Computational Creativity research. Even though there are formalisms for assessing the value of the output of software, the evaluation of creative systems has largely been carried out in an ad-hoc way, and is often omitted entirely, as pointed out by Jordanous (2011). As argued above, there are drawbacks to using product-based assessments, and this might account for the relative lack of use of such methods as part of a standard methodology in Computational Creativity. We will bring to bear a full assessment of the What-If Machine, using both product-based (Ritchie (2007)) and process-based (Colton et al. (2011)) assessment formalisms, enacted via a crowd-sourcing framework. There will be nowhere to hide when thousands of people are giving their opinions about the ideas produced and the processes employed. We hope this will drive forward research on the sociology of assessing creative computer systems.

1.2.3 Progress in Individual Domains

In addition to the progress in the state of the art of Computational Creativity as a whole that we aim for with the WHIM project, there will also be significant advances in various individual technical domains of Artificial Intelligence, as described below. While there will be much overlap in the technologies used and extended, each of the following domains relates mostly to the engineering of one of the code modules (WVB, IG, IE, IR, AM) which are to be developed in work packages WP2 to WP6 respectively.

The current state of the art in large-scale automatic knowledge base construction from web-scale resources is provided by work in Open Information Extraction, particularly the KnowItAll and related projects at the University of Washington, as described in Etzioni et al. (2008). The approach taken uses simple finite-state parsing technology which can analyse simple grammatical structures to provide triples representing facts such as Thomas Edison inventing the light bulb (⟨Thomas Edison, invent, light bulb⟩). The advantage of this approach is that, because finite-state parsing operates in linear time with respect to sentence length, massive numbers of sentences can be parsed given current computing resources, leading to huge numbers of facts being extracted from web-scale data.
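A minimal sketch of the finite-state idea: map each part-of-speech tag to one character, then run a regular expression over the resulting string, so that match offsets are token offsets and the scan is linear in sentence length. It assumes the sentence has already been tagged with Penn Treebank-style tags, and the single pattern is a toy, far simpler than the extractors used in Open IE systems.

```python
import re

# One character per token, so regex offsets are also token offsets.
TAG_CODE = {"NNP": "P", "VBD": "V", "DT": "D", "NN": "N", "NNS": "N"}
# Proper-noun subject, past-tense verb, optional determiner, noun-compound object.
TRIPLE = re.compile(r"(P+)(V)D?(N+)")


def extract_triples(tagged):
    """tagged: list of (word, tag) pairs. Returns (subject, verb, object) triples."""
    codes = "".join(TAG_CODE.get(tag, "x") for _, tag in tagged)
    words = [word for word, _ in tagged]
    triples = []
    for m in TRIPLE.finditer(codes):
        subject = " ".join(words[m.start(1):m.end(1)])
        verb = words[m.start(2)]
        obj = " ".join(words[m.start(3):m.end(3)])
        triples.append((subject, verb, obj))
    return triples


sentence = [("Thomas", "NNP"), ("Edison", "NNP"), ("invented", "VBD"),
            ("the", "DT"), ("light", "NN"), ("bulb", "NN"), (".", ".")]
print(extract_triples(sentence))
```

The same pattern finds nothing in "Edison, who invented the light bulb", because the subject is not adjacent to the verb; this kind of gap is what motivates the richer parsing described next.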

Advances in the State of the Art: Rather than using finite-state parsing technology, we will use a parser based on the mildly context-sensitive grammar formalism Combinatory Categorial Grammar (Steedman (2000)). Despite the power of the grammar, polynomial-time parsing algorithms are available (Vijay-Shanker and Weir (1993)); moreover, the use of context-free approximations to the grammar, together with statistical pruning, means that it is possible to develop a highly efficient parser capable of parsing billions of words of text in a reasonable time, as we demonstrated in Clark and Curran (2007) and Clark et al. (2009). Because of its efficiency, and the varied and detailed linguistic output it is capable of producing, the C&C parser (Clark and Curran (2007)) will be at the heart of many of the resources we construct automatically. We will extend the state of the art in fact and knowledge extraction by using a more sophisticated representation than the triples currently used in Open IE, which are typically derived from simple subject-verb-object constructions in text. The C&C parser can extract information which is represented in grammatically diverse ways, for example through relative clauses (e.g. Edison, who invented the light bulb). Also, through the use of dependency graphs, it can represent facts which cannot be expressed as subject-verb-object triples, e.g. Tom Cruise gave a 1.5M dollar engagement ring to Katie Holmes. We will also explore ways in which dependency graphs can be processed so that relevant facts are extracted from the graphs for the purposes of ideation. For example, suppose that one of the keywords input to the What-If Machine is giraffe. A relevant fact relating to a giraffe is that it has a long neck, a fact which could be manipulated in some way to provide an unusual or creative take on the concept of a giraffe.
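To show why a dependency graph can hold more than a triple, the sketch below stores the Tom Cruise sentence as (head, relation, dependent) edges and collects all of a verb's dependents into one n-ary fact. The relation labels and the graph itself are hypothetical and hand-written; they do not reproduce the C&C parser's actual output format.

```python
# Dependency edges as (head, relation, dependent); labels are illustrative only.
EDGES = [
    ("gave", "subj", "Tom Cruise"),
    ("gave", "dobj", "ring"),
    ("ring", "mod", "engagement"),
    ("ring", "mod", "1.5M dollar"),
    ("gave", "to", "Katie Holmes"),
]


def frame_for(verb, edges):
    """Collect every direct dependent of verb into one n-ary fact (a dict of roles)."""
    frame = {"predicate": verb}
    for head, rel, dep in edges:
        if head == verb:
            frame[rel] = dep
    return frame


def modifiers(node, edges):
    """Return the modifiers attached to a node, e.g. the properties of the ring."""
    return [dep for head, rel, dep in edges if head == node and rel == "mod"]


print(frame_for("gave", EDGES))
print(modifiers("ring", EDGES))
```

A subject-verb-object triple would have to drop the recipient or split the fact in two, whereas the frame keeps giver, gift and recipient together, and the gift's modifiers remain reachable from the "ring" node.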
However, extracting this information from text will be challenging, since it will be quite difficult to determine that the adjective long is relevant in this context (extracting the fact that giraffes have necks would not be interesting). From the perspective of the dependency graph representation, the problem is to extract informative sub-graphs from a large set of parsed sentences; however, the number of such sub-graphs is exponential in the number of sentences, and so we have a challenging search problem. The only preliminary work we are aware of which has considered a related problem is Kelly et al. (2010), who search and extract dependency-graph representations in order to represent conceptual property norms.
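One simple heuristic for deciding that "long" is the informative modifier of a giraffe's neck is to compare how often a modifier is used for this concept's part against how often it is used for that part in general. The sketch below computes a smoothed log-ratio of the two probabilities. Both the heuristic and the counts are assumptions made for illustration; this is not the method of Kelly et al. (2010), and the counts are not corpus figures.

```python
import math

# Hypothetical counts of adjectives modifying "neck" in parsed text.
GIRAFFE_NECK = {"long": 48, "thin": 2}                              # giraffe's neck only
ALL_NECKS = {"long": 300, "thin": 150, "short": 250, "stiff": 300}  # necks in general


def salience(modifier, concept_counts, background_counts, smoothing=1.0):
    """Log-ratio of P(modifier | concept's part) to P(modifier | part), add-one smoothed."""
    vocab = set(concept_counts) | set(background_counts)
    denom_c = sum(concept_counts.values()) + smoothing * len(vocab)
    denom_b = sum(background_counts.values()) + smoothing * len(vocab)
    p_concept = (concept_counts.get(modifier, 0) + smoothing) / denom_c
    p_background = (background_counts.get(modifier, 0) + smoothing) / denom_b
    return math.log(p_concept / p_background)


def most_salient(concept_counts, background_counts):
    """Return the modifier that is most over-represented for this concept."""
    vocab = set(concept_counts) | set(background_counts)
    return max(vocab, key=lambda m: salience(m, concept_counts, background_counts))


print(most_salient(GIRAFFE_NECK, ALL_NECKS))
```

A positive score marks a property that distinguishes the concept; scoring candidate sub-graphs in a similar way could help prune the exponential search mentioned above.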

Using knowledge bases for query expansion in Information Retrieval is an old idea, although for the task of document retrieval (the obvious application) it has been difficult to obtain consistent improvements in retrieval performance, as discussed in Navigli and Velardi (2003). Automatically constructed knowledge bases are typically evaluated against datasets of human semantic similarity judgements, for example the Rubenstein and Goodenough dataset (Rubenstein and Goodenough (1965); Harrington (2010)).
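Evaluation against human similarity judgements usually reports a rank correlation between human ratings and system scores for the same word pairs. A self-contained sketch of Spearman's rho, computed as the Pearson correlation of the two rank vectors with tied values given their average rank, is shown below; the ratings are hypothetical, not values from the Rubenstein and Goodenough dataset.

```python
def ranks(values):
    """1-based ranks, with tied values given the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank vectors of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


human = [3.9, 3.7, 1.2, 0.4]    # hypothetical human ratings for four word pairs
system = [0.8, 0.9, 0.3, 0.1]   # hypothetical system similarity scores
print(spearman(human, system))
```

Here the system swaps the order of the top two pairs, giving rho = 0.8. A metric tuned to ideation, as proposed below, would need a different gold standard from similarity ratings like these.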

Advances in the State of the Art: We are not aware of any existing work which uses query expansion as a tool for the ideation process. There are a number of papers on using random walks to find semantically related concepts, such as Hughes and Ramage (2007) and Agirre et al. (2009). However, the aim here is not simply to find related concepts, but to find concepts which are related in ways that could be useful or interesting in a creative generation process. Hence we will need to extend existing techniques beyond simply measuring semantic similarity, and develop a new evaluation metric which measures the relatedness of concepts specifically for the ideation process. For example, giraffes and champagne glasses are not semantically similar, but both have long necks, which could be useful in creating an original and interesting take on giraffes drinking champagne.
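The giraffe and champagne glass example can be illustrated with a random walk with restart (personalised PageRank) computed by power iteration, in the spirit of the random-walk methods cited above. The small graph is hypothetical and deliberately links concepts through shared property nodes such as "long neck", so that relatedness can flow through properties rather than taxonomic similarity.

```python
# Hypothetical undirected concept/property graph: node -> neighbours.
GRAPH = {
    "giraffe": ["long neck", "savanna", "spots"],
    "champagne glass": ["long neck", "champagne"],
    "long neck": ["giraffe", "champagne glass", "swan"],
    "swan": ["long neck", "lake"],
    "savanna": ["giraffe", "lion"],
    "lion": ["savanna"],
    "spots": ["giraffe", "leopard"],
    "leopard": ["spots"],
    "champagne": ["champagne glass", "party"],
    "party": ["champagne"],
    "lake": ["swan"],
    "hammer": ["nail"],
    "nail": ["hammer"],
}


def random_walk_with_restart(graph, seed, restart=0.15, iters=100):
    """Personalised PageRank by power iteration; every node needs at least one neighbour.

    At each step the walker jumps back to seed with probability `restart`,
    otherwise it moves to a uniformly chosen neighbour.
    """
    p = {n: (1.0 if n == seed else 0.0) for n in graph}
    for _ in range(iters):
        nxt = {n: (restart if n == seed else 0.0) for n in graph}
        for n, neighbours in graph.items():
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        p = nxt
    return p


scores = random_walk_with_restart(GRAPH, "giraffe")
print(round(scores["champagne glass"], 4), scores["hammer"])
```

"champagne glass" receives probability mass only via "long neck", while the disconnected "hammer" receives none; the ideation-specific metric we propose would go further and judge whether such a link is interesting, not merely present.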

There is a large and growing literature on detecting events which are announced and described on social media, in particular Twitter; for a recent paper on this topic, see Petrovic et al. (2012). However, much of this work focuses on first-story detection (Allan et al. (2000)), in which the task is to determine whether an event mentioned in a tweet is novel and interesting. The potential applications of this technology rest on the immediacy of social media, and include helping news agencies spot emerging stories quickly, and warning relevant agencies in (close to) real time of natural disasters such as earthquakes.
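The core of first-story detection can be sketched as a nearest-neighbour test: a post is flagged as a first story if no earlier post is sufficiently similar to it. The sketch below uses bag-of-words cosine similarity with exact pairwise comparison, which is quadratic in the number of posts; at Twitter scale, approximate nearest-neighbour search (for example locality-sensitive hashing) is needed instead. The threshold and example posts are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_stories(posts, threshold=0.5):
    """Flag each post whose most similar earlier post scores below threshold."""
    seen, flags = [], []
    for text in posts:
        vec = Counter(text.lower().split())
        nearest = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(nearest < threshold)
        seen.append(vec)
    return flags


posts = ["earthquake hits the city", "earthquake hits the city centre", "new giraffe born at the zoo"]
print(first_stories(posts))
```

The second post is treated as a follow-up to the first, while the third starts a new story.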

Advances in the State of the Art: We know of no existing work which applies event detection, and related tasks, to social media for a creative generation process. Here the emphasis is less on real-time processing of social media (although we would still like the extracted information to be timely) and more on the relevance of the information. For example, we would not want the ideation process to be based on any of the countless individual and trivial events reported on Twitter, but on events which are of cultural importance. Hence we will need to extend existing techniques beyond simply measuring whether an event is new and trending on Twitter, to finding events which will be useful specifically for the ideation process. Another difference is that we require a method for selecting which of the culturally relevant events to use in expanding on the original search terms provided by the user. For example, if the user originally provides giraffe as a search term, would it be useful to know that the singer Amy Winehouse had died recently? Given that we intend the system to be genuinely creative, we do not want to be too restrictive about which concepts and events to consider. Hence a key part of the research will be to develop metrics which can determine, from the various knowledge sources available (including social media), which concepts, facts and events are useful given a particular set of query terms.

Picasso famously stated that "computers are useless - they can only give you answers!" It is true that computationalists focus a great deal more on finding answers and solving well-defined problems than on building software that can pose its own questions and use its own imagination to challenge that of its users. Nonetheless, some meaningful progress has been made on imbuing computers with non-trivial and useful introspective question-posing abilities. Veale and Li (2011) have argued that since metaphor generation and comprehension are essentially introspective processes, requiring a cognitive agent to ask questions of itself such as "in what ways are these two concepts alike? in what ways are they different? what behaviors and properties of one can be sensibly projected onto the other?", computational approaches to metaphor handling should already incorporate aspects of automated introspection. Veale and Li (2011) go further and show how a system for metaphor comprehension and generation can acquire its knowledge-base of stereotypical facts from the Web by harvesting the presupposition-rich questions that Web users send to search engines, such as "why do dogs chase cats?" or "why do dogs bury bones?" They further show how this system can then pose its own questions about related topics by introspectively repurposing its previously acquired questions to these new topics (e.g. if Man is a Dog, what kind of Cats does he chase?).
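The harvest-and-repurpose idea can be illustrated with a deliberately simplified sketch. It assumes queries of the exact form "why do X V Y?" with single-word slots, and uses crude singularisation (stripping a final "s") to match the plural subject against a singular source concept. Veale and Li's system is considerably richer; the function names here are hypothetical.

```python
import re

# Assumed query shape: "why do <subject> <verb> <object>?" with one word per slot.
QUESTION = re.compile(r"^why do (\w+) (\w+) (\w+)\?$", re.IGNORECASE)


def harvest(queries):
    """Extract (subject, verb, object) stereotype triples from 'why do X V Y?' queries."""
    triples = []
    for q in queries:
        m = QUESTION.match(q.strip())
        if m:
            triples.append(tuple(w.lower() for w in m.groups()))
    return triples


def repurpose(triples, target, source):
    """Pose new questions about `target` by projecting beliefs held about `source`.
    Plural subjects are singularised naively by dropping a trailing 's'."""
    questions = []
    for subj, verb, obj in triples:
        singular = subj[:-1] if subj.endswith("s") else subj
        if singular == source:
            questions.append(f"If {target} is a {source}, what kind of {obj} does {target} {verb}?")
    return questions
```

Harvesting "why do dogs chase cats?" and "why do dogs bury bones?" and then repurposing for target "man" and source "dog" yields "If man is a dog, what kind of cats does man chase?" and "If man is a dog, what kind of bones does man bury?".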



Figure 4: Corpus-derived metaphors for input concept 'crime'.

The development of novel computational techniques for creative ideation will build on and significantly extend the current state of the art in computational approaches to metaphorical creativity. Shutova (2010) describes a robust corpus-based means of finding informative paraphrases for metaphors, while Veale (2012c) describes a range of corpus-driven approaches to metaphorical ideation. Each of these approaches combines a knowledge-base of conventional knowledge (of stereotypical beliefs) with a large corpus of everyday language use, for instance the Web at large or the Google n-grams database (Brants and Franz (2006)). Veale has shown how integrating even a small amount of high-quality knowledge (collected and refined from the Web) with a large amount of linguistic data can yield insights into the many metaphors that pervade our everyday language (Barnden (2006); Lakoff and Johnson (1980)). Metaphors like crime is a disease and crime is war are richly, if tacitly, supported by the language we all conventionally use, as described in Shutova (2010) and Veale (2012c). The Metaphor Eyes system described in Veale (2012c) uses this synthesis of stereotypical knowledge and associated linguistic patterns to generate both the metaphors themselves and their corpus-supported analyses. Figure 4 shows the suggestions made by Metaphor Eyes for the input concept crime. Figure 5(a) shows this system's subsequent analysis of the metaphor crime is a disease, while Figure 5(b) shows its analysis of the metaphor crime is war. Note how each analysis is supported by linguistic norms, such as the phrases "crime prevention" and "crime fighter". The right side of each screenshot shows a Google gadget primed to retrieve documents that use the analysis of the metaphor for query expansion (in what Veale (2011) names Creative Information Retrieval).


Figure 5: Screenshots of a linguistic analysis of the metaphor (a) 'crime is a disease' and (b) 'crime is war'.
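The kind of corpus support described above can be approximated, under strong simplifying assumptions, by looking for head nouns that both the target and the source concept modify in an n-gram collection: if both "crime prevention" and "disease prevention" are attested, that is weak evidence for crime is a disease. The sketch below uses a dictionary of invented toy bigram counts in place of the Google n-grams, and it is not how Metaphor Eyes is implemented; the function names and the `min_count` parameter are hypothetical.

```python
from collections import defaultdict


def heads_by_modifier(bigrams, min_count=1):
    """Map each modifier word to the set of head nouns it precedes at least
    `min_count` times; `bigrams` maps (modifier, head) pairs to counts."""
    index = defaultdict(set)
    for (modifier, head), count in bigrams.items():
        if count >= min_count:
            index[modifier].add(head)
    return index


def metaphor_support(bigrams, target, source, min_count=1):
    """Heads shared by 'target X' and 'source X' bigrams, read as corpus
    evidence that the metaphor 'target is source' is conventionally supported."""
    index = heads_by_modifier(bigrams, min_count)
    return sorted(index[target] & index[source])
```

With toy counts for ("crime", "prevention"), ("disease", "prevention"), ("crime", "zone") and ("war", "zone"), the function returns ["prevention"] for crime/disease and ["zone"] for crime/war; raising `min_count` filters out weakly attested bigrams.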

Advances in the State of the Art: The WHIM project will build on this state-of-the-art baseline in significant ways. First, it will use deeper analyses of the available linguistic content to generate deeper and more structurally complex interpretations of each of the metaphors it discovers, proposes and subsequently elaborates. Second, it will explore new ways of packaging the resulting analyses into well-formed linguistic objects such as witty epigrams, memorable poems and short, punchy narratives. In doing so, we will also significantly advance the state of the art in computational humour. This decades-old field has thoroughly explored the low-hanging fruit to be found in deliberate ambiguity, from the phonetic ambiguity of puns to the referential ambiguity of pronouns to the structural ambiguity of sentences (Attardo et al. (2002); Hofstadter and Gabora (2002); Raskin (1985); Ritchie (2003); Ritchie et al. (2006)). Though also guided by the structures and potentials of language, in the WHIM project we will focus on the conceptual possibilities arising from the implicit semantic tension in conventional wisdom (Raskin (1985); Ritchie (1999); Veale (2005)). We intend to give the What-If Machine a rudimentary yet productive sense of irony (as discussed in Stock et al. (2008) and Veale (2012b)) that will allow it to detect and expand upon the potential for ironic contrast in everyday knowledge (Veale (2012a,c); Veale and Hao (2010)). For instance, one needs a curved bow to launch a straight arrow, while the cleanest surgeons treat the dirtiest diseases. These contrasts are not evident in most common-sense knowledge-bases, but can be drawn into the foreground by a system that appreciates the often unspoken underpinnings of our everyday stereotypes. Instilling a sense of irony means more than enabling the system to detect implicit contrasts; it also means giving it the ability to inject ironic contrast into otherwise innocuous statements.
For example, limousines are typically long, and celebrities stereotypically ride in limousines. There is no obvious contrast here, yet the frequent Google query "short celebrities" suggests that shortness is an interesting quality of many celebrities. In principle, the What-If Machine will thus be able to generate the provocative generalization "the shortest celebrities often ride in the longest limousines" from this combination of a banal fact and a frequent observation. This generalization can, in turn, form the seed for a witty epigram, a poem or a short story. So, to go beyond the state of the art in ideation, we must go beyond pure logic and use the principles of humour production to generate provocative statements that are nonetheless grounded in conventional wisdom and language norms. In the WHIM project, we will explore different strategies for combining stereotypical beliefs with conventional knowledge-bases and linguistic norms, to create provocative new hypotheses that the system can develop into witty and original narratives.
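The limousine example can be turned into a small template-based sketch. Everything here is an illustrative assumption: the antonym and superlative tables are tiny hand-written stand-ins for a real lexical resource, the query counts are hypothetical inputs, and the function name `ironic_statement` is invented.

```python
# Hypothetical toy lexicons; a real system would draw on a lexical resource.
ANTONYMS = {"long": "short", "short": "long", "clean": "dirty", "dirty": "clean",
            "straight": "curved", "curved": "straight"}
SUPERLATIVE = {"long": "longest", "short": "shortest", "clean": "cleanest",
               "dirty": "dirtiest", "straight": "straightest", "curved": "most curved"}


def ironic_statement(agent, relation, obj, obj_property, query_counts, min_count=1):
    """Combine a banal fact ('agent relation obj', where obj is stereotypically
    obj_property) with evidence that users often query the opposite property
    of the agent, and juxtapose the two extremes. Returns None if unsupported."""
    opposite = ANTONYMS.get(obj_property)
    if opposite is None:
        return None
    if query_counts.get(f"{opposite} {agent}", 0) < min_count:
        return None
    return (f"the {SUPERLATIVE[opposite]} {agent} often {relation} "
            f"the {SUPERLATIVE[obj_property]} {obj}")
```

Called as `ironic_statement("celebrities", "ride in", "limousines", "long", counts)`, where `counts` records a hypothetical frequency for the query "short celebrities", it produces "the shortest celebrities often ride in the longest limousines"; without such a count it declines to generate anything.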

Storytelling efforts in AI have focused on two different tasks: building fictional plots from scratch, and structuring appropriate discourse for conveying a given plot, as we described in Gervas (2009). The importance of causal relations in narrative comprehension has led to AI models of plot generation that rely heavily on the concept of planning. Many existing storytelling systems feature a planning component of some kind, whether as a main module or as an auxiliary one. TALESPIN (Meehan (1977)), AUTHOR (Dehn (1981)), UNIVERSE (Lebowitz (1983)), MINSTREL (Turner (1993)) and Fabulist (Riedl and Young (2010)) all include some representation of goals and/or causality, though each uses it differently in the task of generating stories. An important insight resulting from this work (originally formulated by Dehn (1981) but later taken up by others) was the distinction between the goals of the characters in the story and the goals of the author. A less frequently modelled but also very relevant aspect is emotion, which clearly plays a major role in the appreciation of narrative. The MEXICA storytelling system detailed in Pérez y Pérez (1999) takes into account emotional links and tensions between the characters as a means of driving and evaluating ongoing stories. The system evaluates the quality of a partial draft of a story in terms of the rising and falling shape of the arc of emotional tension that can be computed from this information. The BRUTUS system of Bringsjord and Ferrucci (1999) performs story generation using a knowledge-intensive rule-set based on a combination of logic and grammars. It bases its storytelling ability on a logical model of betrayal to produce very rich stories. BRUTUS is capable of creating stories of impressive quality, with most of the features (in terms of literary tropes, dialogue, identification with the characters, etc.) one would find in a human-authored story.
However, the authors make it clear that BRUTUS is not creative at all but rather the result of reverse engineering a program out of a story, to ensure that it can build that particular story.
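To make the idea of evaluating a draft by the shape of its tension arc more concrete, the toy score below measures what fraction of consecutive steps in a sequence of tension values is consistent with a single rise to a peak followed by a fall. This is an illustrative assumption, not MEXICA's actual evaluation procedure, and the function name is hypothetical.

```python
def arc_score(tensions):
    """Fraction of consecutive steps that agree with one rise-then-fall arc
    peaking at the (first) maximum tension value. Flat steps never count."""
    if len(tensions) < 2:
        return 0.0
    peak = tensions.index(max(tensions))
    good = 0
    for i in range(1, len(tensions)):
        delta = tensions[i] - tensions[i - 1]
        rising_before_peak = i <= peak and delta > 0
        falling_after_peak = i > peak and delta < 0
        if rising_before_peak or falling_after_peak:
            good += 1
    return good / (len(tensions) - 1)
```

A draft whose tension goes 0, 2, 5, 3, 1 scores 1.0, a flat draft scores 0.0, and a draft that rises again after its climax (0, 5, 1, 4) is penalised.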

Character-based storytellers such as the one in Theune et al. (2003) rely on implementations of characters as autonomous intelligent agents that can choose their own actions informed by their internal states (including goals and emotions) and their perception of the environment. Narrative is understood to emerge from the interaction of these characters with one another. This guarantees coherent plots but, as Dehn (1981) pointed out, not necessarily very interesting ones. However, this approach has been found very useful in the context of virtual environments, where the introduction of such agents injects a measure of narrative into an interactive setting. An alternative approach to story generation relies on reusing material from existing narratives in the production of new ones. Several solutions have been applied, generally related to Case-Based Reasoning. Knowledge-intensive case-based reasoning approaches (Gervas et al. (2005); Peinado and Gervas (2006a); Peinado (2008)) use Semantic Web technologies for knowledge representation and simple combinatorial algorithms for generating the structure of new plots by reusing fragments of the structure of previous stories, inspired by the morphology of Russian folk-tales studied by the formalist Vladimir Propp (1928). Relying on shallower representations, Riedl and León (2008) and Riedl and Sugandh (2008) introduce a story planning algorithm inspired by case-based reasoning that incorporates vignettes (pre-existing short narrative segments) into the story being generated. Finally, evolutionary algorithms have also been applied to story generation tasks, as described in McIntyre and Lapata (2009a) and McIntyre and Lapata (2010). With respect to the task of building an appropriate discourse for rendering a given plot, significant efforts have been carried out in the domain of cinematic visual discourse.
These include work on the use of flashback and foreshadowing to produce surprise (Bae and Young (2008)) and on the automatic generation of camera placements over time to define a visual discourse that best fits the plot to be rendered (Jhala and Young (2010)). Both of these efforts rely on a planning-based approach to narrative, with plots represented as plans. A computational model for the composition of narrative is presented in Gervas (2012), where issues such as focalization, chronology and the combination of multiple narrative threads are explicitly considered in the transformation of an input set of facts into a sequential narrative discourse.

Evaluation is a fundamental issue that has also been specifically addressed in automatic story generation. Peinado and Gervas (2006a) perform a specific evaluation by eliciting four values from human testers. Among other things, they conclude that when understanding stories, people tend to rely on their own knowledge to complete and make sense of partially incoherent stories. This happens to such an extent that stories containing a high proportion of random facts with no causal relation to the rest of the story can receive good ratings.

Advances in the State of the Art: The relative success of past efforts at generating narrative has always been constrained by three limiting factors: (i) they set out from their respective sources of knowledge with no prior selection of valuable starting material; (ii) they operated with an open conception of what the target form should be, merely requiring that it be interpretable as a story; and (iii) they were restricted to an underlying vocabulary of knowledge elements that was either hand-crafted or limited by the coverage of a single resource. The generators designed for the What-If Machine will improve upon these circumstances in the following ways. They will start from preselected, valuable conceptual material resulting from Tasks 3.1, 3.2 and 3.3 of WP3, which will automatically eliminate a large number of dead ends that need not be explored. They will operate with very specific targets, as suggested by the meta-annotations produced in Task 3.4 indicating suitable framings for each particular idea, which will guide the generation mechanisms and significantly reduce the search spaces that need to be explored. Their vocabulary of underlying knowledge and available concepts will be continuously fed with new material resulting from WP2, which will ensure a sustained perception of creativity that can be measured and evaluated formally. Additionally, the possibility of combining the various technologies being considered, together with the factors above, will ensure significant improvements in performance with respect to state-of-the-art generators.

Figure 6: Commentary and poem from the full-FACE poetry generator (Colton et al. (2012)).

In the idea rendering model, as described in WP5, we will investigate and extend existing techniques for producing linguistic artefacts. The generation of metaphors and other linguistic irreducibles, in addition to stories, has been covered above. In poetry generation, early attempts were based on template mechanisms and heavy curation of the output, an approach that is still popular if the aim is to produce interesting, interpretable poems (as per Montfort and Fedorova (2012), whose aesthetic involves writing the smallest amount of code able to generate readable poems) rather than to model the creative process of producing poems. Recently, the seminal evolutionary poetry generator McGONAGALL (Manurung (2004)) has made a programmatic comeback, as described in Manurung et al. (2012a). This work is based on the maxim that "deviations from the rules and norms [of syntax and semantics] must have some purpose, and not be random", and the authors specify that falsifiable poetry generation software must meet the triple constraints of grammaticality, meaningfulness and poeticness. McGONAGALL, the most recent incarnation of the WASP system described in Gervas (2010) and the system of Greene et al. (2010) all produce technically proficient poems satisfying these criteria. The GRIOT system was developed to generate interactive multimedia events for rendering in various linguistic artefacts, including poems (Chow and Harrell (2008)) and narratives (Harrell (2007)). The Alloy algorithm at the heart of GRIOT employs blending results from cognitive linguistics and algebraic semiotics, as described in Goguen (1999). A further contribution to this area by WHIM project partners involved using the FACE formalisation for assessing creative acts by software (Colton et al. (2011)) to guide the building of a poetry generation system.
The software automatically produces poem templates [(C)oncept], instantiates the templates to produce a poem [(E)xample] using simile multiplication and by overlapping them with snippets from a newspaper article, in such a way as to maximise an aesthetic measure it has invented [(A)esthetic], which can be based on mood (via a sentiment analysis of daily newspaper articles) or on other attributes such as relevance to the newspaper article, flourishes and lyricism, and finally provides a commentary on how it did all this [(F)raming]. An example poem from this full-FACE poetry generator is given in Figure 6, and the system is described in Colton et al. (2012). There have been a number of other ad-hoc generative linguistic projects producing a diverse range of artefacts, including punning riddles (Ritchie et al. (2006)), lyrics (Barbieri et al. (2012)), ironic kinematic expressions (Stock et al. (2008)), advertising slogans (Valitutti et al. (2008)) and dialogue (Strong and Mateas (2008)).
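A toy rendering of the four FACE components may help fix the idea: a template stands in for the concept (C), its instantiations are the examples (E), a scoring function plays the role of the aesthetic (A), and a generated sentence provides the framing (F). The template, filler words, word-list aesthetic and the function name `full_face_toy` are hypothetical, and the whole is far simpler than the full-FACE generator of Colton et al. (2012).

```python
from itertools import product


def full_face_toy(template, fillers, aesthetic):
    """Instantiate `template` (C) with every combination of `fillers` (E),
    keep the line maximising `aesthetic` (A), and return it with a short
    commentary on how it was chosen (F)."""
    slots = list(fillers)
    best, best_score = None, float("-inf")
    for combo in product(*(fillers[s] for s in slots)):
        line = template.format(**dict(zip(slots, combo)))
        score = aesthetic(line)
        if score > best_score:
            best, best_score = line, score
    commentary = (f"I instantiated the template '{template}' in every possible way "
                  f"and chose the line scoring {best_score:.2f} on my aesthetic.")
    return best, commentary
```

With the template "as {adj} as a {noun}", fillers {"adj": ["cold", "bright"], "noun": ["stone", "star"]} and an aesthetic that counts words from a mood list {"bright", "star"}, the sketch selects "as bright as a star" and reports a score of 2.00 in its commentary.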

Advances in the State of the Art: In the WHIM project, we will not be interested in ideation for the purposes of generating poems, stories and jokes; rather, we will generate such linguistic artefacts for the purpose of conveying an idea. As such, we will be the first to experiment with how a rendering can highlight or obfuscate an idea, heighten or dampen the emotions attached to it, and generally present it in more or less thought-provoking ways. We will also be the first to test the hypothesis that automatically created artefacts presented alongside an automatically created commentary about how they were produced, as in Figure 6 and as laid out in Colton et al. (2011) and Pease and Colton (2011b), will be more valued than those without the commentary. If this hypothesis is supported by crowd-sourced results, we hope that other Computational Creativity researchers will begin to think as we do: that a generated artefact should always be seen as a pair of objects, namely the material object itself (the poem, story, etc.) and some framing information, such as a commentary, that adds value.

Symbolic data analysis techniques are used to discover comprehensible models or interesting patterns in data. They can be divided into techniques for predictive induction, where models, typically induced from class-labelled data, are used to predict the class value of previously unseen examples, and descriptive induction, where the aim is to find comprehensible patterns, typically in unlabelled data. Until recently, these techniques had been investigated by two different research communities: predictive induction mainly by the machine learning community, and descriptive induction mainly by the data mining community. Data mining tasks whose goal is to find human-interpretable differences between groups have been addressed by both communities independently. Since the groups can be interpreted as class labels, the data mining community, taking the association rule learning perspective, adapted association rule learners such as the Apriori method of Agrawal et al. (1996) to perform tasks named contrast set mining (Bay and Pazzani (2001)) and emerging pattern mining (Dong and Li (1999)). The machine learning community, on the other hand, was challenged to go beyond building sets of classification/prediction rules with software such as CN2 (Clark and Niblett (1989)) or RIPPER (Cohen (1995)), and to build individual rules for exploratory data analysis and interpretation, a task called subgroup discovery, as described in Wrobel (1997).
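A minimal sketch of the contrast set idea: a conjunction of attribute-value pairs is reported when its support differs across groups by at least a minimum amount. The 0.3 threshold, the candidate list and the data are illustrative assumptions. Bay and Pazzani (2001) additionally require the difference to be statistically significant (via a chi-square test) and search the space of conjunctions efficiently; neither step is shown here.

```python
def support(rows, itemset):
    """Fraction of rows (dicts of attribute -> value) matching every
    attribute=value pair in `itemset`."""
    if not rows:
        return 0.0
    hits = sum(all(r.get(a) == v for a, v in itemset.items()) for r in rows)
    return hits / len(rows)


def contrast_sets(groups, candidates, min_diff=0.3):
    """Keep candidate conjunctions whose support differs across the groups
    (a dict of group name -> rows) by at least `min_diff`."""
    found = []
    for itemset in candidates:
        supports = {g: support(rows, itemset) for g, rows in groups.items()}
        if max(supports.values()) - min(supports.values()) >= min_diff:
            found.append((itemset, supports))
    return found
```

If, say, three of four positively evaluated artefacts are poems but only one of four negatively evaluated ones is, the conjunction form=poem (support 0.75 versus 0.25) is reported as a contrast set.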

Advances in the State of the Art: In the WHIM project, predictive and descriptive data mining techniques will be used to analyse questionnaires, acquired through crowd-sourcing, in which people evaluate the artefacts created by the What-If Machine. We will be interested in finding patterns which best differentiate the positively labelled questionnaires (artefacts evaluated positively by people) from the negatively labelled ones. We will therefore need to develop a new contrast set mining algorithm able to analyse questionnaires, advancing the approach first defined in Bay and Pazzani (2001), where contrast sets are "conjunctions of attributes and values that differ meaningfully in their distributions across groups". Following our own approach proposed in Kralj Novak et al. (2009), we will transform the problem of finding contrasting groups into a subgroup discovery problem and develop an appropriate algorithm for this task, upgrading our subgroup discovery algorithms CN2-SD and APRIORI-SD, which are briefly described in Kralj Novak et al. (2009).
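Once each group of questionnaires is treated as a class, a candidate subgroup description can be scored with weighted relative accuracy, WRAcc(Class ← Cond) = p(Cond) · (p(Class | Cond) − p(Class)), the heuristic used by CN2-SD. The sketch assumes that questionnaire rows are dictionaries and that conditions are Python predicates; it scores a single rule rather than searching for rules as CN2-SD and APRIORI-SD do, and the data are invented.

```python
def wracc(rows, labels, condition, target):
    """Weighted relative accuracy of the rule `target <- condition`:
    p(cond) * (p(target | cond) - p(target)). Returns 0.0 if nothing is covered."""
    n = len(rows)
    covered = [lab for row, lab in zip(rows, labels) if condition(row)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = labels.count(target) / n
    p_target_given_cond = covered.count(target) / len(covered)
    return p_cond * (p_target_given_cond - p_target)
```

A positive score means the subgroup over-represents the target class relative to the whole population, weighted by how much of the data the subgroup covers; the same condition scored against the opposite class gets the negated value in the two-class case.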

As explained in Turney (2002) and Liu (2012), sentiment analysis or opinion mining research explores how to detect authors' attitudes, emotions or opinions about a given topic as expressed in text. The term 'sentiment', used in the context of automatic text analysis and the tracking of predictive judgments, first appeared in Das and Chen (2001) and Tong (2001), who were interested in analyzing stock market sentiment. As more and more personal opinions are made available online, recent research indicates that analysis of online texts such as blogs, web pages, wikis and social networks can be useful for predicting different economic trends. For instance, Gruhl et al. (2005) showed that the frequency of blog posts can be used to predict spikes in consumer purchase quantity at online retailers. Moreover, it was shown in Tong (2001) that references to movies in newsgroups were correlated with their sales, and sentiment analysis of weblog data was used to predict the financial success of movies (Mishne and Glance (2006)). Twitter posts were also shown in Asur and Huberman (2010) to be useful for predicting the box-office revenues of movies before their release.

Advances in the State of the Art: We intend to perform predictive and descriptive sentiment analysis of free texts acquired through crowd-sourcing. We are not aware of any off-the-shelf approaches enabling the construction of predictive models and descriptive patterns able to distinguish judgements expressing positive sentiment from those expressing negative sentiment. We will upgrade our approach to predictive sentiment analysis (Smailovic et al., 2013-pending) to build understandable patterns from the opinionated texts collected by crowd-sourcing evaluations of ideas generated by the What-If Machine.
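The approach of Smailovic et al. is not reproduced here. To illustrate the predictive side of the task, the sketch below uses a different, swapped-in technique: a multinomial naive Bayes classifier with add-one smoothing over whitespace tokens. Its per-word log-probabilities are at least partly inspectable, which is relevant to the aim of understandable patterns. The class name and the toy training data are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over
    lower-cased whitespace tokens."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        # Vocabulary across all classes, used for the smoothing denominator.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in text.lower().split():
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best
```

Trained on a handful of labelled evaluations such as "clever and witty" (positive) and "dull and boring" (negative), the model labels "witty clever" as positive and "dull boring" as negative; real crowd-sourced evaluations would need proper tokenisation and far more data.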

In creative knowledge discovery, the associative creativity theory of Mednick (1962) and the domain-crossing associations called bisociations of Koestler (1964) are important notions. Mednick defined creative thinking as the capacity to generate new combinations of distinct associative elements (concepts), and explained how thinking about concepts that are not strictly related to the elements under investigation inspires unforeseen useful connections between these elements. Koestler, on the other hand, described a bisociation as the result of a creative process of the mind that makes completely new associations between concepts from domains that are usually considered separate. Consequently, discovering bisociations may considerably improve creative discovery processes. Approaches to creative knowledge discovery from text documents have also been investigated in the area of medical literature mining. Swanson (1986) designed the so-called ABC model to investigate whether a phenomenon of interest C in one domain is related to some phenomenon A in another literature through some interconnecting phenomenon B addressed in both literatures. If the literature about C relates C to B, and the literature about A relates A to the same B, then combining these relations may suggest a relation between C and A. If closer inspection confirms that an uncovered relation between C and A is new, meaningful and interesting, it can be viewed as new evidence or considered a new piece of knowledge. Weeber et al. (2001) followed the work of Swanson with the goal of finding interesting bridging terms that relate two different literatures. Bridging term identification is also the main goal of the RaJoLink system (Petric et al. (2009); Urbancic et al. (2009)).
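At its core, the ABC model can be sketched as a set intersection: candidate B terms are those that occur in both the C literature and the A literature. The example below mirrors the structure of Swanson's well-known fish oil and Raynaud's disease case, but the one-line "documents" are invented, and the function name and optional stop-word set are assumptions. Real systems, including those of Weeber et al. (2001) and RaJoLink, rank and filter B terms rather than merely enumerating them.

```python
def bridging_terms(literature_c, literature_a, stopwords=frozenset()):
    """Swanson-style ABC sketch: candidate B terms are words that appear in
    both the C literature and the A literature (lists of strings)."""
    terms_c = {w for doc in literature_c for w in doc.lower().split()} - set(stopwords)
    terms_a = {w for doc in literature_a for w in doc.lower().split()} - set(stopwords)
    return sorted(terms_c & terms_a)
```

Given a C literature mentioning that Raynaud patients show high blood viscosity and platelet aggregation, and an A literature reporting that fish oil reduces blood viscosity and inhibits platelet aggregation, the shared terms "aggregation", "blood", "platelet" and "viscosity" emerge as candidate bridges; a stop-word set can be supplied to discard uninformative words.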

Advances in the State of the Art: One of the recent advances in bisociative knowledge discovery, developed within the EU FP7-funded project BISON (Berthold (2012)), is the Cross-Context Bisociation Explorer (CrossBee), described in Jursic et al. (2012), a user-friendly tool for ranking and exploring bisociative terms that have the potential for cross-context link discovery. In the WHIM project, we will use CrossBee as a tool for bisociative fictional ideation from sentiment-labelled, free-text opinions. To our knowledge, none of the approaches to bisociative literature mining has been applied to such a challenging task. To this end, the functionality of CrossBee will need to be substantially extended and adapted.

1.2.4 Summary of Innovations in the WHIM Project

We have described above many advances in the state of the art that we hope the WHIM project will bring about. To summarize the novelty of the WHIM project, as described in the previous sections, we will advance the state of the art in Computational Creativity research by creating a novel theoretical, engineering and experimental framework and a new methodology for facilitating machine creation of ideas. The main areas of innovation, each given with its major related work packages (WPs), will be:

To summarize the importance of the WHIM project as advancing the state of the art, we offer the following two points:


Aggarwal, A., P. Gervas, and R. Hervas (2009). Measuring the influence of errors induced by the presence of dialogues in reference clustering of narrative text. In Proceedings of the 7th International Conference on Natural Language Processing, pp. 209-218.

Agirre, E., E. Alfonseca, K. B. Hall, J. Kravalova, M. Pasca, and A. Soroa (2009). A study on similarity and relatedness using distributional and wordnet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Boulder, CO, pp. 19-27.

Agrawal, R., H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo (1996). Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, 307-328.

Allan, J., V. Lavrenko, and H. Jin (2000). First story detection. In Proc. 9th Conference on Information and Knowledge Management (CIKM), McLean, VA, USA, pp. 374-381.

Asur, S. and B. A. Huberman (2010). Predicting the Future with Social Media. In Proceedings of the ACM International Conference on Web Intelligence.

Attardo, S., C. F. Hempelmann, and S. Di Maio (2002). Script oppositions and logical mechanisms: Modeling incongruities and their resolutions. Humor: International Journal of Humor Research 15(1), 3-46.

Bae, B.-C. and R. M. Young (2008). A use of flashback and foreshadowing for surprise arousal in narrative using a plan-based approach. In Proc. ICIDS 2008.

Bal, M. (2009). Narratology. University of Toronto Press, Toronto.

Ballesteros, M., V. Francisco, A. Díaz, J. Herrera, and P. Gervas (2011). Inferring the scope of speculation using dependency analysis. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR 2011).

Barbieri, G., F. Pachet, P. Roy, and M. Degli Esposti (2012). Markov Constraints for Generating Lyrics with Style. In Proceedings of the 20th European Conference on Artificial Intelligence.

Barnden, J. (2006). Artificial Intelligence, figurative language and cognitive linguistics. In Cognitive Linguistics: Current Application and Future Perspectives.

Bay, S. D. and M. J. Pazzani (2001). Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery 5(3), 213-246.

Beltran-Ferruz, P., P. A. Gonzalez-Calero, and P. Gervas (2004). Converting mikrokosmos frames into description logics. In Proceedings of the 4th Workshop on NLP and XML (NLPXML-2004), Barcelona, Spain, pp. 35-42.

Berthold, M. (Ed.) (2012). Bisociative Knowledge Discovery. Springer.

Boden, M. (2003). The Creative Mind: Myths and Mechanisms (second edition). Routledge.

Bos, J., S. Clark, M. Steedman, J. R. Curran, and J. Hockenmaier (2004). Wide-coverage semantic representations from a CCG parser. In Proceedings of COLING-04, Geneva, Switzerland, pp. 1240-1246.

Brants, T. and A. Franz (2006). Web 1T 5-gram Version 1. Linguistic Data Consortium.

Bringsjord, S. and D. Ferrucci (1999). Artificial Intelligence and Literary Creativity: Inside the mind of Brutus, a StoryTelling Machine. Hillsdale, NJ: Lawrence Erlbaum Associates.

Browne, C. (2011). Evolutionary Game Design. Springer.

Camara Pereira, F., R. Hervas, P. Gervas, and A. Cardoso (2006). A multiagent text generator with simple rhetorical habilities. In Computational Aesthetics: Artificial Intelligence Approaches to Beauty and Happiness (Workshop from AAAI-06), Boston, USA, pp. 37-44.

Cardoso, A., T. Veale, and G. A. Wiggins (2010). Converging on the divergent: The History (and Future) of the International Joint Workshops in Computational Creativity. AI Magazine 30(3).

Charnley, J., A. Pease, and S. Colton (2012). On the Notion of Framing in Computational Creativity. In Proceedings of the 3rd International Conference on Computational Creativity.

Chow, K. and D. F. Harrell (2008). Generative Visual Renku: Linked Poetry Generation with the GRIOT System. In Visionary Landscapes: Electronic Literature Organization Conference.

Clark, P. and T. Niblett (1989). The CN2 induction algorithm. Machine Learning 3(4), 261-283.

Clark, S., A. Copestake, J. R. Curran, Y. Zhang, A. Herbelot, J. Haggerty, B. G. Ahn, C. V. Wyk, J. Roesner, J. Kummerfeld, and T. Dawborn (2009). Large-scale syntactic processing: Parsing the web. Technical report, Johns Hopkins University Center for Speech and Language Processing Summer Research Workshop.

Clark, S. and J. R. Curran (2007). Wide-coverage efficient statistical parsing with CCG and log-linear models. Comp. Ling. 33(4), 493-552.

Cohen, W. W. (1995). Fast effective rule induction. In Proceedings of the 12th International Conference on Machine Learning (ICML'95), pp. 115-123.

Collins, A. M. and E. F. Loftus (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407-428.

Colton, S. (1999). Refactorable numbers - a machine invention. Journal of Integer Sequences 2.

Colton, S. (2002). Automated Theory Formation in Pure Mathematics. Springer-Verlag.

Colton, S. (2008). Creativity versus the Perception of Creativity in Computational Systems. In Proceedings of the AAAI Spring Symposium on Creative Systems.

Colton, S. (2012). The Painting Fool: Stories from Building an Automated Painter. In Computers and Creativity. Springer.

Colton, S., J. Charnley, and A. Pease (2011). Computational Creativity Theory: the FACE and IDEA Models. In Proceedings of the 2nd International Conference on Computational Creativity.

Colton, S., M. Cook, and A. Raad (2011). Ludic Considerations of Tablet-Based Evo-Art. In Proceedings of the EvoMusArt Workshop.

Colton, S., J. Goodwin, and T. Veale (2012). Full-FACE Poetry Generation. In Proceedings of the 3rd International Conference on Computational Creativity.

Colton, S. and S. Muggleton (2006). Mathematical Applications of Inductive Logic Programming. Machine Learning 64, 25-64.

Colton, S. and B. Perez Ferrer (2012). No photos harmed/growing paths from seed. In Proceedings of the Non-Photorealistic Animation and Rendering Symposium.

Colton, S. and G. A. Wiggins (2012). Computational Creativity: The Final Frontier? In Proceedings of the 20th European Conference on Artificial Intelligence.

Cook, M. and S. Colton (2011). Automated Collage Generation - With More Intent. In Proceedings of the 2nd International Conference on Computational Creativity.

Curran, J. R. and S. Clark (2003a). Investigating GIS and smoothing for maximum entropy taggers. In Proceedings of the 10th Meeting of the EACL, Budapest, Hungary, pp. 91-98.

Curran, J. R. and S. Clark (2003b). Language independent NER using a maximum entropy tagger. In Proceedings of the Seventh Conference on Natural Language Learning (CoNLL-03), Edmonton, Canada, pp. 164-167.

Das, S. and M. Chen (2001). Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the 8th Asia Pacific Finance Association Annual Conference (APFA).

Dehn, N. (1981). Story generation after Tale-Spin. In Proc. IJCAI 1981, pp. 16-18.

Diaz-Agudo, B., P. Gervas, and F. Peinado (2004). A Case Based Reasoning Approach to Story Plot Generation. In Proceedings of the 7th European Conference on Case Based Reasoning.

Dong, G. and J. Li (1999). Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'99), pp. 43-52.

Eigenfeldt, A., A. Burnett, and P. Pasquier (2012). Evaluating Musical Metacreation in a Live Performance Context. In Proceedings of the 3rd International Conference on Computational Creativity.

Etzioni, O., M. Banko, S. Soderland, and D. S. Weld (2008). Open information extraction from the web. Communications of the ACM 51(12), 68-74.

Fauconnier, G. and M. Turner (2002). The Way We Think: Conceptual Blending and the Mind's Hidden Complexities. Basic Books.

Fellbaum, C. (Ed.) (1998). WordNet: an Electronic Lexical Database. Cambridge, MA: The MIT Press.

Forster, E. M. (1927). Aspects of the Novel. New York: Harcourt.

Francisco, V., P. Gervas, and R. Hervas (2007). Dependency analysis and CBR to bridge the generation gap in template-based NLG. In Computational Linguistics and Intelligent Text Processing (CICLing 2007), Mexico City, Mexico. Springer-Verlag.

Francisco, V., R. Hervas, and P. Gervas (2006). Automated knowledge acquisition in case-based text generation. In 8th European Conference on Case-Based Reasoning, Fethiye, Turkey.

Gendler, T. S. (2000). Thought Experiment: On the Powers and Limits of Imaginary Cases. Garland Publishing.

Genette, G. (1980). Narrative Discourse: An Essay in Method. Cornell University Press.

Gervas, P. (2002). Exploring Quantitative Evaluations of the Creativity of Automatic Poets. In Proceedings of the ECAI Workshop on Creative Systems, Approaches to Creativity in Artificial Intelligence and Cognitive Science.

Gervas, P. (2009). Computational approaches to storytelling and creativity. AI Magazine 30(3), 49-62.

Gervas, P. (2010). Diez poemas emocionales generados por un computador. In D. Canas and G. Tardon (Eds.), Puede un computador escribir un poema de amor?

Gervas, P. (2010). Engineering linguistic creativity: Bird flight and jet planes. In Proceedings of the Second Workshop on Computational Approaches to Linguistic Creativity, Los Angeles, California. Association for Computational Linguistics.

Gervas, P. (2011). Dynamic inspiring sets for sustained novelty in poetry generation. In D. Ventura, P. Gervas, D. F. Harrell, M. L. Maher, A. Pease, and G. Wiggins (Eds.), ICCC, Menlo Park, California, pp. 111-116. Universidad Autonoma Metropolitana, Unidad Cuajimalpa.

Gervas, P. (2012). From the fleece of fact to narrative yarns: a computational model of composition. In Workshop on Computational Models of Narrative, 2012 Language Resources and Evaluation Conference (LREC'2012), Istanbul, Turkey.

Gervas, P., G. A. Carredano, R. Hervas, G. Perez, S. Bautista, V. Francisco, and P. M. Portillo (2010). Integrating aggregation strategies in an in-home domain dialogue system. In TSD, pp. 499-506.

Gervas, P., B. Diaz-Agudo, F. Peinado, and R. Hervas (2005). Story Plot Generation Based on CBR. Knowledge-Based Systems. Special Issue: AI-2004 18, 235-242.

Gervas, P. and C. Leon (2010). Story generation driven by system-modified evaluation validated by human judges. In First International Conference on Computational Creativity, Lisboa, Portugal.

Gervas, P., B. Linneker, J. C. Meister, and F. Peinado (2006). Narrative models: Narratology meets artificial intelligence. In International Conference on Language Resources and Evaluation. Satellite Workshop: Toward Computational Models of Literary Analysis, Genova, Italy, pp. 44-51.

Goguen, J. (1999). An Introduction to Algebraic Semiotics, with Applications to User Interface Design. In C. Nehaniv (Ed.), Computation for Metaphor, Analogy and Agents. Springer.

Goodwin, J. (2012). Rhetoric-Enhanced Poetry Generation. Master's thesis, Department of Computing, Imperial College, London.

Gow, J., R. Baumgarten, P. Cairns, S. Colton, and P. Miller (2012). Unsupervised Modelling of Player Style with LDA. IEEE Transactions on Computational Intelligence and AI in Games 4(3), 152-166.

Greene, E., T. Bodrumlu, and K. Knight (2010). Automatic analysis of rhythmic poetry with applications to generation and translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP'10, Stroudsburg, PA, USA. Association for Computational Linguistics.

Gruhl, D., R. Guha, R. Kumar, J. Novak, and A. Tomkins (2005). The Predictive Power of Online Chatter. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.

Harrell, D. F. (2007). GRIOT's Tales of Haints and Seraphs: A Computational Narrative Generation System. In N. Wardrip-Fruin and P. Harrigan (Eds.), Second Person: Role-Playing and Story in Games and Playable Media. MIT Press.

Harrington, B. (2010). A semantic network approach to measuring relatedness. In Proceedings of COLING-10, Beijing, China.

Harrington, B. (2011). Discovering novel biomedical relations using asknet semantic networks. In Proceedings of the 4th International Symposium On Applied Sciences In Biomedical And Communication Technologies (ISABEL 2011), Barcelona, Spain.

Harrington, B. and S. Clark (2007). ASKNet: Automated semantic knowledge network. In AAAI-07, Vancouver, Canada, pp. 889-894.

Hassan, S., C. Leon, P. Gervas, and R. Hervas (2007). A computer model that generates biography-like narratives. In Proceedings of the International Joint Workshop on Computational Creativity.

Hervas, R. (2009). Expresiones de Referencia y Figuras Retoricas para la Distincion y Descripcion de Entidades en Discursos Generados Automaticamente (Spanish version). PhD thesis, Universidad Complutense de Madrid.

Hervas, R., F. Camara Pereira, P. Gervas, and A. Cardoso (2006). A text generation system that uses simple rhetorical figures. Procesamiento de Lenguaje Natural 37, 199-206.

Hervas, R., R. Costa, H. Costa, P. Gervas, and F. Camara Pereira (2007). Enrichment of automatically generated texts using metaphor. In 6th Mexican International Conference on Artificial Intelligence (MICAI-07), Volume 4827, Aguascalientes, Mexico, pp. 944-954. Springer Verlag, LNAI Series.

Hervas, R. and M. Finlayson (2010). The prevalence of descriptive referring expressions in news and narrative. In 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden.

Hervas, R. and P. Gervas (2005a). Applying genetic algorithms to referring expression generation and aggregation. In 10th International Conference on Computer Aided Systems Theory (EUROCAST), Spain, pp. 145-149.

Hervas, R. and P. Gervas (2005b). An evolutionary approach to referring expression generation and aggregation. In 10th European Workshop on Natural Language Generation (ENLG'05), Aberdeen, Scotland, pp. 168-173.

Hervas, R. and P. Gervas (2006). Case-based reasoning for knowledge-intensive template selection during text generation. In 8th European Conference on Case-Based Reasoning (ECCBR-06), Volume 4106, Fethiye, Turkey, pp. 151-165.

Hervas, R. and P. Gervas (2008). Descripcion de entidades y generacion de expresiones de referencia en la generacion automatica de discurso. Procesamiento de Lenguaje Natural 41, 217-224.

Hofstadter, D. and L. Gabora (2002). Synopsis of the Workshop on Humor and Cognition. Humor: International Journal of Humor Research 2(4), 417-440.

Hughes, T. and D. Ramage (2007). Lexical semantic relatedness with random graph walks. In EMNLP-CoNLL-07, Prague, pp. 581-589.

Jhala, A. and R. M. Young (2010). Cinematic visual discourse: Representation, generation, and evaluation. IEEE Trans. on Comp. Int. and AI in Games 2(2), 69-81.

Jordanous, A. (2011). Evaluating Evaluation: Assessing Progress in Computational Creativity Research. In Proceedings of the 2nd International Conference on Computational Creativity.

Jursic, M., B. Cestnik, T. Urbancic, and N. Lavrac (2012). Cross-domain Literature Mining: Finding Bridging Concepts with CrossBee. In Proceedings of the 3rd International Conference on Computational Creativity.

Kelly, C., B. Devereux, and A. Korhonen (2010). Acquiring human-like feature-based conceptual representations from corpora. In Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics.

Kirschner, P., J. Sweller, and R. Clark (2006). Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching. Educational Psychologist 41(2), 75-86.

Koestler, A. (1964). The Act of Creation. MacMillan.

Kralj Novak, P., N. Lavrac, and G. I. Webb (2009). Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining. Journal of Machine Learning Research 10, 377-403.

Krzeczkowska, A., J. El-Hage, S. Colton, and S. Clark (2010). Automated Collage Generation - With Intent. In Proceedings of the 1st International Conference on Computational Creativity.

Lakoff, G. (2010). Why it Matters How We Frame the Environment. Environmental Communication: A Journal of Nature and Culture 4(1).

Lakoff, G. and M. Johnson (1980). Metaphors We Live By. University of Chicago Press.

Lazzaro, N. (2004). Why we play games: Four keys to more emotion without story. Technical report, XEO Design Inc.

Lebowitz, M. (1983). Story-telling as planning and learning. In Proc. IJCAI 1983, Volume 1.

Leon, C. (2010). A Computational Model for Automated Extraction of Structural Schemas from Simple Narrative Plots. PhD thesis, Universidad Complutense de Madrid.

Leon, C. and P. Gervas (2008). Creative storytelling based on transformation of generation rules. In 5th International Joint Workshop on Computational Creativity.

Leon, C. and P. Gervas (2011). A top-down design methodology based on causality and chronology for developing assisted story generation systems. In 8th ACM Conference on Creativity and Cognition, Atlanta.

Leon, C. and P. Gervas (2012). Prototyping the use of plot curves to guide story generation. In Workshop on Computational Models of Narrative, 2012 Language Resources and Evaluation Conference (LREC'2012), Istanbul, Turkey.

Leon, C., S. Hassan, and P. Gervas (2007). From the event log of a social simulation to narrative discourse: Content planning in story generation. In P. Olivier and C. Kray (Eds.), Conference of the Artificial and Ambient Intelligence, Culture Lab, Newcastle University, Newcastle upon Tyne, UK, pp. 402-409.

Leon, C., S. Hassan, P. Gervas, and J. Pavon (2007). Mixed narrative and dialog content planning based on BDI agents. In XII Conferencia de la Asociacion Espanola para Inteligencia Artificial, Salamanca, Spain.

Li, B., A. Zook, N. Davis, and M. Riedl (2012). Goal-Driven Conceptual Blending: A Computational Approach for Creativity. In Proceedings of the 3rd International Conference on Computational Creativity.

Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan and Claypool Publishers.

Manning, C. D., P. Raghavan, and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press.

Manurung, H. (2003). An evolutionary algorithm approach to poetry generation. PhD thesis, University of Edinburgh.

Manurung, H. M. (1999). Chart generation of rhythm-patterned text. In Proc. of the First International Workshop on Literature in Cognition and Computers.

Manurung, R., G. Ritchie, and H. Thompson (2012a). Using genetic algorithms to create meaningful poetic text. Journal of Experimental and Theoretical Artificial Intelligence 24(1), 43-64.

McIntyre, N. and M. Lapata (2009a). Learning to tell tales: A data-driven approach to story generation. In K. Y. Su, J. Su, and J. Wiebe (Eds.), ACL/AFNLP, pp. 217-225. The Association for Computer Linguistics.

McIntyre, N. and M. Lapata (2009b). Learning to Tell Tales: A Data-driven Approach to Story Generation. In Proceedings of the 47th Annual Meeting of the ACL.

McIntyre, N. and M. Lapata (2010). Plot induction and evolutionary search for story generation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL'10, Stroudsburg, PA, USA, pp. 1562-1572. Association for Computational Linguistics.

Mednick, S. A. (1962). The associative basis of the creative process. Psychol. Rev 69, 220-232.

Meehan, J. R. (1977). Tale-spin, an interactive program that writes stories. In Proc. IJCAI 1977, pp. 91-98.

Mishne, G. and N. Glance (2006). Predicting Movie Sales from Blogger Sentiment. In Proceedings of the AAAI Symposium on Computational Approaches to Analysing Weblogs.

Moffat, D. and M. Kelly (2006). An Investigation into People's Bias Against Computational Creativity in Music Composition. In Proceedings of the International Joint Workshop on Computational Creativity.

Montfort, N. and N. Fedorova (2012). Small-Scale Systems and Computational Creativity. In Proceedings of the 3rd International Conference on Computational Creativity.

Murphy, G. (2004). The Big Book of Concepts. MIT Press.

Navigli, R. and P. Velardi (2003). An analysis of ontology-based query expansion strategies. In Proc. of Workshop on Adaptive Text Extraction and Mining (ATEM 2003), in the 14th European Conference on Machine Learning (ECML 2003).

Norman, D. (2005). Emotional Design: Why we love (or hate) everyday things. Basic Books.

Pachet, F. (2003). The Continuator: Musical interaction with style. Journal of New Music Research 32(3), 333-341.

Pease, A. and S. Colton (2011a). Computational Creativity Theory: Inspirations Behind the FACE and IDEA Models. In Proceedings of the 2nd International Conference on Computational Creativity.

Pease, A. and S. Colton (2011b). On Impact and Evaluation in Computational Creativity: A Discussion of the Turing Test and an Alternative Proposal. In Proceedings of the AISB symposium on AI and Philosophy.

Peinado, F. (2008). Un Armazon para el Desarrollo de Aplicaciones de Narracion Automatica basado en Componentes Ontologicos Reutilizables. PhD thesis, Universidad Complutense de Madrid.

Peinado, F., V. Francisco, R. Hervas, and P. Gervas (2010). Assessing the novelty of computer-generated narratives using empirical metrics. Minds and Machines 20(4), 588.

Peinado, F. and P. Gervas (2005). A generative and case-based implementation of Proppian morphology. In The 17th Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing (ACH/ALLC), Victoria, Canada, pp. 129-131. University of Victoria.

Peinado, F. and P. Gervas (2006a). Evaluation of automatic generation of basic stories. New Generation Computing 24(3), 289-302.

Peinado, F. and P. Gervas (2006b). Minstrel reloaded: From the magic of Lisp to the formal semantics of OWL. In 3rd International Conference on Technologies for Interactive Digital Storytelling and Entertainment (TIDSE), Volume 4326, Darmstadt, Germany, pp. 93-97. Springer.

Peinado, F., P. Gervas, and B. Diaz-Agudo (2004). A description logic ontology for fairy tale generation. In T. Veale, A. Cardoso, and F. Camara Pereira (Eds.), 4th International Conference on Language Resources and Evaluation, Procs. of the Workshop on Language Resources for Linguistic Creativity, Lisbon, Portugal, pp. 56-61. ELRA.

Pereira, F. (2007). Creativity and Artificial Intelligence: A Conceptual Blending Approach. Mouton de Gruyter.

Pereira, F. and A. Cardoso (2003). The Horse-bird Creature Generation Experiment. AISB Journal 1(2).

Pereira, F. and A. Cardoso (2006). Experiments with Free Concept Generation in Divago. Knowledge Based Systems 19, 459-471.

Perez y Perez, R. (1999). MEXICA: A Computer Model of Creativity in Writing. PhD thesis, The University of Sussex.

Petric, I., T. Urbancic, B. Cestnik, and M. Macedoni-Luksic (2009). Literature mining method RaJoLink for uncovering relations between biomedical concepts. Journal of Biomedical Informatics 42(2), 219-227.

Petrovic, S., M. Osborne, and V. Lavrenko (2012). Using paraphrases for improving first story detection in news and Twitter. In Proceedings of NAACL-12, Montreal, Canada.

Raskin, V. (1985). Semantic Mechanisms of Humor. D. Reidel.

Riedl, M. and C. Leon (2008). Toward vignette-based story generation for drama management systems. In Workshop on Integrating Technologies for Interactive Stories - 2nd International Conference on INtelligent TEchnologies for interactive enterTAINment.

Riedl, M. and M. Young (2010). Narrative planning: Balancing plot and character. J. Artif. Intell. Res. (JAIR) 39, 217-268.

Riedl, M. O. and N. Sugandh (2008). Story planning with vignettes: Toward overcoming the content production bottleneck. In Interactive Storytelling, Volume 5334 of Lecture Notes in Computer Science, pp. 168-179. Springer.

Ritchie, G. (1999). Developing the Incongruity-Resolution Theory. In Proceedings of the AISB Symposium on Creative Language: Stories and Humour, (Edinburgh, Scotland).

Ritchie, G. (2003). The Linguistic Analysis of Jokes. Routledge Studies in Linguistics, 2. Routledge.

Ritchie, G. (2007). Some Empirical Criteria for Attributing Creativity to a Computer Program. Minds and Machines 17.

Ritchie, G., R. Manurung, H. Pain, A. Waller, and D. O'Mara (2006). The STANDUP Interactive Riddle Builder. IEEE Intelligent Systems 21(2), 67-69.

Rubenstein, H. and J. B. Goodenough (1965). Contextual correlates of synonymy. Communications of the ACM 8(10), 627-633.

Shutova, E. (2010). Metaphor Identification Using Verb and Noun Clustering. In Proceedings of the 23rd International Conference on Computational Linguistics.

Smailovic, J., M. Grcar, N. Lavrac, and M. Znidarsic (2013, pending). Predictive Sentiment Analysis of Tweets: A Stock Market Application. Data Mining and Knowledge Discovery.

Steedman, M. (2000). The Syntactic Process. Cambridge, MA: The MIT Press.

Steffe, L. and J. Gale (Eds.) (1995). Constructivism in Education. Lawrence Erlbaum Associates, Inc.

Stock, O., C. Strapparava, and A. Valitutti (2008). Ironic Expressions and Moving Words. International Journal of Pattern Recognition and Artificial Intelligence 22(5), 1045-1057.

Strong, C. and M. Mateas (2008). Talking with NPCs: Towards Dynamic Generation of Discourse Structures. In Proceedings of the 4th Artificial Intelligence and Interactive Digital Entertainment Conference.

Suchanek, F. M., G. Kasneci, and G. Weikum (2007). Yago: A core of semantic knowledge. In Semantic Web track of WWW-07, Banff, Canada, pp. 889-894.

Swanson, D. R. (1986). Undiscovered Public Knowledge. Library Quarterly 56(2), 103-118.

Theune, M., E. Faas, A. Nijholt, and D. Heylen (2003). The virtual storyteller: Story creation by intelligent agents. In Proceedings of the Technologies for Interactive Digital Storytelling and Entertainment (TIDSE) Conference, pp. 204-215.

Thibodeau, P., J. McClelland, and L. Boroditsky (2009). When a bad metaphor may not be a victimless crime: The role of metaphor in social policy. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, pp. 809-814.

Tong, R. M. (2001). An Operational System for Detecting and Tracking Opinions in On-line Discussion. In Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification, pp. 1-6.

Trabasso, T., P. van den Broek, and S. Suh (1989). Logical necessity and transitivity of causal relations in stories. Discourse Processes 12, 1-25.

Turner, S. R. (1993). Minstrel: a computer model of creativity and storytelling. PhD thesis, University of California at Los Angeles, Los Angeles, CA, USA.

Turney, P. (2002). Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of the Association for Computational Linguistics, pp. 417-424.

Turney, P. D. and P. Pantel (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37, 141-188.

Urbancic, T., I. Petric, and B. Cestnik (2009). RaJoLink: A Method for Finding Seeds of Future Discoveries in Nowadays Literature. pp. 129-138. Springer.

Valitutti, A., C. Strapparava, and O. Stock (2008). Textual Affect Sensing for Computational Advertising. In Proceedings of the AAAI Spring Symposium on Creative Intelligent Systems.

Veale, T. (2005). Incongruity in Humor: Root-Cause or Epiphenomenon? Humor: International Journal of Humor Research 17(4), 419-428.

Veale, T. (2006). Tracking the Lexical Zeitgeist with Wikipedia and WordNet. In Proceedings of the 17th European Conference on Artificial Intelligence.

Veale, T. (2011). Creative Language Retrieval. In Proceedings of ACL 2011, the 49th Annual Meeting of the Association for Computational Linguistics.

Veale, T. (2012a). A Computational Exploration of Creative Similes. In F. MacArthur, J. Oncins-Martinez, M. Sanchez-Garcia, and A. Piquer-Piriz (Eds.), Metaphor in Use: Context, culture, and communication, pp. 329-344. John Benjamins publishing company.

Veale, T. (2012b). Detecting and Generating Ironic Comparisons: An Application of Creative Information Retrieval. In Proceedings of AAAI Fall Symposium Series 2012, Artificial Intelligence of Humor. Arlington, Virginia.

Veale, T. (2012c). Exploding the Creativity Myth: Computational Foundations of Linguistic Creativity. Bloomsbury Academic.

Veale, T. and Y. Hao (2008). A Fluid Knowledge Representation for Understanding and Generating Creative Metaphors. In Proceedings of the 22nd International Conference on Computational Linguistics.

Veale, T. and Y. Hao (2010). Detecting Ironic Intent in Creative Comparisons. In Proceedings of ECAI 2010, the 19th European conference on Artificial Intelligence.

Veale, T. and G. Li (2011). Creative Introspection and Knowledge Acquisition: Learning about the world thru introspective questions and exploratory metaphors. In Proceedings of AAAI-2011, the 25th Conference of the Association for the Advancement of Artificial Intelligence, San Francisco.

Vijay-Shanker, K. and D. Weir (1993). Parsing some constrained grammar formalisms. Computational Linguistics 19, 591-636.

Propp, V. (1928). Morphology of the Folk Tale. University of Texas, USA.

Weeber, M., R. Vos, H. Klein, and L. T. W. de Jong-van den Berg (2001). Using Concepts in Literature-based Discovery: Simulating Swanson's Raynaud-fish oil and Migraine-magnesium Discoveries. Journal of the American Society of Information Science and Technology 52(7), 548-557.

Wiggins, G. A. (2006a). A Preliminary Framework for Description, Analysis and Comparison of Creative Systems. Journal of Knowledge Based Systems 19(7).

Wiggins, G. A. (2006b). Searching for Computational Creativity. New Generation Computing 24(3), 209-222.

Wojtinnek, P. R., J. Volker, and S. Pulman (2012). Building semantic networks from plain text and Wikipedia with application to semantic relatedness and noun compound paraphrasing. International Journal of Semantic Computing (IJSC). Special Issue on Semantic Knowledge Representation.

Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In Proceedings of the 1st European Conference on Principles of Data Mining and Knowledge Discovery (PKDD'97), pp. 78-87.

Yannakakis, G. (2008). How to Model and Augment Player Satisfaction: A Review. In Proceedings of the 1st Workshop on Child, Computer and Interaction.