Stylized Phrase Generator

From time to time I experiment with new technologies and new ideas.

This idea was inspired by Alessandro, a dear and very creative old friend who wrote some very nifty Italian text generators.

Stylized Phrase Generator, or SPG, is an experimental software prototype that I wrote for fun in my rare spare time. It is designed to read a text (or book), learn its style and vocabulary, and generate any number of new sentences using the same “style” and vocabulary learnt from the original. The result is designed to be “readable”, to look and sound like the original, but not necessarily to make any real sense. In other words, SPG is a generator of stylized nonsense.

In its current incarnation SPG is written in Java. It can create any number of pages of output, each composed of any number of sentences. The output can be formatted with text templates and encoded with any filter that can be expressed in Java. The encodings I have written so far handle XML and HTML escaping and formatting.

The current template that I’m using is designed to generate an Atom XML feed that can be burnt into an RSS feed and distributed for the amusement of subscribers.
The first feed generated by SPG that I’m going to publish and update periodically can be found here:

http://feeds.feedburner.com/StylizedGeneratedFun

It is published by FeedBurner and you can subscribe to it; please post your comments on the feed here. I am very interested in knowing what you think.
If enough people show interest I will keep adding new styles and adjust the algorithm for the amusement of the readers (keep in mind that this is a spare-time project, I do this for fun and I am a busy guy, so progress may be slow).

How does it work? In brief, the algorithm takes the original text and builds a vocabulary of all the words and punctuation marks found in it. For each word, the algorithm builds a list of all the words that immediately followed it at any point in the text. This list is weighted by relative frequency. For example, if the word "the" is followed 2 times by "house" and 1 time by "cat" in the text, then the probability that "house" follows "the" is about 67%, and the probability that "cat" follows "the" is about 33%. Once this learning phase is completed, the algorithm can use this knowledge to generate phrases. Starting from a word, each next word is picked at random from the list of words learnt to follow the current one, with a chance equal to its learnt probability.
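
To make the description above a bit more concrete, here is a minimal sketch in Java. It is not the SPG source code: the class name SuccessorModel, its methods, and the regular expression used to split words from punctuation are all assumptions made up for illustration. It only shows the general technique (a table of successors per word, sampled in proportion to how often each successor was seen).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; names and tokenizing rules are hypothetical. */
class SuccessorModel {
    // Assumption: a token is a run of letters, digits, or apostrophes,
    // or a single punctuation character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}']+|[^\\s\\p{L}\\p{N}']");

    // For each token, every token that immediately followed it.
    // Repeats are kept, so frequent successors appear more often.
    private final Map<String, List<String>> successors = new HashMap<>();

    /** Splits text into words and punctuation marks. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Learning phase: record which token follows which. */
    void learn(String text) {
        List<String> tokens = tokenize(text);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            successors.computeIfAbsent(tokens.get(i), k -> new ArrayList<>())
                      .add(tokens.get(i + 1));
        }
    }

    /** Relative frequency with which next followed word in the learnt text. */
    double probability(String word, String next) {
        List<String> list = successors.get(word);
        if (list == null) {
            return 0.0;
        }
        long hits = list.stream().filter(next::equals).count();
        return (double) hits / list.size();
    }

    /** Generation phase: walk the table, picking each successor at random. */
    String generate(String start, int maxTokens, Random random) {
        StringBuilder out = new StringBuilder(start);
        String current = start;
        for (int i = 1; i < maxTokens; i++) {
            List<String> list = successors.get(current);
            if (list == null) {
                break; // nothing ever followed this token in the source text
            }
            // A uniform pick from a list with repeats is a frequency-weighted pick.
            current = list.get(random.nextInt(list.size()));
            out.append(' ').append(current);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        SuccessorModel model = new SuccessorModel();
        model.learn("the house is near the house with the cat .");
        System.out.println(model.probability("the", "house")); // roughly 0.67
        System.out.println(model.generate("the", 12, new Random()));
    }
}
```

Keeping repeated successors in a plain list is the simplest way to get the weighting for free, at the cost of memory on large texts; a map of counts would be the obvious alternative. The sketch also joins every token with a space, so punctuation comes out detached from the preceding word.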

If there is any interest in the idea I’ll eventually describe the algorithm in more detail and perhaps publish the Java source code and release it into the public domain. The method described here may change as I refine it. As you know, the devil is in the details!

Let me give you some examples of the first results I obtained. For more examples just subscribe to the feed and keep following it as I add new styles and refine SPG.

I fed the software the text of “As You Like It” (the script of the William Shakespeare play), and here is a snippet of what it generated:

ROSALIND. How say you have mercy, and happiness; for fashion bequeathed me thou liest in some hope. And if thou, and wip'd with my heart Lie Direct; I am not so.

Here is an example generated from the text of the English constitution:

The moderate Liberal rule by a rectified perception of a great majority of the rarest. English are not good at social power. When the repeated influence equivalent to expect what is scanty, in the sudden emergency, this change is undoubtedly positive.

As you can see, the text is readable but it is nonsense. On the other hand, it can be entertaining to read, especially when it happens to acquire a meaning that is, or can be interpreted as, real.

One of the interesting things I learnt is that the same program works equally well, without modifications, if applied to text written in other languages. So far I have tested it with some English and Italian texts.

I also discovered that, in general, the less the reader understands the subject and the language of the original text, the more the sentences generated by SPG seem to make sense.

This tells you something interesting about the human mind. When you do not understand a subject, you tend to believe what you hear and find meaning in it, even if what you hear is complete nonsense.