Neural-Network Generated SCA Names and Devices

Neural networks are the new hotness in machine learning, but they can also be used for amusement, because they generate natural-looking but deeply wrong output.  You might be familiar with Deep Dream, the result of an image recognition neural net gone wrong:

https://upload.wikimedia.org/wikipedia/commons/4/49/Deep-dream-white-noise-0050.jpg
You can also use them on text, and do things like make fake Magic: The Gathering cards ("Mointainspalk AND Tromple?"), or fake Pokémon.

I decided to take a crack at this, and used the same neural network tool as those two, but pointed it at SCA names and devices.  Conveniently, the College of Heralds puts their database online (although it is perhaps not perfectly fresh, as I'm not in it).

I apologize in advance if any of the names or devices closely resembles yours - these things can sometimes just memorize their inputs.

Names

I've picked out what I think are some of the funniest names and listed them below.

I trained the network for 10 epochs (that is, 10 full passes over the training data, getting a little better each time), and generated a bunch of names.  Some of the better ones (full list):
  1. Kirsten Banalarius Gollomartyn
  2. Jonethere of Elfrock Rocklowe
    Not to be confused with Jonethere Alfred Prufrock.
  3. Mybheard Urvegarðsson
    My beard!
  4. Murduinn of Severevach
    Sounds like a serious fellow.
  5. Ískáfn branda Bjótsingr of Sloiseaugulla
    Is that a place in Australia?
  6. Alexander the Red
    They're not all exciting names, sadly.
  7. Kinshandtjamzkeva
  8. Jonelaye von Snachwald
    „Snachwald“
I also tried training it only for one epoch, which produced noticeably more nonsensical (and, oddly, more Gaelic?) results:
  1. Kiane McConnos ap Stulffìy
  2. Ceanlen inghean Ui-Seonn y'Siondizon Mac Bkain
  3. C itsina Mirceverick la mexic Briotene Wood
  4. Tancrich Pevargouss Tryk Debonned

Devices

Likewise, here are a few of the funnier devices, from a network trained for 10 epochs.  More here.  Training for fewer epochs produced mostly gibberish.

(Heraldry primer for those not familiar with this particular way of describing images)

Some of these are quite good devices:

  1. Azure, a hawk rising Or.
  2. Argent, a pall argent.
  3. Argent, an escallop inverted Or.
    Except that "Or, an escallop Or." is also a suggestion.
  4. Sable, two suns Or.
  5. Gules, a crab between two flaunches Or.
  6. Sable, two fish naiant contourny and on a chief Or three fleurs-de-lis vert.
But then they start to get weirder...
  1. Or, a hammer sable, on a chief azure three acorns counterchanged.
    I'm not sure how you counterchange this unless they're escaping the chief.
  2. Sable, in pale two piars argent.
    ...what's a piar?
  3. Argent, yigrant counterchanged.
  4. Purpure, a wine of diverse lynx Or.
    Furry wine.
  5. (Fieldless) On a mullet of five martlets argent.
    A mullet of five *points* makes sense, but I guess we can put a bird on it.
The longer ones are insane.
  1. Sable, a grey wolf argent rising from flames proper, an orcagh argent on a chief indented gules a sinister wing affronty, each maintaining in its tail entwined at point nowed Or, and on a chief rayonny gules a hawk's head erased to sinister Or, three Latin crosses Or.
  2. Or, a wolvet paviling charged with a three-leaf chess vert.
  3. Per pale sable and gules, a chevron cotised Or three thishles preying a roundel argent a tree blasted Or attired purpure.
  4. Per chevron azure and gules, a willoh voided tree blasted proper and a capite chatosed Oat argent batted and membey in chief two coneys salient respectant argent sustaining a boreall and Or, a bordure dovetailed gules.
  5. Or, two dragons rampant and a pithon between four brown his rays and drawn with a candle and a chain gules, surmounted by a thisle argent.
  6. Gules, a winged heron, tails to center, and on a chief vert three loaves gules, in dexter base to a sword withilien issuant from the tout in chief, all within a bordure Or.
  7. Or, a barrulet between two Maltese crosses Or and a chalice Or pits to sinister purpure charged with treess of arrows Or and a tower Or, each charged with a mullet of five greater and five two Or and three keys fesswise Or, two crosses crosslet fitchy argent, houndel Or.
    We really like gold, I guess.
  8. Azure, two cats fesswise a crown of ivy vine vert and a cannon beaste transfi between three pheons Or.
  9. Per bend sinister gules and sable, a chevron sable fimbriated argent and on a chief Or a demi-yak couchant vert and a papele gules a lut an ushlfet sable, a bordure counterchanged.
    DEMI-YAK
  10. Per pale azure and argent, a star of David, slipped and leaved, proper enflamed argent, a chief urdy argent fimbriated, in chief a mullet of six points and a natural dolphed dog and on a chief sable two compass stars azure.
    So a blue and white "star of David, slipped and leaved, enflamed argent..." This is actually genius.
I'd love to get some of these drawn.  I might print out and bring the list to Pennsic to see if I get any artists amused.  I did find a picture of #7 though:


Also, I should admit that I had to look up half of the words to see if they were real.  This thing is creepily good at making up plausible-sounding heraldry terms.
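If you'd rather not look the words up by hand, here's a rough Python 3 sketch that flags words in a generated blazon that never appear in the training data.  The function names are made up for illustration, and it only tells you whether the network invented a word, not whether the word is real heraldry.

```python
import re

# A "word" is a run of letters, optionally joined by hyphens or
# apostrophes, so "fleurs-de-lis" and "hawk's" each count as one word.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def vocabulary(training_lines):
    """Collect every lowercased word seen in the training blazons."""
    return {word.lower()
            for line in training_lines
            for word in WORD.findall(line)}


def unknown_words(blazon, vocab):
    """Return the words in a generated blazon that the training data lacks."""
    return [word for word in WORD.findall(blazon)
            if word.lower() not in vocab]
```

To use it, build the vocabulary from the lines of devices.txt and run unknown_words over each line of devices.out (both files come from the how-to section below).  Anything it returns, like "piars", is something the network made up on its own.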

Device Prediction

If you give the model a starting text, it'll try to generate follow-up text.  So if we train it on lines of the form "name|device", then when we feed it "name|" it should start generating a device.
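Roughly, the plumbing around the model looks like the Python 3 sketch below.  The helper names are hypothetical, and it assumes the sampler echoes the starting text at the front of its output, which is how the results in this section are shown.

```python
def pair_line(name, device):
    """One training line in the "name|device" format."""
    return f"{name}|{device}"


def prompt_for(name):
    """The starting text to hand the model: the name plus the pipe."""
    return name + "|"


def first_prediction(sample_text, name):
    """Pull the device out of the first line of sampled text.

    Assumes the sample begins with the prompt itself; returns None
    when it does not.
    """
    first_line = sample_text.split("\n", 1)[0]
    prefix = prompt_for(name)
    if not first_line.startswith(prefix):
        return None
    return first_line[len(prefix):]
```

The prompt is what goes into the --start_text option of sample.py from the how-to section.  Only the first line of the sample is the prediction; everything after the newline is the model wandering off to invent other people.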

For myself,

Þórfinnr Hróðgeirsson|Azure, four gores purpure.

A shame that's not legal since it's color-on-color, and you can't really have more than one gore, but it's creative!

Let's see what it thinks for their Majesties, the current King and Queen of the East Kingdom:

Ioannes Aurelius Serpentius|Sable, a portcullis Or and a fir tree blasted and eradicated counterchanged.

Sadly, a far cry from his real arms, "Per pale gules and sable, a hydra and on a chief argent three frets couped gules."

Ro Honig von Sommerfeldt|Per bend sinister argent and sable, a lion's head cabossed argent and two sheaves Or.

Some similarities to her real arms, "Sable, a chief Or, in base a peacock in his pride Or."

If you post a comment, I can run this on YOUR name as well, at least until I head off to Pennsic.

How to do it yourself

These instructions work on my Ubuntu machine.  Your mileage may vary on other systems.
  1. Install TensorFlow.  I also installed CUDA to use my graphics card to do the processing faster, which was a huge pain because NVIDIA is awful.
  2. Install TensorFlow-Char-RNN.
  3. Download the armorial database from oanda.
  4. The database is in the Latin-1 encoding scheme because it's apparently still the 90s.  Convert it to utf8 with
    iconv -f ISO8859-1 -t UTF8 oanda.db > oanda.utf8.db
  5. Parse the devices and the names out of the database.  It's pipe-delimited for some reason.
    cat oanda.utf8.db | grep '|\([abBdDgs]\|BD\)|' | cut --delimiter='|' -f 4 > devices.txt
    cat oanda.utf8.db | grep '|\(AN\|ANC\|B\|D\|N\|NC\|Nc\|v\|vc\)|' | cut --delimiter='|' -f 1 > names.txt
  6. I had to patch sample.py here on line 110 to replace "sample" with "sample.encode('utf-8')".  I'll mail this patch upstream if I get the energy.
  7. Actually train the network and get it to generate a sample.  Use more epochs for more sensical results (and to take longer).
    TARGET=names; python tensorflow-char-rnn/train.py --data_file=${TARGET}.txt --output_dir=$TARGET --num_epochs=10 && python tensorflow-char-rnn/sample.py --init_dir=$TARGET/ --start_text="" --length=10000 | tee ${TARGET}.out
    TARGET=devices; python tensorflow-char-rnn/train.py --data_file=${TARGET}.txt --output_dir=$TARGET --num_epochs=10 && python tensorflow-char-rnn/sample.py --init_dir=$TARGET/ --start_text="" --length=10000 | tee ${TARGET}.out
  8. Read the output files (names.out, devices.out).
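For anyone who'd rather not chain iconv, grep and cut, here's a rough Python 3 sketch of steps 4 and 5 in one go.  The type codes are copied from the grep patterns above; the function names and the convert wrapper are my own invention, and I've only reasoned through it against the shell version, so treat it as a starting point.

```python
# Type codes copied from the two grep patterns in step 5.
DEVICE_CODES = {"a", "b", "B", "d", "D", "g", "s", "BD"}
NAME_CODES = {"AN", "ANC", "B", "D", "N", "NC", "Nc", "v", "vc"}


def split_records(raw_bytes):
    """Decode Latin-1 bytes (step 4) and split each line on pipes."""
    text = raw_bytes.decode("latin-1")
    lines = (line.rstrip("\r") for line in text.split("\n"))
    return [line.split("|") for line in lines if line]


def has_code(fields, codes):
    # Like grep '|CODE|', accept the code in any field that has a pipe
    # on both sides, i.e. anything but the first and last field.
    return any(field in codes for field in fields[1:-1])


def extract(records):
    """Return (names, devices), mirroring the two cut commands."""
    names = [f[0] for f in records if has_code(f, NAME_CODES)]
    devices = [f[3] if len(f) > 3 else ""
               for f in records if has_code(f, DEVICE_CODES)]
    return names, devices


def convert(db_path, names_path, devices_path):
    """Hypothetical wrapper: read the raw database, write both files."""
    with open(db_path, "rb") as handle:
        names, devices = extract(split_records(handle.read()))
    with open(names_path, "w", encoding="utf-8") as out:
        out.writelines(name + "\n" for name in names)
    with open(devices_path, "w", encoding="utf-8") as out:
        out.writelines(device + "\n" for device in devices)
```

Calling convert("oanda.db", "names.txt", "devices.txt") should leave you with the same two UTF-8 files as the shell pipeline.  It reads the original Latin-1 file directly, so the iconv step isn't needed.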

Comments

  1. The database is current through the most recently published results. When new results come out, the delay is usually measured in hours unless I am not in a position to take care of it right away.

    The database is Latin-1 for now because legacy reasons. Likewise the pipe delimited format. It's handy and useful. The master database is reduced to ASCII with a custom scheme for encoding everything else (Da'ud encoding, we call it). You'll see bits of it in the database where there are characters in the name that are not Latin-1.

    ...and when I say "legacy", I mean, more than two decades of format stability here. Maybe three. Also search code that is not greatly different from two decades ago. Because it still works.

    Replies
    1. I understand the legacy issue, and that format stability is important - I'm a software engineer by trade and see those issues all the time. Doesn't mean I'm not going to snark about it :)

      Unfortunately, using Latin-1 means that many European languages are missing characters. French, Latin with macrons, Hungarian, Finnish, Czech, Romanian, a common orthography for Old Norse (ǫ is missing), and others.

      If I were to wave a magic wand, I'd port the database to Unicode, and publish a Latin-1 legacy table to not break things, but I understand that that may not be practical.

      On the other hand, Pennsic troll still can't handle Latin-1, so...
