Breaking a Heraldic Log Jam: Applying The China Biographical Database Project to The SCA
In the SCA, it's conventional to assume a "medieval name," so that we don't wind up having Duke Joey the Fourth fighting for the throne against Sir Moondust. It helps the atmosphere, and we've been doing it long enough that we don't really question why anymore.
To facilitate this, the SCA has a College of Arms that helps people choose names that are documentably historical. This is a good idea, and their rules are pretty good for European names.
They don't really work well for Chinese names, though. Historically there haven't been a lot of people creating Chinese personas* because the SCA centers Europe. But with the recent change in the mission statement to explicitly break this centering:
The Society for Creative Anachronism (SCA) is an international non-profit volunteer educational organization. The SCA is devoted to the research and re-creation of pre-seventeenth century skills, arts, combat, culture, and employing knowledge of history to enrich the lives of participants through events, demonstrations, and other educational presentations and activities.
Interest is growing. Unfortunately for people with this interest, Chinese historical naming practice doesn't align well with the rules the heralds have laid out. There are two broad categories of problems:
- Current rules require registering a name in Latin characters, maybe with accents or a couple of funny letters, but no tone markers, and certainly not Chinese characters. This is a problem, but there isn't much I can do about it right now.
- Chinese naming practice does not generally reuse given names wholesale. In fact, doing so was often taboo (see also).
The Rules require fairly strong documentation that a name existed historically, which typically means showing that it was used by several different people in several different sources. Because Chinese given names were rarely repeated, it's very difficult to document them to the satisfaction of The Rules.
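However, we can play the two problems off of one another. If Chinese names have to be registered in a romanization such as pinyin, without tone marks, then the space of names is much smaller than the space of names written in characters: many different characters, in several different tones, collapse into the same toneless syllable. Two people whose given names are written with different characters can end up with the same romanized name, and that gives us the repetition The Rules want.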
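To make that collapse concrete, here is a minimal sketch, assuming the pinyin comes with tone marks on the vowels. The helper names `strip_tones` and `group_by_toneless` are hypothetical, not part of CBDB. It decomposes each string with Unicode NFD, drops only the four tone diacritics, and recomposes, so the diaeresis on ü survives (it is a letter distinction, not a tone).

```python
import unicodedata
from collections import defaultdict

# Combining marks used for the four pinyin tones: macron (1st), acute (2nd),
# caron (3rd), grave (4th). The diaeresis on ü (U+0308) is NOT a tone mark,
# so it is deliberately kept.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin: str) -> str:
    """Remove tone diacritics from tone-marked pinyin, keeping ü."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def group_by_toneless(names: dict[str, str]) -> dict[str, list[str]]:
    """Group names written in characters under their toneless pinyin."""
    groups: dict[str, list[str]] = defaultdict(list)
    for hanzi, pinyin in names.items():
        groups[strip_tones(pinyin)].append(hanzi)
    return dict(groups)


# Illustrative single-character given names with their tone-marked pinyin.
examples = {
    "志": "zhì", "智": "zhì", "之": "zhī", "直": "zhí",
    "文": "wén", "問": "wèn",
    "呂": "lǚ", "旅": "lǚ", "律": "lǜ",
}
groups = group_by_toneless(examples)
# groups == {"zhi": ["志", "智", "之", "直"], "wen": ["文", "問"],
#            "lü": ["呂", "旅", "律"]}
```

Nine distinct characters become three registrable spellings, which is the whole trick: a name that is unique in characters often is not unique in toneless pinyin.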
Through another stroke of luck, a kind SCAdian pointed out that a project out of Harvard and a bunch of other institutions, the China Biographical Database Project, has collected records on some 422,600 people, including their names, estimated dates, and other associated information. About 200,000 of them were born before 1600. This is tremendous. I had found some paper name dictionaries in the past, but without a digital copy it's impossible to perform this kind of analysis.
I've got a few files to share. Let me explain what you're looking at. All of the grouping below is by pinyin, not characters.
- The files start with the labeled gender of the people in them
- The second word of the file name is either "surname," which is indexed by surname, or "mingzi" which is indexed by the given name of the person.
- Files ending in _counts contain the number of occurrences of each surname or given name for that gender. Files without _counts instead list the people with each name, along with their "index year," which is approximately the year they turned 60. Each person's name is also given in characters so you can look them up on CBDB if you want more information.
- The rest of the file name is intended to document the CBDB dump I used to generate these files.
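If you want to rebuild views like these yourself, for example restricted to people born before 1600, the logic is just counting and grouping. This is a minimal sketch, assuming the people have already been pulled out of a CBDB dump into simple records; the `Person` fields and the function names are hypothetical (not CBDB's actual schema), and the rows are made up for illustration rather than taken from the database. Birth year is estimated as index year minus 60, following the approximation described above.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Person(NamedTuple):
    """One hypothetical record; field names are illustrative only."""
    gender: str       # labeled gender, e.g. "male" or "female"
    surname: str      # toneless pinyin surname
    mingzi: str       # toneless pinyin given name
    hanzi: str        # full name in characters, for lookup on CBDB
    index_year: int   # roughly the year the person turned 60


def estimated_birth_year(p: Person) -> int:
    # The index year approximates age 60, so subtract 60 for a rough birth year.
    return p.index_year - 60


def mingzi_counts(people: list[Person], gender: str) -> Counter:
    """Given name -> number of people of that gender (like a _counts file)."""
    return Counter(p.mingzi for p in people if p.gender == gender)


def mingzi_listing(people: list[Person], gender: str) -> dict[str, list[tuple[str, int]]]:
    """Given name -> [(name in characters, index year)] (like a non-_counts file)."""
    listing: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for p in people:
        if p.gender == gender:
            listing[p.mingzi].append((p.hanzi, p.index_year))
    return dict(listing)


# Made-up rows for illustration; these are NOT real CBDB entries.
people = [
    Person("male", "wang", "zhi", "王志", 1100),
    Person("male", "li", "zhi", "李智", 1450),
    Person("male", "zhang", "wen", "張文", 1700),
    Person("female", "liu", "zhi", "劉芝", 1300),
]

# Optional period filter: keep only people estimated to be born before 1600.
early = [p for p in people if estimated_birth_year(p) < 1600]

counts = mingzi_counts(early, "male")    # Counter({'zhi': 2})
listing = mingzi_listing(early, "male")  # {'zhi': [('王志', 1100), ('李智', 1450)]}
```

Counting per romanized given name, rather than per character string, is what lets two different people like 王志 and 李智 count as two attestations of "zhi".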
You should be able to open the files in Google Drive, or download them and use Notepad or whatever text editor the kids are using these days.
Please bear in mind that there are a lot of rules about what makes a name good or bad, and this does nothing to address them. You should also read this paper on Chinese onomastics, which surveys a few more topics. In particular, this dump doesn't cover styles (zi) or nicknames (hao), even though any sensible (male?) noble in medieval China would have had a style. The database does include some styles, but analyzing them will be future work.
I think the database is licensed under CC BY-NC-SA 4.0, so this view of the data is too.
* And, yes, a white person creating a Chinese persona carries with it a substantial risk of harmful cultural appropriation. That is not the topic of this blog post.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.