Questions about character encoding
- Greek characters: How should these be encoded? Should they just be entered as Unicode characters? Currently they appear as entity references such as "&gras;&grn;&grt;", which don't validate because the entities haven't been declared. If I'm to replace them with Unicode characters, I'm not sure what the mapping is.
Greek characters will be entered as Unicode. John Walsh will take care of this.
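A minimal sketch of what Unicode entry looks like, assuming the files are saved as UTF-8. Greek can be typed directly or written as numeric character references; neither form needs an entity declaration, so both validate. The word λόγος is purely illustrative, and this is not the mapping for the existing &gras;-style entities, which still has to be established.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: λόγος is an example word, not taken from the texts -->
<lg>
  <!-- typed directly as UTF-8 characters -->
  <l>λόγος</l>
  <!-- the same word as numeric character references (λ ό γ ο ς);
       no entity declarations are needed for these -->
  <l>&#x3BB;&#x3CC;&#x3B3;&#x3BF;&#x3C2;</l>
</lg>
```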
- When a page break occurs at the very beginning of <text>, before the <front>, I assume it should be moved into the <front>?
Yes, these breaks should be moved into <front>.
- If a page break occurs between the <front> and the <body>, should I leave it there, or move it into the front or the body?
These breaks should be moved into <front>. In general, it's best to have these empty pages at the end of a section rather than at the beginning. Thus, if there are empty pages between <body> and <back>, they should go at the end of <body> rather than the beginning of <back>. But the first page of the new section should go inside the new section. That is, if the <body> starts on page 6, then the <pb n="6"/> should go inside <body>, not inside <front>. Or if <back> begins on page 374, then <pb n="374"/> should go inside <back>, not <body>.
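A sketch of the placement rules above. The page numbers 6 and 374 come from the answer; 1, 5 and 373 are hypothetical, and the comments stand in for the actual content.

```xml
<text>
  <front>
    <pb n="1"/>   <!-- break that sat before <front>: moved inside it -->
    <!-- ... front matter ... -->
    <pb n="5"/>   <!-- empty page between <front> and <body>: end of <front> -->
  </front>
  <body>
    <pb n="6"/>   <!-- first page of the body: inside <body>, not <front> -->
    <!-- ... acts and scenes ... -->
    <pb n="373"/> <!-- empty page between <body> and <back>: end of <body> -->
  </body>
  <back>
    <pb n="374"/> <!-- first page of the back matter: inside <back>, not <body> -->
    <!-- ... back matter ... -->
  </back>
</text>
```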
- There are a large number of <pb/> tags that occur within a <body> section but between the <div>s for scenes and acts. Is that okay? Or should they be moved into a scene/act?
Questions about my validation workarounds
- Validation: <head> tags within <front>: There are some <head> tags in the <front> before the epigraph, and these cause a validation error. My current workaround is to follow the empty <head> tags with an empty <p/> tag and enclose them all in a <div>. Enclosing them in a div seems appropriate, but the empty <p/> tag seems like a bit of an ugly hack. Should I solve this differently?
Yes, this should be handled differently. The empty <p/> tag is no good. What's in the head? Please post examples or point to example files.
Q: These <head> tags tend to contain the title of the play and precede the dedication. An example is in Marino Faliero. If I uncomment the <head> tags (and move them into the <front>), they actually validate okay, but they make the subsequent <div> fail validation.
- Validation: <head> tags before <front>: There are some <head> tags in the <text> element, occurring before <front>. I imagine these should get moved into the <front> section.
Yes, these should be moved to <front> and fixed as in the above bullet point.
- Validation: <head> tags within <castList>: These don't validate, and the <div><head>...</head><p/></div> trick doesn't work here. What's to be done?
Again, can we get an example? <head> elements are allowed immediately after the opening <castList> and <castGroup> tags. Perhaps we need to add a <castGroup> for each group of <castItem> elements that has its own <head>. Looking at the marinofa01 example, it looks like adding a <castGroup> is the appropriate solution.
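A minimal sketch of the <castGroup> approach, assuming each headed section of the cast list becomes its own group. The headings and role names are placeholders, not taken from marinofa01.

```xml
<castList>
  <!-- one <castGroup> per headed section; its <head> comes first inside it -->
  <castGroup>
    <head>Heading of the first section</head>
    <castItem><role>First role</role></castItem>
    <castItem><role>Second role</role></castItem>
  </castGroup>
  <castGroup>
    <head>Heading of the second section</head>
    <castItem><role>Third role</role></castItem>
  </castGroup>
</castList>
```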
- Validation: <stage> tags within <castList>: These don't validate. Wrapping them in a <div> makes it validate. Is that appropriate?
I see a <stage> outside of <castList> but within <front> in marinofa01. In this case, keeping the <stage> outside of <castList> and wrapping <stage> tags in a <div type="stage"> is the appropriate solution.
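A sketch of that arrangement, assuming the <stage> directly follows the <castList> inside <front>; the role and stage text are placeholders.

```xml
<front>
  <castList>
    <!-- castItem / castGroup elements only; no <stage> in here -->
    <castItem><role>A role</role></castItem>
  </castList>
  <!-- the stage direction stays outside <castList>, wrapped in its own div -->
  <div type="stage">
    <stage>Text of the stage direction</stage>
  </div>
</front>
```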
- Validation: <stage> tags within <lg>: I had to enclose the <stage> tags within an <l> tag to get them to validate. Was this the right workaround?
Putting <stage> within an <l> is inappropriate. Better to close the <lg> and start another <lg> after the <stage>.
Q: This occurs twice in Mary Stuart, on lines 6731 and 6750:
<l n="26">Bid them come in. </l>
<!-- added <l> tag around the stage direction so it'd validate-->
<l><stage rend="i right">[Exit <hi>Mary Beaton</hi>.</stage></l>
<l n="27" rend="right">I cannot tell at last</l>
So, you're saying I should close off the <lg> and start a new one, keeping the existing n= values for the <l> elements, and end up with this:
<l n="26">Bid them come in. </l>
</lg>
<stage rend="i right">[Exit <hi>Mary Beaton</hi>.</stage>
<lg>
<l n="27" rend="right">I cannot tell at last</l>
Questions about rend attributes and <speaker>/<stage> reorderings
- In an <sp> element, <speaker> has to come before <stage>: neither can contain the other, and you can't have <stage> followed by <speaker>. In the original markup, a <speaker> element was sometimes contained within a <stage> element, so I broke them apart and rearranged them. I also applied the originally outermost rend attribute (from <stage>) to both elements. However, I'm not exactly sure how rend works, so while I think this method should be fine for rend="i", I'm not sure what happens with "center" if I use it on two elements in a row: will both be printed on the same line, centered? Or will a line break be added after the first centered element? I suppose I could enclose them both in a wrapper element that is centered, but I don't know of one that would validate (<div> doesn't work). So the question is: should I:
- Just drop the <speaker> tag and put the entire line in a <stage> tag, which can then be rendered "center" without problems?
- Or keep the separate <stage> and <speaker> tags? If so, should I do anything different for the rend attribute?
Please post some examples or pointers to files for this one.
<speaker> and <stage> should be separate elements with their own rend attributes. By default, the stylesheets will put <stage> on a new line. If <stage> appears on the same line as <speaker>, we'll need to be more creative. If this is the case, I recommend the following:
<speaker>Hamlet <seg>(to keep away from Ophelia)</seg></speaker>
In the above case, the speaker's rend values would apply to the child seg as well, though you can add additional rend values to the child elements.
Q: An example from the raw Mary Stuart XML, line 5080:
<stage rend="i center"><speaker>Shrewsbury</speaker> (to Kent apart).</stage>
This doesn't validate, because the <stage> and <speaker> elements have to be separated.
So, in the TEI-valid Mary Stuart on line 6936 I have this:
<speaker rend="i center">Shrewsbury</speaker><stage rend="i center"> (to Kent apart).</stage>
The problem is: will applying the "center" rend value to two elements put them centered on separate lines, instead of running together on the same line? The alternative that comes to mind to preserve the rendering involves sacrificing the <speaker> tag:
<stage rend="i center">Shrewsbury (to Kent apart).</stage>
(Although the <speaker>/<stage> reordering problem occurs frequently, most of the time there's either no rendering or italic rendering, so splitting up the elements should not present any problem. I'm only concerned about the "center" values.)
- So, howsabout I split up and rearrange the <stage> and <speaker> tags when there's no rend="center" value, to be more accurate when I can, and drop the <speaker> tag and just use a <stage rend="center"> tag to preserve centering when necessary?
- Or is there some method to preserve both the stage/speaker distinction and still maintain accurate rendering?
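Applying the <seg> recommendation above to the Shrewsbury line from Mary Stuart might look like the sketch below. It assumes the stage direction should stay on the speaker's line and keep the "i center" rendering, which the <seg> inherits from <speaker>. Note that the direction is then no longer marked up as <stage>.

```xml
<sp>
  <!-- rend values on <speaker> also apply to the child <seg> -->
  <speaker rend="i center">Shrewsbury <seg>(to Kent apart).</seg></speaker>
  <!-- the verse lines of the speech follow here -->
</sp>
```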
Modifications to be made based upon the original text:
- For duplicated headings ("9" and "IX"), see which one the book uses. Drop the other.
- Possibly use the printed text to verify rend values for <stage>/<speaker> reorderings
Contents of teiHeader
- What all is supposed to go into this? The ones I'm currently using are largely empty, and the information they do have pertains to John Walsh and the XML document, rather than the original publisher and publishing information. Should we use the publishing information for the actual printed text?
- The first few documents have headers from the original SGML (commented out) which have lots of citation information. If this does end up going in the teiHeader, what should I do about the later tragedies, which don't seem to have that information?
The teiHeader sections can be completed and modeled on any of the existing swinburneArchive documents. To get the XML, go to a document, select the "[view document information]" link at the top of the page, and follow the "xml source" link.
Q: Okay, I'll mimic those, though I suspect questions may arise in the process.
- I've mimicked the <teiHeader> style from the other Swinburne Archive documents and posted an example in an updated version of The Sisters. I've also posted just the <teiHeader> section below for ease of access. Let me know if something is wrong here. My basic question is this: in "Atalanta in Calydon" there are two <monogr> sections, the second having the attribute n="originallyPublishedIn". I've only mimicked the first one; should I include the second as well, and what should it refer to? I'm not sure what this section means, exactly.
<title n="dc.title">The Sisters</title>
<persName reg="Swinburne, Algernon Charles, 1837-1909">Algernon Charles
<persName reg="Walsh, John A.">John A. Walsh</persName>
<publisher n="dc.publisher">Library Electronic Text Resource Service (LETRS) / Digital
Library Program, Indiana University</publisher>
<p>Copyright © 1997-2006 John A. Walsh and the Trustees of Indiana University.</p>
<p>Permission is granted to copy, distribute and/or modify this document under the terms
of the GNU Free Documentation License, Version 1.2 or any later version published by
the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and
no Back-Cover Texts. A copy of the license is included in the section entitled "GNU
Free Documentation License."</p>
<title>The Swinburne Project: An Electronic Edition of the Works of Algernon Charles
<name>John A. Walsh</name>
<author>Swinburne, Algernon Charles</author>
<title level="m">The Sisters</title>
<title>The Tragedies of Algernon Charles Swinburne</title>
<publisher>Chatto &amp; Windus</publisher>
<!-- second monogr section goes here, if applicable -->
<tagsDecl> &renditionElements; </tagsDecl>
- The Duke of Gandia doesn't seem to be in the five-volume collection. So I take it we're not producing two different versions of that one? And what should I put for the <monogr> entry in the <teiHeader>?