Things have not been idle on this front, though I did go through two very uncertain weeks, followed by two weeks of reorienting and conceptualizing.
I’m using this blog again to just sort out some thoughts from looking through the data I am compiling. I don’t think I will be able to exhaustively tag everything, but I need to note some general trends and describe how some of the categories are emerging.
At least from a quick first look, NPs introduced by possessive determiners (pd) are the most common. Their properties tend to be [+def, +spec], and this preference is quite evident. It is possibly worth comparing against the CoSiB corpus; a search may not be too difficult to implement, since pd is a closed class. If pd is very common, this would be a variational feature of note: definiteness gets marked this way preferentially, over the article.
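A rough sketch of what such a closed-class search could look like, in Python. Everything here is an assumption for illustration: the function name `count_determiners`, the word lists, and the sample sentence are made up, and nothing is specific to how CoSiB is actually stored or queried. It only counts surface forms, so ambiguous items like "her" (determiner or object pronoun) and "his" (determiner or possessive pronoun) would still need manual checking.

```python
import re
from collections import Counter

# Closed class of possessive determiners (assumption: standard spellings only;
# nonstandard corpus spellings would have to be added by hand).
POSSESSIVE_DETERMINERS = {"my", "your", "his", "her", "its", "our", "their"}
ARTICLES = {"the", "a", "an"}


def count_determiners(text):
    """Count surface forms of possessive determiners and articles in raw text.

    This is a string match, not a parse: "her" and "his" are counted as pd
    even where they are pronouns, so the pd count is an upper bound.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        if tok in POSSESSIVE_DETERMINERS:
            counts["pd"] += 1
        elif tok in ARTICLES:
            counts["article"] += 1
    return counts


# Invented sample sentence, not corpus data.
sample = "My mother took her bag and went to the market with our neighbour."
result = count_determiners(sample)
print(result["pd"], result["article"])
```

Comparing the pd-to-article ratio across two corpora would then give a first, very crude indication of whether the preference is real.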
Unusual nouns and N-Ellipsis
I haven’t been able to find many unmarked bare common nouns that are ambiguously (in)definite. Many are mass or kind nouns, or seemingly proper nouns. A possible exception is ‘O’-level, which seems like a proper noun; part of the case for interpreting it as a common noun is that it can be pluralized, but that probably won’t work. It usually means ‘~ results’ or ‘~ exams’, too, so it’s easier to interpret it as N-ellipsis (nominal ellipsis).
Pro-drop is very prevalent, even mid-sentence. This contrasts with the usual kinds of ellipsis in English, where it is usually verbs, modals, their respective phrases, dependent clauses, or entire clauses that are gapped or sluiced, but not NPs.
Some sentences are structured as ‘Got NP?’, and it seems that pro-drop applies, i.e. ‘e got NP?’, where e marks the dropped subject. However, ‘Got NP’ can also be in the topic position, e.g. ‘Got one time they went KL.’ ‘Got’ in the latter construction functions in a way comparable to expletive ‘there’.
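For a first pass over the data, the two ‘Got’ patterns could be flagged automatically before tagging by hand. The sketch below is only a surface heuristic, and all of it is my own assumption: the function name `classify_got`, the two labels, and the use of a final question mark as the deciding cue. It will misclassify anything where punctuation does not track the construction, so it is a way to pull candidates, not an analysis.

```python
import re


def classify_got(sentence):
    """Label a sentence-initial 'Got' as a candidate question or existential.

    Returns None when the sentence does not begin with 'Got'.
    Heuristic only: a trailing '?' is taken as the pro-drop question pattern,
    anything else as the existential (expletive-like) pattern.
    """
    s = sentence.strip()
    if not re.match(r"got\b", s, flags=re.IGNORECASE):
        return None
    return "question" if s.endswith("?") else "existential"


# The first example is from the post; the other two are invented.
examples = ["Got one time they went KL.", "Got pen?", "They got one car."]
labels = [classify_got(s) for s in examples]
print(labels)
```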
Abandoned this post half-way; I decided I probably should not do an exhaustive analysis of trends, since there are too many dimensions and too much data to tag.
Now it’s Week 10, essentially. I’m about 5k words in on the Overleaf document. Some ways to go.