What is a word?

The way I intend to develop this question is contextualized by another question-and-potential-answer, namely, Q: ‘What is language?’ and, A: ‘Language is words.’

Broadly speaking, ‘language is words’ is not a totally terrible thesis. Many frames we have for thinking about language conceive of ‘language’ as words put together, in bits or en masse, perhaps in some ordered way. At the same time, it is possible to think of things which are very like language which might not strictly involve ‘words’, but in these cases I would wager that we would tend to say that they work like language, though their objects are different, i.e. their objects are not strictly words.

In these cases as well, asking the question of what makes something word-like usually works as a re-framing device, which brings us back to the question of, ‘What is a word?’

Thinking mainly in terms of information and dependencies (what I might think of as underlying the ‘structural’ tradition), I would propose the following:

  • Form is perhaps the most basic level at which we might register something as being a word. A word is sounded or printed, is either speech or script. Any other information that we might conceivably think of as having to do with a particular word is necessarily the subject of this form.
  • An assumption that I have admitted is that word-forms signify, or otherwise contain retrievable information.

At the same time, the considered combination of words appears to give rise to meaning on a higher level of complexity than random word-sequences. It could be a strong illusion, but if not then it appears that some kinds of dependencies between words must exist. About these dependencies, there appear to be at least two broad kinds.

  • The first kind involves thinking of words as forming broad categories, e.g. nouns, articles, counting words, etc. It seems valid to think about words as forming categories due to certain patterns in how they may be meaningfully combined, e.g. in English articles modify nouns. These patterns appear to be repeated at scale, or within a degree or two of fractal complexity. The test for membership of a category is usually substitution, i.e. that a word that is alleged to be an English mass noun (e.g. ‘shade’) can be substituted by another word which is known to be a mass noun.
  • The second kind involves words selecting other words according to a test other than categorial substitution. For example, it seems much less natural to think of ‘the red herring’ than ‘a red herring’. We may describe such dependencies as ‘associative’, or ‘collocative’, etc. These terms bespeak the suspicion that the underlying heuristic is frequency of association of words, within some sort of psychological reality, or other dynamic reality (e.g. social reality).

The paragraph immediately preceding happens to refer to what I think is the other way to think about what words are (i.e. not as structured information), namely that words are the artefacts of some sort of dynamic process, e.g. a cognitive process (possibly an evolved one, though my position is agnostic), a signification (semiotic) process, a social process, a discursive process, a market process (e.g. words as currency), etc. This way-of-thinking is impossible to avoid, insofar as I think that no matter how you slice it, words in use represent some sort of situationally dependent correlation between a form, an image or idea, and real objects or situations. However, it would probably be best to leave developing this idea to another post on another day.

Returning our focus to the two broad kinds of dependencies I mentioned, the point I wanted to get to was that it seems the difficulty is in deciding what the next most important level-of-distinction between kinds of dependencies is. Within either paradigm there is no difficulty in recognizing exceptions and marginal cases, and any of these can be taken (though not very usefully) as recommending the other paradigm. It is also easy to see how one paradigm is better equipped to handle certain kinds of problems than the other.

However, it is more difficult to say, for example, what qualifies as a ‘compound word’, or to define what ‘compounding’ might involve as compared to simple co-occurrence. It is easy to imagine how the attempt to develop this could be made within the second paradigm, but it also bears on the question of what the starting points are in developing some sort of categorial grammar.

Within the first paradigm, how substitutability gives rise to derivations seems to apply fractally to not just words, phrases, and ‘upwards’, but also to parts of words. This might be taken as indication of the importance of developing and refining a theory of derivations, with global principles applying to elementary and then more complex particles.

This place, from which I tried to ‘read’ the trends in how ideas about language are developing, is about where I wanted to go, and so it shall be where I stop.

Lexeme: ‘spoil market’

My target phrase was ‘spoil market’, which we (speakers of Singapore English) use idiomatically to describe an act or performance that is far more elaborate or troublesome than the norm, e.g. marriage proposals – among other things that a GloWbE search (of the sort I describe in the previous post) might reveal.

All in all, we have evidence for two kinds of ‘spoil market’ in web English, business/economic and love/relationships. These seem mutually exclusive except in Singapore. Interestingly, ‘spoil market’ turned up no hits from Malaysia.

‘spoil market’ in this extended sense might be distinguished from the restricted Singaporean idiomatic usage in that the extended sense would allow constructions like ‘spoil the market’ and ‘spoil my/your/our market’. However, it is possible that there might be a semantically related construction (i.e. related to the idiomatic ‘spoil market’) in spoken Singapore English that is extendible in this way, e.g. ‘sorry for spoiling market guize’.

I posted a query for ‘spoil’ with only right-collocate ‘market’ at maximum distance 9, which turned up the following results:

4 hits for .sg (Singapore), excl. 1 rejected hit: I had hoped for more hits here, but this is understandable given that the corpus search will turn up mainly blogs and not spoken English.

  • Here there are 2 hits for ‘spoil market’, one in a romantic context to describe a ‘guy’ whose actions might cause ‘every girl’ to ‘want something epic also’. The other is someone who blogs fashion reviews, arguing that her fellow bloggers should not accept free stuff from merchants to review, in lieu of presumably better payment.
  • There were also 2 hits for ‘spoil the market’, referring to rent and wages.
  • I rejected 1 hit, for ‘spoil your market image’.

3 hits for .∅ (the US), excl. 1 rejected hit:

  • The context of the 3 accepted hits was business analysis, i.e. ‘spoil the market’, ‘spoil our market’, ‘spoil both of their market strategies’
  • The rejected hit was about transportation delays causing vegetable produce to spoil on the way to the market.

3 hits for .ng (Nigeria), 1 hit for .gh (Ghana), 2 rejected hits from .gh: 

  • The 4 hits were all about the love market (marriage and relationships). Here there is ‘spoil my market’, and less fortunately how your family could ‘spoil your market’, as well as ‘dey spoil market’ (‘dey’ presumably for ‘they’).
  • The 2 rejected hits were about vegetables.

1 hit for .bd (Bangladesh)

  • In this case, the writer is discussing how risk-taking firms can “spoil” markets (the writer uses scare-quotes), citing Stiglitz (an economist) on markets.

1 hit for .in (India)

  • ‘investors can spoil the market’

Rejected 1 hit each for GB and .au (Australia)

  • These were false collocates, in that they occurred within 9 words to the right but in a different clause or sentence, e.g. ‘spoil a wonderful spin-off to please a different market’.
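The kind of query described above (a node word with a right-collocate inside a maximum distance) can be sketched in a few lines of Python. This is only an illustration of the idea, not how GloWbE itself is implemented: the function name `right_collocates`, the simple regex tokenizer, and the exact-match comparison are all assumptions made for this sketch. It also flags matches that cross sentence-final punctuation, which catches some of the false collocates noted above, though not those that sit in a different clause of the same sentence.

```python
import re

SENTENCE_END = {".", "!", "?"}

def right_collocates(text, node, collocate, max_distance=9):
    """Find `collocate` within `max_distance` words to the right of `node`.

    Returns a list of (distance, crosses_sentence) pairs, one per node
    occurrence that has a match. Matching is exact and lower-cased, so
    'spoiling' does not count as a hit for 'spoil' (an assumption of this
    sketch; a real corpus interface may lemmatize).
    """
    # Split into word tokens and sentence-final punctuation marks.
    tokens = re.findall(r"[A-Za-z']+|[.!?]", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token != node:
            continue
        distance = 0
        crosses_sentence = False
        for following in tokens[i + 1:]:
            if following in SENTENCE_END:
                # Punctuation does not count towards the word distance.
                crosses_sentence = True
                continue
            distance += 1
            if distance > max_distance:
                break
            if following == collocate:
                hits.append((distance, crosses_sentence))
                break
    return hits

print(right_collocates("Investors can spoil the market.", "spoil", "market"))
print(right_collocates("They spoil everything. The market is open.", "spoil", "market"))
```

A hit that comes back with `crosses_sentence` set to `True` is a candidate for rejection, but clause-internal false collocates (like the ‘spin-off’ example) would still need to be rejected by reading the context, as I did by hand.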

Lexeme: ‘mischievous’

It was Singapore’s law minister’s use of the word some years ago which first struck me as interesting. I think most of us are familiar with the sense of the word that suggests playful troublemaking, but it was the sense of ‘intended to cause harm or trouble’ that Mr. Shanmugam was using, to refer to the actions of someone or other who was being sued for defamation/contempt/something else in that vein. Because I was less familiar with the second sense, hearing the word used had an interesting effect: as I interpreted the situation, he was attempting to highlight the seriousness of the litigant’s actions, yet at the time I associated the word with playfulness.

The question I considered with this GloWbE search was how likely my experience was to be shared by other Singaporeans, or English speakers elsewhere.

Geographical Spread

One interesting result is regarding the absolute popularity of the word (as indicated by the darkness of the shading). The darkest shades were for Nigeria, Ghana, and Sri Lanka. In the next tier were India, Pakistan, Jamaica, Malaysia, and Singapore. The word was light-shaded everywhere else, though interestingly it does better in GB (677) than in the US (464). This might be significant especially if the US-attributed corpus is more extensive.

One interesting pattern is that the popularity of the word is low across Kachru’s inner-circle countries (the first six ccTLDs in the GloWbE results). Outside the inner circle, the picture is mixed. In Africa, the word is popular in two countries (Nigeria and Ghana), but less popular in Tanzania, Kenya, and South Africa. It is fairly popular in South Asia, except for Bangladesh. As for the Caribbean, Jamaica is the only major ccTLD represented in GloWbE.

These are interesting results, in that British colonial history is not clearly a good explanatory factor, since all the countries in the African region were under British rule at some point (probably why they’re in GloWbE in the first place). This may partly be a problem with sample size, however, because the numbers outside of the inner-circle are low across the board, with the probable exception of India. (To illustrate, light-shading applied to Australia with 257 and the US with 464, while India is a shade darker at 257. Dark-blue shaded Nigeria and Ghana are at 212 and 187 respectively.)
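Since the shading evidently does not track the raw counts (Nigeria is dark at 212 while the US is light at 464), it presumably reflects frequency relative to the size of each country’s sub-corpus. A minimal sketch of that normalization follows. The raw counts are the ones quoted above; the sub-corpus sizes are placeholders invented purely for illustration and are not GloWbE’s actual figures, and the function name `per_million` is likewise my own.

```python
def per_million(count, corpus_size):
    """Relative frequency: occurrences per million words of the sub-corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * 1_000_000

# Raw hit counts for 'mischievous', as quoted in the post.
raw_hits = {"GB": 677, "US": 464, "Nigeria": 212, "Ghana": 187}

# PLACEHOLDER sub-corpus sizes in words. These are hypothetical values
# chosen only to show the effect of normalizing; substitute the real
# sizes reported by the corpus before drawing any conclusion.
placeholder_sizes = {
    "GB": 400_000_000,
    "US": 400_000_000,
    "Nigeria": 40_000_000,
    "Ghana": 40_000_000,
}

relative = {
    country: per_million(raw_hits[country], placeholder_sizes[country])
    for country in raw_hits
}

# Print countries from highest to lowest relative frequency.
for country, freq in sorted(relative.items(), key=lambda item: -item[1]):
    print(f"{country}: {freq:.2f} per million")
```

The point is only that a smaller raw count can correspond to a higher relative frequency once sub-corpus size is taken into account, which is why the raw numbers above should not be compared directly.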

What might be a less weak conjecture is that the word is more popular where there has been influence from India or the Indian diaspora. This would account for Jamaica (the West Indies having been a major destination of Indian emigration) and South Asia (with the exception of Bangladesh), as well as the low popularity of the word in Hong Kong and the Philippines. However, this works less well for Africa, since Kenya and South Africa have had similar or greater Indian influence than Ghana or Nigeria.

Usage Data

While the results about geographic spread are interesting though inconclusive, the semantic analysis is less equivocal. Where ‘mischievous’ is more frequently used, the second (negative) sense seems more likely to be used.

In Nigeria and Ghana, it is often a testimony, accusation, ploy, etc. that is ‘mischievous’, though the other sense is not absent either (‘mischievous little raccoon’). In Jamaica, we get ‘deletrious and mischievous’, and in Sri Lanka, we get ‘mischievous and presumptive’, ‘mischievous oversimplification’, and (my favorite) ‘mischievous mystification of history’. In Jamaica and Sri Lanka the less malign sense is common too (e.g. ‘mischievous grin’ and ‘mischievous smile’). However, even where the less malign sense is employed, one could argue that the influence of the malign sense is not absent, e.g. ‘with a wicked yet mischievous smile’.

In contrast, in the US (and also the Philippines), the negative sense appears far less popular, relative to the playful sense. A dog is mischievous, and a piece of music is called ‘playful and mischievous’. A contrasting example would be the ‘mischievous fact’ a book reviewer points out, in an argument he appears to be critical of. (While I’m not entirely sure from the sample snippet if this reviewer is against ‘cyber-utopianism’, I think it is safe to assume that he is not suggesting that the fact is playful, or that playing with facts is more culturally acceptable.)

Compared with the US, in the UK a semantic ambiguity, a survey about religion, and a ‘sinister purpose’ are called ‘mischievous’. We also get ‘mischievous’ in two related articles about a medical disease (one appears to be the same article as the .jm-attributed ‘deletrious and mischievous’). However, the other sense is also present (e.g. dogs, birds, a Playboy compilation).

In Singapore, Malaysia, and Hong Kong, the results are more evenly mixed (though I might declare a slight positive leaning in Hong Kong). On the negative side, we have a mischievous comment (.sg), mischievous reasons for accessing a database (.my), and North Korea’s mischievous behavior (.hk). On the positive side, all three give us mischievous tricks and mischievous boys. The character of the mischievous boy was the familiar, pre-Shanmugam usage of the term for me, and interestingly it appears quite localized.

Finally, I looked at the results from Bangladesh and South Africa, two countries which appeared to be anomalies in our consideration of the geographic spread. However, here I found that the negative sense was common, with ‘mischievous motives’ (.bd) and this interesting example from South Africa (.za): ‘To say he is injury prone as a result of these two unfortunate injuries is mischievous!’ This was actually the top result in GloWbE for .za, and I think it’s an interesting example because it’s nominally about a sports league: a playful pursuit, surely? especially since the sentence ends with a ‘!’ – though I suppose there are those who hold that rugby is a serious business.

Semantic Analysis

Overall, it seems ‘mischievous’ might be something quite close to, if not exactly a clear example of, a contronym (non-technically, a word that means both itself and its opposite, like ‘sanction’ – though this has been called an auto-antonym). On the one hand, we have (1) serious-mischievous, which tends to be bad-mischievous; while on the other we have (2) playful-mischievous, which is, presumably, lovable only insofar as it is not serious.

In this respect, ‘mischievous’ seems similar to the word ‘provocative’ – provocative good, or provocative bad? One can imagine both. Sometimes ‘mischievous’ and ‘provocative’ collocate, which I suppose gives us 4 ways to be mischievously provocative.

But perhaps a better analysis is that serious-mischievous is serious because of context. Reasons and motives are mischievous when the contention is legal, political, or otherwise public-related; and even the relatively uncommon medical context seems to bleed some of that seriousness into the bad-ness of ‘mischievous’. In the UK, serious-mischievous might also be fairly high-brow. It applies to characters in plays (‘a mischievous slave who would do just about anything for his freedom’), and comes up fairly often in reviews of plays, books, and art (especially if the work could be called a reinterpretation of something else). The cultured/intellectual frame of the word is also one of the less-infrequent uses of serious-mischievous in the US.

To attempt a summing-up, it appears sense (1) applies to human reasons, motives, and considered conduct, while (2) applies to animals and the human personality (or at least, those parts of human personality commonly regarded as less-considered).

Comments on GloWbE-aided analysis

There are clearly some limitations to the GloWbE-aided analysis, e.g. the size of the country samples is unclear, some samples are very small, attribution is via ccTLDs, etc. To illustrate, perhaps this WordPress blog post will push the frequency-count for mischievous higher once it gets incorporated, and if it does, will it count under .∅ (I mean the US) or .sg? However, I think my main takeaway from this exercise is that looking for collocations would not have given rise to the description of the (1)/(2) contrast I posited. More generally, I suspect a machine-reading of the corpus would not have easily identified this contrast, even with a sophisticated syntactic parser.


More recently (2013), Mr. Shanmugam used the word ‘mischievous’ again, but this time he made his meaning more transparent: ‘ “Any forms of cyber attacks or threats are actually threats on the people regardless if the intent was malicious or mischievous,” he added.’

Doing ‘Discourse’ (Part I): Situating and Qualifying CDA

The main problems Chilton considers in his chapter on ‘Missing links in mainstream CDA’ (published in this volume) have to do with CDA’s foundational claims. Taking inspiration from Chilton’s book chapter, I would identify three of these claims, namely that:

  1. Language-use is, among other things, social practice.
  2. CDA is useful or otherwise valuable to human society.
  3. Discourses exist, we know what they are, they are analyzable, and they are worth analyzing.

I see these claims as being ‘foundational’ in that they situate CDA relative to some starting point in the preexisting literature, as well as motivate (i.e. provide a rationale for) the project of CDA.

Beginning with the first claim, if language-use (and language-constituted ‘discourse’) is social practice, then one implication is that CDA is as relevant as anything else that has come in the tradition of social theory (power, consent, etc.) and social critique (exposing power structures, demystifying previously ‘opaque’ concepts, etc.). If one accepts the relevance, validity, or usefulness of the body of social theory I have broadly alluded to, it follows that one would appreciate the significance of the CDA project at least along those dimensions.

This particular implication is also relevant to the second claim I identify, about the usefulness of CDA. Thus, when Chilton brings the ‘emancipatory’ mission and ‘demystifying’ function of CDA into question, we have a ready response, insofar as language is indeed social practice, and insofar as we buy into the general body of social theory. In our class discussion, as well, I found many of the arguments for the usefulness of CDA to be true for the usefulness of social theory in general, e.g. the value of being able to analyze and discuss social processes.

This is one sense in which ‘CDA’ claims to be ‘Critical’ – but we see that the function of ‘Critical’ here is also to situate CDA relative to the tradition of social critique. (Although I would also argue that the main ‘Critical’ it situates itself against is theory of literary criticism, as developed in the Western academy; Chilton doesn’t question this move in his article, however.)

However, even if practicing CDA in its current mode does give us some understanding of social processes as enacted through language, Chilton questions our readiness to affirm the potential of this understanding in enacting social change. Can we really claim to be able to understand the bases of linguistic behavior and social behavior in a way that is comparable to how a financial analyst or policy-maker might claim to understand the behavior of economic actors in a certain system, or how a chemical engineer might claim to understand the physical principles governing the efficiency of a production process? Are social and linguistic behaviors ‘manageable’ in similar ways, and can we indeed claim to be contributing to the achievement of some socially beneficial outcome(s) when we do CDA?

In Chilton’s evaluation, mainstream CDA has not addressed these questions, leaving an explanatory gap that undermines CDA’s claims about the ‘emancipatory effects’ of the enterprise, and its claims about the social utility of the enterprise more generally. In comparison, Chilton discusses how an evolutionary-biological theory of cognition has greater explanatory power than CDA, when it comes to explaining social behavior and linguistic behavior (e.g. in the model explaining how apparently small ‘cultural inputs’ are enough to lead to persistent and significant cognitive biases).

But ultimately it is the third foundational claim which I wish to expand on, partly because I didn’t hear it come up during class discussion very much, and partly because Chilton is quite oblique when he alludes to this problem, to the point that it almost doesn’t appear as a problem in the discussion. I will expand on this in Part II.