Data collection method: |
To select individual texts, our initial approach was to apply Gabrielatos’ (2007) method, which is especially useful for determining query words or phrases that favour the retrieval of a wide range of relevant texts from a restricted-access database. Briefly, Gabrielatos (2007) suggests using a core query consisting of two or three words/phrases as a starting point to compile a pilot corpus. This pilot corpus is then used to identify additional relevant query words/phrases. These are words/phrases that tend to occur in texts where the core terms are also used; they are thus, at least in principle, closely associated with the core terms in a significant number of contexts. The ultimate purpose of applying Gabrielatos’ (2007) method is to identify words/phrases that would return articles on the topic under investigation even when the core terms themselves are not used in them. At the same time, these additional terms should not create undue noise: useful additional terms are those that retrieve a sufficient number of articles which do not contain the core terms but are still relevant. Given the restricted time period examined in this study (2014 only), we opted to compile an initial corpus using all articles published in the four chosen newspapers (Folha de São Paulo, O Estado de São Paulo, Zero Hora and Pioneiro) over the entire period (Jan-Dec/2014). This initial corpus would then be used to identify additional relevant query words/phrases, as suggested by Gabrielatos (2007). Our first attempt was to use the Portuguese equivalents of urban violence (violência urbana) and violence in cities/towns (violência na(s) cidade(s)) as our core query terms. However, these two terms did not retrieve as many texts as one would expect in a country where urban violence is a major issue. Overall, urban violence (violência urbana) appeared in 66 articles and violence in cities/towns (violência na(s) cidade(s)) in 10 articles. 
Nor was violence in the street(s) (violência na(s) rua(s)) frequently used: 22 articles in total. In an attempt to identify search terms that would lead to a higher number of texts on urban violence, we then searched for urban security (segurança urbana) and public security (segurança pública). Urban security (segurança urbana) is not frequently used in Brazilian newspapers either: 50 articles in total. Public security (segurança pública), on the other hand, is frequently mentioned: 1,809 articles in total. Violência urbana (urban violence) and segurança pública (public security) were then used to compile a pilot corpus so that Gabrielatos’ method could be applied to identify additional search terms. The method pointed to three additional terms: criminalidade (criminality), homicídio (homicide) and roubo (robbery/theft). While relevant, using homicídio (homicide) and roubo (robbery/theft) as query terms would bias the selection towards texts about these two crimes specifically. This would not allow us to have a clear picture of which crimes are most frequently mentioned in Brazilian newspapers, the project’s research question #1. Our decision was therefore to complement the list of query terms with crime names mentioned in official government statistics as well as other crimes the researchers intuitively deemed important. Also, in an attempt to gather as many relevant texts as possible, we opted to expand the collection of texts to all word forms related to the selected crime names. Thus, for example, rather than using roubo (robbery/theft) as a query term, we used roub*, which retrieves texts containing roubo as well as roubos (plural form), roubar (to rob/steal), roubou (robbed/stole), roubado (robbed/stolen), etc. While useful for identifying texts related to urban violence in Brazil, using crime-related words as query terms nevertheless introduced some undue noise. 
A number of texts in which these terms appeared referred to violence and crimes in other parts of the world rather than in Brazil: murders in Iraq, kidnapping in Nigeria, homicides in war zones and so on. In addition, a large number of texts referred to issues other than urban violence, such as corruption, internet crimes and labour issues, in Brazil and elsewhere, as well as articles related to cinema (especially thrillers) and crime fiction. To make matters more complicated, one cannot ignore the metaphorical nature of language. There was also a large number of texts in which our query terms were used metaphorically and were not at all related to urban violence: roubar a cena (steal the scene), roubar meu lugar (take over my place), furtar-se a fazer alguma coisa (avoid doing something), etc. To minimize such noise, we discarded a wide range of topics in the actual retrieval of texts from the Factiva news aggregator. The topics discarded are shown under the label “subjects” in Figure 1. They were identified on the basis of an analysis of randomly selected texts within these categories. We also discarded texts containing one or more of the following words/phrases: comissão da verdade (truth commission, a committee established in 2012 to investigate violations of human rights by the Brazilian government between 18 Sep 1946 and 05 Oct 1988), Bolsonaro (a Brazilian congressman, infamous for his controversial comments on rape and human rights), Petrobrás or Petrobras (Brazilian oil company at the centre of a corruption scandal), ditadura (dictatorship), ditador (dictator), Al-Quaeda. These words are shown under “None of these words” in Figure 1. Also, within the Factiva search options, we chose to discard identical duplicates as well as republished news, recurring pricing and market data, obituaries, sports and calendars. All texts meeting the criteria above were retrieved in full, including their headline(s). 
This means that there was no filtering according to the section of the newspaper in which the text was published. In other words, the corpus contains news reports as well as editorials, opinions, interviews and any other text type. It is also important to stress that texts were selected irrespective of the number of query words/phrases they contained and the frequency of those words/phrases within each text. This means that the texts included in the Brazilian Corpus on Urban Violence vary in the extent to which urban violence is discussed. Here, any reference to urban violence is considered relevant, even if urban violence is not the main topic discussed in the text. This enables us to look both at texts discussing urban violence issues in detail and at those in which urban violence issues are mentioned in relation to another topic. Such an approach broadens the scope of the analysis and enables us to examine situational contexts which are directly or indirectly associated with urban violence. |
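The selection logic described above (wildcard query terms combined with a list of excluded words) can be illustrated with a minimal Python sketch. It is not the Factiva query itself: the function names, the abbreviated query list and the case-insensitive whole-word matching are assumptions made for illustration, and the subject-category filters shown in Figure 1 are not modelled.

```python
import re

# Hypothetical, abbreviated query list. Only roub* is given in the text as a
# wildcard example; the full list of crime names is not reproduced here.
QUERY_TERMS = ["roub*", "violência urbana", "segurança pública"]

# Words/phrases listed in the text under "None of these words".
EXCLUDED_TERMS = [
    "comissão da verdade", "Bolsonaro", "Petrobrás", "Petrobras",
    "ditadura", "ditador", "Al-Quaeda",
]


def wildcard_to_regex(term):
    """Compile a query such as 'roub*' into a whole-word, case-insensitive regex.

    Assumption: '*' stands for zero or more word characters at that position.
    """
    escaped = re.escape(term).replace(r"\*", r"\w*")
    return re.compile(r"\b" + escaped + r"\b", re.IGNORECASE)


QUERY_PATTERNS = [wildcard_to_regex(t) for t in QUERY_TERMS]
EXCLUDED_PATTERNS = [wildcard_to_regex(t) for t in EXCLUDED_TERMS]


def keep_text(text):
    """Keep a text if it matches any query term and none of the excluded terms."""
    has_query = any(p.search(text) for p in QUERY_PATTERNS)
    has_excluded = any(p.search(text) for p in EXCLUDED_PATTERNS)
    return has_query and not has_excluded


if __name__ == "__main__":
    samples = [
        "A polícia investiga roubos no centro da cidade.",
        "Diretores são acusados de roubar a Petrobras.",
        "A atriz promete roubar a cena no novo filme.",
    ]
    for sample in samples:
        print(keep_text(sample), "-", sample)
```

The third sample is kept even though roubar is used metaphorically, which is the kind of noise that word-level exclusion cannot remove on its own.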