Qualitative methods of behavioural assessment use observer rating scales to score the overall demeanour or body language of animals. Establishing the reliability of such holistic approaches requires testing and validation of the methods used. Here, we compare two methodologies used in Qualitative Behavioural Assessment (QBA): Fixed-Lists (FL) and Free-Choice Profiling (FCP). A laboratory class of 27 students was separated into two groups of 17 and 10 students (FL and FCP respectively). The FL group were given a list of 20 descriptive terms (used by the European Union's Welfare Quality® program), shown videos of group-housed sows, and as a group discussed how they would apply the descriptive terms in an assessment. The FCP group were shown the same footage but individually generated their own descriptive terms to describe the body language of the animals. Both groups were then shown 18 video clips of group-housed sows and scored each clip using a visual analogue scale (VAS) system. We analysed the VAS scores using Generalised Procrustes Analysis (GPA) for each observer group separately, which indicated high inter-observer reliability for both groups (FL: 71.1 per cent of scoring variation explained, and FCP: 63.5 per cent). There were significant correlations between FL and FCP scores (GPA dimension 1: r(16) = 0.946, P < 0.001; GPA dimension 2: r(16) = 0.477, P = 0.045). Additional analysis of the raw VAS scores for the FL group by Principal Component Analysis (PCA) produced four factors; PC1 scores were correlated with GPA1 (r(16) = 0.984, P < 0.001) and PC3 scores were correlated with GPA2 (r(16) = 0.880, P < 0.001). Kendall's coefficient of concordance (W, a measure of observer agreement) of the VAS scores indicated statistically significant agreement in the use of the 20 descriptive terms (W range 0.37-0.64; all significant at P < 0.001, although a value of W > 0.7 is usually accepted to show strong agreement).
This study demonstrates that, regardless of whether they are given terms or are allowed to generate their own, observers score sow body language in a similar way. Strengths and weaknesses of the two methods were identified, which highlight the importance of providing thorough and consistent training of observers, including good-quality training footage so that the full repertoire of demeanours can be identified. The study is from Murdoch Univ, Anim Prod Hlth & Welf, Sch Vet & Life Sci, Murdoch, WA, Australia.
Clarke T, Pluske JR, Fleming PA. Applied Animal Behaviour Science 2016; 177: 77-83; doi: 10.1016/j.applanim.2016.01.022
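
Kendall's coefficient of concordance, reported above for the FL group, can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes each observer's scores for one descriptive term across the clips are first converted to ranks, and it uses the standard uncorrected formula W = 12S / (m²(n³ - n)) for m observers and n clips, with no correction for ties. The function names and the example scores are hypothetical.

```python
def average_ranks(scores):
    """Rank scores from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for m observers (rows) scoring n clips (columns).

    Uses the uncorrected formula (no tie correction), which is an
    assumption of this sketch rather than a detail from the study.
    """
    m = len(ratings)
    n = len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical VAS scores (0-100) from three observers for one term, four clips.
example = [
    [12.0, 55.0, 80.0, 31.0],
    [20.0, 48.0, 90.0, 25.0],
    [15.0, 60.0, 70.0, 40.0],
]
print(round(kendalls_w(example), 3))
```

W ranges from 0 (no agreement in how observers rank the clips) to 1 (identical rankings), which is why the abstract compares the observed range of 0.37-0.64 against the conventional threshold of 0.7.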