Research - (2022) Volume 10, Issue 7
Received: 01-Jul-2022, Manuscript No. jnd-22-69502;
Editor assigned: 04-Jul-2022, Pre QC No. P-69502 (PQ);
Reviewed: 18-Jul-2022, QC No. Q-69502;
Revised: 25-Jul-2022, Manuscript No. R-69502;
Published: 01-Aug-2022, DOI: 10.4172/2329-6895.10.7.504
Citation: Liang, Ning, Sizhan W, Simon R and Navnit M, et al. “Critical Appraisal of the National Institute for Health and Care Excellence (NICE) Guidelines for Spine Disorders using the Appraisal of Guidelines for Research and Evaluation II Instrument (AGREE II)” J Neurol Disord 10(2022):504.
Copyright: © 2022 Liang Ning et al. This is an open-access article distributed under the terms of the creative commons attribution license which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
Introduction: Disorders of the spine (defined here as disorders of the musculoskeletal structures surrounding the spinal neural elements) require an evidence-based approach to their care. This evaluation used the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument to evaluate the methodological quality of evidence-based guidelines on spine disorders published by the National Institute for Health and Care Excellence (NICE).
Materials and methods: We systematically searched clinical guidelines on spine disorders published by NICE until December 2019. Four appraisers across three international centers independently evaluated the quality of eligible clinical guidelines using AGREE II. Mean AGREE II scores for each domain were calculated. In higher-quality domains, scores for individual items were analysed. The guidelines were grouped according to type and year of publication. Comparative statistics and intraclass correlation (ICC) calculations were performed.
Results: A total of 37 guidelines published by NICE on spine disorders were identified. Mean scores for all six domains were as follows: Scope and Purpose (73.2%), Stakeholder Involvement (63.9%), Rigour of Development (68.1%), Clarity of Presentation (73.6%), Applicability (53.2%) and Editorial Independence (64.5%). The mean score for overall quality of all NICE spinal related guidelines was 68.8% (95% CI: 62.3-75.3). Interventional Procedure Guidelines were evaluated as possessing significantly lower overall quality than other types (p=0.007). Overall quality was significantly associated with year of publication (rs=0.476, p=0.0029). Evaluator ICC for each guideline ranged from 0.39 to 0.95.
Conclusion: NICE guidelines on spine disorders demonstrated acceptable or good quality across most domains. Despite deficiencies in the applicability domain, their quality has improved over time. We recommend use of NICE guidelines for assessment and treatment of spine disorders.
Keywords: AGREE II; Clinical Practice Guideline; NICE; Spine
The World Health Organization reported in 2021 that musculoskeletal diseases are the main cause of global disability with approximately 1.71 billion people affected worldwide [1]. In 2019 the prevalence of low back pain was estimated as 568 million and neck pain 223 million [2]. An aging population will result in increasing numbers of people with these and other common spine disorders [3].
Clinical practice guidelines can provide health care providers with decision-making recommendations from an evidence base [4]. Around the world, many institutions, organisations or groups have formulated and issued clinical practice guidelines. The National Institute for Health and Care Excellence (NICE) was established as a special health authority in 1999 and became a non-departmental public body in 2013, performing statutory functions in the United Kingdom with the aim of ‘improving health and wellbeing by putting science and evidence at the heart of health and care decision making’ [5]. To date, more than 1750 different guidelines have been formulated under several headings, with spine disorders contributing significantly to the burden of disease. NICE guidance in this clinical area encompasses several guideline types: Clinical Guidelines (CG), which were succeeded by NICE guidelines (NG) in 2015 and which review the evidence across broad health care topics; Interventional Procedure Guidance (IPG), which reviews the efficacy and safety of procedures; Technology Appraisal Guidance (TA), which reviews the clinical and cost effectiveness of new treatments; and Medical Technology Guidance (MTG), which reviews new medical technologies for adoption in the UK National Health Service for multiple clinical conditions [5]. The effectiveness of a clinical practice guideline is dependent on its inherent quality. For guideline development, the World Health Organization (WHO) recommended that “Prior to submission for clearance, the AGREE II appraisal instrument should be used to check whether the guideline meets international quality standards and reporting criteria” [6]. AGREE II was developed in 2010 by an international development and research team and uses quantitative scores to evaluate the quality of a guideline [7].
Since then, NICE has undertaken internal audit to ensure that the processes and methods for guideline development are based on internationally accepted criteria of quality, as detailed in the AGREE II instrument [8]. However, internal quality assurance processes may not reflect the evaluation of external auditors. Furthermore, many guidelines were developed prior to the adoption of these standards.
The AGREE II tool has been used independently to evaluate several NICE guidelines, involving urological and endocrine system disorders [9-12], but there are currently no studies using the AGREE II tool to evaluate NICE guidelines for musculoskeletal or neurosurgical conditions. The purpose of this study is to use the AGREE II tool to assess quality of NICE guidelines for spine disorders.
Search keyword methodology
Our goal was to identify guidelines related to spine disorders, defined as disorders of the musculoskeletal structures surrounding the spinal neural elements. Two researchers independently performed keyword searches for spine disorders on the official website of the International Classification of Diseases 11th Revision (ICD-11) (https://icd.who.int/browse11/l-m/en). Search terms identified via dictionary linkage are shown in Electronic Supplementary Material S1. The search keyword retrieval process followed the PRISMA flow algorithm [13]. Inclusion and exclusion criteria for categories of disorders searched on the official website of ICD-11 are shown in Electronic Supplementary Material S2 and S3 respectively. In our study, a total of 28 search keywords for spine disorders were formed from ICD-11 and are shown in Electronic Supplementary Material S4.
Guideline identification
The search for spine guidelines was carried out by two researchers (first author and second author) independently using the NICE website (https://www.nice.org.uk/) up to 31 December 2019. The guideline search used keywords obtained in the aforementioned process. Both keyword and manual searches were performed.
Inclusion criteria were:
1. Literature related to spine disorders
2. Literature meets the guidelines standard of the National Guidelines Clearinghouse [14].
Exclusion criteria were:
1. Disorders secondary or metastatic without causing spinal cord compression
2. Systemic diseases including sites other than the spine
3. Physiological or pathological abnormalities of spinal cord, spinal neural structures, and vertebral vascular conditions caused by disorders not primarily of the spine.
The guidelines retrieval process followed the PRISMA flow algorithm [13].
AGREE II instrument
The AGREE II assessment system is an internationally validated tool for assessing guideline quality, including 23 main items in 6 domains and 2 overall assessment items (Table 1). Each domain addresses an aspect of guideline quality, namely: "scope and purpose", "stakeholder involvement", "rigour of development", "clarity of presentation", "applicability", and "editorial independence". The 23 field items and one of the overall assessment items are graded on a seven point Likert scale. Item scores range from 1 (no information or very poor quality) to 7 (all conditions met and of excellent quality) [7].
Ethics statement
This study is an evaluation of existing literature without human subjects; hence it is not subject to ethics committee evaluation.
Evaluation of the guidelines
Each guideline was assessed by a panel of four appraisers. All appraisers were familiar with the AGREE II instrument having used it before to evaluate clinical guidelines and completed the online training tools recommended by AGREE II (www.agreetrust.org) [7]. No communication between appraisers occurred during the rating process. Data analysis was performed after completing the evaluation of all NICE spine related guidelines.
Statistical analysis
The score for each domain was obtained by the sum of all scores of the individual items in a domain and then standardized as follows: (obtained score − minimum possible score)/(maximum possible score − minimum possible score) [7]. Mean values and 95% confidence intervals (CI) for all raters were calculated. Although domain scores can be used to compare different guidelines and to help determine whether guidelines should be recommended, the AGREE II tool does not set a minimum domain score, nor does it define the boundary criteria for identifying the quality of the guidelines. These decisions are made by the user. According to convention in existing research reports, the domain score criteria we used were: <40% very low quality, 40%~59% low quality, 60%~79% acceptable quality, ≥ 80% good quality [15,16].
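The standardization and the quality bands described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the ratings matrix are hypothetical, and the bands are treated as half-open intervals (for example, 59.5% counts as low quality), which is an assumption about how boundary values are handled.

```python
def scaled_domain_score(ratings):
    """Scale AGREE II ratings for one domain to a percentage.

    ratings: one list per appraiser, each holding that appraiser's
    1-7 Likert scores for every item in the domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(row) for row in ratings)
    minimum = 1 * n_items * n_appraisers  # every item rated 1
    maximum = 7 * n_items * n_appraisers  # every item rated 7
    return 100 * (obtained - minimum) / (maximum - minimum)


def quality_band(percent):
    """Map a scaled score to the bands used in this study (assumed half-open)."""
    if percent < 40:
        return "very low"
    if percent < 60:
        return "low"
    if percent < 80:
        return "acceptable"
    return "good"


# Hypothetical ratings: four appraisers, three items in the domain.
hypothetical_ratings = [
    [5, 6, 6],
    [6, 6, 7],
    [2, 4, 3],
    [3, 3, 2],
]
score = scaled_domain_score(hypothetical_ratings)
print(f"{score:.1f}% -> {quality_band(score)}")  # 56.9% -> low
```

Here the obtained total is 53, the minimum possible is 12 and the maximum possible is 84, so the scaled score is (53 − 12)/(84 − 12) = 56.9%.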
Scores obtained for individual items within domains were calculated using the same method as for domain scores. Since the purpose of the AGREE scale is to emphasise and encourage best practice, we took the view that domains which failed to reach the ‘acceptable’ quality threshold should not be the focus of detailed critique. Consequently, although all domain scores are presented, individual item scores are only displayed and discussed for domains which reach at least ‘acceptable’ quality.
For overall guideline assessment, appraiser scores for item 1 of the ‘overall assessment’ section were used to derive a score for each guideline by the same method as for domain scores.
Statistical analysis of the data was performed using Statistical Package for the Social Sciences (SPSS Inc, Chicago, Illinois, USA) version 22.0 and Stata (Version 16.1, StataCorp LP, College Station, TX) software programs. Correlation between overall score and guideline publication date was calculated using Spearman’s test. Inter-rater reliability of domain scores was assessed using the intraclass correlation coefficient (ICC). ICC values less than 0.40, between 0.40 and 0.59, between 0.60 and 0.74, and greater than 0.75 were indicative of poor, moderate, good, and excellent reliability, respectively [17]. The Mann-Whitney U test was used to investigate the quality differences between guideline types. The level of statistical significance was set at p<0.05.
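To illustrate the Spearman correlation between publication year and overall score, the sketch below ranks both variables and computes the Pearson correlation of the ranks in pure Python. The function names are hypothetical. The six year/score pairs are taken from Tables 1 and 2 (IPG31, IPG146, IPG278, TA279, NG59, TA497) only to keep the example short, so the result is not the coefficient reported in this study, which was computed on all 37 guidelines in SPSS and Stata.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Illustrative subset of six guidelines (year of publication, overall score %).
years = [2003, 2005, 2009, 2013, 2016, 2018]
overall = [41.7, 58.3, 54.2, 83.3, 79.2, 62.5]
rs_subset = spearman_rs(years, overall)
print(f"rs for the six-guideline subset: {rs_subset:.3f}")
```

Ranking first makes the statistic insensitive to the spacing between years or scores, which suits ordinal Likert-derived data.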
A total of 37 guidelines fulfilled the inclusion criteria, including 29 IPG, 1 MTG, 4 NG or CG and 3 TA (Table 1). The scores of each domain and overall scores in the 37 guidelines after evaluation by AGREE II criteria are shown in Table 2. All four CG/NG guidelines were categorized as having ‘acceptable’ quality in all domains. Seven (7/29) IPG guidelines and one (1/3) TA guideline were categorized as having ‘acceptable’ quality in all domains. The overall scores in 26 NICE spine related guidelines were above the “acceptable” level. The mean overall score of all NICE spine related guidelines was 68.8% (95% CI: 62.3%~75.3%).
Title of Guidelines | Published Year | Reference Number | |
---|---|---|---|
Automated percutaneous mechanical lumbar discectomy | 2005 | IPG141 | |
Balloon kyphoplasty for vertebral compression fractures | 2006 | IPG166 | |
Direct C1 lateral mass screw for cervical spine stabilisation | 2005 | IPG146 | |
Endoscopic laser foraminoplasty | 2003 | IPG31 | |
Epiduroscopic lumbar discectomy through the sacral hiatus for sciatica | 2016 | IPG570 | |
Functional electrical stimulation for drop foot of central neurological origin | 2009 | IPG278 | |
Golimumab for treating non-radiographic axial spondyloarthritis | 2018 | TA497 | |
iFuse for treating chronic sacroiliac joint pain | 2018 | MTG39 | |
Insertion of an annular disc implant at lumbar discectomy | 2014 | IPG506 | |
Interspinous distraction procedures for lumbar spinal stenosis causing neurogenic claudication | 2010 | IPG365 | |
Intramuscular diaphragm stimulation for ventilator-dependent chronic respiratory failure caused by high spinal cord injuries | 2017 | IPG594 | |
Lateral interbody fusion in the lumbar spine for low back pain | 2017 | IPG574 | |
Low back pain and sciatica in over 16s: assessment and management | 2016 | NG59 | |
Metastatic spinal cord compression in adults: risk assessment, diagnosis and management | 2008 | CG75 | |
Minimally invasive sacroiliac joint fusion surgery for chronic sacroiliac pain | 2017 | IPG578 | |
Nerve transfer to partially restore upper limb function in tetraplegia | 2018 | IPG610 | |
Non-rigid stabilisation techniques for the treatment of low back pain | 2010 | IPG366 | |
Percutaneous coblation of the intervertebral disc for low back pain and sciatica | 2016 | IPG543 | |
Percutaneous electrothermal treatment of the intervertebral disc annulus for low back pain and sciatica | 2016 | IPG544 | |
Percutaneous endoscopic laser cervical discectomy | 2009 | IPG303 | |
Percutaneous endoscopic laser thoracic discectomy | 2004 | IPG61 | |
Percutaneous insertion of craniocaudal expandable implants for vertebral compression fracture | 2016 | IPG568 | |
Percutaneous interlaminar endoscopic lumbar discectomy for sciatica | 2016 | IPG555 | |
Percutaneous intradiscal laser ablation in the lumbar spine | 2010 | IPG357 | |
Percutaneous intradiscal radiofrequency treatment of the intervertebral disc nucleus for low back pain | 2016 | IPG545 | |
Percutaneous transforaminal endoscopic lumbar discectomy for sciatica | 2016 | IPG556 | |
Percutaneous vertebroplasty | 2003 | IPG12 | |
Percutaneous vertebroplasty and percutaneous balloon kyphoplasty for treating osteoporotic vertebral compression fractures | 2013 | TA279 | |
Peripheral nerve-field stimulation for chronic low back pain | 2013 | IPG451 | |
Prosthetic intervertebral disc replacement in the cervical spine | 2010 | IPG341 | |
Prosthetic intervertebral disc replacement in the lumbar spine | 2009 | IPG306 | |
Spinal injury: assessment and initial management | 2016 | NG41 | |
Spondyloarthritis in over 16s: diagnosis and management | 2017 | NG65 | |
Therapeutic endoscopic division of epidural adhesions | 2010 | IPG333 | |
Therapeutic percutaneous image-guided aspiration of spinal cysts | 2007 | IPG223 | |
TNF-alpha inhibitors for ankylosing spondylitis and non-radiographic axial spondyloarthritis | 2016 | TA383 | |
Transaxial interbody lumbosacral fusion for severe chronic low back pain | 2018 | IPG620 |
Guideline | Domain 1 Scope and purpose (%) | Domain 2 Stakeholder involvement (%) | Domain 3 Rigour of development (%) | Domain 4 Clarity of presentation (%) | Domain 5 Applicability (%) | Domain 6 Editorial independence (%) | Overall score (%) |
---|---|---|---|---|---|---|---|
IPG 31 | 47.2 | 55.6 | 57.8 | 50.0 | 36.5 | 52.1 | 41.7 |
IPG 141 | 54.2 | 54.2 | 60.9 | 65.3 | 30.2 | 47.9 | 62.5 |
IPG 146 | 55.6 | 54.2 | 60.4 | 55.6 | 41.7 | 52.1 | 58.3 |
IPG 166 | 65.3 | 54.2 | 60.4 | 62.5 | 44.8 | 50.0 | 58.3 |
IPG 278 | 66.7 | 51.4 | 58.9 | 55.6 | 40.6 | 47.9 | 54.2 |
IPG 365 | 61.1 | 48.6 | 60.4 | 51.4 | 52.1 | 54.2 | 54.2 |
IPG 506 | 62.5 | 47.2 | 61.5 | 55.6 | 28.1 | 52.1 | 54.2 |
IPG 570 | 56.9 | 52.8 | 60.9 | 56.9 | 40.6 | 56.3 | 54.2 |
IPG 574 | 59.7 | 45.8 | 63.0 | 51.4 | 44.8 | 56.3 | 54.2 |
IPG 594 | 65.3 | 45.8 | 62.0 | 54.2 | 34.4 | 56.3 | 58.3 |
MTG 39 | 70.8 | 59.7 | 45.8 | 69.4 | 47.9 | 62.5 | 66.7 |
NG 59 | 83.3 | 76.4 | 64.6 | 84.7 | 67.7 | 58.3 | 79.2 |
TA 497 | 77.8 | 59.7 | 45.3 | 75.0 | 63.5 | 52.1 | 62.5 |
CG 75 | 93.1 | 88.9 | 86.4 | 94.4 | 82.3 | 62.5 | 87.5 |
IPG 578 | 79.2 | 76.4 | 77.1 | 86.1 | 60.4 | 77.1 | 79.2 |
IPG 610 | 77.8 | 79.2 | 76.0 | 81.9 | 63.5 | 77.1 | 79.2 |
IPG 366 | 79.2 | 75.0 | 78.1 | 83.3 | 50.0 | 72.9 | 70.8 |
IPG 543 | 81.9 | 75.0 | 77.1 | 84.7 | 61.5 | 70.8 | 79.2 |
IPG 544 | 72.2 | 70.8 | 76.0 | 80.6 | 57.3 | 70.8 | 75.0 |
IPG 303 | 75.0 | 73.6 | 78.6 | 84.7 | 64.6 | 72.9 | 79.2 |
IPG 61 | 75.0 | 66.7 | 73.4 | 83.3 | 58.3 | 77.1 | 75.0 |
IPG 568 | 79.2 | 73.6 | 78.1 | 77.8 | 59.4 | 85.4 | 79.2 |
IPG 555 | 80.6 | 76.4 | 77.6 | 84.7 | 61.5 | 85.4 | 79.2 |
IPG 357 | 83.3 | 73.6 | 80.2 | 77.8 | 59.4 | 66.7 | 75.0 |
IPG 545 | 81.9 | 72.2 | 81.3 | 81.9 | 60.4 | 79.2 | 79.2 |
IPG 556 | 76.4 | 79.2 | 81.3 | 80.6 | 60.4 | 85.4 | 79.2 |
IPG 12 | 61.1 | 50.0 | 54.7 | 77.8 | 42.7 | 47.9 | 58.3 |
TA 279 | 86.1 | 65.3 | 62.0 | 76.4 | 79.2 | 81.3 | 83.3 |
IPG 451 | 81.9 | 68.1 | 74.5 | 81.9 | 59.4 | 79.2 | 70.8 |
IPG 341 | 65.3 | 54.2 | 67.2 | 70.8 | 41.7 | 50.0 | 66.7 |
IPG 306 | 69.4 | 65.3 | 66.7 | 70.8 | 43.8 | 58.3 | 66.7 |
NG 41 | 87.5 | 61.1 | 72.4 | 86.1 | 60.4 | 68.8 | 75.0 |
NG 65 | 94.4 | 79.2 | 87.5 | 90.3 | 75.0 | 72.9 | 83.3 |
IPG 333 | 77.8 | 52.8 | 66.1 | 75.0 | 39.6 | 58.3 | 62.5 |
IPG 223 | 69.4 | 50.0 | 61.5 | 70.8 | 36.5 | 54.2 | 54.2 |
TA 383 | 86.1 | 81.9 | 59.9 | 83.3 | 77.1 | 75.0 | 87.5 |
IPG 620 | 69.4 | 48.6 | 64.1 | 69.4 | 41.7 | 60.4 | 62.5 |
Mean | 73.2 | 63.9 | 68.1 | 73.6 | 53.2 | 64.5 | 68.8 |
(95%CI) | (67.0 ~ 79.5) | (57.0 ~ 70.7) | (62.3 ~ 73.9) | (66.6 ~ 80.5) | (45.4 ~ 61.0) | (57.7 ~ 71.3) | (62.3 ~ 75.3) |
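The summary figures for overall quality can be re-derived from the "Overall score" column of Table 2. The short sketch below is offered as a reader's cross-check, not as the authors' analysis code: the variable names are hypothetical and the 60% cut-off is the ‘acceptable’ threshold stated in the statistical analysis section.

```python
# Overall score (%) per guideline, transcribed from Table 2.
overall_scores = {
    "IPG31": 41.7, "IPG141": 62.5, "IPG146": 58.3, "IPG166": 58.3,
    "IPG278": 54.2, "IPG365": 54.2, "IPG506": 54.2, "IPG570": 54.2,
    "IPG574": 54.2, "IPG594": 58.3, "MTG39": 66.7, "NG59": 79.2,
    "TA497": 62.5, "CG75": 87.5, "IPG578": 79.2, "IPG610": 79.2,
    "IPG366": 70.8, "IPG543": 79.2, "IPG544": 75.0, "IPG303": 79.2,
    "IPG61": 75.0, "IPG568": 79.2, "IPG555": 79.2, "IPG357": 75.0,
    "IPG545": 79.2, "IPG556": 79.2, "IPG12": 58.3, "TA279": 83.3,
    "IPG451": 70.8, "IPG341": 66.7, "IPG306": 66.7, "NG41": 75.0,
    "NG65": 83.3, "IPG333": 62.5, "IPG223": 54.2, "TA383": 87.5,
    "IPG620": 62.5,
}

ACCEPTABLE_THRESHOLD = 60.0  # lower bound of the 'acceptable' band

n_guidelines = len(overall_scores)
n_acceptable = sum(1 for s in overall_scores.values() if s >= ACCEPTABLE_THRESHOLD)
mean_overall = sum(overall_scores.values()) / n_guidelines

print(f"{n_acceptable} of {n_guidelines} guidelines at or above {ACCEPTABLE_THRESHOLD}%")
print(f"mean overall score: {mean_overall:.1f}%")
```

Run against the tabulated values, this gives 26 of 37 guidelines at or above the threshold and a mean of 68.8%, matching the figures reported in the text.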
Mean domain and item scores are shown in Table 3. Mean domain score is highest for ‘Clarity of Presentation’ (73.6%), followed by ‘Scope and Purpose’ (73.2%), ‘Rigour of Development’ (68.1%), ‘Editorial Independence’ (64.5%) and ‘Stakeholder Involvement’ (63.9%). Mean domain score for ‘Applicability’ (53.2%) is the lowest.
Five domains exceeded the threshold for ‘acceptable’ quality (mean score of 60%): ‘Scope and Purpose’, ‘Stakeholder involvement’, ‘Rigour of development’, ‘Clarity of presentation’, and ‘Editorial Independence’. The quality evaluation of two items in these five domains fell below this 60% threshold for acceptability; item 5 ‘the views and preferences of the target population (patients, public, etc.) have been sought’ and item 13 ‘the guideline has been externally reviewed by experts prior to its publication’ (Table 3) (Figure 1).
Domain and Domain item | Mean(95%CI) |
---|---|
Scope and Purpose | 73.2(67.0 ~ 79.5) |
1. The overall objective(s) of the guideline is (are) specifically described. | 76.4(73.1 ~ 79.6) |
2. The health question(s) covered by the guideline is (are) specifically described. | 73.9(70.1 ~ 77.6) |
3. The population (patients, public, etc.) to whom the guideline is meant to apply is specifically described. | 69.5(65.2 ~ 73.8) |
Stakeholder involvement | 63.9(57.0 ~ 70.7) |
4. The guideline development group includes individuals from all the relevant professional groups. | 70.4(66.9 ~ 73.9) |
5. The views and preferences of the target population (patients, public, etc.) have been sought. | 59.0(53.8 ~ 64.2) |
6. The target users of the guideline are clearly defined. | 62.2(57.7 ~ 66.6) |
Rigour of development | 68.1(62.3 ~ 73.9) |
7. Systematic methods were used to search for evidence. | 69.9(65.3 ~ 74.6) |
8. The criteria for selecting the evidence are clearly described. | 74.9(71.1 ~ 78.7) |
9. The strengths and limitations of the body of evidence are clearly described. | 74.5(70.6 ~ 78.5) |
10. The methods for formulating the recommendations are clearly described. | 62.8(59.8 ~ 65.8) |
11. The health benefits, side effects, and risks have been considered in formulating the recommendations. | 77.8(75.0 ~ 80.6) |
12. There is an explicit link between the recommendations and the supporting evidence. | 67.2(63.6 ~ 70.8) |
13. The guideline has been externally reviewed by experts prior to its publication. | 56.6(52.9 ~ 60.4) |
14. A procedure for updating the guideline is provided. | 60.7(54.8 ~ 66.6) |
Clarity of presentation | 73.6(66.6 ~ 80.5) |
15. The recommendations are specific and unambiguous. | 76.5(72.4 ~ 80.6) |
16. The different options for management of the condition or health issue are clearly presented. | 61.3(56.5 ~ 66.0) |
17. Key recommendations are easily identifiable. | 83.0(79.1 ~ 86.9) |
Applicability | 53.2(45.4 ~ 61.0) |
18. The guideline describes facilitators and barriers to its application. | |
19. The guideline provides advice and/or tools on how the recommendations can be put into practice. | |
20. The potential resource implications of applying the recommendations have been considered. | |
21. The guideline presents monitoring and/or auditing criteria. | |
Editorial independence | 64.5(57.7 ~ 71.3) |
22. The view of the funding body have not influenced the content of the guideline. | 62.5(58.9 ~ 66.1) |
23. Competing interests of guideline development group members have been recorded and addressed. | 66.4(61.9 ~ 71.0) |
Mean domain scores for IPG and non-IPG documents were 64.1 (95% CI: 58.0~70.1) and 73.4 (95% CI: 67.3~79.4) respectively (z=-2.085, p=0.037) (Table 4). Significant differences in domain scores for domains 1, 2, 4 and 5 were also found (Table 4). Non-IPGs also had higher overall scores (78.1) than IPGs (66.2) (z=-2.687, p=0.007) (Table 4) (Figure 2).
Measure | IPG | Non-IPG | p-value |
---|---|---|---|
Domain 1 | 70.0 (66.5 ~ 73.6) | 84.9 (79.9 ~ 89.9) | 0.001 |
Domain 2 | 61.7 (57.5 ~ 66.0) | 71.5 (64.1 ~ 79.0) | 0.037 |
Domain 3 | 68.8 (65.8 ~ 71.9) | 65.5 (55.1 ~ 75.9) | 0.567 |
Domain 4 | 71.1 (66.7 ~ 75.5) | 82.5 (77.0 ~ 87.9) | 0.024 |
Domain 5 | 48.8 (44.9 ~ 52.8) | 69.1 (61.7 ~ 76.6) | <0.001 |
Domain 6 | 63.9 (59.3 ~ 68.6) | 66.7 (60.5 ~ 72.9) | 0.448 |
Mean Domain Score | 64.1 (58.0 ~ 70.1) | 73.4 (67.3 ~ 79.4) | 0.037 |
Overall Score | 66.2 (62.3 ~ 70.2) | 78.1 (72.0 ~ 84.2) | 0.007 |
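The group means in the "Overall Score" row can be reproduced by splitting the Table 2 overall scores into IPG and non-IPG guidelines. The sketch below only recomputes the two means. It does not attempt the Mann-Whitney U test or the confidence intervals, and the variable names are hypothetical.

```python
# Overall scores (%) from Table 2, split by guideline type.
ipg_overall = [
    41.7, 62.5, 58.3, 58.3, 54.2, 54.2, 54.2, 54.2, 54.2, 58.3,  # IPG31 to IPG594
    79.2, 79.2, 70.8, 79.2, 75.0, 79.2, 75.0, 79.2, 79.2, 75.0,  # IPG578 to IPG357
    79.2, 79.2,                                                  # IPG545, IPG556
    58.3, 70.8, 66.7, 66.7, 62.5, 54.2, 62.5,                    # IPG12 to IPG620
]
non_ipg_overall = [
    66.7,  # MTG39
    79.2,  # NG59
    62.5,  # TA497
    87.5,  # CG75
    83.3,  # TA279
    75.0,  # NG41
    83.3,  # NG65
    87.5,  # TA383
]

ipg_mean = sum(ipg_overall) / len(ipg_overall)
non_ipg_mean = sum(non_ipg_overall) / len(non_ipg_overall)

print(f"IPG (n={len(ipg_overall)}): {ipg_mean:.1f}%")
print(f"Non-IPG (n={len(non_ipg_overall)}): {non_ipg_mean:.1f}%")
```

The group sizes (29 and 8) agree with the guideline counts given at the start of the results.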
Intraclass correlation (ICC) values for each NICE guideline ranged from 0.393 to 0.953 (Table 5).
Guideline | IPG31 | IPG141 | IPG146 | IPG166 | IPG278 | IPG365 | IPG506 | IPG570 | IPG574 | IPG594 | MTG39 | NG59 | TA497 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ICC | 0.695 | 0.657 | 0.73 | 0.733 | 0.688 | 0.672 | 0.687 | 0.671 | 0.679 | 0.694 | 0.393 | 0.623 | 0.498 |
Lower 95% CI | 0.291 | 0.260 | 0.306 | 0.313 | 0.260 | 0.244 | 0.270 | 0.245 | 0.237 | 0.272 | 0.054 | 0.220 | 0.128 |
Upper 95% CI | 0.941 | 0.931 | 0.950 | 0.951 | 0.940 | 0.936 | 0.939 | 0.935 | 0.938 | 0.941 | 0.836 | 0.922 | 0.881 |

Guideline | CG75 | IPG578 | IPG610 | IPG366 | IPG543 | IPG544 | IPG303 | IPG61 | IPG568 | IPG555 | IPG357 | IPG545 | IPG556 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ICC | 0.95 | 0.946 | 0.947 | 0.943 | 0.932 | 0.941 | 0.916 | 0.931 | 0.944 | 0.935 | 0.929 | 0.942 | 0.932 |
Lower 95% CI | 0.842 | 0.830 | 0.829 | 0.825 | 0.790 | 0.815 | 0.728 | 0.735 | 0.827 | 0.782 | 0.784 | 0.820 | 0.783 |
Upper 95% CI | 0.992 | 0.991 | 0.991 | 0.991 | 0.989 | 0.990 | 0.986 | 0.989 | 0.991 | 0.990 | 0.935 | 0.991 | 0.989 |

Guideline | IPG12 | TA279 | IPG451 | IPG341 | IPG306 | NG41 | NG65 | IPG333 | IPG223 | TA383 | IPG620 |
---|---|---|---|---|---|---|---|---|---|---|---|
ICC | 0.795 | 0.761 | 0.953 | 0.731 | 0.712 | 0.639 | 0.892 | 0.761 | 0.706 | 0.709 | 0.71 |
Lower 95% CI | 0.364 | 0.430 | 0.852 | 0.329 | 0.287 | 0.240 | 0.566 | 0.370 | 0.264 | 0.357 | 0.262 |
Upper 95% CI | 0.965 | 0.956 | 0.992 | 0.950 | 0.946 | 0.926 | 0.983 | 0.957 | 0.945 | 0.943 | 0.946 |
The purpose of NICE at its inception was ‘to create consistent guidelines and end rationing of treatment by postcode across the UK’ [18]. Ours is the first study to use the AGREE II instrument to assess the quality of NICE guidelines for spine related disorders. Up to the end of 2019, we identified 37 guidelines which fulfilled our inclusion criteria as related to spine disorders. This represented approximately 3% of the total guideline cohort in the NICE library at that time point.
Reliability
Intraclass correlation (ICC) of overall guideline scores in our study ranged from 0.39 to 0.95. Thirty-five of the 37 guideline evaluations (94.6%) were categorized as exhibiting good or excellent inter-rater reliability.
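The reliability tally above can be checked by applying the ICC bands from the statistical analysis section to the point estimates in Table 5. In the sketch below, the function name is hypothetical and the handling of boundary values (for example, treating an ICC of exactly 0.75 as excellent) is an assumption, since the stated bands leave small gaps. None of the tabulated values falls in those gaps.

```python
from collections import Counter

# ICC point estimates per guideline, transcribed from Table 5.
icc_values = {
    "IPG31": 0.695, "IPG141": 0.657, "IPG146": 0.73, "IPG166": 0.733,
    "IPG278": 0.688, "IPG365": 0.672, "IPG506": 0.687, "IPG570": 0.671,
    "IPG574": 0.679, "IPG594": 0.694, "MTG39": 0.393, "NG59": 0.623,
    "TA497": 0.498, "CG75": 0.95, "IPG578": 0.946, "IPG610": 0.947,
    "IPG366": 0.943, "IPG543": 0.932, "IPG544": 0.941, "IPG303": 0.916,
    "IPG61": 0.931, "IPG568": 0.944, "IPG555": 0.935, "IPG357": 0.929,
    "IPG545": 0.942, "IPG556": 0.932, "IPG12": 0.795, "TA279": 0.761,
    "IPG451": 0.953, "IPG341": 0.731, "IPG306": 0.712, "NG41": 0.639,
    "NG65": 0.892, "IPG333": 0.761, "IPG223": 0.706, "TA383": 0.709,
    "IPG620": 0.71,
}


def reliability_band(icc):
    """Classify an ICC value; boundary handling is an assumption."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "moderate"
    if icc < 0.75:
        return "good"
    return "excellent"


band_counts = Counter(reliability_band(v) for v in icc_values.values())
good_or_excellent = band_counts["good"] + band_counts["excellent"]
share = 100 * good_or_excellent / len(icc_values)

print(dict(band_counts))
print(f"good or excellent: {good_or_excellent}/{len(icc_values)} ({share:.1f}%)")
```

Only MTG39 (poor) and TA497 (moderate) fall below the ‘good’ band, which is consistent with the 35 of 37 figure.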
Domains reaching acceptability threshold
The highest mean score was for the domain ‘clarity of presentation’, with mean scores of a further four domains (‘scope and purpose’, ‘rigour of development’, ‘editorial independence’ and ‘stakeholder involvement’) also exceeding the 60% threshold. Evidence-based recommendations in NICE guidelines include that they should be ‘developed by independent committees, including professionals and lay members, and consulted on by stakeholders’ [19]. Since the first guideline was released in 1999, NICE has published over 1750 guidelines, and technology appraisal guidance alone exceeds five hundred in number [18]. After more than 20 years of development, NICE has accumulated considerable experience in their formulation and publication [20]. However, items 5 and 13 failed to reach the acceptability threshold in any of the guidelines. Finding patients willing and able to provide input into guideline development has proved difficult; for example, several guidelines report that ‘NICE’s Patient and Public Involvement Programme were unable to gather patient commentary for procedures under evaluation’. NICE recognises the need to support patients, nursing staff, and the public to participate in the development and formulation of the guidelines, and has taken measures to increase the participation of these groups, such as establishing a Public Involvement Programme and Citizens Council project [21]. High-quality guidelines should be externally reviewed by experts prior to their publication; however, it appeared to some assessors that there may have been a lack of engagement from key stakeholders identified as important contributors by NICE in the consultation phase of guideline development.
Domains not reaching acceptability threshold
Although the mean score for the ‘Applicability’ domain was below 60%, more than one-third (14/37) of the guidelines scored at the ‘acceptable’ level or above in this domain. All NG/CGs scored over 60%. All guidelines contained accessible documents to assist doctors in putting ‘guidance into practice’. However, in many cases, assessors may have found these documents not specific to the guideline. Four previous NICE guidelines evaluated using AGREE II resulted in a wide range of scores in this domain, with some authors agreeing that applicability represented their weakest domain (scores of 48 and 56 [11,12]), whereas others rated it highly (scores of 82 and 100 [9,10]). The applicability of a guideline is key to its success but may be dependent on heterogeneous structural factors within the National Health Service systems, and therefore requires independent consideration.
Overall guideline assessment
Regarding overall evaluation, the AGREE II manual does not describe how to perform quantitative scoring [22]. Previous studies have applied domain score calculation methods to calculate mean scores for the item ‘rate the overall quality of this guideline’ without reference to the item ‘recommendations of the guideline for use’ [23,24]. In others, assessors have scored this based on the average rating given to the six domains [9,25]. In our study we gave no instructions to reviewers about providing overall recommendations. The mean overall score for all NICE spine disorder guidelines exceeded the 60% threshold for acceptability and a majority of assessors recommended every guideline for use.
Overall guideline scores were significantly lower in IPGs and this was observed across several domains. IPGs are of more limited scope than other guidelines, in particular CG/NGs, in which supporting evidence can run to several thousand pages.
Overall guideline evaluation scores correlated significantly with year of publication, suggesting a dynamic process of continuous guideline quality improvement.
Our study has several limitations. First, AGREE II remains a subjective evaluation tool. Second, the method for calculating a consensus-derived overall guideline score using AGREE II is established neither by developer instructions nor by precedent in the literature. Consequently, the method we chose may be considered arbitrary (although consistent with the calculation method of AGREE II domain and item scores). Third, although inter-rater reliability was good or excellent in 95% of evaluations, the lack of agreement between assessors in a small number may weaken reliability of the method. We attempted to overcome this by fulfilling training and assessor number recommendations beforehand.
Our consensus is that the NICE spine disorder guidelines should be recommended for clinical practice as they demonstrate either acceptable or good overall quality. The evidence of ongoing quality improvement over time is reassuring.
The authors report there are no competing interests to declare.
The International Classification of Diseases 11th Revision database was publicly available during the study period. The authors confirm that all methods were carried out in accordance with relevant guidelines and regulations.
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
None.
None.