As a sector, we're committed to a collaborative approach to enhance data quality
The GDQ data quality pledge advocates a framework to deliver:
To address data quality issues collaboratively, a need has emerged to adopt a shared approach to categorising those issues, building a more complete quality picture.
This led to the idea of an industry-wide feedback loop delivered through a code frame.
This is applicable to online quantitative projects where sample has been purchased from a third-party sample provider.
As part of the GDQ, MRS and SampleCon have developed a new GDQ Feedback Loop tool to enable the identification and classification of online sample quality issues in a consistent and transparent way. To find out more about the background to the Feedback Loop see the attached PowerPoint deck.
The focus of the feedback loop project so far has been building out a proposed code frame of 18 reasons that may lead to a participant being removed from a project for quality reasons (not all reasons will apply to all projects). Removal could happen during fielding, or post-fielding during quality audits/data cleaning. The reasons identified for the code frame are:
1) Bot Detection
Failure at checks designed to ensure transactions are with humans rather than bots (incl. honeypot/reCAPTCHA)
Consideration
Bot Detection
Several options exist, ranging from free to paid bot detection systems. Evaluate each for accuracy and suitability: fraudsters can bypass them, while real participants may be falsely flagged or simply make mistakes.
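As an illustration of the honeypot idea mentioned above, the sketch below shows a minimal server-side check in Python on a hidden form field that humans never see (and so never fill in) but simple bots often populate. The field name website_url and the example submissions are hypothetical.

    # Honeypot check: flag submissions where the hidden field contains any value.
    def is_honeypot_triggered(submission, honeypot_field="website_url"):
        """Return True if the hidden honeypot field was filled in."""
        value = submission.get(honeypot_field, "")
        return bool(value and value.strip())

    if __name__ == "__main__":
        human = {"q1": "Yes", "website_url": ""}
        bot = {"q1": "Yes", "website_url": "http://spam.example"}
        print(is_honeypot_triggered(human))  # False
        print(is_honeypot_triggered(bot))    # True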
2) Third Party Fraud Tool Failure
Flagged by 3rd party fraud detection tools
Consideration
Third Party Fraud Tool Failure
Many companies offer their own fraud detection toolkit. Any toolkit can cause false positives or negatives. Consistently monitor and assess the implementation and thresholds used. Consider outcomes together with other signals and participant data.
3) Geo-location Check
Failure at checks used to verify that the participant's location is correct
Consideration
Geo-location Check
The more granular the data (e.g. city versus country), the less accurate IP-based identification of a participant's geolocation becomes. Even at country level, IPs can be problematic for participants who live in border regions or small countries. Privacy-conscious individuals are increasingly using proxies and VPNs, and participants may attempt surveys while abroad. Any failure should be investigated further in combination with other checks.
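A hedged sketch of how an IP-derived country might be compared with the claimed country: the geo_check function and its inputs are illustrative, the IP country is assumed to come from whichever geo-IP service is in use, and the outcome is a review flag rather than an automatic rejection, for the reasons given above.

    # Illustrative geo-location consistency check. Returns "review" rather than
    # "reject" because VPNs, border regions and travel all produce false positives.
    def geo_check(claimed_country, ip_country, vpn_detected):
        """Return 'pass', 'review' or 'no_data' for a single participant."""
        if not ip_country:
            return "no_data"   # lookup failed; do not penalise the participant
        if vpn_detected or ip_country != claimed_country:
            return "review"    # investigate alongside other checks before acting
        return "pass"

    if __name__ == "__main__":
        print(geo_check("GB", "GB", vpn_detected=False))  # pass
        print(geo_check("GB", "FR", vpn_detected=False))  # review (border or travel case?)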
4) Participant Duplication
Indication of duplicate participants based on an identifier such as IP address or cookies
Consideration
Participant Duplication
Using single data points for deduplication can lead to false positives. Shared devices can lead to overlap in cookies and IPs. The type of research may involve high overlap (e.g. B2B or students from specific campuses). IP availability and distribution varies by country. Duplicate IPs should be investigated, but fraudsters also typically know to change their IPs. Consider using deduplication methods that blend several signals to identify duplicates, especially when using multiple suppliers.
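The blended-signal idea can be sketched as follows: participants are grouped on a combination of IP, cookie ID and device fingerprint rather than on IP alone. The field names and the find_duplicate_groups helper are illustrative.

    # Blended-signal deduplication: only combinations of signals seen more than
    # once are treated as candidate duplicates.
    from collections import defaultdict

    def find_duplicate_groups(participants):
        """Group participant IDs sharing the same (ip, cookie, fingerprint) triple."""
        groups = defaultdict(list)
        for p in participants:
            key = (p.get("ip"), p.get("cookie_id"), p.get("device_fingerprint"))
            groups[key].append(p["participant_id"])
        return [ids for ids in groups.values() if len(ids) > 1]

    if __name__ == "__main__":
        sample = [
            {"participant_id": "A1", "ip": "1.2.3.4", "cookie_id": "c1", "device_fingerprint": "f1"},
            {"participant_id": "A2", "ip": "1.2.3.4", "cookie_id": "c1", "device_fingerprint": "f1"},
            {"participant_id": "A3", "ip": "1.2.3.4", "cookie_id": "c9", "device_fingerprint": "f9"},
        ]
        print(find_duplicate_groups(sample))  # [['A1', 'A2']] -- A3 shares only the IP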
5) Suspicious Survey Entry Time
Time of entry into survey is suspicious
Consideration
Suspicious Survey Entry Time
Isolated entries at an odd time of day or night are not indicative of fraud. High volumes of entries during unsociable hours and/or around the same time (especially from the same supplier) should be investigated further. Consider whether the audience being targeted may lead to unusual response times, e.g. night-shift workers.
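A minimal sketch of the volume-based check described above, assuming entry timestamps and supplier IDs are available. The night-time window (1 to 4 a.m.) and the cluster size of 20 are illustrative values to tune per project.

    # Flag suppliers with a high volume of night-time entries; a lone odd-hour
    # entry is ignored by design.
    from collections import Counter
    from datetime import datetime

    def night_entry_clusters(entries, night_hours=range(1, 5), min_cluster=20):
        """entries: list of (supplier_id, entry_datetime). Returns suppliers to review."""
        counts = Counter(supplier for supplier, ts in entries if ts.hour in night_hours)
        return [supplier for supplier, n in counts.items() if n >= min_cluster]

    if __name__ == "__main__":
        demo = [("SUP_A", datetime(2024, 5, 1, 3, 10))] * 25 + [("SUP_B", datetime(2024, 5, 1, 14, 0))]
        print(night_entry_clusters(demo))  # ['SUP_A']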
6) Speeding/Racing
Completing a questionnaire faster than reasonably expected
Consideration
Speeding/Racing
Speed-check thresholds need careful consideration. Thresholds can be established after soft launch to create acceptable standards based on real participant behaviour. Survey design and set-up can influence lengths of interview (LOIs). Participants familiar with survey structures may answer familiar demographic questions quickly. Consider whether the survey time is humanly possible, in conjunction with other quality flags.
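As an illustration, one hedged way to derive a speeding cut-off from soft-launch data is to flag completes below a fraction of the median LOI. The one-third fraction, the speeding_threshold helper and the sample timings below are assumptions, not recommended standards.

    # Speeding cut-off derived from soft-launch completes.
    from statistics import median

    def speeding_threshold(soft_launch_lois, fraction=1 / 3):
        """Return the LOI (in seconds) below which a complete is flagged as speeding."""
        return median(soft_launch_lois) * fraction

    if __name__ == "__main__":
        soft_launch = [620, 540, 700, 660, 590, 610]          # seconds
        cutoff = speeding_threshold(soft_launch)
        print(round(cutoff), [loi for loi in [95, 180, 615] if loi < cutoff])  # 205 [95, 180]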
7) Excessive interview length
Participant's length of interview was above the length that could reasonably be expected for the survey
Consideration
Excessive interview length
There can be many valid reasons as to why someone takes longer than expected to complete a survey, so this should only be used in combination with other quality flags. Thresholds can be established after soft launch to create cut-offs based on real participant behaviour.
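The same soft-launch data can supply an upper cut-off, as in the short sketch below. The roughly 95th-percentile rule and the excessive_loi_threshold helper are illustrative choices, and anything above the cut-off should be flagged for review rather than removed outright.

    # Upper LOI cut-off from soft-launch completes (review flag, not removal).
    def excessive_loi_threshold(soft_launch_lois, percentile=0.95):
        """Return the LOI above which a complete is flagged for review."""
        ordered = sorted(soft_launch_lois)
        index = min(int(len(ordered) * percentile), len(ordered) - 1)
        return ordered[index]

    if __name__ == "__main__":
        soft_launch = [620, 540, 700, 660, 590, 610, 980, 720, 640, 655]
        print(excessive_loi_threshold(soft_launch))  # 980 in this tiny illustrative set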
8) Straight Lining/Flat Lining
Providing the same answer to the majority of survey grid/scale questions
Consideration
Straight Lining/Flat Lining
Consider whether it is valid to give the same response across a question, as well as potential cultural response styles. The use of straightlining checks should be planned for when designing grid questions for surveys. Any straightlining should be considered in conjunction with other quality flags.
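One simple, illustrative straightlining measure is the share of grid items given the identical answer; the straightline_share helper and any flagging threshold applied to it are assumptions to be set per survey.

    # Proportion of grid items matching the most common answer for one participant.
    def straightline_share(grid_answers):
        """Return the share of grid items that match the most common answer."""
        if not grid_answers:
            return 0.0
        most_common = max(set(grid_answers), key=grid_answers.count)
        return grid_answers.count(most_common) / len(grid_answers)

    if __name__ == "__main__":
        print(straightline_share([4, 4, 4, 4, 4, 4, 4, 4]))  # 1.0 -> candidate flag
        print(straightline_share([4, 2, 5, 4, 3, 1, 4, 2]))  # 0.375 -> unremarkable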
9) Inconsistent/Contradictory Answers
The data does not align within one question or across multiple questions. This could be within the survey or when comparing answers from the suppliers' platform and buyer survey
Consideration
Inconsistent/Contradictory Answers
Take care not to remove valid participants and create bias in the data. The types of questions consistency checks are run on can create false positives, e.g. which brands were purchased in the last 30 days versus basic demographics such as age, income or job title. A small number of inattentive individuals will not materially impact the data, but large volumes of inattentive responses can indicate that data quality may be affected: further investigation would be required.
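As a small illustration of the supplier-platform-versus-survey comparison mentioned above, the sketch below checks reported age for consistency; the one-year tolerance and the age_consistent helper are assumptions.

    # Compare the age on the supplier's profile with the age claimed in the survey.
    def age_consistent(profile_age, survey_age, tolerance=1):
        """True if the two reported ages are within the tolerance of each other."""
        return abs(profile_age - survey_age) <= tolerance

    if __name__ == "__main__":
        print(age_consistent(34, 35))  # True  -- plausible birthday difference
        print(age_consistent(34, 52))  # False -- review alongside other signals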
10) Red Herring/Explicit Trap Question
Failure at a question designed to check whether participants are paying attention
Consideration
Red Herring/Explicit Trap Question
These questions should be relevant and not overused. Ensure they are testing attention, not unfairly testing memory/ knowledge. For failures, check the rest of a participant’s data for any other quality issues first before removing them.
11) Knowledge Question
Participants fail to correctly answer questions they should have knowledge of, given that they passed survey screening
Consideration
Knowledge Question
Knowledge checks are best used when looking for participants in specialised fields. Consider combining a knowledge check with a genuine screener, e.g. insert a fake answer that an honest and/or knowledgeable participant would not select. Fraudsters may look up the answers online; reaction times in combination with other quality flags could be used to detect this.
12) Over-qualification/Over-claiming
Participants who appear to have qualified for an excessive range of categories and/or appear to be exaggerating to qualify
Consideration
Over-qualification/Over-claiming
Care must be taken to (a) not add too many additional questions for this purpose, and (b) not remove or flag too many people, leading to bias in the remaining sample. Consider whether the questions would genuinely evoke a high qualification rate. When designing surveys, avoid any questions that may lead participants to over-select in order to qualify. This should be used in combination with other quality flags.
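A minimal sketch of an over-qualification count, assuming screener selections are available per participant. The threshold of six categories is purely illustrative, and any flag should prompt review rather than automatic removal.

    # Flag participants claiming an implausibly broad range of screener categories.
    def over_qualified(selected_categories, threshold=6):
        """Return True if the participant claims at least `threshold` distinct categories."""
        return len(set(selected_categories)) >= threshold

    if __name__ == "__main__":
        broad = ["autos", "banking", "B2B IT", "pets", "luxury", "pharma", "gaming"]
        print(over_qualified(broad))                 # True -> review, don't auto-remove
        print(over_qualified(["autos", "banking"]))  # False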
13) Open End - Poor Quality
Open end is insufficient in length/detail, is irrelevant, is vulgar, is nonsensical, is in a different language, or consists of random strings of letters, words or sentences
Consideration
Open End - Poor Quality
Experienced practitioners can review responses manually, and automated approaches can flag participants to aid those reviews. Care is needed with short, blank or gibberish responses: they do not necessarily mean a participant is poor overall, but they can indicate that the participant's other responses may be in doubt. Consider both context and survey design. A long survey with a lot of open ends, especially ones requiring overly technical detail, will lead to participant fatigue and poorer answers. If a survey is not mobile-optimised, it can cause issues for participants using smaller screens.
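A few of the automated heuristics that can aid manual review might look like the sketch below; the length, repeated-character and vowel-ratio thresholds are illustrative assumptions, not recommended standards.

    # Simple open-end quality heuristics: too short/blank, repeated characters,
    # vowel-poor gibberish. Flags feed human review, not automatic removal.
    import re

    def open_end_flags(text):
        """Return the quality flags raised by a single open-ended response."""
        flags = []
        cleaned = text.strip()
        if len(cleaned) < 5:
            flags.append("too_short_or_blank")
        if re.search(r"(.)\1{4,}", cleaned):          # e.g. "aaaaa" or "!!!!!"
            flags.append("repeated_characters")
        letters = [c for c in cleaned if c.isalpha()]
        vowels = sum(c.lower() in "aeiou" for c in letters)
        if letters and vowels / len(letters) < 0.2:   # vowel-poor strings like "dfkjghsdf"
            flags.append("possible_gibberish")
        return flags

    if __name__ == "__main__":
        print(open_end_flags("sdfkjghsdfkg"))                     # ['possible_gibberish']
        print(open_end_flags("I liked the packaging redesign."))  # []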
14) Open End - Duplicate
Open ends show the same answer across all/multiple open ends for one participant, or the same answers across multiple participants
Consideration
Open End - Duplicate
Responses that are similar, even if not identical, across participants should be assessed. Consider context and whether the survey design may invoke similar or duplicate responses, e.g. top-of-mind brands or questions that evoke answers such as "don't know" or "none". A combination of technology and human review can facilitate an efficient review: manual checking can be very time-consuming and inefficient, but technology can yield false positives.
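A hedged sketch of cross-participant similarity checking using Python's standard-library SequenceMatcher. The 0.9 similarity threshold is illustrative, and short generic answers such as "don't know" should be excluded before comparing, as noted above.

    # Find pairs of participants with near-identical normalised open ends.
    from difflib import SequenceMatcher
    from itertools import combinations

    def similar_open_ends(responses, threshold=0.9):
        """responses: {participant_id: open_end_text}. Returns suspiciously similar pairs."""
        normalised = {pid: " ".join(text.lower().split()) for pid, text in responses.items()}
        pairs = []
        for (pid_a, a), (pid_b, b) in combinations(normalised.items(), 2):
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((pid_a, pid_b))
        return pairs

    if __name__ == "__main__":
        print(similar_open_ends({
            "P1": "The advert felt modern and fresh",
            "P2": "the advert  felt modern and fresh!",
            "P3": "Too expensive for what you get",
        }))  # [('P1', 'P2')]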
15) Open End – AI Completed
Open end appears to be a direct copy from AI generated text
Consideration
Open End – AI Completed
Accurate detection of AI is currently limited, especially in surveys that evoke short responses from participants. Well-known signals such as copy and paste, long dashes (—) and overly long or technical responses could indicate AI usage, but there can also be legitimate reasons for these signals. AI models are becoming increasingly advanced. Consider AI open-end detection in combination with other flags, and work with suppliers to validate participant profiles.
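The surface signals listed above could be collected into a simple review helper like the sketch below. None of these signals proves AI use on its own, and the ai_signal_flags helper and its 800-character length cut-off are assumptions.

    # Surface signals only: long dashes, unusually long responses, paste events.
    def ai_signal_flags(text, pasted, long_response_chars=800):
        """Return the AI-usage signals present in one open-ended response."""
        flags = []
        if "—" in text:
            flags.append("em_dash_present")
        if len(text) > long_response_chars:
            flags.append("unusually_long")
        if pasted:
            flags.append("pasted_text")
        return flags

    if __name__ == "__main__":
        print(ai_signal_flags("Good value — comprehensive coverage — responsive support.", pasted=True))
        # ['em_dash_present', 'pasted_text'] -> review, not automatic removal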
16) Duplicate Responses In Closed Questions
Answers to closed questions suggest multiple instances of the same participant, detected after survey completion
Consideration
Duplicate Responses In Closed Questions
Consider the volume of similar/identical responses and the timeframe in which they were submitted. Consider other signals in combination when cleaning data. Be conscious of how survey design could influence results e.g. randomization within a survey can disguise this behaviour.
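A sketch of one post-field approach: each participant's closed-question answers are turned into an order-independent signature so identical patterns can be grouped, with randomisation unwound by keying answers to question IDs rather than display position. The data shapes are illustrative.

    # Group participants whose full closed-question answer patterns are identical.
    from collections import defaultdict

    def identical_answer_groups(answer_vectors):
        """answer_vectors: {participant_id: {question_id: answer}}. Returns identical groups."""
        groups = defaultdict(list)
        for pid, answers in answer_vectors.items():
            key = tuple(sorted(answers.items()))   # order-independent signature
            groups[key].append(pid)
        return [pids for pids in groups.values() if len(pids) > 1]

    if __name__ == "__main__":
        print(identical_answer_groups({
            "P1": {"Q1": 3, "Q2": "B", "Q3": 5},
            "P2": {"Q1": 3, "Q2": "B", "Q3": 5},
            "P3": {"Q1": 2, "Q2": "A", "Q3": 5},
        }))  # [['P1', 'P2']]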
17) Ghost Complete
A complete survey response recorded in a participant system but not recorded as complete in the survey data
Consideration
Ghost Complete
Ghost completes will not be captured in a survey, as participants will have partial or no responses. They should be identified by regularly reconciling completes with suppliers during fielding, as their presence can impact fieldwork delivery. The presence of ghost completes suggests redirect security measures need to be upgraded.
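The reconciliation step can be as simple as comparing ID sets, as in the hedged sketch below. IDs appearing only on the supplier side are the candidate ghost completes to query, and the ID values shown are hypothetical.

    # Candidate ghost completes: recorded complete by the supplier but not in the survey.
    def ghost_completes(supplier_complete_ids, survey_complete_ids):
        """Return IDs the supplier recorded as complete but the survey did not."""
        return set(supplier_complete_ids) - set(survey_complete_ids)

    if __name__ == "__main__":
        supplier = {"R100", "R101", "R102", "R103"}
        survey = {"R100", "R101", "R103"}
        print(ghost_completes(supplier, survey))  # {'R102'} -> confirm with the supplier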
18) Unspecified Issue/Other
Other reason not covered by the other codes
Consideration
Unspecified Issue/Other
Any other issue not captured by the above reasons should be recorded here. Ideally a separate written record of the reason is kept for historical tracking purposes and to inform the GDQ of any gaps in the codeframe.
Download an Excel variant of the feedback loop here.
What is the GDQ Feedback Loop?
The GDQ Feedback Loop is a mechanism that enables buyers and sample suppliers to share structured feedback on data quality issues. It uses a common code frame to categorise the reasons why participants are removed from a study on quality grounds.
Why is the Feedback Loop needed?
Currently, buyers and suppliers often identify quality issues at different stages and describe them in different ways. This fragmentation makes it hard to develop a holistic view of data quality or to collaborate as true partners, which depends on a shared understanding. The Feedback Loop helps align perspectives through a shared framework.
How does this relate to the Global Data Quality Toolkit?
The Feedback Loop is one component of the GDQ Toolkit, alongside the glossary, transparency checklist, and internal approaches guidance. Together, these tools support a more consistent and collaborative approach to data quality.
How does this relate to the Global Data Quality Excellence Pledge?
The Feedback Loop supports adherence to the Global Data Quality (GDQ) Excellence Pledge, particularly commitment one (upholding rigorous data quality standards) and commitment two (providing transparency).
What types of projects does this apply to?
The Feedback Loop is intended for online quantitative research projects using third-party sample. It is not designed for qualitative research, customer database research or methodologies outside online survey data collection.
Is the Feedback Loop global?
Yes. The framework is intended to be applicable globally, while allowing flexibility to account for regional differences in market practices and regulations.
What is included in the code frame?
The proposed code frame currently includes 17 named codes that may lead to participant removal for quality reasons, as well as a catch-all “Other – specify” code. The codes cover issues identified before survey entry, during fieldwork and post-field data review.
Do all codes apply to every project?
No. Not all codes will be relevant to every project. The code frame is designed to be flexible and applied where appropriate.
Is the code frame designed to be single coded or multi coded?
The code frame should be set up to be multi coded. While a small number of codes may justify removal on their own (such as participant duplication detection), most should only be used as grounds for removal when combined with other codes. This reflects the nuanced nature of data quality assurance, which requires context. Multi-coding provides a stronger foundation for building shared understanding.
Can the code frame evolve?
Yes. The code frame is expected to develop over time based on industry feedback, testing, and emerging quality challenges. If you would like to offer any feedback or get involved, please refer to the “Future direction” section at the bottom of the page.
How is the code frame implemented?
While individual companies may implement this in different ways, the code frame should be programmed as a multi-coded variable within an online survey or data processing workflow. Programmers can easily incorporate the code frame during survey scripting. Some codes can be triggered automatically through flags set up at the scripting stage, while others are applied during manual quality checks. Any manual removals must then be recorded in the code frame variable. The process for doing so will vary depending on the data collection software, hosting solution, or approach to managing manual removals, and we recommend that companies deliberately design this process into their data collection platform templates and workflows.
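As one illustration of how the multi-coded variable might be represented in a data-processing workflow (not a prescribed implementation), the sketch below records one or more code numbers per removed participant. The abbreviated code labels and the record_removal helper are illustrative.

    # Multi-coded removal variable: each removed participant carries a set of
    # code numbers from the 18-reason frame (labels abbreviated here).
    REMOVAL_CODES = {
        1: "Bot detection", 4: "Participant duplication",
        6: "Speeding/racing", 8: "Straight lining", 18: "Unspecified/other",
    }

    def record_removal(removals, participant_id, codes):
        """Add one or more removal codes for a participant (multi-coded)."""
        removals.setdefault(participant_id, set()).update(codes)

    if __name__ == "__main__":
        removals = {}
        record_removal(removals, "P042", [6])   # automated speeding flag set at scripting stage
        record_removal(removals, "P042", [8])   # manual flag added during data cleaning
        for pid, codes in removals.items():
            print(pid, [REMOVAL_CODES.get(c, "code %d" % c) for c in sorted(codes)])
        # P042 ['Speeding/racing', 'Straight lining']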
Whose responsibility is it to implement the code frame?
The code frame variable should be created by the team responsible for data collection and online survey programming. Codes will be assigned through a combination of automated processes within the survey and post-survey coding carried out by the data collection team.
Does this increase operational burden?
The pilot conducted ahead of the code frame release demonstrated that the code frame can be integrated into existing processes with minimal disruption, often building on existing quality workflows rather than adding entirely new ones.
What data is shared?
The Feedback Loop focuses on pseudonymous, categorised quality feedback, supporting transparency without compromising confidentiality. Some data points captured (such as IP address) are considered personal data and should therefore be treated appropriately: stored securely, retained only as long as required, and not shared without appropriate permissions from the respondents themselves. Participating companies may choose not to share or receive sensitive data.
Who owns the feedback data?
Each organisation retains ownership of its own data. The Feedback Loop facilitates structured sharing, not data transfer or centralised ownership.
Can results be compared across suppliers or projects?
The Feedback Loop is not designed for league tables or simplistic comparisons. Context, methodology, and project design must always be considered.
Does a higher removal rate indicate worse quality?
Not necessarily. A higher removal rate may indicate more effective detection processes rather than lower underlying sample quality. Using this approach over time allows you to track how data quality evolves within your own systems and supply chain.
A. Buyer-Focused FAQs
How does this help me as a buyer?
It gives you clearer insight into why participants are removed, helps distinguish survey design issues from participant behaviour, and supports more productive conversations with suppliers.
Will this limit my flexibility in quality control?
No. Buyers retain full control over their quality standards and decision-making.
Does this require new technology?
In most cases, no. The code frame can be integrated into existing survey platforms and QC workflows.
Can I ask/should I ask my research provider to share information about how many people were removed from my study and why?
Research providers can supply this information, so there is no practical barrier to requesting it. However, the results should be interpreted with caution: differences in methodologies mean the numbers are not comparable across vendors or across individual projects, due to variations in design, market or target audiences.
As a result, we generally do not recommend using this as a standard KPI, but instead reviewing and discussing the figures with the provider at key points in the study lifecycle (e.g., the initial wave of a tracker or when results appear inconsistent).
I want the best quality responses - can I ask my research provider to remove anyone who triggers ANY of the code frame dimensions?
The appropriateness of this approach depends on the questionnaire design and which code frame elements have been implemented. The goal is to avoid overcleaning, which can distort results.
Immediate, stand-alone removal is only justified for certain codes, such as duplicate participant or honeypot failure. Other flags, like straightlining in grids, require context, as similar responses may be logical, and are best combined with other indicators.
These rules should be set during the survey design/programming phase with the research provider, and cleaning rates monitored to prevent overcleaning.
What happens to the participants who are removed?
Sample suppliers typically monitor the history of respondents and remove any with a poor track record from their panel. The code frame provides sample suppliers with information that will allow them to determine if the panel removal process should be accelerated, as well as aid the sample supplier in assessing any necessary changes to quality & anti-fraud measures on their end.
B. Supplier-Focused FAQs
Is this a way to audit or penalise suppliers?
No. The Feedback Loop is intended to improve shared understanding, not to penalise or publicly evaluate suppliers.
Will buyers see more detail about our sample sources?
The framework supports structured feedback on outcomes, not disclosure of proprietary sourcing methods.
How does this benefit suppliers?
It helps identify root causes, separate participant-level issues from survey design problems, and supports faster resolution through clearer feedback.
Future steps include broader testing, refinement of the code frame, and continued collaboration across the industry to build shared understanding.
For questions about implementing the code frame, or to share feedback or express interest in contributing more broadly, please contact Rebecca Cole at rebecca@cobalt-sky.com, Chair of the subgroup overseeing the feedback loop.