Research Design: Survey Design, Demographics, Validity, and Reliability

Surveys are prominent in quantitative educational research for measuring students’ opinions or self-reported data, such as study habits. Data collected from survey research can be at any level of measurement. For example, if you ask the question “How much time did you study per week?” you could collect ordinal data (e.g., “none,” “some,” or “a lot”) or ratio data (e.g., number of hours). This example also illustrates the most common criticism of survey data, which is that they can be unreliable. Even if students don’t skew their answers to be socially desirable based on what they think the researcher wants to hear, they might not be able to give an accurate response from memory. For validity, it is better to have direct measurements, but in education, this is often impractical or invasive. Evaluate the trade-offs in your work.

Interval data in surveys are commonly collected with Likert-type scales. The classic Likert (pronounced lick-ert) scale is a 5-point scale: 1 – Strongly Disagree; 2 – Disagree; 3 – Neither agree nor disagree; 4 – Agree; 5 – Strongly Agree. Likert-type scales typically range from 3 to 7 points, depending on how much sensitivity is desired. People tend to be less reliable when making more than 7 distinctions, so providing more choices can lead to unreliability. If you want to force people to lean one way or the other, provide an even number of choices so that there is no neutral midpoint. The anchors/endpoints for Likert-type scales can be anything, but people are most familiar with “Strongly Disagree” to “Strongly Agree.”

It’s important to note that there is a debate about whether scale data are interval or ordinal, and there is a case for both. The ordinal side would argue that “4 – Agree” cannot be interpreted as 2 points higher than “2 – Disagree,” which makes a lot of sense. The interval side would argue that the scale represents a range of agreement with even intervals, making it most like interval data. I tend to treat scale data as interval if 1) the number of points (i.e., 3-7 points) is high enough to be more discerning than ordinal data and 2) the number of participants is high enough to overshadow the error that is inherent when asking people to pick an option on a scale.
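To make the two sides of the debate concrete, here is a minimal Python sketch that summarizes the same item both ways. The responses are made up for illustration, and the function names are my own rather than part of any standard package.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical responses to one 5-point Likert item
# (1 = Strongly Disagree ... 5 = Strongly Agree).
responses = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]


def ordinal_summary(values, points=5):
    """Summaries that rely only on the order of the scale points."""
    counts = Counter(values)
    return {
        "median": median(values),
        # Frequency of each scale point, including points nobody chose.
        "counts": {p: counts.get(p, 0) for p in range(1, points + 1)},
    }


def interval_summary(values):
    """Summaries that assume the scale points are evenly spaced."""
    return {"mean": mean(values), "sd": stdev(values)}


print(ordinal_summary(responses))
print(interval_summary(responses))
```

The ordinal summary never does arithmetic across scale points beyond locating the middle response, while the interval summary treats the distance from 2 to 4 as two equal steps, which is exactly the assumption the ordinal side questions.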

There are plenty of easily accessible resources on the internet to help design better surveys, but I’ll discuss a few tips below.

| Tips for Writing Effective Surveys | Examples of Bad Questions | Better Alternatives |
| --- | --- | --- |
| Don’t use one question to ask more than one thing. | Do you think the instructor is kind and knowledgeable? | Do you think the instructor is kind? Do you think the instructor is knowledgeable? |
| Don’t use jargon or complicated sentence structure. | How often did you receive formative feedback? | How often did the instructor give you feedback before you submitted assignments? |
| Don’t use ambiguous questions. | Did you have several meetings with your group? | Did you have more than 5 meetings with your group? |
| Don’t overlap answers. | What is your age? 20-25, 25-30, 30-35, etc. | What is your age? 20-24, 25-29, 30-34, etc. |
| Don’t use leading language. | Should the instructor spend more one-on-one time with students? | Were you satisfied with the amount of one-on-one time spent with students? |
| Don’t use strong language. | Was the instructor fantastic? | Was the instructor competent? |
| Don’t use false dichotomies. | Do you think the instructor is kind or knowledgeable? | Do you think the instructor is kind? Do you think the instructor is knowledgeable? |
| Be selective about which questions to include. Long surveys are abandoned surveys. | | |
| Include a non-committal option such as “N/A” or “Prefer not to answer.” | | |
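The “Don’t overlap answers” tip is easy to check mechanically before a survey goes out. The sketch below is one way to do it in Python, using the two sets of age ranges from the example above. The function name and the choice to represent each answer option as an inclusive (low, high) pair are my own assumptions.

```python
def find_overlaps(bins):
    """Return neighbouring pairs of inclusive (low, high) ranges that share a value.

    Ranges are sorted by their lower bound first, and each range is compared
    with the one that follows it.
    """
    ordered = sorted(bins)
    overlaps = []
    for (low_a, high_a), (low_b, high_b) in zip(ordered, ordered[1:]):
        # With inclusive bounds, a shared endpoint (e.g., 25) counts as overlap.
        if low_b <= high_a:
            overlaps.append(((low_a, high_a), (low_b, high_b)))
    return overlaps


bad_bins = [(20, 25), (25, 30), (30, 35)]
good_bins = [(20, 24), (25, 29), (30, 34)]

print(find_overlaps(bad_bins))   # two overlapping pairs: 25 and 30 each fit two options
print(find_overlaps(good_bins))  # empty list: every age fits exactly one option
```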

Demographic Data

Demographic data are typically also collected in surveys, but not as independent or dependent variables. Instead, they are used to describe relevant characteristics of the participants in your research and, in turn, which population your sample represents. We might also run correlational analyses between independent or dependent measures and demographic data to determine whether groups are skewed or whether participant characteristics affect the results. Otherwise, these data typically aren’t used in the analysis unless they are part of the research questions. Demographic characteristics can include:

  • Age
  • Grade/year in school
  • Gender
  • Race or ethnicity
  • Nationality
  • Employment status
  • Socioeconomic status
  • Highest level of education achieved
  • Academic major/program
  • Grade point average (GPA) or grade in previous course
  • Prior experience
  • Primary language or fluency with language used in the study

To be clear, I am not advocating that you collect all of these demographics for every study. Pick those that are relevant. Make sure that if you ask somewhat invasive questions, like socioeconomic status, it is justified.
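As an illustration of the correlational check mentioned earlier in this section, here is a minimal Python sketch that computes a Pearson correlation between one demographic variable and one outcome measure. The data are invented, and the variable names (prior GPA, post-test score) are only examples of what you might compare.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: one value per participant, in the same order in both lists.
prior_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
post_test = [55, 60, 70, 72, 90]

print(round(pearson_r(prior_gpa, post_test), 2))
```

A strong correlation here would suggest that the demographic characteristic is related to your outcome, so it is worth checking whether that characteristic is evenly distributed across your groups. Pearson’s r assumes roughly interval-level data, so for ordinal demographics a rank-based correlation would be the more appropriate choice.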

A Quick Note on Validity and Reliability

When we talk about validity in research measurement, we’re talking about the extent to which the measurement we use measures what we think it measures. When we talk about reliability in research measurement, we’re talking about the extent to which the same participant would get the same score on a measurement that was administered more than once. In other words, validity and reliability question whether your measures will be legitimate and consistent. If you have low validity or reliability, then you’ll have additional error in your data and you’ll be more likely to make errors in your conclusions.

Discussing the four types of validity (internal, external, construct, and conclusion) and four types of reliability (inter-rater, test-retest, parallel-forms, and internal consistency) is outside the scope of this post, but you’ll generally pass validity and reliability standards if you let prior research be your guide. For example, if there’s an established way of measuring and analyzing learning outcomes, it’s likely because it’s been deemed valid and reliable by the scientific community. You’ll be able to build upon prior work much better if your data are collected using the conventions in your area (e.g., using validated concept inventories instead of tests that you made). There are statistical methods for demonstrating validity and reliability, such as principal components analysis, factor analysis, Cronbach’s alpha, etc., but they tend to require large amounts of data. More information about reliability and validity can be found in any human subjects research methods textbook.
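For readers curious about what one of these statistics looks like in practice, below is a minimal Python sketch of Cronbach’s alpha, a measure of internal consistency. It uses the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The data layout (one list of item scores per participant) and the tiny dataset are assumptions for illustration only. As noted above, a real analysis needs far more participants.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of item scores."""
    if len(rows) < 2:
        raise ValueError("need at least 2 participants")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every participant needs the same 2 or more items")
    # Transpose so that each tuple holds all participants' scores on one item.
    items = list(zip(*rows))
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 participants x 3 Likert-type items.
survey = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]

print(round(cronbach_alpha(survey), 2))
```

Values closer to 1 indicate that the items move together, which is what you would expect if they measure the same underlying construct.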

To view more posts about research design, see a list of topics on the Research Design: Series Introduction.
