We have probably all been told to collect some basic demographic data on every survey, but what data and why? The most important thing you can do with your demographic data is to demonstrate that the people who responded to your survey represent – in some basic ways – the people that you want to generalize about.
For example, if you surveyed students in a school that is half male and half female, you’ll want to show that the students who responded to your survey are also roughly half male and half female. If all the respondents were female, you cannot draw conclusions about all students’ experiences. If you are conducting a survey in a neighborhood, you’ll want to make sure that the people who responded reflect the diversity of the neighborhood. That might mean asking about the specific ethnicities and cultures that make up the community, in addition to the standard race and ethnicity categories used by the census.
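As a rough sketch of that check, the Python below compares each group’s share of respondents with its share of the population. The function name, the input format (one group label per respondent), and the 5-percentage-point tolerance are illustrative assumptions. This is an eyeball comparison, not a formal statistical test.

```python
from collections import Counter

def compare_to_population(responses, population_shares, tolerance=0.05):
    """Compare each group's share of respondents with its share of the population.

    responses: list of group labels, one per respondent (hypothetical format).
    population_shares: dict mapping group label -> proportion of the population.
    tolerance: largest acceptable absolute gap (an assumed cutoff, not a test).
    Returns {group: (sample_share, population_share, within_tolerance)}.
    """
    counts = Counter(responses)
    total = len(responses)  # everyone counts toward the denominator
    report = {}
    for group, pop_share in population_shares.items():
        sample_share = counts.get(group, 0) / total if total else 0.0
        within = abs(sample_share - pop_share) <= tolerance
        report[group] = (round(sample_share, 3), pop_share, within)
    return report

# A school that is half male and half female; 100 hypothetical respondents.
responses = ["female"] * 52 + ["male"] * 48
print(compare_to_population(responses, {"female": 0.5, "male": 0.5}))
# {'female': (0.52, 0.5, True), 'male': (0.48, 0.5, True)}
```

If every respondent were female, both groups would come back with `False`, which is the situation where you cannot generalize to all students.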
The other thing that you might want to do with your demographic data is to present the responses of different groups separately or compare them to each other. For example, do boys and girls feel differently about safety at their school? These choices should be guided by theory, previous research, or a question that you are testing. Is there a good reason to think that boys and girls perceive their safety differently? Are there previous studies or examples where that was an issue? If so, then it’s probably useful to disaggregate those responses.
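If you do decide to disaggregate, the calculation itself is simple. This minimal Python sketch assumes each response is stored as a dictionary with hypothetical fields such as `gender` and `feels_safe`, and computes the share of each group that agreed with a statement.

```python
from collections import defaultdict

def share_agreeing_by_group(rows, group_key, answer_key, agree_values):
    """For each group, return the share of respondents whose answer is in agree_values."""
    totals = defaultdict(int)  # respondents per group
    agrees = defaultdict(int)  # respondents per group who agreed
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[answer_key] in agree_values:
            agrees[group] += 1
    return {group: agrees[group] / totals[group] for group in totals}

# Hypothetical responses to "I feel safe at school."
rows = [
    {"gender": "girl", "feels_safe": "agree"},
    {"gender": "girl", "feels_safe": "disagree"},
    {"gender": "boy", "feels_safe": "strongly agree"},
    {"gender": "boy", "feels_safe": "agree"},
]
print(share_agreeing_by_group(rows, "gender", "feels_safe", {"agree", "strongly agree"}))
# {'girl': 0.5, 'boy': 1.0}
```

Keep in mind that a share computed from a very small group can swing a lot with just one or two answers.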
There are dozens of other demographic questions you might ask, such as family income, age, employment status, or area of residence. If a question is important for showing whether your sample represents your population, or important to your analysis, include it in your survey. But be cautious about asking for potentially sensitive information: if you don’t really need it, don’t ask. Likewise, if the number of responses in a category is sure to be small, don’t ask. A category with only a handful of responses can make individual respondents identifiable, so for privacy reasons you won’t publish or present those results anyway.
I once heard of a survey that asked for respondents’ gender and offered about 20 answer choices. While I wholeheartedly agree that gender is not binary, this was not the right approach for that organization. A list that long is overwhelming, and respondents will have a hard time finding the answer they want. For people who are going to check something other than “male” or “female”, the question might be sensitive or even scary. Offering so many answer choices also increases the chance that a category will have only one or two responses, which raises privacy concerns. Since the organization had no plans to serve students of diverse genders differently or better, collecting that data offered no benefit to the respondents. A better approach might be to ask "Which gender do you mostly identify with?" with the response choices male, female, not listed, and prefer not to say. Here is an excellent blog post on gender questions.
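If you already have data in which some categories are small, one common safeguard when reporting is small-cell suppression: hide any count below a minimum size. The Python sketch below shows the idea; the threshold of 5 and the function name are illustrative assumptions, so use whatever rule your own organization’s privacy policy sets.

```python
from collections import Counter

# Illustrative assumption: hide any category with fewer than 5 responses.
# Substitute the threshold your organization's privacy rules require.
MIN_CELL_SIZE = 5

def suppress_small_cells(responses, min_size=MIN_CELL_SIZE):
    """Count responses per category, replacing counts below min_size with None."""
    counts = Counter(responses)
    return {category: (n if n >= min_size else None) for category, n in counts.items()}

responses = ["male"] * 40 + ["female"] * 45 + ["not listed"] * 2 + ["prefer not to say"] * 6
print(suppress_small_cells(responses))
# {'male': 40, 'female': 45, 'not listed': None, 'prefer not to say': 6}
```

Suppressing one cell is not always enough. If you also publish the total number of respondents (93 here), the hidden count can be recovered by subtraction (93 − 40 − 45 − 6 = 2), so you may need to hide or combine a second category as well.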
Recently, researchers have moved toward asking demographic questions at the end of the survey, for two main reasons. The first is fatigue. Respondents sometimes get tired of our questions and stop answering them, especially if they find the questions intrusive or offensive. If the demographic questions are less important than the others, ask the other questions first rather than wearing out your respondents before they get to the important stuff.
The second reason is a concern about stereotype threat, a phenomenon in which being reminded of their membership in a group can cause respondents to answer questions differently than they otherwise would. This is a particular concern for cognitive tasks like math problems, and probably less of one for questions about opinions or experiences. I prefer to be cautious and put my demographic questions at the end unless they’re needed to direct respondents to other questions.
In presenting your data, you may want to skip presenting the demographic data altogether. There is a tendency to present every single question on your survey with a bar chart or pie graph. But sometimes, the demographic data doesn’t really tell us anything except how well the sample matched the population that you’re talking about. In your report or presentation, simply mention that the sample was representative of the population and link to the data in an appendix.