Microcensus
The microcensus is an official statistical survey that has been carried out annually since 1957 using a representative sample of one percent of the German population and households. In total, about 370,000 households with 810,000 household members participate in the survey. Designed as a multiple-subject survey, the microcensus provides important statistical information on the population structure, on the economic and social situation of the population, families and households, on the employment market, on the occupational structure and training of the workforce, and on living conditions.
There is an obligation to provide information for the majority of questions.
Given its broad range of variables and its large sample size, the microcensus forms an appropriate database for analysing small subpopulations, such as individual migrant or occupational groups. Detailed regional analyses, for example with regard to the life chances of different social groups, are gaining significance in scientific research. Results of regional analyses can, for example, be displayed at the level of regional adjustment layers, which are regional units of 500,000 inhabitants on average. In addition to cross-sectional analyses, the high continuity of the survey design allows for analyses over time (trend analyses), which can reveal historical developments. Because the microcensus is designed as a rotating panel, some survey years can also be used for panel analyses. The microcensus is also suitable for international comparisons, as various subjects are adapted to international standards (e.g. the labour force concept).
Characterized by its large sampling size, its variety of subjects and temporal continuity, the microcensus constitutes an important data source for the social sciences.
Available microdata
The following survey years of this statistic can be used via the different ways of access. Please note that at the safe centre, the municipality code for cities in the state Bavaria is only available as a pseudonymised code.
Please note that only survey years with English metadata are listed below. Please check the German language page for all available data.
Microcensus (EVAS 12211)
| Reporting year | Off-site use: Campus File (CF) | Off-site use: Public Use File (PUF) | Off-site use: Scientific Use File (SUF) | On-site use: Safe Centre (GWAP), Remote Execution (KDFV) |
|---|---|---|---|---|
| 2020 | - | - | - | available (Metadata/DOI) |
| 2019 | - | - | - | available (Metadata/DOI) |
| 2018 | - | - | - | available (Metadata/DOI) |
| 2017 | - | - | - | available (Metadata/DOI) |
| 2016 | - | - | - | available (Metadata/DOI) |
| 2015 | - | - | - | available (Metadata/DOI) |
| 2014 | - | - | - | available (Metadata/DOI) |
| 2013 | - | - | - | available (Metadata/DOI) |
| 2010 | available (Metadata/DOI) | - | - | - |
FAQ - Frequently asked questions
General Questions
How do I gain access to the microcensus data?
The microcensus data can be requested from the Research Data Centres of the Statistical Offices of the Federation and the Federal States. Data usage is subject to a fee. You can find more information about the data offering, data access (https://www.forschungsdatenzentrum.de/en/request) and terms of use on the websites of the Research Data Centres of the Federal and State Statistical Offices.
Who can I contact if I have questions about the microcensus?
You can find information about the entire data offering and data access under the orange button on the right side of the screen and on the respective product pages of the individual survey years and access methods. Click on the products that interest you in the overview table above. You will find the questionnaires, the codebook, the quality report and the metadata reports on the German language page.
In the metadata report part 1, you will find basic information about the survey and methodology. This includes, among other things, the upscaling factors used in the microcensus data as well as an overview of the classifications used. Part 2 of the metadata report refers to the product and contains, among other things, details about data preparation, the coding of missing values and the anonymisation measures applied. It also includes a chapter on the comparability of the characteristics over time.
Detailed information and evaluation aids for the Scientific Use File of the microcensus – including tools for implementing social science concepts, variable-time-point matrix, linking MZ cross-sectional surveys to panels – can be found on the product pages of the respective microcensus survey years. They are also available on the Microdata Information System (MISSY) of GESIS.
If you have further questions, you can contact the RDC locations responsible for the microcensus, NRW and Federation.
What methodological changes were implemented starting 2020?
From the survey year 2020, in addition to the Labour Force Survey (LFS), which has been integrated since 1968, the previously separate Survey on Income and Living Conditions (SILC) is also integrated into the microcensus survey. The Labour Force Survey has been supplemented since 2020 by a sub-annual repeat survey. Furthermore, starting from the survey year 2021, the survey on the use of Information and Communication Technology in private households (ICT) is also integrated.
Starting from 2020, in addition to personal (CAPI), telephone (CATI) and postal interviews, web interviews (CAWI) are also used as a new survey method.
Further information can be found in the publication „Die Neuregelung des Mikrozensus ab 2020“ (“WISTA - Wirtschaft und Statistik”, 6/2019; available only in German).
What is covered in the core programme of the microcensus?
The microcensus is a survey on several topics. The survey consists of a core question programme and additional survey parts.
The questions of the core programme relate to the following subject areas:
- Details about the household (such as household size), the accommodation and the individual (such as gender, year of birth, marital status, nationality)
- Employment, profession
- School, studies
- Income and employment situation, job search
- Training and further education
- Childcare
- Pension provision
More detailed information on the survey contents can be found, among other places, in the metadata reports Part I „Statistic“ for 2005-2019 and from 2020 onwards. These reports are available on the German language page, on the product pages of the individual survey years. Click on the products of interest in the overview table there to find them.
What classifications are there in the microcensus?
All occupations are categorised using the database of documentation key figures (DKZ), the classification of occupations (KldB), and the International Standard Classification of Occupations (ISCO).
For the economic sectors (branches), within which the aforementioned activities were carried out, the classification of economic sectors (WZ) is applied.
The highest level of school or vocational education is classified using the education scale ISCED (International Standard Classification of Education).
What should be considered when reconciling the data with the tables from the GENESIS-Online database and other official publications?
To replicate the household tables and tables of living arrangements, the data must be filtered by main residence households. The tables from the GENESIS-Online database and the statistical report on the final results of the microcensus only consider main residence households. These can be selected using the variable th0201h. In the survey parts SILC and ICT, only main residence households are surveyed.
For tables in which age is displayed, the age variable tpalter_1 is used. For publications from the SILC subsample, rb081 or rb082 is used.
For gender, tpgeschlecht is used. For SILC, rb090 is used. In the survey characteristic ab0200p, gender is recorded with four possible values. However, values 3 and 4 are significantly overestimated and show much higher proportions than in the population registers. The survey characteristic itself is not used for official publications. For a person who has indicated "Diverse" or "No statement according to birth register" for the variable ab0200p, the value "male" or "female" was randomly assigned in the variable tpgeschlecht.
The survey characteristics from SILC are not plausibilised and are not used directly for publications. For delivery to the EU and official publications, the SILC target variables db100 to ps102 are used. These are also recommended for use in the RDC. To be able to replicate published tables, the SILC target variables must be used. An overview of the SILC target variables can be found in the key directory, which you can find on the product pages of the individual waves.
If the actual total population is to be represented, this can be selected using the variable tpbevlkrg in c(1,2,4). This filter excludes persons at secondary residences, i.e. only persons at the main residence and in communal accommodations are considered.
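The two filters described above can be sketched on a handful of invented records. The variable names th0201h and tpbevlkrg and the filter values (1, and 1, 2, 4) are taken from the text; the record layout, the function names, the sample values and the meaning attached to any other code (for example th0201h = 0 or tpbevlkrg = 3) are assumptions made purely for illustration.

```python
# Invented person records. Only the variable names and the filter values
# (th0201h = 1; tpbevlkrg in 1, 2, 4) come from the text above.
records = [
    {"idpersx": "A1", "th0201h": 1, "tpbevlkrg": 1},
    {"idpersx": "A2", "th0201h": 1, "tpbevlkrg": 3},  # assumed: person at a secondary residence
    {"idpersx": "B1", "th0201h": 0, "tpbevlkrg": 4},  # assumed: not a main residence household
    {"idpersx": "C1", "th0201h": 1, "tpbevlkrg": 2},
]


def main_residence_households(rows):
    """Keep persons living in main residence households (th0201h = 1)."""
    return [r for r in rows if r["th0201h"] == 1]


def total_population(rows):
    """Keep persons counted in the total population (tpbevlkrg in 1, 2, 4)."""
    return [r for r in rows if r["tpbevlkrg"] in (1, 2, 4)]


hws_ids = [r["idpersx"] for r in main_residence_households(records)]
population_ids = [r["idpersx"] for r in total_population(records)]
```

The two filters answer different questions and are therefore kept as separate functions: one reproduces household tables, the other approximates the total population.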
Questions about data access
Through which access methods are the microdata of the microcensus available?
The microcensus data can be used through various ways of access with different levels of anonymity.
The formally anonymised data of the microcensus can be used in full via the On-Site ways of access Remote Execution and Safe Centre.
Via the Off-Site ways of access, the microcensus can be used as factually anonymous Scientific Use File (SUF) and absolutely anonymous as Public Use File (PUF) or Campus File (CF).
Additionally, the microcensus can be used via remote access. With the Remote Scientific Use File (Remote SUF), the data is also factually anonymous, but with a higher informational content (e.g., full sample and regionalised down to the level of regional adjustment layers) than with an Off-Site SUF.
Which ways of access are available for the individual waves of the microcensus can be found in the overview further up on this page.
What is the difference between the SUF and the On-Site material of the microcensus?
The data differ in terms of the range of characteristics and the information content.
Via the On-Site ways of access Remote Execution and Safe Centre, the complete microcensus can be used formally anonymous. That is, the entire 1% sample is available, the deepest spatial level of aggregation contained is the municipalities. For the federal state of Bavaria, the municipal level is only available pseudonymised at the Safe Centre.
The Scientific Use File (SUF) is a 70% subsample from the microcensus dataset. Selected characteristics have been removed, pseudonymised, or aggregated. The lowest spatial level of disaggregation included is the government districts (NUTS 2). Information on the implemented anonymisation measures can be found on the German language page in the metadata report part 2 of the respective survey years. The respective feature set can be found in the SUF data handbooks and the key directories of the other ways of access. Click on the products of interest in the overview table above to find out more.
Questions about upscaling and filtering
How do I know which upscaling factor is the right one for my evaluation?
In the microcensus, various upscaling factors are available, which are scaled to the entire resident population. In this process, upscaling is carried out depending on various known demographic key figures for the population, such as age groups, gender, nationality, and regional distributions. Detailed information on upscaling from 2020 onwards can be found in Schmidt and Stein 2021 (available only in German).
The “correct” upscaling factor depends on which subsamples the evaluated characteristics originate from. Detailed information on the individual upscaling factors can be found on the German language page in the metadata report part 1 (section “Hochrechnungen”) and the metadata report part 2 (“Wann wird welcher Hochrechnungsfaktor verwendet?”). You can find the metadata reports of the individual survey years on the respective product pages. Click on the products of interest in the overview table there to find out more.
What should be considered when comparing the data with the publications of the Federal Statistical Office?
For official publications, filtering is done for main residence households (th0201h = 1).
In LFS, SILC and ICT, only main residence households are surveyed.
For evaluations at the household level, filtering is done for the main income earner (th0501h = 1).
Depending on the evaluation, the following additional filters are applied:
- The variable tpalter_1 is used for age. For SILC publications, rb081 or rb082 is used for this purpose.
- The variable tpgeschlecht is used for gender. In this case, ab0200p = 3 or 4 is recoded into "male" or "female" with a 50% probability each. For SILC publications, rb090 is used for this purpose.
The survey characteristics from SILC are not plausibilised and are not used directly for publications. For delivery to the EU and official publications, the SILC target variables db100 to ps102 are used. These are also recommended for use in the RDC. To replicate published tables, the SILC target variables must be used. An overview of the SILC target variables can be found in the codebook.
If the actual total population is to be represented, this can be selected using the variable tpbevlkrg in c(1,2,4). This filter excludes persons at secondary residences, i.e. only persons at the main residence and in communal accommodations are considered.
Which filters does the official statistics use for the additional housing programme in the new microcensus from 2020?
Results for the additional housing programme include exclusively:
- Main residence households: th0201h = 1
- If the number of households is to be evaluated instead of the number of persons: th0501p = 1
- In buildings with residential purposes excluding residential homes: ba0100h in (1 2 3)
- With only one household in the apartment: tw0022w = 1
Results on rents and rental burden additionally include exclusively main tenant households: ba1901h = 3
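The filter conditions listed above can be combined as in the following sketch. The variable names and values are those given in the list; the function name, its two switches and the sample records are hypothetical and only serve to show how the conditions stack.

```python
def housing_programme_filter(rows, count_households=False, rents=False):
    """Apply the housing-programme filters listed in the text.

    count_households: additionally keep one record per household (th0501p = 1).
    rents: additionally keep main tenant households only (ba1901h = 3).
    """
    selected = []
    for r in rows:
        keep = (
            r["th0201h"] == 1               # main residence households
            and r["ba0100h"] in (1, 2, 3)   # residential buildings excluding residential homes
            and r["tw0022w"] == 1           # only one household in the apartment
        )
        if count_households:
            keep = keep and r["th0501p"] == 1
        if rents:
            keep = keep and r["ba1901h"] == 3
        if keep:
            selected.append(r)
    return selected


# Invented sample records (all values are illustrative).
sample = [
    {"id": "a", "th0201h": 1, "th0501p": 1, "ba0100h": 1, "tw0022w": 1, "ba1901h": 3},
    {"id": "b", "th0201h": 1, "th0501p": 0, "ba0100h": 1, "tw0022w": 1, "ba1901h": 3},
    {"id": "c", "th0201h": 1, "th0501p": 1, "ba0100h": 2, "tw0022w": 1, "ba1901h": 1},
    {"id": "d", "th0201h": 1, "th0501p": 1, "ba0100h": 4, "tw0022w": 1, "ba1901h": 3},
]

persons = [r["id"] for r in housing_programme_filter(sample)]
households = [r["id"] for r in housing_programme_filter(sample, count_households=True)]
tenants = [r["id"] for r in housing_programme_filter(sample, count_households=True, rents=True)]
```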
How are households selected in the microcensus data?
From 2020 onwards (On-Site and Remote Access) and in the Scientific-Use-Files from 2015, IDs for person (idpers, idpersx), household (idhh, idhhx), and selection district identification (idawb, idawbx) are already included in the dataset.
In the IDs of the longitudinal data (IDs without x at the end), there may be duplicates of the IDs due to repeat surveys of the LFS part from 2020 and due to carry-over interviews until 2019. The IDs with x at the end are duplicate-free, i.e. unique, and can be used for cross-sectional data. The repeat surveys must be included in the creation of annual results. This does not apply to LFS structure variables, which are only collected in questionnaires 2 and 3.
When linking over several survey years, it may be desired to consider each household only once per survey year. For this purpose, the removal of repeat surveys awbauswahlteil = 4 is recommended. The formation of these IDs is carried out from 2020 for the respective units by concatenating the following variables:
- idpers: land awbnummerfremd hhnummerfremd pernr
- idpersx: land awbnummerfremd tpberichtsquartal hhnummerfremd pernr
- idhh: land awbnummerfremd hhnummerfremd
- idhhx: land awbnummerfremd tpberichtsquartal hhnummerfremd
- idawb: land awbnummerfremd
- idawbx: land awbnummerfremd tpberichtsquartal
Spaces are replaced by zeros in the formation of the identifiers.
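The construction of the identifiers can be sketched as follows. The component variables and their order follow the list above, as does the replacement of spaces by zeros; the field widths and the sample values are invented, and the assumption that each component is stored as a space-padded string is ours.

```python
# Component variables per identifier, in the order given in the text.
ID_PARTS = {
    "idpers":  ("land", "awbnummerfremd", "hhnummerfremd", "pernr"),
    "idpersx": ("land", "awbnummerfremd", "tpberichtsquartal", "hhnummerfremd", "pernr"),
    "idhh":    ("land", "awbnummerfremd", "hhnummerfremd"),
    "idhhx":   ("land", "awbnummerfremd", "tpberichtsquartal", "hhnummerfremd"),
    "idawb":   ("land", "awbnummerfremd"),
    "idawbx":  ("land", "awbnummerfremd", "tpberichtsquartal"),
}


def build_id(record, id_name):
    """Concatenate the component variables and replace spaces with zeros."""
    raw = "".join(str(record[name]) for name in ID_PARTS[id_name])
    return raw.replace(" ", "0")


# Invented record; the field widths are assumptions for illustration.
record = {
    "land": "05",
    "awbnummerfremd": " 1234",
    "tpberichtsquartal": "2",
    "hhnummerfremd": " 7",
    "pernr": " 1",
}

ids = {name: build_id(record, name) for name in ID_PARTS}
```

Because the identifiers ending in x include the reporting quarter, two interviews of the same household in different quarters receive different idhhx values but the same idhh value.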
Why do the extrapolated total case numbers differ between different variables or between different extrapolation factors?
The differences in the extrapolated total case numbers arise for several reasons:
- Differences between annual and quarterly extrapolation factors may arise because the annual extrapolation factors tend to include more known key figures for the population (e.g. age groups, gender, nationality, and regional distributions) than the quarterly extrapolation factors. More detailed information can be found in the metadata report Part I "Statistic", Chapter 2.6 (only on the German language page, click on the products of interest in the overview table there) and in the publication "Die Hochrechnung im Mikrozensus ab 2020" ("WISTA - Wirtschaft und Statistik", 6/2021, only in German).
- Discrepancies in the LFS sample arise from European consistency requirements. These require that quarterly and annual results of the LFS subsample must be consistent with each other. These requirements were implemented in preference to the national requirement (result consistency between the individual subsamples). To meet the European requirements, the LFS structural features (i.e. LFS annual results) are extrapolated to the microcensus core quarterly average for official publications. The MZ core annual result does not correspond to the quarterly average but is extrapolated independently.
Due to the drawing of the 70% subsample, there are also smaller discrepancies between the extrapolated sums of the MZ-SUF and the figures published in the specialist series of the Federal Statistical Office or the original microcensus data.
Questions about the integrated survey parts
How is the rotation scheme of households from the LFS subsample structured?
To better analyse changes in the labour market throughout the year, the rotation scheme in the microcensus was adjusted from 2020 onwards. Unlike the core and SILC programmes, which are surveyed once a year, the LFS part rotates at shorter intervals.
Households selected for the LFS part are surveyed using a 2-(2)-2 scheme. This means that households are surveyed in two consecutive quarters, then pause for two quarters, and then are surveyed for two consecutive quarters again. The questionnaires used are in the order 3-4-2-4. A more detailed description can be found in the publication „Die Neuregelung des Mikrozensus ab 2020“ („WISTA - Wirtschaft und Statistik“, 6/2019, only in German).
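The 2-(2)-2 scheme can be illustrated by listing the interview quarters of a single household. The pattern of two quarters in, two quarters out, two quarters in and the questionnaire order 3-4-2-4 are taken from the text; the function name is hypothetical, and mapping the four questionnaires one-to-one onto the four interviews in sequence is our reading of the text rather than a documented rule.

```python
# Offsets (in quarters) of the four interviews relative to the first one:
# two consecutive quarters, a pause of two quarters, two consecutive quarters.
QUARTER_OFFSETS = (0, 1, 4, 5)
# Questionnaire used at each interview, in the order given in the text.
QUESTIONNAIRES = (3, 4, 2, 4)


def lfs_schedule(start_year, start_quarter):
    """Return (year, quarter, questionnaire) for each of the four interviews."""
    first = start_year * 4 + (start_quarter - 1)
    schedule = []
    for offset, questionnaire in zip(QUARTER_OFFSETS, QUESTIONNAIRES):
        year, quarter_index = divmod(first + offset, 4)
        schedule.append((year, quarter_index + 1, questionnaire))
    return schedule


schedule_q1 = lfs_schedule(2020, 1)
schedule_q4 = lfs_schedule(2020, 4)
```

A household that enters in the first quarter of 2020 would, under these assumptions, be interviewed in Q1 and Q2 of 2020 and again in Q1 and Q2 of 2021.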
Why are some variables less represented in the integrated survey parts than others?
With the implementation of the new features from the 2020 microcensus, sub-samples were introduced, which result in not all questions being asked to all households to be surveyed.
For the microcensus, 1% of the German population is surveyed. All respondents in private households are asked the questions of the microcensus core programme, regardless of their affiliation with a sub-sample.
In addition, according to the Microcensus Act (in German), surveys on labour market participation (LFS) are conducted in up to 45% of the selected districts. These also receive questions on the additional programmes, except for housing, for which everyone is surveyed.
Up to 12% of the microcensus sample receive the survey on income and living conditions (SILC). Since 2021, the statistics on the use of information and communication technologies (ICT) have also been integrated into the microcensus. This comprises up to 3.5% of the respondents. The realised selection samples may deviate from this. All sub-samples are non-overlapping. Accordingly, LFS, SILC, and ICT variables are less frequently represented than core variables.
It should also be noted that some questions are voluntary, which can result in a lower number of observations. The coronavirus-related restrictions on the quality of the microcensus in the survey years 2020 and 2021 also contribute to missing values in the surveys for 2020 and 2021.
The publication "Die Neuregelung des Mikrozensus ab 2020" ("WISTA - Wirtschaft und Statistik", 6/2019, only in German) contains further information on the survey components. Information on the subsamples as well as the number of voluntary and mandatory questions per survey programme can be found in the quality report on the microcensus. The quality reports can be found with the metadata of the respective reporting years. To do this, click on the products of interest in the overview table above. Please also see the German language page for more information.
Questions on Definitions
How is migration background defined in the microcensus?
In the microcensus, migration background is based on citizenship at the time of birth. Thus, a person is considered to have a migration background if they themselves or at least one parent were not born with German citizenship. This definition is only conditionally internationally compatible. Descendants of expellees who immigrated before 1950 are considered persons without a migration background.
Detailed information on this topic can be found in the publication "Die Umsetzung des Konzepts ‚Einwanderungsgeschichte‘ im Mikrozensus 2022" ("WISTA - Wirtschaft und Statistik", 4/2023, only in German).
How is immigration history defined in the microcensus?
On the recommendation of an expert commission, the concept of immigration history was included in the microcensus. This is based on migration experience. This definition is compatible with the definition of the UN and Eurostat, according to which all persons who immigrated before 1950 are not considered immigrants and their children are not considered descendants of immigrants. In essence, the concept only captures persons with bilateral immigration history (i.e., from both parents). However, in order to cover as many needs of users as possible with the new definition and to enable better comparison with the migration background, the category of those with so-called „unilateral immigration history“ (i.e., persons born in Germany who have only one parent who immigrated since 1950) is listed separately.
Detailed information on this topic can be found in the publication "Die Umsetzung des Konzepts ‚Einwanderungsgeschichte‘ im Mikrozensus 2022" ("WISTA - Wirtschaft und Statistik", 4/2023, only in German).
Questions about specific contents
At which regional levels can the data be reasonably evaluated?
The variables from the core material can be evaluated from NUTS 0 down to the regional subgroup. In the LFS subsample, evaluations are possible down to the regional adjustment layer. The SILC and ICT subsamples can be evaluated down to the NUTS 1 level (federal states). If variables from these subsamples are evaluated at finer regional levels, distortions may occur.
Why is there inconsistent personal information between different survey parts?
The different survey components have different extraction dates. For example, data extraction from SILC takes place much earlier due to EU delivery obligations than the extraction of the core material. Therefore, it is possible that corrections to personal characteristics (such as tpalter_1) are made in the core material that are no longer subsequently corrected in the corresponding SILC target variables (e.g. rb081). Thus, it can occasionally happen that a person in the SILC subsample is assigned to a different household (hb030) than in the core material (idhh).
Why are there people in the data from 2022 onwards for whom only the SILC or ICT variables are filled, but not the core variables?
Here too, the different extraction dates are responsible for the inconsistencies. For SILC and ICT, data extraction takes place much earlier than for the core material. Thus, it occasionally happens that data records that are included in the SILC or ICT part are subsequently filtered out as implausible cases in the core part.
What are HWS households?
HWS households are main residence households (German: HauptWohnSitz-Haushalte). Publications in official statistics are generally filtered to main residence households. Depending on the research project, it may be advisable to proceed in the same way.
A private household is a main residence household if at least one household member who is 16 years or older has their main residence there. Accordingly, people can also have a secondary residence at main residence households and be counted both at their main and secondary residence. If you want to avoid this double counting, you can filter the TPBevlkrg feature by the values 1, 2 and 4. This way, you can also include people in communal accommodations. Communal accommodations, such as nursing homes or monasteries, do not belong to private households and thus do not count as main residence households.
Further information can be found in the publication "Haushalte in der Berichterstattung des Mikrozensus ab 2020" ("WISTA - Wirtschaft und Statistik", 3/2020, only in German).
How can information about the mother living in the household and the father living in the household be assigned to a surveyed person?
From 2020 onwards, the personal number of the mother or father can be determined using the household identification number (idhhx) and the characteristics tl0702p and tl0802p.
The corresponding individual data set of the mother/father can be found using the personal number pernr. The information can then be passed on to the respective reference person across units of observations.
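The lookup described above can be sketched as follows. The variables idhhx, pernr, tl0702p, tl0802p and tpalter_1 are named in the text; which of the two pointer variables refers to the mother and which to the father is an assumption here (tl0702p is treated as the mother, tl0802p as the father), as are the use of None for "no parent in the household", the function name and the sample data. Please check the key directory for the actual coding.

```python
# Invented person records for two households.
persons = [
    {"idhhx": "H1", "pernr": 1, "tpalter_1": 41, "tl0702p": None, "tl0802p": None},
    {"idhhx": "H1", "pernr": 2, "tpalter_1": 43, "tl0702p": None, "tl0802p": None},
    {"idhhx": "H1", "pernr": 3, "tpalter_1": 9, "tl0702p": 1, "tl0802p": 2},
    {"idhhx": "H2", "pernr": 1, "tpalter_1": 35, "tl0702p": None, "tl0802p": None},
    {"idhhx": "H2", "pernr": 2, "tpalter_1": 5, "tl0702p": 1, "tl0802p": None},
]


def attach_parent_value(rows, pointer_var, source_var, new_var):
    """Copy source_var from the parent's record to the child's record.

    The parent is found within the same household (idhhx) via the person
    number stored in pointer_var.
    """
    index = {(r["idhhx"], r["pernr"]): r for r in rows}
    result = []
    for r in rows:
        parent = index.get((r["idhhx"], r[pointer_var]))
        result.append({**r, new_var: parent[source_var] if parent else None})
    return result


# Assumption: tl0702p points to the mother, tl0802p to the father.
with_mother = attach_parent_value(persons, "tl0702p", "tpalter_1", "mother_age")
with_both = attach_parent_value(with_mother, "tl0802p", "tpalter_1", "father_age")
```

Keying the lookup on the pair (idhhx, pernr) matters because person numbers are only unique within a household.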
How can the number of children in the household be determined?
First, the data set must be grouped by the household identification number (idhhx). Then, the characteristic tl0101p = 3 can be used to determine whether there is a child in the household (and, if necessary, filtered by age using tpalter_1). This information can then be transferred to the entire group.
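A minimal sketch of these steps: group by idhhx, count the persons with tl0101p = 3, and write the count back to every household member. The variable names and the value 3 come from the text; the optional age limit, the name of the new variable n_children, the other tl0101p codes and the sample data are assumptions for illustration.

```python
from collections import Counter

# Invented person records; tl0101p = 3 marks a child, other codes are made up.
persons = [
    {"idhhx": "H1", "pernr": 1, "tl0101p": 1, "tpalter_1": 41},
    {"idhhx": "H1", "pernr": 2, "tl0101p": 3, "tpalter_1": 19},
    {"idhhx": "H1", "pernr": 3, "tl0101p": 3, "tpalter_1": 9},
    {"idhhx": "H2", "pernr": 1, "tl0101p": 1, "tpalter_1": 67},
]


def add_child_count(rows, max_age=None):
    """Count children per household and attach the count to every member."""
    counts = Counter()
    for r in rows:
        is_child = r["tl0101p"] == 3
        if is_child and (max_age is None or r["tpalter_1"] <= max_age):
            counts[r["idhhx"]] += 1
    # Counter returns 0 for households without any matching child.
    return [{**r, "n_children": counts[r["idhhx"]]} for r in rows]


all_children = add_child_count(persons)
minor_children = add_child_count(persons, max_age=17)
```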
Are comparisons between the microcensus data before and after the 2020 methodology change possible in terms of subject matter and area?
Given the significant methodological changes, comparability with previous years is only possible to a limited extent. Nevertheless, there is considerable continuity in many variables, particularly in the core programme. The survey years 2020 and, to some extent, 2021 were also characterised by pandemic-related contact restrictions and methodological-technical innovations, which led to limited sample coverage and thus limited interpretability of year-on-year results. See also Chapter 2.1 in the Metadata Reports Part II "Product" on the German language page; to find them, click on the survey years of interest in the overview table there.
From 2020 onwards, variable names follow a different system and no longer start with EF. Names of survey characteristics consist of two letters and four digits. This is followed by the identifier of the survey level (p: person, h: household, l: living arrangement). If necessary, further subdivisions follow with the letter u. Typified characteristics start with t. The key directories of the individual survey years refer to the old designations, provided that a variable was also present in years before 2020.
In the 2020 microcensus, only the KldB is included. Why is the ISCO classification of occupations missing?
For the survey year 2020, no ISCO variables are included, as the official statistics refrained from publication due to quality concerns in the 2020 microcensus.
For most survey years of the microcensus, syntaxes for implementing the social science concepts ESeG (European Socioeconomic Groups), ESeC (European Socioeconomic Classification), ISEI (International Socioeconomic Index of Occupational Status) and CASMIN (Education Classification) for the software packages SPSS and Stata are available on the Microdata Information System (MISSY) of GESIS. As these are based on the ISCO, they are not provided for the survey year 2020.
Why does the number of surveys (approximately 995,500) exceed 1% of the German population?
The actual number of surveys is greater than the sample size of 1%, as 7/9 of the LFS share is surveyed twice in the survey year due to the intra-annual repeat survey.
More information on sampling at the level of selection districts and the number of surveys can be found in the Quality Report of the Mikrozensus 2023 (see German language page) under Chapter 3.
The Federal Statistical Office has changed the calculation of the at-risk-of-poverty rate. Can I replicate the old calculation using RDC data?
Until 2023, two different data sources were used to determine the at-risk-of-poverty rate measured against the median income at the federal level: the microcensus core programme and EU-SILC. This resulted in different levels of the at-risk-of-poverty rate, mainly because household net income is defined and recorded differently in the two sources. The official main data source for measuring income and the derived at-risk-of-poverty rate in the member states of the European Union is EU-SILC.
With the RDC On-Site data, at-risk-of-poverty rates can be calculated both on the basis of the microcensus core programme and on the basis of EU-SILC. Since 2025, the official statistics have published the rates for Germany on the basis of the EU-SILC data to communicate uniform values to the general public. The calculation based on EU-SILC achieves a Europe-wide output harmonisation, which enables a comparison across the EU.
Note: In the Statistics Portal (only in German), at-risk-of-poverty rates from the MZ core based on the state median can still be retrieved.
Due to the detailed survey of individual income components in EU-SILC, income is specified more precisely by respondents (for example, including state benefits such as child benefit, child allowance, BAföG, care allowance or housing benefit) and non-monthly income or smaller amounts are also recorded. While the figures in the core material are given in 24 classes, the EU-SILC data are recorded rounded to the euro. This is a decisive qualitative advantage of the income figures in EU-SILC over the microcensus core programme.
To calculate the at-risk-of-poverty rate, all figures for household income are summed to determine the total household income. Subsequently, a needs weighting is carried out for all persons in the household. In this process, the main income earner is given the weight 1. Other adults and children aged 14 and over are given the weight 0.5, children under 14 the weight 0.3. The disposable household income is then divided by the sum of the weights. The resulting quotient is referred to as equivalent income or needs-weighted per capita income per household member. Subsequently, the at-risk-of-poverty threshold is determined, which is 60% of the median of equivalent incomes. The at-risk-of-poverty rate describes the proportion of persons with an equivalent income below this at-risk-of-poverty threshold.
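The calculation steps can be followed in a small worked example. The weights (1, 0.5, 0.3), the age limit of 14 and the 60% threshold are taken from the text; the household data, the function names and the convention that the first listed person is the one who receives weight 1 are invented. The sketch is unweighted: a real evaluation would additionally apply the appropriate upscaling factors and the SILC target variables.

```python
from statistics import median


def equivalence_weight_sum(ages):
    """Sum of needs weights; the first person listed receives weight 1."""
    total = 1.0
    for age in ages[1:]:
        total += 0.5 if age >= 14 else 0.3
    return total


def at_risk_of_poverty(households):
    """Return (threshold, rate) based on person-level equivalent incomes."""
    person_incomes = []
    for hh in households:
        equivalent = hh["income"] / equivalence_weight_sum(hh["ages"])
        # Every household member is assigned the same equivalent income.
        person_incomes.extend([equivalent] * len(hh["ages"]))
    threshold = 0.6 * median(person_incomes)
    below = sum(1 for income in person_incomes if income < threshold)
    return threshold, below / len(person_incomes)


# Invented households: disposable income and ages of the members.
households = [
    {"income": 3000, "ages": [40, 38, 10, 5]},  # weights 1 + 0.5 + 0.3 + 0.3
    {"income": 1000, "ages": [70]},             # weight 1
    {"income": 4500, "ages": [50, 48, 16]},     # weights 1 + 0.5 + 0.5
    {"income": 1000, "ages": [30, 2]},          # weights 1 + 0.3
]

threshold, rate = at_risk_of_poverty(households)
```

In this invented example the median equivalent income is about 1,429 euros, so the threshold is about 857 euros, and the two persons in the last household (equivalent income of about 769 euros) fall below it.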
More detailed information can be found on the homepage of the Federal Statistical Office.