WORKING PAPER:

MixingPot.ai Survey 1: “Quick Assessment of Non-public Information Spread” – Overview and Initial Insights

BACKGROUND

The first pilot survey by MixingPot.ai was conducted on the night of April 17-18, 2019. A small, paid sample of respondents was recruited by SurveyMonkey via the SurveyMonkey Audience service. The survey was titled “Quick Assessment of Non-public Information Spread” and contained only 5 questions, which were accompanied by a vector of demographic and socio-economic contextual data provided by SurveyMonkey. The code used to analyze these data is available on GitHub. The anonymized (masked) raw data are available by request.

This pilot survey had the following goals:

  • To explore SurveyMonkey as a platform for designing surveys pertaining to business-based monoculture, related trends in other fields, and the practice of so-called “indoctrination”
  • To explore SurveyMonkey as a platform for recruiting audiences
  • To be a first iteration in the evolutionary process of designing questions pertaining to complex, non-public topics
  • To provide a first, very rough, estimate of the spread of non-public (censored, repressed, obscured, etc.) information about the business-based monoculture and/or “indoctrinations”

OVERVIEW

The survey was conducted between 7:30 p.m. EDT (11:30 p.m. GMT) on April 17, 2019 and 2:30 a.m. EDT on April 18, 2019. MixingPot.ai paid for a “Standard” SurveyMonkey membership and 63 responses at a rate of approximately $1 USD per response. In exchange for their participation, each respondent selected a charity to receive a $0.50 donation. Further information on SurveyMonkey audiences may be found here.

Figure 1: Responses by Date and Hour

The sample was too small to receive automated census-based panel balancing by SurveyMonkey; however, sample weights were manually calculated based on demographic attributes. Results from the raw and balanced data follow. Future analysis of the current data sets may use simulation to draw additional insights from the data, and future surveys should focus on increasing sample size and on improving question wording, design specifications, and completeness.

The sample was recruited subject to the following constraints:

  • Must be 18+ years of age
  • Physically located in the United States, regardless of citizenship

Essentially, the overall adult US population was sampled, in contrast to the targeted design of our 2nd survey. The selection criteria, along with respondent locations, are shown in Figure 2, below.

Figure 2: Respondent Locations and Ages

ANALYSIS

Analytical Strategy

Our analytical strategy was to:

  • First, describe the survey questions and raw responses
  • Second, describe the weighted results, to account for imbalances in our sample
  • Third, produce rough estimates of information spread, based on the weighted sample

Survey Questions

Respondents were asked 5 custom questions, which were supplemented with previously collected demographic profile data provided by SurveyMonkey. The questions were as follows:

  1. Which (if any) of these cities/areas have you worked in? (Atlanta, Baltimore, San Francisco, Any city with an Ivy League university)
  2. In the time since your 1st day of college (or other post-secondary formal education) through yesterday, do you recall hearing people use the word “indoctrination”?
  3. Have you ever worked in California? This includes temporary assignments in California by your regular employer, if you work in another location for an employer with a presence in California. If so, were you
    1. approached by any individual or group that stated or strongly implying connection with a secret group or network within business, medicine, law enforcement, or other “white collar” fields? and/or
    2. Indoctrinated. If you aren’t sure, then this does not apply to you.
  4. If you heard about “indoctrination” in connection with a specific sector, which was it?
  5. Question 5 offered a text box for respondents who wished to provide contact information or additional comments.

Question 1 was optional and supplemented geographic data from the demographic profiles. The intent was to “flag” (denote) users in areas with special features known to the researchers, to conduct additional geo-based hypothesis testing.

Only Question 2 was required, and survey logic was used to direct those answering “no” to the end of the survey. Questions 3 and 4 were required for those answering “yes” to Question 2. Question 5 was optional, and results will be omitted from reporting for confidentiality.

Results

Our (raw) sample was disproportionately young relative to the overall US population, as shown in Figure 3, which motivated the use of sample weights.

Figure 3: Age Range Proportions in Raw Data

The results of Question 1 (a custom-coded geo-history variable, not useful to a broad audience but intended for an internal hypothesis test) were very limited in sample size (n = 13), with a high skip count (n = 50) as a proportion of the 63 responses. However, the SurveyMonkey demographic profile provided insights on respondents' locations (see Figure 2).

Questions 2-3 were the core of the survey. Question 2 was required and was the end of the survey for those without self-reported exposure to some amount of related non-public information. It allowed for a distinction between indirect or plausibly unrelated general-culture references to non-public topics (e.g., indoctrination) and directly related information. Only the 6 respondents with self-reported direct knowledge proceeded to Question 3. A review of 2 free-response values to Question 2 indicated only indirectly related knowledge, and those records were flagged so that they could be removed from consideration when appropriate.

Question 3 (Q3) checked whether the respondent’s knowledge stemmed from or included California, where the monoculture has traditionally been most powerful and able to conduct illegal indoctrinations of workers. 2 of 6 respondents to Q3 replied in the affirmative. Q3 allowed for a distinction between being approached with and without indoctrination, but 100% of those with direct knowledge (n = 2) were victims of indoctrination.

Table 1: Summary of Key Variables

According to the U.S. Census Bureau’s Population Division, there were over 252 million adults (age 18 or greater) living in the US in 2017. This represented about 77% of the overall US population, a figure which was stable within 1 percentage point for the 10-year look-back period. If we apply the 77% figure to the newest population estimates for 2019 (approximately 328 million total, projected to rise over 329 million by the end of the year), the current adult population is approximately 253 million. If the entire adult population were exposed to non-public data at a rate equivalent to that in our survey (2/63, or about 3.17%), then over 8 million adults would be in the affected population alone.
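The arithmetic behind this extrapolation can be reproduced in a few lines of R. This is a minimal sketch that uses only the figures quoted in the paragraph above; the variable names are illustrative and do not come from the project's main.R.

```r
# Back-of-envelope extrapolation using only figures quoted in the text.
total_pop_2019 <- 328e6  # approximate 2019 U.S. population
adult_share    <- 0.77   # share of the population aged 18+
n_affirmative  <- 2      # respondents answering Q3 in the affirmative
n_sample       <- 63     # total responses

adult_pop    <- total_pop_2019 * adult_share  # roughly 253 million adults
raw_rate     <- n_affirmative / n_sample      # unweighted sample rate
affected_est <- adult_pop * raw_rate          # implied affected adults

# Report in millions and as a percentage
round(c(adult_pop = adult_pop, affected_est = affected_est) / 1e6, 1)
round(100 * raw_rate, 2)
```

The result is a point estimate from the raw (unweighted) sample only.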

We urge caution in the interpretation of these results. The calculation above does not account for sample imbalance, size, or power. Additionally, while we attempt here to get a sense of the scale within the United States, this is known to be a global phenomenon with confirmed cases in Australia and elsewhere.
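One way to see how much uncertainty a sample of this size carries is an exact binomial confidence interval around the raw rate. The sketch below is not part of the original analysis. It uses base R's binom.test (a Clopper-Pearson interval), treats the 63 responses as a simple random sample, and ignores weighting, all of which are simplifying assumptions.

```r
# Exact (Clopper-Pearson) 95% interval for the raw rate of 2 out of 63.
# Assumes simple random sampling; survey weights are ignored.
n_affirmative <- 2
n_sample      <- 63

ci <- binom.test(x = n_affirmative, n = n_sample,
                 conf.level = 0.95)$conf.int

# Scale the interval to the approximate adult population used in the text.
adult_pop <- 0.77 * 328e6
ci_people <- as.numeric(ci) * adult_pop

ci                      # interval for the proportion
round(ci_people / 1e6, 1)  # interval in millions of adults
```

An interval this wide relative to the point estimate is the practical meaning of the caution above.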

Balanced Results from Population Weighting

Using population statistics from the Current Population Survey, we balanced our sample in R via the survey package.

# Sample Weighting --------------------------------------------------------
library(survey)     # svydesign(), rake(), trimWeights(), svymean()
library(stargazer)  # LaTeX table output

# Unweighted survey design (raw); every respondent starts with equal weight
data.svy.unweighted <- svydesign(ids=~1, data=survey.summary)

# U.S. 18+ population gender distribution (marginal proportions),
# scaled to the sample size
gender.dist <- data.frame(Gender = c("Male", "Female"),
                          Freq = nrow(survey.summary) * c(0.492, 0.508))

# Rake the sampling weights to the gender margin
data.svy.rake <- rake(design = data.svy.unweighted,
                      sample.margins = list(~Gender),
                      population.margins = list(gender.dist))

# Trim extreme weights to the range [0.3, 3]
data.svy.rake.trim  <- trimWeights(data.svy.rake, lower=0.3, upper=3,
                                  strict=TRUE)

# Weighted means, stored so they can be tabulated below
weighted_means <- svymean(survey.summary, data.svy.rake.trim)

saveRDS(survey.summary, "survey.summary.RDS") # For .Rmd latex table

# Create formatted table
stargazer(as.data.frame(weighted_means), title="Weighted Means", type = "latex", summary = FALSE, align = TRUE)

This resulted in the following weighted means.

Table 2: Weighted Means

Note: Currently this is weighted on gender *only*.

Weighting of the sample led to a small increase in the estimated proportion of respondents who indicated that they had been approached and indoctrinated in response to Question 3.
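To illustrate what weighting on a single margin does, the sketch below computes gender weights by hand in base R. The sample counts are hypothetical and are NOT the survey's actual gender split; only the population proportions (0.492 and 0.508) are taken from the code above. With a single margin, raking should reduce to this kind of simple ratio adjustment.

```r
# Hypothetical gender counts for a 63-person sample (illustrative only).
sample_counts <- c(Male = 25, Female = 38)
# Population margins used in the raking code above.
pop_share <- c(Male = 0.492, Female = 0.508)

# Weight for each group = population share / sample share.
sample_share <- sample_counts / sum(sample_counts)
wts <- pop_share / sample_share

# After weighting, the group shares match the population margins
# and the weighted total still equals the sample size.
weighted_totals <- wts * sample_counts
weighted_share  <- weighted_totals / sum(weighted_totals)

round(wts, 3)
round(weighted_share, 3)
```

Respondents in the under-represented group receive weights above 1 and those in the over-represented group receive weights below 1, which is why a weighted proportion can shift relative to the raw one.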

LIMITATIONS

Sample size was a major constraint. SurveyMonkey does not offer survey panel balancing (weighting to make the sample more reflective of the population) for surveys with under 100 respondents. Future research will therefore conduct further analysis with custom survey weights in R and/or Python and, funding permitting, scale up sample sizes in future surveys. Standard deviations were large compared to means.
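As a rough illustration of why standard deviations dwarf means for rare binary outcomes, the sketch below builds a 0/1 indicator with 2 affirmative values out of 63, mirroring the raw Q3 count, and compares its mean and standard deviation. It is an unweighted illustration and does not reproduce the values in Tables 1 or 2.

```r
# 0/1 indicator: 2 affirmative responses out of 63 (unweighted).
x <- c(rep(1, 2), rep(0, 61))

m  <- mean(x)               # sample proportion
s  <- sd(x)                 # sample standard deviation
se <- s / sqrt(length(x))   # standard error of the mean

# Ratio of SD to mean (coefficient of variation)
c(mean = m, sd = s, se = se, cv = s / m)
```

For an indicator this rare, the standard deviation is several times the mean, so larger samples are needed before the estimate stabilizes.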

CONCLUSIONS

The limitations of small-sample surveys are non-trivial, as discussed above. The value of even a small initial amount of public, scientifically collected and analyzed data on a ubiquitously censored topic may also prove non-trivial, especially if it serves as a springboard for broader research.

The code used to analyze this survey is available on GitHub:

https://github.com/mixingpot/survey_non_public_info/blob/master/main.R

The raw data is available by request.