23.1 Reducing the number of records
If your goal is to reduce the size of your dataset, you can take a simple random sample of the data. In a simple random sample, each record has an equal likelihood of being selected.
This Jupyter notebook shows how to sample a dataset in Python.
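As a rough illustration of the idea, separate from the notebook itself, the sketch below draws a simple random sample using only Python's standard library. The function name `simple_random_sample` and the stand-in list of record IDs are assumptions made for this example. `random.sample` draws without replacement, so no record is picked twice.

```python
import random


def simple_random_sample(records, n, seed=None):
    """Return n records drawn without replacement.

    Every record has the same chance of being selected.
    The name and signature are hypothetical, chosen for this sketch.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(records, n)


# Stand-in data: 24,209 record IDs instead of real rows (an assumption).
records = list(range(1, 24210))
sample = simple_random_sample(records, 10_000, seed=42)
print(len(sample))  # 10000
```

Passing a seed is optional. It is useful when the same sample needs to be reproduced later.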
The following video shows how to do this in JMP.
The following video shows how to do this in R.
You can also do this with a simple R script. An example is shown in the image below. In this example, R reads in a large data file ("brooklyn_homes.csv") that contains 24,209 records, creates a sample of 10,000 records, and writes the sample out to a new file ("Brooklyn10k.csv").
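For readers working in Python rather than R, the same read, sample, and write workflow might look like the sketch below. It is not a translation of the R script in the example. The function name `sample_csv` and its parameters are hypothetical, and the code assumes the file has a header row and fits in memory.

```python
import csv
import random


def sample_csv(src_path, dst_path, n, seed=None):
    """Write a simple random sample of n data rows from src_path to dst_path.

    Assumes the first row is a header, which is copied unchanged.
    Returns the number of rows written (excluding the header).
    """
    rng = random.Random(seed)

    # Read the header and all data rows into memory.
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = list(reader)

    # Sample without replacement; cap n at the number of available rows.
    sampled = rng.sample(rows, min(n, len(rows)))

    # Write the header followed by the sampled rows.
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sampled)

    return len(sampled)
```

With the files from the example above, the call would be `sample_csv("brooklyn_homes.csv", "Brooklyn10k.csv", 10_000)`.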