Cornell University CISER

CISER Computing

Q: How can I run Stata with more memory? How can I deal with large Stata datasets? Why am I getting the message "no room to add more observations"?

A. When you open a dataset in Stata, it loads the entire file into memory. If the file is larger than the current memory setting, or has more variables than the current maximum, Stata displays a message saying so. However, when using Stata on the research servers, even a data file that opens successfully can cause trouble: if a job ends up requiring more memory for processing than is available, Stata will sometimes shut down unexpectedly and without warning!

Solution:

  • On any research server you can run the command "query memory" to see the current settings (the defaults, if you have not changed them) for the amount of memory available to the program (memory), the maximum number of variables allowed in the dataset (maxvar), and the maximum number of variables allowed in a model (matsize).
  • You can use the SET MEM, SET MAXVAR, and SET MATSIZE commands to adjust these settings (instructions below).

Using the SET MEM command:

  • By default the memory allocation for Stata jobs on the research servers is 10 MB.
    • At present there are no memory restrictions for research server users, though we expect you to request additional memory very carefully and only when necessary.
    • Important: Keep in mind that requesting more memory than you need can slow your job down and affect other users.
  • To request more memory for a Stata job, use the "SET MEMORY #K|M|G" command, which changes the amount of memory available to you for the current Stata session:
    • Example:  set mem 20M
  • Important: Once your job has completed, it is very important that you either close Stata OR clear the data from memory and set the memory limit back to the default amount of 10 MB.
  • User Tip: Increase your memory requests in relatively SMALL increments!
    • There is a very good explanation of how Stata uses memory on the Stata web site at http://www.stata.com/support/faqs/win/memory1.html. That document also tells you how to estimate the amount of memory you truly need. However, their best suggestion is as follows:
    • Rather than spending the time to try to calculate the exact amount of memory to give to Stata, it is usually easier to just experiment a little. If you are working with 20 MB datasets, give Stata 25 MB. If you get a "no room to ..." error message, [or your job closes without warning] you know you need to give Stata a little more: try 30 MB. Don't give Stata 60 or 80 MB — this is overkill and can only lead to possible use of virtual memory and slow performance.
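
The steps above can be put together in a short do-file. This is only a sketch under stated assumptions: it assumes a Stata release in which "set memory" is still honored, and "mydata.dta" is a hypothetical dataset of roughly 20 MB, not a file that exists on the research servers.

```stata
* Sketch only: "mydata.dta" is a hypothetical dataset of about 20 MB.
* Run this as a do-file; the // comments are not accepted interactively.
clear                 // drop any data first so the memory setting can be changed
query memory          // show the current memory, maxvar, and matsize settings
set memory 25m        // a little more than the size of the dataset
use mydata, clear     // load the data into the enlarged memory area
describe, short       // confirm the number of observations and variables

* ... run your analysis here ...

clear                 // drop the data once the job has completed
set memory 10m        // return to the research-server default of 10 MB
```

If the "no room to ..." message still appears, repeat with a slightly larger value (for example 30m) rather than jumping to a much larger one.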

Using the SET MAXVAR command:

  • To increase the maximum number of variables allowed, you can use the SET MAXVAR command:
    • Example: set maxvar 10000 (the maximum possible is 32,767).

Using the SET MATSIZE command:

  • To increase the maximum number of variables allowed in a model, you can use the SET MATSIZE command:
    • Example: set matsize 500 (the upper limit is 11,000)
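
As a rough illustration, the two settings above might be raised together before loading a wide dataset. The dataset "widedata.dta" and the variables y and x1 through x400 are hypothetical, and the sketch assumes a Stata edition that accepts the values shown in the examples above.

```stata
* Sketch only: "widedata.dta", y, and x1-x400 are hypothetical names.
* Run this as a do-file; the // comments are not accepted interactively.
clear                 // change these settings with no data in memory
set maxvar 10000      // allow up to 10,000 variables in the dataset
set matsize 500       // allow up to 500 variables in a model
query memory          // verify that the new settings took effect
use widedata, clear   // load the wide dataset
regress y x1-x400     // a model with 400 regressors fits within matsize 500
```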

More tips for working with LARGE Stata Datasets (whether on your PC or on the research server):

  • Run the "compress" command on larger datasets to reduce memory usage, then save the compressed dataset.
  • Use the syntax "gen byte newvar" to generate new dummy (0,1) and categorical (0,1,2,3,4) variables so that they use less space. A byte variable can hold integer values from -127 to 100.
  • Consider replacing string (text) variables with numeric variables. You can attach descriptive labels to numeric variables using the "label" command.
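
The three tips above could be combined along the following lines. This is a minimal sketch: "survey.dta", the variables age and state_name, and the label name yesno are all hypothetical and stand in for your own data.

```stata
* Sketch only: "survey.dta", age, state_name, and yesno are hypothetical names.
* Run this as a do-file; the // comments are not accepted interactively.
use survey, clear

* Store a 0/1 dummy in a single byte; observations with missing age stay missing.
generate byte senior = (age >= 65) if !missing(age)
label define yesno 0 "No" 1 "Yes"
label values senior yesno

* Replace a string variable with a labeled numeric variable.
encode state_name, generate(state)
drop state_name

compress              // shrink each variable to the smallest type that holds its values
describe              // review the resulting storage types
save survey_small, replace
```

Saving under a new file name keeps the original dataset intact in case any step needs to be redone.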