Assuming the data is tabular, with rows and columns...<p>I look for a manifest or README file, which usually explains what the columns are.<p>I look for columns that could serve as unique identifiers, or as primary/foreign keys in a db table.<p>I read the names of all the columns to understand the domain, and if I don't know what a column represents I make a note to find out more.<p>I check the data type used for each column.<p>For each numerical column I look at the range of values and some basic stats: min/max/mean/mode/std. dev.<p>If the data is in a domain I know, I note whether each column's numerical values make sense (does a temperature of -9000 degrees make sense, or is it a sensor malfunction / no-read value?).<p>I look for incomplete rows, and if anything is blank, I ask why.<p>I suppose once you understand all of that you should be ready to load the data into a db or move on to further analytics.<p>Practically, you also want to understand the magnitude of the data: how many columns and rows does an average payload or batch contain?<p>Can the data fit in memory or not?<p>Does the data come in chunks, or is it streamed somehow?
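<p>A rough sketch of that first pass in plain Python, in case it helps. It assumes the data arrives as CSV and is small enough to hold in memory. The helper names (`to_float`, `profile_columns`) and the sample columns are made up for illustration, and "numeric" here just means every non-blank value parses as a float.

```python
import csv
import io
import statistics


def to_float(value):
    """Return value as a float, or None if it does not parse as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def profile_columns(rows):
    """Summarise each column of a list of dict rows (as from csv.DictReader)."""
    report = {}
    columns = rows[0].keys() if rows else []
    for name in columns:
        values = [row[name] for row in rows]
        # Treat None and whitespace-only strings as blank.
        present = [v for v in values if v is not None and v.strip() != ""]
        numbers = [to_float(v) for v in present]
        is_numeric = bool(present) and all(n is not None for n in numbers)
        info = {
            "blank": len(values) - len(present),
            "distinct": len(set(present)),
            # Candidate key: no blanks and no repeated values.
            "candidate_key": len(present) == len(values) == len(set(present)),
            "type": "numeric" if is_numeric else "text",
        }
        if is_numeric:
            info.update(
                min=min(numbers),
                max=max(numbers),
                mean=statistics.mean(numbers),
                # On Python 3.8+ ties resolve to the first value encountered.
                mode=statistics.mode(numbers),
                stdev=statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
            )
        report[name] = info
    return report


# Made-up sample: one blank reading and one implausible -9000 value.
SAMPLE = """sensor_id,site,temp_c
s1,north,21.5
s2,north,
s3,south,-9000
s4,south,19.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile_columns(rows)
for name, info in report.items():
    print(name, info)
```

<p>On the sample, `sensor_id` comes out as a candidate key, `temp_c` reports one blank, and the -9000 reading surfaces as the column minimum, which is exactly the kind of value to question. Because this reads every row into a list, it only suits data that fits in memory; for chunked or streamed data the same counts would have to be accumulated incrementally.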