A lot depends on what kind of cleanup you need to do. If each cleanup step can be done in the context of a single XML file, then your simplest approach is probably to skip databases completely and just process one file at a time in your favorite language.<p>If you need set-oriented operations, it's hard to imagine doing better than SQL, although that presumes you have normalized the XML into SQL tables, which may or may not be trivial. That depends on the structure of your XML docs.<p>After the cleanup, it's hard to say what's best; it depends on the structure of the data, what you want to do with it, and how much of it there is. To me, SQL is nearly always the tool of choice, unless you have requirements for data volume or data structuring that are incompatible with it. On the latter point (data structuring), since you have XML, I would guess that it would not be difficult to define a SQL schema. I.e., the schemaless aspect of NoSQL systems might not be important for you.
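<p>To make the two cases concrete, here is a minimal sketch in Python using only the standard library. Everything about the data is an assumption for illustration: the `<record>` element, its `id` attribute, the `name` and `city` children, and the `record` table are hypothetical, and your real XML will need its own mapping. `clean_record` is the kind of cleanup that needs only one file at a time; `duplicate_names` is a set-oriented check that only makes sense once several files are loaded into SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Create an in-memory SQLite database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (id INTEGER, name TEXT, city TEXT, source TEXT)"
    )
    return conn


def clean_record(elem):
    """Per-file cleanup: trim whitespace and turn empty fields into None."""
    def text_of(tag):
        value = (elem.findtext(tag) or "").strip()
        return value or None

    return (int(elem.get("id")), text_of("name"), text_of("city"))


def load(conn, xml_text, source):
    """Normalize one XML document into rows; returns the number of rows."""
    root = ET.fromstring(xml_text)
    rows = [clean_record(r) + (source,) for r in root.iter("record")]
    conn.executemany(
        "INSERT INTO record (id, name, city, source) VALUES (?, ?, ?, ?)", rows
    )
    return len(rows)


def duplicate_names(conn):
    """Set-oriented check: names that appear in more than one source file."""
    cur = conn.execute(
        "SELECT name, COUNT(DISTINCT source) FROM record "
        "WHERE name IS NOT NULL "
        "GROUP BY name HAVING COUNT(DISTINCT source) > 1 "
        "ORDER BY name"
    )
    return cur.fetchall()
```

<p>With real files you would call something like `load(conn, open(path, encoding="utf-8").read(), path)` for each path. If the per-file cleanup is all you need, you can drop the database parts and write the cleaned records straight back out.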