|
DBMS/Compare Executive Summary...
Identify differences or
similarities across databases, analytical tables or mailing lists.
Typically
Overview...
DBMS/Compare can help you answer all these questions. Download an evaluation copy of DBMS/Compare Order a copy of DBMS/Compare (and or DBMS/Copy) As hard drives have increased in size we tend not to delete data files. Why should we? They might contain useful information. What's a little wasted disk space? In a world of faster computers we are able to play "what-if" by building subsets, computing new fields and testing ideas with the data. The end result is a disk drive full of data. Some of it may be duplicates, some may be subsets and some of it the original source data. DBMS/Compare, an exciting new program from the developers of DBMS/COPY, is here to help you make sense of your data files. How does it work?Select the FilesDBMS/Compare can search any directory and subdirectories looking for your files. Here are the sample steps to follow:
The Database and Variable ViewAfter DBMS/Compare finds all the files, it builds the following window. The window has two views: Databases and Variables. The Database view, shows the following information on each file: the directory, filename, extension, creation date, record count, variable count, and a list of the variables on the file. The Variable view, shows the variable names, the number of databases containing the variable, and a list of the databases. In the screen shot below, the Database view shows the variable names in a "compressed" view. The Variable view shows the database names in an "expanded" view. The database names are in the column heading and each variable/database intersection has a coloured block. You can switch back and forth between compressed and expanded views.
Database Sorting OptionsThe database display can be sorted by: directory, filename, extension, creation date, or variable count. Here is the list of databases sorted by the record count.
Looking at the display you can quickly see that the 2nd and 3rd databases have the same record counts, same creation date and what is probably the same variable list. (There is a sort option that will check for variable name matches.) You can also see that the 4th and 5th databases look like they might be related to the other two databases because the variable names are the same except for the last two digits. It might be that the data is for 1997 and 1998.) Variable Sorting OptionsThe variable display can be sorted by variable name or the number of databases which contain a variable. There is one more sort option. You can highlight one variable name and then sort the others based how many databases each variable has in common with the highlighted one. The same sort can be done with the Database View -- highlight one database and see what other databases share the same variables. This is great when you have one database in mind and want to see what other databases share the same variables. Below is the Variable View with the variables sorted by closeness to the ABSID variable. As you can see, variable BATCH is on 19 databases and 5 of them also have the ABSID variable.
Variable Name MappingFrequently variables that contain the same type of data have different names in different databases. This might be because of naming conventions across projects or limitations of the package (name length for example). When you want to compare these variables, it is necessary that they have the same name. DBMS/Compare gives you the power to map variable names to other names. Below is the Variable View showing that Address1 is mapped to ADDR1 and Address2 is mapped to ADDR2. All mapping is accomplished with point and click. Any number of variables can be mapped to any number of other variables. Once you are done mapping, DBMS/Compare will rebuild the database and variable views to show the mapping.
Creating SubsetsOnce you have identified a set of related databases, DBMS/Compare can create a "subset" view consisting of only those databases and their variables. Any number of subsets can be created. Subsets can be further subseted. Subsets are created by highlighting either database names or variable names. If you click on a variable name, all the databases containing that variable will be added to the subset. The first column in the "database" view indicates that the database will become part of the subset. Below is a subset view of the databases containing the ABSID variable. By specifying a title we make it easy to keep our subsets organised.
What's In the DatabaseSo far, the DBMS/Compare tools we have shown organize general database information (name, creation date, number of records, number of variables and variable names on the database). These tools help you understand the structure of the data and how the data files might relate to one another. Now, we are going to show you the DBMS/Compare tools that help you see the data inside the file -- variable values and individual records. Value ListsOnce you have a set of related databases, you might want to know what are the variable's values. DBMS/Compare can build value lists for all variables on a set of databases. The following table, shows the values of the ABSID variable across 5 databases. Each table cell shows the frequency of that value on the database. Each value is color coded based on the matching of counts. The color coding aids in finding related databases. For example, value 1502 is on 5 records for mfraprot and mrafsum, 31 records for mrafdrug and 11 times for mrafsurg. It would appear that mrafprot and mrafsum are related, mrafprot always has the same or lower value count than mrafsum, may be it is a subset.
We can isolate the two databases in one value view so the comparison is easier.
Record ListingsIn addition to value lists, DBMS/Compare can display a record by record listing. In the screen shot below, you can see the 5 databases for each record down the rows and across the columns are the variables. Each cell shows the variable's values. This display colour codes the values that match -- variable ABSID has a value of 102 for all but one database.
If you find the colour matching confusing, just turn it off.
The record list can also be rotated to put the variables down the rows and the databases across the columns. It all depends on what you find helpful.
*** Output Options ***Not only does DBMS/Compare let you interact with the information but it can be saved to data files for further analysis, manipulation and dissemination. Database and Variable Output OptionsThe Database and Variable views can be written to Excel spreadsheets, HTML files or ASCII text files. The views can be saved in expanded mode, compressed mode or "unravelled" mode. You've already seen expand and compressed modes previously -- they generate one row per database. Unravelled mode generates one line per variable (for the database view) or one line per database (for the variable view). Here is a short segment of the HTML page generated for the databases in unravelled mode. (not all variables on all databases are shown.) Each variable gets one row, that row shows the file information, and one variable.
Value List OutputDBMS/Compare can write the value lists to either an Excel spreadsheet, ASCII file or HTML page. You can select one variable, all variables, or a specified list of variables. If you write more than one variable to an Excel spreadsheet, each variable will go on a separate page. Here is an example, as you can see the variable names are the page tabs.
In SummaryDBMS/Compare is an exciting new program to help you make sense of your data files. Click here to download an evaluation copy of DBMS/Compare
|