DBMS/Compare Executive Summary...

Identify differences or similarities across databases, analytical tables or mailing lists.  DBMS/Compare can examine the field content across multiple databases and extract the unique reference numbers for any records that have (or do not have) the same content. Records that have changed in some way across snapshot databases can be extracted for detailed analysis.

Typically used as extension to data audit, this utility has the ability to spot change – both on a record and field basis – and produce HTML and Excel output identifying the records that are different. The software aids the data mining process by identifying just records where data has changed, allowing the user to isolate these data in order to minimise the analysis required to track changes in customer behaviour.

Click here to view other software for data improvement

Click here to see an overview of our data improvement and enrichment services


Overview...

  • Do you have tens, hundreds, or thousands of databases and you aren't sure what's in them?

  • Have you made copies of tables and asked yourself:
            Are they exact copies?
            Have I deleted some fields?
            Did I add extra calculated fields?
            Is it a subset?
            Does it contain more records for the same project?

  • Do you want to compare snapshots of two or more customer segments saved at different time intervals, spot which records have changed (or have remained constant), then extract a file of unique reference numbers and use this file to reduce any subsequent data analysis process?

  • Would you like one spreadsheet, text file or html page showing all variables on all databases?

  • Would you like to build an output file with all the values of all the variables on any number of databases?

  • Have you inherited a long term project and have no idea what is contained in what file?

DBMS/Compare can help you answer all these questions.

Download an evaluation copy of DBMS/Compare 

Order a copy of DBMS/Compare (and or DBMS/Copy)

As hard drives have increased in size we tend not to delete data files.  Why should we?  They might contain useful information.  What's a little wasted disk space?  In a world of faster computers we are able to play "what-if" by building subsets, computing new fields and testing ideas with the data.  The end result is a disk drive full of data.  Some of it may be duplicates, some may be subsets and some of it the original source data.  DBMS/Compare, an exciting new program from the developers of DBMS/COPY, is here to help you make sense of your data files.

How does it work?

Select the Files

DBMS/Compare can search any directory and subdirectories looking for your files.  Here are the sample steps to follow:

  • Specify the starting directory

  • Decide if you want subdirectories scanned

  • Define the filename filter

  • Pick the file types you want to search or locate

The Database and Variable View

After DBMS/Compare finds all the files, it builds the following window.  The window has two views:  Databases and Variables.

The Database view, shows the following information on each file: the directory, filename, extension, creation date, record count, variable count, and a list of the variables on the file.

The Variable view, shows the variable names, the number of databases containing the variable, and a list of the databases.

In the screen shot below, the Database view shows the variable names in a "compressed" view.  The Variable view shows the database names in an "expanded" view.  The database names are in the column heading and each variable/database intersection has a coloured block.  You can switch back and forth between compressed and expanded views.

Database Sorting Options

The database display can be sorted by: directory, filename, extension, creation date, or variable count.  

Here is the list of databases sorted by the record count.

Looking at the display you can quickly see that the 2nd and 3rd databases have the same record counts, same creation date and what is probably the same variable list.  (There is a sort option that will check for variable name matches.)  You can also see that the 4th and 5th databases look like they might be related to the other two databases because the variable names are the same except for the last two digits.  It might be that the data is for 1997 and 1998.)

Variable Sorting Options

The variable display can be sorted by variable name or the number of databases which contain a variable.  

There is one more sort option.  You can highlight one variable name and then sort the others based how many databases each variable has in common with the highlighted one.  The same sort can be done with the Database View -- highlight one database and see what other databases share the same variables.  This is great when you have one database in mind and want to see what other databases share the same variables.

Below is the Variable View with the variables sorted by closeness to the ABSID variable.  As you can see, variable BATCH is on 19 databases and 5 of them also have the ABSID variable.

Variable Name Mapping

Frequently variables that contain the same type of data have different names in different databases.  This might be because of naming conventions across projects or limitations of the package (name length for example).  When you want to compare these variables, it is necessary that they have the same name.  DBMS/Compare gives you the power to map variable names to other names.  Below is the Variable View showing that Address1 is mapped to ADDR1 and Address2 is mapped to ADDR2.  All mapping is accomplished with point and click.  Any number of variables can be mapped to any number of other variables.  Once you are done mapping, DBMS/Compare will rebuild the database and variable views to show the mapping.

Creating Subsets

Once you have identified a set of related databases, DBMS/Compare can create a "subset" view consisting of only those databases and their variables.  Any number of subsets can be created.  Subsets can be further subseted.

Subsets are created by highlighting either database names or variable names.  If you click on a variable name, all the databases containing that variable will be added to the subset.  The first column in the "database" view indicates that the database will become part of the subset.

Below is a subset view of the databases containing the ABSID variable.  By specifying a title we make it easy to keep our subsets organised.

What's In the Database

So far, the DBMS/Compare tools we have shown organize general database information (name, creation date, number of records, number of variables and variable names on the database).  These tools help you understand the structure of the data and how the data files might relate to one another.

Now, we are going to show you the DBMS/Compare tools that help you see the data inside the file -- variable values and individual records.

Value Lists

Once you have a set of related databases, you might want to know what are the variable's values.  DBMS/Compare can build value lists for all variables on a set of databases.  The following table, shows the values of the ABSID variable across 5 databases.  Each table cell shows the frequency of that value on the database.  Each value is color coded based on the matching of counts.  The color coding aids in finding related databases.  For example, value 1502 is on 5 records for mfraprot and mrafsum, 31 records for mrafdrug and 11 times for mrafsurg.  It would appear that mrafprot and mrafsum are related, mrafprot always has the same or lower value count than mrafsum, may be it is a subset.

We can isolate the two databases in one value view so the comparison is easier.

Record Listings

In addition to value lists, DBMS/Compare can display a record by record listing.  In the screen shot below, you can see the 5 databases for each record down the rows and across the columns are the variables.  Each cell shows the variable's values.  This display colour codes the values that match -- variable ABSID has a value of 102 for all but one database.

If you find the colour matching confusing, just turn it off.

The record list can also be rotated to put the variables down the rows and the databases across the columns.  It all depends on what you find helpful.

*** Output Options ***

Not only does DBMS/Compare let you interact with the information but it can be saved to data files for further analysis, manipulation and dissemination.  

Database and Variable Output Options

The Database and Variable views can be written to Excel spreadsheets, HTML files or ASCII text files.  The views can be saved in expanded mode, compressed mode or "unravelled" mode.  You've already seen expand and compressed modes previously -- they generate one row per database.  Unravelled mode generates one line per variable (for the database view) or one line per database (for the variable view).  Here is a short segment of the HTML page generated for the databases in unravelled mode.  (not all variables on all databases are shown.)  Each variable gets one row, that row shows the file information, and one variable.

Directory

Filename

Ext

Date

Records

Variables

Sequence

Variable

C:\temp

a

sd2

2000/07/31

193

26

1

_INDEX_

C:\temp

a

sd2

2000/07/31

193

26

2

_STAT_

C:\temp

a

sd2

2000/07/31

193

26

3

C_DISPL

C:\temp

a

sd2

2000/07/31

193

26

4

C_FOREIG

C:\temp

a

sd2

2000/07/31

193

26

5

C_GRATIO

C:\temp

a

sd2

2000/07/31

193

26

6

C_HDROOM

C:\temp

a

sd2

2000/07/31

193

26

7

C_LENGTH

C:\temp

a1

sd2

2000/07/26

74

12

1

DISPL

C:\temp

a1

sd2

2000/07/26

74

12

2

FOREIGN

C:\temp

a1

sd2

2000/07/26

74

12

3

GRATIO

C:\temp

a1

sd2

2000/07/26

74

12

4

HDROOM

C:\temp

a1

sd2

2000/07/26

74

12

5

LENGTH

C:\temp\vern

a5

sav

1999/10/13

74

1

1

MAKE

C:\temp

abc

sd2

2000/04/18

210

3

1

A

C:\temp

abc

sd2

2000/04/18

210

3

2

B

C:\temp

abc

sd2

2000/04/18

210

3

3

C

C:\temp

abrdthrs

sd2

2000/02/14

241

4

1

CUM_HRS

C:\temp

abrdthrs

sd2

2000/02/14

241

4

2

DATE

C:\temp

abrdthrs

sd2

2000/02/14

241

4

3

DIFF_HRS

C:\temp

abrdthrs

sd2

2000/02/14

241

4

4

RDT_HRS

C:\temp\spss

aipq

sav

2000/09/19

696

11

1

CIGMOOD

C:\temp\spss

aipq

sav

2000/09/19

696

11

2

CIGNOW

C:\temp\spss

aipq

sav

2000/09/19

696

11

3

CONFID

C:\temp\spss

aipq

sav

2000/09/19

696

11

4

CONTROL

Value List Output

DBMS/Compare can write the value lists to either an Excel spreadsheet, ASCII file or HTML page.  You can select one variable, all variables, or a specified list of variables.  If you write more than one variable to an Excel spreadsheet, each variable will go on a separate page.  Here is an example, as you can see the variable names are the page tabs.

In Summary

DBMS/Compare is an exciting new program to help you make sense of your data files.

Click here to download an evaluation copy of DBMS/Compare


© Tech4T (Technologies4Targeting Ltd.) 2002/2004 All Rights Reserved.  www.tech4t.co.uk