CRSP/Compustat Database Changes
In the middle of 2007, CRSP began work to utilize Xpressfeed data as input to the CCM. Xpressfeed represents a significant change in Compustat data organization, with roughly 5000 new data items, 2700 data codes, and 2600 footnotes.
CRSP is approaching this conversion in two phases. The first phase is to map the Xpressfeed data to the existing CST format. This transitional solution will provide backward compatibility with existing programs and improve the quality and coverage of the data. It will not, however, take full advantage of all data available in Xpressfeed packages. The second phase is to create a new format for the CCM data based on the Xpressfeed data.
Because of the extensive scope of this project, CRSP has found it necessary to revise the originally targeted shipping dates.
To ensure a smooth transition, the release of the new CCM database format is postponed until the first half of 2008, most likely early May 2008. At that time, monthly and quarterly subscribers will receive their Compustat database in both old and new formats. Annual subscribers will receive the new CCM format along with the current CST format as part of their annual shipment in July 2008. The new CCM format will require new CRSPAccess and CRSPSift utilities, which will ship when the new CCM database ships. Subscribers using the CRSP ‘C’ and FORTRAN-95 APIs will be required to make additional programmatic changes.
While we strongly recommend that subscribers begin using the new format databases as soon as possible, CRSP will ship both old and new data formats for a transitional period, so that subscribers have continuity in reporting and production needs and ample time to revise their processes to accommodate the new CCM format.
Note that the CCM database will be significantly larger, because the CCM format introduces more data items and increases the precision of standard items. The current CST format is roughly 2.5 GB in size. The new CCM format will be roughly twice that size, or about 5 GB.
Transition Schedule
Note: For reference in the following, “CST” is the shorthand code for the old data and tools, and “CCM” is the shorthand code for the new data and tools.
Annual Subscribers
Spring 2008
Software
All subscribers will receive CRSPAccess Version 3.0 and CRSPSift Version 2.0. The new version of CRSPAccess will include a new CCM utility to accommodate the new structures and additional data items. CRSPSift will include variable selection tools based on new and old Compustat categories. CRSP will work with SAS to ensure that an updated version of the SASECRSP engine will work with the new CCM database format. While annual subscribers will not take full advantage of the new Compustat features until the annual data shipment, they can install the new software versions and take advantage of other features.
July 2008
The Annual 200806 cut, shipped in July 2008, will be provided in both old and new formats. Subsequent CCM database shipments will be in the new CCM format only. CRSP will support both CST and CCM versions for one year.
Monthly Subscribers
October 2007
The CST database will be created from the existing FTP process and shipped in early October as scheduled. Later in October, an X-cut version of the same database will be created using data provided through the Xpressfeed process. This database will be available to subscribers only on request. CST databases shipped in subsequent months will be created from the Xpressfeed process.
Through Spring 2008
Monthly subscribers will receive Xpressfeed data cuts in the old CST format. Because of this transition, there will be some minor changes in the data and universe. CRSP will inform subscribers of the transition date once it is confirmed.
Spring 2008
Software
Subscribers will receive CRSPAccess Version 3.0 and CRSPSift Version 2.0. The new version of CRSPAccess will include a new CCM utility to accommodate the new structures and additional data items. CRSPSift will include variable selection tools based on new and old Compustat categories. CRSP will work with SAS to ensure that an updated version of the SASECRSP engine will work with the new CCM database format.
Data
Monthly subscribers will receive two versions of the 200804 data cut shipping in May 2008: one in the old CST format and one in the new CCM format.
May 2008 through January 2009
Monthly subscribers will continue to receive both the CST and CCM formats of the data for these data cuts.
Beginning February 2009
The transition from old to new formats will be complete. Users must access the January 2009 cut shipping in February 2009 and any cuts after that in the new CCM format databases.
Quarterly Subscribers
November 2007 through February 2008
Quarterly subscribers will receive quarterly 200710 and 200801 CCM databases, created from Xpressfeed and in the old CST format.
Spring 2008
Software
All subscribers will receive CRSPAccess Version 3.0 and CRSPSift Version 2.0. The new version of CRSPAccess will include a new CCM utility to accommodate the new structures and additional data items. CRSPSift will include variable selection tools based on new and old Compustat categories. CRSP will work with SAS to ensure that an updated version of the SASECRSP engine will work with the new CCM database format.
Data
Quarterly subscribers will receive two versions of the 200804 data cut shipping in May 2008: one in the old CST format and one in the new CCM format.
May 2008 through January 2009
Quarterly subscribers will continue to receive both the CST and CCM formats of the data for these data cuts.
Beginning April 2009
The transition from old to new formats will be complete. Users must access the first quarter cut shipping in April 2009 and any cuts after that in the new CCM format databases.
Summary of Xpressfeed Changes
- Xpressfeed introduces a large number of data items. The FTP files include roughly 1000 annual and quarterly items. Xpressfeed provides over 4000 data items.
CRSP will keep the look and feel of existing tools and APIs for the new CCM format while supporting all available Compustat data items. Data items will be presented in categories, such as reporting frequency and financial statement, to ensure usability of the large dataset. During the transition period, all existing APIs and utilities will be supported and preserved to work with the CST format databases. Programming access to the new CCM format will not be fully backward compatible.
- Xpressfeed items are mnemonic-based with new secondary keys. Compustat will no longer rely on item numbers. For example, where the old format includes separate item numbers for the original and restated values of an item, Xpressfeed will provide a single mnemonic for the item and will distinguish the original and restated values with a data format qualifier.
CRSP will utilize new mnemonic names and will not preserve old variable names and item slots in the CCM format database. CRSP variable selection tools will present all possible secondary key combinations for each mnemonic. CRSP will include mapping reference data within the data files to find the Xpressfeed mnemonic based on the FTP format item number.
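The mapping reference described above can be pictured with a minimal sketch. It assumes a simple lookup table from old item numbers to a mnemonic plus a data format qualifier. The item numbers, the mnemonic `EXMPL`, the qualifier strings `ORIG` and `RESTATED`, and all function names are hypothetical placeholders, not actual Compustat or CRSP identifiers.

```python
from typing import NamedTuple, Optional


class ItemKey(NamedTuple):
    """A mnemonic plus the secondary key that qualifies it."""
    mnemonic: str
    data_format: str


# Hypothetical mapping table: two old item numbers (original and restated
# values of the same item) collapse onto one mnemonic, distinguished only
# by the data format qualifier. All values are made up for illustration.
ITEM_NUMBER_TO_KEY = {
    101: ItemKey("EXMPL", "ORIG"),
    102: ItemKey("EXMPL", "RESTATED"),
}


def lookup_item(item_number: int) -> Optional[ItemKey]:
    """Return the mnemonic-based key for an old item number, or None."""
    return ITEM_NUMBER_TO_KEY.get(item_number)


def item_numbers_for(mnemonic: str) -> list[int]:
    """Return every old item number that maps onto the given mnemonic."""
    return sorted(n for n, key in ITEM_NUMBER_TO_KEY.items()
                  if key.mnemonic == mnemonic)
```

A program written against item numbers could call `lookup_item` once per item to find the equivalent mnemonic and qualifier; the real reference data shipped in the CCM files may be organized differently.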
- Xpressfeed organizes data by data groups, each with a set of keys and mnemonics. Xpressfeed no longer divides files by inactive, active, and groups of companies. It divides data along current and backdata time frames.
CRSP will continue to merge data for all companies. Current and backdata time periods will be provided in one time series covering the entire available range. CRSP CCM databases will be divided into modules of similar data, generally following Xpressfeed groupings. In some cases, most notably the annual fundamental items, CRSP will subdivide a large data group into modules based on logical categories, such as financial statements.
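Combining current and backdata periods into one series, as described above, might look like the following sketch. It assumes observations keyed by ISO period-end date strings and assumes that the current value wins when both sources cover the same period. Both the function name and that precedence rule are assumptions made for illustration, not documented CRSP behavior.

```python
def merge_time_series(backdata: dict[str, float],
                      current: dict[str, float]) -> list[tuple[str, float]]:
    """Combine backdata and current observations into one ordered series.

    Keys are ISO period-end dates such as "2006-12-31", which sort
    chronologically as plain strings. Where both inputs cover the same
    period, the current value is kept (an assumption of this sketch).
    """
    merged = dict(backdata)
    merged.update(current)  # current overrides backdata on overlap
    return sorted(merged.items())
```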
- Xpressfeed introduces new security-level identifiers and additional security-level data.
CRSP will maintain the existing, well-established company-level link data and functionality. CRSP plans to add the new security-level link data, but has not yet determined whether it will be included in the restatement of the 200706 Annual CCM cut.
- In addition to annual and quarterly frequencies, data through Xpressfeed are now reported on year-to-date and semi-annual frequencies.
CRSP will make data for these groups available and will include a new semi-annual calendar for time-series data.
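To make the idea of a semi-annual calendar concrete, the sketch below generates period-end dates for a range of years. It assumes that semi-annual periods end on June 30 and December 31; the calendar CRSP actually ships may be defined differently, and the function name is hypothetical.

```python
from datetime import date


def semiannual_calendar(first_year: int, last_year: int) -> list[date]:
    """List semi-annual period-end dates from first_year through last_year.

    Assumes periods end on June 30 and December 31; the actual CRSP
    calendar may be defined differently.
    """
    periods = []
    for year in range(first_year, last_year + 1):
        periods.append(date(year, 6, 30))
        periods.append(date(year, 12, 31))
    return periods
```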
- New reference and global economic data are available in Xpressfeed.
CRSP will include Compustat reference data in the database. CRSP will also include CRSPAccess-specific reference data in the databases.
- Xpressfeed index data are in separate data groups with an identifier, GVKEYX.
CRSP records will include company and index data under the same structures and will use GVKEY and GVKEYX interchangeably.
- Xpressfeed introduces new data codes and different access to footnotes. For example, missing data codes are marked in separate fields instead of within the item data. Data codes and footnotes are available with all fundamental items and each has a unique mnemonic.
- Additional precision of data items is available in Xpressfeed.
CRSP will present the standard data items with this increased precision.
- Xpressfeed data are presented as calendar-based events instead of fiscal-based time series.
CRSP will continue to provide data items in a time series with a fiscal-based default, and will provide tools to present data on a calendar basis if desired.
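The difference between a fiscal basis and a calendar basis can be illustrated with a small conversion sketch. It assumes a labeling convention in which fiscal years ending in January through May carry the prior calendar year as their fiscal year. That convention and the function name are assumptions for illustration and are not confirmed by this notice.

```python
import calendar
from datetime import date


def fiscal_period_end(fiscal_year: int, fiscal_year_end_month: int) -> date:
    """Return the calendar date on which an annual fiscal period ends.

    Assumed convention for this sketch: fiscal years ending in January
    through May are labeled with the prior calendar year, so their period
    end falls in fiscal_year + 1.
    """
    if not 1 <= fiscal_year_end_month <= 12:
        raise ValueError("fiscal_year_end_month must be in 1..12")
    year = fiscal_year + 1 if fiscal_year_end_month <= 5 else fiscal_year
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(year, fiscal_year_end_month)[1]
    return date(year, fiscal_year_end_month, last_day)
```

Under this assumption, two companies with the same fiscal year label but different fiscal year-end months land on different calendar dates, which is why a calendar-based presentation needs an explicit conversion step.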
- Xpressfeed does not support Operating Segment Data.
For the foreseeable future, CRSP will preserve the structures and access of Operating Segment Data from the FTP format.
