Guccifer 2.0 CF Files Metadata Analysis

Introduction

Stephen McIntyre recently noted some interesting characteristics of a 7zip archive that Guccifer 2 published in October 2016.  (For readers unfamiliar with Stephen McIntyre, he has his own Wikipedia page.  Mr. McIntyre has published extensively on issues relating to climate data. He can also be followed on Twitter.)

Guc2-sm-cf-7z-tweet-1

McIntyre looks into the cf.7z file disclosure in more detail in his article, Guccifer 2 Document Dates (Sept. 18, 2017).

McIntyre refers to the publication of a large collection of documents and data by a persona known as Guccifer 2.0, which was announced on their blog on October 4, 2016.

Guc2-cf-announce

A link to download the data is shown as the second URL on that page, ending with “cqri63iyzrh6piv/cf.7z”.  We downloaded that large 7zip file, unpacked it and analyzed its metadata (file last modified times and internal metadata maintained by Microsoft Office products).

Technical note: the “cf.7z” 7zip file has a file size of 860,107,023 bytes and an MD5 sum of “c7574d3503ed3009ec918cf8b79c007b”.   It unpacks to approximately 1.2 GB and contains 2,085 files.
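
Readers who download the archive can check their copy against the size and MD5 sum quoted above. The following is a minimal sketch of such a check; the function names and the idea of running it against a local copy of cf.7z are illustrative assumptions, not part of the original analysis.

```python
import hashlib
import os

# Values quoted in the technical note above.
EXPECTED_SIZE = 860_107_023
EXPECTED_MD5 = "c7574d3503ed3009ec918cf8b79c007b"

def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex) for the file at path, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()

def matches_published_archive(path):
    """True when the file has the size and MD5 sum quoted in this report."""
    return file_fingerprint(path) == (EXPECTED_SIZE, EXPECTED_MD5)
```

Calling matches_published_archive on the path of a downloaded copy returns True only when both the size and the checksum agree.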

In this report the notation, CF, will refer to the files found in the cf.7z archive that are dated 2016-07-05.  The notation, cf.7z, will refer to all the files in the 7zip archive that is the subject of this analysis.  The notation, NGP/VAN, will refer to the files in the “NGP/VAN” 7zip archive published by Guccifer 2 circa Sept. 13, 2016; that archive was the subject of a report authored by the Forensicator, titled Guccifer 2.0 NGP/VAN Metadata Analysis, published on July 9, 2017.

Feedback

You may leave comments on the blog entry that announced the publication of this report: Guccifer 2.0 CF Files Metadata Analysis.  Comments will remain open until October 3.

Findings

  • The CF files (dated 2016-07-05) fall into gaps in the NGP/VAN file time line.  One large directory, OFA, precedes the earliest NGP/VAN file by about 1 minute.
  • The fact that the CF files’ last mod times generally fall into gaps in the NGP/VAN file time line affirms the Forensicator’s conclusion in the Guccifer 2.0 NGP/VAN Metadata Analysis report that the NGP/VAN time gaps were likely due to deliberate selection from a larger collection, and the gaps were not due to “think time”.  This confirmation will be fed back into the NGP/VAN analysis as an update.
  • The last mod times of all the files in the cf.7z archive are whole multiples of two (2) seconds, indicating that this material was copied to FAT-formatted media (e.g., a USB thumb drive) before the final cf.7z 7zip file was built from the files on that media.
  • The last mod times in the CF files (dated 2016-07-05) appear to be one hour earlier than those recorded in the NGP/VAN files. The Forensicator proposes a scenario where FAT-formatted media (e.g., a USB thumb drive) was written in a location where Central US time zone settings were in force.  This media was then transported to a location where Eastern US time zone settings were in force.  There, the material on the thumb drive was copied to an NTFS-formatted hard drive and the final (cf.7z) 7zip file was built from this copy of the files on the hard drive.  The result of this chain of events is a series of CF files that appear to be time stamped one hour earlier than those in the NGP/VAN archive.
  • There are numerous time gaps internal to directories in the CF files. This indicates that either the files were pulled from different source directories into a single destination directory (as is the case for the Donor Research and Prospecting directory), or the files were heavily curated/redacted (as appears to be the case for the OFA directory). (Worth noting: the NGP/VAN files did not have significant time gaps internal to any of the top-level directories.)
  • The two (2) second granularity of the time stamps of the CF files prevents making a reliable transfer speed estimate for those files.

Observations

  • The earliest NGP/VAN file has a last modified time of “7/5/2016 18:39:03 EDT”, named newmedia/emails_w_contactinfo.zip.
  • The latest NGP/VAN file has a last modified time of “7/5/2016 18:53:18 EDT”, named eday/VAN-Bellwether Numbers.xlsx.
  • The earliest 2016-07-05 dated CF file has a last modified time of “2016-07-05 17:34:32”, (before adjustment), when viewed on the East Coast, named OFA/NDA-Vendors (2).docx.
  • The latest 2016-07-05 dated CF file has a last modified time of “2016-07-05 17:53:04” (before adjustment), when viewed on the East Coast, named emails/Thumbs.db.

From the above observations, we note that we have to add +1 hour to bring the CF time stamps into the range of the time stamps disclosed in the NGP/VAN archive (for East Coast researchers).  West Coast researchers need to add +4 hours.  After that adjustment, the last dated file in the OFA directory found in the CF file collection precedes the earliest NGP/VAN file by about one minute, which places it in close agreement with the last modified times recorded in the NGP/VAN files.
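
The arithmetic behind the +1 hour adjustment can be checked with a few lines of Python. This sketch uses only the four time stamps listed in the observations above; the helper name shift is an illustrative assumption.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def shift(stamp, hours):
    """Parse a time stamp string and move it forward by the given number of hours."""
    return datetime.strptime(stamp, FMT) + timedelta(hours=hours)

# NGP/VAN range, as displayed on the East Coast.
ngp_first = datetime.strptime("2016-07-05 18:39:03", FMT)
ngp_last = datetime.strptime("2016-07-05 18:53:18", FMT)

# CF range (East Coast view), adjusted by +1 hour.
cf_first = shift("2016-07-05 17:34:32", 1)  # OFA/NDA-Vendors (2).docx
cf_last = shift("2016-07-05 17:53:04", 1)   # emails/Thumbs.db

print(cf_first, "precedes first NGP/VAN file by", ngp_first - cf_first)
print(cf_last, "precedes last NGP/VAN file by", ngp_last - cf_last)
```

After adjustment, the earliest CF file lands 4 minutes 31 seconds before the earliest NGP/VAN file, and the latest CF file lands 14 seconds before the latest NGP/VAN file.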

Those 2016-07-05 dated files are at the very end of the last mod sort of the CF files with a single file after them.  (The times shown are as they appear on the East Coast, before adjustment).

emails/Obama.rtf           1348   2016-07-05 17:53:04
emails/Price Lists.rtf      909   2016-07-05 17:53:04
emails/Template.xlsx      10824   2016-07-05 17:53:04
emails/TEXT.htm            9309   2016-07-05 17:53:04
emails/Thumbs.db           5120   2016-07-05 17:53:04
ngp/db1.mdb            17039360   2016-08-20 23:01:16

As McIntyre notes, all of the cf.7z files have 2 second granularity (typical of FAT-formatted media); this is true for the entire cf.7z  collection, not just the files dated 2016-07-05.  The directories extracted from cf.7z have 0.1 microsecond granularity (typical of NTFS-formatted media) and are dated 2016-10-03.
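
One way to test the 2 second granularity claim on an unpacked copy of the archive is sketched below. It assumes the extraction tool preserved the archive's last modified times; the function names are hypothetical and this is not the script used for the report.

```python
import os

TWO_SECONDS_NS = 2_000_000_000

def is_two_second_multiple(mtime_ns):
    """True when a nanosecond time stamp is a whole multiple of 2 seconds."""
    return mtime_ns % TWO_SECONDS_NS == 0

def files_off_grid(root):
    """Return paths under root whose last modified time is not on the 2 second grid."""
    off_grid = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if not is_two_second_multiple(os.stat(path).st_mtime_ns):
                off_grid.append(path)
    return sorted(off_grid)
```

An empty result for the whole tree is consistent with every file having passed through FAT-formatted media. Time zone offsets are whole multiples of 2 seconds, so the check does not depend on the viewer's time zone.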

Mr. McIntyre also notes the following interesting characteristic of the larger cf.7z collection.

Guc2-sm-cf-7z-tweet-3

The Forensicator notes that this single outlier (ngp/db1.mdb) in the cf.7z collection is dated post 2016-07-05; it is dated 2016-08-20.  (The time shown below is relative to the East Coast, before adjustment.)

ngp/db1.mdb 17039360 2016-08-20 23:01:16

This report deals only with the files copied on 2016-07-05 (known as the CF files) — this last modified date matches the last modified date for the files in the NGP/VAN collection previously analyzed by the Forensicator.

Mr. McIntyre followed up with some observations, regarding the last saved time recorded in the internal metadata of a particular file (DonorsByMM.xlsx).

Guc2-sm-cf-7z-tweet-2

As will be explained later, the last modified times of the CF files dated 2016-07-05 need to be advanced by an hour (for researchers on the East Coast, US) so as to fall into the time range of the files present in the NGP/VAN archive.

The fact that the internal last saved time for the NGP/VAN version of this file is one (1) minute later than the CF version of the file raises questions as to whether one/other might be a “doctored” version of the other.  The analysis below addresses that question.

The CF copy of this file is slightly larger (by 29 bytes); it carries the same date, with its time rounded up to the next 2 second interval.

4445812 2016-07-05 18:51:58.977994500
   ngp-van/DonorAnalysis/DonorsByMM.xlsx
4445841 2016-07-05 18:52:00.000000000
   cf_7z/Donor Research and Prospecting/DonorsByMM.xlsx

Here is a similarly named file with a _2 suffix. They have the same size.

4850650 2016-07-05 18:51:59.179329000
   ngp-van/DonorAnalysis/DonorsByMM_2.xlsx
4850650 2016-07-05 18:52:00.000000000
   cf_7z/Donor Research and Prospecting/DonorsByMM_2.xlsx

A closer look at the two versions of DonorsByMM.xlsx, leads to the following conclusion.

The NGP/VAN version of DonorsByMM.xlsx was modified 1 minute later than the CF version, and Sheet4 was renamed to Pivot.

This seems like a normal editing sequence and Sheet4 is in fact a pivot table.  To confirm this conclusion, each worksheet in both spreadsheets was saved into a .csv file and then the .csv files were compared.  The content of each corresponding worksheet was identical, except for the renaming of Sheet4 to Pivot.  This property difference is shown below.
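
A worksheet-by-worksheet comparison of this kind can be sketched as follows. The sketch assumes each spreadsheet's worksheets were already exported to one .csv file per sheet (named after the sheet) in its own directory, using the same export tool for both; the function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path

def sheet_digests(csv_dir):
    """Map each exported worksheet name to a SHA-256 digest of its CSV bytes."""
    return {path.stem: hashlib.sha256(path.read_bytes()).hexdigest()
            for path in Path(csv_dir).glob("*.csv")}

def compare_exports(cf_dir, ngp_dir):
    """Return (changed, renamed): sheets whose content differs, and sheets
    whose content is identical but whose name differs between the two sets."""
    cf, ngp = sheet_digests(cf_dir), sheet_digests(ngp_dir)
    changed = sorted(name for name in cf.keys() & ngp.keys() if cf[name] != ngp[name])
    only_cf = {name: d for name, d in cf.items() if name not in ngp}
    only_ngp = {name: d for name, d in ngp.items() if name not in cf}
    renamed = sorted((a, b) for a, da in only_cf.items()
                     for b, db in only_ngp.items() if da == db)
    return changed, renamed
```

For the case described above, one would expect an empty changed list and a single renamed pair, ("Sheet4", "Pivot").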

Guc2-cf-7z-DonorsByMM-cmp

The output of a script that matches files in cf_7z in the 2016-07-05 date range with files in the NGP/VAN archive is shown below.  Only files that matched by name are shown.

Guc2-cf-7z-file-name-matches-output
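
The script itself is not reproduced in this report. A minimal sketch of the same idea, matching files in two unpacked trees by base name and then comparing their content, might look like the following. The function names are hypothetical, and the restriction to the 2016-07-05 date range is omitted for brevity.

```python
import hashlib
import os

def index_by_name(root):
    """Map each base file name to the list of paths under root that carry it."""
    index = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            index.setdefault(name, []).append(os.path.join(folder, name))
    return index

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()

def name_matches(cf_root, ngp_root):
    """Return (cf_path, ngp_path, same_content) rows for files sharing a base name."""
    cf_index, ngp_index = index_by_name(cf_root), index_by_name(ngp_root)
    rows = []
    for name in sorted(cf_index.keys() & ngp_index.keys()):
        for cf_path in cf_index[name]:
            for ngp_path in ngp_index[name]:
                same = sha256_of(cf_path) == sha256_of(ngp_path)
                rows.append((cf_path, ngp_path, same))
    return rows
```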

As we can see:

  • There are not that many files (4) which match by name.
  • The only file that is different in content is DonorsByMM.xlsx.
  • For the files that match by name and content, we cannot use their time stamps to determine whether they are new files or not, because their FAT-based times have a wide 2 second range.
  • Two NGP/VAN directories are referenced: DonorAnalysis and finance.  However, those matching (by name) files appear under the directory Donor Research and Prospecting in the CF collection.
  • In the NGP/VAN directories, finance has approximately a 10 second gap ahead of it which is more than sufficient to allow for other (CF) files.
  • The determination that the CF versions of DonorsByMM.xlsx and DonorsByMM_2.xlsx can fit into the NGP/VAN collection just ahead of the DonorAnalysis directory in the NGP/VAN collection is more complex, as shown below.  The result is that there is a sufficient time gap ahead of DonorAnalysis to fit in another directory holding the CF copy of DonorsByMM.xlsx and DonorsByMM_2.xlsx.
Guc2-cf-0_8s-gap

Below, we see situations in the NGP/VAN files, where the simple file name matches and the file is present in two different directories.  The highlighted spreadsheet, FinanceObamaDupes.xls is similar to the DonorsByMM.xlsx spreadsheet in that it has the same author and (internal) file creation time, but different content and a different (internal) “last saved” time.

Guc2-cf-7z-ngp-file-name-matches

Based on the preceding discussion, we will consider the four (4) CF files that match by name to be new content.

CF Files Fit Into NGP/VAN Time Gaps

The following chart demonstrates how the CF files fit into the NGP/VAN collection time line previously analyzed by the Forensicator.

Guc2-cf-ngp-fill-slots

From this chart, we observe:

  • The CF files are in blue and the NGP/VAN files are in green.
  • Generally, only the first and last files in each group are shown, though a few additional files of interest have been added.
  • A blue arrow shows where the CF files fit into time slots in the NGP/VAN collection. The OFA directory in the CF file collection precedes the earliest NGP/VAN file by approximately one minute.
  • As discussed earlier, a few of the CF files match the NGP/VAN files by name. They are shown in this chart as occupying a time slot in the NGP-VAN file time line, because there is room and their position in the time line is consistent with the rounding rules for FAT-based time stamps.
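
The "rounding rules" consideration can be made concrete. Under the rule used in this report (a FAT time stamp is rounded up to the next multiple of 2 seconds), a displayed FAT time T implies a true time somewhere in the interval (T - 2 s, T]. The sketch below tests whether that interval overlaps a given gap; the gap endpoints in the example are hypothetical values chosen for illustration, not figures taken from the chart.

```python
from datetime import datetime, timedelta

def possible_true_interval(fat_time):
    """Return (low, high): the true time lies in (low, high] if FAT rounded up."""
    return fat_time - timedelta(seconds=2), fat_time

def could_fall_in_gap(fat_time, gap_start, gap_end):
    """True when some possible true time lies strictly inside (gap_start, gap_end)."""
    low, high = possible_true_interval(fat_time)
    return max(low, gap_start) < min(high, gap_end)

# CF copy of DonorsByMM.xlsx, after the +1 hour adjustment.
cf_time = datetime(2016, 7, 5, 18, 52, 0)

# Hypothetical gap endpoints, for illustration only.
gap_start = datetime(2016, 7, 5, 18, 51, 57, 500000)
gap_end = datetime(2016, 7, 5, 18, 51, 58, 900000)
print(could_fall_in_gap(cf_time, gap_start, gap_end))
```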

One Hour Time Difference Explained: Were the CF Files First Copied to a USB Drive While in the Central Time Zone?

We turn to the apparent one hour time stamp difference between the CF files (dated 2016-07-05) and the NGP/VAN files disclosed by Guccifer 2 a few weeks earlier. (The NGP/VAN file metadata was subsequently analyzed by the Forensicator.)

The following scenario is proposed to explain how the files in the CF collection ended up with last modified times that are one hour earlier than those of the files in the NGP/VAN collection.

The CF files (dated 2016-07-05) were selected from the same larger file collection that the NGP/VAN files were derived from.  The CF files were first copied to FAT-formatted media (likely a thumb drive) where Central (CDT) time zone settings were in force.  Subsequently, the final 7zip file was built from the data on that thumb drive, at a location where Eastern (EDT) time zone settings were in force.

This scenario is shown in the diagram below and detailed in the text that follows.

NOTE: in the diagram below and the following text, a hypothetical date/time value of “18:51:58.123” is used.  The “.123” is there to indicate that the timestamp has a fractional seconds value.   Technically, it might be more accurately shown as “.1234567”, because NTFS file stamps have 0.1 microsecond (100 nanosecond) resolution.  The fractional part has been shortened to “.123” for illustrative purposes and to avoid cluttering the chart and subsequent discussion.

Guc2-cf-7z-cdt-usb-edt-time-change

Let’s take DonorsByMM.xlsx with a hypothetical last mod time of “2016-07-05 18:51:58.123” before being copied to FAT-formatted media.

  • We assume that the source data is sitting on an NTFS-formatted media; its time stamp is encoded in UTC.
  • When this file is viewed on the East Coast its time will be displayed as “18:51:58.123”.
  • If that same file’s last mod time is queried in the Central time zone, it will display as “17:51:58.123” – one hour earlier.
  • If, while in the Central time zone, that file is copied from its NTFS-formatted location to a FAT-formatted USB drive, its time will be recorded as “17:52:00” (local). The rounding to the next higher multiple of 2 seconds occurs here.
  • The thumb drive is then transported back to the East Coast and the cf.7z (7zip) file is built there.  The 17:52 (local, EDT) time value will be re-encoded into UTC when the data on the thumb drive is copied back onto the local system’s hard drive (which is formatted as NTFS).  We confirm that this second copy happened by noting that the directories recorded in the 7zip archive are dated 2016-10-03 and have 0.1 microsecond granularity.
  • The file times recorded in the 7zip file will retain the UTC time base.
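
The chain of conversions in the steps above can be simulated with fixed UTC offsets. This is a sketch of the proposed scenario only, using the same hypothetical time value (18:51:58.123) as the text; the function names are illustrative, and the round-up rule is the one described in this report.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
CDT = timezone(timedelta(hours=-5), "CDT")
EDT = timezone(timedelta(hours=-4), "EDT")

def fat_local_stamp(utc_time, zone):
    """Local wall-clock time as stored on FAT: no zone, rounded up to 2 seconds."""
    local = utc_time.astimezone(zone).replace(tzinfo=None)
    rounded = local.replace(microsecond=0)
    if local.microsecond:
        rounded += timedelta(seconds=1)  # round up to the next whole second
    if rounded.second % 2:
        rounded += timedelta(seconds=1)  # round up to the next even second
    return rounded

def reinterpret(fat_stamp, zone):
    """Treat a zone-less FAT stamp as local time in zone and convert it to UTC."""
    return fat_stamp.replace(tzinfo=zone).astimezone(UTC)

# Source file on NTFS: stored in UTC, displayed as 18:51:58.123 on the East Coast.
original = datetime(2016, 7, 5, 18, 51, 58, 123000, tzinfo=EDT).astimezone(UTC)

on_usb = fat_local_stamp(original, CDT)          # written in the Central time zone
final_utc = reinterpret(on_usb, EDT)             # copied back to NTFS on the East Coast
shown_on_east_coast = final_utc.astimezone(EDT)

print(on_usb)               # 2016-07-05 17:52:00
print(shown_on_east_coast)  # 2016-07-05 17:52:00-04:00
```

The file that originally displayed as 18:51:58.123 on the East Coast now displays there as 17:52:00, roughly one hour earlier.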

The scenario described above offers one possible sequence of events that explains how the CF files end up with last modified times that are one (1) hour earlier than those in the NGP/VAN files previously analyzed by the Forensicator.

For East Coast researchers who want to adjust the CF file last modified times so that they fall into the same 18:30 (approx) range as the NGP/VAN files, the following command will shift their times by +1 hour.  The actual increment needed can be changed, based upon the researcher’s time zone.

$ find cf_7z -type f -newermt '2016-07-05' \
    \! -newermt '2016-07-06' -exec touch -m -r {} -d '+1 hour' {} \;
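
The command above relies on GNU find and GNU touch. For researchers without those tools, a rough Python equivalent is sketched below; the function name shift_mtimes and its parameters are illustrative assumptions. Like the command, it interprets the date bounds in the local time zone and changes only the last modified time.

```python
import os
from datetime import datetime

def shift_mtimes(root, newer_than, not_newer_than, shift_seconds=3600):
    """Add shift_seconds to the mtime of every file under root whose mtime is
    later than newer_than and not later than not_newer_than (local time).
    Returns the number of files changed."""
    low, high = newer_than.timestamp(), not_newer_than.timestamp()
    changed = 0
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            info = os.stat(path)
            if low < info.st_mtime <= high:
                new_mtime_ns = info.st_mtime_ns + shift_seconds * 1_000_000_000
                # Keep the access time; change only the last modified time.
                os.utime(path, ns=(info.st_atime_ns, new_mtime_ns))
                changed += 1
    return changed
```

For example, shift_mtimes("cf_7z", datetime(2016, 7, 5), datetime(2016, 7, 6)) applies the +1 hour shift used by East Coast researchers; a different shift_seconds value can be passed for other time zones.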

Transfer Speed Estimates Nixed by Timestamp Inaccuracy

Transfer speed estimates for the 2016-07-05 dated files in the CF collection are complicated by two factors:

  • The (FAT based) last modified times are accurate only to within two (2) seconds, because each one is rounded up to the next multiple of two seconds.
  • There are significant time gaps internal to various directories (this was not the case for the NGP/VAN collection previously analyzed by the Forensicator).

The following table shows the presence of time gaps in the CF collection. This table demonstrates that there are significant time gaps internal to the ‘OFA’ and ‘Donor Research and Prospecting’ directories.  Based on this observation, the Forensicator concludes that both of those directories were heavily curated and that the files under ‘Donor Research and Prospecting’, in particular, were likely derived from material found in several other directories.

Guc2-cf-7z-gaps
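
Time gaps of this kind can be located by sorting a directory's last modified times and flagging adjacent pairs that are further apart than some threshold. The sketch below illustrates the idea; the 10 second threshold and the sample times are hypothetical and are not the values used to build the table.

```python
from datetime import datetime, timedelta

def find_gaps(times, threshold=timedelta(seconds=10)):
    """Return (previous, next, gap) for adjacent time stamps further apart than threshold."""
    ordered = sorted(times)
    return [(a, b, b - a) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]

# Hypothetical last modified times for the files in one directory.
sample = [
    datetime(2016, 7, 5, 17, 34, 32),
    datetime(2016, 7, 5, 17, 34, 34),
    datetime(2016, 7, 5, 17, 35, 10),
    datetime(2016, 7, 5, 17, 35, 12),
]
for previous, following, gap in find_gaps(sample):
    print(previous, "->", following, "gap of", gap)
```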

At first glance, the OFA directory looks like a good candidate for use in estimating transfer speeds because it is a fairly large (150 MB) directory.  However, as the table below shows, the sources of error will dominate the calculation.

Guc2-cf-7z-ofa-xfer-speed

From the table above, we observe:

  • The time allocated to gaps is 99% of the total elapsed time.
  • With 5 gaps and an average round off error of 2 seconds per gap, we have a total average round off error of 10 seconds.
  • The average round off error (10 seconds) is 5 times the sample value (2 seconds).
  • The sample value (OFA Transfer Time) itself has its own large source of error.
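
To see why an error larger than the sample value rules out a usable estimate, consider the range of speeds that the figures above allow. The sketch below treats the round off error as a symmetric plus-or-minus band around the measured time, which is a simplifying assumption made here for illustration; the function name is hypothetical.

```python
def speed_bounds(size_mb, measured_seconds, error_seconds):
    """Return (slowest, fastest) speeds in MB/s implied by a transfer time of
    measured_seconds plus or minus error_seconds. fastest is None when the
    error band admits a transfer time of zero or less (speed unbounded)."""
    slowest = size_mb / (measured_seconds + error_seconds)
    shortest_time = measured_seconds - error_seconds
    fastest = size_mb / shortest_time if shortest_time > 0 else None
    return slowest, fastest

# Figures from this section: about 150 MB, a 2 second sample, 10 seconds of error.
print(speed_bounds(150, 2, 10))  # (12.5, None)
```

With these figures the lower bound is 12.5 MB/s and there is no upper bound at all, so the data cannot distinguish a slow transfer from a very fast one.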

Given the above, we cannot make a reliable estimate of the transfer speed for the files in the OFA directory. The other directories in the CF collection dated 2016-07-05 have similar, if not greater, accuracy issues.  In contrast, the NGP/VAN calculations used time stamps with millisecond resolution, 1,000 times more accurate than the data we are working with in the CF collection, and the NGP/VAN file time stamps had no significant gaps internal to a top-level directory.

Clinton Foundation Hack? Probably Not

In closing, we address Guccifer 2’s claim:

So, this is the moment. I hacked the Clinton Foundation server and downloaded hundreds of thousands of docs and donors’ databases.

Given that there are 160 megabytes of files in the CF collection that were (per our analysis) derived from the same source as the NGP/VAN collection (with a few files matching in both name and content), one cannot help but wonder how the same files could be the result of hacking both the DNC’s NGP/VAN server and the Clinton Foundation’s server.