Over the last five years, leaps in graphics and machine learning methods have enabled the creation and distribution of efficient and easy-to-use tools for synthesizing fake media. These tools enable non-expert users to modify or synthesize audiovisual media that is indistinguishable from the media capture of real-world events. Although subtle artifacts may be detected in some cases by experts or by statistical classifiers developed with machine learning, we expect that the march of technical advances will soon make it impossible to distinguish real from fake media.
A perfect storm is brewing: increasingly powerful tools for modifying and generating media are being coupled with the rise and dominance of social media platforms, enabling the creation and distribution of disinformation at scale. Tools for media synthesis, coupled with wide-scale distribution via social media, threaten to undermine the Fourth Estate of journalism and cause harm to individuals, institutions, and nations. More generally, widespread distribution of fake media has the potential to undermine society’s trust in the veracity of all video, audio, and imagery.
Given the rising sea of fake media, what can we do to protect the veracity of media and provide a pathway to trust? We are pursuing an answer by harnessing a set of advances in computer security: we seek to provide users with reliable information about the source and authenticity of a media object via a verifiable, trustworthy media authentication service. Knowing the origin of a media object, and having a certification that the media has not been altered, allows the consumer to rely upon the reputation of the media producer to make informed decisions about the media’s trustworthiness. For example, a media company or publisher can attest that it published a work in accordance with its editorial standards, or a camera in the hands of a reporter can attest that it recorded a video clip at a certain location and time.
The simplest building block for proving provenance is to digitally sign the media object. However, the variety of mechanisms for media distribution, with many of them modifying the media files or streams, means that maintaining digital signatures is difficult.
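To make the difficulty concrete, the following minimal sketch signs the raw bytes of a media object and shows that the check fails after a byte-level change such as a re-encode. It is illustrative only: HMAC-SHA256 from the Python standard library stands in for a real public-key signature, and the key and sample bytes are hypothetical.

```python
import hashlib
import hmac

# Hypothetical publisher secret; it stands in for a private signing key.
PUBLISHER_KEY = b"demo-publisher-key"


def sign_media(media: bytes, key: bytes = PUBLISHER_KEY) -> bytes:
    """Hash the media bytes, then authenticate the digest."""
    digest = hashlib.sha256(media).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()


def verify_media(media: bytes, tag: bytes, key: bytes = PUBLISHER_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_media(media, key), tag)


original = b"\xff\xd8 example image bytes \xff\xd9"
tag = sign_media(original)

# A distributor that rewrites the file changes its bytes, even when the
# rendered content looks the same to a viewer.
reencoded = original.replace(b"example", b"EXAMPLE")
```

Here `verify_media(original, tag)` succeeds while `verify_media(reencoded, tag)` fails, which is why a signature over the file alone does not survive typical distribution pipelines.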
Building a media authentication service requires solutions to multiple challenges. For example, in typical processes of redistribution and rendering, media content is routinely re-encoded by a content distribution network (CDN). A streaming system (e.g., Netflix) has pre-encoded streams, so one could try to embed authentication information into the media’s metadata; however, edits such as screen capture lose this metadata, so the resulting content cannot be authenticated properly. Furthermore, to ensure an adequate quality of service (QoS) level, frames may be dropped. Thus, each type of transformation must be tracked throughout the transportation of the media.
We present AMP, a system that seeks a practical solution to the authentication of a media content’s source based on provenance, while accounting for a wide variety of production and distribution scenarios at Internet-scale. The paper provides a design for the AMP system. The design is framed by rising threats to the integrity of news sources, and thus to democracy. Threats to the integrity of sources include the use of a range of techniques, from simple modifications of timing to more sophisticated uses of graphics and generative models, for manipulating or synthesizing audiovisual content that is perceived by consumers as capturing world events.
Approaches to securing media from a validated source to its consumption include (1) strong authentication and (2) fragile watermarking. A complementary approach involves (3) the detection of manipulation or synthesis via pattern recognition employing technical methods, such as machine-learned classifiers. Finally, there are opportunities to explore (4) event-certification methods for certifying that media as captured is linked to actual physical events, rooted in activities that are certified via a combination of methods to a time and place. The approach employed in the AMP system to securing media is based on the joint use of (1) and (2), coupled with the certification of the identity and trust of the source of the media. Strong authentication and fragile watermarking are individually promising for authenticating the provenance and veracity of media, because each establishes certification of, and/or certifiable links to, the original media as it was captured by reliable sources. We also seek a method that leverages the joint use of strong authentication and watermarking, since the two complement each other with respect to efficiency and attack vectors.
This paper describes Microsoft’s AMP initiative to develop technologies, a platform, and standards for providing strong authentication and provenance information for media. AMP is Microsoft’s proposed technical solution to mitigate the negative societal impact of fake/synthetic media, based on certifiable provenance. The AMP effort brings together expertise in security and media, leveraging multiyear efforts in cryptography, watermarking, and recently released cloud security and ledger services. The AMP system consists of four main modules: the AMP Registration Service, the Media Provenance Ledger, the Manifest Database, and the AMP Tools. AMP authenticates media using a digitally signed data structure called a Manifest, and the AMP Service allows content providers to upload their media Manifests to AMP. Manifests are registered in the Media Provenance Ledger, which is a public distributed ledger based on the Confidential Consortium Framework (CCF) [CcfTech, CcfDoc]. Manifests can be distributed together with media content, whereas the ledger ensures integrity and auditability of the full history of media publishing operations. In addition, Manifests are indexed by media fragments in a Manifest Database for fast querying. Once a Manifest or group of Manifests has been uploaded, media players can use the AMP Service to validate the authenticity of the corresponding media content, even if it was distributed without its Manifest. A set of tools allows content providers to interact with the AMP Service when the content is published. In addition to the service and tools, media players (browsers, smartphone applications, etc.) need to be extended to check and display provenance information.
Enabling large-scale media provenance will need the cooperation of multiple participants, including content producers, publishers, and technology providers. We envision AMP supporting a media provenance consortium with open governance rules, where all the governance operations are transparent and recorded in the AMP ledger, for auditability. We hope that the AMP project can be a starting point for broadly adopted standards for media provenance verification. We are working toward that end in cooperation with media industry partners.
II AMP System Overview
This section provides an overview of the core AMP concepts and how they are combined to form an end-to-end media authentication and verification system. Figure 1 illustrates how the AMP components are integrated into a production, distribution and rendering pipeline.
A content provider uses the AMP Tools to register the media as it is published, so that it can be authenticated by the AMP Service. Other organizations such as a CDN or internet service provider (ISP) can similarly record transformations made to the content provider’s original media content and register them using the AMP Service. The AMP Service stores the important information in a Manifest, and these Manifests are then stored in the Manifest Database (DB) for fast verification. Finally, a target application such as a browser, a web site, or a media player can then use the AMP Libraries to verify (i.e., authenticate) that a media item which purports to have been published by a specific content provider was previously registered in the AMP Service by that provider.
The AMP Service integrates the following major functions:
AMP Manifest. The AMP Manifest is the central data structure in AMP. The Manifest authenticates media objects and links publisher-provided metadata. The Manifest also supports derived works through “back-pointers” to one or more source objects, as well as descriptions of how the original works were transformed. Manifests support simple media objects, progressively streamed video and adaptive streaming.
Media Provenance Ledger. Manifests are stored on a public blockchain using CCF. CCF operates the public ledger (i.e., blockchain) of published works, relying on a distributed network of replicas running on trusted hardware and synchronized using Practical Byzantine Fault Tolerance (PBFT) [PBFT] or RAFT [RAFT]. CCF supports the registration of new Manifests and issues Manifest receipts. The receipts complement the producer’s signatures; they enable media consumers to independently verify that the work they receive has been published with the corresponding metadata. CCF also supports online querying and validation of ledger transactions and their endorsing certificates, as well as the governance of a consortium of media producers.
Manifest Database. Eventually we hope that AMP Manifests will be distributed with the media objects themselves, but to support a gradual transition, the Manifest Database allows clients to obtain a Manifest for media obtained through other means (e.g., streamed from YouTube).
Fragile Watermarking. In some cases, the media may be manipulated and the Manifest is not stored in the AMP Manifest Service. To handle these scenarios, the content provider can insert a watermark using the AMP Publishing Tools. This watermark stores a unique identifier which can be recovered by the AMP Service. This identifier can then be used to query whether or not the content’s source has previously been authenticated.
AMP Service REST APIs. The AMP Service functionality is exposed through a set of REST APIs.
Client Tools and Libraries. We have developed a set of tools and libraries for interacting with AMP. The tools cover: (a) Manifest creation and ingestion of content/Manifests into the AMP system, (b) querying the AMP system for media authentication information and to check that media objects are intact, and (c) AMP service governance (adding/removing members and users, etc.).
Future Work. Several other system components can facilitate efficient interactions with the AMP Service, including a Provenance Service and a Provenance Browser Extension. We leave these two components as future research efforts.
Provenance Service. This is a web front-end for the client tools. The main purpose is to allow provenance discovery, verification, and analysis of media that has been processed with AMP.
Provenance Browser Extension. This is a browser front-end for the client libraries in the form of a Chrome browser extension. It allows authentication “at a glance” and the ability to query for detailed information.
Initially, AMP has been implemented to run on Linux (Ubuntu LTS 18.04). This decision was motivated, in part, by the fact that CCF has been developed and tested on Ubuntu 18.04. The core AMP components were primarily implemented in C# using .NET Core 3.0. This allows us to develop and test AMP on Linux and later port the existing system to Windows.
III AMP Manifests
An AMP Manifest is a data structure that cryptographically authenticates media objects and their associated metadata. Hashes of AMP Manifests are placed on the Media Provenance Ledger (Section IV) and inserted into a complementary AMP Manifest Database (Section VI) which stores the AMP Manifests. The purpose of AMP Manifests and the Manifest Database is to allow media player clients to quickly and easily determine the publisher (or publisher and distributor) of a media object. The values stored in the AMP Manifest data structure are generated by the content provider that publishes the media object.
There are two types of Manifests: Static and Streaming. A Static Manifest contains the hash of its associated media object (e.g. a JPEG) or a collection of objects with different encodings (facsimiles). A Streaming Manifest contains an array of hashes corresponding to “chunks” of the associated media. For example, a chunk might correspond to a few seconds of video or audio.
AMP Manifests can be used to authenticate the original source material or can authenticate transformations from one format to another. Note that checking whether a transformation is faithful is not discussed here.
AMP Manifests are signed by publishers, CDNs, etc. The cryptographic hash of an AMP Manifest is called its AMP Manifest ID (AMID). The AMID serves as a unique identifier for the Manifest. AMIDs are also digitally signed by content producers or distributors, and the AMIDs are recorded on the ledger.
The details of a Static Manifest are provided in Table I. The publisher assigns an ObjectID to identify a particular media object. In addition, the ObjectID is encoded into the media object as a watermark and may also be sent as metadata.
The CodecInfo field contains a string which indicates the media type (e.g., “JPEG”, “MP4”). This field helps to guard against media hashes being wrongly interpreted.
AMP Manifests can also authenticate media objects that are derived from other media objects by means of “back pointers” to one or more source Manifests. These “Transformation Manifests” can be used by publishers or CDNs to record transcoding and re-compressions of source material. Transformation Manifests can also be used to record the original media objects that were edited together to make a composite derived work.
The value of the OriginAMID field includes one or more AMIDs that describe the source media used to create a derived work. If a media object is a simple transcoding of another media object, this will be a single element array. If a media object is created from several source objects (e.g., a news video created from several original media objects) then additional AMIDs can be recorded in the array. Note that OriginAMID is not authoritative on its own: it should only be trusted if the AMID that describes the transform is signed by a trusted authority.
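The rule that OriginAMID is only as trustworthy as the signer of the Manifest that carries it can be sketched as a walk over back pointers. This is a minimal illustration, not the AMP implementation: the dictionary layout, the `signer` field, and the sample AMIDs are hypothetical, and signature verification is assumed to have happened before this step.

```python
from typing import Dict, List, Set


def trusted_origins(amid: str,
                    manifests: Dict[str, dict],
                    trusted_signers: Set[str]) -> List[str]:
    """Follow OriginAMID back pointers, but only through Manifests whose
    signer is trusted; return the AMIDs of the original (root) works."""
    roots: List[str] = []
    seen: Set[str] = set()
    stack = [amid]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        manifest = manifests.get(current)
        if manifest is None or manifest["signer"] not in trusted_signers:
            continue  # unknown or untrusted: ignore whatever it claims
        if manifest["OriginAMID"]:
            stack.extend(manifest["OriginAMID"])
        else:
            roots.append(current)
    return sorted(roots)


# Hypothetical example: a CDN transcoding and a rogue claim on the same master.
example_manifests = {
    "amid-master": {"signer": "publisher", "OriginAMID": []},
    "amid-cdn": {"signer": "cdn", "OriginAMID": ["amid-master"]},
    "amid-rogue": {"signer": "rogue", "OriginAMID": ["amid-master"]},
}
example_trusted = {"publisher", "cdn"}
```

With these inputs, the CDN copy resolves to `amid-master`, while the rogue Manifest resolves to nothing because its back pointer is never followed.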
The AMP Manifest includes a Copyright field which can be used to provide the copyright string associated with the media object. This field provides a simple and legally enforceable way of limiting fake or misleading Manifests. Allowed strings may also be dictated in the AMP terms-of-service.
In the simplest case (e.g., a picture or a text file), the Manifest contains the hash of the image or text and its associated metadata in the ObjectHash array field. Optionally, the publisher can create and authenticate more than one encoding of a media object to optimize for client screen resolutions or network conditions. We call these alternate representations facsimiles.
In addition to Manifest fields we have described, we expect additional elements may need to be added, as we identify additional needs from new usage scenarios.
Table I: Static Manifest.
|Field|Description|
|---|---|
|ObjectID|Publisher-assigned identifier for the media object.|
|CodecInfo|String describing the media type (e.g., “JPEG”, “MP4”).|
|OriginAMID|One or more AMIDs that describe the source media used to create a derived work.|
|Copyright|Copyright string associated with the media object.|
|ObjectHash|Hash of the associated simple media object (or collection of related media objects).|

Table II: Streaming Manifest.
|Field|Description|
|---|---|
|ObjectID|Publisher-assigned identifier for the media object.|
|CodecInfo|String describing the media type (e.g., “JPEG”, “MP4”).|
|OriginAMID|One or more AMIDs that describe the source media used to create a derived work.|
|Copyright|Copyright string associated with the media object.|
|ChunkInfo|Array of data structures describing chunks of a media stream.|

Table III: ChunkInfo.
|Field|Description|
|---|---|
|ChunkHash|Cryptographic hash of a chunk.|
|ChunkStart|Offset pointing to the start of the chunk.|
|ChunkEnd|Offset pointing to the end of the chunk.|
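The Manifest fields listed above, and the AMID defined as the cryptographic hash of a Manifest, can be sketched with plain data classes. Everything beyond the field names is an assumption made for illustration: the paper does not specify the serialization or the hash function, so the sketch uses canonical JSON and SHA-256, and `ChunkEntry` is a hypothetical name for one ChunkInfo element.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Union


@dataclass
class ChunkEntry:
    """One element of the ChunkInfo array."""
    ChunkHash: str
    ChunkStart: int
    ChunkEnd: int


@dataclass
class StaticManifest:
    ObjectID: str
    CodecInfo: str
    Copyright: str
    ObjectHash: List[str]  # one hash per facsimile
    OriginAMID: List[str] = field(default_factory=list)


@dataclass
class StreamingManifest:
    ObjectID: str
    CodecInfo: str
    Copyright: str
    ChunkInfo: List[ChunkEntry]
    OriginAMID: List[str] = field(default_factory=list)


def compute_amid(manifest: Union[StaticManifest, StreamingManifest]) -> str:
    """AMID sketch: SHA-256 over a canonical JSON encoding (assumed format)."""
    canonical = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


picture = StaticManifest(
    ObjectID="obj-001",
    CodecInfo="JPEG",
    Copyright="(c) Example Publisher",
    ObjectHash=[hashlib.sha256(b"picture bytes").hexdigest()],
)
picture_amid = compute_amid(picture)
```

Sorting keys and fixing separators keeps the encoding deterministic, so the same Manifest always yields the same AMID regardless of field ordering in memory.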
AMP authenticates media objects with digital signatures. It is straightforward to do this with text and images: we simply hash and then sign picture.jpg or doc.html. Streaming media is more problematic because (a) an application should not have to wait to download the entire file before it can check the signature, (b) streaming services support changing the stream resolution to match network constraints (adaptive streaming), (c) some transport layers are lossy, and (d) users can often navigate back and forth in streams. These issues imply that AMP must authenticate much smaller regions (i.e., “chunks”) in the stream.
Table II provides a description of a Streaming Manifest which is similar to the Static Manifest shown in Table I. While a Static Manifest contains one or more hashes of an image or text document in the ObjectHash field, a Streaming Manifest contains an array of ChunkInfo data structures described in Table III. Each ChunkInfo element contains the cryptographic hash of a chunk, together with an indication of where the chunk starts and ends.
Clients must be able to quickly determine where individual chunks start and end in order to calculate chunk hashes and compare them against the entries in an AMP Manifest. Unfortunately, different media formats and network delivery mechanisms require different chunking strategies.
The AMP system currently supports file offset-based chunking, which works well for HTTP GET-based streaming (the most common form on today’s internet). In practice, streaming players process a chunk hash every few seconds. In most scenarios, consecutive chunks delivered to the client will map to consecutive ChunkInfo entries in a single AMP Manifest. However, if a server is dynamically switching streams, then more than one AMP Manifest may be needed to authenticate a stream.
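File offset-based chunking can be sketched as follows. The sketch assumes SHA-256 chunk hashes, a fixed chunk size in bytes, and an exclusive ChunkEnd offset; none of these choices is stated in the paper, and the helper names are hypothetical.

```python
import hashlib
from typing import Dict, List


def build_chunk_info(stream: bytes, chunk_size: int) -> List[Dict]:
    """Publisher side: split the stream by byte offset and hash each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    entries = []
    for start in range(0, len(stream), chunk_size):
        end = min(start + chunk_size, len(stream))  # exclusive end (assumption)
        entries.append({
            "ChunkHash": hashlib.sha256(stream[start:end]).hexdigest(),
            "ChunkStart": start,
            "ChunkEnd": end,
        })
    return entries


def verify_chunk(received: bytes, start: int, chunk_info: List[Dict]) -> bool:
    """Client side: check one received chunk against its ChunkInfo entry."""
    for entry in chunk_info:
        if entry["ChunkStart"] == start:
            expected_len = entry["ChunkEnd"] - entry["ChunkStart"]
            digest = hashlib.sha256(received).hexdigest()
            return len(received) == expected_len and digest == entry["ChunkHash"]
    return False  # no entry starts at this offset


demo_stream = bytes(range(256)) * 10          # 2560 bytes of sample data
demo_info = build_chunk_info(demo_stream, 1000)
```

Because each chunk is checked on its own, a player can seek to any offset that begins a chunk and authenticate from there without downloading the whole file.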
AMP also supports adaptive streaming protocols such as DASH and HLS. Adaptive streaming requires several different encodings of a media object, optimized for different network conditions and client capabilities. Adaptive streams are supported in AMP either by publishing several AMP Manifests authenticating the different encodings, or by using a single AMP Manifest that authenticates multiple facsimiles.
IV Media Provenance Ledger
AMP utilizes a distributed tamper-proof ledger to ensure the authenticity of the media. Our implementation places several requirements on the ledger service. These requirements are addressed by the Confidential Consortium Framework (CCF) [CcfCode, CcfTech].
CCF is a framework for building permissioned confidential applications [CcfCode]. AMP uses the framework to build a ledger-based application which is designed to store manifests and their hashes securely. Any application built with CCF is designed to be administered by a group of consortium members via CCF’s governance features. AMP also utilizes CCF’s ledger to store these hashes and manifests [CcfTech]. Additionally, AMP utilizes self verifying receipts as proof that data was added to the ledger.
CCF exposes a key-value store to its users. In AMP, this store provides a simple abstraction in which each key is the hash of a manifest and each value is a signature over that hash by a media organization. Once written, these key-value pairs are stored in a Merkle tree, and the Merkle tree is replicated and stored on persistent storage. To ensure that any tampering can be detected, CCF maintains a private key that the service protects and occasionally uses to sign the Merkle root in the distributed ledger.
One of the core features that AMP utilizes from CCF is self-verifying receipts. These receipts are periodically provided by the CCF service and cover all the committed transactions since the last receipt. The receipt validates the request that was sent along with the response that was received, and most importantly, it certifies that this execution was recorded on the ledger. The value of a receipt is that it is possible to validate that a manifest and accompanying signature was successfully added to the ledger, and to do this only the receipt and the public key of the CCF service must be present [CcfTech].
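The idea behind a Merkle tree and a self-verifying receipt can be illustrated with a generic inclusion proof. This is a textbook construction under simplifying assumptions (SHA-256, last node duplicated on odd levels); it does not reproduce CCF's actual tree layout or receipt format, and the check of the service signature over the root is omitted.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root hash over all ledger entries."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True for a left sibling."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one entry and its proof."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


demo_entries = [b"amid-0", b"amid-1", b"amid-2", b"amid-3", b"amid-4"]
demo_root = merkle_root(demo_entries)
```

A verifier that trusts the signed root needs only the entry and its proof, which is logarithmic in the number of ledger entries.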
CCF provides a flexible governance model. This allows for AMP to define the governance by writing scripts in the Lua language. These scripts specify rules for actions such as adding new members, adding or removing users, adding and removing nodes from the system, user access control, etc. The specifics of the governance model will be defined as part of the media consortium that controls AMP, and these rules will evolve with time by modifying the governance Lua scripts.
Trust and Integrity
CCF is designed to support two different types of consensus algorithms that are based on two different trust models. The first leverages trusted execution environments (TEEs) and specifically Intel’s SGX. By using this trust model, CCF is able to utilize a variant of RAFT [RAFT] which can handle malicious attacks as long as Intel’s SGX is not compromised. The second variant uses PBFT [PBFT], a consensus algorithm that can make progress if fewer than one third of the nodes are actively malicious. This distinction means that even if some of the CCF nodes, which are running in an SGX enclave, are compromised, the Media Provenance Ledger will not lose integrity. This added security comes at an increased performance and latency cost when committing data to the ledger.
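As a worked illustration of the PBFT trust model, the standard bound is that a network of n replicas tolerates f Byzantine replicas when n ≥ 3f + 1. The helper below, whose name is hypothetical, computes the largest such f for a given network size.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1, the standard PBFT bound."""
    if n < 1:
        raise ValueError("a network needs at least one node")
    return (n - 1) // 3


# Number of malicious nodes tolerated for a few network sizes.
tolerance_table = {n: max_byzantine_faults(n) for n in (3, 4, 7, 10)}
```

For example, a four-node ledger tolerates one malicious node, and a seven-node ledger tolerates two.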
Critically, both of these consensus protocols offer finality. This property states that once a transaction has been committed, it cannot be reverted. CCF provides proof-of-finality via a self-verifying receipt.
CCF utilizes TEEs to ensure that the operator of the Media Provenance Ledger is not able to perform malicious acts on the service. This is designed to provide the AMP consortium members with confidence that they can run the ledger service in a cloud datacenter, and that an operator (such as Azure) cannot compromise the service’s confidentiality or integrity.
CCF is primarily written in C++ although it allows applications to be written in either C++ or Lua. The CCF framework has been developed and tested on Ubuntu 18.04 [CcfDoc].
V Fragile Watermarking
We have developed fragile watermarking technology and a watermark payload (i.e., the data to be inserted), and we characterize it against benign and adversarial media manipulation. The purpose of watermarking is to modify the media content in an imperceptible way. Faint noise-like patterns are inserted within the media content at production, and they can be read back at rendering. Watermarking is an approach to embed metadata within the content, thus preserving such metadata even when the media is slightly edited via tools that may not preserve header-style metadata.
We propose the use of fragile watermarking techniques using a spread-spectrum approach. Using techniques such as spread-spectrum watermarking, it is possible to add low-level pseudo-random noise patterns within the media payload, be it audio, pictures, or video. The added noise is low enough (comparable to the small distortions due to the compression formats) and can be embedded in a way that makes it imperceptible to human eyes and ears. Such watermarking patterns will not meaningfully affect machine-learning-based analytics performed on the content. The term “fragile” comes from the concept that the watermark will still be detectable after benign edits (typically minor recompression or light cropping), but will not survive major edits on media (for example, the technique of face replacements used commonly in present-day fake media).
For each type of media and application scenario, we can design watermarking parameters that influence the thresholds on allowed changes so that various kinds of minor modifications are considered as benign editing. In addition, we propose the use of keyless watermarking for AMP that simplifies system design and makes watermarking detection open, so it can be performed by any entity in the media distribution path. We also propose the use of soft media hashes within the bit string that is carried by the watermark, so that watermark patterns are not transferrable from one piece of media to another.
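The following toy sketch illustrates keyless spread-spectrum embedding and correlation-based extraction on a one-dimensional signal. It is not the AMP watermarking code: the seed, chip length, strength, and threshold are hypothetical values chosen for clarity rather than imperceptibility, and soft media hashes are not modeled.

```python
import random
from typing import List, Optional

PUBLIC_SEED = 2020      # keyless: the pseudo-random pattern is public (hypothetical)
CHIPS_PER_BIT = 4000    # number of samples that carry one payload bit
STRENGTH = 0.2          # amplitude of the embedded pattern
THRESHOLD = 0.1         # minimum |correlation| needed to accept a bit


def _chips(n_bits: int) -> List[int]:
    """Regenerate the public +/-1 spreading sequence for n_bits payload bits."""
    rng = random.Random(PUBLIC_SEED)
    return [rng.choice((-1, 1)) for _ in range(n_bits * CHIPS_PER_BIT)]


def embed(host: List[float], bits: List[int]) -> List[float]:
    """Add a low-level pseudo-random pattern whose sign encodes each bit."""
    chips = _chips(len(bits))
    if len(host) < len(chips):
        raise ValueError("host signal too short for this payload")
    marked = list(host)
    for i, chip in enumerate(chips):
        sign = 1 if bits[i // CHIPS_PER_BIT] else -1
        marked[i] += STRENGTH * sign * chip
    return marked


def extract(signal: List[float], n_bits: int) -> Optional[List[int]]:
    """Correlate with the public pattern; return None if any bit is too weak."""
    chips = _chips(n_bits)
    if len(signal) < len(chips):
        return None
    bits = []
    for b in range(n_bits):
        lo = b * CHIPS_PER_BIT
        corr = sum(signal[lo + k] * chips[lo + k]
                   for k in range(CHIPS_PER_BIT)) / CHIPS_PER_BIT
        if abs(corr) < THRESHOLD:
            return None  # pattern absent or destroyed
        bits.append(1 if corr > 0 else 0)
    return bits
```

In this sketch, `THRESHOLD` plays the role of the tunable parameter that separates benign edits from major ones: mild additive noise leaves the correlation well above it, whereas replacing the content drives the correlation toward zero and extraction fails.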
Given some derived content, AMP’s client software can automatically extract the ID of some related content originally published by a provider.
The ID itself is assigned by the provider and must resist forgery.
The ID is embedded in the content as an invisible watermark. It is bound to this particular media object (not transferrable to unrelated contents).
The ID can easily be erased; by design, the watermark is fragile.
The original and derived content may differ, provided that the watermark is preserved through the distribution of the media. This is best effort. It will exclude some fake media, while for others, it may just provide the original content as input for forensics analysis, or help establish the intent to create fake media.
This client may be embedded in a UI, enabling it to display the provider’s name with a link to the original content, or, if extraction fails, a warning that trust (and thus veracity) cannot be established.
The watermarking insertion process transforms contents and embeds a signed watermark before contents publication. It should always succeed.
Extraction is keyless: either it fails, or it returns a GUID that includes the identity of its provider. It may require collateral public information about the provider, such as a certificate that identifies the provider and binds its public verification key. It may also benefit from metadata embedded in the media, but that is not required. The watermarking payload string contains the following: a content ID, a provider ID, and a signature tag. Those are read back by the watermark decoder. The verifier then checks the signatures against the IDs.
Watermark decoding is reasonably robust, but depending on the scope of media’s modification, the decoded strings may contain some noise. Therefore, we expect to add provisions for higher robustness on the provider ID field, as well as leverage error-correcting codes for a good balance between payload capacity and decoding robustness.
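The trade-off between payload capacity and decoding robustness can be illustrated with the simplest possible error-correcting code, a repetition code with majority voting. The paper does not say which code AMP will use, so this is only a stand-in; the function names and repetition factors are hypothetical.

```python
from typing import List


def encode_repetition(bits: List[int], repeat: int = 5) -> List[int]:
    """Repeat every payload bit `repeat` times."""
    return [bit for bit in bits for _ in range(repeat)]


def decode_repetition(coded: List[int], repeat: int = 5) -> List[int]:
    """Recover each payload bit by majority vote over its repetitions."""
    if repeat % 2 == 0:
        raise ValueError("repeat must be odd so that votes cannot tie")
    if len(coded) % repeat:
        raise ValueError("coded length must be a multiple of repeat")
    return [
        1 if sum(coded[i:i + repeat]) > repeat // 2 else 0
        for i in range(0, len(coded), repeat)
    ]


# A more robust field (e.g., the provider ID) can use a larger factor,
# at the cost of consuming more of the watermark capacity.
provider_bits = [1, 0, 1]
coded_provider = encode_repetition(provider_bits, repeat=7)
```

With `repeat=5`, up to two flipped bits per group are corrected; with `repeat=7`, up to three, while the payload occupies seven times its original length.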
The audio watermarking code has been implemented in C. This implementation enables efficient porting to many different processing environments.
VI Manifest Database
Ideally, AMP Manifests are delivered as metadata with media objects. However, if a Manifest is not available, then the client can use the Manifest Database to map a content item or chunk to a suitable Manifest.
The AMP Manifest Database is a public service that lets clients obtain one or more AMP Manifests that authenticate a published or transcoded media object. To perform this function efficiently, the Manifest Database indexes Manifests on (a) the media ObjectID (delivered via the metadata or a watermark), and (b) the media ObjectHash or, in the case of streaming media, the hashes of all of the contained chunks (ChunkHash). Media players can quickly and easily extract or calculate the ObjectID, ObjectHash or a ChunkHash, and then use the Manifest Database to find a matching Manifest.
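The indexing scheme described above can be sketched as an in-memory lookup table keyed by ObjectID and by every ObjectHash or ChunkHash. The class and method names are hypothetical, and a deployed service would use a real database rather than Python dictionaries.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ManifestIndex:
    """Maps ObjectIDs and content hashes to the AMIDs of matching Manifests."""

    def __init__(self) -> None:
        self._manifests: Dict[str, dict] = {}
        self._by_key: Dict[str, Set[str]] = defaultdict(set)

    def add(self, amid: str, manifest: dict) -> None:
        self._manifests[amid] = manifest
        self._by_key["id:" + manifest["ObjectID"]].add(amid)
        for digest in manifest.get("ObjectHash", []):      # Static Manifest
            self._by_key["hash:" + digest].add(amid)
        for chunk in manifest.get("ChunkInfo", []):        # Streaming Manifest
            self._by_key["hash:" + chunk["ChunkHash"]].add(amid)

    def find_by_object_id(self, object_id: str) -> List[str]:
        return sorted(self._by_key.get("id:" + object_id, ()))

    def find_by_hash(self, digest: str) -> List[str]:
        return sorted(self._by_key.get("hash:" + digest, ()))

    def get(self, amid: str) -> dict:
        return self._manifests[amid]


demo_index = ManifestIndex()
demo_index.add("amid-pic", {"ObjectID": "obj-1", "ObjectHash": ["h-pic"]})
demo_index.add("amid-vid", {
    "ObjectID": "obj-2",
    "ChunkInfo": [{"ChunkHash": "h-c0"}, {"ChunkHash": "h-c1"}],
})
```

Indexing every chunk hash lets a player that has only a few seconds of a stream find the Manifest that covers it.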
The Manifest Database can be centralized or distributed. Because authoritative truth is stored in the ledger, the security requirements for the Manifest Database are much less than for the ledger itself. Note that AMP Manifests do not address problems that arise from more than one publisher signing the same original content – either the same simple object or one or more ChunkHashes. Similarly, the AMP Service does not stop a rogue CDN from claiming that one media object is a faithful transformation of an original when in fact it has been maliciously authored. We believe that these issues can be addressed by a combination of client policies (e.g., only consider the oldest Manifest of a media object) and server-side terms-of-service.
A Transformation Service takes one or more media objects and creates a derived object. A CDN is a simple example: CDNs can take a single media object and re-encode the object into several derived objects with different compression parameters to optimize for bandwidth and network losses. AMP Manifests support transformation services by allowing entities to indicate the AMID of one or more source objects that were used to create the derived object.
Note that a Transformation Manifest does not in itself guarantee that a derived object is indeed a high-fidelity transformation of a source object. It is entirely possible that the “purportedly derived” object is unrelated to the stated original. Trust assessments should involve the entity that signed the Transformation Manifest. In the simple case, this might be the original publisher. For example, a media publisher creates a master media object and a dozen copies with different compression factors. A more complex example might be a CDN acting on behalf of the media publisher. Policies can be developed for transitive trust that work for common scenarios. These policies can be enforced with a combination of client- and server-side rules, as well as server-side terms of service. Other entities might create and sign Transformation Manifests. For example, a third-party service might use heuristics to compare the semantic content of two videos and create and sign Transformation Manifests asserting that they are semantically identical. Once more, AMP makes no trust assumptions: it is up to clients to use trust policies that are appropriate for a given scenario. In the case of Streaming Manifests, there is no requirement that source chunks map 1:1 to transformed chunks: chunks are “natural” for each stream.
As noted previously, CCF’s ledger is immutable; once a Manifest is stored on the CCF ledger, it cannot be removed. Therefore, when a publisher wants to revoke a Manifest, it must insert a revocation object into the ledger. To keep queries efficient, the Manifest Database then deletes the revoked Manifest.
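Revocation on an immutable ledger can be sketched as an append-only log in which a later revocation entry overrides an earlier registration, while the database copy is simply deleted. The class and function names are hypothetical and the entry format is an assumption.

```python
from typing import Dict, List, Tuple


class AppendOnlyLedger:
    """Toy ledger: entries can be appended but never changed or removed."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str]] = []

    def register(self, amid: str) -> None:
        self._entries.append(("register", amid))

    def revoke(self, amid: str) -> None:
        self._entries.append(("revoke", amid))  # a new revocation object

    def is_valid(self, amid: str) -> bool:
        registered = ("register", amid) in self._entries
        revoked = ("revoke", amid) in self._entries
        return registered and not revoked

    def __len__(self) -> int:
        return len(self._entries)


def revoke_manifest(ledger: AppendOnlyLedger,
                    database: Dict[str, dict],
                    amid: str) -> None:
    """Record the revocation on the ledger, then drop the Manifest from the DB."""
    ledger.revoke(amid)
    database.pop(amid, None)


demo_ledger = AppendOnlyLedger()
demo_db = {"amid-1": {"ObjectID": "obj-1"}}
demo_ledger.register("amid-1")
```

After `revoke_manifest(demo_ledger, demo_db, "amid-1")`, the ledger holds both the registration and the revocation, so the history stays auditable even though the database no longer returns the Manifest.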
VII Client Tools and Governance
There are two parts to the authoring and management back-end. The first part supports the publishing flow. We have developed tools that create a signed manifest (AMP Manifest Creation Tool, AMP Signing Tool), watermark the media (AMP Watermark Tool) and record the manifest on a ledger (AMP Ledger Insertion Tool). These tools and tool-chains can be used by an ISP, CDN, or another media editing tool to support “authenticated transformations” of an original work, as well as tools that allow authentication information to be added to legacy media (e.g., videos already hosted on YouTube).
The second part of the authoring back-end relates to governance. We use the Microsoft CCF (Confidential Consortium Framework) technology to maintain a ledger of published works and provide a governance model over it. CCF provides a flexible governance model, allowing for a group of members to vote on everything from adding and removing users to updating the CCF service code. We will collaborate with our media partners to create a governance model, and when additional partners join the partnership, we will use CCF to evolve the governance rules as required.
VIII Example Media Publishing Flow
The purpose and operation of the various AMP components is demonstrated by tracing a typical flow of media through the system. The media publishing flow consists of two phases: publishing and playback. We present here how various AMP Service components can be used during the publishing and playback phases.
Assume a content producer generates two media objects: picture.jpg, and video.mp4. The publisher:
1. Uses ffmpeg to convert video.mp4 into a set of re-compressions, video[n].mp4, at various quality levels (e.g., using DASH).
2. Uses the AMP Registration Tool to obtain ObjectIDs for the objects to be authenticated.
3. Uses the AMP Watermarking Tool to insert encoded versions of the ObjectIDs into the picture.jpg and all videos that are to be published.
4. Uses the AMP Manifest Creation Tool to create a Manifest for the media objects.
5. Uses the AMP Signing Tool to sign the Manifest with a publisher’s key.
6. Registers the Manifest hashes with the Media Provenance Ledger with the AMP Ledger Insertion Tool.
7. Uploads the Manifests to the AMP Manifest Database.
8. Broadcasts (i.e., stages on a web site, etc.) picture.jpg and video[n].mp4.
Optional step for CDNs, ISPs, etc.:
CDNs take video[n].mp4 and picture.jpg and create further derived copies using steps 2 through 8 except that the Manifest refers back to the original AMP Manifest and may use a different PKI.
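The manifest creation, signing, and hashing steps of the publishing phase can be sketched as follows. This is an illustration under stated assumptions, not the AMP tools themselves: all function names and manifest field names are hypothetical, media is chunked by byte count rather than by playback time, and an HMAC stands in for the publisher's public-key signature so that the sketch needs only the Python standard library.

```python
import hashlib
import hmac
import json
from typing import List, Optional


def chunk_hashes(media: bytes, chunk_size: int) -> List[str]:
    """SHA-256 digest of each fixed-size byte chunk (a stand-in for time-based chunks)."""
    return [
        hashlib.sha256(media[i:i + chunk_size]).hexdigest()
        for i in range(0, len(media), chunk_size)
    ]


def create_manifest(object_id: str, media: bytes, chunk_size: int,
                    origin_manifest_hash: Optional[str] = None) -> dict:
    manifest = {"object_id": object_id,
                "chunk_hashes": chunk_hashes(media, chunk_size)}
    if origin_manifest_hash is not None:
        # Derived copies (e.g., made by a CDN) point back to the original Manifest.
        manifest["origin_manifest_hash"] = origin_manifest_hash
    return manifest


def canonical_bytes(manifest: dict) -> bytes:
    # Sorted keys and fixed separators give one byte string per manifest content.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def manifest_hash(manifest: dict) -> str:
    """The value that would be registered with the ledger in this sketch."""
    return hashlib.sha256(canonical_bytes(manifest)).hexdigest()


def sign_manifest(manifest: dict, key: bytes) -> str:
    # ASSUMPTION: HMAC stands in for the publisher's public-key signature.
    return hmac.new(key, canonical_bytes(manifest), hashlib.sha256).hexdigest()


def verify_signature(manifest: dict, signature: str, key: bytes) -> bool:
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

A derived copy would call `create_manifest` with `origin_manifest_hash=manifest_hash(original)` and could sign with a different key, which loosely mirrors the optional CDN step.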
A client application (e.g., browser, media player, etc.):
1. Links to the AMP Client Provenance Library in order to hash a video object in suitable “chunks” or to hash an image object (e.g., a JPEG).
2. Looks for locally cached Manifests that match the identifier (ObjectID or ChunkHash).
3. Consults a web service to obtain a suitable Manifest or Manifests if a locally cached Manifest is not found.
4. Consults the ledger to ensure that the Manifest is valid (e.g., present in the ledger, not revoked).
5. Displays the authentication information (simple or more complex information) if the media is authenticated.
6. Searches for an ObjectID watermark if the Manifest is not in the AMP Database.
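The client-side decision logic can be sketched as a single function. The function name, the status strings, and the callback parameters (`fetch_manifest`, `ledger_is_valid`, `find_watermark`) are assumptions made for illustration and do not describe the actual AMP Client Provenance Library. For brevity, the sketch stops once a watermark is found rather than repeating the Manifest lookup by ObjectID.

```python
import hashlib
from typing import Callable, Dict, Optional


def authenticate(
    media_chunk: bytes,
    cache: Dict[str, dict],
    fetch_manifest: Callable[[str], Optional[dict]],
    ledger_is_valid: Callable[[dict], bool],
    find_watermark: Callable[[bytes], Optional[str]],
) -> str:
    """Return a hypothetical status string for one media chunk."""
    chunk_hash = hashlib.sha256(media_chunk).hexdigest()

    # Prefer a locally cached Manifest; otherwise ask the web service.
    manifest = cache.get(chunk_hash)
    if manifest is None:
        manifest = fetch_manifest(chunk_hash)
        if manifest is not None:
            cache[chunk_hash] = manifest

    if manifest is not None:
        # The ledger decides whether the Manifest is present and not revoked.
        return "authenticated" if ledger_is_valid(manifest) else "invalid"

    # No Manifest matches the hash: fall back to searching for a watermark.
    object_id = find_watermark(media_chunk)
    return "watermarked" if object_id is not None else "unknown"
```

Passing the lookups in as callables keeps the sketch independent of any particular web service or ledger client.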
IX Technology Showcase Demos
In this section, we describe two prototypes that we plan to implement to test and showcase the AMP technology.
Media Authentication Web Service. This web service allows a user to upload a video or enter the URL of a publicly available web video (e.g., on YouTube). The service analyzes the video stream and displays the detailed provenance information.
Browser-Based Media Authentication. This showcase uses the same technology but integrates it into a browser extension (e.g., for Chrome or Edge). The browser extension allows users to click through to obtain detailed provenance information, but will also display a simple green/yellow/red banner with publisher information when it is available, indicating whether the video is fully authenticated, is watermarked, or shows clear evidence of tampering.
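One possible mapping from verification outcome to banner color is sketched here. The status strings and the `banner_for` helper are hypothetical, and the pairing of colors with outcomes assumes that the two lists in the description correspond in order (green for fully authenticated, yellow for watermarked, red for evidence of tampering).

```python
from enum import Enum
from typing import Optional


class Banner(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def banner_for(status: str) -> Optional[Banner]:
    """Map a hypothetical verification status to a banner color."""
    mapping = {
        "authenticated": Banner.GREEN,   # fully authenticated
        "watermarked": Banner.YELLOW,    # only a watermark was found
        "tampered": Banner.RED,          # clear evidence of tampering
    }
    # No banner is shown when no provenance information is available.
    return mapping.get(status)
```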
X Performance Evaluation
Media Provenance Ledger
We begin by measuring the time required to insert a manifest into the Media Provenance Ledger. In this test, we insert small manifests, each consisting of a hash and a copyright string, into the ledger. Manifests do not need to be addressable in CCF since the fact that they are recorded in the ledger is sufficient. To this end, we measure the maximum sustainable rate at which manifests can be submitted.
Application. We built a C++ application that customizes the CCF framework to produce a Media Provenance Ledger. The ledger application is small and can be expressed in several hundred lines of C++ code. The following is an example of the data that the ledger application stores: {"method": "LOG_record", "params": {"id": 0, "msg": "Copyright (c) Microsoft Corporation. All rights reserved. 88c3ba2b25cef698d9ca6775b7fd5c5e8bbc246098a55ad51b8078834c4add44"}}
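A record of this shape could be assembled as sketched below. The helper name `make_log_record` is hypothetical, the nesting of `id` and `msg` under `params` is inferred from the example, and SHA-256 is an assumption: the 64 hexadecimal characters in the example are consistent with a 256-bit digest, but the text does not name the hash function.

```python
import hashlib
import json


def make_log_record(record_id: int, manifest_bytes: bytes, copyright_notice: str) -> str:
    """Build a JSON request whose message is a copyright string followed by a hash."""
    # ASSUMPTION: SHA-256; the source only shows a 64-character hex digest.
    digest = hashlib.sha256(manifest_bytes).hexdigest()
    request = {
        "method": "LOG_record",
        "params": {"id": record_id, "msg": f"{copyright_notice} {digest}"},
    }
    return json.dumps(request)
```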
Experimental setup. We ran the performance application in 3 cluster configurations:
Single Azure region - Each machine has an Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz, and the application runs inside a 4-core virtual machine.
Two geographically distributed Azure regions - Each machine has an Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz, and the application runs inside a 4-core virtual machine. The machines are evenly distributed between the east USA and west Europe Azure regions.
Emerging hardware - A cluster running in our own datacenter. All machines are under the same 40G switch, and each has an Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz with 8 cores.
All of these machines run Ubuntu 18.04, and the results are shown in Table IV. We expect that up to 1 billion entries will be added to the ledger every day, which results in an expected load of about 11,575 operations per second. We can conclude from these results that our implementation of the Media Provenance Ledger can comfortably handle this load. Even with just a few nodes, we can achieve latencies that are low enough not to interfere with the user’s experience in media consumption.
|Configuration 1||Throughput (tx/s)||Avg. latency (ms)|
|Configuration 2||Throughput (tx/s)||Avg. latency (ms)|
|Configuration 3||Throughput (tx/s)||Avg. latency (ms)|
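The expected load quoted above follows from dividing the daily entry count by the number of seconds in a day. A short check, with constant and function names chosen for illustration:

```python
import math

ENTRIES_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate(entries_per_day: int, seconds_per_day: int = SECONDS_PER_DAY) -> float:
    """Average operations per second needed to absorb the daily load."""
    return entries_per_day / seconds_per_day


rate = required_rate(ENTRIES_PER_DAY)
print(f"{rate:.1f} ops/s, rounded up to {math.ceil(rate)}")
```

This is an average rate; peak traffic would need additional headroom beyond it.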
Next, we estimate the maximum scale requirements for the AMP Service assuming the following parameters:
10,000 10-minute original video clips uploaded each day
The video is hashed into 10-second chunks (10 minutes is 60 chunks)
Each original video is transformed into 99 (100 − 1) variants by the CDN
Using these parameters, this translates into
3.7 million original videos/year
370 million original and transformed videos/year
220 million original chunks/year
22 billion total chunks/year
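These yearly totals can be reproduced with a few lines of arithmetic. The sketch assumes 10,000 uploads per day (the value that reproduces the totals listed above), 100 versions per original (the original plus 99 variants), and that every variant has the same number of chunks as its original; the function and key names are illustrative.

```python
def yearly_scale(uploads_per_day: int = 10_000,
                 clip_seconds: int = 600,
                 chunk_seconds: int = 10,
                 versions_per_original: int = 100,
                 days_per_year: int = 365) -> dict:
    """Yearly video and chunk counts implied by the stated parameters."""
    originals = uploads_per_day * days_per_year
    chunks_per_video = clip_seconds // chunk_seconds  # 60 chunks per 10-minute clip
    return {
        "original_videos": originals,
        "all_videos": originals * versions_per_original,
        "original_chunks": originals * chunks_per_video,
        "all_chunks": originals * versions_per_original * chunks_per_video,
    }


for name, value in yearly_scale().items():
    print(f"{name}: {value:,}")
```

The exact results (3.65 million, 365 million, 219 million, and 21.9 billion) round to the figures given in the list.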
Since the AMP Service is independent of the CCF nodes, we can use large-scale VMs for implementing the index. If the index is a 32-byte hash and 32 bytes of other data, the total index size for all known chunks is 1.4 TBytes. At the moment, the largest VMs in Azure are 16 GBytes RAM and 16 GBytes SSD. As a result, the index will fit in one VM without sharding. If the AMP Service exceeds these estimates, we can shard the index. Scaling through sharding is easy: the indices are hashes so they will be uniformly distributed. Therefore, we believe that it will be practical to have the Manifest Service index on ChunkHashes.
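The index-size estimate and the sharding argument can be sketched together. The shard-selection rule here (interpreting a prefix of the ChunkHash as an integer, modulo the shard count) is one common choice and is an assumption, not a description of the AMP Service; the function names are hypothetical.

```python
import hashlib
from collections import Counter

TOTAL_CHUNKS = 22_000_000_000
ENTRY_BYTES = 32 + 32  # 32-byte hash plus 32 bytes of other data


def index_size_bytes(total_chunks: int = TOTAL_CHUNKS,
                     entry_bytes: int = ENTRY_BYTES) -> int:
    return total_chunks * entry_bytes


def shard_for(chunk_hash_hex: str, num_shards: int) -> int:
    """Pick a shard from the leading 64 bits of a hex-encoded ChunkHash."""
    return int(chunk_hash_hex[:16], 16) % num_shards


print(f"index size: {index_size_bytes() / 1e12:.1f} TBytes")

# Hash outputs are close to uniform, so shards receive similar numbers of keys.
counts = Counter(
    shard_for(hashlib.sha256(str(i).encode()).hexdigest(), 4)
    for i in range(10_000)
)
print(dict(counts))
```

Because the keys are already hashes, no extra hashing step is needed to balance the shards.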
XI AMP Partnership and Standards Strategy
We believe that the proposed AMP media provenance certification and verification system can only be successful if it becomes a widely adopted industry standard. We hope to form a partnership with media organizations and additional technology providers in the near future. We plan to put this collaboration on a formal footing through the formation of an industry alliance similar to the Alliance for Open Media. Other companies can join, either as active contributors or supporters. Such a model can move quickly for ratification of a detailed design, with the goal of developing reference code and performing sufficient testing to assess the efficiency and performance of the proposed provenance certification and verification system. Once such an alliance develops more detailed designs, the specifications can be brought to formal media standards committees, such as the ITU or MPEG. That would allow additional improvements to the system design, as appropriate, and would strengthen the goal of fostering industry adoption. A key goal of such an effort should be to promote the development of an open, royalty-free standard, with focus on interoperability. We believe that an open standard will motivate faster adoption. We also believe that increasing trust in media will benefit the business models of all bona fide industry entities involved in the creation and distribution of media.
Currently, AMP does not include any components for detecting fake media. Even if AMP is successfully adopted, it will take a number of years before Manifests for a large percentage of online media are stored in the ledger. However, we believe that fake media will rapidly improve and become more widely encountered before this time. Therefore, additional fake media detection algorithms will need to be incorporated into the media processing pipeline in the near future. A number of academic and industry efforts are currently underway to improve the detection of deep fakes. We see this work as orthogonal to the provenance solution proposed by AMP, and these detection methods can also be included in the future as part of the AMP Service.
It is important to note that the purpose of AMP is to authenticate that a media item was published by a known source. AMP is not a digital rights management (DRM) system designed to enforce the copyright of media content providers. Media provenance and AMP are about verifying the producing entity, not verifying, tracking, or authorizing the consuming entity.
XIII Related Work
Previous research related to the AMP system spans three main areas: previously proposed provenance systems, deep fake detection, and content generation.
Provenance Systems. Provenance systems for the prevention of deep fakes are a new and relatively understudied area. The provenance-based system that is most closely related to AMP was recently proposed by Hasan [Hasan19]. Like AMP, this system also employs blockchain. However, it is based on the Ethereum blockchain and smart contracts. Since AMP utilizes CCF, it is much more efficient, speeding up manifest insertion and queries by several orders of magnitude, which is required for widespread deployment.
In addition to [Hasan19], several organizations have proposed provenance-based systems, including Amber and Witness. Amber’s technology [amber, Newman19] is aimed at camera manufacturers and adds a cryptographic hash to the video at a user-specified rate. As in AMP, these hashes are then recorded on a ledger, in this case an Ethereum blockchain.
Similarly, Truepic [Truepic] also provides a photo and verification service where the cryptographic signature is written to a blockchain.
Witness is a non-governmental organization which aims to help ensure that human rights abuses can be documented in a verifiable manner. Witness published the ProofMode Android application [ProofMode] in 2017 which stores metadata about images and videos taken by those seeking to provide evidence of human rights abuses. The app includes a hash of the media and its metadata along with a cryptographic signature that helps to ensure the chain of custody.
Deep Fake Detection. Deep fake detection is an alternative to provenance solutions and relies on the algorithmic detection of synthetically generated media. A number of deep fake detection algorithms have been proposed in the literature.
In [Li18_Oculi], Li et al. observe that deep fake videos created before the paper’s publication in 2018 often showed eyes that failed to blink, whereas blinking is natural for humans. They therefore created an eye blink detector and used it as a proxy to detect deep fake videos.
McCloskey and Albright [McCloskey18] noted that GANs fail to accurately reproduce colors that are captured naturally by photosensitive cells in a camera’s sensor. Their approach to detecting deep fakes is to train a convolutional neural network (CNN) to detect this mismatch in the color.
Face warping artifacts can be introduced during the generation of deep fake videos. In [Li19_FaceWarping], Li and Lyu trained a CNN to detect these artifacts and thereby identify some types of deep fake attacks. Similarly, Yang et al. [Yang19] trained a CNN to detect inconsistencies in head poses.
In [Korshunov18], Korshunov and Marcel explore jointly using the audio and video, but their experiments indicated that adding the audio did not help.
In the FaceForensics++ system proposed by Rössler et al. [roessler2019faceforensics++], the Xception computer vision object recognition model, which also employs CNNs, was used to detect various types of deep fakes in videos which were generated by different methods. A leaderboard of deep fake detection algorithms on the FaceForensics++ dataset can be found at [FFLeader].
Content Generation. Recent research in GANs has enabled talking head models to be quickly adapted with just a few frames [Zakarov19].
Provenance Partnerships. Several other partnerships have been created to ensure the provenance of media. The New York Times Company is working with IBM on The News Provenance Project [NewsProv]. This collaboration is also using a blockchain to provide a provenance solution for media.
The Content Authenticity Initiative is a second partnership with Adobe, The New York Times Company and Twitter [CAI].
Increasingly sophisticated methods for synthesizing media, coupled with the wide reach of social media, have become a threat to private and public institutions as well as to individuals. Fake media has the potential to significantly undermine trust in media and journalism, threatening the foundations of democracy. We will not be able to rely on algorithmic detection of deep fake media in the long term. Provenance solutions will be required to properly authenticate the source and veracity of media.
To this end, we have proposed and constructed a prototype of the AMP system. AMP allows trusted content providers to form one or more consortiums that allow applications such as a browser to provide an indication to users that the content they are viewing has been verified to come from the purported source. Beyond the core security pipeline, human factors and design will play an important role. Inspired by the TLS lock icon, indicators can be provided by the browser or media player to alert users that the transmitted content can be traced back to its original source.
In order for a provenance solution to be successfully adopted, such a system must be formally adopted by a recognized standards body. We are seeking such coordination and adoption for the AMP system or a variant that provides similar functionality. We also believe that it is important to open source the code for a widely used provenance system. We plan to open source the AMP system in the near future to facilitate its widespread adoption.