Amazon S3 Adapter

With the Amazon S3 (Amazon Simple Storage Service) adapter, a user can fetch data stored in S3 buckets and send new data to S3 buckets for storage.

This document provides an overview and the functional specifications of the Amazon S3 adapter. Amazon S3 is a public cloud object-storage offering. In Amazon S3, data is stored as objects; the object is the fundamental storage unit. An object consists of data and the metadata that describes it. Objects are organized into buckets: just as file folders contain files, buckets store objects. A single object can be up to 5 terabytes in size. S3 buckets are located in the various geographic regions served by Amazon Web Services (AWS).

The function of the Amazon S3 adapter is to fetch objects from S3 buckets and to send data to S3 buckets for storage as objects. To gain access to an S3 bucket, the adapter must first be authenticated with an AWS access key and secret key (subject to the bucket permissions). AWS denotes every geographic region by a region code. When you specify a region in the adapter properties, the code must match the AWS format exactly.

The Amazon S3 adapter does not support the predefined-schema-import option.

An AWS Identity and Access Management (IAM) user can be created in AWS to access the S3 services. This user represents a person or application that interacts with AWS. It is not the same as the AWS account root user. For more information, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html

Adapter properties and commands

This section describes the adapter properties that are set to define connections and actions, and to import schemas, in the new web UI in the HIP. The following table lists the properties, their scope, whether they are required, and the adapter commands they map to for use in compiled maps and map executions. The “DI” scope means that the property is dynamically enumerated: the adapter is invoked during action configuration to provide the set of allowed values for that property. A detailed description of each property follows the table.

Table 1.

Scope codes:
  • CO: Connection
  • DI: Discovery
  • IC: Input card
  • OC: Output card
  • G: GET function
  • P: PUT function

The corresponding adapter command must be used directly in GET and PUT functions.

Adapter property                  Scope         Required  Corresponding adapter command
accessKey= access_key             CO            Yes       -AK access_key
secretKey= secret_key             CO            Yes       -SK secret_key
region= region_name               CO            Yes       -R region_name
bucket= bucket_name               DI/IC/OC/G/P  Yes       -B bucket_name
folder= folder_name               IC/OC/G/P     No        -F folder_name
key= object_name                  IC/OC/G/P     Yes       -K object_name
contentType= content_type         OC/P          No        -CT content_type
contentLang= content_language     OC/P          No        -CL content_language
encryption= encryption            OC/P          No        -E encryption
logging=info|errors|verbose|off   IC/OC/G/P     No        -T[E|V]?[+]? [file_name]
append_log=true|false             IC/OC/G/P     No        -T[E|V]?[+]? [file_name]
log_file_name=file_name           IC/OC/G/P     No        -T[E|V]?[+]? [file_name]
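
The mapping from properties to commands in the table can be illustrated with a small sketch. The helper below is hypothetical and is not part of the adapter: it only assembles a command string from property values, using the short command names listed in the table, and checks that the properties marked as required are present. The names PROPERTY_FLAGS, REQUIRED, and build_command are assumptions made for this illustration.

```python
# Hypothetical helper, not part of the adapter: maps the property names
# from the table to the short command names listed alongside them.
PROPERTY_FLAGS = {
    "accessKey": "-AK",
    "secretKey": "-SK",
    "region": "-R",
    "bucket": "-B",
    "folder": "-F",
    "key": "-K",
    "contentType": "-CT",
    "contentLang": "-CL",
    "encryption": "-E",
}

# Properties marked "Yes" in the Required column of the table.
REQUIRED = ("accessKey", "secretKey", "region", "bucket", "key")


def build_command(properties):
    """Return a command string for the given property values."""
    unknown = [name for name in properties if name not in PROPERTY_FLAGS]
    if unknown:
        raise ValueError("unknown properties: " + ", ".join(unknown))
    missing = [name for name in REQUIRED if not properties.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    # Emit the commands in table order, skipping unset optional properties.
    parts = []
    for name, flag in PROPERTY_FLAGS.items():
        value = properties.get(name)
        if value:
            parts.append(flag + " " + value)
    return " ".join(parts)


command = build_command({
    "accessKey": "AKSOMEACCESSKEY",
    "secretKey": "SkSomeSecretKey/dBLo",
    "region": "US_EAST_1",
    "bucket": "bucket-for-test",
    "key": "persons.csv",
})
print(command)
```

The resulting string has the same shape as the command strings passed to the GET and PUT functions in the Examples section.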

accessKey= access_key

This property specifies the access key of the AWS user for the connection. The access key is used to sign the programmatic requests that the adapter makes to the AWS API. The access key can be passed to the adapter with the -ACCESSKEY or -AK command.

secretKey= secret_key

This property specifies the secret key of the AWS user for the connection. Like a user name and password, the access key ID and the secret access key are both mandatory to authenticate a GET or PUT request for an S3 object. The secret key can be passed to the adapter with the -SECRETKEY or -SK command.

region= region_name

The adapter uses the region name to connect to an AWS region for the IAM user account. The region can be passed to the adapter with the -REGION or -R command.

bucket= bucket_name

This property specifies the name of the bucket in which the S3 object resides. During action configuration, the S3 adapter lists the available buckets so that a bucket name can be selected. The bucket name can also be passed to adapter input or output cards with the -BUCKET or -B command.

folder= folder_name

The folder name specifies the folder inside a bucket where the S3 object resides. This property is optional because an object can reside either directly in a bucket or under a folder. When the folder property is not specified, the adapter looks for the object directly under the bucket. Otherwise, the folder name is prefixed to the key name. The folder name can be passed to adapter input or output cards with the -FOLDER or -F command.
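
The prefixing rule described above can be sketched as follows. This is an illustration only, not the adapter's implementation: the function name resolve_object_key is hypothetical, and the use of "/" as the delimiter and the trimming of stray slashes are assumptions.

```python
def resolve_object_key(key, folder=None):
    """Return the full S3 object key for a key and an optional folder."""
    if not folder:
        # No folder: the object is addressed directly under the bucket.
        return key
    # Assumption: "/" separates the folder from the key, and leading or
    # trailing slashes on either part are trimmed before joining.
    return folder.strip("/") + "/" + key.lstrip("/")


print(resolve_object_key("persons.csv"))
print(resolve_object_key("persons.csv", "testfolder"))
```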

key= object_name

The key specifies the name of the S3 object to be accessed. The object can reside directly in a bucket or in a folder inside the bucket. The object name can be passed to adapter input or output cards with the -KEY or -K command.

contentType= content_type

The content type property indicates the media type of the entity-body sent to Amazon S3, expressed as a standard MIME type that describes the format of the content. If the type is not set, the adapter uses the default type. For more information, go to https://www.w3.org/Protocols/rfc1341/4_Content-Type.html.

Type: String

Default: binary/octet-stream

Valid Values: MIME types

Constraints: None

The content type can be passed to the adapter with the -CONTENTTYPE or -CT command.
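
A caller may want to choose a contentType value before invoking the adapter. The sketch below is one possible caller-side approach and does not describe how the adapter itself behaves: it guesses a MIME type from the object name with Python's standard mimetypes module and falls back to the documented default, binary/octet-stream, when no type can be guessed. The function name content_type_for is hypothetical.

```python
import mimetypes

# Default documented for the contentType property.
DEFAULT_CONTENT_TYPE = "binary/octet-stream"


def content_type_for(object_name, explicit=None):
    """Pick a contentType value for an object to be uploaded."""
    if explicit:
        # An explicitly supplied type always wins.
        return explicit
    guessed, _encoding = mimetypes.guess_type(object_name)
    # guess_type returns None for the type when the extension is unknown.
    return guessed or DEFAULT_CONTENT_TYPE


print(content_type_for("persons.csv", explicit="text/csv"))
print(content_type_for("payload.unknownext123"))
```

The guessed value for a known extension depends on the MIME tables available on the host system, so it is worth confirming the result before relying on it.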

contentLang= content_language

The content language property of the Amazon S3 adapter sets the Content-Language HTTP header, which specifies the natural language(s) of the intended audience for the enclosed entity. If no language is set, the S3 adapter uses the default value ‘en’. The content language can be passed to the adapter with the -CONTENTLANG or -CL command.

For valid values, see https://www.w3schools.com/tags/ref_language_codes.asp.

encryption= encryption

This property specifies the server-side encryption algorithm that is used to encrypt the object with AWS-managed keys. When the property is not specified, data uploaded to S3 remains unencrypted. The encryption value can be passed to the adapter with the -ENCRYPTION or -E command.

Valid Values: AES256, KMS.

AES256: The 256-bit Advanced Encryption Standard is a data or file encryption technique. The result of the process is downloadable in a text file. AES performs all its computations on bytes rather than bits, so it treats the 128 bits of a plaintext block as 16 bytes. It uses 14 rounds for 256-bit keys. For more information, see https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html.

KMS: This value represents the AWS Key Management Service (AWS KMS). The master keys that the user creates in AWS KMS are used to encrypt or decrypt the object uploaded to S3. For more information on the KMS encryption technique, see https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html.
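
For orientation, the two valid values can be related to the server-side encryption request header that Amazon S3 accepts on uploads, x-amz-server-side-encryption, whose values include AES256 and aws:kms. The sketch below assumes that the adapter values AES256 and KMS correspond to those two header values; the document does not state how the adapter sets this internally, and the function name sse_headers is hypothetical.

```python
# Amazon S3 request header used to ask for server-side encryption.
SSE_HEADER = "x-amz-server-side-encryption"

# Assumption: the adapter values map to these S3 header values.
SSE_VALUES = {"AES256": "AES256", "KMS": "aws:kms"}


def sse_headers(encryption=None):
    """Return the encryption header for an upload, or no header at all."""
    if encryption is None:
        # Property not specified: no encryption header is requested.
        return {}
    if encryption not in SSE_VALUES:
        raise ValueError(
            "encryption must be one of: " + ", ".join(sorted(SSE_VALUES))
        )
    return {SSE_HEADER: SSE_VALUES[encryption]}


print(sse_headers("KMS"))
```

Rejecting any other value up front mirrors the validation step that the PUT example in the Examples section attributes to the adapter.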

logging=info|errors|verbose|off

This property specifies the level of logging to be used for the log (trace) file produced by the adapter. The default is off. The value info logs informational and error messages, the value errors logs error messages only, and the value verbose logs debug and trace level messages along with the informational and error messages.

append_log=true|false

This property is a flag that indicates the action to take when the specified log file already exists. When set to true, the log messages are appended to the file. When set to false, the file is truncated, and the messages are written to the empty file. The default value is true.

log_file_name=file_name

This property specifies the name of the log file to which the log messages are written. If it is not specified, the default log file name m4s3.trc is used, and the file is stored in the directory in which the executed compiled map resides.
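
The three logging properties all map to the single command pattern -T[E|V]?[+]? [file_name] shown in the table. The sketch below shows one plausible reading of that pattern and should be treated as an assumption, because the document does not spell out the letters: E is taken to mean errors only, V verbose, no letter the info level, and + append mode. The function name trace_option is hypothetical.

```python
# Assumed reading of the -T[E|V]?[+]? pattern: no letter for info,
# E for errors only, V for verbose.
LEVEL_SUFFIX = {"info": "", "errors": "E", "verbose": "V"}


def trace_option(logging="off", append_log=True, log_file_name=None):
    """Build the -T option for the given logging property values."""
    if logging == "off":
        # Logging disabled: no trace option is emitted.
        return ""
    if logging not in LEVEL_SUFFIX:
        raise ValueError("logging must be info, errors, verbose, or off")
    option = "-T" + LEVEL_SUFFIX[logging]
    if append_log:
        # Assumption: "+" requests appending to an existing log file.
        option += "+"
    if log_file_name:
        option += " " + log_file_name
    return option


print(trace_option("verbose", True, "s3.log"))
```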


Examples

Examples of GET map function

In the following example, assume that the adapter is used in a GET function defined as follows:

GET("S3", " -AK AKSOMEACCESSKEY -SK SkSomeSecretKey/dBLo -R US_EAST_1 -B bucket-for-test -K persons.csv", inputdata)

Because the -FOLDER/-F command is not present, the adapter looks for the S3 object directly under the specified bucket.

Now assume that the GET map function is defined as follows:

GET("S3", " -AK AKSOMEACCESSKEY -SK SkSomeSecretKey/dBLo -R US_EAST_1 -B bucket-for-test -F testfolder -K persons.csv", inputdata)

In this case, the adapter prepends the folder name to the key name passed, and the object is accessed from the folder residing under the specified bucket.


Examples of PUT map function

In the following example, assume that the adapter is used in a PUT function defined as follows:

PUT("S3", " -AK AKSOMEACCESSKEY -SK SkSomeSecretKey/dBLo -R US_EAST_1 -B bucket-for-test -K persons.csv ", inputdata)

Because the -ENCRYPTION command is not present, the object uploaded to Amazon S3 is stored in its raw, unencrypted format.

Now assume that the PUT map function is defined as follows:

PUT("S3", " -AK AKSOMEACCESSKEY -SK SkSomeSecretKey/dBLo -R US_EAST_1 -B bucket-for-test -K persons.csv -ENCRYPTION KMS ", inputdata)

In this case, the adapter validates the algorithm passed to it and uses it to set the object metadata, so that Amazon S3 stores the data using the selected encryption technique.