Amazon AWS S3 Integration (Old version)

Integrate with AWS S3 to perform automated content classification on your data buckets


This integration is obsolete.
Please use the new Amazon AWS S3 Integration for better security.

About the Amazon AWS S3 Integration 

What it does: 

  • Performs content scanning on objects in an S3 bucket to detect and map the types of data stored in it.
  • The integration can scan and identify data types in many different file and document formats; see Supported File Types below.

Before setting up this integration:

  • Be sure to add Amazon S3 to your Inventory. To learn how to add systems to your Inventory, see the dedicated guide.
  • Make sure your MineOS plan supports automatic integrations.

How to set up

On the system side:

  1. Log into your AWS account
  2. Go to IAM -> Users -> Add New Users
  3. Type "mine-os" as the name (or any other name you prefer), select "Access Key" and click Next
  4. Select "Attach existing policies directly" and type s3 on the search.
  5. Select AmazonS3ReadOnlyAccess and click Next
  6. Leave the tags page empty and click Next
  7. Click Create User
  8. Copy the Access Key ID and Secret Access Key from this page into MineOS

Notes:

  • The AWS Multiple buckets integration (using regex) requires the s3:ListAllMyBuckets permission.
  • Encrypted buckets require the kms:Decrypt permission.
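
If you prefer to script the console steps above instead of clicking through the IAM pages, the following Python (boto3) sketch mirrors them. This is not MineOS tooling; it assumes your own administrator credentials are already configured locally and that you kept the suggested "mine-os" user name:

    import boto3

    iam = boto3.client("iam")

    # Steps 2-3: create the integration user ("mine-os" is just the suggested name).
    iam.create_user(UserName="mine-os")

    # Steps 4-5: attach the managed read-only S3 policy directly to the user.
    iam.attach_user_policy(
        UserName="mine-os",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )
    # Note: encrypted buckets additionally need kms:Decrypt, which this policy
    # does not grant (see the notes above).

    # Steps 7-8: create the access key pair and copy both values into MineOS.
    key = iam.create_access_key(UserName="mine-os")["AccessKey"]
    print("Access Key ID:    ", key["AccessKeyId"])
    print("Secret Access Key:", key["SecretAccessKey"])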

On your Privacy Portal: 

  1. Head to your Data Inventory and select Amazon AWS S3
  2. Scroll down to the component titled “Request handling”
  3. Select “Handle this data source in privacy requests”
  4. Select “Integration” as the handling style (see image below).
  5. Paste the Access Key ID and Secret Access Key into the designated fields
  6. Under Bucket Regex, type a bucket name or a regular expression that matches the bucket names you want to scan. Here are a few examples of regular expressions for scanning multiple buckets (a sketch for previewing the matches follows these steps):
    1. .+ will scan all the buckets the user account has access to.
    2. prod-.+ will scan all the buckets that start with "prod-".
    3. .+-data will scan all the buckets that end with "-data".

  7. Click "Test your integration" so Mine can verify your settings and save them. Once you do, the bucket names that match the regex will appear. Verify the buckets are as you intended:
  8. If successful, click "Test & save" to enable the integration. 

If you would like to add more buckets or more regexes, click the "+ Create Instance" link at the bottom and type in another bucket name or regex. You can reuse the same Access Key ID and Secret Access Key.
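
If you want to preview which bucket names a regex will match before pasting it into the portal, the following sketch approximates the matching locally with boto3 and Python's re module. It is only an illustration under the assumption that the regex is matched against the full bucket name, not MineOS's exact logic:

    import re
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id="AKIA...",       # the Access Key ID you created above
        aws_secret_access_key="...",       # the matching Secret Access Key
    )

    bucket_regex = re.compile(r"prod-.+")  # e.g. ".+", "prod-.+", ".+-data"

    # Requires the s3:ListAllMyBuckets permission mentioned in the notes above.
    all_buckets = [b["Name"] for b in s3.list_buckets()["Buckets"]]
    matching = [name for name in all_buckets if bucket_regex.fullmatch(name)]
    print("Buckets that would be scanned:", matching)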

 

Troubleshooting

Getting an error "Failed to run ListBucketsAsync": This is a permissions or authentication issue usually caused by one of the following reasons:

  1. Wrong Access Key ID: Using an Access Key ID that is incorrect, contains extra characters (such as a space), or is inactive or deleted will result in this error. Also double-check that you entered the Secret Access Key in the secret field and the Access Key ID in the key ID field (and did not swap them). For more information, refer to this article: https://repost.aws/knowledge-center/s3-access-key-error
  2. Missing permissions: Go through the guide above and make sure all of the required permissions were granted to the account. You can use the IAM Policy Simulator to check which permissions the account has. A quick way to reproduce the failing call locally is sketched below.
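
The sketch below reproduces the same ListBuckets call locally with boto3, using the key pair you pasted into MineOS, so you can tell whether the problem is the credentials themselves or missing permissions. It is only a diagnostic aid, not part of the integration:

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client(
        "s3",
        aws_access_key_id="AKIA...",   # paste the same values you gave MineOS
        aws_secret_access_key="...",
    )

    try:
        buckets = s3.list_buckets()["Buckets"]
        print("Credentials are valid; visible buckets:", [b["Name"] for b in buckets])
    except ClientError as err:
        code = err.response["Error"]["Code"]
        if code in ("InvalidAccessKeyId", "SignatureDoesNotMatch"):
            print("The Access Key ID / Secret Access Key pair is wrong, swapped, or inactive.")
        elif code == "AccessDenied":
            print("The key pair works, but the user lacks the s3:ListAllMyBuckets permission.")
        else:
            print("Unexpected error:", code)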

 

Supported File Types

Mine's content classification supports the following file types by extracting text from the files and performing classification:

  1. Apache Avro (.avro) - There are limits on maximum block size, file size, number of columns, etc.
  2. Apache Parquet (.parquet)
  3. CSV and TSV files (.csv, .tsv)
  4. PDF - File size limit: 30MB
  5. Textual files
  6. Microsoft Word - File size limit: 30MB
  7. Microsoft Excel - File size limit: 30MB
  8. Microsoft PowerPoint - File size limit: 30MB
Note: Encrypted buckets are supported.
Other file types not listed are not supported, including:
  1. Archives - not supported.
  2. Image files (with OCR) - not yet supported, although support is planned.
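
As a rough pre-check of your own data, the sketch below lists the objects in a bucket and flags which ones fall outside the supported extensions or the 30MB size limit. The extension list and the single size limit are an approximation of the rules above, not Mine's actual classifier logic, and the bucket name is a placeholder:

    import boto3

    # Approximate extension list for the supported types above; "textual files"
    # cover more than just .txt in practice.
    SUPPORTED_EXTENSIONS = (".avro", ".parquet", ".csv", ".tsv", ".pdf", ".txt",
                            ".docx", ".xlsx", ".pptx")
    SIZE_LIMIT_BYTES = 30 * 1024 * 1024  # the 30MB limit for PDF/Word/Excel/PowerPoint

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    for page in paginator.paginate(Bucket="prod-example-bucket"):  # placeholder bucket
        for obj in page.get("Contents", []):
            key, size = obj["Key"], obj["Size"]
            ok = key.lower().endswith(SUPPORTED_EXTENSIONS) and size <= SIZE_LIMIT_BYTES
            print(f"{key}: {'likely scanned' if ok else 'likely skipped'}")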

Limitations

  1. "Requestor pays" buckets are not supported.
  2. Buckets with Glacier storage class are not supported.
  3. Compressed objects (gzip) are not currently supported.
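
To spot objects that will be skipped because of these limitations, you could inspect their metadata with a head_object call, as in the hedged sketch below; the bucket and key are placeholders, not values from this guide:

    import boto3

    s3 = boto3.client("s3")

    # Placeholder bucket and key; adapt to an object you want to check.
    resp = s3.head_object(Bucket="prod-example-bucket", Key="reports/2023.csv.gz")

    storage_class = resp.get("StorageClass", "STANDARD")  # omitted means STANDARD
    encoding = resp.get("ContentEncoding", "")

    if storage_class in ("GLACIER", "GLACIER_IR", "DEEP_ARCHIVE"):
        print("Will be skipped: Glacier storage classes are not supported.")
    elif encoding == "gzip" or resp.get("ContentType", "") == "application/gzip":
        print("Will be skipped: compressed (gzip) objects are not currently supported.")
    else:
        print("No limitation detected for this object.")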

 

If you need any help with integrations, talk to us via our chat or at portal@saymine.com, and we'll be happy to assist! 🙂