Sharing access to data¶
By default the contents of private buckets can only be accessed by authenticated members of your project. This section explains how to manage access by either sharing a link to your data or by setting more granular access restrictions than just "private" (only project members can view or edit) or "public". The data in LUMI-O can be shared in a variety of ways:
- Sharing a link to data: All public objects can be accessed with a link, and a link can also be used to grant access to a private object. This is done with a presigned URL: a temporary signed link that anyone who has it can use. A presigned URL is useful when the data needs to be accessed over the internet without credentials, but is not supposed to remain publicly accessible.
- Granting access to other LUMI projects: With access control lists (ACLs) or policies it is possible to share your data in a limited manner with other projects. You can e.g. grant a collaboration project authenticated read access to your datasets.
- Making the bucket or object public: It is possible to use ACLs or policies to make individual buckets or objects publicly accessible, e.g. setting public read access on a specific object in an otherwise private bucket. With policies it is also possible to e.g. restrict object access to specific IPs.
Sharing a link to data¶
For public data¶
If the data is uploaded to a public rclone endpoint, or otherwise made public (e.g. with ACLs), it can be accessed over the internet with a link. E.g. the data of an example project 465000001 stored in an object objectname in a bucket bucketname could be accessed with the link https://465000001.lumidata.eu/bucketname/objectname
Read-only presigned URLs¶
Presigned URLs are URLs generated by the user which grant time-limited "public" access to an object. It is also possible to generate a URL which allows time-limited upload of a specific object (key) in a bucket.
You can generate a presigned URL using e.g. s3cmd. For example, the command:
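A sketch using s3cmd's signurl subcommand; bucket and object names are placeholders:

```shell
# Presign a GET URL valid for 3600 seconds from now
s3cmd signurl s3://<bucket_name>/<object_name> +3600
```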
generates a URL that remains valid for 3600 s (1 h). To grant access until a specific date, the expiry needs to be expressed in Unix epoch time:
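For example (a sketch; the epoch timestamp 1767225600 corresponds to 2026-01-01 00:00 UTC):

```shell
# Presign a GET URL valid until 2026-01-01 00:00 UTC
s3cmd signurl s3://<bucket_name>/<object_name> 1767225600
```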
To get the required Unix epoch time, you can use an online calculator or the date command, e.g. date -d "2026-01-01 UTC" +%s
You can also define the expiration time of the link by adding the desired duration to the current time:
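For example, a link valid for one week, using shell arithmetic on the current time (a sketch):

```shell
# Current epoch time plus one week (604800 s)
s3cmd signurl s3://<bucket_name>/<object_name> $(( $(date +%s) + 604800 ))
```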
Regardless of the set expiry time, presigned URLs become invalid when the access key used for the signing expires.
It is also possible to use e.g. the aws CLI to create a presigned URL:
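A sketch using aws s3 presign (the endpoint URL is an assumption based on the access URLs used elsewhere on this page):

```shell
# Presign a GET URL valid for 3600 seconds
aws --endpoint-url https://lumidata.eu s3 presign s3://<bucket_name>/<object_name> --expires-in 3600
```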
Writable presigned URLs¶
There is no way to create presigned URLs for PUT operations using either s3cmd or the aws CLI. Below is a short example script using boto3 to generate a valid URL that can then be used to add an object called file.txt to the defined bucket.
presign.py
import argparse

import boto3
from botocore.exceptions import ClientError


def generate_presigned_url(s3_client, client_method, method_parameters, expires_in):
    try:
        url = s3_client.generate_presigned_url(
            ClientMethod=client_method, Params=method_parameters, ExpiresIn=expires_in
        )
    except ClientError:
        print("Couldn't get a presigned URL")
        raise
    return url


def usage_demo():
    parser = argparse.ArgumentParser()
    parser.add_argument("bucket", help="The name of the bucket.")
    parser.add_argument("key", help="The name of the object (key).")
    args = parser.parse_args()

    # Note: you may need to pass endpoint_url="https://lumidata.eu"
    # so that boto3 talks to LUMI-O instead of Amazon S3
    s3_client = boto3.client("s3")
    client_action = "put_object"
    url = generate_presigned_url(
        s3_client, client_action, {"Bucket": args.bucket, "Key": args.key}, 1000
    )
    print(f"Generated put_object url: {url}")


if __name__ == "__main__":
    usage_demo()
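The printed URL can then be used to upload the file over plain HTTP, e.g. with curl (a sketch; curl's --upload-file performs a PUT request):

```shell
python3 presign.py <bucket_name> file.txt
# Then, using the URL printed by the script:
curl --upload-file file.txt "<presigned_url>"
```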
Granular Access management¶
The examples here assume that you have properly configured the tools to use LUMI-O; otherwise they will usually default to Amazon AWS S3.
See also LUMI-O vs Amazon s3
Note that some advanced operations which are supported by AWS, e.g. object locks, will complete successfully when run against LUMI-O but actually have no effect. Unless it is explicitly stated that a feature is provided by LUMI-O, assume that it will not work and be extra thorough in verifying correct functionality.
Warning
Be very careful when configuring and updating access to buckets and objects.
It's possible to lock yourself out from your own data, or alternatively make
objects visible to the whole world. In the former case, data recovery might not be possible
and your data could be permanently lost.
ACLs vs Policies¶
There are two ways to manage access to data in LUMI-O:
- Access control list (ACL)
- Policies
While ACLs are simpler to configure, they are an older method for access control and offer much less granular control over permissions.
Some other differences include:
- ACLs can only be used to allow more access, not to restrict access from the defaults
- ACLs can be applied to buckets or objects, while policies can only be applied to buckets
- You can create ACLs which only affect specific objects in a bucket
- Since ACLs are per object, you have to apply ACL changes individually/recursively to all objects in a bucket, plus the bucket itself
Configuring Access control lists (ACLs):¶
You can apply ACLs to buckets or to individual objects.
Important
ACLs are not inherited, e.g. new objects created in a bucket with an ACL will not have any ACLs. By default, created objects are private (unless you have created a policy changing this and applied it to the bucket).
Checking existing ACLs¶
To view existing ACLs of buckets or objects you can use
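With s3cmd, the info subcommand prints the ACL grants (a sketch):

```shell
s3cmd info s3://<bucket_name>
s3cmd info s3://<bucket_name>/<object_name>
```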
or
aws s3api get-bucket-acl --bucket <bucket_name>
aws s3api get-object-acl --bucket <bucket_name> --key <object_name>
Important
After modifying ACLs, always verify that the intended effect was achieved,
i.e. check that things which should be private are private and that public objects
and buckets are accessible without authentication. Public buckets/objects are available using the URL
https://<proj_id>.lumidata.eu/<bucket>/<object>*; use e.g. wget, curl or a browser to check the access permissions.
*) For accessing content shared by another project, see accessing shared buckets/objects
Granting public access¶
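A sketch using s3cmd's setacl subcommand; --recursive applies the ACL to every object in the bucket:

```shell
s3cmd setacl --acl-public --recursive s3://<bucket_name>
```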
Would make all the objects in the bucket readable by everyone. The corresponding operations using aws s3api:
aws s3api put-bucket-acl --acl public-read --bucket <bucket_name>
aws s3api put-object-acl --acl public-read --bucket <bucket_name> --key <object_name>
Note that the aws s3api commands do not support a --recursive option, so put-object-acl has to be run for each object.
Setting a public ACL on the bucket only, and not on the objects, would make the bucket but not the objects readable for the world; it is then only possible to list the objects, not to download them. The inverse situation, where the bucket is not readable but the objects are, is similar to a UNIX directory with execute permission but no read permission: a file/object can be retrieved from the directory/bucket, but it is not possible to list the contents. To remove the public access you would run:
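With s3cmd (a sketch; --recursive again applies the change to all objects):

```shell
s3cmd setacl --acl-private --recursive s3://<bucket_name>
```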
or
aws s3api put-bucket-acl --acl private --bucket <bucket_name>
aws s3api put-object-acl --acl private --bucket <bucket_name> --key <object_name>
put-object-acl
has to be run separately for each object.
Granting access to a specific project¶
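A sketch using s3cmd's --acl-grant option (note the single quotes around the grantee):

```shell
s3cmd setacl --acl-grant=read:'<proj_id>$<proj_id>' --recursive s3://<bucket_name>
```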
Would grant read access to all objects in the <bucket_name> bucket for the <proj_id> project. The single quotes are important, as otherwise the shell might interpret $<proj_id> as an (empty) variable.
The corresponding commands for aws s3api would be:
aws s3api put-bucket-acl --bucket <bucket_name> --grant-read id='<proj_id>$<proj_id>'
aws s3api put-object-acl --grant-read id='<proj_id>$<proj_id>' --bucket <bucket_name> --key <object_name>
The public rclone remotes configured by lumio-conf use ACL settings to make created objects and buckets public, and the same goes for s3cmd put -P. So if you need to "publish" or "unpublish" some data, you can use the above commands.
Warning
Permissions granted with --acl-grant
are not revoked automatically when running --acl-private
and they have to be explicitly removed with --acl-revoke
Only for authenticated users¶
The aws CLI has a larger selection of ACL settings than s3cmd. For example, the canned authenticated-read ACL can be used to grant read-only access to all authenticated users of LUMI-O. This is useful if data is semi-public but, for some reason or another, should only be available to people with LUMI access. Note that this grants read access only to the bucket itself, not to any of the objects.
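A sketch applying the canned authenticated-read ACL to a bucket:

```shell
aws s3api put-bucket-acl --acl authenticated-read --bucket <bucket_name>
```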
See the s3cmd documentation and aws s3api documentation for a full list of ACLs.
Configuring Policies¶
You can apply policies to a bucket using s3cmd or aws commands, and you can list the existing policies on a bucket with the same tools.
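A sketch of both operations, assuming s3cmd's setpolicy/info subcommands and the aws s3api policy operations:

```shell
# Apply a policy from a JSON file
s3cmd setpolicy policy.json s3://<bucket_name>
# or
aws s3api put-bucket-policy --bucket <bucket_name> --policy file://policy.json

# Show the current policy
s3cmd info s3://<bucket_name>
# or
aws s3api get-bucket-policy --bucket <bucket_name>
```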
The following example policy would allow the project 465000002 to:

- Download the object out.json from our bucket called fortheauthenticated
- List all objects in the fortheauthenticated bucket
- Create/modify (by overwriting) the upload.json object in the fortheauthenticated bucket

The critical part is the format of the Principal, which is arn:aws:iam::<proj_id>:user/<proj_id>
The full policy:
policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["s3:GetObject"],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::fortheauthenticated/out.json",
      "Principal": {
        "AWS": ["arn:aws:iam::465000002:user/465000002"]
      }
    },
    {
      "Action": ["s3:ListBucket"],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::fortheauthenticated",
      "Principal": {
        "AWS": ["arn:aws:iam::465000002:user/465000002"]
      }
    },
    {
      "Action": ["s3:PutObject"],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::fortheauthenticated/upload.json",
      "Principal": {
        "AWS": ["arn:aws:iam::465000002:user/465000002"]
      }
    }
  ]
}
Another potentially useful policy is a restriction on incoming IPs:
{
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::almostpublic/data*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": "193.167.209.166"}
      }
    }
  ]
}
This would allow anyone connecting from "lumi-uan04.csc.fi" to upload objects whose names start with data to the bucket called almostpublic (but not to download or list them).
Warning
IP restrictions should never be the only measure to protect your data. Especially if there are multiple users on the system. Source IPs can also be spoofed.
For a full list of actions and resources, see the AWS documentation. Don't use an action which you do not understand.
To remove policies, you can delete the bucket policy with either tool.
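A sketch, assuming s3cmd's delpolicy subcommand and the corresponding aws s3api operation:

```shell
s3cmd delpolicy s3://<bucket_name>
# or
aws s3api delete-bucket-policy --bucket <bucket_name>
```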
Accessing shared buckets/objects¶
The authentication information used when interacting with LUMI-O partially defines which project's buckets a request refers to.
Public buckets/objects for a project are located under
https://<proj_id>.lumidata.eu/<bucket>/<object>
But making the request to the same URL while authenticated will try to fetch <bucket> from your own project, not from <proj_id>.
Instead the format https://lumidata.eu/<proj_id>:<bucket>/<object>
must be used.
For public objects the above two URLs are equivalent. Note that the authorization header of any request is checked before any access rules are verified, so using invalid credentials will lead to access denied even for public objects.
Due to the format of the URL, there is currently no known way to use boto3 or the aws CLI to interact with data which is specifically shared with your project.
s3cmd and rclone
To access buckets and subsequently objects not owned by the authenticated project:
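For example, with s3cmd, prefixing the bucket name with the owning project id (a sketch; rclone is analogous using your configured remote):

```shell
s3cmd ls s3://<proj_id>:<bucket_name>/
s3cmd get s3://<proj_id>:<bucket_name>/<object_name>
```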
Where 465000001 would be your own project, for which you have configured authentication, and <proj_id> is the numerical project id of the other project.
Curl
Don't use curl unless you have to. The main point here is that the project id owning the bucket has to be included with the bucket and object name when generating the signature.
object=README.md
bucket=BucketName
project=465000001
# The owning project id is part of the signed resource path
resource="/$project:$bucket/$object"
endPoint=https://lumidata.eu$resource
contentType="text/plain"
dateValue=$(date -R)
# AWS signature v2: VERB\nContent-MD5\nContent-Type\nDate\nResource
stringToSign="GET\n\n${contentType}\n${dateValue}\n${resource}"
s3Key=$S3_ACCESS_KEY_ID
s3Secret=$S3_SECRET_ACCESS_KEY
signature=$(echo -en ${stringToSign} | openssl sha1 -hmac ${s3Secret} -binary | base64)
curl -X GET -s -o out.tmp -w "%{http_code}" \
    -H "Host: lumidata.eu" \
    -H "Date: ${dateValue}" \
    -H "Content-Type: ${contentType}" \
    -H "Authorization: AWS ${s3Key}:${signature}" \
    $endPoint