The AWS Command Line Interface can be used to access the data from anywhere (including EC2). It’s easy to install on most operating systems (Windows, macOS, Linux). Please follow the installation instructions.

Please note that access to data from the Amazon cloud using the S3 API is only allowed for authenticated users. Please see our blog announcement for more information.

Once the AWS CLI is installed, the command to copy a file to your local machine is:

aws s3 cp s3://commoncrawl/path_to_file

Note that aws s3 cp expects the S3 source path followed by a local destination path.

You may first want to look at the data, e.g., to list all WARC files of a specific segment of the April 2018 crawl:

> aws s3 ls s3://commoncrawl/crawl-data/CC-MAIN-2018-17/segments/1524125937193.1/warc/
10:27:49 931210633
10:28:32 935833042
10:29:51 940140704

The command to download the first file in the listing is:

aws s3 cp s3://commoncrawl/crawl-data/CC-MAIN-2018-17/segments/1524125937193.1/warc/

The AWS CLI supports recursive copying and allows for pattern-based inclusion/exclusion of files. For more information, check the AWS CLI user guide or call the command-line help (here for the cp command):

aws s3 cp help
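The recursive, pattern-based copying mentioned above might look roughly like the following. This is a minimal sketch, not an official recipe: the local directory ./warc and the *.warc.gz pattern are assumptions chosen for illustration, and the build_cmd helper is hypothetical. The script only prints the command so it can be reviewed first, and the printed command includes --dryrun so that running it lists the planned transfers without downloading anything.

```shell
#!/bin/sh
# Sketch: recursive copy of one segment's WARC directory, filtered by pattern.
# LOCAL_DIR and the "*.warc.gz" pattern are assumptions for illustration.
SEGMENT="s3://commoncrawl/crawl-data/CC-MAIN-2018-17/segments/1524125937193.1/warc/"
LOCAL_DIR="./warc"

# build_cmd (hypothetical helper) prints the aws command instead of running it.
# Filters are evaluated in order: exclude everything, then re-include matches.
# --dryrun makes the CLI show what would be copied without transferring data.
build_cmd() {
  printf 'aws s3 cp %s %s --recursive --exclude "*" --include "*.warc.gz" --dryrun\n' \
    "$SEGMENT" "$LOCAL_DIR"
}

build_cmd
```

To perform the actual download, run the printed command without --dryrun. Each file in the listing above is close to 1 GB, so a recursive copy of a whole segment transfers a large amount of data.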