Copy data from an AWS S3 Bucket to Azure Blob Storage using Azure Data Factory Pipelines


    #ServerlessTips - Azure Data Factory
    Author: Dave McCollough, CTO, Geokey

    This article shows how to move data from an AWS S3 bucket to Azure Blob Storage using Azure Data Factory pipelines.

    Prerequisites

    • An active Azure subscription. If you don’t have a subscription, you can sign up for a free one.
    • Data that resides in an AWS S3 bucket.
    • An Azure Data Factory instance. If you’ve never created one, the Azure Data Factory documentation explains how.
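
    If you would rather script these resources than click through the Studio, the short Python sketches in this article assume the azure-identity and azure-mgmt-datafactory packages and an existing factory; the subscription, resource group, and factory names are placeholders rather than values from this walkthrough. This first sketch authenticates the management client that the later sketches reuse.

        from azure.identity import DefaultAzureCredential
        from azure.mgmt.datafactory import DataFactoryManagementClient

        subscription_id = "<subscription-id>"   # placeholder
        resource_group = "<resource-group>"     # placeholder
        factory_name = "<factory-name>"         # placeholder

        # DefaultAzureCredential resolves Azure CLI, environment, or managed identity credentials.
        credential = DefaultAzureCredential()
        adf_client = DataFactoryManagementClient(credential, subscription_id)

        # Confirm the factory exists before defining resources against it.
        factory = adf_client.factories.get(resource_group, factory_name)
        print(factory.name, factory.location)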

    Configure the Pipeline

    1. Open Azure Data Factory Studio

    2. Select Author from the side navigation bar


    3. Click the ellipsis next to Pipelines and select New Pipeline


    4. Expand the Move & transform section


    5. Click and drag Copy data into the visual editor


    6. Rename the pipeline. For this example, we’re importing food data.


    7. Select Source and click + New.


    8. Select Amazon S3 and click the Continue button.


    9. Select the format of the data you are ingesting and click the Continue button.
    For this example, we are using CSV/DelimitedText.


    10. Enter a Name and click the Linked service dropdown.
    Select New.


    11. Enter a Name for your linked service.
    Enter the Access key ID of your AWS user.
    Enter the Secret access key of your AWS user.
    Click Test connection to verify that the connection is successful.
    Click the Create button.

    12. Your linked service will be created.
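
    For comparison, the same Amazon S3 linked service can be defined in code. The sketch below continues the earlier client setup and is only illustrative: the linked service name S3LinkedService and the credential placeholders are assumptions, not values from this walkthrough.

        from azure.mgmt.datafactory.models import (
            AmazonS3LinkedService,
            LinkedServiceResource,
            SecureString,
        )

        # The same Access key ID / Secret access key pair entered in step 11 (placeholders here).
        s3_linked_service = LinkedServiceResource(
            properties=AmazonS3LinkedService(
                access_key_id="<aws-access-key-id>",
                secret_access_key=SecureString(value="<aws-secret-access-key>"),
            )
        )
        adf_client.linked_services.create_or_update(
            resource_group, factory_name, "S3LinkedService", s3_linked_service
        )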

    13. Click the Browse icon to browse to the file in your AWS S3 bucket.

    14. Select the file and click the OK button.

    15. If the first row of your file is the header row, select the First row as header checkbox.
    Click the OK button.
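
    Steps 13 through 15 amount to defining a source dataset: a delimited text (CSV) file in the S3 bucket with the first row treated as the header. A hedged code equivalent, continuing the earlier sketch with placeholder bucket and file names and an illustrative dataset name:

        from azure.mgmt.datafactory.models import (
            AmazonS3Location,
            DatasetResource,
            DelimitedTextDataset,
            LinkedServiceReference,
        )

        source_dataset = DatasetResource(
            properties=DelimitedTextDataset(
                linked_service_name=LinkedServiceReference(
                    type="LinkedServiceReference", reference_name="S3LinkedService"
                ),
                location=AmazonS3Location(
                    bucket_name="<s3-bucket-name>", file_name="<file-name>.csv"
                ),
                column_delimiter=",",
                first_row_as_header=True,   # step 15: first row is the header row
            )
        )
        adf_client.datasets.create_or_update(
            resource_group, factory_name, "S3FoodDataCsv", source_dataset
        )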

    16. Select Sink and click + New.

    17. In this step, we define where the data is copied. For this example, we are using Azure Blob Storage.
    Click the Continue button.

    18. Select the format type of your data. For this example, we will use CSV/DelimitedText.
    Click the Continue button.

    19. Enter a Name and click the Linked service dropdown.
    Select New if you do not have an existing linked service.

    20. Enter the appropriate Azure Storage Account information.
    Click Test connection to verify that the connection is successful.
    Click the Create button.

    21. Your linked service will be created.
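
    The Blob Storage linked service from steps 19 through 21 can likewise be scripted. This sketch assumes connection-string authentication; the connection string is a placeholder for your storage account’s value, and BlobStorageLinkedService is an illustrative name.

        from azure.mgmt.datafactory.models import (
            AzureBlobStorageLinkedService,
            LinkedServiceResource,
        )

        blob_linked_service = LinkedServiceResource(
            properties=AzureBlobStorageLinkedService(
                # Placeholder connection string for the destination storage account.
                connection_string="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
            )
        )
        adf_client.linked_services.create_or_update(
            resource_group, factory_name, "BlobStorageLinkedService", blob_linked_service
        )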

    22. Click the browse icon to browse to the appropriate folder.

    23. Select the appropriate location and click the OK button.

    24. If the first row of your file is the header row, select the First row as header checkbox.
    Click the OK button.
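
    Steps 22 through 24 define the sink dataset: a delimited text dataset that writes into a container and folder in the Blob Storage account. A sketch with placeholder container and folder names and an illustrative dataset name:

        from azure.mgmt.datafactory.models import (
            AzureBlobStorageLocation,
            DatasetResource,
            DelimitedTextDataset,
            LinkedServiceReference,
        )

        sink_dataset = DatasetResource(
            properties=DelimitedTextDataset(
                linked_service_name=LinkedServiceReference(
                    type="LinkedServiceReference", reference_name="BlobStorageLinkedService"
                ),
                location=AzureBlobStorageLocation(
                    container="<container-name>", folder_path="<folder-path>"
                ),
                column_delimiter=",",
                first_row_as_header=True,   # step 24: write the header row
            )
        )
        adf_client.datasets.create_or_update(
            resource_group, factory_name, "BlobFoodDataCsv", sink_dataset
        )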

    25. Once the Source and Sink are configured, click the Publish all button.


    26. Click the Publish button.

    27. When publishing is completed, you will receive a notification.
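
    Publishing saves the pipeline definition to the Data Factory service. In code, the equivalent is creating a pipeline whose Copy activity reads from the source dataset and writes to the sink dataset; the activity and pipeline names below are illustrative and continue the earlier sketches.

        from azure.mgmt.datafactory.models import (
            CopyActivity,
            DatasetReference,
            DelimitedTextSink,
            DelimitedTextSource,
            PipelineResource,
        )

        copy_activity = CopyActivity(
            name="CopyFoodDataFromS3",
            inputs=[DatasetReference(type="DatasetReference", reference_name="S3FoodDataCsv")],
            outputs=[DatasetReference(type="DatasetReference", reference_name="BlobFoodDataCsv")],
            source=DelimitedTextSource(),
            sink=DelimitedTextSink(),
        )
        adf_client.pipelines.create_or_update(
            resource_group,
            factory_name,
            "ImportFoodData",
            PipelineResource(activities=[copy_activity]),
        )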


    28. To manually run your pipeline, click Add trigger and select Trigger now.

    29. You will be notified when the pipeline run has completed.
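
    The Trigger now action can also be reproduced from code by starting a pipeline run and polling its status until it leaves the in-progress states; this assumes the illustrative ImportFoodData pipeline name used above.

        import time

        run = adf_client.pipelines.create_run(
            resource_group, factory_name, "ImportFoodData", parameters={}
        )
        while True:
            pipeline_run = adf_client.pipeline_runs.get(resource_group, factory_name, run.run_id)
            if pipeline_run.status not in ("Queued", "InProgress"):
                break
            time.sleep(15)   # poll every 15 seconds until the run succeeds or fails
        print(f"Pipeline run finished with status: {pipeline_run.status}")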

    30. You can browse your blob storage account and view the file copied from your AWS S3 bucket.
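
    If you prefer to verify the copy without the portal, a short check with the azure-storage-blob package (an assumption, with a placeholder connection string and container name) lists the blobs in the destination container:

        from azure.storage.blob import BlobServiceClient

        blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
        container_client = blob_service.get_container_client("<container-name>")

        # The CSV copied from the S3 bucket should appear in this listing.
        for blob in container_client.list_blobs():
            print(blob.name)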



    31. Click Add trigger and select New/Edit to schedule this pipeline.

    32. Select + New from the Choose trigger dropdown.

    33. Configure the options for your new trigger, including:
           Name
           Type
           Start Date
           Time zone
           Recurrence
    Click the OK button.

    34. Click the Publish all button to start your scheduled trigger.


    35. Click the Publish button.

    36. This will schedule your pipeline based on the parameters configured in step 33.
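
    For reference, the scheduled trigger from steps 31 through 36 can also be created and started in code. The sketch below assumes a daily recurrence starting now; the trigger name is illustrative, and begin_start reflects recent versions of the azure-mgmt-datafactory package.

        from datetime import datetime, timezone

        from azure.mgmt.datafactory.models import (
            PipelineReference,
            ScheduleTrigger,
            ScheduleTriggerRecurrence,
            TriggerPipelineReference,
            TriggerResource,
        )

        trigger = TriggerResource(
            properties=ScheduleTrigger(
                recurrence=ScheduleTriggerRecurrence(
                    frequency="Day",                          # step 33: recurrence
                    interval=1,
                    start_time=datetime.now(timezone.utc),    # step 33: start date
                    time_zone="UTC",                          # step 33: time zone
                ),
                pipelines=[
                    TriggerPipelineReference(
                        pipeline_reference=PipelineReference(
                            type="PipelineReference", reference_name="ImportFoodData"
                        )
                    )
                ],
            )
        )
        adf_client.triggers.create_or_update(
            resource_group, factory_name, "DailyFoodDataTrigger", trigger
        )
        # Triggers are created in a stopped state; starting the trigger is the
        # code-side counterpart of publishing it in the Studio.
        adf_client.triggers.begin_start(
            resource_group, factory_name, "DailyFoodDataTrigger"
        ).result()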

    Summary

    In this article, we configured an Azure Data Factory pipeline to copy data from an AWS S3 bucket to Azure Blob Storage and ran it both manually and on a schedule.

