AWS DataSync

Data Migration, Bulk Transfer, On-premises to AWS Sync

Overview

AWS DataSync is a data transfer service that moves large amounts of data between on-premises storage and AWS, or between AWS storage services.

┌─────────────────────────────────────────────────────────────────┐
│              AWS DATASYNC                                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  On-premises                     AWS                            │
│  ┌─────────────┐                ┌─────────────┐                 │
│  │  NFS/SMB    │                │     S3      │                 │
│  │  Storage    │ ──────────────►│     EFS     │                 │
│  └──────┬──────┘    DataSync    │     FSx     │                 │
│         │                       └─────────────┘                 │
│    ┌────┴────┐                                                  │
│    │ DataSync│  ← Agent runs on a VM                            │
│    │  Agent  │                                                  │
│    └─────────┘                                                  │
│                                                                 │
│  ⚡ Up to 10 Gbps transfer speed                                │
│  🔄 Automatic retry, verification                               │
│  📊 Bandwidth throttling                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key benefits:

  • Fast - Up to 10x faster than open-source tools (per AWS)
  • 🔐 Secure - TLS encryption in-transit
  • Reliable - Automatic integrity verification
  • 📊 Controlled - Bandwidth limiting, scheduling

How it works

┌─────────────────────────────────────────────────────────────────┐
│              DATASYNC WORKFLOW                                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Deploy Agent (if on-premises)                               │
│     ┌─────────────┐                                             │
│     │  VMware/    │                                             │
│     │  Hyper-V/   │  ← Download OVA/VHD from AWS                │
│     │  KVM/EC2    │                                             │
│     └──────┬──────┘                                             │
│            │                                                    │
│  2. Create Locations (Source + Destination)                     │
│            │                                                    │
│            ▼                                                    │
│     ┌─────────────┐         ┌─────────────┐                     │
│     │   Source    │ ──────► │ Destination │                     │
│     │  Location   │         │  Location   │                     │
│     └─────────────┘         └─────────────┘                     │
│            │                                                    │
│  3. Create Task (defines what/how to transfer)                  │
│            │                                                    │
│            ▼                                                    │
│     ┌─────────────┐                                             │
│     │    Task     │ ← Schedule, filters, options                │
│     └──────┬──────┘                                             │
│            │                                                    │
│  4. Run Task (manual or scheduled)                              │
│            │                                                    │
│            ▼                                                    │
│     ┌─────────────┐                                             │
│     │  Transfer   │ ← Monitor progress in Console               │
│     │  Execution  │                                             │
│     └─────────────┘                                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
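
The four workflow steps map onto DataSync API calls. The following is a minimal sketch, assuming a client that behaves like `boto3.client("datasync")` and an agent that has already been deployed and activated (step 1). The function name `run_nfs_to_s3_transfer` and every ARN, hostname, and path passed to it are hypothetical placeholders, not values from this guide.

```python
from typing import Any


def run_nfs_to_s3_transfer(
    client: Any,
    agent_arn: str,
    nfs_host: str,
    nfs_path: str,
    bucket_arn: str,
    bucket_role_arn: str,
) -> str:
    """Create both locations and a task, start it, and return the execution ARN.

    `client` is assumed to expose the same methods as boto3.client("datasync").
    """
    # Step 2a: source location (on-premises NFS, reached through the agent)
    source = client.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Step 2b: destination location (S3 bucket, accessed through an IAM role)
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        Subdirectory="/data",
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # Step 3: the task ties the source and destination together
    task = client.create_task(
        Name="daily-backup-to-s3",
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
    )
    # Step 4: run the task once (a schedule on the task could be used instead)
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]
```

Passing the client in as a parameter keeps the sketch testable without AWS credentials; in real use the progress of the returned execution would still be monitored in the Console or through CloudWatch.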

Source and Destination

Supported Locations

Location Type                 As Source   As Destination
NFS (on-prem)                 ✅          ✅
SMB (on-prem)                 ✅          ✅
HDFS (on-prem)                ✅          ✅
S3                            ✅          ✅
EFS                           ✅          ✅
FSx for Windows               ✅          ✅
FSx for Lustre                ✅          ✅
FSx for OpenZFS               ✅          ✅
FSx for NetApp ONTAP          ✅          ✅
Other cloud (Google, Azure)   ✅          ✅

Transfer Scenarios

┌─────────────────────────────────────────────────────────────────┐
│              COMMON TRANSFER PATTERNS                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. On-prem NFS/SMB ──► S3/EFS/FSx (Migration)                  │
│                                                                 │
│  2. S3 ──► EFS (Transform object to file)                       │
│                                                                 │
│  3. EFS ──► EFS (Cross-region replication)                      │
│                                                                 │
│  4. FSx ──► S3 (Backup/Archive)                                 │
│                                                                 │
│  5. Google Cloud Storage ──► S3 (Cloud migration)               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

DataSync Agent

When is an agent needed?

Scenario                      Agent needed?
On-premises NFS/SMB → AWS     ✅ Required
HDFS → AWS                    ✅ Required
AWS → AWS (S3, EFS, FSx)      ❌ No agent
Other cloud → AWS             Depends on task mode (agentless in Enhanced mode; Basic mode needs an agent)

Agent Deployment

On-premises:
┌─────────────────────────────────────────────────────────────────┐
│  Download Agent VM image from AWS Console:                      │
│  • VMware ESXi (.ova)                                           │
│  • Microsoft Hyper-V (.vhd)                                     │
│  • KVM (.qcow2)                                                 │
│                                                                 │
│  Requirements:                                                  │
│  • 4 vCPUs, 32 GB RAM                                           │
│  • 80 GB disk                                                   │
│  • Network access to AWS (port 443)                             │
│  • Network access to source storage (NFS/SMB)                   │
└─────────────────────────────────────────────────────────────────┘

AWS (for self-managed storage on EC2):
┌─────────────────────────────────────────────────────────────────┐
│  Launch EC2 instance with DataSync Agent AMI                    │
│  • Use for EC2 instances running NFS/SMB servers                │
└─────────────────────────────────────────────────────────────────┘

Agent ↔ AWS Communication

Agent ──────► AWS DataSync Service (port 443, TLS)

   ├── Control channel (task instructions)
   └── Data channel (encrypted data transfer)
       
⚠️ The agent does NOT store data; it only transfers it

Task và Transfer Options

Task Configuration

Task:
  Name: "daily-backup-to-s3"
  # Illustrative URIs; the CreateTask API takes location ARNs
  # (SourceLocationArn / DestinationLocationArn) instead.
  SourceLocation: "nfs://10.0.1.100/data"
  DestinationLocation: "s3://my-backup-bucket/data"
  
  Options:
    # What to transfer
    TransferMode: CHANGED  # or ALL
    VerifyMode: POINT_IN_TIME_CONSISTENT
    
    # How to handle existing files
    OverwriteMode: ALWAYS  # or NEVER
    PreserveDeletedFiles: PRESERVE  # or REMOVE
    
    # Metadata preservation
    PosixPermissions: PRESERVE  # or NONE
    Uid: INT_VALUE  # preserve owner
    Gid: INT_VALUE  # preserve group
    Mtime: PRESERVE  # requires Atime: BEST_EFFORT
    Atime: BEST_EFFORT
    
    # Performance
    BytesPerSecond: 100000000  # 100 MB/s limit (-1 = unlimited)
    
  Schedule:
    ScheduleExpression: "cron(0 0 * * ? *)"  # Daily at 00:00 UTC
    
  Excludes:
    - FilterType: SIMPLE_PATTERN
      Value: "*.log"  # Exclude log files
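
The options in this configuration can be assembled and sanity-checked in code before a task is created. The sketch below builds a dictionary shaped like the `Options` parameter of the DataSync CreateTask API and keeps `Mtime` and `Atime` consistent, since the API expects `Mtime: PRESERVE` together with `Atime: BEST_EFFORT` (and `NONE` together with `NONE`). The helper name `build_task_options` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Optional

VALID_TRANSFER_MODES = {"CHANGED", "ALL"}
VALID_VERIFY_MODES = {
    "POINT_IN_TIME_CONSISTENT",
    "ONLY_FILES_TRANSFERRED",
    "NONE",
}


def build_task_options(
    transfer_mode: str = "CHANGED",
    verify_mode: str = "POINT_IN_TIME_CONSISTENT",
    preserve_timestamps: bool = True,
    bytes_per_second: Optional[int] = None,
) -> dict:
    """Return a dict usable as the Options argument of create_task."""
    if transfer_mode not in VALID_TRANSFER_MODES:
        raise ValueError(f"unknown transfer mode: {transfer_mode}")
    if verify_mode not in VALID_VERIFY_MODES:
        raise ValueError(f"unknown verify mode: {verify_mode}")
    if bytes_per_second is not None and bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive, or None for no limit")

    return {
        "TransferMode": transfer_mode,
        "VerifyMode": verify_mode,
        # Mtime and Atime must be set as a matching pair
        "Mtime": "PRESERVE" if preserve_timestamps else "NONE",
        "Atime": "BEST_EFFORT" if preserve_timestamps else "NONE",
        # -1 tells DataSync not to throttle bandwidth
        "BytesPerSecond": -1 if bytes_per_second is None else bytes_per_second,
    }


options = build_task_options(bytes_per_second=100_000_000)  # 100 MB/s limit
```

Validating locally is a design convenience: a misspelled mode fails immediately instead of surfacing as an API error when the task is created.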

Transfer Modes

Mode      Behavior
CHANGED   Only transfer new/modified files
ALL       Transfer everything (slower, full sync)

Verify Modes

Mode                       Behavior
POINT_IN_TIME_CONSISTENT   Verify all data at source and destination after the transfer completes
ONLY_FILES_TRANSFERRED     Verify only the files that were transferred
NONE                       No post-transfer verification (fastest, least safe)

Use Cases

1. Data Migration

On-premises Data Center → AWS

┌─────────────┐     ┌──────────┐     ┌─────────────┐
│ NFS/SMB     │ ──► │ DataSync │ ──► │   S3/EFS    │
│ (100 TB)    │     │  Agent   │     │             │
└─────────────┘     └──────────┘     └─────────────┘

                     10 Gbps line
                     ~22 hours at full line rate; ~2-3 days at 30-45% effective utilization
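
The duration for this migration follows from simple arithmetic on data size and link speed. The sketch below is a rough estimator, assuming decimal units (1 TB = 10^12 bytes) and a single `utilization` factor standing in for protocol overhead, small-file effects, and shared bandwidth; the function name and parameters are illustrative only.

```python
def estimate_transfer_hours(
    data_tb: float, link_gbps: float, utilization: float = 1.0
) -> float:
    """Estimate transfer time in hours for data_tb terabytes over a link.

    utilization is the fraction of the link actually achieved (0 < u <= 1).
    """
    if data_tb < 0 or link_gbps <= 0:
        raise ValueError("data_tb must be >= 0 and link_gbps must be > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_bits = data_tb * 1e12 * 8              # terabytes -> bits
    effective_bps = link_gbps * 1e9 * utilization
    return total_bits / effective_bps / 3600     # seconds -> hours


full_rate = estimate_transfer_hours(100, 10)          # about 22.2 hours
realistic = estimate_transfer_hours(100, 10, 0.4)     # about 55.6 hours (~2.3 days)
```

At full line rate, 100 TB over 10 Gbps takes just under a day; a multi-day figure corresponds to the link being only partly utilized.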

2. Ongoing Replication

Hybrid Cloud - Keep data in sync (each task copies one way; run a second task for the reverse direction)

On-prem ←──────────────────────────────►  AWS
  │           Scheduled sync              │
  │           (every hour)                │
  ▼                                       ▼
NFS Share                              EFS/FSx

3. Cold Data Archiving

Move old data to S3 Glacier

Production ──► DataSync ──► S3 ──► Lifecycle ──► Glacier
Storage                            Policy
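
The lifecycle step in this pipeline is configured on the S3 bucket, not in DataSync. The sketch below builds a rule in the shape accepted by the S3 `put_bucket_lifecycle_configuration` call in boto3; the helper name `glacier_lifecycle_config`, the `data/` prefix, and the 90-day threshold are assumptions made for illustration.

```python
def glacier_lifecycle_config(prefix: str, days: int) -> dict:
    """Build a lifecycle configuration that moves objects to S3 Glacier.

    Objects under `prefix` transition to the GLACIER storage class
    `days` days after creation.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_lifecycle_config("data/", 90)
# With boto3 this could then be applied as (bucket name is a placeholder):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-backup-bucket", LifecycleConfiguration=config)
```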

Comparison with other solutions

Feature     DataSync                        Transfer Family         S3 Transfer Accel   Snow Family
Purpose     Bulk data sync                  Partner file exchange   Speed up S3         Physical transfer
Protocol    Native (NFS/SMB)                SFTP/FTPS/FTP           S3 API              Physical device
Speed       Up to 10 Gbps                   Network limited         Edge optimized      100 TB/device
Direction   Either way (one-way per task)   Both                    S3 only             Both
Agent       Required (on-prem)              No                      No                  No
Best for    Large migrations                B2B exchange            Global uploads      Massive/offline

DataSync vs DIY (rsync, robocopy)

Aspect        DataSync                   DIY
Speed         Optimized, parallel        Single-threaded
Reliability   Auto-retry, verification   Manual handling
Management    Console/API                Scripts, cron
Cost          Per-GB pricing             Free (but time cost)
Monitoring    CloudWatch integrated      Custom logging

Pricing

Component       Cost
Data copied     $0.0125 - $0.025/GB
Agent           Free (software)
EC2 for agent   Standard EC2 pricing

Examples:

  • 10 TB migration = ~$125 - $250
  • Monthly sync 100 GB = ~$1.25 - $2.50

💡 Note: Pricing varies by region and destination type
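
The example figures follow directly from the per-GB rate. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB) and the $0.0125 to $0.025 per-GB range quoted in this section; the function name is illustrative, and actual rates should be taken from the current AWS pricing page for the region in use.

```python
from typing import Tuple


def datasync_copy_cost_range(
    data_gb: float,
    low_rate: float = 0.0125,
    high_rate: float = 0.025,
) -> Tuple[float, float]:
    """Return the (low, high) data-copy cost in USD for data_gb gigabytes."""
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    return (round(data_gb * low_rate, 2), round(data_gb * high_rate, 2))


migration = datasync_copy_cost_range(10_000)   # 10 TB one-time migration
monthly = datasync_copy_cost_range(100)        # 100 GB monthly sync
```

This covers only the per-GB copy charge; EC2 instances hosting an agent, S3 requests, and storage are billed separately.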


Exam Tips

When the question says:

  • "Migrate on-premises NFS/SMB to AWS"
  • "Sync data between AWS storage services"
  • "Transfer large amounts of data"
  • "Schedule recurring data transfers"
  • "Need data integrity verification"

→ Think of AWS DataSync

⚠️ Distinguish between:

  • DataSync: Bulk migration/sync, needs agent for on-prem
  • Transfer Family: Partner SFTP/FTP access
  • Storage Gateway: Ongoing hybrid access (cache locally)
  • Snow Family: Petabyte-scale, offline transfer
