AWS DataSync
Data Migration, Bulk Transfer, On-premises to AWS Sync
Overview
AWS DataSync is a data transfer service for moving large amounts of data between on-premises storage and AWS, or between AWS storage services.
Key benefits:
- ⚡ Fast - Up to 10x faster than open-source tools
- 🔐 Secure - TLS encryption in transit
- ✅ Reliable - Automatic integrity verification
- 📊 Controlled - Bandwidth limiting, scheduling
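Bandwidth limiting is configured per task through the `BytesPerSecond` task option, where `-1` means no limit. The sketch below uses a hypothetical helper that is not part of any AWS SDK. It converts a link budget in megabits per second into that integer value. It assumes decimal megabits (1 Mbps = 1,000,000 bits/s), which is how network links are usually quoted.

```python
def mbps_to_bytes_per_second(mbps: float) -> int:
    """Convert a bandwidth cap in megabits/s to the integer bytes/s value
    used by the DataSync BytesPerSecond task option.

    Assumption: decimal megabits (1 Mbps = 1_000_000 bits/s).
    A value <= 0 is treated as "no limit", which DataSync expresses as -1.
    """
    if mbps <= 0:
        return -1
    return int(mbps * 1_000_000 // 8)


# Cap a task at 200 Mbps so it leaves headroom on a shared 1 Gbps link.
options = {"BytesPerSecond": mbps_to_bytes_per_second(200)}
print(options)  # {'BytesPerSecond': 25000000}
```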
How It Works
Source and Destination
Supported Locations
| Location Type | As Source | As Destination |
|---|---|---|
| NFS (on-prem) | ✅ | ✅ |
| SMB (on-prem) | ✅ | ✅ |
| HDFS (on-prem) | ✅ | ❌ |
| S3 | ✅ | ✅ |
| EFS | ✅ | ✅ |
| FSx for Windows | ✅ | ✅ |
| FSx for Lustre | ✅ | ✅ |
| FSx for OpenZFS | ✅ | ✅ |
| FSx for NetApp ONTAP | ✅ | ✅ |
| Other cloud (Google, Azure) | ✅ | ❌ |
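The table above can be encoded as data to check whether a given source/destination pair is listed. This sketch reflects only this table and is not an authoritative or exhaustive AWS list. The shortened key `"Other cloud"` is an illustrative name.

```python
# (usable as source, usable as destination), copied from the table above.
SUPPORT = {
    "NFS": (True, True),
    "SMB": (True, True),
    "HDFS": (True, False),
    "S3": (True, True),
    "EFS": (True, True),
    "FSx for Windows": (True, True),
    "FSx for Lustre": (True, True),
    "FSx for OpenZFS": (True, True),
    "FSx for NetApp ONTAP": (True, True),
    "Other cloud": (True, False),
}


def is_supported(source: str, destination: str) -> bool:
    """True if the table lists `source` as a valid source and
    `destination` as a valid destination; unknown types are rejected."""
    src_ok, _ = SUPPORT.get(source, (False, False))
    _, dst_ok = SUPPORT.get(destination, (False, False))
    return src_ok and dst_ok


print(is_supported("HDFS", "S3"))          # True
print(is_supported("S3", "HDFS"))          # False
print(is_supported("Other cloud", "EFS"))  # True
```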
Transfer Scenarios
DataSync Agent
When Is an Agent Needed?
| Scenario | Agent needed? |
|---|---|
| On-premises NFS/SMB → AWS | ✅ Required |
| HDFS → AWS | ✅ Required |
| AWS → AWS (S3, EFS, FSx) | ❌ No agent |
| Other cloud → AWS | ❌ No agent (API-based) |
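The rule in the table can be reduced to one check: an agent is needed whenever either end of the transfer is on-premises storage, because the agent is what reads from or writes to that storage. The function name and location labels below are illustrative.

```python
# On-premises location types from the table; any of them requires an agent.
ON_PREM = {"NFS", "SMB", "HDFS"}


def needs_agent(source: str, destination: str) -> bool:
    """Rule of thumb from the table: an agent is required when either end
    is on-premises storage. AWS-to-AWS and other-cloud-to-AWS transfers
    (as listed) run without one."""
    return source in ON_PREM or destination in ON_PREM


print(needs_agent("NFS", "S3"))          # True
print(needs_agent("S3", "EFS"))          # False
print(needs_agent("S3", "SMB"))          # True (copying back to on-premises)
print(needs_agent("Other cloud", "S3"))  # False
```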
Agent Deployment
Agent ↔ AWS Communication
Task and Transfer Options
Task Configuration
Transfer Modes
| Mode | Behavior |
|---|---|
| CHANGED | Transfer only data or metadata that is new or different from the destination |
| ALL | Transfer all source content without comparing to the destination (slower, full copy) |
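The difference between the two modes can be shown with a simplified model. With CHANGED, a file is copied only when it is missing at the destination or differs from it. With ALL, every source file is copied with no comparison. This model assumes size and modification time are the only comparison keys. DataSync's real comparison considers more metadata, so treat this as an illustration rather than its algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    size: int
    mtime: float  # seconds since epoch


def files_to_copy(source: dict[str, FileMeta],
                  destination: dict[str, FileMeta],
                  mode: str) -> list[str]:
    """Paths that would be copied under a simplified TransferMode model.
    Assumption: size + mtime stand in for DataSync's full metadata check."""
    if mode == "ALL":
        # No comparison with the destination: copy everything.
        return sorted(source)
    if mode == "CHANGED":
        # Copy files that are missing at the destination or differ from it.
        return sorted(p for p, meta in source.items() if destination.get(p) != meta)
    raise ValueError(f"unknown TransferMode: {mode}")


src = {"a.txt": FileMeta(10, 1.0), "b.txt": FileMeta(20, 2.0), "c.txt": FileMeta(5, 3.0)}
dst = {"a.txt": FileMeta(10, 1.0), "b.txt": FileMeta(20, 1.5)}
print(files_to_copy(src, dst, "CHANGED"))  # ['b.txt', 'c.txt']
print(files_to_copy(src, dst, "ALL"))      # ['a.txt', 'b.txt', 'c.txt']
```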
Verify Modes
| Mode | Behavior |
|---|---|
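| POINT_IN_TIME_CONSISTENT | After the transfer, scan the entire source and destination to confirm they match |
| ONLY_FILES_TRANSFERRED | After the transfer, verify checksums of only the files that were transferred |
| NONE | No end-of-transfer verification (fastest, least thorough; data is still integrity-checked in transit) |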
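Transfer mode and verify mode are both set in a task's `Options`. The sketch below builds that dict and rejects invalid values. The field names `TransferMode`, `VerifyMode`, and `PreserveDeletedFiles` follow the DataSync API. The helper name and its defaults are illustrative, not AWS defaults. The sketch also enforces one documented constraint: `PreserveDeletedFiles=REMOVE` cannot be combined with `TransferMode=ALL`, because in ALL mode the destination is not scanned.

```python
TRANSFER_MODES = {"CHANGED", "ALL"}
VERIFY_MODES = {"POINT_IN_TIME_CONSISTENT", "ONLY_FILES_TRANSFERRED", "NONE"}


def build_task_options(transfer_mode: str = "CHANGED",
                       verify_mode: str = "ONLY_FILES_TRANSFERRED",
                       remove_deleted: bool = False) -> dict:
    """Build an `Options` dict for a DataSync CreateTask request.
    Hypothetical helper; defaults are illustrative choices."""
    if transfer_mode not in TRANSFER_MODES:
        raise ValueError(f"TransferMode must be one of {sorted(TRANSFER_MODES)}")
    if verify_mode not in VERIFY_MODES:
        raise ValueError(f"VerifyMode must be one of {sorted(VERIFY_MODES)}")
    if remove_deleted and transfer_mode == "ALL":
        # ALL skips the destination scan, so there is no list of files to delete.
        raise ValueError("PreserveDeletedFiles=REMOVE cannot be used with TransferMode=ALL")
    return {
        "TransferMode": transfer_mode,
        "VerifyMode": verify_mode,
        "PreserveDeletedFiles": "REMOVE" if remove_deleted else "PRESERVE",
    }


# Mirror-style sync: copy changes, delete files removed at the source,
# and verify only what was transferred.
print(build_task_options(remove_deleted=True))
# {'TransferMode': 'CHANGED', 'VerifyMode': 'ONLY_FILES_TRANSFERRED', 'PreserveDeletedFiles': 'REMOVE'}
```

With boto3, the resulting dict would be passed as `Options=` to `boto3.client("datasync").create_task(...)`, together with `SourceLocationArn` and `DestinationLocationArn`.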
Use Cases
1. Data Migration
2. Ongoing Replication
3. Cold Data Archiving
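For ongoing replication, a task can run on a schedule. DataSync's `Schedule` takes a `ScheduleExpression` in cron(...) or rate(...) form. The sketch below only assembles keyword arguments for a CreateTask call and makes no AWS request. The ARNs and the task name are hypothetical placeholders.

```python
def build_scheduled_task(source_arn: str, destination_arn: str,
                         schedule_expression: str, name: str) -> dict:
    """Assemble keyword arguments for a scheduled DataSync CreateTask call.
    Hypothetical helper; `schedule_expression` must be cron(...) or rate(...)."""
    if not schedule_expression.startswith(("cron(", "rate(")):
        raise ValueError("expected a cron(...) or rate(...) expression")
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "Options": {"TransferMode": "CHANGED"},  # replicate only changes
        "Schedule": {"ScheduleExpression": schedule_expression},
    }


# Placeholder ARNs (hypothetical): nightly replication at 02:00 UTC.
request = build_scheduled_task(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
    "cron(0 2 * * ? *)",
    "nightly-nfs-to-s3",
)
print(request["Schedule"])  # {'ScheduleExpression': 'cron(0 2 * * ? *)'}
# With boto3 and credentials configured, this could be sent as:
#   boto3.client("datasync").create_task(**request)
```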
Comparison with Other Solutions
| Feature | DataSync | Transfer Family | S3 Transfer Acceleration | Snow Family |
|---|---|---|---|---|
| Purpose | Bulk data sync | Partner file exchange | Speed up S3 | Physical transfer |
| Protocol | Native (NFS/SMB) | SFTP/FTPS/FTP | S3 API | Physical device |
| Speed | Up to 10 Gbps | Network limited | Edge optimized | 100 TB/device |
| Direction | Bidirectional | Both | To/from S3 | Both |
| Agent | Required (on-prem) | No | No | No |
| Best for | Large migrations | B2B exchange | Global uploads | Massive/offline |
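Choosing between a network transfer (DataSync) and a physical device (Snow Family) often comes down to how long the transfer would take over the available link. This sketch estimates that time. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s) and a sustained utilization of 80% of the link. Both assumptions are illustrative.

```python
def transfer_hours(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimated hours to move `data_tb` terabytes over a `link_gbps` link.
    Assumptions: decimal units and a sustained `utilization` fraction of the link."""
    bits = data_tb * 10**12 * 8
    effective_bps = link_gbps * 10**9 * utilization
    return bits / effective_bps / 3600


for tb in (10, 100, 1000):
    print(f"{tb:>5} TB over 1 Gbps: {transfer_hours(tb, 1):8.1f} h")
```

Under these assumptions, 10 TB takes about 28 hours over 1 Gbps, while 1,000 TB takes about 2,778 hours (roughly 116 days). That gap is where offline Snow Family devices become attractive.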
DataSync vs DIY (rsync, robocopy)
| Aspect | DataSync | DIY |
|---|---|---|
| Speed | Optimized, parallel | Single-threaded by default (rsync; robocopy without /MT) |
| Reliability | Auto-retry, verification | Manual handling |
| Management | Console/API | Scripts, cron |
| Cost | Per-GB pricing | Free (but costs engineering time) |
| Monitoring | CloudWatch integrated | Custom logging |
Pricing
| Component | Cost |
|---|---|
| Data copied | $0.0125 - $0.025/GB |
| Agent | Free (software) |
| EC2 for agent | Standard EC2 pricing |
Examples:
- 10 TB migration = ~$125 - $250
- Monthly sync of 100 GB = ~$1.25 - $2.50
💡 Note: Pricing varies by region and destination type.
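The examples above are the GB copied multiplied by the per-GB rate. The sketch below reproduces them using the $0.0125 to $0.025 range from the table. It assumes 1 TB = 1,000 GB and covers only the data-copied charge. Agent EC2 instances, S3 requests and storage, and network transfer fees are not included. Check current regional pricing before relying on the numbers.

```python
def datasync_cost(gb: float, price_per_gb: float) -> float:
    """Data-copied charge only: GB copied x per-GB rate.
    Excludes agent EC2, destination storage/requests, and network fees."""
    return gb * price_per_gb


LOW, HIGH = 0.0125, 0.025  # per-GB range from the table above (illustrative)
for label, gb in (("10 TB migration", 10_000), ("100 GB monthly sync", 100)):
    print(f"{label}: ${datasync_cost(gb, LOW):,.2f} - ${datasync_cost(gb, HIGH):,.2f}")
# 10 TB migration: $125.00 - $250.00
# 100 GB monthly sync: $1.25 - $2.50
```

With 1 TB counted as 1,024 GB, the 10 TB figures become about $128 to $256.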
Exam Tips
✅ When the question mentions:
- "Migrate on-premises NFS/SMB to AWS"
- "Sync data between AWS storage services"
- "Transfer large amounts of data"
- "Schedule recurring data transfers"
- "Need data integrity verification"
→ Think AWS DataSync
⚠️ Don't confuse:
- DataSync: Bulk migration/sync, needs agent for on-prem
- Transfer Family: Partner SFTP/FTP access
- Storage Gateway: Ongoing hybrid access (cache locally)
- Snow Family: Petabyte-scale, offline transfer
Related Topics
- S3 - Common destination
- EFS - File storage destination
- FSx - Managed file systems
- AWS Transfer Family - SFTP/FTP service
- Snow Family - Physical data transfer
- AWS Storage Gateway - Hybrid storage
- Direct Connect - Dedicated network connection (can speed up DataSync transfers)