Automating site transfers from Acquia Site Factory to Pantheon

I was given the assignment of migrating a set of sites from Acquia Site Factory to Pantheon. I wrote a script to perform the repetitive task of transferring databases and files between platforms.

This script assumes that a working codebase equivalent to the one running on Acquia Site Factory is already installed on Pantheon (we use a custom upstream). It requires properly configured Drush, Acquia CLI and Terminus, plus administrative privileges on both Acquia and Pantheon. The script accepts up to four arguments: the site’s domain name on Site Factory, the site’s name on Pantheon, the environment on Pantheon, and an option to run only database or file operations. Only the first two are required; the environment defaults to dev, and both database and file operations run when the last option is omitted:

./acquia_pantheon_sync_site.sh www.mydomain.com my-domain
./acquia_pantheon_sync_site.sh www.mydomain2.com my-domain-2 test db
./acquia_pantheon_sync_site.sh www.mydomain3.com my-domain-3 test files
./acquia_pantheon_sync_site.sh www.mydomain4.com my-domain-4 live

Because I had to coordinate with site stakeholders and network services for DNS changes, running this script across all the sites in one sitting wasn’t feasible, and I had to batch the deployments over a few releases. Nevertheless, I performed an initial run on all sites as a test, which copied their databases and sets of files to my local machine. I realized that if I left those files in place, I could re-run the script later to sync the database and retrieve only the files that had changed since the original run. This was a much faster operation, and it allowed each site to be re-synced easily at any time.
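A batch re-sync of this kind could be driven by a small wrapper loop. The following is only a sketch: the `sync_all` function, the `SYNC_CMD` variable and the sites file format (one `domain pantheon-site [environment] [scope]` entry per line) are assumptions for illustration and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical wrapper: re-run the sync script for every site listed in a file.
# Each line: <acquia-domain> <pantheon-site> [environment] [scope]
SYNC_CMD="${SYNC_CMD:-./acquia_pantheon_sync_site.sh}"

sync_all() {
  sites_file="$1"
  failed=0
  # Read the list on fd 3 so that ssh calls inside the sync script
  # cannot consume the remaining lines from stdin.
  while read -r domain site env scope <&3 || [ -n "$domain" ]; do
    # Skip blank lines and comments.
    case "$domain" in ''|'#'*) continue ;; esac
    # Pass the optional arguments only when they are present.
    if ! "$SYNC_CMD" "$domain" "$site" ${env:+"$env"} ${scope:+"$scope"}; then
      echo "Sync failed for $domain" >&2
      failed=$((failed + 1))
    fi
  done 3< "$sites_file"
  # The return value is the number of sites that failed.
  return "$failed"
}
```

Calling `sync_all sites.txt` would then process each listed site in turn and report how many failed, so a batch can continue past one bad site.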

Script setup

I begin by using the domain name to look up the site ID on Site Factory, and then confirm that the specified site name exists on Pantheon before continuing.
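The site ID lookup in the script relies on two string operations applied to the "Site path" line of the Drush status output. The sketch below isolates them in a hypothetical `extract_site_id` function and runs them on a made-up status line; the sample path and ID are invented, and the remark about the carriage return is my reading of why the `cut` is there rather than something the script states.

```shell
# Sketch of the site ID extraction, run against a made-up status line.
# The sample value is hypothetical; real input comes from `drush9 status`.
extract_site_id() {
  status_line="$1"
  # Remove the longest prefix ending in "/", leaving the last path segment.
  id="${status_line##*/}"
  # Keep the first 10 characters, dropping anything that trails them
  # (presumably the carriage return that a pseudo-terminal from `ssh -t` adds).
  printf '%s\n' "$id" | cut -c 1-10
}

sample=$(printf ' Site path  :  sites/g/files/abcde12345\r')
extract_site_id "$sample"
```

A domain that Site Factory does not recognise resolves to a path ending in `default`, which is the case the script tests for before continuing.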

#!/bin/zsh -i
ACQUIA_APPLICATION_ID="01234567-89ab-cdef-0123-456789abcdef"  # Edit the Site Factory application id.
ACQUIA_ENVIRONMENT_ID="0123-456789ab-cdef-0123-4567-89abcdef0123"  # Edit the Site Factory environment (likely 01live) id.
    
DOMAIN_NAME="$1"
PANTHEON_SITE_NAME="$2"
PANTHEON_ENVIRONMENT="$3"
SCOPE="$4"
    
if [[ -z "$DOMAIN_NAME" ]]; then
    echo "Must provide domain name." 1>&2
    exit 1
fi
if [[ -z "$PANTHEON_SITE_NAME" ]]; then
    echo "Must provide Pantheon site name." 1>&2
    exit 1
fi
if [[ -z "$PANTHEON_ENVIRONMENT" ]]; then
    echo "Using Pantheon environment dev." 1>&2
    PANTHEON_ENVIRONMENT="dev"
    sleep 2
fi
    
cd ~/Projects/acquia-pantheon-sync || exit  # Edit the location on your local where you'll save files.
echo "========================================================================================"
echo "\u001b[37mSyncing $DOMAIN_NAME from Acquia to Pantheon\u001b[0m"
echo "----------------------------------------------------------------------------------------"
    
echo "Determining Acquia site ID..."
STATUS=$(ssh -t -o PasswordAuthentication=no myfactorysf.01live@myfactorysf01live.ssh.enterprise-g1.acquia-sites.com "drush9 status --uri=$DOMAIN_NAME --root=/var/www/html/myfactorysf.01live/docroot" | grep 'Site path')  # Edit the ssh user and docroot paths.
ACQUIA_SITE_ID="${STATUS##*/}"
ACQUIA_SITE_ID="$(echo "$ACQUIA_SITE_ID" | cut -c 1-10)"
if [[ "$ACQUIA_SITE_ID" =~ 'default' ]]; then
  echo "Domain not found, exiting."
  echo "========================================================================================"
  exit 1
fi
echo "Found site $ACQUIA_SITE_ID."
echo "----------------------------------------------------------------------------------------"
    
echo "Determining Pantheon site ID..."
PANTHEON_SITE_ID=$(terminus site:info "$PANTHEON_SITE_NAME" --field id)
if [[ -n "$PANTHEON_SITE_ID" ]]; then
  echo "Found site $PANTHEON_SITE_ID."
else
  echo "Site not found, exiting."
  echo "======================================================================================="
  exit 1
fi
echo "----------------------------------------------------------------------------------------"
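The setup above does not check the fourth argument, so a mistyped scope such as `file` would silently run both the database and file operations. A minimal guard might look like the following; `validate_scope` is a hypothetical helper, not part of the original script.

```shell
# Hypothetical helper: accept only the scope values the script acts on.
validate_scope() {
  case "$1" in
    ''|db|files)
      return 0
      ;;
    *)
      echo "Unknown scope '$1': expected 'db', 'files', or nothing for both." >&2
      return 1
      ;;
  esac
}

# In the script this would sit next to the other argument checks:
# validate_scope "$SCOPE" || exit 1
```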

Database operations

I start by running a SQL dump of the site database on the Site Factory server, writing it into the site’s public files directory, and then use the dump’s public URL to import the database on Pantheon.

Warning: We considered this acceptable because our sites don’t contain PII or other sensitive information. Still, we put the dump in an oddly named, non-browsable folder to provide a minimal level of obscurity. This approach almost certainly wouldn’t satisfy HIPAA compliance, and if your data is sensitive you will need a more secure location for the database files.
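If you do rely on an obscure folder name, it can be generated rather than hand-picked. This is a sketch only: `random_backup_folder` and the `backup-` prefix are hypothetical, the folder would still have to be created on the Acquia side, and the same value would have to be used in both the dump and the import commands. It improves obscurity and nothing more.

```shell
# Sketch: build a hard-to-guess backup folder name from 8 random bytes.
# The "backup-" prefix is an arbitrary choice for illustration.
random_backup_folder() {
  # od prints the bytes as hex pairs; tr strips the spaces and newline.
  suffix=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
  printf 'backup-%s\n' "$suffix"
}

BACKUP_FOLDER=$(random_backup_folder)
echo "$BACKUP_FOLDER"
```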

After the database export and import, I enable and disable some modules as needed and update some configuration.

if [[ ! $SCOPE == 'files' ]]; then
  echo "Creating a database backup on Acquia..."
  # $HOME is used instead of ~ because zsh does not expand a tilde after "=" in an argument by default.
  drush @myfactorysf.01live --root="$HOME/Sites/my-factory" -l "$DOMAIN_NAME" sql-dump --result-file=sites/default/files/random-backup-folder-7890xyz/"$DOMAIN_NAME".sql --gzip  # Edit the drush alias, local site root and the backup destination folder.
  echo "----------------------------------------------------------------------------------------"
    
  echo "Importing the database backup on Pantheon..."
  terminus import:db "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT http://myfactorysf.01live.acsitefactory.com/sites/default/files/random-backup-folder-7890xyz/"$DOMAIN_NAME".sql.gz -y  # Edit to match the backup destination above.
  echo "----------------------------------------------------------------------------------------"
    
  echo "Uninstalling Syslog module on Pantheon..."
  terminus drush "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT pm:uninstall syslog -y
  echo "Uninstalling ACSF modules on Pantheon..."
  terminus drush "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT pm:uninstall acsf acsf_theme acsf_variables acsf_duplication -y
  echo "Enabling Dblog module on Pantheon..."
  terminus drush "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT pm:enable dblog -y
  echo "Enabling Pantheon Advanced Page Cache module on Pantheon..."
  terminus drush "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT pm:enable pantheon_advanced_page_cache -y
  echo "Setting cron interval to 3 hours on Pantheon..."
  terminus drush "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT cset automated_cron.settings interval 10800 --no-interaction
  echo "Setting cache max-age to 1 hour on Pantheon..."
  terminus drush "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT cset system.performance cache.page.max_age 3600 --no-interaction
  echo "Clearing the Drupal cache on Pantheon..."
  terminus drush "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT cr -y
  echo "----------------------------------------------------------------------------------------"
fi
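The block above repeats the same `terminus drush` prefix for every step. One way to reduce that repetition, sketched here, is a small wrapper function. `pantheon_drush` and the `DRY_RUN` switch are hypothetical additions; the `--` separator follows the usual Terminus convention for passing arguments through to Drush, whereas the original commands omit it.

```shell
# Sketch: wrap the repeated "terminus drush <site>.<env>" prefix in one function.
# PANTHEON_SITE_NAME and PANTHEON_ENVIRONMENT are the script's own variables;
# pantheon_drush and DRY_RUN are hypothetical additions.
pantheon_drush() {
  target="$PANTHEON_SITE_NAME.$PANTHEON_ENVIRONMENT"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "terminus drush $target -- $*"
  else
    # "--" keeps Terminus from interpreting the Drush options itself.
    terminus drush "$target" -- "$@"
  fi
}

# Example, printed rather than executed:
PANTHEON_SITE_NAME="my-domain"
PANTHEON_ENVIRONMENT="dev"
DRY_RUN=1
pantheon_drush pm:enable dblog -y
```

With `DRY_RUN=1` the function only prints what it would run, which makes it easy to review the full sequence of commands before touching a live environment.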

File operations

I first check whether the Pantheon server is already recorded in known_hosts, and add it if it isn’t; this allows the script to run without stopping for user intervention. I then use Acquia CLI to pull the files to my local machine, and rsync them to Pantheon. I originally intended to find a solution that didn’t use my computer as an intermediary location for synced files, but this worked well enough for my purposes and didn’t require provisioning more paid resources. I also tried the Terminus rsync plugin, but didn’t like that I couldn’t visually monitor the individual file transfers.

if [[ ! $SCOPE == "db" ]]; then
  echo "Checking if Pantheon server is known..."
  PANTHEON_SERVER="appserver.$PANTHEON_ENVIRONMENT.$PANTHEON_SITE_ID.drush.in"
  if ! ssh-keygen -F "[$PANTHEON_SERVER]:2222" ; then
    echo "Adding $PANTHEON_SERVER:2222 to known hosts"
    ssh-keyscan -t rsa -p 2222 "$PANTHEON_SERVER" >> ~/.ssh/known_hosts
  fi
  echo "----------------------------------------------------------------------------------------"
    
  echo "Pulling site files from Acquia..."
  acli pull:files $ACQUIA_ENVIRONMENT_ID "$ACQUIA_SITE_ID"
  echo "----------------------------------------------------------------------------------------"
    
  echo "Pushing site files to Pantheon... (this may take a few minutes)"
#  terminus remote:rsync ./docroot/sites/"$ACQUIA_SITE_ID"/files/ "$PANTHEON_SITE_NAME".$PANTHEON_ENVIRONMENT:files -y
  rsync -rLvz --size-only --checksum --ipv4 --progress -e 'ssh -p 2222' ./docroot/sites/"$ACQUIA_SITE_ID"/files/ --temp-dir=~/tmp/ $PANTHEON_ENVIRONMENT."$PANTHEON_SITE_ID"@appserver.$PANTHEON_ENVIRONMENT."$PANTHEON_SITE_ID".drush.in:files/
  echo "----------------------------------------------------------------------------------------"
fi

Results

This script ran quite reliably for most sites. Occasionally I would encounter issues with rsync, and re-running the script with the files option would usually fix them. I think the failures were mostly related to system-generated files, so adding some --exclude parameters to the rsync command might help.
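Both ideas, retrying and excluding generated files, can be sketched together. The `retry` helper, its attempt count and delay, and the excluded directory names are all assumptions; I have not confirmed which files actually caused the rsync failures.

```shell
# Sketch: retry a command a few times before giving up.
# `retry` is a hypothetical helper; attempt count and delay are arbitrary.
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed, retrying in ${delay}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Possible use with the rsync call from the script. The excluded directories
# are guesses at which generated files caused trouble:
# retry 3 10 rsync -rLvz --size-only --checksum --ipv4 --progress \
#   --exclude='css/' --exclude='js/' --exclude='php/' --exclude='styles/' \
#   -e 'ssh -p 2222' --temp-dir=~/tmp/ \
#   ./docroot/sites/"$ACQUIA_SITE_ID"/files/ \
#   "$PANTHEON_ENVIRONMENT.$PANTHEON_SITE_ID@appserver.$PANTHEON_ENVIRONMENT.$PANTHEON_SITE_ID.drush.in:files/"
```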