Add iOS support via Appium WebDriverAgent #8

Draft
blah-mad wants to merge 208 commits into ghost-in-the-droid:rc/v1.3.0 from blah-mad:ios

blah-mad commented Jun 9, 2026

Summary

This PR turns Ghost in the Droid from an Android-only automation harness into a cross-platform mobile agent with first-class iOS support. Android keeps the existing ADB/Portal path; iOS devices use Appium XCUITest and WebDriverAgent, with real iPhone support as the product target and simulator support for development/CI.

Addresses #7.

What Changed

  • Adds an IOSDevice backend in gitd/bots/common/ios.py with Appium/WDA session management, screenshots, UI tree dump/normalization, taps, swipes, typing, app launch, clipboard, notifications, browser navigation, and recovery-oriented health checks.
  • Adds platform-aware routing for phone APIs, MCP tools, in-process agent tools, skills, explorer, recorder, scheduler jobs, test runner, streaming, app state, app listing, clipboard, notifications, and browser utilities.
  • Adds iOS WDA MJPEG streaming, screenshot polling fallback, stream metadata, viewer pages, recording through WDA MJPEG + ffmpeg, and dashboard stream controls that distinguish Android Portal/WebRTC from iOS WDA streaming.
  • Adds iOS Appium/WDA health and recovery: Appium reachability, session validation, stale session eviction, WDA signing/trust/locked-device classification, RemoteXPC tunnel guidance, simulator discovery through simctl, and per-device iOS config via env/JSON/file.
  • Adds browser/news automation as the first release-quality iOS workflow: open Chrome/Safari, navigate to https://text.npr.org/, extract visible headlines/articles through web context/native/OCR fallback, open articles, and return structured snippets plus screenshots/log evidence.
  • Adds iOS skills and platform metadata: elements_ios.yaml, platform compatibility fields, the iOS browser/news demo skill, and smoke-level TikTok iOS workflows.
  • Updates the frontend dashboard, Skill Creator, Skill Hub, Tools Hub, scheduler, tests tab, and reusable stream widget to understand iOS device refs and platform-specific limitations.
  • Updates docs and README to describe the product as Android + iOS, including setup, API/MCP references, stream behavior, scheduler/browser workflows, and iOS troubleshooting.
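
The platform-aware routing described above keys off the device ref. A minimal sketch of that idea, assuming refs with an `ios:` prefix route to the Appium/WDA backend and everything else is treated as an Android serial. `DeviceRef`, `parse_device_ref`, and `backend_name` are hypothetical names for illustration, not the code in gitd/bots/common/device.py:

```python
from dataclasses import dataclass

IOS_PREFIX = "ios:"  # prefix as shown in this PR's refs, e.g. ios:<udid>


@dataclass(frozen=True)
class DeviceRef:
    platform: str    # "ios" or "android"
    identifier: str  # iOS UDID or Android serial


def parse_device_ref(ref: str) -> DeviceRef:
    """Split a device ref into platform and identifier (hypothetical helper)."""
    ref = ref.strip()
    if not ref:
        raise ValueError("empty device ref")
    if ref.lower().startswith(IOS_PREFIX):
        udid = ref[len(IOS_PREFIX):]
        if not udid:
            raise ValueError("iOS ref is missing a UDID")
        return DeviceRef("ios", udid)
    # Assumption: anything without the ios: prefix is an Android serial.
    return DeviceRef("android", ref)


def backend_name(ref: str) -> str:
    """Name the backend a ref would route to in this sketch."""
    backends = {"ios": "appium-wda", "android": "adb-portal"}
    return backends[parse_device_ref(ref).platform]
```

Keeping the parse step separate from the dispatch step means routers, MCP tools, and scheduler jobs can share one definition of what counts as an iOS ref.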

Issue #7 Coverage

The RFC asked for:

  • Backend choice: WebDriverAgent through Appium XCUITest. This PR implements that route and documents why it is the closest practical analog to ADB for real iPhones.
  • File layout: iOS backend lives alongside Android under gitd/bots/common/ios.py, with shared routing in gitd/bots/common/device.py and platform-aware services/routers.
  • Minimal IOSDevice POC: Implemented, and extended well beyond screenshot, tap, and UI dump to cover app launch, text entry, browser controls, app state, clipboard, notifications, and session recovery.
  • One end-to-end task: Implemented and live-tested the browser/news workflow on a real iPhone and an iOS simulator.
  • Setup docs: Added docs/SETUP_IOS.md, README updates, API/MCP references, and dashboard recovery copy.

Validation

Automated checks run on this branch:

uv run pytest tests/api/test_streaming.py tests/test_ios_chrome_news_smoke.py tests/test_ios_device.py -k 'ios_chrome_news_smoke or xctrace_devices or simctl_simulators or stream'
# 21 passed, 87 deselected

Other validation performed while building the branch:

uv run pytest
# 377 passed, 4 skipped

Live validation performed locally:

  • Real iPhone (ios:00008110-0016443101D0401E) through Appium/WDA.
  • iOS dashboard WDA MJPEG stream decoded in browser at 1170x2532.
  • Chrome news smoke against https://text.npr.org/: extracted 5 headlines and 3 article bodies.
  • Booted iPhone 15 Pro simulator (ios:32918B6B-71E5-4A14-94C6-97F0B8B2DC44) through Appium/WDA.
  • Simulator Safari news smoke: extracted 5 headlines and 3 article bodies.
  • Simulator dashboard WDA MJPEG stream decoded in browser at 1178x2556.
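
For reviewers who want to sanity-check an MJPEG stream outside the browser, here is a rough sketch of splitting raw stream bytes into JPEG frames by scanning for the JPEG start and end markers. This is a generic technique, not the decoder used in this PR, and `split_mjpeg_frames` is a hypothetical helper. A production reader would normally parse the multipart boundaries instead, because marker scanning can mis-split JPEGs that embed thumbnails.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_mjpeg_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Return (complete JPEG frames, unconsumed tail) for a chunk of stream bytes.

    The tail should be prepended to the next chunk read from the socket.
    """
    frames: list[bytes] = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            # Keep a trailing 0xFF in case the SOI marker is split across chunks.
            tail = buffer[-1:] if buffer.endswith(b"\xff") else b""
            return frames, tail
        end = buffer.find(EOI, start + len(SOI))
        if end == -1:
            # Frame is incomplete; keep everything from its start marker.
            return frames, buffer[start:]
        frames.append(buffer[start:end + len(EOI)])
        pos = end + len(EOI)
```

Multipart headers and boundary lines between frames are simply skipped, since the scan only keeps bytes between a start marker and the next end marker.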

Known Limits / Follow-Ups

  • iOS does not support Android-style Wireless ADB. Wireless iPhone operation needs separate Xcode/CoreDevice/RemoteXPC handling and manual trust/network setup; this PR surfaces stable unsupported errors for Android wireless routes.
  • iOS app enumeration is not equivalent to Android package listing. The implementation combines configured apps, common bundle IDs, and Appium verification.
  • TikTok iOS support is smoke-level: launch/search/profile evidence is implemented; upload/posting is intentionally not included in this PR.
  • Real devices still require a Mac, Xcode, Developer Mode, trust prompts, UI Automation permission when prompted, and WDA signing/provisioning.
  • The PR is intentionally broad because iOS support cuts across device routing, APIs, dashboard, agent tools, skills, scheduler, docs, and tests.
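
The app enumeration limit above can be illustrated with a small sketch: merge configured and common bundle IDs, de-duplicate them, and keep only those a verifier confirms. `enumerate_ios_apps` and the `is_installed` callable are hypothetical; in the real backend the verification step goes through Appium, which this sketch does not reproduce.

```python
from typing import Callable, Iterable


def enumerate_ios_apps(
    configured: Iterable[str],
    common: Iterable[str],
    is_installed: Callable[[str], bool],
) -> list[str]:
    """Return verified bundle IDs, configured apps first, without duplicates."""
    seen: set[str] = set()
    verified: list[str] = []
    for bundle_id in [*configured, *common]:
        if bundle_id in seen:
            continue  # the same app may appear in both lists
        seen.add(bundle_id)
        if is_installed(bundle_id):  # stand-in for the Appium verification step
            verified.append(bundle_id)
    return verified
```

Injecting the verifier keeps the merge logic testable without a device attached.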

Maintainer Trial Guide

Use this section as a quick handoff guide for reviewing the iOS path.

  1. Check out the PR branch.
gh pr checkout <pr-number>
  2. Install Appium XCUITest.
npm install -g appium
appium driver install xcuitest
appium --base-path /
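
Before moving on, it can help to confirm the Appium server is reachable. A minimal sketch, assuming the server answers the W3C WebDriver status endpoint at `<appium-url>/status` with a `value.ready` flag; `status_url`, `is_ready`, and `check_appium` are hypothetical helpers, not part of this PR:

```python
import json
import urllib.request


def status_url(appium_url: str) -> str:
    """Build the status endpoint URL for a server started with --base-path /."""
    return appium_url.rstrip("/") + "/status"


def is_ready(payload: dict) -> bool:
    """Interpret a status payload; assumes the shape {"value": {"ready": true}}."""
    value = payload.get("value")
    return isinstance(value, dict) and value.get("ready") is True


def check_appium(appium_url: str, timeout: float = 3.0) -> bool:
    """Return True when the server responds and reports ready."""
    try:
        with urllib.request.urlopen(status_url(appium_url), timeout=timeout) as resp:
            return is_ready(json.load(resp))
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as not ready.
        return False
```

For example, `check_appium("http://127.0.0.1:4723")` should return `False` until the `appium` process above is running.
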
  3. Prepare a real iPhone or simulator.

For a real iPhone:

  • Trust the Mac on the phone.
  • Enable Developer Mode.
  • Keep the phone unlocked for the first WDA session.
  • Make sure WebDriverAgentRunner is signed with a valid Apple development team.

For a simulator:

xcrun simctl list devices available
xcrun simctl boot '<simulator-udid>'
xcrun simctl bootstatus '<simulator-udid>' -b
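
To pick up the UDID of a booted simulator programmatically, the JSON output of `simctl` can be parsed. A small sketch, assuming the payload maps runtime identifiers to device lists with `udid`, `name`, and `state` fields; `booted_simulators` and `list_booted_simulators` are hypothetical helpers and not the discovery code in this PR:

```python
import json
import subprocess


def booted_simulators(simctl_payload: dict) -> list[dict]:
    """Extract booted simulators from parsed `simctl list --json devices` output."""
    found = []
    for runtime, devices in simctl_payload.get("devices", {}).items():
        for device in devices:
            if device.get("state") == "Booted":
                found.append({
                    "ref": f"ios:{device['udid']}",  # ref format used in this PR
                    "name": device.get("name", ""),
                    "runtime": runtime,
                })
    return found


def list_booted_simulators() -> list[dict]:
    """Run simctl (requires macOS with Xcode) and return booted simulators."""
    out = subprocess.run(
        ["xcrun", "simctl", "list", "--json", "devices", "available"],
        check=True, capture_output=True, text=True,
    ).stdout
    return booted_simulators(json.loads(out))
```
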
  4. Configure Ghost for one iOS target.
export IOS_DEVICE_UDID='<udid>'
export IOS_APPIUM_URL='http://127.0.0.1:4723'
export IOS_BUNDLE_ID='com.google.chrome.ios'      # use com.apple.mobilesafari on stock simulators
export IOS_MJPEG_SERVER_PORT='9100'               # use a unique port per iOS target

For multiple iOS devices/simulators, prefer IOS_DEVICES_JSON so each target can have its own bundle ID and MJPEG port.
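
As an illustration of the multi-target case, the sketch below generates a JSON list in which every target gets its own bundle ID and a distinct MJPEG port. The key names (`udid`, `bundle_id`, `mjpeg_server_port`) and the helper `build_devices_json` are assumptions for illustration only; check docs/SETUP_IOS.md for the schema that IOS_DEVICES_JSON actually expects.

```python
import json


def build_devices_json(targets: list[tuple[str, str]], base_port: int = 9100) -> str:
    """Serialize (udid, bundle_id) pairs, assigning one MJPEG port per target.

    Key names are hypothetical; they are not confirmed by this PR.
    """
    entries = []
    for offset, (udid, bundle_id) in enumerate(targets):
        entries.append({
            "udid": udid,
            "bundle_id": bundle_id,
            "mjpeg_server_port": base_port + offset,  # unique per target
        })
    return json.dumps(entries)
```

The output could then be exported, for example with `export IOS_DEVICES_JSON="$(python3 make_devices.py)"`, where `make_devices.py` is a hypothetical wrapper script.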

  5. Start Ghost.
python3 run.py
cd frontend
npm install
npx vite --host 0.0.0.0 --port 6175

Open http://127.0.0.1:6175, choose the ios:<udid> device, and start the stream. iOS should show WDA MJPEG; Android should keep its existing WebRTC/MJPEG behavior.

  6. Run the iOS product-path smoke.
uv run python scripts/ios_chrome_news_smoke.py \
  --device "ios:<udid>" \
  --bundle-id "$IOS_BUNDLE_ID" \
  --url https://text.npr.org/ \
  --max-headlines 5 \
  --max-articles 3 \
  --fix-health \
  --out-dir data/ios_chrome_news_smoke

Expected result: result.json has ok: true, at least 5 headlines, and article snippets for the requested articles. If setup fails, inspect health.json; it should include actionable connection.status, recommended_fix, and recovery steps.
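
The expected-result check above can be scripted. A rough sketch, assuming result.json is an object with an `ok` flag plus `headlines` and `articles` lists; only `ok` is named in this PR, so the other two key names and both helper functions are assumptions:

```python
import json
from pathlib import Path


def check_smoke_result(result: dict, min_headlines: int = 5, min_articles: int = 3) -> list[str]:
    """Return a list of problems; an empty list means the smoke run looks good."""
    problems = []
    if result.get("ok") is not True:
        problems.append("ok is not true")
    headlines = result.get("headlines") or []  # assumed key name
    if len(headlines) < min_headlines:
        problems.append(f"expected at least {min_headlines} headlines, got {len(headlines)}")
    articles = result.get("articles") or []  # assumed key name
    if len(articles) < min_articles:
        problems.append(f"expected at least {min_articles} articles, got {len(articles)}")
    return problems


def check_smoke_dir(out_dir: str) -> list[str]:
    """Load result.json from the smoke output directory and check it."""
    result = json.loads((Path(out_dir) / "result.json").read_text(encoding="utf-8"))
    return check_smoke_result(result)
```

With the command above, `check_smoke_dir("data/ios_chrome_news_smoke")` would return an empty list on a passing run under these assumptions.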

  7. Try the same route through MCP/agent tools.
list_devices()
device_health("ios:<udid>")
open_url("ios:<udid>", "https://text.npr.org/", "com.google.chrome.ios")
extract_articles("ios:<udid>", 5)
read_news("ios:<udid>", "https://text.npr.org/", 5, 3)
  8. Review platform behavior.
  • Android serials should continue to use ADB/Portal.
  • ios:<udid> refs should route to WDA-backed implementations where available.
  • Android-only operations such as ADB shell, Android intents, Portal overlay, Play Store helpers, and wireless ADB should return clear unsupported-platform errors on iOS.
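
One way to picture the unsupported-platform behavior in the last bullet is a guard that runs before any Android-only operation. This is a sketch only: `UnsupportedPlatformError`, `ANDROID_ONLY`, `ensure_supported`, and the operation names are hypothetical and do not mirror the PR's actual error types.

```python
class UnsupportedPlatformError(RuntimeError):
    """Raised when an operation has no implementation for the target platform."""


# Hypothetical operation names for the Android-only features listed above.
ANDROID_ONLY = frozenset({
    "adb_shell",
    "android_intent",
    "portal_overlay",
    "play_store",
    "wireless_adb",
})


def ensure_supported(operation: str, device_ref: str) -> None:
    """Raise a clear error if an Android-only operation targets an iOS ref."""
    if device_ref.startswith("ios:") and operation in ANDROID_ONLY:
        raise UnsupportedPlatformError(
            f"{operation} is Android-only and is not supported on {device_ref}"
        )
```

Calling the guard at the top of each route keeps the error message stable across the API, MCP tools, and dashboard.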

blah-mad added 30 commits June 8, 2026 00:37
blah-mad added 30 commits June 8, 2026 11:35