AI-Powered Android Device Controller
Complete User Guide & Tutorial
Control one phone or an entire farm of devices using natural language. Just tell the AI what to do — it sees the screen, taps, types, and scrolls for you.
Version 1.0 • March 2026
DroidAI is a desktop application that lets you control Android devices using natural language. Instead of manually tapping through apps, you simply describe what you want done — and DroidAI's AI agent takes over. It observes the device screen, decides which actions to take, executes them, and repeats until the task is complete.
DroidAI supports controlling multiple devices simultaneously, making it ideal for phone farms, testing labs, and automation workflows at scale.
| Component | Requirement |
|---|---|
| Operating System | Windows 10/11 (64-bit) |
| RAM | 4 GB minimum, 8 GB recommended |
| GPU | OpenGL ES 2.0 compatible (for device mirroring) |
| Android Devices | Android 9 (API 28)+ |
| USB | USB 2.0+ cable (data-capable, not charge-only) |
| Internet | Required for LLM API calls |
| LLM API Key | Anthropic, OpenAI, Gemini, DeepSeek, Grok, or custom |
ADB is required for DroidAI to communicate with your Android devices. DroidAI bundles its own ADB, but if you encounter issues:
C:\platform-tools)
adb version
adb devices
You should see your device serial number followed by device:
List of devices attached
ABCD1234 device
If you see unauthorized, check your phone for the authorization popup.
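When checking many devices at once, it can help to read the `adb devices` output programmatically. The following is a minimal sketch, not part of DroidAI: the function name `parse_adb_devices` and the second sample serial are illustrative assumptions. It maps each serial to the state shown in the listing above.

```python
def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each device serial to its state (e.g. 'device', 'unauthorized', 'offline')."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon notices (lines starting with '*').
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


# Sample text in the same shape as the listing above; EFGH5678 is a made-up serial.
sample = """List of devices attached
ABCD1234 device
EFGH5678 unauthorized
"""

for serial, state in parse_adb_devices(sample).items():
    if state != "device":
        print(f"{serial}: {state} (check the phone for the authorization popup)")
```

Any serial whose state is not `device` needs attention before DroidAI can use it.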
The Portal APK is a companion app that provides:
| Button | Color | Meaning |
|---|---|---|
| P | Green | Portal installed, accessibility ON, up to date |
| P | Yellow | Portal installed but outdated — click to update |
| P | Red | Installation failed — check USB debugging permissions |
| K | Green | Portal IME keyboard active and ready |
| K | Red | IME not available — click to enable |
DroidAI requires an LLM API key to power its AI agent. The AI reads the device screen and decides what actions to take.
| Provider | Recommended Models | Vision | Notes |
|---|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Haiku | Yes | Best overall. Supports prompt caching for cost savings. |
| OpenAI | GPT-4o, GPT-4o-mini | Yes | Good alternative with fast response times. |
| Google Gemini | Gemini 2.0 Flash | Yes | Cost-effective option. |
| DeepSeek | DeepSeek Chat | No | Budget option, text-only (no screenshots). |
| Grok | Grok-2 | Yes | xAI |
| Ollama Cloud | Various | Varies | Self-hosted models via Ollama. |
| Kimi | Moonshot | No | Chinese provider. |
| Custom | Any OpenAI-compatible API | Varies | Use with any OpenAI-format provider. |
sk-ant-)
sk-)
If you have an OpenAI-compatible API endpoint (e.g., a local model, a proxy, or a third-party service):
http://localhost:11434/v1 for Ollama)
DroidAI has a split-panel layout: Chat Panel on the left and Device Grid on the right.
| Button | Function |
|---|---|
| Device | Device management: connect/disconnect, Portal, WiFi, restart ADB, resize |
| Mirror | Toggle mirror mode (green = ON). Forwards mouse/keyboard to device. |
| Settings | Open settings (API keys, agent config, display, streaming) |
| Report | Submit bug reports with diagnostic logs |
| User Menu | Subscription status, email, sign out |
The chat panel has four tabs:
Live execution log. Shows commands, AI reasoning, tool calls, and results.
Manage saved commands, AI personas, AI rules, and app cards.
Macros (record/replay), Workflows (multi-step chains), Triggers.
Raw debug/diagnostic logs for troubleshooting.
| Button | Function |
|---|---|
| Stealth | Toggle stealth mode (red = ON). Human-like delays and jitter. |
| Loop | Toggle repeat/loop mode. Set count and interval. |
| Screenshot | Cycle: OFF → AUTO → ALWAYS. |
| App Cards | Open app card grid for quick app-specific tasks. |
| All / None | Select or deselect all connected devices. |
Each connected device appears as a panel with a live screen mirror. Panel controls:
| Button | Function |
|---|---|
| Selection badge | Click to select/deselect for commands |
| P | Portal APK status & install |
| K | Keyboard IME status & enable |
| R | Reconnect device |
| L | Open per-device activity log |
| Enlarge | Full-screen device view |
| × | Disconnect device |
Let's walk through your very first DroidAI task, step by step.
Open YouTube and search for "lofi hip hop radio"
| Command | What It Does |
|---|---|
| Open Instagram and like 3 posts in my feed | Launches Instagram, scrolls feed, likes 3 posts |
| Go to Settings and turn on WiFi | Opens Settings, navigates to WiFi toggle |
| Open Chrome and go to google.com | Launches Chrome, navigates to URL |
| Send "Hello!" to Mom on WhatsApp | Opens WhatsApp, finds contact, sends message |
Click the red Stop button during execution, or press Ctrl+C while the input field is focused.
For each step of a task, the AI agent follows this cycle:
| Action | Description | Example |
|---|---|---|
| tap(x, y) | Tap at screen coordinates | Tap a button |
| click(index) | Click element by UI tree index | Click "Send" by index |
| type(index, text) | Type text into input field | Type a search query |
| scroll(dir) | Scroll up/down/left/right | Scroll feed |
| swipe(dir) | Swipe gesture | Swipe through stories |
| long_tap(x, y) | Long press at coordinates | Open context menu |
| launch(pkg) | Open app by package name | Launch Instagram |
| back / home / enter | System navigation buttons | Go back |
| wait(ms) | Pause execution | Wait for content |
| done | Mark task finished | Task complete |
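To make the coordinate and navigation actions more concrete, here is a hedged sketch of how a few of them could be expressed as standard `adb shell` commands. This guide does not document how DroidAI actually executes actions, so the mapping, the helper name `adb_args`, and the 800 ms long-press duration are all assumptions. Index-based actions (`click`, `type`) depend on the UI tree and are left out, as are `scroll`, `swipe`, and `wait`.

```python
# Assumed mapping from action names to Android key event names.
KEYCODES = {"back": "KEYCODE_BACK", "home": "KEYCODE_HOME", "enter": "KEYCODE_ENTER"}


def adb_args(serial: str, action: str, *args) -> list[str]:
    """Build an adb argument list for one action (hypothetical helper)."""
    base = ["adb", "-s", serial, "shell"]
    if action == "tap":
        x, y = args
        return base + ["input", "tap", str(x), str(y)]
    if action == "long_tap":
        x, y = args
        # A swipe that starts and ends at the same point, held 800 ms, acts as a long press.
        return base + ["input", "swipe", str(x), str(y), str(x), str(y), "800"]
    if action in KEYCODES:
        return base + ["input", "keyevent", KEYCODES[action]]
    if action == "launch":
        (package,) = args
        # monkey with a LAUNCHER category starts the app's main activity.
        return base + ["monkey", "-p", package, "-c", "android.intent.category.LAUNCHER", "1"]
    raise ValueError(f"action not covered by this sketch: {action}")


print(" ".join(adb_args("ABCD1234", "tap", 540, 1200)))
print(" ".join(adb_args("ABCD1234", "launch", "com.instagram.android")))
```

The returned list can be passed to `subprocess.run` on a machine where `adb` is on the PATH.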
When enabled, the AI generates a step-by-step plan before execution. Useful for complex tasks where you want to see the approach first.
Maximum actions per task (default: 30, configurable: 10-100). Prevents infinite loops. Increase for long tasks; decrease for simple actions.
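The effect of the step limit is easiest to see in a loop. The sketch below is an assumption about the general shape of an observe, decide, execute cycle, not DroidAI's actual code; `run_task` and its callback parameters are hypothetical names, and counting the final `done` as a step is also an assumption.

```python
from typing import Callable


def run_task(
    observe: Callable[[], str],
    decide: Callable[[str], str],
    execute: Callable[[str], None],
    max_steps: int = 30,
) -> tuple[bool, int]:
    """Run the cycle until the agent says 'done' or the step budget runs out.

    Returns (finished, steps_used).
    """
    for step in range(1, max_steps + 1):
        state = observe()          # read the screen (UI tree and/or screenshot)
        action = decide(state)     # ask the LLM for the next action
        if action == "done":
            return True, step
        execute(action)            # perform the action on the device
    return False, max_steps        # budget exhausted: "Max steps reached"


# Scripted stand-ins for the screen reader, the LLM, and the device.
script = iter(["launch", "tap", "type", "done"])
finished, steps = run_task(lambda: "screen", lambda _s: next(script), lambda _a: None)
print(finished, steps)
```

An agent that never returns `done` stops after `max_steps` iterations instead of running forever, which is the purpose of the limit.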
DroidAI is built for scale. Connect and control dozens of Android devices simultaneously from a single PC.
For multiple phones, you need a powered USB hub:
| Hub | Ports | Best For | Notes |
|---|---|---|---|
| Sipolar A-423 | 20 | Medium farm (10-20) | Industrial grade, dedicated power supply |
| Sipolar A-400 | 10 | Small farm (5-10) | Compact, desk-friendly |
| Sipolar A-812 | 30 | Large farm (20-30) | Rack-mountable |
| Anker USB 3.0 | 7-13 | Small setups (3-7) | Consumer-grade, reliable |
| Phone | Price (Used) | Android | Pros |
|---|---|---|---|
| Samsung Galaxy S8/S9 | $40-60 | 9-10 | Cheap, reliable ADB, common |
| Samsung Galaxy A series | $50-80 | 11-13 | Good value, newer Android |
| Google Pixel 3/4 | $50-70 | 12-13 | Stock Android, fast ADB |
| Xiaomi Redmi Note | $40-60 | 11-13 | Budget, good specs |
Click "All" to select every device, then type your command. Each device gets its own independent AI agent.
Click the selection badge on each device you want to target. Commands run only on selected devices.
| Devices | Recommended PC | Notes |
|---|---|---|
| 1-5 | Any modern PC, 8 GB RAM | Runs smoothly |
| 5-15 | 16 GB RAM, decent GPU | Lower resolution to 480p |
| 15-30 | 32 GB RAM, dedicated GPU | Use 360p resolution |
| 30+ | Multiple PCs recommended | Split devices across PCs |
App Cards are pre-configured guides that help the AI understand specific apps. They contain navigation hints and app-specific tips that significantly improve accuracy.
When you send a command, DroidAI checks if the current app has an app card. If found, the card's instructions are injected into the AI's system prompt.
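The lookup-and-inject step described above could be sketched as follows. The dictionary keyed by package name, the card text, and the function name `build_system_prompt` are assumptions made for illustration; the real card format is not shown in this guide.

```python
# Hypothetical card store: package name -> navigation hints.
APP_CARDS = {
    "com.google.android.youtube": "Use the search icon in the top bar to find videos.",
}


def build_system_prompt(base_prompt: str, current_package: str, cards: dict[str, str]) -> str:
    """Append the app card for the current app, if one exists."""
    card = cards.get(current_package)
    if card is None:
        return base_prompt  # no card: the prompt is left unchanged
    return f"{base_prompt}\n\n[App card: {current_package}]\n{card}"


print(build_system_prompt("You control an Android device.", "com.google.android.youtube", APP_CARDS))
```

Apps without a card simply run with the base prompt.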
Instagram, Facebook, X (Twitter), Threads, TikTok, Snapchat, Reddit, Pinterest, LinkedIn
WhatsApp, Telegram, Discord, KakaoTalk, LINE, WeChat
YouTube, Spotify, Netflix
Chrome, Gmail, Google Maps, Play Store, Settings, Amazon, Naver
Mirror Mode lets you manually control a device using your PC's mouse and keyboard.
| PC Input | Device Action |
|---|---|
| Left click | Tap at position |
| Click and drag | Swipe/drag gesture |
| Mouse wheel | Scroll on device |
| Keyboard typing | Text input |
| Right click | Back button |
When multiple devices are selected, input is forwarded to all selected devices simultaneously. Useful for setting up multiple phones with the same configuration.
Loop Mode repeats the same command multiple times with optional delays between cycles. Essential for repetitive tasks.
Loop: 10 cycles, 5 minute interval
Command: "Open Instagram, scroll feed, like 3 posts, then close the app"
Result: Every 5 minutes, each device opens Instagram,
likes 3 posts, and closes. Repeats 10 times over ~50 minutes.
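The timing in this example can be checked with a few lines of arithmetic. The sketch assumes the interval is measured from the start of one cycle to the start of the next, which this guide does not state explicitly; `loop_start_offsets` is a hypothetical helper.

```python
def loop_start_offsets(cycles: int, interval_minutes: float) -> list[float]:
    """Minutes after the first start at which each cycle begins."""
    return [i * interval_minutes for i in range(cycles)]


offsets = loop_start_offsets(cycles=10, interval_minutes=5)
print(offsets)
print(f"last cycle starts at minute {offsets[-1]}")
```

Under that assumption the tenth cycle starts at minute 45, so the whole run ends at 45 minutes plus the time the final cycle takes, which is consistent with the estimate of roughly 50 minutes.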
Stealth Mode makes the AI's actions appear more human-like by introducing natural variations.
| Feature | Description |
|---|---|
| Tap Jitter | Random offset on tap coordinates (±12px) |
| Speed Variation | Random ±20% variation in action timing |
| Reading Pauses | Random pauses (0.5-3s) between actions |
| Action Delays | Variable delays between consecutive actions |
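The variations in the table can be illustrated with simple random sampling. This is a sketch under assumptions: the uniform distributions and the function names are illustrative, and only the numeric bounds (±12 px, ±20%, 0.5-3 s) come from the table.

```python
import random

TAP_JITTER_PX = 12             # tap offset bound from the table
SPEED_VARIATION = 0.20         # timing variation bound from the table
READING_PAUSE_S = (0.5, 3.0)   # pause range from the table


def jitter_tap(x: int, y: int, rng: random.Random) -> tuple[int, int]:
    """Shift a tap by a random offset of at most TAP_JITTER_PX on each axis."""
    dx = rng.randint(-TAP_JITTER_PX, TAP_JITTER_PX)
    dy = rng.randint(-TAP_JITTER_PX, TAP_JITTER_PX)
    return x + dx, y + dy


def vary_duration(seconds: float, rng: random.Random) -> float:
    """Scale an action's duration by a random factor within ±20%."""
    return seconds * rng.uniform(1 - SPEED_VARIATION, 1 + SPEED_VARIATION)


def reading_pause(rng: random.Random) -> float:
    """Pick a pause length between actions."""
    return rng.uniform(*READING_PAUSE_S)


rng = random.Random()
print(jitter_tap(540, 1200, rng), round(vary_duration(1.0, rng), 2), round(reading_pause(rng), 2))
```

A real implementation would also need to keep jittered coordinates inside the screen bounds, which this sketch does not do.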
Save frequently-used commands for one-click execution:
Personas customize the AI agent's behavior. Only one persona can be active at a time.
Examples: Speed Runner (fast, skip verifications), Careful (verify before/after each action), Social Media Expert (navigate social apps expertly).
Rules are constraints always injected into the AI's system prompt. Multiple rules can be active simultaneously.
Examples: "Never purchase anything", "Close ads immediately", "Always use search instead of scrolling", "Skip sponsored content when liking".
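The difference between personas (one active at a time) and rules (any number active) can be modeled with a small data structure. The class and method names below are hypothetical, and the output format is an assumption; only the one-persona, many-rules behavior comes from this guide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentProfile:
    persona: Optional[str] = None                    # at most one active persona
    rules: list[str] = field(default_factory=list)   # any number of active rules

    def activate_persona(self, name: str) -> None:
        # Activating a persona replaces whichever one was active before.
        self.persona = name

    def add_rule(self, rule: str) -> None:
        if rule not in self.rules:
            self.rules.append(rule)

    def prompt_suffix(self) -> str:
        """Text appended to the system prompt (format is illustrative)."""
        lines = []
        if self.persona:
            lines.append(f"Persona: {self.persona}")
        lines.extend(f"Rule: {rule}" for rule in self.rules)
        return "\n".join(lines)


profile = AgentProfile()
profile.activate_persona("Speed Runner")
profile.activate_persona("Careful")          # replaces Speed Runner
profile.add_rule("Never purchase anything")
profile.add_rule("Close ads immediately")
print(profile.prompt_suffix())
```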
Macros record AI actions and replay them without the LLM, saving API costs.
Select the macro and click Play — replays exact actions without calling the LLM.
Chain multiple commands into sequential flows. Each step runs only after the previous one completes.
Condition-based auto-execution. Set conditions and DroidAI evaluates them with the LLM to decide when to act.
DroidAI can send screenshots to the AI for visual understanding. Three modes are available:
| Mode | Description |
|---|---|
| OFF | UI tree only, no screenshots |
| AUTO | UI tree + screenshot when tree empty or agent stuck (≥2 failures) |
| ALWAYS | UI tree + screenshot every iteration |
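The three modes reduce to a small decision function. The sketch below follows the table directly; the function and parameter names are assumptions, and wrapping from ALWAYS back to OFF when cycling is inferred from the button being described as a cycle.

```python
MODES = ["OFF", "AUTO", "ALWAYS"]


def should_send_screenshot(mode: str, ui_tree_empty: bool, failures: int) -> bool:
    """Decide whether this iteration includes a screenshot."""
    if mode == "OFF":
        return False
    if mode == "ALWAYS":
        return True
    if mode == "AUTO":
        # Fall back to vision when the tree is empty or the agent is stuck.
        return ui_tree_empty or failures >= 2
    raise ValueError(f"unknown mode: {mode}")


def next_mode(mode: str) -> str:
    """Advance the Screenshot button: OFF -> AUTO -> ALWAYS -> OFF (wrap assumed)."""
    return MODES[(MODES.index(mode) + 1) % len(MODES)]


print(should_send_screenshot("AUTO", ui_tree_empty=False, failures=2))
print(next_mode("OFF"))
```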
Control DroidAI remotely from your phone using a Telegram bot.
/newbot and follow prompts to create a bot
| Command | Description |
|---|---|
| /help | List all available commands |
| /devices | Show all connected devices |
| /select [device] | Set default device |
| /run [command] | Execute a task |
| /screenshot | Take and receive a screenshot |
| /repeat [n] [interval] [cmd] | Loop execution remotely |
| /stop | Stop current execution |
| /status | Check device and agent status |
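As an illustration of the argument layout in the table, the following sketch splits a bot message into a command name and its arguments. It is not DroidAI's parser: `parse_command` is a hypothetical name, and because the unit of `[interval]` is not stated in this guide, the value is kept as raw text.

```python
def parse_command(text: str) -> dict:
    """Split a bot message such as '/repeat 10 5 Open Instagram' into its parts."""
    name, _, rest = text.strip().partition(" ")
    if not name.startswith("/"):
        raise ValueError("bot commands start with '/'")
    name = name[1:]
    if name == "repeat":
        # /repeat [n] [interval] [cmd]: the command itself may contain spaces.
        parts = rest.split(maxsplit=2)
        if len(parts) < 3:
            raise ValueError("usage: /repeat [n] [interval] [cmd]")
        count, interval, command = parts
        return {"name": name, "count": int(count), "interval": interval, "command": command}
    return {"name": name, "args": rest.strip()}


print(parse_command("/run Open Chrome and go to google.com"))
print(parse_command("/repeat 10 5 Open Instagram and like 3 posts"))
```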
| Setting | Default | Description |
|---|---|---|
| Provider | Anthropic | LLM provider selection |
| API Key | — | Your provider's API key |
| Model | — | Specific model to use |
| Prompt Caching | ON | Cache system prompts (Anthropic only) |
| Setting | Default | Range | Description |
|---|---|---|---|
| Max Steps | 30 | 10-100 | Maximum actions per task |
| Action Delay | 0s | 0-5s | Pause between actions |
| Conversation History | 20 | 5-40 | Messages kept in context |
| Action History | 5 | 1-15 | Prior actions in prompt |
| Full Context | 3 | 1-5 | UI tree detail level |
| UI Tree Filter | Concise | — | Concise or Detailed output |
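The defaults and ranges in this table can be captured in a small validation helper, for example when preparing settings in a script. The key names and the choice to clamp out-of-range values (rather than reject them) are assumptions; only the numbers come from the table.

```python
# (low, high) bounds from the table; key names are illustrative.
RANGES = {
    "max_steps": (10, 100),
    "action_delay_s": (0, 5),
    "conversation_history": (5, 40),
    "action_history": (1, 15),
    "full_context": (1, 5),
}
DEFAULTS = {
    "max_steps": 30,
    "action_delay_s": 0,
    "conversation_history": 20,
    "action_history": 5,
    "full_context": 3,
}


def normalize(settings: dict) -> dict:
    """Fill in defaults and clamp each value into its documented range."""
    result = dict(DEFAULTS)
    for key, value in settings.items():
        if key not in RANGES:
            raise KeyError(f"unknown setting: {key}")
        low, high = RANGES[key]
        result[key] = min(max(value, low), high)
    return result


print(normalize({"max_steps": 500, "action_delay_s": 1.5}))
```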
| Setting | Default | Description |
|---|---|---|
| Language | English | UI language (13 languages) |
| Device Panel Size | 100% | Scale device panels (40-200%) |
| Font Scale | 100% | UI text size (50-200%) |
| Setting | Default | Range | Description |
|---|---|---|---|
| Resolution | 720p | 240-1080p | Device screen resolution |
| Bitrate | 4 Mbps | 1-12 Mbps | Stream quality |
| Max FPS | 30 | 5-60 | Frame rate |
| Symptom | Solution |
|---|---|
| Device not listed | Check USB cable (data, not charge-only). Try adb devices. |
| unauthorized in adb | Check phone for USB debugging popup. Tap "Allow". |
| offline in adb | Unplug and replug USB. Try different port. |
| Device appears then disappears | Faulty cable or USB port. Try different cable/port. |
| Symptom | Solution |
|---|---|
| P button stays red | Enable "Install via USB" in Developer Options. |
| P button yellow | Click P to reinstall latest version. |
| Accessibility not enabling | Manually: Settings → Accessibility → DroidAI Portal → ON. |
| K button red | Settings → Language & Input → enable DroidAI Keyboard. |
| Symptom | Solution |
|---|---|
| Agent does nothing | Check API key. Verify internet. Check Activity for errors. |
| Agent taps wrong elements | Enable Screenshot mode (AUTO or ALWAYS). |
| Agent stuck in loop | Click Stop. Try rephrasing your command. |
| "Max steps reached" | Increase Max Steps in Settings (up to 100). |
| Text input fails | Ensure Portal IME is active (green K). |
A typical task (10-15 steps) costs ~$0.01-0.03 with Claude Sonnet, ~$0.005-0.01 with GPT-4o-mini. Screenshots add ~$0.01-0.02 each.
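These per-task figures scale linearly with the number of tasks, so a rough budget is simple arithmetic. The helper below is a sketch: `estimate_cost` is a hypothetical name, the default rates are the Claude Sonnet figures quoted above, and actual costs depend on the provider's current pricing.

```python
def estimate_cost(
    tasks: int,
    per_task: tuple[float, float] = (0.01, 0.03),        # typical Claude Sonnet task, USD
    screenshots_per_task: int = 0,
    per_screenshot: tuple[float, float] = (0.01, 0.02),  # added cost per screenshot, USD
) -> tuple[float, float]:
    """Return a (low, high) cost estimate in USD."""
    low = tasks * (per_task[0] + screenshots_per_task * per_screenshot[0])
    high = tasks * (per_task[1] + screenshots_per_task * per_screenshot[1])
    return round(low, 4), round(high, 4)


# 10 devices each running a 10-cycle loop = 100 tasks, no screenshots.
low, high = estimate_cost(tasks=10 * 10)
print(f"${low:.2f} to ${high:.2f}")
```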
Internet is required for LLM APIs. However, you can use a local model via Ollama with the Custom provider for offline use.
No hard limit. Practical limits depend on hardware. Users commonly run 10-30 devices per PC.
Yes. Any ADB-compatible device works: BlueStacks, NoxPlayer, Android Studio AVD. Connect via adb connect localhost:PORT.
API keys are stored locally in settings.json in your AppData folder. They are never sent to DroidAI servers.