Robotune

Introduction

This article discusses Robotune, an iOS vocal processing application (App) built with Csound. Background and motivation are covered first, followed by an overview of the App. The CSD file used to process the audio input is then walked through, with notable code excerpts highlighted. iOS programming is then considered before the article concludes. The article takes an experiential approach, reflecting the primarily 'trial and error' process by which the App was developed. Source code is available on GitHub [1], and the App is currently also available on the App Store [2].

I. Background/Motivation

Context: Teaching Csound

The initial work for this App was completed as an example of the versatility and power of Csound for a module in Audio Programming delivered specifically to Sound Engineering students. The intention was to break down the common production technique of 'tuning' a vocal to first principles. This helped in the analysis of the effect and was intended to introduce Csound in an approachable, audience-specific manner.

Subsequently, a more academic module was required in the area. The iOS aspect was thus introduced to illustrate the power of the Csound API. Contemporary vocal processing was also chosen for the broad educational benefit of the algorithms involved: audio programming, spectral processing, digital signal processing, acoustics, psychoacoustics, music production and music theory are all contextually relevant.

Preliminary Work: Critical Listening

In arriving at the effects and algorithms required, a period of critical listening was embarked upon. Contemporary tools for vocal processing were investigated, as well as their application by contemporary artists. Of particular note at the time were compositions and performances by artists such as Volcano Choir [3] and Kimbra [4]; proprietary, relatively expensive hardware and software appears to be used here. Robotune aims to offer similar effects with some novel additions and eccentricities, as well as an intuitive, immediate experience. Both advanced and casual users should ideally have a positive experience with the App.

References suggest pitch effects, harmonisation, delays, reverbs and looping as the main processes involved.

II. App Design and Use

The App user is immediately presented with a screen of presets. These highlight a number of optimal settings. Presets can also be stored here. On this screen and throughout, contextual help is available.

The second user area allows for alterations to be made to the source, harmony voices and effects; it is accessed to the right of the presets page.

Finally, on the left of the presets area, loops can be recorded and played back. The short video below illustrates the sound world of the App.

III. Code

Structure/Signal Flow

The broad structure of the CSD file will now be discussed. The CSD file (and indeed all code) is available on GitHub [1], and will be walked through step by step.

Note that a relatively high value for ksmps is used to avoid dropouts in the intended real-time environment. Global variables are used as virtual 'patch cords'.

chnget is used throughout to obtain values from the UI.

UDOs make up a significant portion of the file. The first is the main processing instrument. Optimisation is pursued wherever possible. For example, intensive operations such as spectral processing are bypassed whenever the user's settings do not require them.

The Robotune UDO analyses the incoming audio, using ptrack. A number of pitch tracking options are available in Csound, and many were auditioned in what is most honestly described as a trial and error process. The practice of critical listening to references, researching the processes suggested and then experimenting with various implementations in the context of real time audio was frequently employed. Most processing is performed in the frequency domain.

'Tuning' is achieved by converting incoming audio pitches to a MIDI scale, and 'shifting' an incoming MIDI note to the nearest scale-appropriate value. A table stores the scale used; the table can be changed according to the tonality desired.
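The snapping step described above can be sketched as follows. This is a minimal illustration in Python rather than the Csound used in the App; the function names, the 440 Hz reference and the list-based scale table are assumptions made for the example, not taken from the CSD file.

```python
import math

A4_HZ = 440.0    # reference tuning (assumption)
A4_MIDI = 69

# Pitch classes of the scale relative to the tonic; here a major scale.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]


def freq_to_midi(freq_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_freq(midi_note):
    """Convert a MIDI note number back to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


def nearest_scale_note(midi_note, tonic=0, scale=MAJOR_SCALE):
    """Return the integer MIDI note in the scale closest to midi_note."""
    centre = round(midi_note)
    # Search a small window around the rounded note for scale members.
    candidates = [n for n in range(centre - 6, centre + 7)
                  if (n - tonic) % 12 in scale]
    return min(candidates, key=lambda n: abs(n - midi_note))


def tuning_ratio(freq_hz, tonic=0, scale=MAJOR_SCALE):
    """Ratio by which the input must be transposed to land on the scale."""
    target = nearest_scale_note(freq_to_midi(freq_hz), tonic, scale)
    return midi_to_freq(target) / freq_hz
```

Swapping `MAJOR_SCALE` for another list of pitch classes plays the role of changing the scale table.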

Given that a phase vocoder stream of the incoming audio as well as the desired musical scale are available, the App possesses all information required for automated harmonies. The tuned incoming audio is harmonised by considering its pitch and scale. For example, if the scale chosen by the user is major, the chords built on the tonic, subdominant and dominant will be major. Similarly, all notes in the scale can be harmonised accordingly.

if (kscale == 0) then		; major
    if (kharmtest == 0 || kharmtest == 5 || kharmtest == 7) then
        kgoto major
    endif
    if (kharmtest == 2 || kharmtest == 4 || kharmtest == 9) then
        kgoto minor
    endif
    if (kharmtest == 11) then
        kgoto diminished
    endif
endif

Major chords are then built by adding a major third above, an octave below, and the fifth of the chord two octaves down (17 semitones below the tuned note).

major:
    if (kauto == 1) then
        kharm1 = 4		;M3
        kharm2 = -12	;-oct
        kharm3 = -17	;--5
    endif
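The selection logic in the two excerpts above can be restated compactly. The sketch below is in Python for illustration only. The degree-to-quality table and the major-chord offsets mirror the excerpts; the minor and diminished offsets are assumptions, since the article does not list them.

```python
# Scale degree (semitones above the tonic) -> chord quality, major scale.
# These groupings mirror the kharmtest comparisons in the excerpt.
MAJOR_SCALE_QUALITY = {
    0: "major", 5: "major", 7: "major",
    2: "minor", 4: "minor", 9: "minor",
    11: "diminished",
}

# Harmony offsets in semitones relative to the tuned note.
# Only the "major" row comes from the article.
CHORD_OFFSETS = {
    "major": (4, -12, -17),
    "minor": (3, -12, -17),        # assumption, not from the article
    "diminished": (3, -12, -18),   # assumption, not from the article
}


def harmony_notes(midi_note, tonic=0):
    """Return (quality, harmony MIDI notes) for a note already on the scale.

    Raises KeyError if the note is not a member of the major scale,
    which should not happen once the input has been tuned.
    """
    degree = (midi_note - tonic) % 12
    quality = MAJOR_SCALE_QUALITY[degree]
    return quality, [midi_note + offset for offset in CHORD_OFFSETS[quality]]
```

For a tuned C (MIDI 60) in C major this gives E above, C an octave below, and G in the octave below that.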

A higher automatic harmony option is also available, which simply employs a fifth above, as opposed to below.

Tuning is implemented using pvscale. Portamento allows for 'hard' or 'soft' tuning by defining the speed at which the pitch of the input changes. Hard tuning achieves the effect that has become somewhat ubiquitous in the pop charts (although the effect arguably has far wider creative uses). Formants can also be preserved, so that the 'chipmunk' effect is employed only if desired.

fauto pvscale fsig, kratioport, 1		;transpose
if abs(kharm1) > 0 then
    if kharm1chng == 1 then
        kscl1 = 2 ^ (kharm1 / 12)
    endif
    fharm1 pvscale fsig, kratioport * kscl1, kharm1form	;formant option
endif

Note that harmony voices are only processed if required.
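Two small pieces of arithmetic sit behind this excerpt: the semitone-to-ratio conversion (`2 ^ (kharm1 / 12)`) and the smoothing of the tuning ratio that gives 'hard' or 'soft' tuning. The Python sketch below illustrates both. The one-pole lag with a half-time parameter is an assumed model of the portamento, chosen for illustration; the article does not state which smoothing the CSD file uses.

```python
def semitones_to_ratio(semitones):
    """Equal-tempered frequency ratio for an interval in semitones."""
    return 2.0 ** (semitones / 12.0)


def glide(targets, half_time_steps):
    """Smooth a sequence of target values with a one-pole lag.

    half_time_steps is the number of steps taken to cover half the
    distance to a new target. Zero means no smoothing ('hard' tuning);
    larger values give a slower, 'softer' glide.
    """
    if not targets:
        return []
    if half_time_steps <= 0:
        return list(targets)
    coeff = 0.5 ** (1.0 / half_time_steps)
    out = []
    current = targets[0]
    for target in targets:
        # Move a fixed fraction of the remaining distance each step.
        current = target + (current - target) * coeff
        out.append(current)
    return out
```

With `half_time_steps=0` the ratio jumps immediately to each new target, which corresponds to the 'hard' setting.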

A novel cross synthesis effect is also offered as an option. The spectral frequencies of a harmonic buzz, pitched at the tuned input, are paired with user definable amplitudes in a cross synthesis process. The spectral amplitudes can be from the input source, the buzz source or a user-defined mix of the two.

abuzz buzz 30000 * ampdb(kamp), ktarget, 50, 1		;buzz source
fbuzz pvsanal    abuzz, ifftsize, ifftsize / 4, ifftsize, iwtype
; cross synthesis: voice to buzz, freqs of buzz, amps of voice, amps can be mix of both...
fauto pvscross fbuzz, fsig, kcross, 1. - kcross
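The pairing of buzz frequencies with mixed amplitudes can be modelled per analysis bin. The Python sketch below is a simplified illustration of that idea, not a reimplementation of pvscross: each stream is represented as a plain list of (amplitude, frequency) pairs, and the function name and data layout are assumptions made for the example.

```python
def cross_bins(src_bins, dest_bins, src_amp_scale, dest_amp_scale):
    """Combine two spectral frames bin by bin.

    src_bins and dest_bins are lists of (amplitude, frequency) pairs.
    Frequencies are taken from src_bins; amplitudes are a weighted sum
    of both frames.
    """
    out = []
    for (src_amp, src_freq), (dest_amp, _dest_freq) in zip(src_bins, dest_bins):
        amp = src_amp_scale * src_amp + dest_amp_scale * dest_amp
        out.append((amp, src_freq))
    return out
```

Calling `cross_bins(buzz, voice, kcross, 1.0 - kcross)` mirrors the argument order of the excerpt: with `kcross` at 0 the buzz frequencies carry only the voice amplitudes, and with `kcross` at 1 the buzz passes through unchanged.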

Resynthesis again considers optimal processing, only synthesising non-zero harmonics. Output then offers a dynamic stereo field.

The second UDO implements a simple threshold-based gate to remove noise, typically present when the on-board microphone is used.
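A threshold-based gate of this kind can be sketched in a few lines. The Python version below is an illustration under assumptions: it measures RMS level per block and silences blocks below the threshold. The block size, the use of RMS and the hard on/off behaviour are choices made for the example; the article does not describe how the UDO measures level.

```python
import math


def rms(block):
    """Root-mean-square level of a non-empty block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def gate(samples, threshold, block_size=64):
    """Silence every block whose RMS level falls below threshold."""
    out = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        if rms(block) >= threshold:
            out.extend(block)
        else:
            out.extend([0.0] * len(block))
    return out
```

A gate intended for listening would normally ramp the gain rather than switch it, to avoid clicks at the block boundaries.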

instr 1 then reads several values from the UI, applies the above UDOs and sends audio to global effects. These effect instruments are turned on and off by sending score statements when the UI signals a change. Effects are faded in and out smoothly using linenr. There is scope for further effects in future releases.

Looping of audio was more complex than initially envisaged. instr 20 writes a mono audio loop to a table. To allow syncing to other performers or audio, the loop can be tempo locked. This is achieved by extending or truncating the loop to the nearest 'beat' when the user chooses to stop recording. Visual feedback to the UI is also implemented here, using chnset. Layering of loops and playback of tracks from a user's music library are potential developments to this section of the App. The user can also audition the key in which they intend to perform, at the chosen tempo. This is implemented using a simple oscillator-based instrument.
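The tempo-lock step amounts to rounding the recorded duration to a whole number of beats. A minimal Python sketch follows; the function name and the one-beat minimum are assumptions for illustration, not details taken from instr 20.

```python
import math


def beat_locked_length(recorded_seconds, bpm):
    """Round a recorded loop length to the nearest whole beat.

    The loop is extended or truncated as needed, and is never shorter
    than one beat (an assumption made for this sketch).
    """
    beat = 60.0 / bpm
    # floor(x + 0.5) rounds halves upward, avoiding Python's
    # round-half-to-even behaviour.
    beats = max(1, int(math.floor(recorded_seconds / beat + 0.5)))
    return beats * beat
```

At 120 bpm, a recording stopped at 3.9 s would be extended to 4.0 s, and one stopped at 4.2 s would be truncated to 4.0 s.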

instr 21 implements loop playback, using flooper2.

The code ends with an overall limiter to avoid severe output clipping.

More generally, portamento is used throughout to remove any 'zipper noise'. Control of flow in the code has also been considered in some detail, as many approaches are possible.

Testing and addressing outlying cases took up quite a high proportion of development time. This process is significantly less interesting than initial development, but is obviously critical for release. The approach taken was to consider every possible case in isolation, then holistically.

The code is by no means deemed complete. It is suggested that efficiencies, extra features and generally better ways to code would be welcome additions. The project is proving relatively robust, with 30 crash reports received after over 4500 downloads thus far.

iOS Coding

Overview

RoboTune can be viewed as an experiment in using the relatively new Csound library optimised for iOS. The code is written in Objective-C as opposed to the newer Swift language.

Organisation

The application is built using a standard MVC architecture. The UI is built using Xcode storyboards, one each for iPhone and iPad. Each storyboard is a series of View Controllers, each representing a navigable section of the UI; these are wrapped in a Navigation Controller to provide intuitive navigation. The code is intended to be very straightforward to follow. The initial View Controller containing the presets appears blank in the storyboard view because its controls are drawn dynamically in code, using the DKCircleButton control to distribute the buttons evenly on all screen sizes.

Csound Interfacing

The following classes are of note for interfacing with Csound:

`ChannelNames.h`

This contains the list of available channels in simple string format and references the CSD file with which the application communicates.

`InputBinding.h`

This provides a SyncDelegate to receive values from channels. It ultimately allows UI controls to be programmatically updated by the current state of a channel. For example, a slider can show the progress of a loop recording by following the current time.

`OutputBinding.h`

This provides methods to initialise and set a channel's value. Typically a UI control will cause a channel value to update with some user interaction, for example, moving a slider control.

`CsoundUI.h`

This is the helper class provided as an example with Csound for adding the bindings to CsoundObj. It was modified to provide both Input and Output bindings.

`CsoundSingleton.h`

This contains a reference to the CsoundObj class, the main interface for the Csound library. It sets itself as the listener for CsoundObj, instantiates all of the Input Bindings and Output Bindings dictated by the channels, provides methods to update and receive values from these channels, and contains a reference to the ChannelNames.m class. It also defines a naming convention for the channels themselves, prefixing each with 'channel_'; this prefix is used as a lookup key by reflection-based code. The methods that set and update channel values rely on this convention, so a single generic method can update any channel by name.
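The prefix-based lookup can be illustrated outside Objective-C. The Python sketch below uses attribute reflection (`getattr`/`setattr`) to reach any channel through one generic pair of methods. The class name and the two channel names are hypothetical; only the 'channel_' prefix comes from the article.

```python
class ChannelStore:
    """Generic get/set access to channels via a naming convention."""

    PREFIX = "channel_"

    def __init__(self):
        # Hypothetical channels for illustration; the real names are
        # listed in ChannelNames.
        self.channel_portamento = 0.0
        self.channel_crossMix = 0.0

    def _attribute_for(self, name):
        attr = self.PREFIX + name
        if not hasattr(self, attr):
            raise KeyError("unknown channel: " + name)
        return attr

    def set_channel(self, name, value):
        """Set any channel by its unprefixed name."""
        setattr(self, self._attribute_for(name), value)

    def get_channel(self, name):
        """Read any channel by its unprefixed name."""
        return getattr(self, self._attribute_for(name))
```

Adding a channel then only requires a new prefixed attribute; the accessors do not change.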

As the name suggests, this class is a singleton, ensuring that only one Csound object ever exists in the application. This is useful for complex UI setups where the view is refreshed and/or views can be recycled as users switch context. Channel values are initialised at first setup, and hooks are registered to detect when headphones are plugged in. To avoid feedback issues, the App works best with headphones and a microphone plugged in.

Default values for preset channels are loaded at the first construction of the class and user presets are loaded from saved storage, if previously saved.
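The singleton and preset-loading behaviour can be sketched as follows, in Python for illustration. The class name, the `shared` accessor and the dictionary-based presets are hypothetical, and the rule that saved user presets override defaults is an assumption; the article only states that both are loaded.

```python
class EngineSingleton:
    """Holds the one shared engine instance and its preset values."""

    _instance = None

    def __init__(self, defaults, saved=None):
        # Defaults are loaded first; saved user presets, if any,
        # then override them (assumed ordering).
        self.presets = dict(defaults)
        if saved:
            self.presets.update(saved)

    @classmethod
    def shared(cls, defaults=None, saved=None):
        """Return the single instance, constructing it on first use."""
        if cls._instance is None:
            cls._instance = cls(defaults or {}, saved)
        return cls._instance
```

Every view that calls `EngineSingleton.shared()` receives the same object, so recycled or refreshed views cannot create a second engine.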

UI Code

To tie the UI to the channels configured in the CsoundSingleton class, each 'View' representing a group of UI controls initialises its values from the defaults configured for those channels. Each UI control object is responsible for setting a channel or being updated by a channel.

Note: The View Controller classes should ideally implement a shared interface, as there are methods they are required to implement, such as initCsound from the Csound Singleton class. This is left as a future enhancement.

Following the MVC pattern, each View Controller asks for an instance of the Csound Singleton object and updates the channel values from the standard UI callbacks fired when values change. Similarly, some controls are programmatically updated in response to Csound Input Binding type channels, such as loop and record. This is done by using the NSTimer class to poll the channels at an appropriate interval; the polled value is kept in sync by the Input Binding type's SyncDelegate. Using NSTimer keeps the update rate at a reasonable level, so that the UI remains responsive.
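The division of labour between the binding and the timer can be shown in miniature. In the Python sketch below, the binding caches whatever value arrives most recently, and the control redraws only when a timer tick finds that the cached value has changed. The class and method names are hypothetical and stand in for the Objective-C classes described above.

```python
class InputBinding:
    """Caches the latest value received from a channel."""

    def __init__(self, initial=0.0):
        self.value = initial

    def sync(self, new_value):
        # Called whenever the audio side reports a new channel value.
        self.value = new_value


class PolledControl:
    """A UI control refreshed on timer ticks rather than on every update."""

    def __init__(self, binding):
        self.binding = binding
        self.displayed = binding.value
        self.redraws = 0

    def tick(self):
        # Called by the timer; redraw only if the value has changed.
        if self.binding.value != self.displayed:
            self.displayed = self.binding.value
            self.redraws += 1
```

However many times `sync` is called between ticks, the control redraws at most once per tick, which is what keeps the UI update rate bounded.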

Third Party Libraries

The application makes use of several third party libraries to enhance the UI of the application. These are:

Enhanced horizontal slider [5]
Informational popup modal [6]
Circle buttons with effects [7]
Blur the background and show modal [8]
Glow controls [9]
A circular progress bar [10]
Android-like toast notifications [11]
CSS styling for basic controls [12]

IV. Future Work and Conclusion

Possible additions include playback over music in a user's library, syncing with other Music Apps, and extension into a suite of Apps offering synthesis while keeping the same brand and feel.

The authors feel Robotune achieves a high quality audio effect, with efficient and hopefully educational coding.

References

[1] Brian Carty and Alan O'Sullivan, RoboTune [Online] Available: https://github.com/robotune [Accessed October, 2016].

[2] Brian Carty and Alan O'Sullivan, RoboTune [Online] Available: https://itunes.apple.com/ie/app/robotune/id841976751?mt=8 [Accessed October, 2016].

[3] Volcano Choir, "Comrade" [Online] Available: https://www.youtube.com/watch?v=Vvp305B9FoQ [Accessed September, 2016].

[4] Kimbra, "Settle Down" [Online] Available: https://www.youtube.com/watch?v=sd7GLvMYSHI [Accessed September, 2016].

[5] "ASValueTrackingSlider" [Online] Available: https://github.com/alskipp/ASValueTrackingSlider [Accessed October, 2016].

[6] "ENPopUp" [Online] Available: https://github.com/evnaz/ENPopUp [Accessed October, 2016].

[7] "DKCircleButton" [Online] Available: https://github.com/kronik/DKCircleButton [Accessed October, 2016].

[8] "RNBlurModalView" [Online] Available: https://github.com/rnystrom/RNBlurModalView [Accessed October, 2016].

[9] "UIView-Glow" [Online] Available: https://github.com/thesecretlab/UIView-Glow [Accessed October, 2016].

[10] "UAProgressView" [Online] Available: https://github.com/UrbanApps/UAProgressView [Accessed October, 2016].

[11] "Toast" [Online] Available: https://github.com/scalessec/Toast [Accessed October, 2016].

[12] "pixate-freestyle-ios" [Online] Available: https://github.com/Pixate/pixate-freestyle-ios [Accessed October, 2016].

Biography

Brian Carty is currently Director of Education at Sound Training in Temple Bar, Dublin (soundtraining.com). He completed a PhD, funded by IRCSET and NUI Maynooth (Hume Scholarship), at NUI Maynooth, Ireland in 2011.
His main research interest is headphone based audio with a particular interest in how we can locate sound sources in our auditory environment, and how sound sources at particular locations / moving sound sources can be artificially recreated.
Although this research focuses on digital signal processing algorithms and software development, he is ultimately interested in the creative application of this work.

email: brian AT soundtraining.com