This article discusses Robotune, an iOS vocal processing application (App) built with Csound. Background and motivation are discussed first, followed by an overview of the App. The CSD file used to process the audio input is then walked through, with notable code excerpts highlighted. iOS programming is then considered before the article concludes. The article takes an experiential approach, reflecting the primarily 'trial and error' approach taken to the development of the App. Source code is available on GitHub, and the App is also currently available on the App Store.
Context: Teaching Csound
The initial work for this App was completed as an example of the versatility and power of Csound for a module in Audio Programming delivered specifically to Sound Engineering students. The intention was to break down the common production technique of 'tuning' a vocal to first principles. This helped in the analysis of the effect and was intended to introduce Csound in an approachable, audience specific manner.
Subsequently, a more academic module was required in the area. The iOS aspect was thus introduced to illustrate the power of the Csound API. Contemporary vocal processing was also chosen for the broad educational benefit of the algorithms involved; audio programming, spectral processing, digital signal processing, acoustics, psychoacoustics, music production and music theory are all contextually relevant.
Preliminary Work: Critical Listening
In arriving at the effects and algorithms required, a period of critical listening was embarked upon. Contemporary tools for vocal processing were investigated, as well as their application by contemporary artists. Of particular note at the time were compositions/performances by artists such as Volcano Choir and Kimbra; proprietary, relatively expensive hardware and software appears to be used here. Robotune aims to offer similar effects with some novel additions and eccentricities, as well as an intuitive, immediate experience. Both advanced and casual users should ideally have a positive experience with the App.
References suggest pitch effects, harmonisation, delays, reverbs and looping as the main processes involved.
II. App Design and Use
The App user is immediately presented with a screen of presets. These highlight a number of optimal settings. Presets can also be stored here. On this screen and throughout, contextual help is available.
The second user area allows for alterations to be made to the source, harmony voices and effects; it is accessed to the right of the presets page.
Finally, on the left of the presets area, loops can be recorded and played back. The short video below illustrates the sound world of the App.
Note that a relatively high value for ksmps is used, to avoid dropouts in the intended realtime environment.
Global variables are used as virtual 'patch cords'.
chnget is used throughout to obtain values from the UI.
UDOs make up a significant portion of the file. The first carries out the main processing. Optimisation is pursued wherever possible; for example, intensive processing, such as spectral processing, is bypassed whenever the user's settings allow.
The Robotune UDO analyses the incoming audio using ptrack. A number of pitch tracking options are available in Csound, and many were auditioned in what is most honestly described as a trial and error process. The practice of critical listening to references, researching the processes suggested and then experimenting with various implementations in the context of real time audio was frequently employed. Most processing is performed in the frequency domain.
'Tuning' is achieved by converting incoming audio pitches to a MIDI scale, and 'shifting' each incoming MIDI note to the nearest value that belongs to the scale. A table stores the scale used; the table can be changed according to the tonality desired.
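The quantisation step described above can be sketched outside Csound. The following Python fragment is only an illustration under stated assumptions: the function names, the A4 = 440 Hz reference and the use of a list of pitch classes in place of the Csound scale table are all hypothetical and are not taken from the RoboTune source.

```python
import math

# Pitch classes of a major scale; a stand-in for the scale table (assumption).
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]


def freq_to_midi(freq_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def midi_to_freq(note, a4_hz=440.0):
    """Convert a MIDI note number back to a frequency in Hz."""
    return a4_hz * 2.0 ** ((note - 69.0) / 12.0)


def nearest_scale_note(midi_note, scale=MAJOR_SCALE, tonic=0):
    """Return the integer MIDI note within the scale closest to midi_note."""
    centre = int(round(midi_note))
    candidates = [n for n in range(centre - 12, centre + 13)
                  if (n - tonic) % 12 in scale]
    return min(candidates, key=lambda n: abs(n - midi_note))


def tuning_ratio(freq_hz, scale=MAJOR_SCALE, tonic=0):
    """Ratio by which the detected pitch must be scaled to land on the scale."""
    target = nearest_scale_note(freq_to_midi(freq_hz), scale, tonic)
    return midi_to_freq(target) / freq_hz
```

A ratio of this kind is what a pitch scaling opcode would consume; swapping the list of pitch classes corresponds to changing the scale table for a different tonality.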
Given that a phase vocoder stream of the incoming audio as well as the desired musical scale are available, the App possesses all information required for automated harmonies. The tuned incoming audio is harmonised by considering its pitch and scale. For example, if the scale chosen by the user is major, the chords built on the tonic, subdominant and dominant will be major. Similarly, all notes in the scale can be harmonised accordingly.
    if (kscale == 0) then ; major
      if (kharmtest == 0 || kharmtest == 5 || kharmtest == 7) then
        kgoto major
      endif
      if (kharmtest == 2 || kharmtest == 4 || kharmtest == 9) then
        kgoto minor
      endif
      if (kharmtest == 11) then
        kgoto diminished
      endif
    endif
Major chords are then built by adding a major third above the tuned note (+4 semitones), the note an octave below (-12) and the fifth of the chord dropped two octaves (-17).
    major:
    if (kauto == 1) then
      kharm1 = 4 ;M3
      kharm2 = -12 ;-oct
      kharm3 = -17 ;--5
    endif
A higher automatic harmony option is also available, which simply employs a fifth above, as opposed to below.
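The automatic harmony logic above can be sketched in Python as a lookup from scale degree to chord quality, followed by a lookup of semitone offsets. Only the major scale degree tests and the major chord offsets (4, -12, -17) come from the excerpts above; the minor and diminished offsets, and all names, are assumptions made for illustration.

```python
# Scale degree (semitones above the tonic) -> chord quality in a major scale,
# mirroring the kharmtest comparisons in the excerpt above.
MAJOR_SCALE_QUALITIES = {
    0: "major", 5: "major", 7: "major",
    2: "minor", 4: "minor", 9: "minor",
    11: "diminished",
}

# Harmony offsets in semitones relative to the tuned note.
# "major" is taken from the article; "minor" and "diminished" are assumptions
# that follow the same layout (third above, octave below, fifth two octaves down).
CHORD_OFFSETS = {
    "major": (4, -12, -17),
    "minor": (3, -12, -17),
    "diminished": (3, -12, -18),
}


def harmony_notes(tuned_midi_note, tonic=0):
    """Return (chord quality, harmony MIDI notes) for a note already on the scale."""
    degree = (tuned_midi_note - tonic) % 12
    quality = MAJOR_SCALE_QUALITIES[degree]
    return quality, [tuned_midi_note + off for off in CHORD_OFFSETS[quality]]
```

For example, a tuned middle C (MIDI note 60) with a tonic of C is harmonised as a major chord, giving E above, C an octave below and G two octaves below the chord's fifth.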
Tuning is implemented using pvscale. Portamento allows for 'hard' or 'soft' tuning, defining the speed at which the pitch of the input changes. Hard tuning achieves the effect that has become somewhat ubiquitous in the pop charts (it is suggested that the effect has far more widely reaching creative uses). Formants can also be maintained, allowing the 'chipmunk' effect to be employed only if desirable.
    fauto pvscale fsig, kratioport, 1 ;transpose
    if abs(kharm1) > 0 then
      if kharm1chng == 1 then
        kscl1 = 2 ^ (kharm1 / 12)
      endif
      fharm1 pvscale fsig, kratioport * kscl1, kharm1form ;formant option
    endif
Note, harmony notes are only processed if required.
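Two pieces of arithmetic sit behind the excerpt above: the conversion of a semitone offset to a frequency ratio, 2^(n/12), and the smoothing of the tuning ratio that distinguishes 'hard' from 'soft' tuning. The Python sketch below illustrates both. The one-pole, half-time based smoother is an assumption about how such portamento might behave; the source does not state which smoothing is used to produce kratioport, and the class and parameter names are hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


class Portamento:
    """One-pole lag that moves halfway to its target every half_time_s seconds."""

    def __init__(self, half_time_s, control_rate_hz, initial=1.0):
        if half_time_s <= 0:
            # 'Hard' tuning: no smoothing, jump straight to the target.
            self.coeff = 0.0
        else:
            # 'Soft' tuning: the remaining distance halves every half_time_s.
            self.coeff = 0.5 ** (1.0 / (half_time_s * control_rate_hz))
        self.value = initial

    def step(self, target):
        """Advance one control period towards target and return the new value."""
        self.value = target + (self.value - target) * self.coeff
        return self.value
```

With a half-time of zero the ratio snaps to each new scale note, giving the familiar stepped effect; longer half-times let the pitch glide between notes.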
A novel cross synthesis effect is also offered as an option. The spectral frequencies of a harmonic buzz, pitched at the tuned input, are paired with user definable amplitudes in a cross synthesis process. The spectral amplitudes can be from the input source, the buzz source or a user-defined mix of the two.
    abuzz buzz 30000 * ampdb(kamp), ktarget, 50, 1 ;buzz source
    fbuzz pvsanal abuzz, ifftsize, ifftsize / 4, ifftsize, iwtype
    ; cross synthesis: voice to buzz, freqs of buzz, amps of voice, amps can be mix of both...
    fauto pvscross fbuzz, fsig, kcross, 1. - kcross
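The per-bin operation behind this cross synthesis can be sketched in Python. The sketch rests on the understanding that the frequencies of the first stream are kept while the output amplitudes are a weighted sum of both streams' amplitudes; representing a spectral frame as a list of (amplitude, frequency) pairs, and the function name itself, are assumptions for illustration only.

```python
def cross_synthesise(src_bins, dest_bins, amp_src, amp_dest):
    """Combine two spectral frames bin by bin.

    Each frame is a list of (amplitude, frequency) pairs. Frequencies are
    taken from src_bins; amplitudes are a weighted mix of both frames.
    """
    return [
        (amp_src * a_src + amp_dest * a_dest, f_src)
        for (a_src, f_src), (a_dest, _f_dest) in zip(src_bins, dest_bins)
    ]
```

Calling it with the buzz frame as the source, the voice frame as the destination and the weights kcross and 1 - kcross mirrors the argument order of the excerpt: a weight of zero on the buzz gives the voice's amplitudes on the buzz's frequencies.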
Resynthesis again considers optimal processing, only synthesising non-zero harmonics. Output then offers a dynamic stereo field.
The second UDO implements a simple threshold-based gate to remove noise, typically present when the on-board microphone is used.
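A threshold-based gate of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming a block-based RMS measurement and a hard on/off decision; the actual UDO may measure level and smooth its gain differently, and the function names are hypothetical.

```python
import math


def block_rms(block):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def gate(block, threshold):
    """Pass the block unchanged if its RMS reaches the threshold, else mute it."""
    if block_rms(block) >= threshold:
        return list(block)
    return [0.0] * len(block)
```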
instr 1 then reads several values from the UI, applies the above UDOs and sends audio to the global effects. These effect instruments are turned on and off by sending score statements when the UI indicates a change.
Effects are faded in and out smoothly using linenr. There is scope for further effects in future releases.
Looping of audio was more complex than initially envisaged.
instr 20 writes a mono audio loop to a table. To allow syncing to other performers or audio, the loop can be tempo locked. This is achieved by extending or truncating to the nearest 'beat' when the user chooses to stop recording.
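The tempo lock described above amounts to rounding the recorded length to a whole number of beats. A minimal Python sketch follows, assuming a 44.1 kHz sample rate and a minimum loop length of one beat; both are assumptions, as are the names used.

```python
def tempo_locked_length(recorded_samples, bpm, sample_rate=44100):
    """Return (beats, samples) for a loop snapped to the nearest whole beat."""
    samples_per_beat = sample_rate * 60.0 / bpm
    # Round to the nearest beat, but never allow a zero-length loop.
    beats = max(1, round(recorded_samples / samples_per_beat))
    return beats, int(round(beats * samples_per_beat))
```

At 120 bpm a recording of 90000 samples is truncated to four beats (88200 samples), while one of 80000 samples is extended to the same length.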
Visual feedback to the UI is implemented here also, using
Layering of loops, and playback of tracks from a user's music library are potential developments to this section of the App.
The user can also audition the key in which they intend to perform, at the chosen tempo. This is implemented using a simple oscillator-based instrument.
instr 21 implements loop playback, using
The code ends with an overall limiter to avoid severe output clipping.
More generally, portamento is used throughout to remove any 'zipper noise'. Control of flow in the code has also been considered in some detail, as many approaches are possible.
Testing and addressing outlying cases took up quite a high proportion of development time. This process is significantly less interesting than initial development, but is obviously critical for release. The approach taken was to consider every possible case in isolation, then holistically.
The code is by no means deemed complete. It is suggested that efficiencies, extra features and generally better ways to code would be welcome additions. The project is proving relatively robust, with 30 crash reports received after over 4500 downloads thus far.
RoboTune can be viewed as an experiment in using the relatively new Csound library optimised for iOS. The code is written in Objective-C as opposed to the newer Swift language.
The application is built using a standard MVC architecture. The UI is built using Xcode storyboards, one each for iPhone and iPad. Each storyboard is a series of View Controllers, each representing a navigable section of the UI, wrapped in a Navigation Controller to provide intuitive navigation. The code is intended to be very straightforward to follow. The initial View Controller containing presets appears blank in the storyboard view because the controls on this view are drawn dynamically in code, using the DKCircleButton control to distribute the buttons evenly on all screen sizes.
The following classes are of note for interfacing with Csound:
This contains the list of available channels in simple string format and references the CSD file with which the application communicates.
This provides a SyncDelegate to receive values from channels. It ultimately allows UI controls to be programmatically updated by the current state of a channel; for example, a slider showing the progress of a loop recording is updated with the current time.
This provides methods to initialise and set a channel's value. Typically a UI control will cause a channel value to update with some user interaction, for example, moving a slider control.
This is the helper class provided as an example with Csound for adding the bindings to CsoundObj. It was modified to provide both Input and Output bindings.
This contains a reference to the CsoundObj class, the main interface for the Csound library. It sets itself as the listener for CsoundObj, instantiates all of the Input Bindings and Output Bindings dictated by the channels, provides methods to update and receive values from these channels, contains a reference to the ChannelNames.m class, and defines a naming convention for the channels themselves, prefixing each with 'channel_'. This prefix is used as a key lookup to communicate with channels via reflection-based code: the methods that set and update channel values use the naming convention to provide generic methods that can update any channel by name.
As the name suggests, this class is a singleton, ensuring that only one Csound object ever exists in the application. This is useful for complex UI setups where the view is refreshed and/or views can be recycled as users switch context. Channel values are initialised at first setup, and hooks are registered to detect when headphones are plugged in. To avoid feedback, the App works best with headphones/microphone plugged in.
Default values for preset channels are loaded at the first construction of the class and user presets are loaded from saved storage, if previously saved.
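The reflection-based lookup built on the 'channel_' prefix can be illustrated in Python, where getattr and setattr play the role of the Objective-C reflection calls. This is only a sketch of the idea: the class name, the method names and the example channel names are hypothetical and do not reproduce the App's classes.

```python
class ChannelStore:
    """Holds channel values as attributes named with a common prefix."""

    PREFIX = "channel_"

    def __init__(self, defaults):
        # Create one prefixed attribute per channel, loaded with its default.
        for name, value in defaults.items():
            setattr(self, self.PREFIX + name, value)

    def set_channel(self, name, value):
        """Generic setter: update any known channel by name."""
        key = self.PREFIX + name
        if not hasattr(self, key):
            raise KeyError("unknown channel: " + name)
        setattr(self, key, value)

    def get_channel(self, name):
        """Generic getter: read any known channel by name."""
        key = self.PREFIX + name
        if not hasattr(self, key):
            raise KeyError("unknown channel: " + name)
        return getattr(self, key)
```

The benefit of the convention is that a single pair of generic methods serves every channel, so adding a channel needs no new accessor code.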
To tie in the channels configured in the CsoundSingleton class, each 'View' representing a group of UI controls initialises its values from the channels' configured defaults. Each UI control object is responsible for setting a channel or being updated by a channel.
Note: The View Controller classes should ideally implement a common interface, as there are methods that they are required to implement, such as initCsound from the CsoundSingleton class; this leaves room for a future enhancement.
Following the MVC pattern, each View Controller asks for an instance of the CsoundSingleton object and updates channel values from the standard UI callbacks fired when a control's value changes. Similarly, some controls, such as loop and record, are updated programmatically in response to Csound Input Binding type channels. This is done by using the NSTimer class to poll the channels at an appropriate interval; the polled values are kept in sync by the Input Binding type's SyncDelegate. NSTimer is used to keep the update rate at a reasonable level so that the UI remains responsive.
Third Party Libraries
The application makes use of several third party libraries to enhance the UI of the application. These are:
Enhanced horizontal slider
Informational popup modal
Circle buttons with effects
Blur the background and show modal
A circular progress bar
Android-like toast notifications
CSS styling for basic controls
IV. Future Work and Conclusion
Possible additions include playback over music in a user's library, syncing with other Music Apps, and extension into a suite of Apps offering synthesis while keeping the same brand and feel.
The authors feel Robotune achieves a high quality audio effect, with efficient and hopefully educational coding.
Brian Carty and Alan O'Sullivan, RoboTune [Online] Available: https://github.com/robotune [Accessed October, 2016].
Brian Carty and Alan O'Sullivan, RoboTune [Online] Available: https://itunes.apple.com/ie/app/robotune/id841976751?mt=8 [Accessed October, 2016].
 Volcano Choir, "Comrade" [Online] Available: https://www.youtube.com/watch?v=Vvp305B9FoQ [Accessed September, 2016].
 Kimbra, "Settle Down" [Online] Available: https://www.youtube.com/watch?v=sd7GLvMYSHI [Accessed September, 2016].
 "ASValueTrackingSlider" [Online] Available: https://github.com/alskipp/ASValueTrackingSlider [Accessed October, 2016].
 "ENPopUp" [Online] Available: https://github.com/evnaz/ENPopUp [Accessed October, 2016].
 "DKCircleButton" [Online] Available: https://github.com/kronik/DKCircleButton [Accessed October, 2016].
 "RNBlurModalView" [Online] Available: https://github.com/rnystrom/RNBlurModalView [Accessed October, 2016].
 "UIView-Glow" [Online] Available: https://github.com/thesecretlab/UIView-Glow [Accessed October, 2016].
 "UAProgressView" [Online] Available: https://github.com/UrbanApps/UAProgressView [Accessed October, 2016].
 "Toast" [Online] Available: https://github.com/scalessec/Toast [Accessed October, 2016].
 "pixate-freestyle-ios" [Online] Available: https://github.com/Pixate/pixate-freestyle-ios [Accessed October, 2016].
Brian Carty is currently Director of Education at Sound Training in Temple Bar, Dublin (soundtraining.com). He completed a PhD, funded by IRCSET and NUI Maynooth (Hume Scholarship), at NUI Maynooth, Ireland in 2011.
His main research interest is headphone based audio with a particular interest in how we can locate sound sources in our auditory environment, and how sound sources at particular locations / moving sound sources can be artificially recreated.
Although this research focuses on digital signal processing algorithms and software development, he is ultimately interested in the creative application of this work.
email: brian AT soundtraining.com