How to communicate with your video production team

Video production as house building

When you hire a (good) contractor to renovate your house, they will guide you on when, where and how you should be providing input and making decisions, to ensure that the process goes smoothly and on budget with the best end result.

Conversely, if you – as the client – were to suddenly decide while the painter was doing final touches that you actually would prefer the bathroom on the other side of the house, it’s going to get expensive – at least for someone.

Homes, like many things, are built in a somewhat fixed linear progression. Some things can progress in parallel, but for the project to stay on track, every department has to be following the plan and on the same page. Moving a wall at a late stage doesn’t just affect the carpenter – it’s more work for the plumber, the electrician, the drywall guy, the plasterer and so on. What if that pushes the schedule back, which then blows the window for another contractor to be able to do their work?

On the flip side, decisions like the colour temperature of the ceiling lamps are more trivial and don’t need to be discussed early on, when more crucial blueprint plans (with flow-on impacts) are being committed to.

Of course we don’t need to know everything – that’s the whole point of hiring someone else – but it’s important for us to understand at least the broad strokes of how the sausage gets made, so we can participate in that process and get the result that we want.

 

It’s all about layers

I will be making some very broad generalisations here, for the sake of simplicity, but here goes.

It’s tempting for those who don’t ordinarily work in the medium of video to think of it as something like a PowerPoint slideshow with some more movement and sound underneath.

For those of us familiar with adding and deleting slides, copying and pasting layouts and so forth, it’s natural for us to think in those terms and assume that video projects can be manipulated in a similar fashion. For various reasons we will touch on, this is not really the case.

A more useful way to think about video production is in terms of LAYERS, both of video and audio.

From a video point of view, there will be multiple items stacked at any time, but we – just like when looking from above at a stack of papers – will only see whatever is on the top at that moment. As the playhead moves forward in time, lower layers are revealed or obscured depending on what is stacked above them.

The audio layers – conversely – are all playing simultaneously and being mixed together, though we can shift the emphasis of each over time depending on what we want the audience to hear most. For example, we can dip the music track down while someone is speaking, and then lift it back up again during a montage. There are a multitude of sophisticated mixing techniques and tools employed and there are good arguments to be made that the audio is more than half of the audience experience, but we will leave this here for now.
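For the more technically minded, here is a rough sketch of that layer model in Python. It is only an illustration of the idea, not how any particular editing package works: the names `Clip`, `visible_clip` and `audio_mix`, the example timings, and the ducking level of 0.3 are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video

    def covers(self, t: float) -> bool:
        return self.start <= t < self.end


def visible_clip(video_layers, t):
    """Return the name of the topmost clip under the playhead at time t.

    video_layers is ordered bottom to top, like a stack of papers.
    Returns None if nothing covers time t.
    """
    for layer in reversed(video_layers):
        for clip in layer:
            if clip.covers(t):
                return clip.name
    return None


def audio_mix(audio_layers, t, duck_gain=0.3):
    """Return {track name: gain} for every track with a clip playing at time t.

    Every active track is heard at once. The music is dipped to duck_gain
    (a hypothetical level) whenever dialogue is playing at the same time.
    """
    gains = {
        track: 1.0
        for track, clips in audio_layers.items()
        if any(clip.covers(t) for clip in clips)
    }
    if "dialogue" in gains and "music" in gains:
        gains["music"] = duck_gain
    return gains


# Hypothetical one-minute timeline.
video_layers = [
    [Clip("interview", 0, 60)],         # bottom layer
    [Clip("b-roll: corridor", 10, 20)],
    [Clip("title graphic", 0, 5)],      # top layer
]
audio_layers = {
    "dialogue": [Clip("interview audio", 5, 40)],
    "music": [Clip("theme", 0, 60)],
}

print(visible_clip(video_layers, 2))   # title graphic
print(visible_clip(video_layers, 12))  # b-roll: corridor
print(visible_clip(video_layers, 30))  # interview
print(audio_mix(audio_layers, 10))     # {'dialogue': 1.0, 'music': 0.3}
print(audio_mix(audio_layers, 50))     # {'music': 1.0}
```

Notice the asymmetry: the video function stops at the first (topmost) clip it finds, while the audio function collects everything that is playing and only adjusts the balance.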

 

Layer by layer

The creation of these layers – almost like phases of a house’s construction – will also (generally) correspond with the phases of how the project is developed.

 

The dialogue track

A documentary will usually be ‘driven’ by a dialogue track of some kind. This means that it is the main ‘skeleton’ that is leading the audience’s attention on a moment to moment basis. This could be a scripted voice-over (from someone the audience never sees), or it could be a sit-down delivery from a teleprompter, or it could be a combination of pieces from one or more casual style interviews from one or more people, which are seamlessly cut together in such a fashion that they flow naturally.

With all of these – in theory – if the audience were to shut their eyes, the video would in many cases still essentially work, as long as the verbal content from the speaker (or speakers) flows and progresses cleanly.

Of course in most cases outside of some kind of live presentation, we are referring to an edited dialogue track. The content from the speaker has in many cases been heavily shuffled and cut together far more than the audience might expect, in order to create a natural sounding but tight and well-flowing delivery. It’s not just removing umms and ahhs, but avoiding redundant phrases and repetitive sentences (which we all use far more than we realise) and restructuring content to convey the message in as succinct and compelling a way as possible.

It’s important to remember that video is a real-time medium. Videos play at a fixed speed – you can’t cast your eyes forward quickly like reading a book, or glance back to re-read something you missed. The show marches on and if you miss it you miss it. Of course if you’re controlling the YouTube progress bar you can go back and see what you missed, but generally if you are being forced to pause and replay just to keep up with a video, it’s not a pleasant experience and something is wrong.

The main point is that videos in many ways have more in common with a piece of music than a PowerPoint deck. Everything is locked together in time, and whether it ‘feels’ right and flows naturally or not has a lot to do with rhythm, pacing and transitions rather than merely the visual content which is being presented over the top. Which brings us to… music.

 

Music

You might be wondering why I’ve jumped from the dialogue edit – which is essentially the base level driver of the narrative – to music.

What about the b-roll footage? What about the graphics? What about those drone shots we got? Isn’t music just an afterthought that you do a 5 minute browse through a library for, then slap it on at the end and call it a day?

Actually, from a documentary point of view (or video in general, really) the interrelationship between the dialogue track and the music underneath it is a huge part of creating the tone of the piece, and is critical to getting everything to feel ‘right’.

Not only does music totally shift the tone of how the content is being presented, but for sections where there IS no dialogue – for example a montage sequence – music will have a massive impact on how THAT material is perceived and processed.

“The difference between comedy and horror is the music.” – Jordan Peele

The Jordan Peele quote sounds like hyperbole but it really isn’t. Search YouTube for ‘[your favourite comedy] as a horror movie trailer’ (or vice versa) if you don’t believe me.

Perception-wise, the difference between a well-chosen music track and a lazy slap-on is like the difference between walking into a party in a tailored Armani suit, and wandering in wearing pyjamas with food down the front. We’ve all seen videos with generic cliche music running on loop for 5 minutes, which apparently bears no relevance to what is occurring on screen.

The alternative to bad music is well-chosen, thought-out and well-timed music, so that the audience is (almost) invisibly led emotionally to react beat-by-beat in the way that you intend. Music can non-verbally evoke all sorts of things, either literally or ironically, in all sorts of ways.

Watch a Hollywood movie (which has been scored by a pro who has been doing this for decades) and pay close attention to how the music cues… CUE your emotional reaction to what is occurring on screen. The same happens (or doesn’t) with the music in your documentary.

The reason why I’m banging on about the importance of music is that the way your dialogue track and music interplay with each other really is the story and tonal backbone of a documentary video, and so you want to get it working as early as possible, and avoid messing with it late in the process wherever you can.

If there’s a nice music track where the editor has managed to arrange it so that the crescendo builds and hits the climax at the perfect point in the dialogue track – and then they are asked to remove even a few words from the speaker’s last sentence – those two seconds of shifting everything back could easily be enough to ruin the timing and ‘break’ something that previously was working great. Good editors will always shuffle things and do what they can to get things to flow and work as well as possible, but it’s important to be aware that removing words isn’t like removing a PowerPoint slide, and there’s no guarantee that the magic of a previous cut can be recaptured once late-stage structural modifications occur.
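To put some numbers on that, here is a small sketch of what a late trim does to the timeline. It assumes a simple ‘ripple’ style delete, where everything after the cut slides earlier to close the gap while the music stays where it was. The function name `ripple_delete`, the event names and all of the timings are invented for illustration.

```python
def ripple_delete(events, cut_start, cut_end):
    """Remove the span [cut_start, cut_end) from a timeline.

    events maps an event name to its time in seconds. Events inside the
    cut are dropped, and every later event slides earlier by the length
    of the cut.
    """
    removed = cut_end - cut_start
    result = {}
    for name, t in events.items():
        if t < cut_start:
            result[name] = t
        elif t >= cut_end:
            result[name] = t - removed
    return result


# Hypothetical timings: the speaker's last word lands exactly on the climax.
music_climax = 42.0
dialogue = {
    "final sentence starts": 36.0,
    "a few extra words": 38.5,
    "last word": 42.0,
}

# The client asks for two seconds of words to be removed (38.0 to 40.0).
trimmed = ripple_delete(dialogue, 38.0, 40.0)
drift = music_climax - trimmed["last word"]

print(trimmed)  # {'final sentence starts': 36.0, 'last word': 40.0}
print(drift)    # 2.0
```

The speaker now finishes two seconds before the music peaks. Fixing that means re-cutting or re-timing the music as well, which is why a ‘small’ dialogue change late in the process is rarely small.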

Of course, while you’re hearing dialogue and subtle well-chosen (or composed) music, you also probably want to be seeing something other than the speaker’s face continually, and that is where we get to b-roll.

 

B-roll

A very short history lesson: traditionally, when documentaries were shot on celluloid motion picture film (as opposed to digital), the crew would use two separate film magazines.

  1. A-roll – spoken on-camera interviews
  2. B-roll – all the other shots that will be overlaid on top, during the edit

This means that when they processed the film the editor would end up with two reels – one with all the interview footage, and the other with all the clips that they could (literally) cut and use ‘over the top’. History lesson over.

So b-roll in a documentary essentially refers to any kind of (non-graphics) footage that you might see – other than the literal footage of the interview subject speaking.

Of course you CAN just shoot an interview of someone sitting in a room talking, then cut the sound grabs you like together and call it a day – and in some particular cases that is the more powerful option – but generally it’s nice to keep things moving with shots out and about that open things up a bit visually.

There is a huge range of what might constitute b-roll and it’s really only limited by time, imagination, and (like anything) budget. The easiest and fastest go-to option (which we see in news gathering stories on television for these reasons) is to capture the interview subject doing a few generic things that can be easily cut into some logical-looking sequences almost irrespective of what verbal content ends up getting used in the final edit.

We have all seen our interview subject hero walking down a corridor, going into a room, looking at books on a book shelf, sitting at a computer typing, or staring off into the distance pensively. If you shoot all of these with a fairly neutral expression, you could cut these shots over pretty much any story you can imagine and it wouldn’t seem entirely out of place. As always though, safe and generic is the enemy of interesting, and if we want to aim a little higher we have to get a little more specific and imaginative.

You will notice when watching high-end documentaries that the b-roll and sequencing is sometimes abstract and multi-purpose, but often it is quite specific and deliberate. This means that planning and thinking went into it, and most likely it was planned and shot some days or weeks AFTER the primary interviews took place, not just tacked onto the end of a shoot day.

This is a bigger topic, but specificity is a huge element (alongside lighting, tone, mood, framing and so on) that contributes to a shot or sequence feeling ‘cinematic’. When something specific is occurring in a certain part of the frame, and we know why we’re looking at it, and what it means to us, our minds get focused and a kind of hypnosis sets in (that’s what we want).

A screen capture from the documentary ‘The Imposter’.

It’s important then that when working with a documentary crew you have a very clear understanding of the kind of material you want them to capture, and why. Otherwise, you will likely find that they will revert to ‘autopilot’ and grab whatever is around and convenient, which may well be material that you don’t want to include in your piece. Ideally these shots are captured so that they can be assembled to form logical visual sequences which will help the edit flow naturally and feel deliberate. This way the video will give the audience the confidence that the story is ‘going somewhere’, and that they are in good hands.

Communicating to your team before the shoot what NOT to capture is equally important – if they know that there are things that you specifically want to avoid seeing, then they won’t waste time and effort trying to shoot things that will never stand a chance of ending up in the edit anyway. That precious time during the shoot can then be spent on capturing and improving shots that will find their way on screen.

Shooting b-roll is a huge topic but I just want to emphasise here that when watching videos and films, audiences are extremely sensitive visually to what they see on screen. Most of us are quite perceptive in our regular lives as it is – if a colleague comes into work a little unkempt we might think they had a rough night. If we’re on a first date we’re gauging eye-contact, posture, body language, clothing choices and so on.

All of these mechanisms are in play when watching people in a documentary, but since the images are deliberately concentrated inside a rectangle, it’s even more so. A famous Hollywood rom-com director once said that an entire film – which does everything else right – can be ruined by a bad haircut.

That is not to say that everyone needs a hair and wardrobe team watching over their every move to keep each follicle in place, but it’s vital to consider what kind of b-roll is being captured and what messages are being sent overtly and otherwise. One great strength of film (and why it’s such an effective propaganda tool) is that it can easily embed unconscious messaging in its presentation.

We’ve all seen the cliche of the ‘expert’ on the TV programme sitting in front of a wooden bookshelf. We all know that it’s intentional and designed to lend the subject a learned and authoritative aura, but it STILL WORKS even though we’re consciously aware of it. Conversely, if we stood that expert in an alley and interviewed him while he smoked a cigarette and finished off some McDonald’s fries, it wouldn’t quite have the same mystique.

The same goes for b-roll – especially when watching people interact, we have the sensation that we’re watching someone in their natural habitat, behaving as they truly are. We innately trust what we see more than what we hear. So if we see a subject chatting with colleagues, confidently using their hands, explaining something to them, we automatically start to associate them with a leadership role and some kind of authority. If we only see that same person tapping away at a keyboard in a room on their own, we don’t get those social cues and so will not make that association.

 

Graphics

‘Graphics’ we can define as anything visual that we have in the edit that wasn’t captured with a camera as part of the shooting process. This could be the name and title of an interview subject superimposed as text in the corner of the screen, or it could be a 3D animated dinosaur smashing through a wall in a visual effects shot. Graphics can be charts, diagrams, and any other kind of integrated visuals that help tell the story or convey useful information as it goes along.

The amount of work involved in a given animation of course varies wildly, but in general these graphics are animated such that they are tied to the specific timing of the edit, and in the case of documentaries often directly connected and timed to the delivery of what the person is saying. For this reason, graphics are generally produced as rough timing placeholders while the edit and story are being confirmed. Only once the picture is ‘locked’ (i.e. there will be no more changes to the timing or content of the material) are the graphics finalised in their detailed form. Working in this fashion ensures that effort and time are not wasted in producing detailed graphics that may become unnecessary (or need to be rebuilt) if certain parts of the edit are removed or retimed. It’s the same reason we don’t build, paint and wire up electrical sockets in a wall until we’re sure that we really want that wall there.

The preliminary timing placeholders might just be plain text that refers to what will be seen later e.g. [INSERT HISTORICAL TIMELINE HERE] or they might be a static reference image that will later be animated to time out with the dialogue. It can be scary to look at these placeholders and have no idea what is going to be there eventually, but this is why it’s important to clearly communicate with the team and find out what this thing is going to eventually look like. They should be able to provide a reference sample from something similar (from their own work or someone else’s) so that you have a sense of the general aesthetic and level of detail they will be providing.

When a project requires a large number of graphics that (presumably) should be cohesive and stylistically matched, it is good practice to go back and forth on a ‘style frame’ or two, so you can settle on an aesthetic, colour scheme, typeface and so forth that you’re happy with, before the team goes away and makes a thousand things that don’t match your expectations.

This is no different to choosing a sample tile for your bathroom before the tiler lays down a tonne in a colour you hate, and any sane tiler wouldn’t begin work without getting your confirmation on that.

If the graphics are somewhat complicated in design and animation, there will generally be another step whereby still-frames of later graphics are produced (in the style of the agreed aesthetic) and confirmed before these go through the labour intensive process of animation.

Wrapping Up

As always this is really just the tip of the iceberg, but I hope this provides some context as to what might be happening on the other side of the fence of a documentary production, and how you can use that awareness to achieve the best result for your project.

If you would like to discuss the potential for your project, feel free to get in touch.