Postcards

Research into a generative machine learning methodology applied to the creative process of architectural design.

Research Statement

POSTCARDS is a research project on a generative machine learning methodology applied to the creative process of architectural design.

Generative Adversarial Networks (GANs) and Natural Language Processing (NLP) models have been applied to many fields, architecture included.

Most recently (2021), the marriage of these two types of models, made possible by Contrastive Language–Image Pre-training (CLIP), has opened a world of possibilities in which images and text can be matched to produce an interrelated output.

This research explores the possibility of creating NLP//CLIP-GAN based workflows to aid the conceptual design process in architecture, using descriptive narratives and conceptual sketches to build a generative methodology that offers multiple options very early in the design process.

 

Prompt Engineering

Text synthesis

“Like programming, but more fluid. You’re not programming a computer, you’re writing reality. It’s strange. It’s always different. It’s never the same twice.” GPT-3

Throughout this research, a few key aspects of prompt engineering for text synthesis were compiled; they should be read as general rules of thumb when interacting with these models. Regarding coherence, long textual requests should be avoided, as they greatly increase the chance of incoherent output, which then contaminates later work as more requests are submitted. For the same reason, curation is an essential part of reaching better results: editing GPT-3’s answers improves the quality of subsequent requests. Another important aspect is delineating the task at hand: one should be precise about GPT-3’s objectives and state clearly how it should operate. Finally, guidance is an effective shortcut to good results: by starting a phrase for GPT-3 to complete, one helps the model focus on the target.
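The four rules above (short requests, curation, delineation and guidance) can be illustrated with a minimal sketch of how a single text request might be assembled. The function `build_guided_request` and its `max_context` cut-off are hypothetical and no GPT-3 call is made; the sketch only shows one possible ordering: the task statement first, a few hand-edited previous answers, then an unfinished phrase for the model to complete.

```python
def build_guided_request(task: str, curated_answers: list[str], opening: str, max_context: int = 3) -> str:
    """Assemble a short, delineated text request (hypothetical helper)."""
    # Delineation: state the task explicitly, first.
    parts = [task.strip()]
    # Curation + coherence: include only the last few hand-edited answers so the
    # request stays short. (A slice of [-0:] would return the whole list, hence the check.)
    if max_context > 0:
        parts.extend(answer.strip() for answer in curated_answers[-max_context:])
    # Guidance: end on an unfinished phrase for the model to complete.
    parts.append(opening.rstrip())
    return "\n\n".join(parts)


request = build_guided_request(
    task="Describe one room of a city hall in two sentences.",
    curated_answers=["The hall is cold and bright.", "Columns line both walls."],
    opening="The marble floors",
)
print(request)
```

Capping the context is one simple way to follow the "avoid long requests" rule; a designer could equally summarize older answers by hand.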

Image synthesis

“Postcards from an unmapped latent space. The point of a postcard is not to be itself impressive, but to evoke features of a larger region that looks fun to explore.” Ted Underwood

Regarding prompt engineering for image synthesis, style is perhaps the most important aspect. Because the models were trained on data gathered from the web, most popular artists are represented in that data and can be named to reach a specific style. Regarding composition, aspects such as perspective or boundaries can be controlled, to a certain extent, by using a starting image in the synthesis process. Expressiveness is also quite achievable through prompting: if one is after a pen sketch, for example, it can be stated in the prompt. Finally, consistency across different images (in style, composition and expressiveness) can be obtained by defining a prompt structure and keeping to it.
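One way to keep to a prompt structure is to fix it in a small template and vary only the subject. The sketch below is a hypothetical illustration, not part of the original workflow: the `PromptStructure` class and its field order are assumptions, with one field per sphere named above. Since composition is mainly steered by the starting image, the `framing` field is only a textual complement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptStructure:
    """Hypothetical fixed template shared by every image in a series."""
    artist: str   # style: an artist well represented in web-gathered training data
    medium: str   # expressiveness: e.g. "pen sketch", "watercolour"
    framing: str  # composition hint in words; a starting image steers it more strongly

    def render(self, subject: str) -> str:
        # Same order and wording for every subject; only the subject changes.
        return f"{subject}, {self.framing}, {self.medium} by {self.artist}"


structure = PromptStructure(artist="Schuiten and Peeters", medium="pen sketch", framing="wide-angle interior view")
for subject in ["the entrance hall", "the reading room"]:
    print(structure.render(subject))
```

The two printed prompts differ only in their subject, which is the property that keeps a series of images visually consistent.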

 

Pipelines

Pipelines is a methodological proposal for approaching the production of a conceptual building using only natural language descriptions and prompts generated by GPT-3, together with visual representations generated by CLIP-VQGAN.

There is a common start to all pipelines: artist selection and prompt structuring. These are decisive for the final outcome, because style and cohesion depend on them. The artists that work best are those who have incorporated a lot of architecture into their body of work. For the prompt structure, a description of a space from GPT-3 combined with the naming of a target space is advisable (e.g. “The marble floors reflect the columns in the city hall by schuiten and peeters”). Multi-prompting and weighting are also valuable options. A design problem to respond to is also required. For this research, the focus was placed on architecture competitions, as they provide a body of text describing the building requirements. Once this preliminary setup is established, one can use one of the suggested pipelines, “listen”, “interview”, “teach”, “show” and “correct”, or a combination of them.
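The syntax for multi-prompting and weighting depends on the tool being used. Several popular VQGAN+CLIP notebooks accept multiple prompts separated by `|`, each optionally followed by `:weight`; that convention is an assumption here and should be checked against the notebook actually in use. A minimal sketch that formats and parses such a string (both function names are hypothetical):

```python
def format_multiprompt(prompts: dict[str, float]) -> str:
    """Join weighted prompts as 'text:weight|text:weight'.

    Assumption: the target notebook splits on '|' and reads a trailing ':weight'.
    """
    for text in prompts:
        if "|" in text or ":" in text:
            raise ValueError(f"separator character in prompt: {text!r}")
    return "|".join(f"{text}:{weight:g}" for text, weight in prompts.items())


def parse_multiprompt(spec: str) -> list[tuple[str, float]]:
    """Inverse of format_multiprompt; a missing weight defaults to 1.0."""
    result = []
    for chunk in spec.split("|"):
        text, sep, weight = chunk.rpartition(":")
        if not sep:  # no ':' in this chunk, so the whole chunk is the prompt text
            text, weight = chunk, "1"
        result.append((text.strip(), float(weight)))
    return result


spec = format_multiprompt({
    "The marble floors reflect the columns in the city hall by schuiten and peeters": 1.0,
    "pen sketch": 0.5,
})
print(spec)
print(parse_multiprompt(spec))
```

Here the GPT-3 description carries full weight while the medium acts as a secondary, half-weight pull on the image.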

Listen is a simple pipeline. One starts a phrase so that GPT-3 stays focused on the targeted space. Once completed, that phrase becomes the prompt for the first image. After the first results, new information observed in that image is introduced into the next phrase for GPT-3 to complete, creating a chain of information from image to text so that each result stays coherent with the previous one.
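The control flow of Listen can be sketched as a loop. This is a minimal sketch under stated assumptions: `complete`, `generate_image` and `observe` are hypothetical hooks standing in for a GPT-3 completion, a CLIP-VQGAN run and the designer's own note on what the image shows. Each round restarts from the opening phrase plus the latest observation, so the request stays short.

```python
from typing import Callable, List, Tuple


def listen(
    opening: str,
    complete: Callable[[str], str],
    generate_image: Callable[[str], object],
    observe: Callable[[object], str],
    rounds: int = 3,
) -> List[Tuple[str, object]]:
    """Chain text -> image -> text for a fixed number of rounds (hypothetical hooks)."""
    history: List[Tuple[str, object]] = []
    phrase = opening
    for _ in range(rounds):
        prompt = phrase + complete(phrase)  # GPT-3 fills in the started phrase
        image = generate_image(prompt)      # the completed phrase becomes the image prompt
        history.append((prompt, image))
        # Next phrase: the opening plus what was observed in this image.
        phrase = f"{opening} {observe(image)}"
    return history


# Toy stand-ins so the loop runs without any model.
runs = listen(
    opening="In the city hall,",
    complete=lambda phrase: " light falls on marble.",
    generate_image=lambda prompt: f"<image of: {prompt}>",
    observe=lambda image: "under a glass dome,",
    rounds=2,
)
for prompt, image in runs:
    print(prompt)
```

In practice `observe` is the human step: the designer looks at the image and writes down the detail worth carrying forward.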

 

Interview is of particular interest, not just for generating images but especially for uncovering new information about a design from the point of view of the different users of a space. By having GPT-3 pretend to be a visitor, the janitor or a tourist, one can conduct an interview that uncovers how that person experiences the building.
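An interview can be framed as a transcript that the model continues in the persona's voice. The sketch below is a hypothetical illustration: the `interview_prompt` function and the transcript wording are assumptions, not the exact prompt used in this research. Curated earlier exchanges are kept, and the prompt ends on an open answer line.

```python
def interview_prompt(building: str, persona: str, exchanges: list[tuple[str, str]], next_question: str) -> str:
    """Transcript-style prompt in which the model answers as `persona` (hypothetical)."""
    speaker = persona[:1].upper() + persona[1:]  # "the janitor" -> "The janitor"
    lines = [f"The following is an interview with {persona} about their experience of {building}.", ""]
    # Earlier, curated exchanges keep the persona's voice consistent.
    for question, answer in exchanges:
        lines.append(f"Interviewer: {question}")
        lines.append(f"{speaker}: {answer}")
    # End on an open answer line so the completion is the persona's reply.
    lines.append(f"Interviewer: {next_question}")
    lines.append(f"{speaker}:")
    return "\n".join(lines)


prompt = interview_prompt(
    building="the city hall",
    persona="the janitor",
    exchanges=[("When do you arrive?", "Before dawn, when the floors are still dark.")],
    next_question="What do you notice that visitors miss?",
)
print(prompt)
```

Swapping only `persona` (a visitor, a tourist) while keeping the questions fixed makes the different experiences of the same building directly comparable.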

 

Teach requires previous trials of different prompt structures and their resulting images. After curating the best runs, the prompts that worked best are turned into the examples needed to create a prompt generator. As if filling in a list, GPT-3 bases its predictions of what a good prompt should be on those examples, effectively creating a catalogue of iterations to be fed back into the GAN. This method works at scale, creating multiple descriptive variations for the same space.
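The "filling in a list" idea is a form of few-shot prompting. A minimal sketch, assuming the curated prompts are plain strings: `teach_prompt` numbers them and leaves the next number open for the model, and `parse_generated` splits the model's continuation back into separate prompts for the GAN. Both names, and the list heading, are hypothetical.

```python
import re


def teach_prompt(examples: list[str], heading: str = "Prompts that produced good images:") -> str:
    """Few-shot prompt: curated prompts as a numbered list with the next number left open."""
    lines = [heading]
    lines += [f"{i}. {text}" for i, text in enumerate(examples, start=1)]
    lines.append(f"{len(examples) + 1}.")  # the model continues the list from here
    return "\n".join(lines)


def parse_generated(completion: str) -> list[str]:
    """Split a list continuation back into individual prompts."""
    prompts = []
    for line in completion.splitlines():
        text = re.sub(r"^\s*\d+\.\s*", "", line).strip()  # drop a leading "3. " etc.
        if text:
            prompts.append(text)
    return prompts


print(teach_prompt(["The marble floors reflect the columns in the city hall by schuiten and peeters"]))
```

Each parsed prompt can then be sent to the GAN in turn, which is what lets this pipeline produce many variations of the same space.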

 

Show requires a prior discussion of the brief with GPT-3, the curation of a particular idea, and an initial sketch produced by the user. This sketch, together with the text that crystallizes the idea for the building, becomes the prompt for the GAN. The more iterations the GAN is allowed, the further it moves from the initial sketch, gaining a life of its own.

Correct, the final pipeline, is a brute-force method for when the previous pipelines have not worked for a specific case. In the example provided, a very particular prompt regarding location, shape and materiality proved difficult to attain through the previous pipelines. This most often happens because the body of work of the selected artist (or combination of artists) is far from what the prompt is asking for. In such a case, one can sketch on top of the first image, feed it back to the GAN, and keep correcting back and forth until the intended outcome is achieved.

Use Case 02

Use Case 02 (Competition Entry: The Death Monument) is the result of applying the previously described pipelines to the development of a conceptual building that responds to an architectural competition. In this case, the brief asks for a reinterpretation of the concept of death through the design of a mausoleum.

To produce a submission for the chosen competition, the brief was first cleaned up through a fairly simple process. The cleaned brief was then used as the basis for a discussion with GPT-3, with the aim of letting the model arrive at a design idea. This idea was then compiled into a final text, which served as the basis for applying the pipelines described above.

Competition brief

“Since history, people have been expressing their thoughts on death in many ways. Marking the lives and deaths of important figures, rulers, and heroes has been the driving force behind many wonderful and influential structures, from memorials and mausoleums. In fact, many ideas, discussions and philosophies surrounding death have been the origin of many great works of art. Some of the quotes express both the physical and metaphysical nature of DEATH. This competition asks designers to re-think and re-imagine Death. Designers are asked to design based on their conceived ideas about death– either material or immaterial – merging considerable programmatic innovation and valuable design tools.”

Part of the discussion of the brief with GPT-3

“The problem with designing a building that is supposed to represent death is that you end up with a bunch of designers trying to come up with the most morbid and depressing design they can think of.

The competition is asking for a reflection on death, but that doesn’t mean you have to design a building that looks like a giant cemetery.

I’m not going to do that.

Instead, I’m going to try and come up with something that is more about the idea of death rather than illustrate it with depressive and obvious concepts.” GPT-3

GPT-3’s idea for the mausoleum.

“I have decided that I should focus on the idea of death as transition. The spaces will be designed to be emotionally charged. I wanted people who enter to feel that they are entering a mysterious, otherworldly place. It should have a religious quality to it. People should feel like they are walking through a spiritual journey. There will be five chambers. The first one is the greeting chamber. The second one is the transition chamber. The third one is the memory chamber. The fourth one is the rebirth chamber and the fifth one is called ‘The space in between’.” GPT-3

 

The Death Monument: A Conceptual Tour

The reader is now invited on a conceptual tour through the Death Monument, designed through GPT-3’s textual descriptions of the different spaces and CLIP-VQGAN’s visual representations of those descriptions. The tour takes us through the five designated chambers. As we move through the spaces, the highlighted text in each description marks the original prompt used to reach the image. A plan view of the mausoleum, tagged with the location each visualization refers to, serves as a map for the reader throughout the tour.

All descriptions by GPT-3 // All images by VQ-GAN

Final Remarks

This methodology does not yet give us buildings, but it does give us something else, something architects are not used to having during the design process: emotions, atmospheres, poetry, and an attention to the senses and to details that can push us in new directions. If the intention is to control the outcomes, there is still a long way to go. But if one is after surprise, dreams and a colourful takeover of the blank page, this generative methodology can already get you started. As a glimpse into the future, these postcards hint at what might come next for architecture: a true democratization of the design process, with natural language at the core of a new way of working. This might finally open the discipline to a broader audience, where anyone can create their own designs without needing expertise in the traditional tools usually required to produce convincing results.

Will we one day be able to use textual descriptions as if they were a parametric design workflow, generating multiple buildings from a single sentence? Even then, the designer will need to remain at the core of the system, interpreting and exploring what comes out. But even so, the future might reveal what it truly means to be creative: breaking free from our own limitations.

Credits

POSTCARDS is a project of IAAC, Institute for Advanced Architecture of Catalonia developed in the Master in Advanced Computation for Architecture & Design in 2021/22 by

Students: João Silva // Faculty: Oana Taut