Appending strings in Python looks simple until scale, interpreter behavior, and readability start pulling in different directions. This guide explains every major string concatenation method, shows what happens inside CPython, and helps you choose the right approach based on correctness, clarity, and performance.
To append strings in Python, use + or an f-string when combining a small number of known values, use "".join(...) when combining many strings from a list or other iterable, and use io.StringIO when writing incrementally to a stream-like buffer. For loop-based string assembly, the safest and most scalable pattern is to collect parts and call join() once at the end.
Python String Concatenation Methods
Python provides several ways to combine strings, but the right choice depends on how many pieces you are assembling and whether those pieces are known up front or generated over time. For small, direct concatenations, + and f-strings are clear and efficient. For lists, generators, and loop-built output, str.join() is the standard solution. For stream-style writing, StringIO gives you a file-like buffer that can be more natural than manual concatenation.
- Use + for small, direct concatenations
- Use f-strings for readable interpolation with variables and formatting
- Use str.join() for collections of strings and loop-based assembly
- Use StringIO for incremental writes in stream-oriented workflows
| Method | Best Use Case | Allocation Behavior |
|---|---|---|
| + / += | Joining 2–4 known strings | New object per operation (CPython may optimize +=) |
| f-string | Readable interpolation with formatting | Single allocation via BUILD_STRING bytecode |
| str.join() | Collections, generators, loop-built output | One allocation; total length is pre-computed |
| StringIO | Streaming writes, file-like interfaces | Internal resizable buffer |
| str.format() | Reusable or deferred templates | Single allocation per .format() call |
| %-formatting | Logging calls with deferred evaluation | Single allocation per format operation |
Virtually every Python program constructs strings dynamically at some point — building file paths, formatting log messages, assembling SQL fragments, generating CSV output, or producing user-facing text. The central rule behind all of these techniques is the same: Python strings are immutable. Once you understand that constraint, the trade-offs between the different methods become much easier to reason about.
Why String Immutability Matters
In Python, a str object cannot be modified after it is created. This is not a minor implementation detail — it is a foundational guarantee that allows strings to be safely hashed, interned, and shared across the interpreter. When you write greeting = "Hello" and then add to it, you are not editing the characters at the memory address Python handed you. You are creating a brand-new string object and pointing your variable at it.
The practical consequence is that every concatenation operation, at minimum, requires Python to allocate a buffer large enough to hold the combined result, copy both source strings into it, and return the new object. For a single operation this cost is negligible. Repeat it thousands of times inside a loop and the cost grows, because each intermediate string is thrown away immediately after the next iteration builds on it.
The Python data model specifies that str objects are immutable sequences of Unicode code points (Python Docs, Text Sequence Type — str). Everything else discussed in this article flows from that single fact.
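A minimal sketch of what immutability looks like in practice; the variable names are illustrative. Item assignment is rejected outright, and += rebinds the name to a new value while any other reference still sees the old text.

```python
greeting = "Hello"

# Item assignment is rejected because str is immutable.
try:
    greeting[0] = "J"
except TypeError as exc:
    error_message = str(exc)

# A second name keeps pointing at the original text.
original = greeting
greeting += ", world"  # rebinds 'greeting'; 'original' is unaffected

print(error_message)
print(original)
print(greeting)
```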
It is also worth knowing that CPython, the standard reference implementation you almost certainly use, stores strings using the flexible representation defined in PEP 393, using 1-, 2-, or 4-byte storage depending on the widest code point present. Understanding this helps explain why benchmarks on short ASCII strings can look deceptively fast compared to benchmarks on strings containing emoji or CJK characters.
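As a rough, CPython-specific illustration, sys.getsizeof can show the storage width changing with the widest code point present. The sample characters are arbitrary choices, and exact byte counts vary by version and platform, so this sketch only compares the sizes rather than asserting specific numbers.

```python
import sys

n = 100
ascii_text = "a" * n            # fits in 1 byte per code point
cjk_text = "\u4e2d" * n         # needs 2 bytes per code point
emoji_text = "\U0001F600" * n   # needs 4 bytes per code point

sizes = {
    "ascii": sys.getsizeof(ascii_text),
    "cjk": sys.getsizeof(cjk_text),
    "emoji": sys.getsizeof(emoji_text),
}
for label, size in sizes.items():
    print(f"{label:>5}: {size} bytes for {n} code points")
```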
What Happens During Concatenation
When Python concatenates two strings, the interpreter must construct a new Unicode object containing the combined contents of both operands. Conceptually, the process follows a fixed sequence:
# Step 1 — Two separate string objects exist in memory
a = "Hello"
b = "World"
# Step 2 — Concatenation triggers a new allocation
c = a + b
Internally, the interpreter performs something equivalent to the following:
1. Determine the length of "Hello" (5) and "World" (5)
2. Allocate a new Unicode buffer large enough for both (10)
3. Copy bytes from the first string into the buffer
4. Copy bytes from the second string into the buffer
5. Return the new object and bind it to c
Visualized as a memory layout:
Before concatenation:
a ──► [ H e l l o ]
b ──► [ W o r l d ]
After c = a + b:
a ──► [ H e l l o ] (unchanged)
b ──► [ W o r l d ] (unchanged)
c ──► [ H e l l o W o r l d ] (new object)
The original objects pointed to by a and b remain untouched. Even when CPython applies its in-place optimization for +=, the logical model stays the same: the language guarantees immutability, and the interpreter is free to create a new object whenever necessary. This is why concatenation inside a loop can become expensive — each iteration allocates a new buffer and copies everything built so far.
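The memory picture above can be checked directly with identity comparisons. This is only a small sketch of the logical model: the result is a distinct object, and both operands keep their values.

```python
a = "Hello"
b = "World"
c = a + b

print(c)                 # combined contents
print(c is a, c is b)    # the result is neither of the operands
print(a, b)              # operands are unchanged
```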
The + Operator and +=
The most natural way to append one string to another is the + operator.
first_name = "Ada"
last_name = "Lovelace"
full_name = first_name + " " + last_name
print(full_name) # Ada Lovelace
For a handful of operands joined in a single expression, this works well and is easy to read. The += shorthand follows the same semantics:
log = "Request received"
log += " — processing"
log += " — done"
print(log) # Request received — processing — done
Both forms look like in-place mutation, but they are not — at least not officially. The language specification says a new string is always produced. However, CPython contains an optimization inside its evaluator loop that can change the actual behavior in a narrow but common case.
The CPython unicode_concatenate Optimization
When CPython processes an INPLACE_ADD bytecode instruction and both operands are plain str objects, it delegates to an internal function called unicode_concatenate. That function inspects the reference count of the left-hand operand. If the reference count is exactly 1 — meaning the variable on the left is the only thing pointing to that string — and the object is not interned, CPython may resize the existing Unicode object using PyUnicode_Append, avoiding allocation of a completely new string. This is an implementation optimization rather than a language guarantee.
"When Python is performing a concatenation operation between two strings, it calls unicode_concatenate, which checks if the assignment operation you're about to perform would allow you to free the operand on the left-hand side and if so, it simply mutates the existing string rather than allocating a whole new string and copying the old one over." — Paul Ganssle, String concatenation in Python
This means a simple s += chunk loop inside a function, where s is a local variable with no other references, can approach the speed of a C-level resize rather than quadratic copy behavior. Developer Austin Z. Henley demonstrated this directly by inspecting object id() values during += operations and confirming that the memory address sometimes does not change — evidence that the string was extended in place rather than replaced (Python strings are immutable, but only sometimes).
This optimization is fragile. A CPython issue filed in 2022 confirmed that Python 3.11's replacement of several arithmetic bytecodes with the generalized BINARY_OP instruction, as part of the adaptive interpreter, caused the in-place optimization path to be missed in some cases, producing measurably slower loop performance in certain programs (CPython issue #99862). Later fixes restored the expected behavior in subsequent releases. PyPy does not implement this optimization at all because it is not reference-counted, so repeated += in loops can exhibit quadratic behavior there (PyPy blog, January 2023). Do not rely on this behavior for correctness or performance portability.
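One way to observe this yourself is to compare id() before and after each +=. The helper name below is hypothetical, and the count it reports is deliberately not predicted here: it depends on the interpreter, its version, and the memory allocator, which is exactly why the behavior should not be relied on.

```python
def count_same_id_appends(n=1000):
    """Append n characters with += and count how often id() stays the same."""
    s = ""
    same_id = 0
    for _ in range(n):
        before = id(s)        # an int; does not keep a reference to s
        s += "x"
        if id(s) == before:   # same address suggests an in-place resize
            same_id += 1
    return s, same_id

text, same_id = count_same_id_appends()
print(f"{same_id} of {len(text)} appends kept the same id()")
```

A high count suggests the interpreter extended the string in place; a count of zero suggests every append produced a fresh object.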
The + operator also requires both operands to be strings. Python will not automatically coerce an integer or other type, so you must convert explicitly:
count = 42
# This raises TypeError:
# message = "Total items: " + count
# Correct:
message = "Total items: " + str(count)
print(message) # Total items: 42
str.join() — The Workhorse for Loops
When you need to combine a collection of strings — whether from a list, a generator, or any iterable — str.join() is the canonical Python solution. The syntax places the separator string on the left and the iterable of parts on the right:
words = ["the", "quick", "brown", "fox"]
sentence = " ".join(words)
print(sentence) # the quick brown fox
# Join with no separator
letters = ["P", "y", "t", "h", "o", "n"]
print("".join(letters)) # Python
# Join with a multi-character separator
parts = ["2026", "03", "12"]
date_string = "-".join(parts)
print(date_string) # 2026-03-12
The reason join() is faster than repeated + in a loop is structural. Before writing a single byte, join() iterates through the entire iterable to compute the total length of the final string, then allocates exactly one buffer of that size, and copies all pieces in sequentially. There is only one allocation regardless of how many strings you are combining. The official Python documentation covers this method under str.join().
The typical pattern when building a string incrementally in a loop is to collect parts in a list first, then join at the end:
lines = []
for i in range(1, 6):
    lines.append(f"Line {i}: some content")
result = "\n".join(lines)
print(result)
Because join() requires an iterable of strings, it raises a TypeError if any item is not a str. If your data may be mixed, convert each item as you pass it in, for example with a generator expression: ", ".join(str(x) for x in my_list). CPython's join() materializes a non-list iterable into a sequence internally before it computes the total length, so the generator form is a readability choice rather than a memory saving; a list comprehension performs about as well.
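A short sketch of the mixed-data case, using a made-up list of values: the direct join() fails on the first non-string item, while converting each item with str() succeeds.

```python
values = ["id", 42, 3.5, None]

# Passing non-string items straight to join() raises TypeError.
try:
    ", ".join(values)
except TypeError as exc:
    error_message = str(exc)

# Converting each item on the way in works for any mix of types.
joined = ", ".join(str(v) for v in values)

print(error_message)
print(joined)  # id, 42, 3.5, None
```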
Using += in a loop creates a new string object on each iteration. While CPython's unicode_concatenate optimization can sometimes resize in place, that behavior is not guaranteed across Python implementations. With 10,000 iterations, this approach risks quadratic performance on interpreters like PyPy.
# Avoid this pattern for large collections
result = ""
for item in my_list:
    result += item + "," # new string allocated each pass
# The trailing comma is also a bug — you get "a,b,c,"
join() pre-computes the total length, allocates one buffer, and copies all pieces in a single pass. It is the standard Python idiom for combining strings from any iterable and works efficiently across all interpreters.
my_list = ["alpha", "bravo", "charlie"] # imagine 10,000 items
result = ",".join(my_list)
print(result) # alpha,bravo,charlie
# One allocation, no trailing comma, no loop needed
F-strings are designed for embedding a small, known set of expressions inline. You cannot practically write an f-string with 10,000 variable references, and even if you could, it would not be readable or maintainable. F-strings do not accept iterables — they require each value to be explicitly named inside the braces.
# F-strings work well for a few known values
name = "Alice"
score = 95
msg = f"{name} scored {score}"
# But they cannot iterate over a collection
# There is no way to write: f"{for item in my_list: item}"
Why Repeated Concatenation Can Become O(n²)
Repeated string concatenation inside a loop can exhibit quadratic behavior. Consider the following pattern:
s = ""
for chunk in data:
    s += chunk
If the optimization path cannot be used, each iteration copies the entire string constructed so far:
Iteration 1 → copy 1 character
Iteration 2 → copy 2 characters
Iteration 3 → copy 3 characters
...
Iteration n → copy n characters
The total work is the sum 1 + 2 + 3 + ... + n = n(n+1)/2, which grows as O(n²). Doubling the number of chunks roughly quadruples the copying cost, so the expense rises rapidly as the accumulated string becomes larger.
The join() approach avoids this entirely. It computes the total length once, allocates a single buffer, and copies each piece exactly one time — making the total work O(n), where n is the combined length of all pieces. This is the fundamental reason join() is the standard recommendation for combining collections of strings, and it holds true across all Python implementations.
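The arithmetic can be made concrete with a small model that counts characters copied rather than timing anything. Both function names are hypothetical, and the model assumes the worst case in which no in-place optimization applies.

```python
def chars_copied_by_repeated_concat(chunks):
    """Model: each += copies everything built so far into a new buffer."""
    total_copied = 0
    built_length = 0
    for chunk in chunks:
        built_length += len(chunk)
        total_copied += built_length
    return total_copied

def chars_copied_by_join(chunks):
    """Model: join() copies each piece exactly once."""
    return sum(len(chunk) for chunk in chunks)

chunks = ["x"] * 1000
repeated = chars_copied_by_repeated_concat(chunks)
joined = chars_copied_by_join(chunks)
print(repeated)  # 500500, i.e. n(n+1)/2 for n = 1000
print(joined)    # 1000
```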
F-Strings for Readable Assembly
F-strings, formally called formatted string literals, were introduced in Python 3.6 under PEP 498, authored by Eric V. Smith. They allow you to embed any valid Python expression directly inside a string literal by prefixing the string with f or F and enclosing expressions in curly braces.
name = "Grace Hopper"
rank = "Rear Admiral"
year = 1947
biography = f"{name}, {rank}, helped popularize the concept of debugging in {year}."
print(biography)
# Grace Hopper, Rear Admiral, helped popularize the concept of debugging in 1947.
F-strings are not just syntactic sugar layered on top of str.format(). When they were first introduced, the CPython implementation did translate them into a series of format calls, which made them slower than expected. Later CPython versions optimized f-string execution using specialized bytecode instructions such as FORMAT_VALUE and BUILD_STRING, allowing the interpreter to construct the result efficiently. In benchmarks of simple two- to four-variable interpolations, f-strings are typically faster than str.format() and comparable to simple + concatenation for small inputs. str.join() remains superior when combining large collections of strings (Olaf Gorski, F-String Performance).
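To see the bytecode for yourself, the dis module can list the instructions an f-string compiles to. This is a CPython-specific sketch: BUILD_STRING should appear for a multi-part f-string, but the name of the per-value formatting instruction differs between CPython versions, so the example prints the list rather than assuming it.

```python
import dis

# Compile a two-value f-string as a standalone expression.
code = compile('f"{name} scored {score}"', "<example>", "eval")

opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# Evaluate the same code object with sample values.
value = eval(code, {"name": "Alice", "score": 95})
print(value)  # Alice scored 95
```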
Python 3.12 extended f-string syntax further via PEP 701, which formalized their grammar using the PEG parser. Among other improvements, this allows reusing the same quote character inside an f-string expression — a restriction that existed since 3.6 due to the limitations of the original tokenizer.
# Python 3.12+ allows reusing the outer quote character inside the braces
# (a SyntaxError in 3.11 and earlier):
names = ["Alice", "Bob"]
result = f"Users: {", ".join(names)}"
print(result) # Users: Alice, Bob
F-strings evaluate their expressions at runtime, not at parse time. This means the value of a variable is captured at the moment the f-string executes, not when it is defined. This matters if you assign an f-string result to a variable and then change the original values:
user = "Alice"
greeting = f"Hello, {user}!"
user = "Bob"
print(greeting) # Hello, Alice! — not Bob
F-strings cannot be combined with the b prefix. If you need to build a bytes object by appending pieces, you will need to encode the final string or use bytearray directly.
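A minimal sketch of the two workarounds mentioned above, with made-up field names: format as text and encode once, or accumulate raw bytes in a mutable bytearray.

```python
name = "Ada"

# Option 1: build the text with an f-string, then encode once.
header = f"user={name}".encode("utf-8")

# Option 2: append byte pieces to a mutable bytearray.
buf = bytearray()
buf += b"HEAD "
buf += header
buf += b"\n"

payload = bytes(buf)  # freeze into an immutable bytes object
print(payload)        # b'HEAD user=Ada\n'
```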
str.format() and %-Formatting
Before f-strings arrived, Python offered two older interpolation mechanisms. Both are still valid and appear widely in existing codebases, so you need to recognize them even if you prefer f-strings for new work.
str.format()
str.format() uses a template string with {} placeholders. It supports positional arguments, keyword arguments, and a rich format specification mini-language:
# Positional
msg = "CPU usage: {}% on core {}".format(87, 3)
print(msg) # CPU usage: 87% on core 3
# Named
msg = "Hello, {name}. You have {count} messages.".format(name="Lin", count=5)
print(msg) # Hello, Lin. You have 5 messages.
# Formatting numbers
pi_msg = "Pi is approximately {:.4f}".format(3.14159265)
print(pi_msg) # Pi is approximately 3.1416
str.format() is more flexible than f-strings in one specific scenario: when you need to store a template for later use with varying data. Because f-strings evaluate immediately, you cannot define a reusable template as an f-string without wrapping it in a function or lambda.
# A reusable template — not possible with a bare f-string
template = "Welcome, {username}. Your role is {role}."
users = [
    {"username": "charlie", "role": "admin"},
    {"username": "diana", "role": "viewer"},
]
for u in users:
    print(template.format(**u))
%-Formatting
The oldest formatting mechanism, inherited from C's printf, uses % as a placeholder marker. It is still used heavily in logging, because the standard library's logging module deliberately defers format evaluation until the message is actually going to be emitted:
name = "Eve"
score = 99
result = "Player %s scored %d points." % (name, score)
print(result) # Player Eve scored 99 points.
%-formatting is limited: it supports only a handful of types (strings, integers, and floats), and passing a tuple as the single format argument requires an extra layer of wrapping to avoid a TypeError. PEP 498 explicitly cited these limitations as motivation for introducing f-strings.
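The tuple pitfall is easiest to see in a short example. The point value here is arbitrary: a bare tuple on the right of % is unpacked as the argument list, so it must be wrapped in a one-element tuple to be formatted as a single value.

```python
point = (3, 4)

# The tuple is treated as two arguments for one placeholder.
try:
    "Point: %s" % point
except TypeError as exc:
    error_message = str(exc)

# Wrapping it in a one-element tuple formats it as a single value.
wrapped = "Point: %s" % (point,)

print(error_message)
print(wrapped)  # Point: (3, 4)
```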
StringIO for Data Streams
io.StringIO provides an in-memory file-like object that accepts Unicode strings via its .write() method. It shines in two specific situations: when you are building a string from a potentially large or unknown-length data stream, and when your code is already structured around file-like interfaces (such as passing an object to a function that expects a writable stream).
from io import StringIO
buffer = StringIO()
buffer.write("First segment. ")
buffer.write("Second segment. ")
buffer.write("Third segment.")
result = buffer.getvalue()
print(result)
# First segment. Second segment. Third segment.
Real Python's guide to string concatenation describes StringIO as providing "a native in-memory Unicode container with great speed and performance" for scenarios involving many strings in a data stream (Real Python, Efficient String Concatenation in Python). The key advantage over repeated += is that StringIO manages an internal resizable buffer that reduces the number of allocations required when repeatedly writing strings, avoiding the per-write allocation that naive concatenation can cause.
from io import StringIO
# Useful when integrating with streaming APIs or generators
def build_report(rows):
    buf = StringIO()
    buf.write("ID,Name,Score\n")
    for row in rows:
        buf.write(f"{row['id']},{row['name']},{row['score']}\n")
    return buf.getvalue()
data = [
    {"id": 1, "name": "Alice", "score": 95},
    {"id": 2, "name": "Bob", "score": 88},
]
print(build_report(data))
For most programs that simply collect a fixed set of known strings and combine them, join() is simpler and equally fast. StringIO earns its place when the source of data is truly stream-like and you do not know in advance how many pieces you will receive.
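The file-like case is where StringIO has no real substitute. As a sketch, the standard library's csv.writer expects a writable object, so handing it a StringIO yields the finished text without touching the filesystem; the rows are sample data.

```python
import csv
from io import StringIO

buf = StringIO()
# lineterminator is set explicitly so the output uses plain "\n".
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["ID", "Name", "Score"])
writer.writerows([[1, "Alice", 95], [2, "Bob", 88]])

report = buf.getvalue()
print(report)
```

Letting csv.writer do the writing also handles quoting and escaping, which the manual f-string approach does not.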
Implicit String Literal Concatenation
Python allows adjacent string literals, with no operator between them, to be silently merged into one. This is not a runtime operation; the merge happens when the source is compiled, before any code is executed.
# These four literals become one string at compile time
sql = (
    "SELECT id, name, email "
    "FROM users "
    "WHERE active = 1 "
    "ORDER BY name;"
)
print(sql)
# SELECT id, name, email FROM users WHERE active = 1 ORDER BY name;
This is genuinely useful for breaking a long string literal across multiple lines while keeping it readable. The parentheses are not required for the merge to happen, but they are the conventional way to allow the line break without a backslash. It is also commonly used to mix single-quoted and double-quoted segments when a string contains both apostrophes and double-quote characters.
Implicit concatenation is a known source of hard-to-find bugs inside lists and function argument lists. If you accidentally omit a comma between two string elements, Python will silently fuse them into one string instead of raising a SyntaxError. Real Python documents this exact trap with a hobbies list example where "Sculpting" and "Gardening" merge into "SculptingGardening" when the separating comma is missing (Real Python). PEP 3126 was even proposed specifically to remove this feature from Python, though it was ultimately rejected.
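The missing-comma trap looks like this in practice. The list below is an illustrative reconstruction of that kind of example, not a copy of it: three hobbies are intended, but only two elements are created.

```python
hobbies = [
    "Painting",
    "Sculpting"   # missing comma: merges with the next literal
    "Gardening",
]

print(len(hobbies))  # 2, not 3
print(hobbies)       # ['Painting', 'SculptingGardening']
```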
Choosing the Right Method
With six mechanisms available, the practical question is which one to reach for in a given situation. The answer depends on three factors: how many strings you are combining, whether you know them all at once, and what the code needs to communicate.
Combining a small, known set of values
When you have two to four pieces and all of them are available in the same expression, an f-string or the + operator both read clearly and run fast. F-strings have the edge in readability when any formatting is involved (padding, decimal places, type conversion).
# Clear and efficient for a small known set
host = "db.internal"
port = 5432
dsn = f"postgresql://{host}:{port}/mydb"
Combining many strings from a collection
Whenever you have an iterable — a list, a generator, a query result — use join(). Collect the pieces first if needed, then join once:
# Collect, then join: the standard Python idiom
text = "The quick brown fox 42 jumps"  # sample input for illustration
tokens = []
for word in text.split():
    if word.isalpha():
        tokens.append(word.lower())
normalized = " ".join(tokens)
Building a string incrementally in a loop
If the number of iterations is large and you are not sure whether the CPython += optimization will activate (for example, in code that must also run on PyPy, Jython, or MicroPython), prefer the list-then-join pattern. If the code is CPython-only, short-loop, and the simpler += form is more readable, it is acceptable — but benchmark if it matters.
# The safest, most portable pattern for loop-based assembly
parts = []
for item in data_source:
    parts.append(process(item))
result = "".join(parts)
Working with a reusable template
Use str.format() or string.Template when you want a template string that is defined once and applied repeatedly with different values. F-strings are evaluated immediately and cannot be deferred.
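For the string.Template option, a minimal sketch might look like the following. It uses $-style placeholders and the standard library's substitute() method; the user records are sample data.

```python
from string import Template

# Defined once, applied many times with different values.
template = Template("Welcome, $username. Your role is $role.")

users = [
    {"username": "charlie", "role": "admin"},
    {"username": "diana", "role": "viewer"},
]
messages = [template.substitute(u) for u in users]

for message in messages:
    print(message)
```

substitute() raises KeyError when a placeholder has no value; safe_substitute() leaves unknown placeholders in place instead, which can be preferable for user-supplied templates.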
Interfacing with a stream or file-like API
Use StringIO when the context calls for a writable object rather than direct string construction, or when you are processing an open-ended stream whose size you do not know ahead of time.
Formatting legacy code or log messages
%-formatting remains appropriate inside logging calls because the logging framework intentionally defers the format step, avoiding the cost of string construction when the message will not actually be written. Changing these to f-strings would produce the formatted string even when the log level is suppressed:
import logging
# Correct — deferred formatting, no string built if DEBUG is off
logging.debug("Processing record %s of %d", record_id, total)
# Avoid — string built unconditionally even if DEBUG messages are suppressed
logging.debug(f"Processing record {record_id} of {total}")
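The difference can be observed with a small experiment. The Expensive class and the logger name are hypothetical; the class simply counts how many times it is converted to a string while DEBUG messages are suppressed.

```python
import logging

class Expensive:
    """Counts how often it is formatted into a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"

logger = logging.getLogger("demo")
logger.setLevel(logging.WARNING)          # DEBUG messages are suppressed
logger.addHandler(logging.NullHandler())

# Deferred: the argument is never formatted because DEBUG is off.
logger.debug("value: %s", Expensive())
deferred_calls = Expensive.calls

# Eager: the f-string is built before debug() is even called.
logger.debug(f"value: {Expensive()}")
eager_calls = Expensive.calls

print(deferred_calls, eager_calls)  # 0 1
```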
Performance Comparison
The performance differences between concatenation techniques are easiest to see with a small benchmark. The following test combines 10,000 short strings using three different methods:
import timeit
from io import StringIO  # imported once so the import cost is not timed
parts = ["x"] * 10_000
def plus_concat():
    # Repeated +=; speed depends on the interpreter's in-place optimization
    s = ""
    for p in parts:
        s += p
    return s
def join_concat():
    # A single join() call over the whole list
    return "".join(parts)
def stringio_concat():
    # Incremental writes into an in-memory buffer
    buf = StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()
print("+= loop: ", timeit.timeit(plus_concat, number=100))
print("join: ", timeit.timeit(join_concat, number=100))
print("StringIO: ", timeit.timeit(stringio_concat, number=100))
Typical results on CPython show str.join() as the fastest approach for combining many strings, with StringIO performing similarly for stream-style workloads. The += loop is generally slower, especially when the interpreter cannot apply its in-place optimization. The exact numbers vary depending on the interpreter, hardware, and Python version. The important takeaway is structural: join() performs a single allocation, while repeated concatenation may perform many.
Using + inside large loops is one of the most common performance issues in Python codebases. Even when it appears fast during testing, the behavior can degrade when the interpreter cannot apply its internal optimization — or when the code runs on a different Python implementation. When combining many strings, prefer the list-then-join pattern.
Key Takeaways
- Strings are immutable by specification. Every append operation logically produces a new object. CPython's unicode_concatenate optimization can sometimes reuse memory under specific reference-counting conditions, but this is an implementation detail, not a language guarantee, and it is absent in PyPy.
- Use str.join() for iterables. It pre-computes the total length before allocating, making it the correct choice for combining many strings at once. Always collect pieces into a list or generator first, then call join() once.
- Use f-strings for readable interpolation. Introduced in Python 3.6 via PEP 498, f-strings are evaluated at runtime, support any valid Python expression, and benefit from specialized CPython bytecode instructions such as FORMAT_VALUE and BUILD_STRING that make them fast. Python 3.12 (PEP 701) relaxed syntax restrictions further.
- str.format() earns its place for deferred or reusable templates. When you need a format string that is applied at a different time or place from where it is defined, str.format() is the right tool. F-strings cannot be stored and re-applied.
- Keep %-formatting in logging calls. The logging module's deferred evaluation model relies on %-style placeholders. Using f-strings in logging calls forces string construction regardless of whether the message will be emitted.
- Use StringIO for stream contexts. When your code must write to a file-like object or process an unknown-length stream, io.StringIO provides an efficient in-memory resizable buffer, and it fits naturally into APIs that expect writable objects.
Python's approach to strings rewards clarity over cleverness. In the large majority of programs, an f-string or a join() call will be exactly right, run fast enough that you will never notice the overhead, and communicate intent clearly to anyone reading the code six months later. The deeper understanding of immutability and CPython internals matters most when you are profiling a hot path, writing code that must run on multiple Python implementations, or inheriting a codebase built around older patterns. In those situations, the performance picture is more nuanced than the simplified advice — "never use + in a loop" — that circulates in introductory tutorials. Measure first. Choose the method that is correct, portable, and readable. Optimize only where the evidence points.