I Want Better Logging
Some context, right now at $dayjob
I work on the “platform team” which braodly
means I have all the responsibilities of your typical ops/infra girl but I am
also the developer of first and last resort for all code, 1st and 3rd party,
we use across all our products. I promise it’s not as stressful as it sounds but
it does give me a… unique view of codebases. I care a whole heck of a lot about
things like logging, error handling, tracing, and stack traces.
Let’s take a look at some real logs from one of our products.
This request will be satisfied with Forms Support Level 8
Can’t verify CSRF token authenticity.
[S3.Client 206 0.067541 0 retries] head_object(Bucket="[FILTERED]", Key="[FILTERED]", PartNumber=1)
http_get on instance credentials
Using email attribute from ’email’: [REDACTED]
Completed unread_count
None
Like I’m not putting you on, this is all I get. There’s no context I’m holding
back. You got any idea what the hell any of these mean? Where you might even
start looking in the source to see where it’s generated? Me either. I mean the
developers were clearly looking our for me by adding a log.source
where it
show’s me the class the log origi… aaaaannd they’re all JSONApiController
.
Guess I’m grepping for the string and hoping.
Okay let’s pretend that [REDACTED]
was some obvious garbage value and do an
analysis. Since we don’t have any code refs off to ag 'Using email attribute from'
and huzzah!
class JwtMixin:
...
USER_CLAIM = "ourapp/username"
EMAIL_CLAIM = "ourapp/email"
@classmethod
def find_email_claim(cls, decoded_token):
key = next(for c in [cls.USER_CLAIM, cls.EMAIL_CLAIM] if decoded_token[c])
token_value = decoded_token[key]
logger.info(f"Using email attribute from '{key}': {token_value}")
return token_value
Cool. Cool. From here I pretty much have two questions. Where did this method
get called from and what was the value of decoded_token
that is clearly
causing some kind of problem?
Let’s break out that nice little find-references feature of my editor. Ugh,
there’s like 20 different places this could have been called from. Great. Maybe
I can figure that out with tracing? Haha what tracing? Guess I’ll play “match up
the timestamps” with the frontend and lb logs and see what endpoint the user
probably hit. Nice I actually got a hit! It’s
/api/v2/notifications/unreads
which let me check urls.py
to see
where it goes and JSONApiController
. The black box into which all hope dies.
God I wish an actual error had occured because then I might have gotten a stack
trace. Okay, let’s just make this codepath actually raise an error and run the
tests. Okay it’s not so dire, all of the callsites wind up coming together at
this authenticate
method.
jwt_mixin.py:127 `find_email_claim'
jwt_mixin.py:100 `jwt_email'
jwt_mixin.py:31 `from_jwt'
jwt_backend.py:113 `user_from_token'
jwt_backend.py:47 `validate_request'
jwt_backend.py:14 `authenticate'
So now I just have to connect the dots and trace all the variables from
authenticate
to find_email_claim
to determine the most likely path that
garbage was introduced and the bug will probably show itself.
Or we could just have the logger do all of that.⌗
Proof of concept.
from inspect import currentframe, getouterframes
class Logger:
@classmethod
def log(cls, message: str):
f_cur = currentframe()
assert f_cur is not None
print(f"Message: {message}")
for f in getouterframes(f_cur.f_back):
print(f'File "{f.filename}", line {f.lineno}, in {f.function}')
if f.code_context:
print("\n".join(f.code_context), end="")
if len(f.frame.f_locals.items()):
print("Locals")
for k, v in f.frame.f_locals.items():
print(f" {k}={v!r}")
if len(f.frame.f_code.co_freevars):
print("Vars From Clojure")
for k in f.frame.f_code.co_freevars:
print(f" {k}")
and some code to use it
def outer(a=4, b=3):
local = f"foo {a} {b}"
def clojure():
cloj = local + " bar"
return middle([cloj, "meep"])
return clojure()
def middle(x):
return inner(*x)
def inner(c=15, d="bloop"):
Logger.log(f"Doing inner {c} {d}")
return "baz"
Gives us
Message: Doing inner foo 4 3 bar meep
File "[REDACTED]/ccc/logger.py", line 43, in inner
Logger.log(f"Doing inner {c} {d}")
Locals
c='foo 4 3 bar'
d='meep'
File "[REDACTED]/ccc/logger.py", line 39, in middle
return inner(*x)
Locals
x=['foo 4 3 bar', 'meep']
File "[REDACTED]/ccc/logger.py", line 33, in clojure
return middle([cloj, "meep"])
Locals
cloj='foo 4 3 bar'
local='foo 4 3'
Vars From Clojure
local
File "[REDACTED]/ccc/logger.py", line 35, in outer
return clojure()
Locals
a=4
b=3
clojure=<function outer.<locals>.clojure at 0x10ba8db80>
local='foo 4 3'
File "<stdin>", line 1, in <module>
Locals
__name__='__main__'
__doc__=None
__package__=None
__loader__=<_frozen_importlib_external.SourceFileLoader object at 0x10b9a2ca0>
__spec__=None
__annotations__={}
__builtins__=<module 'builtins' (built-in)>
currentframe=<function currentframe at 0x10bcf9b80>
getouterframes=<function getouterframes at 0x10bcf9a60>
Logger=<class 'ccc.logger.Logger'>
outer=<function outer at 0x10ba8dd30>
middle=<function middle at 0x10bd00160>
inner=<function inner at 0x10bd001f0>
😍 😍 😍 I’m in love! Serioiusly, can you imagine debugging with all this juicy info available to you?!?
Prior Art⌗
It looks like it should be possible to get this working but the docs seem to be pointing in the direction that the addition of local vars happens in the traceback formatter (i.e the pretty output) and not part of the data that’s actually logged.
# hynek/structlog:/src/structlog/processors.py#L579
class StackInfoRenderer:
"""
Add stack information with key ``stack`` if ``stack_info`` is `True`.
Useful when you want to attach a stack dump to a log entry without
involving an exception and works analogously to the *stack_info* argument
of the Python standard library logging.
…
"""
Maybe you could make it work with your logger wrapper outlined
here but
that’s as much effort as making your own Logger
class.