MAT 338 Day 24

1147 days ago by pub

Today will be as action-packed as last time was leisurely.  Monday we will talk about some of the homeworks - next time is all the ones from last time plus a few more, hand-in is probably still a bit from now.

We begin with a lemma seemingly unrelated to writing things as sums of squares.

Lemma (most of Theorem 4.6 in the book): There is a u such that u^2\equiv -1 mod (p) for p an odd prime if p\equiv 1 mod (4).

This is something we could have done immediately after Wilson's Theorem, but did not because of the first snow day.  However, you did (as I recall) do some exploration related to this.

Proof: Recall that in Wilson's Theorem, we showed that (p-1)!\equiv -1 mod (p) by pairing up all the numbers from 2 to p-2 in pairs of multiplicative inverses mod (p), thus:

(p-1)!=1\cdot 2\cdot 2^{-1}\cdot 3\cdot 3^{-1}\cdots (p-1) \equiv (p-1)\equiv -1\text{ mod (}p)\, .
 Now, assuming that Wilson's Theorem is true, we will pair up the numbers from 1 to p-1 in a different way, in pairs of additive inverses mod (p):  
(p-1)!=1\cdot (p-1)\cdot 2\cdot (p-2)\cdot 3\cdot (p-3)\cdots \frac{p-1}{2}\cdot\frac{p+1}{2}=\left[1\cdot 2\cdot 3\cdots \frac{p-1}{2}\right]\cdot \left[(p-1)\cdot (p-2)\cdots \frac{p+1}{2}\right]\, .
 This makes sense because (p-1)/2 is an integer halfway between 1 and p, as p is odd. 

But if we rethink things mod (p), we can rewrite this in a more suggestive way.  Let \left(1\cdot 2\cdot 3\cdots \frac{p-1}{2}\right) be called f (this is also \left(\frac{p-1}{2}\right)!, of course):  

\left[1\cdot 2\cdot 3\cdots \frac{p-1}{2}\right]\cdot \left[(p-1)\cdot (p-2)\cdots \frac{p+1}{2}\right]\equiv f \cdot \left[-1\cdot -2\cdot -3\cdot -\frac{p-1}{2}\right] \equiv f\cdot (-1)^{\frac{p-1}{2}}\left[1\cdot 2\cdot 3\cdots \frac{p-1}{2}\right]\equiv (-1)^{\frac{p-1}{2}}f^2\, .
 Now, when p\equiv 1 mod (4), then p=4k+1 so \frac{p-1}{2}=2k is even.  That means:
-1\equiv f^2\text{ or }f^2\equiv -1\text{ mod }(p)
so in fact there precisely two square root of negative one (as Lagrange's Theorem suggests), \pm \left(\frac{p-1}{2}\right)! - somehow a satisfying answer.

p=13 k=mod(factorial((p-1)/2),p) k;-k;k^2;(-k)^2 
       
5
8
12
12
5
8
12
12

Now we will use this to prove our conjecture from last time - that every odd prime of the form p=4k+1 can be represented as the sum of two squares!  First, let's look at the following diagram.

%auto %hide var('x,y') @interact def _(p=5): u=mod(factorial((p-1)/2),p) viewsize=ceil(math.sqrt(p))+2 g(x,y)=x^2+y^2 plot1 = implicit_plot(g-p,(-viewsize,viewsize),(-viewsize,viewsize),plot_points = 100) grid_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]] plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2) lattice_pts = [coords for coords in grid_pts if (coords[1]-u*coords[0])%p==0] plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),pointsize=20) show(plot1+plot_grid_pts+plot_lattice_pts, figsize = [5,5], xmin = -viewsize, xmax = viewsize, ymin = -viewsize, ymax = viewsize, aspect_ratio=1) 
       

Click to the left again to hide and once more to show the dynamic interactive window

As you can see, with n=5 we are getting points on the circle x^2+y^2=n.  What magic did I do to turn that square root of -1 mod (n) into points on the circle?  The key lies within the grid of blue points; certainly you can think about it as a bunch of corners of parallelograms.   (Sometimes we generically call things like the set of blue dots a lattice, though we won't need to know that - we will usually use the word lattice only to refer to the integer lattice of the black dots.)   Also as a preliminary, for any old point (x,y) in the integer lattice, but especially for our blue dots, let's define one more thing - the norm of the point, N(x,y)=x^2+y^2.  Okay, now we're ready to go on.  

The definition of these points, assuming that p is our prime and k=\left(\frac{p-1}{2}\right)! is our square root of negative one, is that they all look like (a,ak+bp) for all integers a,b.  (This is an example like vectors generated by a basis, for those who have had linear algebra - except instead of being vectors over \mathbb{Q} or \mathbb{R}, they are over \mathbb{Z}.)

Suppose now that we find some blue dot (a,ak+bp) such that

0<N(a,ak+bp)=a^2+(ak+bp)^2<2p\, .
 Then we know, modulo p, that
a^2+(ak+bp)^2 \equiv a^2+(ak)^2\equiv a^2+a^2 k^2\equiv a^2-a^2\equiv 0\text{ mod (}p)\, ,
so p in fact divides the norm of the point (a,ak+bp).  But we just said that 0<a^2+(ak+bp)^2<2p but p\mid a^2+(ak+bp)^2!   So the only possibility is that p=a^2+(ak+bp)^2, and we would be done proving what was discovered by you all last time.

@interact def _(p=5): u=mod(factorial((p-1)/2),p) viewsize=2*p g(x,y)=x^2+y^2 plot1 = implicit_plot(g-p,(-viewsize,viewsize),(-viewsize,viewsize),plot_points = 100) plot2 = implicit_plot(g-2*p,(-viewsize,viewsize),(-viewsize,viewsize),plot_points = 100) plot3 = line([[0,0],[2*p-2*Integer(u),2],[2*p,0],[2*Integer(u),-2],[0,0]],rgbcolor=(1,0,0)) plot4 = line2d([[0,0],[2*p,0]],rgbcolor=(1,0,0),linestyle='--') grid_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]] plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2) lattice_pts = [coords for coords in grid_pts if (coords[1]-u*coords[0])%p==0] plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),pointsize=10) plot_lattice_pts2 = points([[2*coords[0],2*coords[1]] for coords in lattice_pts], rgbcolor = (0,1,0),pointsize=20) show(plot1+plot2+plot3+plot4+plot_grid_pts+plot_lattice_pts+plot_lattice_pts2, figsize = [5,5], xmin = -viewsize/2, xmax = viewsize, ymin = -viewsize/2, ymax = viewsize/2, aspect_ratio=1) 
       

Click to the left again to hide and once more to show the dynamic interactive window

Now we will prove there is a blue dot (somewhere) that is not at the origin but also has norm smaller than 2p (remember the inequality above).  The bigger circle is the one we care about now - it has formula x^2+y^2=2p, so radius \sqrt{2p}.  If we find a blue point inside that circle but not at the origin, then the above argument proves it must be on the smaller circle. 

The way we do this is with area - surprise!  The area of the bigger circle is \pi (\sqrt{2p})^2=2\pi p.  Unlike the book, I will not assume you need multiple integration to prove this fact.  Likewise, I assume that it is fairly well known that \pi >2, so that 2\pi>2(2)=4, which mean that the area of the bigger circle is bigger than 4p.  None of this should be controversial.

We next look at the green dots, which are what you get when you double the blue dots' coordinates.  (Notice that green dots are still in locations of blue dots; this can be called a sublattice.)  It isn't quite as clear, but not too hard, to see that the thinnest triangle made by the green dots has height 2p (from the origin to (2p,0), which has a=1 and b=0, then doubled) and base 2 (to the point (2k,2), which has a=0 and b=1 doubled again).  This triangle I have put in red, and hence has area 4p/2 - so the parallelogram with the solid red lines has area 4p.  And certainly the green points repeat every time you move outside this parallelogram, infinitely often.  Is any of this paragraph controversial?

@interact def _(p=5): u=mod(factorial((p-1)/2),p) viewsize=2*p g(x,y)=x^2+y^2 plot1 = implicit_plot(g-p,(-viewsize,viewsize),(-viewsize,viewsize),plot_points = 100) plot2 = implicit_plot(g-2*p,(-viewsize,viewsize),(-viewsize,viewsize),plot_points = 100) plot3 = line([[0,0],[2*p-2*Integer(u),2],[2*p,0],[2*Integer(u),-2],[0,0]],rgbcolor=(1,0,0)) plot4 = line2d([[0,0],[2*p,0]],rgbcolor=(1,0,0),linestyle='--') grid_pts = [[i,j] for i in [-viewsize..viewsize] for j in [-viewsize..viewsize]] plot_grid_pts = points(grid_pts,rgbcolor=(0,0,0),pointsize=2) lattice_pts = [coords for coords in grid_pts if (coords[1]-u*coords[0])%p==0] plot_lattice_pts = points(lattice_pts, rgbcolor = (0,0,1),pointsize=10) plot_lattice_pts2 = points([[2*coords[0],2*coords[1]] for coords in lattice_pts], rgbcolor = (0,1,0),pointsize=20) show(plot1+plot2+plot3+plot4+plot_grid_pts+plot_lattice_pts+plot_lattice_pts2, figsize = [5,5], xmin = -viewsize/2, xmax = viewsize, ymin = -viewsize/2, ymax = viewsize/2, aspect_ratio=1) 
       

Click to the left again to hide and once more to show the dynamic interactive window

Let's look at the picture again.  The area of the circle is more than the area (4p) of the parallelogram.  Since all points inside the parallelogram (not just green, blue, or lattice points) repeat outside of it, 4p is the biggest area you can have and not repeat some point - so the circle, having a bigger area, must have two points which are repeated by the shifting of this parallelogram (called a fundamental region).  This part should sound a little hairy, but I find it better to be vague about it than to do all the details as the book does.  The point is that it is reasonable that there are two different points v and w (not necessarily lattice points) in the big circle, which can be gotten by shifting the parallelogram v is in to a different parallelogram which w is in.  

In that case, thinking of v and w as vectors, v-w is actually a green point, since the difference one shifts them by must be one of the obvious directions of the parallelogram.  That means the point (v-w)/2 is a blue point, and it's not the origin - almost there!  But is it a blue point inside the big circle?  Yes!  Since the circle is nicely symmetric about the origin, the point -w is also in the circle, and (v-w)/2=\frac{v+(-w)}{2} is the midpoint of the line segment connecting v and -w, both points in the big circle - so it's in the big circle, and we have found a blue point other than the origin in the blue circle. 

Hooray! Whew!

v=point((Integer(u)+1,-1),pointsize=20,rgbcolor=(0,0,0)) w=point((-Integer(u)+1,1),pointsize=20,rgbcolor=(0,0,0)) z=point((Integer(u)-1,-1),pointsize=20,rgbcolor=(0,0,0)) plot5=line([[Integer(u)+1,-1],[Integer(u)-1,-1]],rgbcolor=(0,0,0)) plot6=point((Integer(u),-1),pointsize=20) show(plot1+plot2+plot3+plot4+v+w+z+plot5+plot6+plot_grid_pts+plot_lattice_pts+plot_lattice_pts2, figsize = [5,5], xmin = -viewsize/2, xmax = viewsize, ymin = -viewsize/2, ymax = viewsize/2, aspect_ratio=1) 
       

I have two more handouts with interesting proofs.  First, a proof by Euler that these primes are writeable in only one way as a sum of squares, up to reordering or making some numbers negative - which is nonobvious!  Second, a completely different proof of our main theorem, and the only one I have ever seen that does not require square roots of -1 modulo p.

Finally, just to remind you - see how square roots of negative one show up here?  So it is no surprise that last time we saw that "prime numbers" in the set

\{a+bi\mid a,b\in\mathbb{Z}\}\, ,
defined with the integer square root of negative one, are closely related.  In fact, on page 200 in your book, they use this to get the precise enumeration of how many representations there are of a number as a sum of two squares - amazing!  To a large extent, all this is possible because the Bezout identity and the Euclidean algorithm hold in these Gaussian integers, namely: 

If g and h are 'relatively prime' Gaussian integers, then there are other such integers x and y such that gx+hy=1.  

Unfortunately, Sage does not have this implemented :(

gcd(2,3+3*i) 
       
Traceback (click to the left of this block for traceback)
...
TypeError: unsupported operand parent(s) for 'xgcd': 'Integer Ring' and
'Symbolic Ring'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/karl-dietercrisman/.sage/sage_notebook/worksheets/admin/69/code/5.py", line 7, in <module>
    xgcd(_sage_const_2 ,_sage_const_3 +_sage_const_3 *i)
  File "/Users/karl-dietercrisman/Desktop/sage-3.4/local/lib/python2.5/site-packages/SQLAlchemy-0.4.6-py2.5.egg/", line 1, in <module>
    
  File "/Users/karl-dietercrisman/Desktop/sage-3.4/local/lib/python2.5/site-packages/sage/rings/arith.py", line 1434, in xgcd
    return a.xgcd(b)
  File "element.pyx", line 1951, in sage.structure.element.PrincipalIdealDomainElement.xgcd (sage/structure/element.c:11756)
  File "coerce.pyx", line 740, in sage.structure.coerce.CoercionModel_cache_maps.bin_op (sage/structure/coerce.c:5848)
TypeError: unsupported operand parent(s) for 'xgcd': 'Integer Ring' and 'Symbolic Ring'

More work for me in the future to make it better for the next generation of Gordon number theorists!

The homework, which is last time's plus some new ones:

  1. Read Section 10.3 in the book; the notation S_3 describes the set of integers writeable as a sum of three squares, as S_2 might indicate the set of things writeable as the sum of two squares.  Do Exercises 10.9-10.12.
  2. Find as many integers n as possible which are only writeable as a sum of squares via n=a^2+a^2=2a^2, i.e. n is not writeable as a sum of distinct squares but is writeable as a sum of squares (obviously these will be twice a square, but which ones?).
  3. Verify Lemma 10.1 by hand (i.e. write all the algebra out).
  4. Prove there are exactly four ways to write any power of two as a sum of squares.  We give this function the name r_2 (n), so for instance r_2(2)=4 because (-1,1),(-1,-1),(1,1),(1,-1) are the four pairs which work, and you are to prove r_2(2^n)=4 for all n.
  5. Pick four random (to you) three digit numbers and decide, whether by brute force or by something we proved in class today (if we do) whether they are a sum of two squares.
  6. Show a positive integer k is the difference of two squares if and only if k\not \equiv 2 mod (4).  (Hint: Our Fermat stuff and why it ends...)
  7. Look up the concepts of "Gaussian moat" and "Gaussian zoo" online and see what you think!
  8. Prove that if n\equiv 12 mod (16), show that n cannot be written as a sum of two squares.
  9. Is there any congruence condition modulo 6 for which a number cannot be written as a sum of two squares?
  10. Read through the end of Section 10.2 and look especially at Examples 10.4 and 10.8, and then do Exercises 10.2, 10.3, 10.6, and 10.7.
  11. Check that the pictures you get from some other primes with these lattices really work.  Yes, this is vague, but it won't be handed in - it just will deepen your understanding of how geometry and numbers intersect.
  12. Check every piece of the Zagier proof:
    • The set S is finite.  Try figuring out what S is for p=5 or p=13, the smallest such primes.
    • Each (x,y,z) has exactly one of the three things to go to.
    • If you take the output and apply the function again, you get (x,y,z) back (this is a little tougher).
    • If (x,y,z) goes to (x,y,z) then it turns out that (x,y,z)=(1,1,\frac{p-1}{4}) (you will probably need to use the definition of S for this, and remember that we assume p\equiv 1 mod (4)).
    • That if the map (x,y,z)\to (x,z,y) has a point which is fixed (the output is same as input) then this, combined with the definition of S, means that p is writeable as the sum of two squares.